www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - If T[new] is the container for T[], then what is the container for

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
It looks we can't make it with only T[]. We need a genuine container 
type, and T[new] was suggested. It would probably have value semantics.

T[U] seems to have the same problem. If T[U] is the range, then how do 
you call the container?

If we follow through with a no-garbage-collected option for the core 
types, then we also need to distinguish between a slice and its 
container. The fact that we (almost) got away with T[] being at the same 
time the container and the slice was that the garbage collector would 
collect all unused slices.


Andrei
Apr 25 2009
next sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Andrei Alexandrescu wrote:
 It looks we can't make it with only T[]. We need a genuine container 
 type, and T[new] was suggested.

What do you mean by a "genuine container type"? Stewart.
Apr 25 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Stewart Gordon wrote:
 Andrei Alexandrescu wrote:
 It looks we can't make it with only T[]. We need a genuine container 
 type, and T[new] was suggested.

What do you mean by a "genuine container type"?

One with capacity and probably value semantics. Andrei
Apr 25 2009
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Robert Jacques (sandford jhu.edu)'s article
 On Sat, 25 Apr 2009 09:16:39 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Stewart Gordon wrote:
 Andrei Alexandrescu wrote:
 It looks we can't make it with only T[]. We need a genuine container
 type, and T[new] was suggested.

What do you mean by a "genuine container type"?

One with capacity and probably value semantics. Andrei

suggest) for free-list based allocation. Which is _only_ used by malloc and mark-sweep GCs. And I'm really hoping Leandro gives us something better than mark-sweep.

But in a pointer bumping GC, the right way to do dynamic arrays would be to allocate some extra space for the array beyond the length requested, to allow it to be appended to without being reallocated. In this case, the extra space is being allocated under the hood by the array functions, not the garbage collector, but it's still there and you still need a capacity field to track it without expensive GC queries.
Apr 25 2009
parent reply "Unknown W. Brackets" <unknown simplemachines.org> writes:
What about simply using the const/invariant information?

After all, an "array builder" is a mutable array.  If you don't want to 
extend it, it should be invariant or const - e.g. invariant(string).

I've always thought the separation of strings and string builders, 
arrays and array builders, etc. was flawed.  It's just like strcmp; 
sure, it works, but is it really the right way to compare strings? 
Can't I just have my == and not worry about it?

-[Unknown]


Robert Jacques wrote:
 3) Over-allocating all arrays is a wonderful waste of memory and 
 performance (re chache lines, etc). A cleaner answer is an array builder 
 class, which only pays for this overhead when it's actually needed.

Apr 25 2009
parent reply "Unknown W. Brackets" <unknown simplemachines.org> writes:
I'm not talking about invariant(char)[], I'm talking about 
invariant(char[]).  That cannot be extended in length.  Really, what I 
want is something that is a "invariant(variant(char)[])", I suppose, but 
I don't even think that's necessary.

When the default string type was made invariant, many people were 
non-plussed at best.  That doesn't mean it was wrong.

How crazy would it be if, for example, std.string.split returned 
invariant(string[]) instead of string[]?  I doubt it would hurt many 
people, in actuality, and you could always get a mutable copy.

-[Unknown]


Robert Jacques wrote:
 On Sat, 25 Apr 2009 13:30:22 -0400, Unknown W. Brackets 
 <unknown simplemachines.org> wrote:
 What about simply using the const/invariant information?

 After all, an "array builder" is a mutable array.  If you don't want 
 to extend it, it should be invariant or const - e.g. invariant(string).

No, immutability really applies to the element and not just the array length. Besides, what about "hello" ~ "world"? Essentially, an "array builder" is an array with a cheaply, extend-able length and it's often used for immutable strings, etc. Also, while a concatenate strings all the time I never end up concatenating non-char arrays. And they almost always end up being mutable.
 I've always thought the separation of strings and string builders, 
 arrays and array builders, etc. was flawed.  It's just like strcmp; 
 sure, it works, but is it really the right way to compare strings? 
 Can't I just have my == and not worry about it?

Well, you can just use ~= or ~ and not worry about it. Array builders are a performance enhancement for one special case at the cost of a general performance degradation. (There was a long discussion previously about these trade-offs)

Apr 25 2009
next sibling parent Jason House <jason.james.house gmail.com> writes:
Robert Jacques Wrote:

 On Sat, 25 Apr 2009 14:21:49 -0400, Unknown W. Brackets  
 <unknown simplemachines.org> wrote:
 I'm not talking about invariant(char)[], I'm talking about  
 invariant(char[]).  That cannot be extended in length.  Really, what I  
 want is something that is a "invariant(variant(char)[])", I suppose, but  
 I don't even think that's necessary.

 When the default string type was made invariant, many people were  
 non-plussed at best.  That doesn't mean it was wrong.

Okay I think we're on different pages, so I'll reiterate:
 Robert Jacques wrote:
  No, immutability really applies to the element and not just the array  
 length.


1) Immutability is a bad way to express 'this array doesn't change length'. Consider arrays of numbers instead of characters and you'll see what I mean. 2) A large portion of the time immutable arrays are concatenated (i.e. "hello" ~ "world")
 How crazy would it be if, for example, std.string.split returned  
 invariant(string[]) instead of string[]?  I doubt it would hurt many  
 people, in actuality, and you could always get a mutable copy.

 -[Unknown]

1) At first glance, I don't see any reason why split couldn't or shouldn't, besides string[] being shorter than immutable string[]. 2) I don't see how this is relevant to array capacities / builders.

invariant(char[]) is not rebindable. There are bugs in bugzilla that show holes in the usage of invariant(char)[] instead of a rebindable invariant(char[]).
Apr 25 2009
prev sibling parent reply "Unknown W. Brackets" <unknown simplemachines.org> writes:
Let me say it a different way:

split() returns results that are, often, stored or compared.  It is not 
often that these results are concatenated to or modified.  Possibly, 
they may not be sliced (further) often either.

Such a case represents a perfect example of an array (or string) that 
need not have a greater capacity than its length.  Indeed; no property 
of the array is likely to change - its length, contents, capacity, or 
even ptr.  All are likely to remain constant until the object is deleted 
from memory.

So, it would seem to me, this return value (as a quick example) could 
easily be changed to one that fits the following:

1. Is an array of data.
2. Does not have mutable contents (data.)
3. Does not allow its length or capacity to be extended (excuse me for 
considering this mutation.)
4. Is a reference, rather than a static array.

My suggestion is that "char[]" should _be_ a StringBuilder, and "int[]" 
should _be_ an ArrayBuilder, and we should adopt a separate syntax for 
something that _isn't_ one of those.

That said, I did just browse some of my D source files - an XML 1.0 
parser/tree and a custom rfc-compliant ftp server - and nearly the only 
concatenation I use is building exception strings.  I'm a little 
surprised.  Only a couple times do I do anything a builder would be 
useful for.

One time is on XML child nodes; since I'm building a list of children, 
it is a "builder" type thing.  However, my XML tree is fully incremental 
- that means you could traverse it before all of the data has been read 
from the net.  It would not be practical for me to use a builder that 
didn't have proper read access for this case.

I really think using a "StringBuilder" class is just like the solution 
of using a "String" class.  D, in my opinion, proved the latter was not 
necessary; I hold that the former isn't either.

-[Unknown]


Robert Jacques wrote:
 On Sat, 25 Apr 2009 14:21:49 -0400, Unknown W. Brackets 
 <unknown simplemachines.org> wrote:
 I'm not talking about invariant(char)[], I'm talking about 
 invariant(char[]).  That cannot be extended in length.  Really, what I 
 want is something that is a "invariant(variant(char)[])", I suppose, 
 but I don't even think that's necessary.

 When the default string type was made invariant, many people were 
 non-plussed at best.  That doesn't mean it was wrong.

Okay I think we're on different pages, so I'll reiterate:
 Robert Jacques wrote:
  No, immutability really applies to the element and not just the 
 array length.


1) Immutability is a bad way to express 'this array doesn't change length'. Consider arrays of numbers instead of characters and you'll see what I mean. 2) A large portion of the time immutable arrays are concatenated (i.e. "hello" ~ "world")
 How crazy would it be if, for example, std.string.split returned 
 invariant(string[]) instead of string[]?  I doubt it would hurt many 
 people, in actuality, and you could always get a mutable copy.

 -[Unknown]

1) At first glance, I don't see any reason why split couldn't or shouldn't, besides string[] being shorter than immutable string[]. 2) I don't see how this is relevant to array capacities / builders.

Apr 25 2009
next sibling parent reply "Unknown W. Brackets" <unknown simplemachines.org> writes:
1. Well, I think that immutable(T[]) ~ immutable(T[]) should be 
possible.  I thought it was.  Just that you cannot _append_ to 
immutable(T[]) which completely makes sense.

2. I don't know that it rarely concatenates.  I took Calc, but I am not 
a mathematician... mostly I use char and object references in my arrays. 
  Occasionally int[], and maybe real[] but it's not likely.

So, yes, I probably am not properly addressing this.  Frankly I do not 
know the use-cases where you would mutate such an array.

For objects, if you don't change the array length, it's unlikely you'll 
change its contents (pointers.)  Unfortunately, the referenced object 
would need to be mutable in many cases (although this is probably less 
than one would think offhand for the case of arrays of 
already-initialized objects.)

3. Okay.  I think for a multi-purpose builder, that's true.  However, 
speaking only of having an extended capacity (as is currently and as I 
was quoting), and not other features of a builder, I think what I say 
does make sense.

4. I don't think so.  Again, I'm not talking about a full-fledged 
builder (although I think more features of one would be possible without 
too much difference except fake methods), just about features.

So, to recap, I'm really talking about not over-allocating immutable 
arrays, but I'm not talking about changing current arrays.  Does that 
make my comments make more sense?

-[Unknown]


Robert Jacques wrote:
 On Sat, 25 Apr 2009 20:17:00 -0400, Unknown W. Brackets 
 <unknown simplemachines.org> wrote:
 
 Let me say it a different way:

 split() returns results that are, often, stored or compared.  It is 
 not often that these results are concatenated to or modified.  
 Possibly, they may not be sliced (further) often either.

 Such a case represents a perfect example of an array (or string) that 
 need not have a greater capacity than its length.  Indeed; no property 
 of the array is likely to change - its length, contents, capacity, or 
 even ptr.  All are likely to remain constant until the object is 
 deleted from memory.

 So, it would seem to me, this return value (as a quick example) could 
 easily be changed to one that fits the following:

 1. Is an array of data.
 2. Does not have mutable contents (data.)
 3. Does not allow its length or capacity to be extended (excuse me for 
 considering this mutation.)
 4. Is a reference, rather than a static array.

 My suggestion is that "char[]" should _be_ a StringBuilder, and 
 "int[]" should _be_ an ArrayBuilder, and we should adopt a separate 
 syntax for something that _isn't_ one of those.

 That said, I did just browse some of my D source files - an XML 1.0 
 parser/tree and a custom rfc-compliant ftp server - and nearly the 
 only concatenation I use is building exception strings.  I'm a little 
 surprised.  Only a couple times do I do anything a builder would be 
 useful for.

 One time is on XML child nodes; since I'm building a list of children, 
 it is a "builder" type thing.  However, my XML tree is fully 
 incremental - that means you could traverse it before all of the data 
 has been read from the net.  It would not be practical for me to use a 
 builder that didn't have proper read access for this case.

 I really think using a "StringBuilder" class is just like the solution 
 of using a "String" class.  D, in my opinion, proved the latter was 
 not necessary; I hold that the former isn't either.

 -[Unknown]

Okay, I understand where you're coming from. Here are some counter-points 1) Immutable strings are often concatenated, which you don't address 2) int[], real[], and basically anything not a string rarely concatenates, but often mutable, which you don't address 3) Rather large thread a while ago concluded the builder overhead was too high for general use. 4) You're proposal makes immutable char[] and char[] have different underlying representations.

Apr 25 2009
parent "Unknown W. Brackets" <unknown simplemachines.org> writes:
(and I know I said "char[] should _be_ a StringBuilder", which probably 
caused the confusion - sorry.  I was trying to differentiate the two, I 
worded it poorly for what I actually meant.)

-[Unknown]


Unknown W. Brackets wrote:
 1. Well, I think that immutable(T[]) ~ immutable(T[]) should be 
 possible.  I thought it was.  Just that you cannot _append_ to 
 immutable(T[]) which completely makes sense.
 
 2. I don't know that it rarely concatenates.  I took Calc, but I am not 
 a mathematician... mostly I use char and object references in my arrays. 
  Occasionally int[], and maybe real[] but it's not likely.
 
 So, yes, I probably am not properly addressing this.  Frankly I do not 
 know the use-cases where you would mutate such an array.
 
 For objects, if you don't change the array length, it's unlikely you'll 
 change its contents (pointers.)  Unfortunately, the referenced object 
 would need to be mutable in many cases (although this is probably less 
 than one would think offhand for the case of arrays of 
 already-initialized objects.)
 
 3. Okay.  I think for a multi-purpose builder, that's true.  However, 
 speaking only of having an extended capacity (as is currently and as I 
 was quoting), and not other features of a builder, I think what I say 
 does make sense.
 
 4. I don't think so.  Again, I'm not talking about a full-fledged 
 builder (although I think more features of one would be possible without 
 too much difference except fake methods), just about features.
 
 So, to recap, I'm really talking about not over-allocating immutable 
 arrays, but I'm not talking about changing current arrays.  Does that 
 make my comments make more sense?
 
 -[Unknown]
 
 
 Robert Jacques wrote:
 On Sat, 25 Apr 2009 20:17:00 -0400, Unknown W. Brackets 
 <unknown simplemachines.org> wrote:

 Let me say it a different way:

 split() returns results that are, often, stored or compared.  It is 
 not often that these results are concatenated to or modified.  
 Possibly, they may not be sliced (further) often either.

 Such a case represents a perfect example of an array (or string) that 
 need not have a greater capacity than its length.  Indeed; no 
 property of the array is likely to change - its length, contents, 
 capacity, or even ptr.  All are likely to remain constant until the 
 object is deleted from memory.

 So, it would seem to me, this return value (as a quick example) could 
 easily be changed to one that fits the following:

 1. Is an array of data.
 2. Does not have mutable contents (data.)
 3. Does not allow its length or capacity to be extended (excuse me 
 for considering this mutation.)
 4. Is a reference, rather than a static array.

 My suggestion is that "char[]" should _be_ a StringBuilder, and 
 "int[]" should _be_ an ArrayBuilder, and we should adopt a separate 
 syntax for something that _isn't_ one of those.

 That said, I did just browse some of my D source files - an XML 1.0 
 parser/tree and a custom rfc-compliant ftp server - and nearly the 
 only concatenation I use is building exception strings.  I'm a little 
 surprised.  Only a couple times do I do anything a builder would be 
 useful for.

 One time is on XML child nodes; since I'm building a list of 
 children, it is a "builder" type thing.  However, my XML tree is 
 fully incremental - that means you could traverse it before all of 
 the data has been read from the net.  It would not be practical for 
 me to use a builder that didn't have proper read access for this case.

 I really think using a "StringBuilder" class is just like the 
 solution of using a "String" class.  D, in my opinion, proved the 
 latter was not necessary; I hold that the former isn't either.

 -[Unknown]

Okay, I understand where you're coming from. Here are some counter-points 1) Immutable strings are often concatenated, which you don't address 2) int[], real[], and basically anything not a string rarely concatenates, but often mutable, which you don't address 3) Rather large thread a while ago concluded the builder overhead was too high for general use. 4) You're proposal makes immutable char[] and char[] have different underlying representations.


Apr 25 2009
prev sibling parent reply Rainer Deyke <rainerd eldwood.com> writes:
Robert Jacques wrote:
 1) Immutable strings are often concatenated, which you don't address

There are two types of concatenation: in-place (operator ~=) and not (operator ~). The former can take advantage of capacity to avoid new allocation, the latter can not. When concatenating a lot of immutable strings, it might make sense to make the left side mutable: char[] tmp = string1.dup; tmp ~= string2; tmp ~= string3; // ... tmp ~= string99; string result = tmp.idup;
 2) int[], real[], and basically anything not a string rarely
 concatenates, but often mutable, which you don't address

I frequently append individual elements to non-string arrays. -- Rainer Deyke - rainerd eldwood.com
Apr 26 2009
parent "Unknown W. Brackets" <unknown simplemachines.org> writes:
So, how often do you have an int array that you do not intend to append 
elements to, but yet intend to change the values within it?  Excluding 
preallocation, of course.

Every time I use an int[], or honestly practically any array, I'm either 
creating it or reading it.  Nonetheless, I don't do much math with D 
(I'm a server guy), so it's interesting for me what uses other people have.

-[Unknown]


Rainer Deyke wrote:
 2) int[], real[], and basically anything not a string rarely
 concatenates, but often mutable, which you don't address

I frequently append individual elements to non-string arrays.

Apr 26 2009
prev sibling parent Leandro Lucarella <llucax gmail.com> writes:
Robert Jacques, el 25 de abril a las 10:58 me escribiste:
 On Sat, 25 Apr 2009 09:16:39 -0400, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 
Stewart Gordon wrote:
Andrei Alexandrescu wrote:
It looks we can't make it with only T[]. We need a genuine container type, and
T[new] was suggested.

What do you mean by a "genuine container type"?

One with capacity and probably value semantics. Andrei

First, capacity is only static (i.e. can be included in the array as you suggest) for free-list based allocation. Which is _only_ used by malloc and mark-sweep GCs. And I'm really hoping Leandro gives us something better than mark-sweep.

I don't think I will, not at least for my thesis (I will be concentrating in make the current GC concurrent and making pauses as small as possible). But maybe Keith will: http://www.dsource.org/projects/tango/forums/topic/743 8-) -- Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/ ---------------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------------- The diference is simple: hackers build things, crackers break them.
Apr 25 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 25 Apr 2009 09:16:39 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Stewart Gordon wrote:
 Andrei Alexandrescu wrote:
 It looks we can't make it with only T[]. We need a genuine container  
 type, and T[new] was suggested.

What do you mean by a "genuine container type"?

One with capacity and probably value semantics. Andrei

First, capacity is only static (i.e. can be included in the array as you suggest) for free-list based allocation. Which is _only_ used by malloc and mark-sweep GCs. And I'm really hoping Leandro gives us something better than mark-sweep. Second, if people want value semantics, they can already use .dup or a[] = ... when needed. Adding value semantics to arrays seems to me like a recipe for unintended poor performance by new to D programmers (who don't know) and experienced ones (who make a hard to find typo) a like. (I can see that value semantics for static arrays might be useful i.e. foo(real[4] vec), but I don't see a major need for dynamic vectors to have value semantics and every reason they shouldn't, i.e. foo(real[] big_image). Counter examples are desired and welcome.)
Apr 25 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 25 Apr 2009 11:57:25 -0400, dsimcha <dsimcha yahoo.com> wrote:
 == Quote from Robert Jacques (sandford jhu.edu)'s article
 On Sat, 25 Apr 2009 09:16:39 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Stewart Gordon wrote:
 Andrei Alexandrescu wrote:
 It looks we can't make it with only T[]. We need a genuine container
 type, and T[new] was suggested.

What do you mean by a "genuine container type"?

One with capacity and probably value semantics. Andrei

suggest) for free-list based allocation. Which is _only_ used by malloc and mark-sweep GCs. And I'm really hoping Leandro gives us something better than mark-sweep.

But in a pointer bumping GC, the right way to do dynamic arrays would be to allocate some extra space for the array beyond the length requested, to allow it to be appended to without being reallocated. In this case, the extra space is being allocated under the hood by the array functions, not the garbage collector, but it's still there and you still need a capacity field to track it without expensive GC queries.

1) There are multiple types of pointer bumping GCs and checking/exending an array can be pretty darn cheap. (You only have to find the memory page) 2) Pointer bumping is only half a GC, generally the other half is going to move/compact the array, at which point the capacity *should* be changed. 3) Over-allocating all arrays is a wonderful waste of memory and performance (re chache lines, etc). A cleaner answer is an array builder class, which only pays for this overhead when it's actually needed.
Apr 25 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 25 Apr 2009 09:16:39 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Stewart Gordon wrote:
 Andrei Alexandrescu wrote:
 It looks we can't make it with only T[]. We need a genuine container  
 type, and T[new] was suggested.

What do you mean by a "genuine container type"?

One with capacity and probably value semantics. Andrei

If you add capacity to arrays, why do need seperate slice and container types? i.e. .capacity > 0 -> container .capacity == 0 -> slice Note: this use of a capacity field has been mentioned before.
Apr 25 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 25 Apr 2009 13:30:22 -0400, Unknown W. Brackets  
<unknown simplemachines.org> wrote:
 What about simply using the const/invariant information?

 After all, an "array builder" is a mutable array.  If you don't want to  
 extend it, it should be invariant or const - e.g. invariant(string).

No, immutability really applies to the element and not just the array length. Besides, what about "hello" ~ "world"? Essentially, an "array builder" is an array with a cheaply, extend-able length and it's often used for immutable strings, etc. Also, while a concatenate strings all the time I never end up concatenating non-char arrays. And they almost always end up being mutable.
 I've always thought the separation of strings and string builders,  
 arrays and array builders, etc. was flawed.  It's just like strcmp;  
 sure, it works, but is it really the right way to compare strings? Can't  
 I just have my == and not worry about it?

Well, you can just use ~= or ~ and not worry about it. Array builders are a performance enhancement for one special case at the cost of a general performance degradation. (There was a long discussion previously about these trade-offs)
Apr 25 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 25 Apr 2009 14:21:49 -0400, Unknown W. Brackets  
<unknown simplemachines.org> wrote:
 I'm not talking about invariant(char)[], I'm talking about  
 invariant(char[]).  That cannot be extended in length.  Really, what I  
 want is something that is a "invariant(variant(char)[])", I suppose, but  
 I don't even think that's necessary.

 When the default string type was made invariant, many people were  
 non-plussed at best.  That doesn't mean it was wrong.

Okay I think we're on different pages, so I'll reiterate:
 Robert Jacques wrote:
  No, immutability really applies to the element and not just the array  
 length.


1) Immutability is a bad way to express 'this array doesn't change length'. Consider arrays of numbers instead of characters and you'll see what I mean. 2) A large portion of the time immutable arrays are concatenated (i.e. "hello" ~ "world")
 How crazy would it be if, for example, std.string.split returned  
 invariant(string[]) instead of string[]?  I doubt it would hurt many  
 people, in actuality, and you could always get a mutable copy.

 -[Unknown]

1) At first glance, I don't see any reason why split couldn't or shouldn't, besides string[] being shorter than immutable string[]. 2) I don't see how this is relevant to array capacities / builders.
Apr 25 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 25 Apr 2009 20:17:00 -0400, Unknown W. Brackets  
<unknown simplemachines.org> wrote:

 Let me say it a different way:

 split() returns results that are, often, stored or compared.  It is not  
 often that these results are concatenated to or modified.  Possibly,  
 they may not be sliced (further) often either.

 Such a case represents a perfect example of an array (or string) that  
 need not have a greater capacity than its length.  Indeed; no property  
 of the array is likely to change - its length, contents, capacity, or  
 even ptr.  All are likely to remain constant until the object is deleted  
 from memory.

 So, it would seem to me, this return value (as a quick example) could  
 easily be changed to one that fits the following:

 1. Is an array of data.
 2. Does not have mutable contents (data.)
 3. Does not allow its length or capacity to be extended (excuse me for  
 considering this mutation.)
 4. Is a reference, rather than a static array.

 My suggestion is that "char[]" should _be_ a StringBuilder, and "int[]"  
 should _be_ an ArrayBuilder, and we should adopt a separate syntax for  
 something that _isn't_ one of those.

 That said, I did just browse some of my D source files - an XML 1.0  
 parser/tree and a custom rfc-compliant ftp server - and nearly the only  
 concatenation I use is building exception strings.  I'm a little  
 surprised.  Only a couple times do I do anything a builder would be  
 useful for.

 One time is on XML child nodes; since I'm building a list of children,  
 it is a "builder" type thing.  However, my XML tree is fully incremental  
 - that means you could traverse it before all of the data has been read  
 from the net.  It would not be practical for me to use a builder that  
 didn't have proper read access for this case.

 I really think using a "StringBuilder" class is just like the solution  
 of using a "String" class.  D, in my opinion, proved the latter was not  
 necessary; I hold that the former isn't either.

 -[Unknown]

Okay, I understand where you're coming from. Here are some counter-points 1) Immutable strings are often concatenated, which you don't address 2) int[], real[], and basically anything not a string rarely concatenates, but often mutable, which you don't address 3) Rather large thread a while ago concluded the builder overhead was too high for general use. 4) You're proposal makes immutable char[] and char[] have different underlying representations.
Apr 25 2009
prev sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sun, 26 Apr 2009 03:21:08 -0400, Rainer Deyke <rainerd eldwood.com>  
wrote:
 Robert Jacques wrote:
 1) Immutable strings are often concatenated, which you don't address

There are two types of concatenation: in-place (operator ~=) and not (operator ~). The former can take advantage of capacity to avoid new allocation, the latter can not.

Actually, both ~= and ~ use capacity to avoid new allocation.
 When concatenating a lot of immutable strings, it might make sense to
 make the left side mutable:

 char[] tmp = string1.dup;
 tmp ~= string2;
 tmp ~= string3;
 // ...
 tmp ~= string99;
 string result = tmp.idup;

Why make 2 extra copies if you don't need too?
 2) int[], real[], and basically anything not a string rarely
 concatenates, but often mutable, which you don't address

I frequently append individual elements to non-string arrays.

Cool. Care to elaborate on your use case? I was thinking of scientific computing (linear algebra, optimization) and game programming. Since we're off the topic of T[new] and onto a capacity field in general, I'd like to point out the older newsgroup posts: http://www.digitalmars.com/d/archives/digitalmars/D/Array_Capacity_Field_71119.html http://www.digitalmars.com/d/archives/digitalmars/D/Array_append_performance_75410.html http://www.digitalmars.com/d/archives/digitalmars/D/Array_append_performance_2_75811.html
Apr 26 2009
prev sibling next sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2009-04-25 09:07:52 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 It looks we can't make it with only T[]. We need a genuine container 
 type, and T[new] was suggested. It would probably have value semantics.
 
 T[U] seems to have the same problem. If T[U] is the range, then how do 
 you call the container?
 
 If we follow through with a no-garbage-collected option for the core 
 types, then we also need to distinguish between a slice and its 
 container. The fact that we (almost) got away with T[] being at the 
 same time the container and the slice was that the garbage collector 
 would collect all unused slices.

I always disliked T[new]. Here's a better candidate: | range | container ----- | ------ | --------- array | T[] | T.[] assoc | T[U] | T.[U] -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Apr 25 2009
prev sibling next sibling parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 25 Apr 2009 09:07:52 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:
 It looks we can't make it with only T[]. We need a genuine container  
 type, and T[new] was suggested. It would probably have value semantics.

 T[U] seems to have the same problem. If T[U] is the range, then how do  
 you call the container?

 If we follow through with a no-garbage-collected option for the core  
 types, then we also need to distinguish between a slice and its  
 container. The fact that we (almost) got away with T[] being at the same  
 time the container and the slice was that the garbage collector would  
 collect all unused slices.


 Andrei

No, it's perfectly possible to have T[] be the same type for the container and the slice with reference counting instead of a full GC. All you need to do is store a pointer to the memory block's node. Then it's trivial to 1) get the capacity and 2) increase / decrease the reference count. You could even add an 'allocated length' parameter to the node, so you'd avoid some of the slicing bugs in the current implementation. struct T[] { T* ptr; size_t length; void* memory_block; }
Apr 25 2009
next sibling parent "Unknown W. Brackets" <unknown simplemachines.org> writes:
The only downside to this is, it's a Bad Thing to have the size of an 
array be different between GC's, but I think it's probably not entirely 
avoidable.

But, I really think it's the only good way to do it, imho.  Maybe the 
exact implementation is different, but creating a new type just seems 
even messier... the beauty of slices in D is that they're Not Different 
(TM.)

-[Unknown]


Robert Jacques wrote:
 On Sat, 25 Apr 2009 09:07:52 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 It looks we can't make it with only T[]. We need a genuine container 
 type, and T[new] was suggested. It would probably have value semantics.

 T[U] seems to have the same problem. If T[U] is the range, then how do 
 you call the container?

 If we follow through with a no-garbage-collected option for the core 
 types, then we also need to distinguish between a slice and its 
 container. The fact that we (almost) got away with T[] being at the 
 same time the container and the slice was that the garbage collector 
 would collect all unused slices.


 Andrei

No, it's perfectly possible to have T[] be the same type for the container and the slice with reference counting instead of a full GC. All you need to do is store a pointer to the memory block's node. Then it's trivial to 1) get the capacity and 2) increase / decrease the reference count. You could even add an 'allocated length' parameter to the node, so you'd avoid some of the slicing bugs in the current implementation. struct T[] { T* ptr; size_t length; void* memory_block; }

Apr 25 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 On Sat, 25 Apr 2009 09:07:52 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 It looks we can't make it with only T[]. We need a genuine container 
 type, and T[new] was suggested. It would probably have value semantics.

 T[U] seems to have the same problem. If T[U] is the range, then how do 
 you call the container?

 If we follow through with a no-garbage-collected option for the core 
 types, then we also need to distinguish between a slice and its 
 container. The fact that we (almost) got away with T[] being at the 
 same time the container and the slice was that the garbage collector 
 would collect all unused slices.


 Andrei

No, it's perfectly possible to have T[] be the same type for the container and the slice with reference counting instead of a full GC. All you need to do is store a pointer to the memory block's node. Then it's trivial to 1) get the capacity and 2) increase / decrease the reference count. You could even add an 'allocated length' parameter to the node, so you'd avoid some of the slicing bugs in the current implementation. struct T[] { T* ptr; size_t length; void* memory_block; }

Yes, but that way you take the owner (i.e. the memory block) completely out of the picture. It would help to give the owner an identity. Andrei
Apr 25 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 1) In what way would it help to give the owner an identity?

For example, on an implementation that doesn't want gc, they'd need the owner for deterministic destruction. Again, slices could be conflated with arrays mainly because the gc took care of the litter left after rebinding slices. If you want to offer the option to do without a gc, then arrays and slices (= ranges) must be separate entities. Then you hold on to the arrays and pass slices around. When you destroy the arrays, the slices originating from them become invalid (which can be checked or not). With hashes it's even more interesting. We need two types: the hash and the hash range that crawls over that hash. Right now they are conflated into one V[K]. If we want to define a range crawling over a hash (and consequently do away with opApply) we'd need the range to store a little stack of breadcrumbs so the range knows how to iterate. So definitely there must be two entities. Traditionally we've been using V[K] both as a range and as a container, but the usage so far was closer to a range (I think without being sure). So one issue is to find a good type literal to express "type for which the corresponding range is V[K]". Makes sense? Andrei
Apr 25 2009
next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2009-04-25 18:04:32 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Again, slices could be conflated with arrays mainly because the gc took 
 care of the litter left after rebinding slices. If you want to offer 
 the option to do without a gc, then arrays and slices (= ranges) must 
 be separate entities. Then you hold on to the arrays and pass slices 
 around. When you destroy the arrays, the slices originating from them 
 become invalid (which can be checked or not).
 
 With hashes it's even more interesting. We need two types: the hash and 
 the hash range that crawls over that hash. Right now they are conflated 
 into one V[K]. If we want to define a range crawling over a hash (and 
 consequently do away with opApply) we'd need the range to store a 
 little stack of breadcrumbs so the range knows how to iterate. So 
 definitely there must be two entities. Traditionally we've been using 
 V[K] both as a range and as a container, but the usage so far was 
 closer to a range (I think without being sure). So one issue is to find 
 a good type literal to express "type for which the corresponding range 
 is V[K]".

You say there must be two entities. But the two entities already exist, only the container is stored by reference. T[] is a reference to a portion of a fixed-size array (static array or heap array). T[U] is a reference to an associative array. If you could call "new" and "delete" on these two types to delete the referenced container, or if the container were reference-counted, you wouldn't need a garbage collector. It's pretty much the same as for classes in fact. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Apr 25 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 On 2009-04-25 18:04:32 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 Again, slices could be conflated with arrays mainly because the gc 
 took care of the litter left after rebinding slices. If you want to 
 offer the option to do without a gc, then arrays and slices (= ranges) 
 must be separate entities. Then you hold on to the arrays and pass 
 slices around. When you destroy the arrays, the slices originating 
 from them become invalid (which can be checked or not).

 With hashes it's even more interesting. We need two types: the hash 
 and the hash range that crawls over that hash. Right now they are 
 conflated into one V[K]. If we want to define a range crawling over a 
 hash (and consequently do away with opApply) we'd need the range to 
 store a little stack of breadcrumbs so the range knows how to iterate. 
 So definitely there must be two entities. Traditionally we've been 
 using V[K] both as a range and as a container, but the usage so far 
 was closer to a range (I think without being sure). So one issue is to 
 find a good type literal to express "type for which the corresponding 
 range is V[K]".

You say there must be two entities. But the two entities already exist, only the container is stored by reference. T[] is a reference to a portion of a fixed-size array (static array or heap array).

Yah, and the question is where is the entire array.
 T[U] is a 
 reference to an associative array.

Yah, and the question is where is the range that the associative array uses for iteration.
 If you could call "new" and "delete" 
 on these two types to delete the referenced container, or if the 
 container were reference-counted, you wouldn't need a garbage collector. 
 It's pretty much the same as for classes in fact.

It might be wise to make those types values, not references. Andrei
Apr 25 2009
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 On Sat, 25 Apr 2009 18:04:32 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 Also, I see a major logical problem with your proposal:
 Given     A single owner is required for deterministic destruction.
 Then      The array's owner is responsible for the deterministic 
 destruction.
 Therefore When the array's owner goes out of scope, it must destruct the 
 array.
 But Then  How does a function return an array?

My understanding is that you don't know how C++ and D value semantics work. It would be impossible to answer your (albeit good) question without first writing a tutorial on the topic, which unfortunately I don't have the time for. Andrei
Apr 25 2009
prev sibling parent reply Christopher Wright <dhasenan gmail.com> writes:
Robert Jacques wrote:
 On Sat, 25 Apr 2009 18:04:32 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Robert Jacques wrote:
 1) In what way would it help to give the owner an identity?

For example, on an implementation that doesn't want gc, they'd need the owner for deterministic destruction.

Why is a single owner needed for deterministic destruction?
 Again, slices could be conflated with arrays mainly because the gc 
 took care of the litter left after rebinding slices. If you want to 
 offer the option to do without a gc, then arrays and slices (= ranges) 
 must be separate entities. Then you hold on to the arrays and pass 
 slices around. When you destroy the arrays, the slices originating 
 from them become invalid (which can be checked or not).

What is wrong with reference counting arrays just like other objects? (which is what I've been proposing) Also, I see a major logical problem with your proposal: Given A single owner is required for deterministic destruction. Then The array's owner is responsible for the deterministic destruction. Therefore When the array's owner goes out of scope, it must destruct the array. But Then How does a function return an array? Your original post mentioned value semantics, but that results in a lot of needless allocation and copying.
 With hashes it's even more interesting. We need two types: the hash 
 and the hash range that crawls over that hash. Right now they are 
 conflated into one V[K]. If we want to define a range crawling over a 
 hash (and consequently do away with opApply) we'd need the range to 
 store a little stack of breadcrumbs so the range knows how to iterate. 
 So definitely there must be two entities. Traditionally we've been 
 using V[K] both as a range and as a container, but the usage so far 
 was closer to a range (I think without being sure). So one issue is to 
 find a good type literal to express "type for which the corresponding 
 range is V[K]".

While separating hashes and ranges is important, it seems orthogonal to the topic at hand.
 Makes sense?

Yes and no. What I wanted was some high-level reason why have T[new] and T[] was fundamentally better than only having T[]. What I got was a low level assertion that they are needed to do -nogc. However, since earlier in this thread I provided a counter example, i.e. reference counted arrays, your assertion stands as false. And a language change shouldn't be based on a faulty assumption.

Simple solution: put the array definition in object.d and try implementing arrays with reference counting or manual memory management. I think stored slices break manual memory management, even with a dedicated slice type; but they should work just fine with refcounting. If you don't want to change the language, object.Array will have to implement the logic for slices and for allocated arrays. It's a bit ugly, and it makes the Array type larger. Also, Array's reference count would need to be accessed by reference.
Apr 25 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Christopher Wright wrote:
 Simple solution: put the array definition in object.d and try 
 implementing arrays with reference counting or manual memory management.
 
 I think stored slices break manual memory management, even with a 
 dedicated slice type; but they should work just fine with refcounting. 
 If you don't want to change the language, object.Array will have to 
 implement the logic for slices and for allocated arrays. It's a bit 
 ugly, and it makes the Array type larger. Also, Array's reference count 
 would need to be accessed by reference.

There are three schemes that are good starting points: 1. As right now :o). 2. Using refcounting. Arrays will be 4 words long (begin, end, end-of-store, refcount*) and slices will be 3 words long (begin, end, and owner*) 3. Using manual management. Arrays will be 3 words long (begin, end, end-of-store) and slices will be 2 words long (begin, end). This is close to C++/STL. This case is less safe but very efficient. If we manage to integrate them all... that's quite the holy grail. And I think it's entirely possible with only a few changes to the language. Andrei
Apr 25 2009
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 1. As right now :o).
 2. Using refcounting. [...]
 3. Using manual management. [...]
 If we manage to integrate them all... that's quite the holy grail. And I 
 think it's entirely possible with only a few changes to the language.

If you add such two other ways to manage memory, you end with different code. Isn't this going to split the D codebase in three? (If I look for an already written D1 module now I have to see if it's for Tango or Phobos. In such situation I have to see what kind of memory does it use?) And in big programs isn't the usage of a GC inevitable? So are the other two options only for small programs? When you see code written by other people you must be able to understand and know how to use the refcounting too. So this adds some more things to know. D2 is getting more and more complex than D1. Eventually D2 may risk getting too much complex for normal people. Both C and Java are simple enough languages, but a cross of them may be too much complex for normal people. Bye, bearophile
Apr 26 2009
prev sibling next sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2009-04-25 19:44:18 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 There are three schemes that are good starting points:
 
 1. As right now :o).
 
 2. Using refcounting. Arrays will be 4 words long (begin, end, 
 end-of-store, refcount*) and slices will be 3 words long (begin, end, 
 and owner*)
 
 3. Using manual management. Arrays will be 3 words long (begin, end, 
 end-of-store) and slices will be 2 words long (begin, end). This is 
 close to C++/STL. This case is less safe but very efficient.
 
 If we manage to integrate them all... that's quite the holy grail. And 
 I think it's entirely possible with only a few changes to the language.

1 and 2 certainly look useful. I've never found 3 to be efficient in the C++/STL. Making a copy every time you need to store a string in a struct isn't something I call efficient. Especially in a language that supports immutable strings, reference counting seems immencely preferable. And reference counting has the advantage that it does not require any change or addition to the language syntax or semantics. Andrei, do you have a use case for 3 that would end up worse using reference counting? Because I only see appending, and even then, appending when the refcount is 1 could be done in-place which would be pretty efficient. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Apr 26 2009
prev sibling next sibling parent Christopher Wright <dhasenan gmail.com> writes:
Andrei Alexandrescu wrote:
 Christopher Wright wrote:
 Simple solution: put the array definition in object.d and try 
 implementing arrays with reference counting or manual memory management.

 I think stored slices break manual memory management, even with a 
 dedicated slice type; but they should work just fine with refcounting. 
 If you don't want to change the language, object.Array will have to 
 implement the logic for slices and for allocated arrays. It's a bit 
 ugly, and it makes the Array type larger. Also, Array's reference 
 count would need to be accessed by reference.

There are three schemes that are good starting points: 1. As right now :o). 2. Using refcounting. Arrays will be 4 words long (begin, end, end-of-store, refcount*) and slices will be 3 words long (begin, end, and owner*)

Well, a slice will be begin, end, owner-ref-count*. The owner could move about the stack or get passed between classes on the heap or something like that.
 3. Using manual management. Arrays will be 3 words long (begin, end, 
 end-of-store) and slices will be 2 words long (begin, end). This is 
 close to C++/STL. This case is less safe but very efficient.
 
 If we manage to integrate them all... that's quite the holy grail. And I 
 think it's entirely possible with only a few changes to the language.

Agreed.
Apr 26 2009
prev sibling parent "Craig Black" <craigblack2 cox.net> writes:
Just my 2 cents.  I use a C++ array template that optimizes memory usage by 
only storing a pointer.  It stores the capacity and the size in the heap 
allocation.  The benefit here is that empty arrays only require one word.  I 
am the primary author of a simulation engine (similar to a game engine but 
less graphics-focused).  In practice I have found about half of all arrays 
are empty, so this approach provides much better memory efficiency for my 
purposes.

-Craig 
Apr 26 2009
prev sibling next sibling parent reply Steve Schveighoffer <schveiguy yahoo.com> writes:
On Sat, 25 Apr 2009 08:07:52 -0500, Andrei Alexandrescu wrote:

 It looks we can't make it with only T[]. We need a genuine container
 type, and T[new] was suggested. It would probably have value semantics.
 
 T[U] seems to have the same problem. If T[U] is the range, then how do
 you call the container?
 
 If we follow through with a no-garbage-collected option for the core
 types, then we also need to distinguish between a slice and its
 container. The fact that we (almost) got away with T[] being at the same
 time the container and the slice was that the garbage collector would
 collect all unused slices.
 
 
 Andrei

You are confusing the difference between T[] and T[U]. T[U] is a strange beast because it does not need to be new'd, but it acts completely like a reference type. But T[U] *is* the container type. What you should be asking is what is the *slice* type for T[U]. My answer would be that a slice type for a hashtable doesn't make sense. We don't need AAs to be completely on par with normal arrays, so just leave them alone ;) -Steve
Apr 25 2009
next sibling parent "Unknown W. Brackets" <unknown simplemachines.org> writes:
Well, a range of an associative array is a possibility (not a slice, but 
e.g. from std.algorithm.)

I think it'd be a mistake to discount that.  Being able to treat 
associative arrays like arrays, in some cases (e.g. with count(), etc.) 
is nice.  That said, i don't even know that std.algorithm currently 
supports this.

-[Unknown]


Steve Schveighoffer wrote:
 On Sat, 25 Apr 2009 08:07:52 -0500, Andrei Alexandrescu wrote:
 
 It looks we can't make it with only T[]. We need a genuine container
 type, and T[new] was suggested. It would probably have value semantics.

 T[U] seems to have the same problem. If T[U] is the range, then how do
 you call the container?

 If we follow through with a no-garbage-collected option for the core
 types, then we also need to distinguish between a slice and its
 container. The fact that we (almost) got away with T[] being at the same
 time the container and the slice was that the garbage collector would
 collect all unused slices.


 Andrei

You are confusing the difference between T[] and T[U]. T[U] is a strange beast because it does not need to be new'd, but it acts completely like a reference type. But T[U] *is* the container type. What you should be asking is what is the *slice* type for T[U]. My answer would be that a slice type for a hashtable doesn't make sense. We don't need AAs to be completely on par with normal arrays, so just leave them alone ;) -Steve

Apr 25 2009
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 25 Apr 2009 12:56:15 -0400, Unknown W. Brackets  
<unknown simplemachines.org> wrote:

 Well, a range of an associative array is a possibility (not a slice, but  
 e.g. from std.algorithm.)

 I think it'd be a mistake to discount that.  Being able to treat  
 associative arrays like arrays, in some cases (e.g. with count(), etc.)  
 is nice.  That said, i don't even know that std.algorithm currently  
 supports this.

Sure, but why should a range type of an AA be a special builtin type? It makes sense with array slices because they are valuable to use in many different ways, can be useful as sub-ranges, and keep their value long after the array has been changed. An AA range would be useful only for iterating over all the elements. Once the AA changes, the AA range is invalid. Making a subrange is pretty much useless, since there's no real order to the data. When a range is necessarily the range of the entire container, it becomes a trivial addition to the typesystem just to meet interface requirements. Array slices are way more useful than that. I'd say A[U] is the container type, and AARange!(A, U) is the range if you wish to have a type name. I see no reason to add a builtin type to deal with AA ranges. -Steve
Apr 27 2009
prev sibling next sibling parent Christopher Wright <dhasenan gmail.com> writes:
Andrei Alexandrescu wrote:
 It looks we can't make it with only T[].

Citation needed. If this is a continuation of a previous discussion, a link to the previous discussion will suffice. I grant that appending to an array is expensive. Make Array a user-defined type and we can see what's appropriate.
Apr 25 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 25 Apr 2009 16:16:07 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 On Sat, 25 Apr 2009 09:07:52 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:
 It looks we can't make it with only T[]. We need a genuine container  
 type, and T[new] was suggested. It would probably have value semantics.

 T[U] seems to have the same problem. If T[U] is the range, then how do  
 you call the container?

 If we follow through with a no-garbage-collected option for the core  
 types, then we also need to distinguish between a slice and its  
 container. The fact that we (almost) got away with T[] being at the  
 same time the container and the slice was that the garbage collector  
 would collect all unused slices.


 Andrei

container and the slice with reference counting instead of a full GC. All you need to do is store a pointer to the memory block's node. Then it's trivial to 1) get the capacity and 2) increase / decrease the reference count. You could even add an 'allocated length' parameter to the node, so you'd avoid some of the slicing bugs in the current implementation. struct T[] { T* ptr; size_t length; void* memory_block; }

Yes, but that way you take the owner (i.e. the memory block) completely out of the picture. It would help to give the owner an identity. Andrei

1) In what way would it help to give the owner an identity? 2) How does reference counting arrays "take the owner completely out of the picture"? Isn't this simply a switch from single ownership to shared ownership?
Apr 25 2009
prev sibling next sibling parent "Danny Wilson" <bluezenix gmail.com> writes:
Op Sat, 25 Apr 2009 15:07:52 +0200 schreef Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org>:

 It looks we can't make it with only T[]. We need a genuine container  
 type, and T[new] was suggested. It would probably have value semantics.

 T[U] seems to have the same problem. If T[U] is the range, then how do  
 you call the container?

T[new U] ? T[new(U)] ? T[U new] ? you knew...? still, reusing new like that looks pretty akward to me.
Apr 25 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 25 Apr 2009 18:04:32 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 1) In what way would it help to give the owner an identity?

For example, on an implementation that doesn't want gc, they'd need the owner for deterministic destruction.

Why is a single owner needed for deterministic destruction?
 Again, slices could be conflated with arrays mainly because the gc took  
 care of the litter left after rebinding slices. If you want to offer the  
 option to do without a gc, then arrays and slices (= ranges) must be  
 separate entities. Then you hold on to the arrays and pass slices  
 around. When you destroy the arrays, the slices originating from them  
 become invalid (which can be checked or not).

What is wrong with reference counting arrays just like other objects? (which is what I've been proposing) Also, I see a major logical problem with your proposal: Given A single owner is required for deterministic destruction. Then The array's owner is responsible for the deterministic destruction. Therefore When the array's owner goes out of scope, it must destruct the array. But Then How does a function return an array? Your original post mentioned value semantics, but that results in a lot of needless allocation and copying.
 With hashes it's even more interesting. We need two types: the hash and  
 the hash range that crawls over that hash. Right now they are conflated  
 into one V[K]. If we want to define a range crawling over a hash (and  
 consequently do away with opApply) we'd need the range to store a little  
 stack of breadcrumbs so the range knows how to iterate. So definitely  
 there must be two entities. Traditionally we've been using V[K] both as  
 a range and as a container, but the usage so far was closer to a range  
 (I think without being sure). So one issue is to find a good type  
 literal to express "type for which the corresponding range is V[K]".

While separating hashes and ranges is important, it seems orthogonal to the topic at hand.
 Makes sense?

Yes and no. What I wanted was some high-level reason why have T[new] and T[] was fundamentally better than only having T[]. What I got was a low level assertion that they are needed to do -nogc. However, since earlier in this thread I provided a counter example, i.e. reference counted arrays, your assertion stands as false. And a language change shouldn't be based on a faulty assumption.
Apr 25 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 25 Apr 2009 19:11:57 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Robert Jacques wrote:
 On Sat, 25 Apr 2009 18:04:32 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:
 Also, I see a major logical problem with your proposal:
 Given     A single owner is required for deterministic destruction.
 Then      The array's owner is responsible for the deterministic  
 destruction.
 Therefore When the array's owner goes out of scope, it must destruct  
 the array.
 But Then  How does a function return an array?

My understanding is that you don't know how C++ and D value semantics work. It would be impossible to answer your (albeit good) question without first writing a tutorial on the topic, which unfortunately I don't have the time for.

I know how value semantics work. But 1) You haven't mentioned that T[new] has value semantics, only that you were thinking about it. 2) Struct destructors are always called and don't get optimized away even if the memory copy does. (i.e. I know there's a nice optimization to get out of the extra copy, but I don't know how you'd do it in DMD currently).
 Andrei

Apr 25 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 25 Apr 2009 19:44:18 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:
 Christopher Wright wrote:
 Simple solution: put the array definition in object.d and try  
 implementing arrays with reference counting or manual memory management.
  I think stored slices break manual memory management, even with a  
 dedicated slice type; but they should work just fine with refcounting.  
 If you don't want to change the language, object.Array will have to  
 implement the logic for slices and for allocated arrays. It's a bit  
 ugly, and it makes the Array type larger. Also, Array's reference count  
 would need to be accessed by reference.

There are three schemes that are good starting points: 1. As right now :o). 2. Using refcounting. Arrays will be 4 words long (begin, end, end-of-store, refcount*) and slices will be 3 words long (begin, end, and owner*)

Or, use (begin, length, store*) for both arrays and slices and have the store contain (start, capacity, refcount).
 3. Using manual management. Arrays will be 3 words long (begin, end,  
 end-of-store) and slices will be 2 words long (begin, end). This is  
 close to C++/STL. This case is less safe but very efficient.

Why store end-of-store? It's trivial to compute from the length. (If you have separate slice and array types, that is)
 If we manage to integrate them all... that's quite the holy grail. And I  
 think it's entirely possible with only a few changes to the language.


 Andrei

Apr 25 2009
prev sibling next sibling parent reply Daniel Keep <daniel.keep.lists gmail.com> writes:
Andrei Alexandrescu wrote:
 It looks we can't make it with only T[]. We need a genuine container
 type, and T[new] was suggested. It would probably have value semantics.
 
 T[U] seems to have the same problem. If T[U] is the range, then how do
 you call the container?
 
 If we follow through with a no-garbage-collected option for the core
 types, then we also need to distinguish between a slice and its
 container. The fact that we (almost) got away with T[] being at the same
 time the container and the slice was that the garbage collector would
 collect all unused slices.
 
 Andrei

I've got a few problems understanding this thread. 1. What exactly do you mean by "genuine container type"? That says to me there's some ISO standard or something arrays don't conform to. I saw earlier you specify the lack of a capacity field; is that really it? If so, why not call it a capacity-aware array (or capacitive array :P). 2. I don't see why you want these to have value semantics. What's the motivation here. Note that "because we need it" isn't a valid answer; a link to an explanation would be great, if one exists. :) 3. I'm really uncomfortable with the idea that the semantics of slices changes when you switch off the GC. That means you have a valid program before, you throw -gc, and suddenly you get inscrutable crashes. As for the names, maybe we've got this backwards. What about these: T[] -- a freshly created array T[..] -- a slice of an array T[U] -- a freshly created associative array T[U..] -- a slice of an associative array Otherwise, creating a new array would be (new T[new]) which looks stupid. That, or it's (new T[x]) and the result type is (T[new]) which is inconsistent with every other type in the language. Also, this means static arrays stay as T[n]. On another topic, you mentioned that (using your notation again) T[new] would have four words: begin, end (but I liked length! :( ), end-of-capacity and refcount*. Could you pull a COM trick and stick the capacity and refcount out the front of the array data? [ ? ] [ 4 ] [ 1 ] [ 2 ] [...] [...] ^ ^ ^ ^ | | | | | | begin end | capacity refcount That allows T[new] to stay 8 bytes, and it also avoids needing a second allocation for the refcount (which should have the same lifetime as the array itself). -- Daniel
Apr 25 2009
parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 25 Apr 2009 21:14:28 -0400, Daniel Keep  
<daniel.keep.lists gmail.com> wrote:
 Andrei Alexandrescu wrote:
 It looks we can't make it with only T[]. We need a genuine container
 type, and T[new] was suggested. It would probably have value semantics.

 T[U] seems to have the same problem. If T[U] is the range, then how do
 you call the container?

 If we follow through with a no-garbage-collected option for the core
 types, then we also need to distinguish between a slice and its
 container. The fact that we (almost) got away with T[] being at the same
 time the container and the slice was that the garbage collector would
 collect all unused slices.

 Andrei

I've got a few problems understanding this thread. 1. What exactly do you mean by "genuine container type"? That says to me there's some ISO standard or something arrays don't conform to. I saw earlier you specify the lack of a capacity field; is that really it? If so, why not call it a capacity-aware array (or capacitive array :P).

From a Model-View-Container standpoint, D's arrays mix both View and Container. Many think this is a good thing, but there is an argument for separating the views (slices) from the container (arrays). The whole capacity thing is separate from this issue and exists because 1) it's fixed in a free-list based allocator. 2) it's somewhat slow to query from the GC and 3) concatenating arrays, mainly when building strings, does it a lot. P.S. Great names.
 2. I don't see why you want these to have value semantics.  What's the
 motivation here.  Note that "because we need it" isn't a valid answer; a
 link to an explanation would be great, if one exists.  :)

It hasn't been explained so far. (Although there are some arguments for more general application of value semantics in older threads)
 3. I'm really uncomfortable with the idea that the semantics of slices
 changes when you switch off the GC.  That means you have a valid program
 before, you throw -gc, and suddenly you get inscrutable crashes.

Actually, I'm pretty sure Andrei is proposing changing the semantics for both GC and no GC.
 As for the names, maybe we've got this backwards.  What about these:

 T[]      -- a freshly created array
 T[..]    -- a slice of an array
 T[U]     -- a freshly created associative array
 T[U..]   -- a slice of an associative array

 Otherwise, creating a new array would be (new T[new]) which looks
 stupid.  That, or it's (new T[x]) and the result type is (T[new]) which
 is inconsistent with every other type in the language.

 Also, this means static arrays stay as T[n].

 On another topic, you mentioned that (using your notation again) T[new]
 would have four words: begin, end (but I liked length! :( ),
 end-of-capacity and refcount*.  Could you pull a COM trick and stick the
 capacity and refcount out the front of the array data?

   [ ? ] [ 4 ] [ 1 ] [ 2 ] [...] [...]
     ^     ^     ^           ^
     |     |     |           |
     |     |   begin        end
     |  capacity
  refcount

 That allows T[new] to stay 8 bytes, and it also avoids needing a second
 allocation for the refcount (which should have the same lifetime as the
 array itself).

Or alternatively, both slices and arrays could store a pointer to the (refount, capacity) so you wouldn't need separate types.
Apr 25 2009
prev sibling parent Max Samukha <samukha voliacable.com.removethis> writes:
On Sat, 25 Apr 2009 08:07:52 -0500, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

It looks we can't make it with only T[]. We need a genuine container 
type, and T[new] was suggested. It would probably have value semantics.

T[U] seems to have the same problem. If T[U] is the range, then how do 
you call the container?

If we follow through with a no-garbage-collected option for the core 
types, then we also need to distinguish between a slice and its 
container. The fact that we (almost) got away with T[] being at the same 
time the container and the slice was that the garbage collector would 
collect all unused slices.


Andrei

T![] T![U] (kidding) T[:] T[:U] FWIW, Qt uses by-value containers with COW and it looks like they are ok for many usages. I'd like to see a std.containers module defining containers with semantics based on a policy: enum CopyPolicy { Always, // by value OnWrite, // by value with COW Never // by reference } Then we could use the most appropriate policy for built-in arrays (I think it should be by-value with COW). module object; import std.containers; template Array(T) { alias CoolArray!(T, CopyPolicy.OnWrite) Array; } and still be able to use containers with different policies when necessary.
Apr 26 2009