digitalmars.D - If T[new] is the container for T[], then what is the container for
- Andrei Alexandrescu (10/10) Apr 25 2009 It looks we can't make it with only T[]. We need a genuine container
-
Stewart Gordon
(4/6)
Apr 25 2009
- Andrei Alexandrescu (3/9) Apr 25 2009 One with capacity and probably value semantics.
- Robert Jacques (14/22) Apr 25 2009 First, capacity is only static (i.e. can be included in the array as you...
- dsimcha (7/23) Apr 25 2009 But in a pointer bumping GC, the right way to do dynamic arrays would be...
- Robert Jacques (8/36) Apr 25 2009 1) There are multiple types of pointer bumping GCs and checking/exending...
- Unknown W. Brackets (9/12) Apr 25 2009 What about simply using the const/invariant information?
- Robert Jacques (12/19) Apr 25 2009 No, immutability really applies to the element and not just the array
- Unknown W. Brackets (11/36) Apr 25 2009 I'm not talking about invariant(char)[], I'm talking about
- Robert Jacques (11/24) Apr 25 2009 Okay I think we're on different pages, so I'll reiterate:
- Jason House (2/32) Apr 25 2009 invariant(char[]) is not rebindable. There are bugs in bugzilla that sho...
- Unknown W. Brackets (34/64) Apr 25 2009 Let me say it a different way:
- Robert Jacques (10/43) Apr 25 2009 Okay, I understand where you're coming from. Here are some counter-point...
- Unknown W. Brackets (25/79) Apr 25 2009 1. Well, I think that immutable(T[]) ~ immutable(T[]) should be
- Unknown W. Brackets (5/93) Apr 25 2009 (and I know I said "char[] should _be_ a StringBuilder", which probably
- Rainer Deyke (15/18) Apr 26 2009 There are two types of concatenation: in-place (operator ~=) and not
- Unknown W. Brackets (8/14) Apr 26 2009 So, how often do you have an int array that you do not intend to append
- Robert Jacques (12/28) Apr 26 2009 Actually, both ~= and ~ use capacity to avoid new allocation.
- Leandro Lucarella (12/28) Apr 25 2009 I don't think I will, not at least for my thesis (I will be concentratin...
- Robert Jacques (8/16) Apr 25 2009 If you add capacity to arrays, why do need seperate slice and container ...
- Michel Fortin (11/22) Apr 25 2009 I always disliked T[new]. Here's a better candidate:
- Robert Jacques (13/23) Apr 25 2009 No, it's perfectly possible to have T[] be the same type for the contain...
- Unknown W. Brackets (9/39) Apr 25 2009 The only downside to this is, it's a Bad Thing to have the size of an
- Andrei Alexandrescu (4/34) Apr 25 2009 Yes, but that way you take the owner (i.e. the memory block) completely
- Robert Jacques (6/38) Apr 25 2009 1) In what way would it help to give the owner an identity?
- Andrei Alexandrescu (20/21) Apr 25 2009 For example, on an implementation that doesn't want gc, they'd need the
- Michel Fortin (13/30) Apr 25 2009 You say there must be two entities. But the two entities already exist,
- Andrei Alexandrescu (6/36) Apr 25 2009 Yah, and the question is where is the range that the associative array
- Robert Jacques (22/42) Apr 25 2009 Why is a single owner needed for deterministic destruction?
- Andrei Alexandrescu (6/15) Apr 25 2009 My understanding is that you don't know how C++ and D value semantics
- Robert Jacques (8/23) Apr 25 2009 I know how value semantics work. But
- Christopher Wright (9/63) Apr 25 2009 Simple solution: put the array definition in object.d and try
- Andrei Alexandrescu (12/21) Apr 25 2009 There are three schemes that are good starting points:
- Robert Jacques (6/26) Apr 25 2009 Or, use (begin, length, store*) for both arrays and slices and have the ...
- bearophile (7/12) Apr 26 2009 If you add such two other ways to manage memory, you end with different ...
- Michel Fortin (17/31) Apr 26 2009 1 and 2 certainly look useful.
- Christopher Wright (5/29) Apr 26 2009 Well, a slice will be begin, end, owner-ref-count*. The owner could move...
- Craig Black (8/8) Apr 26 2009 Just my 2 cents. I use a C++ array template that optimizes memory usage...
- Steve Schveighoffer (8/22) Apr 25 2009 You are confusing the difference between T[] and T[U]. T[U] is a strang...
- Unknown W. Brackets (8/33) Apr 25 2009 Well, a range of an associative array is a possibility (not a slice, but...
- Steven Schveighoffer (15/21) Apr 27 2009 Sure, but why should a range type of an AA be a special builtin type? I...
- Christopher Wright (5/6) Apr 25 2009 Citation needed. If this is a continuation of a previous discussion, a
- Danny Wilson (6/10) Apr 25 2009 T[new U] ?
- Daniel Keep (35/48) Apr 25 2009 I've got a few problems understanding this thread.
- Robert Jacques (16/64) Apr 25 2009 From a Model-View-Container standpoint, D's arrays mix both View and
- Max Samukha (23/33) Apr 26 2009 T![] T![U] (kidding)
It looks we can't make it with only T[]. We need a genuine container type, and T[new] was suggested. It would probably have value semantics. T[U] seems to have the same problem. If T[U] is the range, then how do you call the container? If we follow through with a no-garbage-collected option for the core types, then we also need to distinguish between a slice and its container. The fact that we (almost) got away with T[] being at the same time the container and the slice was that the garbage collector would collect all unused slices. Andrei
Apr 25 2009
Andrei Alexandrescu wrote:It looks we can't make it with only T[]. We need a genuine container type, and T[new] was suggested.<snip> What do you mean by a "genuine container type"? Stewart.
Apr 25 2009
Stewart Gordon wrote:Andrei Alexandrescu wrote:One with capacity and probably value semantics. AndreiIt looks we can't make it with only T[]. We need a genuine container type, and T[new] was suggested.<snip> What do you mean by a "genuine container type"?
Apr 25 2009
On Sat, 25 Apr 2009 09:16:39 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Stewart Gordon wrote:First, capacity is only static (i.e. can be included in the array as you suggest) for free-list based allocation. Which is _only_ used by malloc and mark-sweep GCs. And I'm really hoping Leandro gives us something better than mark-sweep. Second, if people want value semantics, they can already use .dup or a[] = ... when needed. Adding value semantics to arrays seems to me like a recipe for unintended poor performance by new to D programmers (who don't know) and experienced ones (who make a hard to find typo) a like. (I can see that value semantics for static arrays might be useful i.e. foo(real[4] vec), but I don't see a major need for dynamic vectors to have value semantics and every reason they shouldn't, i.e. foo(real[] big_image). Counter examples are desired and welcome.)Andrei Alexandrescu wrote:One with capacity and probably value semantics. AndreiIt looks we can't make it with only T[]. We need a genuine container type, and T[new] was suggested.<snip> What do you mean by a "genuine container type"?
Apr 25 2009
== Quote from Robert Jacques (sandford jhu.edu)'s articleOn Sat, 25 Apr 2009 09:16:39 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:But in a pointer bumping GC, the right way to do dynamic arrays would be to allocate some extra space for the array beyond the length requested, to allow it to be appended to without being reallocated. In this case, the extra space is being allocated under the hood by the array functions, not the garbage collector, but it's still there and you still need a capacity field to track it without expensive GC queries.Stewart Gordon wrote:First, capacity is only static (i.e. can be included in the array as you suggest) for free-list based allocation. Which is _only_ used by malloc and mark-sweep GCs. And I'm really hoping Leandro gives us something better than mark-sweep.Andrei Alexandrescu wrote:One with capacity and probably value semantics. AndreiIt looks we can't make it with only T[]. We need a genuine container type, and T[new] was suggested.<snip> What do you mean by a "genuine container type"?
Apr 25 2009
On Sat, 25 Apr 2009 11:57:25 -0400, dsimcha <dsimcha yahoo.com> wrote:== Quote from Robert Jacques (sandford jhu.edu)'s article1) There are multiple types of pointer bumping GCs and checking/exending an array can be pretty darn cheap. (You only have to find the memory page) 2) Pointer bumping is only half a GC, generally the other half is going to move/compact the array, at which point the capacity *should* be changed. 3) Over-allocating all arrays is a wonderful waste of memory and performance (re chache lines, etc). A cleaner answer is an array builder class, which only pays for this overhead when it's actually needed.On Sat, 25 Apr 2009 09:16:39 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:But in a pointer bumping GC, the right way to do dynamic arrays would be to allocate some extra space for the array beyond the length requested, to allow it to be appended to without being reallocated. In this case, the extra space is being allocated under the hood by the array functions, not the garbage collector, but it's still there and you still need a capacity field to track it without expensive GC queries.Stewart Gordon wrote:First, capacity is only static (i.e. can be included in the array as you suggest) for free-list based allocation. Which is _only_ used by malloc and mark-sweep GCs. And I'm really hoping Leandro gives us something better than mark-sweep.Andrei Alexandrescu wrote:One with capacity and probably value semantics. AndreiIt looks we can't make it with only T[]. We need a genuine container type, and T[new] was suggested.<snip> What do you mean by a "genuine container type"?
Apr 25 2009
What about simply using the const/invariant information? After all, an "array builder" is a mutable array. If you don't want to extend it, it should be invariant or const - e.g. invariant(string). I've always thought the separation of strings and string builders, arrays and array builders, etc. was flawed. It's just like strcmp; sure, it works, but is it really the right way to compare strings? Can't I just have my == and not worry about it? -[Unknown] Robert Jacques wrote:3) Over-allocating all arrays is a wonderful waste of memory and performance (re chache lines, etc). A cleaner answer is an array builder class, which only pays for this overhead when it's actually needed.
Apr 25 2009
On Sat, 25 Apr 2009 13:30:22 -0400, Unknown W. Brackets <unknown simplemachines.org> wrote:What about simply using the const/invariant information? After all, an "array builder" is a mutable array. If you don't want to extend it, it should be invariant or const - e.g. invariant(string).No, immutability really applies to the element and not just the array length. Besides, what about "hello" ~ "world"? Essentially, an "array builder" is an array with a cheaply, extend-able length and it's often used for immutable strings, etc. Also, while a concatenate strings all the time I never end up concatenating non-char arrays. And they almost always end up being mutable.I've always thought the separation of strings and string builders, arrays and array builders, etc. was flawed. It's just like strcmp; sure, it works, but is it really the right way to compare strings? Can't I just have my == and not worry about it?Well, you can just use ~= or ~ and not worry about it. Array builders are a performance enhancement for one special case at the cost of a general performance degradation. (There was a long discussion previously about these trade-offs)
Apr 25 2009
I'm not talking about invariant(char)[], I'm talking about invariant(char[]). That cannot be extended in length. Really, what I want is something that is a "invariant(variant(char)[])", I suppose, but I don't even think that's necessary. When the default string type was made invariant, many people were non-plussed at best. That doesn't mean it was wrong. How crazy would it be if, for example, std.string.split returned invariant(string[]) instead of string[]? I doubt it would hurt many people, in actuality, and you could always get a mutable copy. -[Unknown] Robert Jacques wrote:On Sat, 25 Apr 2009 13:30:22 -0400, Unknown W. Brackets <unknown simplemachines.org> wrote:What about simply using the const/invariant information? After all, an "array builder" is a mutable array. If you don't want to extend it, it should be invariant or const - e.g. invariant(string).No, immutability really applies to the element and not just the array length. Besides, what about "hello" ~ "world"? Essentially, an "array builder" is an array with a cheaply, extend-able length and it's often used for immutable strings, etc. Also, while a concatenate strings all the time I never end up concatenating non-char arrays. And they almost always end up being mutable.I've always thought the separation of strings and string builders, arrays and array builders, etc. was flawed. It's just like strcmp; sure, it works, but is it really the right way to compare strings? Can't I just have my == and not worry about it?Well, you can just use ~= or ~ and not worry about it. Array builders are a performance enhancement for one special case at the cost of a general performance degradation. (There was a long discussion previously about these trade-offs)
Apr 25 2009
On Sat, 25 Apr 2009 14:21:49 -0400, Unknown W. Brackets <unknown simplemachines.org> wrote:I'm not talking about invariant(char)[], I'm talking about invariant(char[]). That cannot be extended in length. Really, what I want is something that is a "invariant(variant(char)[])", I suppose, but I don't even think that's necessary. When the default string type was made invariant, many people were non-plussed at best. That doesn't mean it was wrong.Okay I think we're on different pages, so I'll reiterate:Robert Jacques wrote:1) Immutability is a bad way to express 'this array doesn't change length'. Consider arrays of numbers instead of characters and you'll see what I mean. 2) A large portion of the time immutable arrays are concatenated (i.e. "hello" ~ "world")No, immutability really applies to the element and not just the array length.How crazy would it be if, for example, std.string.split returned invariant(string[]) instead of string[]? I doubt it would hurt many people, in actuality, and you could always get a mutable copy. -[Unknown]1) At first glance, I don't see any reason why split couldn't or shouldn't, besides string[] being shorter than immutable string[]. 2) I don't see how this is relevant to array capacities / builders.
Apr 25 2009
Robert Jacques Wrote:On Sat, 25 Apr 2009 14:21:49 -0400, Unknown W. Brackets <unknown simplemachines.org> wrote:invariant(char[]) is not rebindable. There are bugs in bugzilla that show holes in the usage of invariant(char)[] instead of a rebindable invariant(char[]).I'm not talking about invariant(char)[], I'm talking about invariant(char[]). That cannot be extended in length. Really, what I want is something that is a "invariant(variant(char)[])", I suppose, but I don't even think that's necessary. When the default string type was made invariant, many people were non-plussed at best. That doesn't mean it was wrong.Okay I think we're on different pages, so I'll reiterate:Robert Jacques wrote:1) Immutability is a bad way to express 'this array doesn't change length'. Consider arrays of numbers instead of characters and you'll see what I mean. 2) A large portion of the time immutable arrays are concatenated (i.e. "hello" ~ "world")No, immutability really applies to the element and not just the array length.How crazy would it be if, for example, std.string.split returned invariant(string[]) instead of string[]? I doubt it would hurt many people, in actuality, and you could always get a mutable copy. -[Unknown]1) At first glance, I don't see any reason why split couldn't or shouldn't, besides string[] being shorter than immutable string[]. 2) I don't see how this is relevant to array capacities / builders.
Apr 25 2009
Let me say it a different way: split() returns results that are, often, stored or compared. It is not often that these results are concatenated to or modified. Possibly, they may not be sliced (further) often either. Such a case represents a perfect example of an array (or string) that need not have a greater capacity than its length. Indeed; no property of the array is likely to change - its length, contents, capacity, or even ptr. All are likely to remain constant until the object is deleted from memory. So, it would seem to me, this return value (as a quick example) could easily be changed to one that fits the following: 1. Is an array of data. 2. Does not have mutable contents (data.) 3. Does not allow its length or capacity to be extended (excuse me for considering this mutation.) 4. Is a reference, rather than a static array. My suggestion is that "char[]" should _be_ a StringBuilder, and "int[]" should _be_ an ArrayBuilder, and we should adopt a separate syntax for something that _isn't_ one of those. That said, I did just browse some of my D source files - an XML 1.0 parser/tree and a custom rfc-compliant ftp server - and nearly the only concatenation I use is building exception strings. I'm a little surprised. Only a couple times do I do anything a builder would be useful for. One time is on XML child nodes; since I'm building a list of children, it is a "builder" type thing. However, my XML tree is fully incremental - that means you could traverse it before all of the data has been read from the net. It would not be practical for me to use a builder that didn't have proper read access for this case. I really think using a "StringBuilder" class is just like the solution of using a "String" class. D, in my opinion, proved the latter was not necessary; I hold that the former isn't either. -[Unknown] Robert Jacques wrote:On Sat, 25 Apr 2009 14:21:49 -0400, Unknown W. Brackets <unknown simplemachines.org> wrote:I'm not talking about invariant(char)[], I'm talking about invariant(char[]). That cannot be extended in length. Really, what I want is something that is a "invariant(variant(char)[])", I suppose, but I don't even think that's necessary. When the default string type was made invariant, many people were non-plussed at best. That doesn't mean it was wrong.Okay I think we're on different pages, so I'll reiterate:Robert Jacques wrote:1) Immutability is a bad way to express 'this array doesn't change length'. Consider arrays of numbers instead of characters and you'll see what I mean. 2) A large portion of the time immutable arrays are concatenated (i.e. "hello" ~ "world")No, immutability really applies to the element and not just the array length.How crazy would it be if, for example, std.string.split returned invariant(string[]) instead of string[]? I doubt it would hurt many people, in actuality, and you could always get a mutable copy. -[Unknown]1) At first glance, I don't see any reason why split couldn't or shouldn't, besides string[] being shorter than immutable string[]. 2) I don't see how this is relevant to array capacities / builders.
Apr 25 2009
On Sat, 25 Apr 2009 20:17:00 -0400, Unknown W. Brackets <unknown simplemachines.org> wrote:Let me say it a different way: split() returns results that are, often, stored or compared. It is not often that these results are concatenated to or modified. Possibly, they may not be sliced (further) often either. Such a case represents a perfect example of an array (or string) that need not have a greater capacity than its length. Indeed; no property of the array is likely to change - its length, contents, capacity, or even ptr. All are likely to remain constant until the object is deleted from memory. So, it would seem to me, this return value (as a quick example) could easily be changed to one that fits the following: 1. Is an array of data. 2. Does not have mutable contents (data.) 3. Does not allow its length or capacity to be extended (excuse me for considering this mutation.) 4. Is a reference, rather than a static array. My suggestion is that "char[]" should _be_ a StringBuilder, and "int[]" should _be_ an ArrayBuilder, and we should adopt a separate syntax for something that _isn't_ one of those. That said, I did just browse some of my D source files - an XML 1.0 parser/tree and a custom rfc-compliant ftp server - and nearly the only concatenation I use is building exception strings. I'm a little surprised. Only a couple times do I do anything a builder would be useful for. One time is on XML child nodes; since I'm building a list of children, it is a "builder" type thing. However, my XML tree is fully incremental - that means you could traverse it before all of the data has been read from the net. It would not be practical for me to use a builder that didn't have proper read access for this case. I really think using a "StringBuilder" class is just like the solution of using a "String" class. D, in my opinion, proved the latter was not necessary; I hold that the former isn't either. -[Unknown]Okay, I understand where you're coming from. Here are some counter-points 1) Immutable strings are often concatenated, which you don't address 2) int[], real[], and basically anything not a string rarely concatenates, but often mutable, which you don't address 3) Rather large thread a while ago concluded the builder overhead was too high for general use. 4) You're proposal makes immutable char[] and char[] have different underlying representations.
Apr 25 2009
1. Well, I think that immutable(T[]) ~ immutable(T[]) should be possible. I thought it was. Just that you cannot _append_ to immutable(T[]) which completely makes sense. 2. I don't know that it rarely concatenates. I took Calc, but I am not a mathematician... mostly I use char and object references in my arrays. Occasionally int[], and maybe real[] but it's not likely. So, yes, I probably am not properly addressing this. Frankly I do not know the use-cases where you would mutate such an array. For objects, if you don't change the array length, it's unlikely you'll change its contents (pointers.) Unfortunately, the referenced object would need to be mutable in many cases (although this is probably less than one would think offhand for the case of arrays of already-initialized objects.) 3. Okay. I think for a multi-purpose builder, that's true. However, speaking only of having an extended capacity (as is currently and as I was quoting), and not other features of a builder, I think what I say does make sense. 4. I don't think so. Again, I'm not talking about a full-fledged builder (although I think more features of one would be possible without too much difference except fake methods), just about features. So, to recap, I'm really talking about not over-allocating immutable arrays, but I'm not talking about changing current arrays. Does that make my comments make more sense? -[Unknown] Robert Jacques wrote:On Sat, 25 Apr 2009 20:17:00 -0400, Unknown W. Brackets <unknown simplemachines.org> wrote:Let me say it a different way: split() returns results that are, often, stored or compared. It is not often that these results are concatenated to or modified. Possibly, they may not be sliced (further) often either. Such a case represents a perfect example of an array (or string) that need not have a greater capacity than its length. Indeed; no property of the array is likely to change - its length, contents, capacity, or even ptr. All are likely to remain constant until the object is deleted from memory. So, it would seem to me, this return value (as a quick example) could easily be changed to one that fits the following: 1. Is an array of data. 2. Does not have mutable contents (data.) 3. Does not allow its length or capacity to be extended (excuse me for considering this mutation.) 4. Is a reference, rather than a static array. My suggestion is that "char[]" should _be_ a StringBuilder, and "int[]" should _be_ an ArrayBuilder, and we should adopt a separate syntax for something that _isn't_ one of those. That said, I did just browse some of my D source files - an XML 1.0 parser/tree and a custom rfc-compliant ftp server - and nearly the only concatenation I use is building exception strings. I'm a little surprised. Only a couple times do I do anything a builder would be useful for. One time is on XML child nodes; since I'm building a list of children, it is a "builder" type thing. However, my XML tree is fully incremental - that means you could traverse it before all of the data has been read from the net. It would not be practical for me to use a builder that didn't have proper read access for this case. I really think using a "StringBuilder" class is just like the solution of using a "String" class. D, in my opinion, proved the latter was not necessary; I hold that the former isn't either. -[Unknown]Okay, I understand where you're coming from. Here are some counter-points 1) Immutable strings are often concatenated, which you don't address 2) int[], real[], and basically anything not a string rarely concatenates, but often mutable, which you don't address 3) Rather large thread a while ago concluded the builder overhead was too high for general use. 4) You're proposal makes immutable char[] and char[] have different underlying representations.
Apr 25 2009
(and I know I said "char[] should _be_ a StringBuilder", which probably caused the confusion - sorry. I was trying to differentiate the two, I worded it poorly for what I actually meant.) -[Unknown] Unknown W. Brackets wrote:1. Well, I think that immutable(T[]) ~ immutable(T[]) should be possible. I thought it was. Just that you cannot _append_ to immutable(T[]) which completely makes sense. 2. I don't know that it rarely concatenates. I took Calc, but I am not a mathematician... mostly I use char and object references in my arrays. Occasionally int[], and maybe real[] but it's not likely. So, yes, I probably am not properly addressing this. Frankly I do not know the use-cases where you would mutate such an array. For objects, if you don't change the array length, it's unlikely you'll change its contents (pointers.) Unfortunately, the referenced object would need to be mutable in many cases (although this is probably less than one would think offhand for the case of arrays of already-initialized objects.) 3. Okay. I think for a multi-purpose builder, that's true. However, speaking only of having an extended capacity (as is currently and as I was quoting), and not other features of a builder, I think what I say does make sense. 4. I don't think so. Again, I'm not talking about a full-fledged builder (although I think more features of one would be possible without too much difference except fake methods), just about features. So, to recap, I'm really talking about not over-allocating immutable arrays, but I'm not talking about changing current arrays. Does that make my comments make more sense? -[Unknown] Robert Jacques wrote:On Sat, 25 Apr 2009 20:17:00 -0400, Unknown W. Brackets <unknown simplemachines.org> wrote:Let me say it a different way: split() returns results that are, often, stored or compared. It is not often that these results are concatenated to or modified. Possibly, they may not be sliced (further) often either. Such a case represents a perfect example of an array (or string) that need not have a greater capacity than its length. Indeed; no property of the array is likely to change - its length, contents, capacity, or even ptr. All are likely to remain constant until the object is deleted from memory. So, it would seem to me, this return value (as a quick example) could easily be changed to one that fits the following: 1. Is an array of data. 2. Does not have mutable contents (data.) 3. Does not allow its length or capacity to be extended (excuse me for considering this mutation.) 4. Is a reference, rather than a static array. My suggestion is that "char[]" should _be_ a StringBuilder, and "int[]" should _be_ an ArrayBuilder, and we should adopt a separate syntax for something that _isn't_ one of those. That said, I did just browse some of my D source files - an XML 1.0 parser/tree and a custom rfc-compliant ftp server - and nearly the only concatenation I use is building exception strings. I'm a little surprised. Only a couple times do I do anything a builder would be useful for. One time is on XML child nodes; since I'm building a list of children, it is a "builder" type thing. However, my XML tree is fully incremental - that means you could traverse it before all of the data has been read from the net. It would not be practical for me to use a builder that didn't have proper read access for this case. I really think using a "StringBuilder" class is just like the solution of using a "String" class. D, in my opinion, proved the latter was not necessary; I hold that the former isn't either. -[Unknown]Okay, I understand where you're coming from. Here are some counter-points 1) Immutable strings are often concatenated, which you don't address 2) int[], real[], and basically anything not a string rarely concatenates, but often mutable, which you don't address 3) Rather large thread a while ago concluded the builder overhead was too high for general use. 4) You're proposal makes immutable char[] and char[] have different underlying representations.
Apr 25 2009
Robert Jacques wrote:1) Immutable strings are often concatenated, which you don't addressThere are two types of concatenation: in-place (operator ~=) and not (operator ~). The former can take advantage of capacity to avoid new allocation, the latter can not. When concatenating a lot of immutable strings, it might make sense to make the left side mutable: char[] tmp = string1.dup; tmp ~= string2; tmp ~= string3; // ... tmp ~= string99; string result = tmp.idup;2) int[], real[], and basically anything not a string rarely concatenates, but often mutable, which you don't addressI frequently append individual elements to non-string arrays. -- Rainer Deyke - rainerd eldwood.com
Apr 26 2009
So, how often do you have an int array that you do not intend to append elements to, but yet intend to change the values within it? Excluding preallocation, of course. Every time I use an int[], or honestly practically any array, I'm either creating it or reading it. Nonetheless, I don't do much math with D (I'm a server guy), so it's interesting for me what uses other people have. -[Unknown] Rainer Deyke wrote:2) int[], real[], and basically anything not a string rarely concatenates, but often mutable, which you don't addressI frequently append individual elements to non-string arrays.
Apr 26 2009
On Sun, 26 Apr 2009 03:21:08 -0400, Rainer Deyke <rainerd eldwood.com> wrote:Robert Jacques wrote:Actually, both ~= and ~ use capacity to avoid new allocation.1) Immutable strings are often concatenated, which you don't addressThere are two types of concatenation: in-place (operator ~=) and not (operator ~). The former can take advantage of capacity to avoid new allocation, the latter can not.When concatenating a lot of immutable strings, it might make sense to make the left side mutable: char[] tmp = string1.dup; tmp ~= string2; tmp ~= string3; // ... tmp ~= string99; string result = tmp.idup;Why make 2 extra copies if you don't need too?Cool. Care to elaborate on your use case? I was thinking of scientific computing (linear algebra, optimization) and game programming. Since we're off the topic of T[new] and onto a capacity field in general, I'd like to point out the older newsgroup posts: http://www.digitalmars.com/d/archives/digitalmars/D/Array_Capacity_Field_71119.html http://www.digitalmars.com/d/archives/digitalmars/D/Array_append_performance_75410.html http://www.digitalmars.com/d/archives/digitalmars/D/Array_append_performance_2_75811.html2) int[], real[], and basically anything not a string rarely concatenates, but often mutable, which you don't addressI frequently append individual elements to non-string arrays.
Apr 26 2009
Robert Jacques, el 25 de abril a las 10:58 me escribiste:On Sat, 25 Apr 2009 09:16:39 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:I don't think I will, not at least for my thesis (I will be concentrating in make the current GC concurrent and making pauses as small as possible). But maybe Keith will: http://www.dsource.org/projects/tango/forums/topic/743 8-) -- Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/ ---------------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------------- The diference is simple: hackers build things, crackers break them.Stewart Gordon wrote:First, capacity is only static (i.e. can be included in the array as you suggest) for free-list based allocation. Which is _only_ used by malloc and mark-sweep GCs. And I'm really hoping Leandro gives us something better than mark-sweep.Andrei Alexandrescu wrote:One with capacity and probably value semantics. AndreiIt looks we can't make it with only T[]. We need a genuine container type, and T[new] was suggested.<snip> What do you mean by a "genuine container type"?
Apr 25 2009
On Sat, 25 Apr 2009 09:16:39 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Stewart Gordon wrote:If you add capacity to arrays, why do need seperate slice and container types? i.e. .capacity > 0 -> container .capacity == 0 -> slice Note: this use of a capacity field has been mentioned before.Andrei Alexandrescu wrote:One with capacity and probably value semantics. AndreiIt looks we can't make it with only T[]. We need a genuine container type, and T[new] was suggested.<snip> What do you mean by a "genuine container type"?
Apr 25 2009
On 2009-04-25 09:07:52 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:It looks we can't make it with only T[]. We need a genuine container type, and T[new] was suggested. It would probably have value semantics. T[U] seems to have the same problem. If T[U] is the range, then how do you call the container? If we follow through with a no-garbage-collected option for the core types, then we also need to distinguish between a slice and its container. The fact that we (almost) got away with T[] being at the same time the container and the slice was that the garbage collector would collect all unused slices.I always disliked T[new]. Here's a better candidate: | range | container ----- | ------ | --------- array | T[] | T.[] assoc | T[U] | T.[U] -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Apr 25 2009
On Sat, 25 Apr 2009 09:07:52 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:It looks we can't make it with only T[]. We need a genuine container type, and T[new] was suggested. It would probably have value semantics. T[U] seems to have the same problem. If T[U] is the range, then how do you call the container? If we follow through with a no-garbage-collected option for the core types, then we also need to distinguish between a slice and its container. The fact that we (almost) got away with T[] being at the same time the container and the slice was that the garbage collector would collect all unused slices. AndreiNo, it's perfectly possible to have T[] be the same type for the container and the slice with reference counting instead of a full GC. All you need to do is store a pointer to the memory block's node. Then it's trivial to 1) get the capacity and 2) increase / decrease the reference count. You could even add an 'allocated length' parameter to the node, so you'd avoid some of the slicing bugs in the current implementation. struct T[] { T* ptr; size_t length; void* memory_block; }
Apr 25 2009
The only downside to this is, it's a Bad Thing to have the size of an array be different between GC's, but I think it's probably not entirely avoidable. But, I really think it's the only good way to do it, imho. Maybe the exact implementation is different, but creating a new type just seems even messier... the beauty of slices in D is that they're Not Different (TM.) -[Unknown] Robert Jacques wrote:On Sat, 25 Apr 2009 09:07:52 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:It looks we can't make it with only T[]. We need a genuine container type, and T[new] was suggested. It would probably have value semantics. T[U] seems to have the same problem. If T[U] is the range, then how do you call the container? If we follow through with a no-garbage-collected option for the core types, then we also need to distinguish between a slice and its container. The fact that we (almost) got away with T[] being at the same time the container and the slice was that the garbage collector would collect all unused slices. AndreiNo, it's perfectly possible to have T[] be the same type for the container and the slice with reference counting instead of a full GC. All you need to do is store a pointer to the memory block's node. Then it's trivial to 1) get the capacity and 2) increase / decrease the reference count. You could even add an 'allocated length' parameter to the node, so you'd avoid some of the slicing bugs in the current implementation. struct T[] { T* ptr; size_t length; void* memory_block; }
Apr 25 2009
Robert Jacques wrote:On Sat, 25 Apr 2009 09:07:52 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Yes, but that way you take the owner (i.e. the memory block) completely out of the picture. It would help to give the owner an identity. AndreiIt looks we can't make it with only T[]. We need a genuine container type, and T[new] was suggested. It would probably have value semantics. T[U] seems to have the same problem. If T[U] is the range, then how do you call the container? If we follow through with a no-garbage-collected option for the core types, then we also need to distinguish between a slice and its container. The fact that we (almost) got away with T[] being at the same time the container and the slice was that the garbage collector would collect all unused slices. AndreiNo, it's perfectly possible to have T[] be the same type for the container and the slice with reference counting instead of a full GC. All you need to do is store a pointer to the memory block's node. Then it's trivial to 1) get the capacity and 2) increase / decrease the reference count. You could even add an 'allocated length' parameter to the node, so you'd avoid some of the slicing bugs in the current implementation. struct T[] { T* ptr; size_t length; void* memory_block; }
Apr 25 2009
On Sat, 25 Apr 2009 16:16:07 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Robert Jacques wrote:1) In what way would it help to give the owner an identity? 2) How does reference counting arrays "take the owner completely out of the picture"? Isn't this simply a switch from single ownership to shared ownership?On Sat, 25 Apr 2009 09:07:52 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Yes, but that way you take the owner (i.e. the memory block) completely out of the picture. It would help to give the owner an identity. AndreiIt looks we can't make it with only T[]. We need a genuine container type, and T[new] was suggested. It would probably have value semantics. T[U] seems to have the same problem. If T[U] is the range, then how do you call the container? If we follow through with a no-garbage-collected option for the core types, then we also need to distinguish between a slice and its container. The fact that we (almost) got away with T[] being at the same time the container and the slice was that the garbage collector would collect all unused slices. AndreiNo, it's perfectly possible to have T[] be the same type for the container and the slice with reference counting instead of a full GC. All you need to do is store a pointer to the memory block's node. Then it's trivial to 1) get the capacity and 2) increase / decrease the reference count. You could even add an 'allocated length' parameter to the node, so you'd avoid some of the slicing bugs in the current implementation. struct T[] { T* ptr; size_t length; void* memory_block; }
Apr 25 2009
Robert Jacques wrote:1) In what way would it help to give the owner an identity?For example, on an implementation that doesn't want gc, they'd need the owner for deterministic destruction. Again, slices could be conflated with arrays mainly because the gc took care of the litter left after rebinding slices. If you want to offer the option to do without a gc, then arrays and slices (= ranges) must be separate entities. Then you hold on to the arrays and pass slices around. When you destroy the arrays, the slices originating from them become invalid (which can be checked or not). With hashes it's even more interesting. We need two types: the hash and the hash range that crawls over that hash. Right now they are conflated into one V[K]. If we want to define a range crawling over a hash (and consequently do away with opApply) we'd need the range to store a little stack of breadcrumbs so the range knows how to iterate. So definitely there must be two entities. Traditionally we've been using V[K] both as a range and as a container, but the usage so far was closer to a range (I think without being sure). So one issue is to find a good type literal to express "type for which the corresponding range is V[K]". Makes sense? Andrei
Apr 25 2009
On 2009-04-25 18:04:32 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:Again, slices could be conflated with arrays mainly because the gc took care of the litter left after rebinding slices. If you want to offer the option to do without a gc, then arrays and slices (= ranges) must be separate entities. Then you hold on to the arrays and pass slices around. When you destroy the arrays, the slices originating from them become invalid (which can be checked or not). With hashes it's even more interesting. We need two types: the hash and the hash range that crawls over that hash. Right now they are conflated into one V[K]. If we want to define a range crawling over a hash (and consequently do away with opApply) we'd need the range to store a little stack of breadcrumbs so the range knows how to iterate. So definitely there must be two entities. Traditionally we've been using V[K] both as a range and as a container, but the usage so far was closer to a range (I think without being sure). So one issue is to find a good type literal to express "type for which the corresponding range is V[K]".You say there must be two entities. But the two entities already exist, only the container is stored by reference. T[] is a reference to a portion of a fixed-size array (static array or heap array). T[U] is a reference to an associative array. If you could call "new" and "delete" on these two types to delete the referenced container, or if the container were reference-counted, you wouldn't need a garbage collector. It's pretty much the same as for classes in fact. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Apr 25 2009
Michel Fortin wrote:On 2009-04-25 18:04:32 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:Yah, and the question is where is the entire array.Again, slices could be conflated with arrays mainly because the gc took care of the litter left after rebinding slices. If you want to offer the option to do without a gc, then arrays and slices (= ranges) must be separate entities. Then you hold on to the arrays and pass slices around. When you destroy the arrays, the slices originating from them become invalid (which can be checked or not). With hashes it's even more interesting. We need two types: the hash and the hash range that crawls over that hash. Right now they are conflated into one V[K]. If we want to define a range crawling over a hash (and consequently do away with opApply) we'd need the range to store a little stack of breadcrumbs so the range knows how to iterate. So definitely there must be two entities. Traditionally we've been using V[K] both as a range and as a container, but the usage so far was closer to a range (I think without being sure). So one issue is to find a good type literal to express "type for which the corresponding range is V[K]".You say there must be two entities. But the two entities already exist, only the container is stored by reference. T[] is a reference to a portion of a fixed-size array (static array or heap array).T[U] is a reference to an associative array.Yah, and the question is where is the range that the associative array uses for iteration.If you could call "new" and "delete" on these two types to delete the referenced container, or if the container were reference-counted, you wouldn't need a garbage collector. It's pretty much the same as for classes in fact.It might be wise to make those types values, not references. Andrei
Apr 25 2009
On Sat, 25 Apr 2009 18:04:32 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Robert Jacques wrote:Why is a single owner needed for deterministic destruction?1) In what way would it help to give the owner an identity?For example, on an implementation that doesn't want gc, they'd need the owner for deterministic destruction.Again, slices could be conflated with arrays mainly because the gc took care of the litter left after rebinding slices. If you want to offer the option to do without a gc, then arrays and slices (= ranges) must be separate entities. Then you hold on to the arrays and pass slices around. When you destroy the arrays, the slices originating from them become invalid (which can be checked or not).What is wrong with reference counting arrays just like other objects? (which is what I've been proposing) Also, I see a major logical problem with your proposal: Given A single owner is required for deterministic destruction. Then The array's owner is responsible for the deterministic destruction. Therefore When the array's owner goes out of scope, it must destruct the array. But Then How does a function return an array? Your original post mentioned value semantics, but that results in a lot of needless allocation and copying.With hashes it's even more interesting. We need two types: the hash and the hash range that crawls over that hash. Right now they are conflated into one V[K]. If we want to define a range crawling over a hash (and consequently do away with opApply) we'd need the range to store a little stack of breadcrumbs so the range knows how to iterate. So definitely there must be two entities. Traditionally we've been using V[K] both as a range and as a container, but the usage so far was closer to a range (I think without being sure). So one issue is to find a good type literal to express "type for which the corresponding range is V[K]".While separating hashes and ranges is important, it seems orthogonal to the topic at hand.Makes sense?Yes and no. What I wanted was some high-level reason why have T[new] and T[] was fundamentally better than only having T[]. What I got was a low level assertion that they are needed to do -nogc. However, since earlier in this thread I provided a counter example, i.e. reference counted arrays, your assertion stands as false. And a language change shouldn't be based on a faulty assumption.
Apr 25 2009
Robert Jacques wrote:On Sat, 25 Apr 2009 18:04:32 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote: Also, I see a major logical problem with your proposal: Given A single owner is required for deterministic destruction. Then The array's owner is responsible for the deterministic destruction. Therefore When the array's owner goes out of scope, it must destruct the array. But Then How does a function return an array?My understanding is that you don't know how C++ and D value semantics work. It would be impossible to answer your (albeit good) question without first writing a tutorial on the topic, which unfortunately I don't have the time for. Andrei
Apr 25 2009
On Sat, 25 Apr 2009 19:11:57 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Robert Jacques wrote:I know how value semantics work. But 1) You haven't mentioned that T[new] has value semantics, only that you were thinking about it. 2) Struct destructors are always called and don't get optimized away even if the memory copy does. (i.e. I know there's a nice optimization to get out of the extra copy, but I don't know how you'd do it in DMD currently).On Sat, 25 Apr 2009 18:04:32 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote: Also, I see a major logical problem with your proposal: Given A single owner is required for deterministic destruction. Then The array's owner is responsible for the deterministic destruction. Therefore When the array's owner goes out of scope, it must destruct the array. But Then How does a function return an array?My understanding is that you don't know how C++ and D value semantics work. It would be impossible to answer your (albeit good) question without first writing a tutorial on the topic, which unfortunately I don't have the time for.Andrei
Apr 25 2009
Robert Jacques wrote:On Sat, 25 Apr 2009 18:04:32 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Simple solution: put the array definition in object.d and try implementing arrays with reference counting or manual memory management. I think stored slices break manual memory management, even with a dedicated slice type; but they should work just fine with refcounting. If you don't want to change the language, object.Array will have to implement the logic for slices and for allocated arrays. It's a bit ugly, and it makes the Array type larger. Also, Array's reference count would need to be accessed by reference.Robert Jacques wrote:Why is a single owner needed for deterministic destruction?1) In what way would it help to give the owner an identity?For example, on an implementation that doesn't want gc, they'd need the owner for deterministic destruction.Again, slices could be conflated with arrays mainly because the gc took care of the litter left after rebinding slices. If you want to offer the option to do without a gc, then arrays and slices (= ranges) must be separate entities. Then you hold on to the arrays and pass slices around. When you destroy the arrays, the slices originating from them become invalid (which can be checked or not).What is wrong with reference counting arrays just like other objects? (which is what I've been proposing) Also, I see a major logical problem with your proposal: Given A single owner is required for deterministic destruction. Then The array's owner is responsible for the deterministic destruction. Therefore When the array's owner goes out of scope, it must destruct the array. But Then How does a function return an array? Your original post mentioned value semantics, but that results in a lot of needless allocation and copying.With hashes it's even more interesting. We need two types: the hash and the hash range that crawls over that hash. Right now they are conflated into one V[K]. If we want to define a range crawling over a hash (and consequently do away with opApply) we'd need the range to store a little stack of breadcrumbs so the range knows how to iterate. So definitely there must be two entities. Traditionally we've been using V[K] both as a range and as a container, but the usage so far was closer to a range (I think without being sure). So one issue is to find a good type literal to express "type for which the corresponding range is V[K]".While separating hashes and ranges is important, it seems orthogonal to the topic at hand.Makes sense?Yes and no. What I wanted was some high-level reason why have T[new] and T[] was fundamentally better than only having T[]. What I got was a low level assertion that they are needed to do -nogc. However, since earlier in this thread I provided a counter example, i.e. reference counted arrays, your assertion stands as false. And a language change shouldn't be based on a faulty assumption.
Apr 25 2009
Christopher Wright wrote:Simple solution: put the array definition in object.d and try implementing arrays with reference counting or manual memory management. I think stored slices break manual memory management, even with a dedicated slice type; but they should work just fine with refcounting. If you don't want to change the language, object.Array will have to implement the logic for slices and for allocated arrays. It's a bit ugly, and it makes the Array type larger. Also, Array's reference count would need to be accessed by reference.There are three schemes that are good starting points: 1. As right now :o). 2. Using refcounting. Arrays will be 4 words long (begin, end, end-of-store, refcount*) and slices will be 3 words long (begin, end, and owner*) 3. Using manual management. Arrays will be 3 words long (begin, end, end-of-store) and slices will be 2 words long (begin, end). This is close to C++/STL. This case is less safe but very efficient. If we manage to integrate them all... that's quite the holy grail. And I think it's entirely possible with only a few changes to the language. Andrei
Apr 25 2009
On Sat, 25 Apr 2009 19:44:18 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Christopher Wright wrote:Or, use (begin, length, store*) for both arrays and slices and have the store contain (start, capacity, refcount).Simple solution: put the array definition in object.d and try implementing arrays with reference counting or manual memory management. I think stored slices break manual memory management, even with a dedicated slice type; but they should work just fine with refcounting. If you don't want to change the language, object.Array will have to implement the logic for slices and for allocated arrays. It's a bit ugly, and it makes the Array type larger. Also, Array's reference count would need to be accessed by reference.There are three schemes that are good starting points: 1. As right now :o). 2. Using refcounting. Arrays will be 4 words long (begin, end, end-of-store, refcount*) and slices will be 3 words long (begin, end, and owner*)3. Using manual management. Arrays will be 3 words long (begin, end, end-of-store) and slices will be 2 words long (begin, end). This is close to C++/STL. This case is less safe but very efficient.Why store end-of-store? It's trivial to compute from the length. (If you have separate slice and array types, that is)If we manage to integrate them all... that's quite the holy grail. And I think it's entirely possible with only a few changes to the language. Andrei
Apr 25 2009
Andrei Alexandrescu:1. As right now :o). 2. Using refcounting. [...] 3. Using manual management. [...] If we manage to integrate them all... that's quite the holy grail. And I think it's entirely possible with only a few changes to the language.If you add such two other ways to manage memory, you end with different code. Isn't this going to split the D codebase in three? (If I look for an already written D1 module now I have to see if it's for Tango or Phobos. In such situation I have to see what kind of memory does it use?) And in big programs isn't the usage of a GC inevitable? So are the other two options only for small programs? When you see code written by other people you must be able to understand and know how to use the refcounting too. So this adds some more things to know. D2 is getting more and more complex than D1. Eventually D2 may risk getting too much complex for normal people. Both C and Java are simple enough languages, but a cross of them may be too much complex for normal people. Bye, bearophile
Apr 26 2009
On 2009-04-25 19:44:18 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:There are three schemes that are good starting points: 1. As right now :o). 2. Using refcounting. Arrays will be 4 words long (begin, end, end-of-store, refcount*) and slices will be 3 words long (begin, end, and owner*) 3. Using manual management. Arrays will be 3 words long (begin, end, end-of-store) and slices will be 2 words long (begin, end). This is close to C++/STL. This case is less safe but very efficient. If we manage to integrate them all... that's quite the holy grail. And I think it's entirely possible with only a few changes to the language.1 and 2 certainly look useful. I've never found 3 to be efficient in the C++/STL. Making a copy every time you need to store a string in a struct isn't something I call efficient. Especially in a language that supports immutable strings, reference counting seems immencely preferable. And reference counting has the advantage that it does not require any change or addition to the language syntax or semantics. Andrei, do you have a use case for 3 that would end up worse using reference counting? Because I only see appending, and even then, appending when the refcount is 1 could be done in-place which would be pretty efficient. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Apr 26 2009
Andrei Alexandrescu wrote:Christopher Wright wrote:Well, a slice will be begin, end, owner-ref-count*. The owner could move about the stack or get passed between classes on the heap or something like that.Simple solution: put the array definition in object.d and try implementing arrays with reference counting or manual memory management. I think stored slices break manual memory management, even with a dedicated slice type; but they should work just fine with refcounting. If you don't want to change the language, object.Array will have to implement the logic for slices and for allocated arrays. It's a bit ugly, and it makes the Array type larger. Also, Array's reference count would need to be accessed by reference.There are three schemes that are good starting points: 1. As right now :o). 2. Using refcounting. Arrays will be 4 words long (begin, end, end-of-store, refcount*) and slices will be 3 words long (begin, end, and owner*)3. Using manual management. Arrays will be 3 words long (begin, end, end-of-store) and slices will be 2 words long (begin, end). This is close to C++/STL. This case is less safe but very efficient. If we manage to integrate them all... that's quite the holy grail. And I think it's entirely possible with only a few changes to the language.Agreed.
Apr 26 2009
Just my 2 cents. I use a C++ array template that optimizes memory usage by only storing a pointer. It stores the capacity and the size in the heap allocation. The benefit here is that empty arrays only require one word. I am the primary author of a simulation engine (similar to a game engine but less graphics-focused). In practice I have found about half of all arrays are empty, so this approach provides much better memory efficiency for my purposes. -Craig
Apr 26 2009
On Sat, 25 Apr 2009 08:07:52 -0500, Andrei Alexandrescu wrote:It looks we can't make it with only T[]. We need a genuine container type, and T[new] was suggested. It would probably have value semantics. T[U] seems to have the same problem. If T[U] is the range, then how do you call the container? If we follow through with a no-garbage-collected option for the core types, then we also need to distinguish between a slice and its container. The fact that we (almost) got away with T[] being at the same time the container and the slice was that the garbage collector would collect all unused slices. AndreiYou are confusing the difference between T[] and T[U]. T[U] is a strange beast because it does not need to be new'd, but it acts completely like a reference type. But T[U] *is* the container type. What you should be asking is what is the *slice* type for T[U]. My answer would be that a slice type for a hashtable doesn't make sense. We don't need AAs to be completely on par with normal arrays, so just leave them alone ;) -Steve
Apr 25 2009
Well, a range of an associative array is a possibility (not a slice, but e.g. from std.algorithm.) I think it'd be a mistake to discount that. Being able to treat associative arrays like arrays, in some cases (e.g. with count(), etc.) is nice. That said, i don't even know that std.algorithm currently supports this. -[Unknown] Steve Schveighoffer wrote:On Sat, 25 Apr 2009 08:07:52 -0500, Andrei Alexandrescu wrote:It looks we can't make it with only T[]. We need a genuine container type, and T[new] was suggested. It would probably have value semantics. T[U] seems to have the same problem. If T[U] is the range, then how do you call the container? If we follow through with a no-garbage-collected option for the core types, then we also need to distinguish between a slice and its container. The fact that we (almost) got away with T[] being at the same time the container and the slice was that the garbage collector would collect all unused slices. AndreiYou are confusing the difference between T[] and T[U]. T[U] is a strange beast because it does not need to be new'd, but it acts completely like a reference type. But T[U] *is* the container type. What you should be asking is what is the *slice* type for T[U]. My answer would be that a slice type for a hashtable doesn't make sense. We don't need AAs to be completely on par with normal arrays, so just leave them alone ;) -Steve
Apr 25 2009
On Sat, 25 Apr 2009 12:56:15 -0400, Unknown W. Brackets <unknown simplemachines.org> wrote:Well, a range of an associative array is a possibility (not a slice, but e.g. from std.algorithm.) I think it'd be a mistake to discount that. Being able to treat associative arrays like arrays, in some cases (e.g. with count(), etc.) is nice. That said, i don't even know that std.algorithm currently supports this.Sure, but why should a range type of an AA be a special builtin type? It makes sense with array slices because they are valuable to use in many different ways, can be useful as sub-ranges, and keep their value long after the array has been changed. An AA range would be useful only for iterating over all the elements. Once the AA changes, the AA range is invalid. Making a subrange is pretty much useless, since there's no real order to the data. When a range is necessarily the range of the entire container, it becomes a trivial addition to the typesystem just to meet interface requirements. Array slices are way more useful than that. I'd say A[U] is the container type, and AARange!(A, U) is the range if you wish to have a type name. I see no reason to add a builtin type to deal with AA ranges. -Steve
Apr 27 2009
Andrei Alexandrescu wrote:It looks we can't make it with only T[].Citation needed. If this is a continuation of a previous discussion, a link to the previous discussion will suffice. I grant that appending to an array is expensive. Make Array a user-defined type and we can see what's appropriate.
Apr 25 2009
Op Sat, 25 Apr 2009 15:07:52 +0200 schreef Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:It looks we can't make it with only T[]. We need a genuine container type, and T[new] was suggested. It would probably have value semantics. T[U] seems to have the same problem. If T[U] is the range, then how do you call the container?T[new U] ? T[new(U)] ? T[U new] ? you knew...? still, reusing new like that looks pretty akward to me.
Apr 25 2009
Andrei Alexandrescu wrote:It looks we can't make it with only T[]. We need a genuine container type, and T[new] was suggested. It would probably have value semantics. T[U] seems to have the same problem. If T[U] is the range, then how do you call the container? If we follow through with a no-garbage-collected option for the core types, then we also need to distinguish between a slice and its container. The fact that we (almost) got away with T[] being at the same time the container and the slice was that the garbage collector would collect all unused slices. AndreiI've got a few problems understanding this thread. 1. What exactly do you mean by "genuine container type"? That says to me there's some ISO standard or something arrays don't conform to. I saw earlier you specify the lack of a capacity field; is that really it? If so, why not call it a capacity-aware array (or capacitive array :P). 2. I don't see why you want these to have value semantics. What's the motivation here. Note that "because we need it" isn't a valid answer; a link to an explanation would be great, if one exists. :) 3. I'm really uncomfortable with the idea that the semantics of slices changes when you switch off the GC. That means you have a valid program before, you throw -gc, and suddenly you get inscrutable crashes. As for the names, maybe we've got this backwards. What about these: T[] -- a freshly created array T[..] -- a slice of an array T[U] -- a freshly created associative array T[U..] -- a slice of an associative array Otherwise, creating a new array would be (new T[new]) which looks stupid. That, or it's (new T[x]) and the result type is (T[new]) which is inconsistent with every other type in the language. Also, this means static arrays stay as T[n]. On another topic, you mentioned that (using your notation again) T[new] would have four words: begin, end (but I liked length! :( ), end-of-capacity and refcount*. Could you pull a COM trick and stick the capacity and refcount out the front of the array data? [ ? ] [ 4 ] [ 1 ] [ 2 ] [...] [...] ^ ^ ^ ^ | | | | | | begin end | capacity refcount That allows T[new] to stay 8 bytes, and it also avoids needing a second allocation for the refcount (which should have the same lifetime as the array itself). -- Daniel
Apr 25 2009
On Sat, 25 Apr 2009 21:14:28 -0400, Daniel Keep <daniel.keep.lists gmail.com> wrote:Andrei Alexandrescu wrote:From a Model-View-Container standpoint, D's arrays mix both View and Container. Many think this is a good thing, but there is an argument for separating the views (slices) from the container (arrays). The whole capacity thing is separate from this issue and exists because 1) it's fixed in a free-list based allocator. 2) it's somewhat slow to query from the GC and 3) concatenating arrays, mainly when building strings, does it a lot. P.S. Great names.It looks we can't make it with only T[]. We need a genuine container type, and T[new] was suggested. It would probably have value semantics. T[U] seems to have the same problem. If T[U] is the range, then how do you call the container? If we follow through with a no-garbage-collected option for the core types, then we also need to distinguish between a slice and its container. The fact that we (almost) got away with T[] being at the same time the container and the slice was that the garbage collector would collect all unused slices. AndreiI've got a few problems understanding this thread. 1. What exactly do you mean by "genuine container type"? That says to me there's some ISO standard or something arrays don't conform to. I saw earlier you specify the lack of a capacity field; is that really it? If so, why not call it a capacity-aware array (or capacitive array :P).2. I don't see why you want these to have value semantics. What's the motivation here. Note that "because we need it" isn't a valid answer; a link to an explanation would be great, if one exists. :)It hasn't been explained so far. (Although there are some arguments for more general application of value semantics in older threads)3. I'm really uncomfortable with the idea that the semantics of slices changes when you switch off the GC. That means you have a valid program before, you throw -gc, and suddenly you get inscrutable crashes.Actually, I'm pretty sure Andrei is proposing changing the semantics for both GC and no GC.As for the names, maybe we've got this backwards. What about these: T[] -- a freshly created array T[..] -- a slice of an array T[U] -- a freshly created associative array T[U..] -- a slice of an associative array Otherwise, creating a new array would be (new T[new]) which looks stupid. That, or it's (new T[x]) and the result type is (T[new]) which is inconsistent with every other type in the language. Also, this means static arrays stay as T[n]. On another topic, you mentioned that (using your notation again) T[new] would have four words: begin, end (but I liked length! :( ), end-of-capacity and refcount*. Could you pull a COM trick and stick the capacity and refcount out the front of the array data? [ ? ] [ 4 ] [ 1 ] [ 2 ] [...] [...] ^ ^ ^ ^ | | | | | | begin end | capacity refcount That allows T[new] to stay 8 bytes, and it also avoids needing a second allocation for the refcount (which should have the same lifetime as the array itself).Or alternatively, both slices and arrays could store a pointer to the (refount, capacity) so you wouldn't need separate types.
Apr 25 2009
On Sat, 25 Apr 2009 08:07:52 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:It looks we can't make it with only T[]. We need a genuine container type, and T[new] was suggested. It would probably have value semantics. T[U] seems to have the same problem. If T[U] is the range, then how do you call the container? If we follow through with a no-garbage-collected option for the core types, then we also need to distinguish between a slice and its container. The fact that we (almost) got away with T[] being at the same time the container and the slice was that the garbage collector would collect all unused slices. AndreiT![] T![U] (kidding) T[:] T[:U] FWIW, Qt uses by-value containers with COW and it looks like they are ok for many usages. I'd like to see a std.containers module defining containers with semantics based on a policy: enum CopyPolicy { Always, // by value OnWrite, // by value with COW Never // by reference } Then we could use the most appropriate policy for built-in arrays (I think it should be by-value with COW). module object; import std.containers; template Array(T) { alias CoolArray!(T, CopyPolicy.OnWrite) Array; } and still be able to use containers with different policies when necessary.
Apr 26 2009