www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - T[new]

reply Walter Bright <newshound1 digitalmars.com> writes:
D has a number of subtle problems (performance and semantic) that arise 
when arrays are resized. The solution is to separate resizeable array 
types from slices. Slices will retain the old:

    T[] slice;

syntax. Resizeable arrays will be declared as:

    T[new] array;

The new expression:

    new T[10]

will return a T[new].

T[new] will implicitly convert to T[], but not the other way.

slice.length will become read-only.

Under the hood, a T[new] will be a single pointer to a library defined 
type. This library defined type will likely contain three properties:

     size_t length;
     T* ptr;
     size_t capacity;

The usual array operations will work on T[new] as well as T[].

Doing this change will:

1. fix many nasties at the edges of array semantics

2. make arrays implementable on .net

3. make clear in function signatures if the function can resize the 
array or not
Aug 09 2009
next sibling parent reply Brad Roberts <braddr puremagic.com> writes:
Yay.  What will happen with slices over a T[new] when the underlying T[new] must
be moved due to resizing?  The behavior needs to be at least well specified, if
not well defined.

Walter Bright wrote:
 D has a number of subtle problems (performance and semantic) that arise
 when arrays are resized. The solution is to separate resizeable array
 types from slices. Slices will retain the old:
 
    T[] slice;
 
 syntax. Resizeable arrays will be declared as:
 
    T[new] array;
 
 The new expression:
 
    new T[10]
 
 will return a T[new].
 
 T[new] will implicitly convert to T[], but not the other way.
 
 slice.length will become read-only.
 
 Under the hood, a T[new] will be a single pointer to a library defined
 type. This library defined type will likely contain three properties:
 
     size_t length;
     T* ptr;
     size_t capacity;
 
 The usual array operations will work on T[new] as well as T[].
 
 Doing this change will:
 
 1. fix many nasties at the edges of array semantics
 
 2. make arrays implementable on .net
 
 3. make clear in function signatures if the function can resize the
 array or not
Aug 09 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Brad Roberts wrote:
 Yay.  What will happen with slices over a T[new] when the underlying T[new]
must
 be moved due to resizing?  The behavior needs to be at least well specified, if
 not well defined.
Great question! It's no different from what happens with any container when its contents change and there are references to the old content. What won't happen is the slice won't be pointing to unallocated memory, thanks to the gc. Resizing the T[new] will not cause the old contents to be deleted. The slice will either point to the old content, if the resize caused a move, or the new content, if it was resized in place. So it will still be implementation-defined behavior. But the likelihood of getting caught by such behavior is much less likely with T[new], as one can clearly see in the code where the resizeables are, rather than the current situation where any slice could be resized by anyone. I think that taking a slice of a T[new], then resizing the T[new], will be rare (or at least should be). Normally, a T[new] will be built, and when it is all done, it is converted to a T[] and there is no longer any way to resize it.
Aug 09 2009
next sibling parent reply Brad Roberts <braddr puremagic.com> writes:
Walter Bright wrote:
 Brad Roberts wrote:
 Yay.  What will happen with slices over a T[new] when the underlying
 T[new] must
 be moved due to resizing?  The behavior needs to be at least well
 specified, if
 not well defined.
Great question! It's no different from what happens with any container when its contents change and there are references to the old content. What won't happen is the slice won't be pointing to unallocated memory, thanks to the gc. Resizing the T[new] will not cause the old contents to be deleted. The slice will either point to the old content, if the resize caused a move, or the new content, if it was resized in place. So it will still be implementation-defined behavior. But the likelihood of getting caught by such behavior is much less likely with T[new], as one can clearly see in the code where the resizeables are, rather than the current situation where any slice could be resized by anyone. I think that taking a slice of a T[new], then resizing the T[new], will be rare (or at least should be). Normally, a T[new] will be built, and when it is all done, it is converted to a T[] and there is no longer any way to resize it.
As expected, but like I said.. this needs to be clear in the spec. Because something like this is just confusing (and yes, I know it's not new behavior): auto a = new int[10]; a[0] = 1; auto s1 = a[0..1]; a.length = <some value large enough to force a move>; auto s2 = a[0..1]; s2[0] = 2; assert(s1[0] == s2[1]); // fail Later, Brad
Aug 09 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Brad Roberts wrote:
 Walter Bright wrote:
 Brad Roberts wrote:
 Yay.  What will happen with slices over a T[new] when the underlying
 T[new] must
 be moved due to resizing?  The behavior needs to be at least well
 specified, if
 not well defined.
Great question! It's no different from what happens with any container when its contents change and there are references to the old content. What won't happen is the slice won't be pointing to unallocated memory, thanks to the gc. Resizing the T[new] will not cause the old contents to be deleted. The slice will either point to the old content, if the resize caused a move, or the new content, if it was resized in place. So it will still be implementation-defined behavior. But the likelihood of getting caught by such behavior is much less likely with T[new], as one can clearly see in the code where the resizeables are, rather than the current situation where any slice could be resized by anyone. I think that taking a slice of a T[new], then resizing the T[new], will be rare (or at least should be). Normally, a T[new] will be built, and when it is all done, it is converted to a T[] and there is no longer any way to resize it.
As expected, but like I said.. this needs to be clear in the spec. Because something like this is just confusing (and yes, I know it's not new behavior): auto a = new int[10]; a[0] = 1; auto s1 = a[0..1]; a.length = <some value large enough to force a move>; auto s2 = a[0..1]; s2[0] = 2; assert(s1[0] == s2[1]); // fail Later, Brad
Well yah but in all languages this is the case. C++'s iterators not only invalidate at the drop of a hat, but you can't really tell whether they're invalid. Java iterators can also be invalidated and may throw an exception. When resizing a D array, its underlying slices may be *orphaned*. That means they are still valid, they just don't refer to the same data as the array. I think that's a reasonable tradeoff between efficiency and safety. Andrei
Aug 09 2009
prev sibling parent Graham St Jack <Graham.StJack internode.on.net> writes:
This sounds excellent.
Aug 09 2009
prev sibling next sibling parent reply grauzone <none example.net> writes:
That's great. I think this would finally fix some of the more pressing 
issues of D.

I have a question:

will

T[new] a;

allocate something? Or will the allocation of the hidden library array 
object delayed until the length is set to non null? If the library array 
object is not allocated, will a.ptr, a.capacity and a.length simply 
return null?
Aug 09 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
grauzone wrote:
 will
 
 T[new] a;
 
 allocate something?
Nope.
 Or will the allocation of the hidden library array 
 object delayed until the length is set to non null?
Yes.
 If the library array 
 object is not allocated, will a.ptr, a.capacity and a.length simply 
 return null?
Yes. Clearly, those properties will have to be functions under the hood. So T[new] operations will be a bit slower than for slices. For faster indexing, you'd probably want to do: auto slice = a[]; and then operate on the slice.
Aug 09 2009
parent bearophile <bearophileHUGS lycos.com> writes:
Now there are four ways to create/use "arrays" in D2:
- the * syntax on a raw memory block, like in C.
- fixed sized arrays, that are just a pointer somewhere. they may even become
fully value types........
- the slice struct, about 2 * size_t.sizeof
- the new reference arrays, a pointer on the stack, plus a 3 items struct
somewhere, probably on the heap. If it doesn't escape the scope LDC may be able
to put it too on the stack. The GC may allocate a space of 4 items anyway for
it, so there's an item wasted.
:-)
In the meantime I have understood that the slice has to be the default syntax
to keep the compatibility with C :-)


Walter Bright:

 Yes. Clearly, those properties will have to be functions under the hood. 
 So T[new] operations will be a bit slower than for slices. For faster 
 indexing, you'd probably want to do:
 auto slice = a[];
 and then operate on the slice.
Let's hope such functions can be inlined. Assuming they can be inlined, a smart compiler can remove some of those ifs where it knows the array surely exists (virtual machines today are usually able to remove some array bound tests using similar tricks). I don't hold my breath for D2 to become this smart... Bye, bearophile
Aug 09 2009
prev sibling next sibling parent Walter Bright <newshound1 digitalmars.com> writes:
By the way, T[new] is Andrei's idea, as well as it being his idea to 
make it a reference type.
Aug 09 2009
prev sibling next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:

I like this general proposal, it sounds like something that can improve D a
little. There are many other things that have to be improved in D, but baby
steps are enough to go somewhere :-)

Slices will retain the old:
     T[] slice;
 syntax. Resizeable arrays will be declared as:
     T[new] array;
Such syntaxes have to be chosen wisely. I don't fully understand that syntax. And maybe I don't fully like it. I think the default (simpler) syntax has to be the most flexible and safer construct, that is resizable arrays. Slices can be seen as an optimization, so they can have a bit longer syntax.
 Under the hood, a T[new] will be a single pointer to a library defined 
 type. This library defined type will likely contain three properties:
      size_t length;
      T* ptr;
      size_t capacity;
Weren't you suggestion to use the start pointer - end pointer instead?
 2. make arrays implementable on .net
than D, and it's similar anyway. So I don't think people will use D on dotnet. So even if creating a D for dotnet can be positive, I don't want D2 to change its design to allow a better implementation on dotnet. The question by Brad Roberts looks important, it has to find some kind of answer:
What will happen with slices over a T[new] when the underlying T[new] must be
moved due to resizing?<
Bye, bearophile
Aug 09 2009
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Walter Bright:
 
 I like this general proposal, it sounds like something that can improve D a
little. There are many other things that have to be improved in D, but baby
steps are enough to go somewhere :-)
 
 Slices will retain the old:
     T[] slice;
 syntax. Resizeable arrays will be declared as:
     T[new] array;
Such syntaxes have to be chosen wisely. I don't fully understand that syntax. And maybe I don't fully like it.
Why am I not surprised :o).
 I think the default (simpler) syntax has to be the most flexible and safer
construct, that is resizable arrays. Slices can be seen as an optimization, so
they can have a bit longer syntax.
Slices will remain ubiquitous, arrays probably not so much. Andrei
Aug 09 2009
prev sibling next sibling parent BCS <ao pathlink.com> writes:
Reply to bearophile,

 Walter Bright:

 2. make arrays implementable on .net
 
better than D,
not as good in a few others (tools, libs, runtime refection).
 and it's similar anyway. So I don't think people will
 use D on dotnet.
This I agree with because D looses many if not all of its advantages when forced into the .NET/CLI/managed-code world
Aug 09 2009
prev sibling next sibling parent reply grauzone <none example.net> writes:
bearophile wrote:
 2. make arrays implementable on .net
than D, and it's similar anyway. So I don't think people will use D on dotnet. So even if creating a D for dotnet can be positive, I don't want D2 to change its design to allow a better implementation on dotnet.
I see two things a dotnet implementation of D could have over native D: - better garbage collector (the D one barely does its job...) - better interoperability (*hint* OMF *hint*)
Aug 09 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
grauzone:

 I see two things a dotnet implementation of D could have over native D:
 - better garbage collector (the D one barely does its job...)
The dotnet GC is probably better than the current D GC, but I think it's designed for mostly movable objects. Currently most (or all) D objects are pinned (they can't be moved around in memory), so I don't know if the dotnet GC will do much good. D needs a GC designed for its peculiar characteristics. And I believe D will also need some extra semantics to allow the creation of such efficient half-movable half-pinned GC (I have explained such ideas one time in the past). Bye, bearophile
Aug 09 2009
parent Jeremie Pelletier <jeremiep gmail.com> writes:
bearophile Wrote:

 grauzone:
 
 I see two things a dotnet implementation of D could have over native D:
 - better garbage collector (the D one barely does its job...)
The dotnet GC is probably better than the current D GC, but I think it's designed for mostly movable objects. Currently most (or all) D objects are pinned (they can't be moved around in memory), so I don't know if the dotnet GC will do much good. D needs a GC designed for its peculiar characteristics. And I believe D will also need some extra semantics to allow the creation of such efficient half-movable half-pinned GC (I have explained such ideas one time in the past). Bye, bearophile
I just finished reading about Bartosz's early entry about the shared qualifier and how it could lead to per-thead memory heaps (as only shared allocations should use the shared heap). This means the GC can maintain different heaps per thread, and perform collection locally without pausing the world, and the entire heap can just be trashed when the thread exits. I think it could be much more efficient than a moving GC (as maintaining different shared heaps and moving memory between them is a slow process). I am really tempted to modify my custom GC to do just that heh.
Aug 09 2009
prev sibling next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
bearophile wrote:
 2. make arrays implementable on .net
better than D, and it's similar anyway. So I don't think people will use D on dotnet. So even if creating a D for dotnet can be positive, I don't want D2 to change its design to allow a better implementation on dotnet.
Even if you're correct that D.net is pointless, and I don't agree with that assessment, I think the problems implementing D arrays on .net will show up elsewhere in attempts to support other targets. So I think it's a "canary" issue rather than a .net one.
Aug 09 2009
parent "Robert Jacques" <sandford jhu.edu> writes:
On Sun, 09 Aug 2009 14:21:31 -0700, Walter Bright  
<newshound1 digitalmars.com> wrote:

 bearophile wrote:
 2. make arrays implementable on .net
better than D, and it's similar anyway. So I don't think people will use D on dotnet. So even if creating a D for dotnet can be positive, I don't want D2 to change its design to allow a better implementation on dotnet.
Even if you're correct that D.net is pointless, and I don't agree with that assessment, I think the problems implementing D arrays on .net will show up elsewhere in attempts to support other targets. So I think it's a "canary" issue rather than a .net one.
But (IIRC) it's not actually a canary. .NET has slices and there's no 'self-contained' implementation problem. The issue was that the .NET library wasn't written with slices (.NET, D or otherwise) in mind, so conversions to/from slices were required.
Aug 09 2009
prev sibling parent div0 <div0 users.sourceforge.net> writes:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

bearophile wrote:
 Walter Bright:
 
 I like this general proposal, it sounds like something that can improve D a
little. There are many other things that have to be improved in D, but baby
steps are enough to go somewhere :-)
 
 Slices will retain the old:
     T[] slice;
 syntax. Resizeable arrays will be declared as:
     T[new] array;
Such syntaxes have to be chosen wisely. I don't fully understand that syntax. And maybe I don't fully like it. I think the default (simpler) syntax has to be the most flexible and safer construct, that is resizable arrays. Slices can be seen as an optimization, so they can have a bit longer syntax.
<snip> Ya, not a nice syntax. how about T[!static] ?! ;) - -- My enormous talent is exceeded only by my outrageous laziness. http://www.ssTk.co.uk -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iD8DBQFKgHaiT9LetA9XoXwRAoBDAJ9WAHlAjkW7TXRUWqCz80N8m4WbwACfTCfP 9MnZfV5pifUKwNC3c1HgySM= =J6pq -----END PGP SIGNATURE-----
Aug 10 2009
prev sibling next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:
 Under the hood, a T[new] will be a single pointer to a library defined 
 type. This library defined type will likely contain three properties:
I have another question: are there some performance penalities in using such arrays for normal random access operations? Bye, bearophile
Aug 09 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Walter Bright:
 Under the hood, a T[new] will be a single pointer to a library defined 
 type. This library defined type will likely contain three properties:
I have another question: are there some performance penalities in using such arrays for normal random access operations?
One extra indirection, usually nearby. Andrei
Aug 09 2009
parent reply "Lionello Lunesu" <lionello lunesu.remove.com> writes:
"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
news:h5ndt0$1blj$1 digitalmars.com...
 bearophile wrote:
 Walter Bright:
 Under the hood, a T[new] will be a single pointer to a library defined 
 type. This library defined type will likely contain three properties:
I have another question: are there some performance penalities in using such arrays for normal random access operations?
One extra indirection, usually nearby. Andrei
If the size and capacity get added to the begining of the memory block containing the data, only an extra offset is needed. The compiler could emit "mov eax, [esi*4 + 8]" or something. Look ma, no extra indirection :) In fact, this is what .NET does as well, if I'm not mistaken. But then, .NET always has an extra indirection because of the movable GC stuff, right? L.
Aug 09 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Lionello Lunesu wrote:
 
 "Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
 news:h5ndt0$1blj$1 digitalmars.com...
 bearophile wrote:
 Walter Bright:
 Under the hood, a T[new] will be a single pointer to a library 
 defined type. This library defined type will likely contain three 
 properties:
I have another question: are there some performance penalities in using such arrays for normal random access operations?
One extra indirection, usually nearby. Andrei
If the size and capacity get added to the begining of the memory block containing the data, only an extra offset is needed. The compiler could emit "mov eax, [esi*4 + 8]" or something. Look ma, no extra indirection :)
I wish too. The problem is that when you resize the array, the control block will not refer to the adjacent chunk. Andrei
Aug 09 2009
prev sibling parent dsimcha <dsimcha yahoo.com> writes:
== Quote from bearophile (bearophileHUGS lycos.com)'s article
 Walter Bright:
 Under the hood, a T[new] will be a single pointer to a library defined
 type. This library defined type will likely contain three properties:
I have another question: are there some performance penalities in using such
arrays for normal random access operations?
 Bye,
 bearophile
import std.stdio, std.perf; final class TNew { uint* foo; uint length; this(uint size) { foo = (new uint[size]).ptr; length = size; } ref uint opIndex(uint index) { return foo[index]; } } void main() { auto pc = new PerformanceCounter; uint[] array = new uint[1]; pc.start; foreach(i; 0..100_000_000) { array.ptr[0]++; } pc.stop; writeln("Direct: ", pc.milliseconds); auto tnew = new TNew(1); pc.start; foreach(i; 0..100_000_000) { tnew[0]++; } pc.stop; writeln("Indirect: ", pc.milliseconds); } Results: Direct: 226 Indirect: 229
Aug 09 2009
prev sibling next sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Walter Bright wrote:
<snip>
 Under the hood, a T[new] will be a single pointer to a library defined 
 type. This library defined type will likely contain three properties:
 
     size_t length;
     T* ptr;
     size_t capacity;
 
 The usual array operations will work on T[new] as well as T[].
<snip> Would new T[10] allocate this structure and the array data on a single GC block, or on two separate blocks? And when the array is reallocated, will the structure move with it? I suppose it depends on whether you want T[new] to be (a) something whereby all references persist as the array is reallocated (b) merely a reference to an allocated array as opposed to an array slice If (a), this is currently achievable with a T[]*. If (b), what might work well is a structure like size_t length; size_t capacity; T[capacity] data; meaning still only one allocation and only one level of indirection when one is used. And the T[new] variable itself would simply hold &data[0]. Moreover, would whatever happens solve such const/invariant holes as bug 2093? Stewart.
Aug 09 2009
next sibling parent grauzone <none example.net> writes:
Stewart Gordon wrote:
 Moreover, would whatever happens solve such const/invariant holes as bug 
 2093?
Just what happens to the ~= operator anyway? Right now, it appends data inline. My vote would be to make "a~=b" do the same as "a=a~b" (with types "T[] a" and "T[] b" or "T b"). T[new]'s ~= would still append inline.
Aug 09 2009
prev sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Stewart Gordon wrote:
 Walter Bright wrote:
 <snip>
 Under the hood, a T[new] will be a single pointer to a library defined 
 type. This library defined type will likely contain three properties:

     size_t length;
     T* ptr;
     size_t capacity;

 The usual array operations will work on T[new] as well as T[].
<snip> Would new T[10] allocate this structure and the array data on a single GC block, or on two separate blocks?
That's up to the implementation.
 And when the array is reallocated, 
 will the structure move with it?
No, that would defeat the whole purpose of making T[new] a reference type. With it being a reference type: T[new] a = ...; T[new] b = a; a.length = ... ... b.length changes too ...
 I suppose it depends on whether you want T[new] to be
 (a) something whereby all references persist as the array is reallocated 
 (b) merely a reference to an allocated array as opposed to an array slice
 
 If (a), this is currently achievable with a T[]*.
 
 If (b), what might work well is a structure like
 
     size_t length;
     size_t capacity;
     T[capacity] data;
 
 meaning still only one allocation and only one level of indirection when 
 one is used.  And the T[new] variable itself would simply hold &data[0].
 
 Moreover, would whatever happens solve such const/invariant holes as bug 
 2093?
I believe it does.
 
 Stewart.
Aug 09 2009
prev sibling next sibling parent Leandro Lucarella <llucax gmail.com> writes:
Walter Bright, el  9 de agosto a las 13:29 me escribiste:
 D has a number of subtle problems (performance and semantic) that arise
 when arrays are resized. The solution is to separate resizeable array
 types from slices. Slices will retain the old:
 
    T[] slice;
 
 syntax. Resizeable arrays will be declared as:
 
    T[new] array;
 
 The new expression:
 
    new T[10]
 
 will return a T[new].
 
 T[new] will implicitly convert to T[], but not the other way.
 
 slice.length will become read-only.
 
 Under the hood, a T[new] will be a single pointer to a library defined
 type. This library defined type will likely contain three properties:
 
     size_t length;
     T* ptr;
     size_t capacity;
 
 The usual array operations will work on T[new] as well as T[].
 
 Doing this change will:
 
 1. fix many nasties at the edges of array semantics
 
 2. make arrays implementable on .net
 
 3. make clear in function signatures if the function can resize the
 array or not
What about writing a DIP? ;) -- Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/ ---------------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------------- VECINOS RESCATARON A CABALLITO ATROPELLADO -- Crónica TV
Aug 09 2009
prev sibling next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2009-08-09 16:29:21 -0400, Walter Bright <newshound1 digitalmars.com> said:

 D has a number of subtle problems (performance and semantic) that arise 
 when arrays are resized. The solution is to separate resizeable array 
 types from slices. Slices will retain the old:
 
     T[] slice;
 
 syntax. Resizeable arrays will be declared as:
 
     T[new] array;
 
 The new expression:
 
     new T[10]
 
 will return a T[new].
I have nothing against the concept of container, but I dislike like the syntax. Is this really better than Array!T ?
 T[new] will implicitly convert to T[], but not the other way.
 
 slice.length will become read-only.
All fine by me.
 Doing this change will:
 
 1. fix many nasties at the edges of array semantics
Unfortunatly, not all. Solving them all could be done with unique semantics (unique slices can be expanded with no side effect).
 2. make arrays implementable on .net
 
 3. make clear in function signatures if the function can resize the 
 array or not
Unique and ref unique slices would be perfectly fine for that (assuming slices can be resized when unique). But I know, unique isn't easy to implement to fit all the use cases we'd like to solve. I'm just sharing a dream. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Aug 09 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Michel Fortin wrote:
 But I know, unique isn't easy to implement to fit all the use cases we'd 
 like to solve. I'm just sharing a dream.
We explored unique at length, and trying to make it work would render the rest of the language nearly unusably complex.
Aug 09 2009
parent reply Michiel Helvensteijn <m.helvensteijn.remove gmail.com> writes:
Walter Bright wrote:

 But I know, unique isn't easy to implement to fit all the use cases we'd
 like to solve. I'm just sharing a dream.
We explored unique at length, and trying to make it work would render the rest of the language nearly unusably complex.
I've heard 'unique' mentioned on this group now and then. What does it mean in the context of programming languages? -- Michiel Helvensteijn
Aug 09 2009
parent Sergey Gromov <snake.scaly gmail.com> writes:
Mon, 10 Aug 2009 01:56:35 +0200, Michiel Helvensteijn wrote:

 Walter Bright wrote:
 
 But I know, unique isn't easy to implement to fit all the use cases we'd
 like to solve. I'm just sharing a dream.
We explored unique at length, and trying to make it work would render the rest of the language nearly unusably complex.
I've heard 'unique' mentioned on this group now and then. What does it mean in the context of programming languages?
Unique reference is a reference which is statically known to be the only reference to the underlying data. They have some interesting properties: - Modification of underlying data does not cause side effects - Can be implicitly cast to immutable - Can be implicitly cast to mutable They prove to be rather tricky both to specify and to implement though.
Aug 09 2009
prev sibling next sibling parent reply Kagamin <spam here.lot> writes:
Walter Bright Wrote:

 Resizeable arrays will be declared as:
 
     T[new] array;
 
 The new expression:
 
     new T[10]
 
 will return a T[new].
 
 T[new] will implicitly convert to T[], but not the other way.
 
 slice.length will become read-only.
Of what type will strings be? Of what type will be the result of concatenation?
Aug 10 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Kagamin wrote:
 Of what type will strings be?
immutable(char)[]
 Of what type will be the result of concatenation?
T[new]
Aug 10 2009
next sibling parent reply "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
Walter Bright wrote:
 Kagamin wrote:
 Of what type will strings be?
immutable(char)[]
I've always wondered: Why are strings of type immutable(char)[], and not immutable(char[])? I mean, there is no point in allowing resizing of strings, when assigning to the new elements is impossible. -Lars
Aug 10 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Lars T. Kyllingstad wrote:
 I've always wondered: Why are strings of type immutable(char)[], and not 
 immutable(char[])?
So: string a = "hello"; a = "foo"; works.
Aug 10 2009
parent reply "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
Walter Bright wrote:
 Lars T. Kyllingstad wrote:
 I've always wondered: Why are strings of type immutable(char)[], and 
 not immutable(char[])?
So: string a = "hello"; a = "foo"; works.
Ah, of course. :) Thanks. -Lars
Aug 10 2009
parent reply Jeremie Pelletier <jeremiep gmail.com> writes:
Lars T. Kyllingstad Wrote:

 Walter Bright wrote:
 Lars T. Kyllingstad wrote:
 I've always wondered: Why are strings of type immutable(char)[], and 
 not immutable(char[])?
So: string a = "hello"; a = "foo"; works.
Ah, of course. :) Thanks. -Lars
The problem with immutable(char)[] was that any string can be resized, even slices. Walter: what will the string types be aliased to now: still immutable(char)[] or immutable(char)[new]? I think it would be best to have them use the array [new] type. Functions which do not modify the string length can mark the string as an in parameter, and immutable(char)[] should be castable to const(immutable(char)[new]) in such cases to still allow slices to be passed.
Aug 10 2009
next sibling parent Leandro Lucarella <llucax gmail.com> writes:
Jeremie Pelletier, el 10 de agosto a las 13:01 me escribiste:
 Lars T. Kyllingstad Wrote:
 
 Walter Bright wrote:
 Lars T. Kyllingstad wrote:
 I've always wondered: Why are strings of type immutable(char)[], and 
 not immutable(char[])?
So: string a = "hello"; a = "foo"; works.
Ah, of course. :) Thanks. -Lars
The problem with immutable(char)[] was that any string can be resized, even slices. Walter: what will the string types be aliased to now: still immutable(char)[] or immutable(char)[new]?
From: Walter Bright <newshound1 digitalmars.com> Date: Mon, 10 Aug 2009 01:01:56 -0700 Subject: Re: T[new] User-Agent: Thunderbird 2.0.0.22 (Windows/20090605) Kagamin wrote:
Of what type will strings be?
immutable(char)[]
Of what type will be the result of concatenation?
T[new] -- Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/ ---------------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------------- El techo de mi cuarto lleno de galaxias
Aug 10 2009
prev sibling next sibling parent Stewart Gordon <smjg_1998 yahoo.com> writes:
Jeremie Pelletier wrote:
<snip>
 Walter: what will the string types be aliased to now: still immutable(char)[]
or immutable(char)[new]?
 
 I think it would be best to have them use the array [new] type.
It would be more efficient to cut out this middleman for things that aren't going to change the length. Which means most things that operate on the string type (and similarly wstring/dstring), since these types are designed to be immutable. The change to T[] that's part of this plan would take it a stage further by making the length immutable as well.
 Functions which do not modify the string length can mark the string
 as an in parameter, and immutable(char)[] should be castable to
 const(immutable(char)[new]) in such cases to still allow slices to be
 passed.
The conversion would create an extra memory allocation. Stewart.
Aug 10 2009
prev sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Jeremie Pelletier wrote:
 Walter: what will the string types be aliased to now: still
 immutable(char)[] or immutable(char)[new]?
immutable(char)[]
 I think it would be best to have them use the array [new] type.
The vast majority of uses will not need to be resizeable.
 Functions which do not modify the string length can mark the string
 as an in parameter, and immutable(char)[] should be castable to
 const(immutable(char)[new]) in such cases to still allow slices to be
 passed.
A slice won't be castable to a resizeable.
Aug 10 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 Kagamin wrote:
 Of what type will strings be?
immutable(char)[]
 Of what type will be the result of concatenation?
T[new]
Hmmm, I see a problem. auto s1 = "Hello"; auto s2 = " world"; auto s = s1 ~ s2; Some might be surprised that the type of s is not the same as that of s1 and s2. Andrei
Aug 10 2009
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 10 Aug 2009 10:11:52 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Walter Bright wrote:
 Kagamin wrote:
 Of what type will strings be?
immutable(char)[]
 Of what type will be the result of concatenation?
T[new]
Hmmm, I see a problem. auto s1 = "Hello"; auto s2 = " world"; auto s = s1 ~ s2; Some might be surprised that the type of s is not the same as that of s1 and s2.
auto c1 = 'a'; auto c2 = 'b'; auto c = c1 + c2; But s is better, because it implicitly casts back to typeof(s1). I think it's the absolutely best behavior. -Steve
Aug 10 2009
prev sibling next sibling parent Sergey Gromov <snake.scaly gmail.com> writes:
Sun, 09 Aug 2009 13:29:21 -0700, Walter Bright wrote:

 D has a number of subtle problems (performance and semantic) that arise 
 when arrays are resized. The solution is to separate resizeable array 
 types from slices. Slices will retain the old:
 
     T[] slice;
 
 syntax. Resizeable arrays will be declared as:
 
     T[new] array;
Good news. I'm all for it. The T[new] syntax is a bit... weird... bit I'm OK with it as well. Thanks!
Aug 10 2009
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 09 Aug 2009 16:29:21 -0400, Walter Bright  
<newshound1 digitalmars.com> wrote:

 D has a number of subtle problems (performance and semantic) that arise  
 when arrays are resized. The solution is to separate resizeable array  
 types from slices. Slices will retain the old:

     T[] slice;

 syntax. Resizeable arrays will be declared as:

     T[new] array;

 The new expression:

     new T[10]

 will return a T[new].

 T[new] will implicitly convert to T[], but not the other way.

 slice.length will become read-only.

 Under the hood, a T[new] will be a single pointer to a library defined  
 type. This library defined type will likely contain three properties:

      size_t length;
      T* ptr;
      size_t capacity;

 The usual array operations will work on T[new] as well as T[].

 Doing this change will:

 1. fix many nasties at the edges of array semantics

 2. make arrays implementable on .net

 3. make clear in function signatures if the function can resize the  
 array or not
Yay! Will capacity be settable? That is, can I replace this pattern: char[] buf = new buf[1024]; buf.length = 0; with something like this?: char[new] buf; buf.capacity = 1024; // ensure capacity is *at least* 1024 ---- I read elsewhere that T[new] x; doesn't allocate until length (or maybe capacity) is non-zero. So T[new] is a library defined type. What does the underlying type get called with when x.length = 5 is entered? Is the underlying type: 1. an object? 2. a templated type? ---- What happens when you do: x = null; where x is a T[new] type? Is it the same as setting x.length = 0, or x.capacity = 0? What does x == null mean? What about x is null? Will ptr be settable, or only readable? Will the compiler continue to optimize foreach calls to arrays? How will foreach be implemented, as a range or opApply? The former implies some automated method to return a range, because you wouldn't want to modify the referenced array during iteration. This is exciting :) -Steve
Aug 10 2009
prev sibling next sibling parent Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Sun, Aug 9, 2009 at 4:29 PM, Walter Bright<newshound1 digitalmars.com> w=
rote:
 D has a number of subtle problems (performance and semantic) that arise w=
hen
 arrays are resized. The solution is to separate resizeable array types fr=
om
 slices. Slices will retain the old:

 =A0 T[] slice;

 syntax. Resizeable arrays will be declared as:

 =A0 T[new] array;

 The new expression:

 =A0 new T[10]

 will return a T[new].

 T[new] will implicitly convert to T[], but not the other way.

 slice.length will become read-only.

 Under the hood, a T[new] will be a single pointer to a library defined ty=
pe.
 This library defined type will likely contain three properties:

 =A0 =A0size_t length;
 =A0 =A0T* ptr;
 =A0 =A0size_t capacity;

 The usual array operations will work on T[new] as well as T[].

 Doing this change will:

 1. fix many nasties at the edges of array semantics

 2. make arrays implementable on .net

 3. make clear in function signatures if the function can resize the array=
or
 not
This is a great change. MiniD's array semantics are very similar to D's wrt. slicing, though I made the slight modification that array objects keep track of whether or not they are a slice of another array and behave slightly differently accordingly, and I've found it to be more intuitive. I like that D is doing the same thing, and statically too ;) The T[new] syntax isn't doing much for me, but then again, I don't have any better suggestions.
Aug 10 2009
prev sibling next sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Since nobody's yet asked....

What type would .dup and .idup return?

The best choice I can see is to make .dup return T[new] and .idup return 
invariant(T)[].

Stewart.
Aug 10 2009
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 10 Aug 2009 14:38:24 -0400, Stewart Gordon <smjg_1998 yahoo.com>  
wrote:

 Since nobody's yet asked....

 What type would .dup and .idup return?

 The best choice I can see is to make .dup return T[new] and .idup return  
 invariant(T)[].
This brings up another question. Should something like immutable(T)[new] or const(T)[new] be even valid? Consider this: immutable(char)[new] str = "hello".idup; // not sure how you convert a literal to an array, but assume it works for now. assert(str.capacity == 16); // assume GC still allocates 16 minimum blocks str ~= " there"; auto slice = str[5..$]; str.length = 5; str ~= " world"; assert(slice == " there"); // fails? We most likely have the same issue here with immutable data as we do today, just not as obvious. Although it's a corner case, it does violate immutability without casts. There are several options: 1. immutable(T)[new] and const(T)[new] are invalid. This limits us more than the current regime, but is a questionable construct to begin with, considering it allows appending in place (you can always create a new array with concatenation). 2. immutable(T)[new] and const(T)[new] are equivalent to immutable(T[new]) and const(T[new]) respectively. This simplifies generic programming and still prevents abuses. 3. when you shrink an array of immutable/const elements' length, the capacity automatically shrinks to prevent overwriting previous data. This still leaves mutable data with the overwrite problem, but it doesn't violate const. Also, implicit casting to const needs to be addressed, normally you can't do this with templated types (if that's how the underlying type is implemented). -Steve
Aug 10 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 On Mon, 10 Aug 2009 14:38:24 -0400, Stewart Gordon <smjg_1998 yahoo.com> 
 wrote:
 
 Since nobody's yet asked....

 What type would .dup and .idup return?

 The best choice I can see is to make .dup return T[new] and .idup 
 return invariant(T)[].
This brings up another question. Should something like immutable(T)[new] or const(T)[new] be even valid? Consider this: immutable(char)[new] str = "hello".idup; // not sure how you convert a literal to an array, but assume it works for now. assert(str.capacity == 16); // assume GC still allocates 16 minimum blocks str ~= " there"; auto slice = str[5..$]; str.length = 5; str ~= " world"; assert(slice == " there"); // fails? We most likely have the same issue here with immutable data as we do today, just not as obvious. Although it's a corner case, it does violate immutability without casts. There are several options: 1. immutable(T)[new] and const(T)[new] are invalid. This limits us more than the current regime, but is a questionable construct to begin with, considering it allows appending in place (you can always create a new array with concatenation). 2. immutable(T)[new] and const(T)[new] are equivalent to immutable(T[new]) and const(T[new]) respectively. This simplifies generic programming and still prevents abuses. 3. when you shrink an array of immutable/const elements' length, the capacity automatically shrinks to prevent overwriting previous data. This still leaves mutable data with the overwrite problem, but it doesn't violate const. Also, implicit casting to const needs to be addressed, normally you can't do this with templated types (if that's how the underlying type is implemented).
I think 1 is a good option. Essentially you can't have a resizeable array of immutable data. Functional languages don't have such a thing. If we go for (3), code that looks tighter is in fact more wasteful as every shrinking+expanding cycle causes a new allocation and a copy. Andrei
Aug 10 2009
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 10 Aug 2009 15:21:48 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 On Mon, 10 Aug 2009 14:38:24 -0400, Stewart Gordon  
 <smjg_1998 yahoo.com> wrote:

 Since nobody's yet asked....

 What type would .dup and .idup return?

 The best choice I can see is to make .dup return T[new] and .idup  
 return invariant(T)[].
This brings up another question. Should something like immutable(T)[new] or const(T)[new] be even valid? Consider this: immutable(char)[new] str = "hello".idup; // not sure how you convert a literal to an array, but assume it works for now. assert(str.capacity == 16); // assume GC still allocates 16 minimum blocks str ~= " there"; auto slice = str[5..$]; str.length = 5; str ~= " world"; assert(slice == " there"); // fails? We most likely have the same issue here with immutable data as we do today, just not as obvious. Although it's a corner case, it does violate immutability without casts. There are several options: 1. immutable(T)[new] and const(T)[new] are invalid. This limits us more than the current regime, but is a questionable construct to begin with, considering it allows appending in place (you can always create a new array with concatenation). 2. immutable(T)[new] and const(T)[new] are equivalent to immutable(T[new]) and const(T[new]) respectively. This simplifies generic programming and still prevents abuses. 3. when you shrink an array of immutable/const elements' length, the capacity automatically shrinks to prevent overwriting previous data. This still leaves mutable data with the overwrite problem, but it doesn't violate const. Also, implicit casting to const needs to be addressed, normally you can't do this with templated types (if that's how the underlying type is implemented).
I think 1 is a good option. Essentially you can't have a resizeable array of immutable data. Functional languages don't have such a thing. If we go for (3), code that looks tighter is in fact more wasteful as every shrinking+expanding cycle causes a new allocation and a copy.
Consider this then: template ArrayOf(T) { alias T[new] ArrayOf; } Is ArrayOf!invariant(char) valid? Should it be? I'm not sure, this was my focus of option 2. -Steve
Aug 10 2009
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
dsimcha:

 import std.stdio, std.perf;
[...]
 Results:
 Direct:  226
 Indirect:  229
Thank you. Following your example I have tried to do a bit more realistic tests. // test1.d, for D2, Phobos import std.stdio: writeln; import std.perf: PerformanceCounter; final class TNew(T) { T* ptr; size_t length, capacity; this(size_t size) { this.ptr = (new T[size]).ptr; this.length = size; this.capacity = size; } ref T opIndex(size_t index) { return this.ptr[index]; } } void main() { // increase N if you have a large L2 cache enum int N = 100_000; enum int NLOOP = 40_000; alias uint T; auto pc = new PerformanceCounter; auto a = new T[N]; auto c = new TNew!T(N); pc.start; T* p = a.ptr; for (uint j; j < NLOOP; j++) for (uint i; i < N; i++) p[i]++; pc.stop; writeln("Direct: ", pc.milliseconds); pc.start; for (uint j; j < NLOOP; j++) for (uint i; i < N; i++) c[i]++; pc.stop; writeln("Class: ", pc.milliseconds); pc.start; for (uint j; j < NLOOP; j++) for (uint i; i < N; i++) a[i]++; pc.stop; writeln("Array: ", pc.milliseconds); writeln(); pc.start; for (uint j; j < NLOOP; j++) for (uint i = N-1; i--; ) p[i]++; pc.stop; writeln("Direct: ", pc.milliseconds); pc.start; for (uint j; j < NLOOP; j++) for (uint i = N-1; i--; ) c[i]++; pc.stop; writeln("Class: ", pc.milliseconds); pc.start; for (uint j; j < NLOOP; j++) for (uint i = N-1; i--; ) a[i]++; pc.stop; writeln("Array: ", pc.milliseconds); } Timings on Windows, DMD v2.031, 2 GHz Core2 PC (use a larger N if you have a large L2 CPU Cache): Direct: 4654 Class: 4670 Array: 4095 Direct: 4650 Class: 4688 Array: 4089 It's curious how the "direct" version is slower than the normal dynamic array (this doesn't happen in LDC, see below). ----------------------------- Then I've tried to create a version that can be compiled with LDC too (with Tango). The LDC that I have compiles D1 code, so I can't use ref T opIndex(). So I have created the following version, if you spot bugs or mistakes please tell me: // test2.d, for D1, Phobos and Tango version (Tango) { import tango.stdc.stdio: printf; import tango.time.StopWatch: StopWatch; } else { import std.c.stdio: printf; import std.perf: PerformanceCounter; } final class TNew(T) { T* ptr; size_t length, capacity; this(size_t size) { this.ptr = (new T[size]).ptr; this.length = size; this.capacity = size; } T* refItem(size_t index) { return &(this.ptr[index]); } } void main() { // increase N if you have a large L2 cache const int N = 100_000; const int NLOOP = 40_000; alias uint T; version (Tango) StopWatch timer; else auto timer = new PerformanceCounter; int deltaTime; auto a = new T[N]; auto c = new TNew!(T)(N); timer.start; T* p = a.ptr; for (uint j; j < NLOOP; j++) for (uint i; i < N; i++) p[i]++; version (Tango) { deltaTime = timer.microsec() / 1000; } else { timer.stop; deltaTime = cast(int)timer.milliseconds; } printf("Direct: %d\n", deltaTime); timer.start; for (uint j; j < NLOOP; j++) for (uint i; i < N; i++) (*c.refItem(i))++; version (Tango) { deltaTime = timer.microsec() / 1000; } else { timer.stop; deltaTime = cast(int)timer.milliseconds; } printf("Class: %d\n", deltaTime); timer.start; for (uint j; j < NLOOP; j++) for (uint i; i < N; i++) a[i]++; version (Tango) { deltaTime = timer.microsec() / 1000; } else { timer.stop; deltaTime = cast(int)timer.milliseconds; } printf("Array: %d\n", deltaTime); printf("\n"); timer.start; for (uint j; j < NLOOP; j++) for (uint i = N-1; i--; ) p[i]++; version (Tango) { deltaTime = timer.microsec() / 1000; } else { timer.stop; deltaTime = cast(int)timer.milliseconds; } printf("Direct: %d\n", deltaTime); timer.start; for (uint j; j < NLOOP; j++) for (uint i = N-1; i--; ) (*c.refItem(i))++; version (Tango) { deltaTime = timer.microsec() / 1000; } else { timer.stop; deltaTime = cast(int)timer.milliseconds; } printf("Class: %d\n", deltaTime); timer.start; for (uint j; j < NLOOP; j++) for (uint i = N-1; i--; ) a[i]++; version (Tango) { deltaTime = timer.microsec() / 1000; } else { timer.stop; deltaTime = cast(int)timer.milliseconds; } printf("Array: %d\n", deltaTime); } Timings on Windows: Direct: 4654 Class: 4810 Array: 4086 Direct: 4665 Class: 4598 Array: 4084 Timings on a 32 bit Linux (ldc -O3 -release -inline): Direct: 4280 Class: 5000 Array: 4280 Direct: 4280 Class: 4860 Array: 4270 If such tests are correct, then the new resizable arrays of D2 will indeed have some performance penality (something like 13%-17% in this test, but results may change with other array sizes). Bye, bearophile
Aug 10 2009
parent reply Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Mon, Aug 10, 2009 at 6:53 PM, bearophile<bearophileHUGS lycos.com> wrote:

 If such tests are correct, then the new resizable arrays of D2 will indeed
have some performance penality (something like 13%-17% in this test, but
results may change with other array sizes).
I don't see why that's a big problem. If you're using a resizable array, you probably care more about *resizing* it. That is, you're probably only going to be using resizable arrays when you're doing ~=. Like Walter said, if you need fast access to the data, you can just slice it and get a slice array.
Aug 10 2009
parent bearophile <bearophileHUGS lycos.com> writes:
Jarrett Billingsley:
I don't see why that's a big problem.<
Like St Thomas I touch first and believe (maybe) later :-) Bye, bearophile
Aug 11 2009