www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Small Buffer Optimization for string and friends

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter and I discussed today about using the small string optimization 
in string and other arrays of immutable small objects.

On 64 bit machines, string occupies 16 bytes. We could use the first 
byte as discriminator, which means that all strings under 16 chars need 
no memory allocation at all.

It turns out statistically a lot of strings are small. According to a 
variety of systems we use at Facebook, the small buffer optimization is 
king - it just works great in all cases. In D that means better speed, 
better locality, and less garbage.

For this to happen, we need to start an effort of migrating built-in 
arrays into runtime, essentially making them templates that the compiler 
lowers to. So I have two questions:

1. What happened to the new hash project? We need to take that to 
completion.

2. Is anyone willing to start the effort of migrating built-in slices 
into templates?


Thanks,

Andrei
Apr 07 2012
next sibling parent reply "Daniel Murphy" <yebblies nospamgmail.com> writes:
"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
news:jlr9ak$28bv$1 digitalmars.com...
 Walter and I discussed today about using the small string optimization in 
 string and other arrays of immutable small objects.

 On 64 bit machines, string occupies 16 bytes. We could use the first byte 
 as discriminator, which means that all strings under 16 chars need no 
 memory allocation at all.

 It turns out statistically a lot of strings are small. According to a 
 variety of systems we use at Facebook, the small buffer optimization is 
 king - it just works great in all cases. In D that means better speed, 
 better locality, and less garbage.

This sounds like it would be a great addition to phobos.
 For this to happen, we need to start an effort of migrating built-in 
 arrays into runtime, essentially making them templates that the compiler 
 lowers to.

- This has been a disaster for AAs - Is it worth doing for 32 bit? - Would generate false pointers - Run-time check on every array access? - Why should this be in the language/compiler instead of phobos? April 1st was last week!!
Apr 07 2012
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 1:33 AM, Daniel Murphy wrote:
 - This has been a disaster for AAs

Making them "magic" has been the problem. We must revert that.
 - Is it worth doing for 32 bit?

Probably not.
 - Would generate false pointers

Fair point but we're also moving to precise collection :o).
 - Run-time check on every array access?

Cost is negligible in most cases according to the extensive measurements we've done.
 - Why should this be in the language/compiler instead of phobos?

People use built-in immutable(char)[] as strings. We want to benefit existing uses. Andrei
Apr 08 2012
prev sibling next sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 4/8/12, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 essentially making them templates

I just hope that doesn't cause: 1) Awful template errors 2) Slower build times 3) More ICEs
Apr 08 2012
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 4:30 AM, Andrej Mitrovic wrote:
 On 4/8/12, Andrei Alexandrescu<SeeWebsiteForEmail erdani.org>  wrote:
 essentially making them templates

I just hope that doesn't cause: 1) Awful template errors 2) Slower build times 3) More ICEs

Walter and I agree that relying on sheer D for instead of compiler-generated magic (in this case, bitmaps etc) is the better solution. Implementing a generic marker as a template using introspection is very simple - I predict a few dozen lines. Once that is finished there will obviously be no template errors. I don't know whether build times would be impacted. There will be fewer ICEs because there will be less reliance on the compiler. Andrei
Apr 08 2012
prev sibling next sibling parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu 
wrote:
 Walter and I discussed today about using the small string 
 optimization in string and other arrays of immutable small 
 objects.

 On 64 bit machines, string occupies 16 bytes. We could use the 
 first byte as discriminator, which means that all strings under 
 16 chars need no memory allocation at all.

Don't use the first byte. Use the last byte. The last byte is the highest-order byte of the length. Limiting arrays to 18.37 exabytes, as opposed to 18.45 exabytes, is a much nicer limitation than making assumptions about the memory layout.
Apr 08 2012
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 4:46 AM, Vladimir Panteleev wrote:
 On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu wrote:
 Walter and I discussed today about using the small string optimization
 in string and other arrays of immutable small objects.

 On 64 bit machines, string occupies 16 bytes. We could use the first
 byte as discriminator, which means that all strings under 16 chars
 need no memory allocation at all.

Don't use the first byte. Use the last byte. The last byte is the highest-order byte of the length. Limiting arrays to 18.37 exabytes, as opposed to 18.45 exabytes, is a much nicer limitation than making assumptions about the memory layout.

Hehe. Good idea. On big endian machines the last byte is the ticket! Andrei
Apr 08 2012
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 9:49 AM, Andrei Alexandrescu wrote:
 On 4/8/12 4:46 AM, Vladimir Panteleev wrote:
 On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu wrote:
 Walter and I discussed today about using the small string optimization
 in string and other arrays of immutable small objects.

 On 64 bit machines, string occupies 16 bytes. We could use the first
 byte as discriminator, which means that all strings under 16 chars
 need no memory allocation at all.

Don't use the first byte. Use the last byte. The last byte is the highest-order byte of the length. Limiting arrays to 18.37 exabytes, as opposed to 18.45 exabytes, is a much nicer limitation than making assumptions about the memory layout.

Hehe. Good idea. On big endian machines the last byte is the ticket!

s/big/little/
Apr 08 2012
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 4:54 AM, Manu wrote:
 On 8 April 2012 12:46, Vladimir Panteleev <vladimir thecybershadow.net
 <mailto:vladimir thecybershadow.net>> wrote:

     On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu wrote:

         Walter and I discussed today about using the small string
         optimization in string and other arrays of immutable small objects.

         On 64 bit machines, string occupies 16 bytes. We could use the
         first byte as discriminator, which means that all strings under
         16 chars need no memory allocation at all.


     Don't use the first byte. Use the last byte.

     The last byte is the highest-order byte of the length. Limiting
     arrays to 18.37 exabytes, as opposed to 18.45 exabytes, is a much
     nicer limitation than making assumptions about the memory layout.


 What is the plan for 32bit?

We can experiment with making strings shorter than 8 chars in-situ. The drawback will be that length will be limited to 29 bits, i.e. 512MB. Andrei
Apr 08 2012
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 10:03 AM, Manu wrote:
 29 bits? ...not 31?

Sorry, 31 indeed.
 How does this implementation actually work? On 32/64 bits, and
 little/big endian?
 I can only imagine it working with a carefully placed 1 bit. bit-0 of
 the size on little endian, and bit-31 of the size on big endian. That
 should only halve the address range (leaving 31 bits)... where did the
 other 2 bits go?

Essentially it will use either the first or the last bit of the representation as discriminator. That bit is most likely "taken" from the length representation. Shifting and masking can easily account for it when computing length of large strings.
 I also hope this only affects slices of chars? It will ignore this
 behaviour for anything other than char arrays right?

It works for any arrays of sufficiently small immutable data type (e.g. immutable(byte)[]), but the most advantage is reaped for string. Andrei
Apr 08 2012
prev sibling parent deadalnix <deadalnix gmail.com> writes:
Le 08/04/2012 16:52, Andrei Alexandrescu a écrit :
 On 4/8/12 4:54 AM, Manu wrote:
 On 8 April 2012 12:46, Vladimir Panteleev <vladimir thecybershadow.net
 <mailto:vladimir thecybershadow.net>> wrote:

 On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu wrote:

 Walter and I discussed today about using the small string
 optimization in string and other arrays of immutable small objects.

 On 64 bit machines, string occupies 16 bytes. We could use the
 first byte as discriminator, which means that all strings under
 16 chars need no memory allocation at all.


 Don't use the first byte. Use the last byte.

 The last byte is the highest-order byte of the length. Limiting
 arrays to 18.37 exabytes, as opposed to 18.45 exabytes, is a much
 nicer limitation than making assumptions about the memory layout.


 What is the plan for 32bit?

We can experiment with making strings shorter than 8 chars in-situ. The drawback will be that length will be limited to 29 bits, i.e. 512MB. Andrei

As it is a flag, why not limit the string size to 2GB instead of 512MB ?
Apr 10 2012
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 4:55 AM, Vladimir Panteleev wrote:
 On Sunday, 8 April 2012 at 09:46:28 UTC, Vladimir Panteleev wrote:
 On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu wrote:
 Walter and I discussed today about using the small string
 optimization in string and other arrays of immutable small objects.

 On 64 bit machines, string occupies 16 bytes. We could use the first
 byte as discriminator, which means that all strings under 16 chars
 need no memory allocation at all.

Don't use the first byte. Use the last byte. The last byte is the highest-order byte of the length. Limiting arrays to 18.37 exabytes, as opposed to 18.45 exabytes, is a much nicer limitation than making assumptions about the memory layout.

Erm... never mind, I thought the pointer was the first field.

Me too, actually.
 Even so, how would you use the lowest-order byte of the length as a
 discriminator? Unless you allocated bytes 2-8 for the length (which
 would be an unaligned read, or a shift every time...)

It's amazing how fast shift works :o).
 "making assumptions about the memory layout" now seems like the only
 solution.

 Also, what will be .ptr for such arrays? It can't point inside the
 string, because the type is immutable and the array is on the stack.
 Duplication can be expensive as well.

Once anyone asks for .ptr a conservative copy will be made. Andrei
Apr 08 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/8/2012 7:53 AM, Andrei Alexandrescu wrote:
 Once anyone asks for .ptr a conservative copy will be made.

That could get expensive. You cannot just point into the small string part, because that may only exist temporarily on the stack. There are some pathological cases for this.
Apr 08 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 12:05 PM, Walter Bright wrote:
 On 4/8/2012 7:53 AM, Andrei Alexandrescu wrote:
 Once anyone asks for .ptr a conservative copy will be made.

That could get expensive. You cannot just point into the small string part, because that may only exist temporarily on the stack. There are some pathological cases for this.

As I mentioned, the first call to .ptr changes representation, thus making the allocation that the optimization had saved. Things are not worse off than before. Andrei
Apr 08 2012
parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2012-04-08 17:14:37 +0000, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 On 4/8/12 12:05 PM, Walter Bright wrote:
 On 4/8/2012 7:53 AM, Andrei Alexandrescu wrote:
 Once anyone asks for .ptr a conservative copy will be made.

That could get expensive. You cannot just point into the small string part, because that may only exist temporarily on the stack. There are some pathological cases for this.

As I mentioned, the first call to .ptr changes representation, thus making the allocation that the optimization had saved. Things are not worse off than before.

This only works if the code has write access to that variable. Also, if the variable is a temporary copy such as a by-value function parameter, only the representation of that temporary copy will be affected. Every time you call a function which calls .ptr a new copy will be allocated on the heap. If such a function call is part of a loop, it'll degrade performance noticeably. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Apr 08 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 1:16 PM, Michel Fortin wrote:
 On 2012-04-08 17:14:37 +0000, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:
 As I mentioned, the first call to .ptr changes representation, thus
 making the allocation that the optimization had saved. Things are not
 worse off than before.

This only works if the code has write access to that variable. Also, if the variable is a temporary copy such as a by-value function parameter, only the representation of that temporary copy will be affected. Every time you call a function which calls .ptr a new copy will be allocated on the heap. If such a function call is part of a loop, it'll degrade performance noticeably.

I think there's a bit of a confusion. Could you please clarify with an example? Thanks, Andrei
Apr 08 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 1:26 PM, Andrei Alexandrescu wrote:
 On 4/8/12 1:16 PM, Michel Fortin wrote:
 On 2012-04-08 17:14:37 +0000, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:
 As I mentioned, the first call to .ptr changes representation, thus
 making the allocation that the optimization had saved. Things are not
 worse off than before.

This only works if the code has write access to that variable. Also, if the variable is a temporary copy such as a by-value function parameter, only the representation of that temporary copy will be affected. Every time you call a function which calls .ptr a new copy will be allocated on the heap. If such a function call is part of a loop, it'll degrade performance noticeably.

I think there's a bit of a confusion. Could you please clarify with an example?

I think I understand: string s; foreach (lots) fun(s); void fun(string s) { ... s.ptr ... } I agree in this case there would be one allocation per call. Andrei
Apr 08 2012
parent Michel Fortin <michel.fortin michelf.com> writes:
On 2012-04-08 18:30:49 +0000, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 On 4/8/12 1:26 PM, Andrei Alexandrescu wrote:
 On 4/8/12 1:16 PM, Michel Fortin wrote:
 Also, if the variable is a temporary copy such as a by-value function
 parameter, only the representation of that temporary copy will be
 affected. Every time you call a function which calls .ptr a new copy
 will be allocated on the heap. If such a function call is part of a
 loop, it'll degrade performance noticeably.

I think there's a bit of a confusion. Could you please clarify with an example?

I think I understand: string s; foreach (lots) fun(s); void fun(string s) { ... s.ptr ... }

Exactly.
 I agree in this case there would be one allocation per call.

Basically, it's the same problem as receiving a const(char)[] as a function parameter and .idup-ing it in the function. The recommendation is for such functions to accept only immutable(char)[] and let the caller find a way to send an immutable string. That way, if the caller is repeatedly using the same string, it can reuse the efficient representation. The same principles apply here: if your function needs to use .ptr, it should request a type on which you can call .ptr efficiently and let the caller find a way to pass think kind of string. To me, your small buffer optimization proposal looks like an argument in favor of creating a separate string type free from the constrains of offering at all time bare access to the bytes. If only that type could be the default... -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Apr 08 2012
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 9:26 AM, Manu wrote:
 Is it realistic that anyone can actually use raw d-string's in an app
 that performs a lot of string manipulation?

Yes.
 I bet most people end up
 with a custom string class anyway...

That does happen, but much more rarely than one might think.
 Who's written a string-heavy app without their own string helper class?
 I ended up with a string class within about half an hour of trying to
 work with D strings (initially just to support stack strings, then it grew).

A lot of people write string-heavy apps with the built-in strings. "Heavy" actually describes a continuum. No matter how you put it, improving the performance of built-in strings is beneficial. Andrei
Apr 08 2012
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:
--20cf302ef74c18398704bd27e142
Content-Type: text/plain; charset=UTF-8

On 8 April 2012 12:46, Vladimir Panteleev <vladimir thecybershadow.net>wrote:

 On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu wrote:

 Walter and I discussed today about using the small string optimization in
 string and other arrays of immutable small objects.

 On 64 bit machines, string occupies 16 bytes. We could use the first byte
 as discriminator, which means that all strings under 16 chars need no
 memory allocation at all.

Don't use the first byte. Use the last byte. The last byte is the highest-order byte of the length. Limiting arrays to 18.37 exabytes, as opposed to 18.45 exabytes, is a much nicer limitation than making assumptions about the memory layout.

What is the plan for 32bit? --20cf302ef74c18398704bd27e142 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable <div class=3D"gmail_quote">On 8 April 2012 12:46, Vladimir Panteleev <span = dir=3D"ltr">&lt;<a href=3D"mailto:vladimir thecybershadow.net">vladimir the= cybershadow.net</a>&gt;</span> wrote:<br><blockquote class=3D"gmail_quote" = style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <div class=3D"im">On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandre= scu wrote:<br> <blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p= x #ccc solid;padding-left:1ex"> Walter and I discussed today about using the small string optimization in s= tring and other arrays of immutable small objects.<br> <br> On 64 bit machines, string occupies 16 bytes. We could use the first byte a= s discriminator, which means that all strings under 16 chars need no memory= allocation at all.<br> </blockquote> <br></div> Don&#39;t use the first byte. Use the last byte.<br> <br> The last byte is the highest-order byte of the length. Limiting arrays to 1= 8.37 exabytes, as opposed to 18.45 exabytes, is a much nicer limitation tha= n making assumptions about the memory layout.<br></blockquote><div><br> </div><div>What is the plan for 32bit?</div></div><br> --20cf302ef74c18398704bd27e142--
Apr 08 2012
prev sibling next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Sunday, 8 April 2012 at 09:46:28 UTC, Vladimir Panteleev wrote:
 On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu 
 wrote:
 Walter and I discussed today about using the small string 
 optimization in string and other arrays of immutable small 
 objects.

 On 64 bit machines, string occupies 16 bytes. We could use the 
 first byte as discriminator, which means that all strings 
 under 16 chars need no memory allocation at all.

Don't use the first byte. Use the last byte. The last byte is the highest-order byte of the length. Limiting arrays to 18.37 exabytes, as opposed to 18.45 exabytes, is a much nicer limitation than making assumptions about the memory layout.

Erm... never mind, I thought the pointer was the first field. Even so, how would you use the lowest-order byte of the length as a discriminator? Unless you allocated bytes 2-8 for the length (which would be an unaligned read, or a shift every time...) "making assumptions about the memory layout" now seems like the only solution. Also, what will be .ptr for such arrays? It can't point inside the string, because the type is immutable and the array is on the stack. Duplication can be expensive as well.
Apr 08 2012
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2012-04-08 07:56, Andrei Alexandrescu wrote:

 For this to happen, we need to start an effort of migrating built-in
 arrays into runtime, essentially making them templates that the compiler
 lowers to. So I have two questions:

Just don't make the same mistake as with AA. -- /Jacob Carlborg
Apr 08 2012
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 5:45 AM, Jacob Carlborg wrote:
 On 2012-04-08 07:56, Andrei Alexandrescu wrote:

 For this to happen, we need to start an effort of migrating built-in
 arrays into runtime, essentially making them templates that the compiler
 lowers to. So I have two questions:

Just don't make the same mistake as with AA.

The mistake with AAs was done long ago, but it was forced as AAs predated templates. Andrei
Apr 08 2012
parent Jacob Carlborg <doob me.com> writes:
On 2012-04-08 16:54, Andrei Alexandrescu wrote:
 On 4/8/12 5:45 AM, Jacob Carlborg wrote:
 On 2012-04-08 07:56, Andrei Alexandrescu wrote:

 For this to happen, we need to start an effort of migrating built-in
 arrays into runtime, essentially making them templates that the compiler
 lowers to. So I have two questions:

Just don't make the same mistake as with AA.

The mistake with AAs was done long ago, but it was forced as AAs predated templates. Andrei

I'm referring to the new template implementation of AAs that got reverted due everything breaking, if I recall correctly. -- /Jacob Carlborg
Apr 08 2012
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sun, Apr 08, 2012 at 05:35:50PM +0200, Jacob Carlborg wrote:
 On 2012-04-08 16:54, Andrei Alexandrescu wrote:
On 4/8/12 5:45 AM, Jacob Carlborg wrote:


Just don't make the same mistake as with AA.

The mistake with AAs was done long ago, but it was forced as AAs predated templates. Andrei

I'm referring to the new template implementation of AAs that got reverted due everything breaking, if I recall correctly.

Huh? When was this? T -- "I suspect the best way to deal with procrastination is to put off the procrastination itself until later. I've been meaning to try this, but haven't gotten around to it yet. " -- swr
Apr 08 2012
prev sibling next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sun, Apr 08, 2012 at 12:56:38AM -0500, Andrei Alexandrescu wrote:
[...]
 1. What happened to the new hash project? We need to take that to
 completion.

Sorry, I've been busy at work and haven't had too much free time to work on it. The current code is available on github: https://github.com/quickfur/New-AA-implementation The major outstanding issues are: - Qualified keys not fully working: the current code has a few corner cases that don't work with shared/immutable/inout keys. One major roadblock is how to implement this: alias someType T; inout(T) myFunc(inout(T) arg, ...) { int[inout(T)] aa; ... } The problem is that inout gets carried over into the AA template, which breaks because it instantiates into something that has: struct Slot { hash_t hash; inout(T) key; // <-- this causes a compile error Value value; } Ideally, AA keys should all be stored as immutable inside the AA, and automatically converted to/from the qualified type the user specified. - Template bloat: the current code uses template member functions, and will instantiate a new function for every implicit conversion of input key types. This also depends on IFTI, which has some quirks (compiler bugs) that make the code ugly (e.g., strings and arrays not treated equally by the compiler, requiring hacks to make implicit conversion work). Timon has suggested an alternative way of handling implicit conversions, which I think is better, but I need to take some time to actually implement it. - Static initialization of AA's (AA literals that compile directly into object code). This should be possible in principle, but I've run into what may be a CTFE bug that prevents it from working. - A not-so-major issue is to finish the toHash() implementations for all native types (currently it works for some common key types, but coverage is still incomplete). Once this is done, we can finally get rid of getHash from TypeInfo; UFCS will let us simply write x.toHash() for pretty much any type x. Once these issues are resolved, there remains the major task of actually integrating this code with druntime/dmd. A lot of work is expected on the dmd end, because of the current amount of hacks in dmd to make AA's work. T -- Valentine's Day: an occasion for florists to reach into the wallets of nominal lovers in dire need of being reminded to profess their hypothetical love for their long-forgotten.
Apr 08 2012
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 8:54 AM, H. S. Teoh wrote:
 Sorry, I've been busy at work and haven't had too much free time to work
 on it.  The current code is available on github:

 	https://github.com/quickfur/New-AA-implementation

Thanks for the update! Let me reiterate this is important work for many reasons, so it would be great if you got around to pushing it through. Do you think someone else in the community could help you? Andrei
Apr 08 2012
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 8:54 AM, H. S. Teoh wrote:
 - Qualified keys not fully working: the current code has a few corner
    cases that don't work with shared/immutable/inout keys. One major
    roadblock is how to implement this:

 	alias someType T;
 	inout(T) myFunc(inout(T) arg, ...) {
 		int[inout(T)] aa;
 		...
 	}

I wonder how frequently such code is used. Andrei
Apr 08 2012
prev sibling next sibling parent "Tove" <tove fransson.se> writes:
On Sunday, 8 April 2012 at 13:53:07 UTC, H. S. Teoh wrote:
 On Sun, Apr 08, 2012 at 12:56:38AM -0500, Andrei Alexandrescu 
 wrote:
 [...]
 1. What happened to the new hash project? We need to take that 
 to
 completion.

Sorry, I've been busy at work and haven't had too much free time to work on it. The current code is available on github: https://github.com/quickfur/New-AA-implementation The major outstanding issues are: - Qualified keys not fully working: the current code has a few corner cases that don't work with shared/immutable/inout keys. One major roadblock is how to implement this: alias someType T; inout(T) myFunc(inout(T) arg, ...) { int[inout(T)] aa; ... } The problem is that inout gets carried over into the AA template, which breaks because it instantiates into something that has: struct Slot { hash_t hash; inout(T) key; // <-- this causes a compile error Value value; } Ideally, AA keys should all be stored as immutable inside the AA, and automatically converted to/from the qualified type the user specified. - Template bloat: the current code uses template member functions, and will instantiate a new function for every implicit conversion of input key types. This also depends on IFTI, which has some quirks (compiler bugs) that make the code ugly (e.g., strings and arrays not treated equally by the compiler, requiring hacks to make implicit conversion work). Timon has suggested an alternative way of handling implicit conversions, which I think is better, but I need to take some time to actually implement it. - Static initialization of AA's (AA literals that compile directly into object code). This should be possible in principle, but I've run into what may be a CTFE bug that prevents it from working. - A not-so-major issue is to finish the toHash() implementations for all native types (currently it works for some common key types, but coverage is still incomplete). Once this is done, we can finally get rid of getHash from TypeInfo; UFCS will let us simply write x.toHash() for pretty much any type x. Once these issues are resolved, there remains the major task of actually integrating this code with druntime/dmd. A lot of work is expected on the dmd end, because of the current amount of hacks in dmd to make AA's work. T

doesn't this work? immutable std.traits.Unqual!(inout(T)) key;
Apr 08 2012
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sun, Apr 08, 2012 at 04:11:10PM +0200, Tove wrote:
 On Sunday, 8 April 2012 at 13:53:07 UTC, H. S. Teoh wrote:
On Sun, Apr 08, 2012 at 12:56:38AM -0500, Andrei Alexandrescu
wrote:
[...]
1. What happened to the new hash project? We need to take that to
completion.



The major outstanding issues are:

- Qualified keys not fully working: the current code has a few corner
  cases that don't work with shared/immutable/inout keys. One major
  roadblock is how to implement this:

	alias someType T;
	inout(T) myFunc(inout(T) arg, ...) {
		int[inout(T)] aa;
		...
	}

  The problem is that inout gets carried over into the AA template,
  which breaks because it instantiates into something that has:

	struct Slot {
		hash_t hash;
		inout(T) key;	// <-- this causes a compile error
		Value value;
	}


 doesn't this work?
 immutable std.traits.Unqual!(inout(T)) key;

I suppose so, but the problem is more with the return type of various AA functions. If the user writes int[inout(T)] aa, then he would expect that aa.keys should return inout(T)[]. But the problem here is that this is impossible to express in the current system, because inout has a different meaning when you write it in the AA method, than its meaning in the context of the calling function. For example, this doesn't quite work: struct AA { ... inout(Key) keys() { ... // Problem: inout here doesn't mean what it // means in the context of the caller } } T -- Tech-savvy: euphemism for nerdy.
Apr 08 2012
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:
--20cf30363eb7dc4e7e04bd2bad38
Content-Type: text/plain; charset=UTF-8

On 8 April 2012 12:54, Manu <turkeyman gmail.com> wrote:

 On 8 April 2012 12:46, Vladimir Panteleev <vladimir thecybershadow.net>wrote:

 On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu wrote:

 Walter and I discussed today about using the small string optimization
 in string and other arrays of immutable small objects.

 On 64 bit machines, string occupies 16 bytes. We could use the first
 byte as discriminator, which means that all strings under 16 chars need no
 memory allocation at all.

Don't use the first byte. Use the last byte. The last byte is the highest-order byte of the length. Limiting arrays to 18.37 exabytes, as opposed to 18.45 exabytes, is a much nicer limitation than making assumptions about the memory layout.

What is the plan for 32bit?

The only way I can see this working is if the marker happens to be the top bit of the size (limiting arrays to 2gb on 32bit systems, which is probably fine), and if set, the next 7 bits are the size, leaving 7 bytes for a string... 7 bytes is pushing it, 15 bytes is very useful, 7 is borderline... That all said, I'll ultimately end out with my own string type anyway which is multiples of, and aligned to 16 bytes, which will support SSE string opcodes. I wonder if these considerations can be factored into the built-in string? Is it realistic that anyone can actually use raw d-string's in an app that performs a lot of string manipulation? I bet most people end up with a custom string class anyway... Who's written a string-heavy app without their own string helper class? I ended up with a string class within about half an hour of trying to work with D strings (initially just to support stack strings, then it grew). --20cf30363eb7dc4e7e04bd2bad38 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable <div class=3D"gmail_quote">On 8 April 2012 12:54, Manu <span dir=3D"ltr">&l= t;<a href=3D"mailto:turkeyman gmail.com">turkeyman gmail.com</a>&gt;</span>= wrote:<br><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;bor= der-left:1px #ccc solid;padding-left:1ex"> <div class=3D"gmail_quote"><div><div class=3D"h5">On 8 April 2012 12:46, Vl= adimir Panteleev <span dir=3D"ltr">&lt;<a href=3D"mailto:vladimir thecybers= hadow.net" target=3D"_blank">vladimir thecybershadow.net</a>&gt;</span> wro= te:<br> <blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p= x #ccc solid;padding-left:1ex"> <div>On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu wrote:<br=

x #ccc solid;padding-left:1ex"> Walter and I discussed today about using the small string optimization in s= tring and other arrays of immutable small objects.<br> <br> On 64 bit machines, string occupies 16 bytes. We could use the first byte a= s discriminator, which means that all strings under 16 chars need no memory= allocation at all.<br> </blockquote> <br></div> Don&#39;t use the first byte. Use the last byte.<br> <br> The last byte is the highest-order byte of the length. Limiting arrays to 1= 8.37 exabytes, as opposed to 18.45 exabytes, is a much nicer limitation tha= n making assumptions about the memory layout.<br></blockquote><div><br> </div></div></div><div>What is the plan for 32bit?</div></div></blockquote>= <div><br></div><div>The only way I can see this working is if the marker ha= ppens to be the top bit of the size (limiting arrays to 2gb on 32bit system= s, which is probably fine), and if set, the next 7 bits are the size, leavi= ng 7 bytes for a string... 7 bytes is pushing it, 15 bytes is very useful, = 7 is borderline...</div> <div><br></div><div>That all said, I&#39;ll ultimately end out with my own = string type anyway which is multiples of, and aligned to 16 bytes, which wi= ll support SSE string opcodes. I wonder if these considerations can be fact= ored into the built-in string?</div> <div><br></div><div>Is it realistic that anyone can actually use raw d-stri= ng&#39;s in an app that performs a lot of string manipulation? I bet most p= eople end up with a custom string class anyway...</div><div>Who&#39;s writt= en a string-heavy app without their own string helper class? I ended up wit= h a string class within about half an hour of trying to work with D strings= (initially just to support stack strings, then it grew).</div> </div> --20cf30363eb7dc4e7e04bd2bad38--
Apr 08 2012
prev sibling next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2012-04-08 05:56:38 +0000, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Walter and I discussed today about using the small string optimization 
 in string and other arrays of immutable small objects.
 
 On 64 bit machines, string occupies 16 bytes. We could use the first 
 byte as discriminator, which means that all strings under 16 chars need 
 no memory allocation at all.
 
 It turns out statistically a lot of strings are small. According to a 
 variety of systems we use at Facebook, the small buffer optimization is 
 king - it just works great in all cases. In D that means better speed, 
 better locality, and less garbage.

Small buffer optimization is a very good thing to have indeed. But… how can you preserve existing semantics? For instance, let's say you have this: string s = "abcd"; which is easily representable as a small string. Do you use the small buffer optimization in the assignment? That seems like a definitive yes. But as soon as you take a pointer to that string, you break the immutability guaranty: immutable(char)[] s = "abcd"; immutable(char)* p = s.ptr; s = "defg"; // assigns to where? There's also the issue of this code being legal currently: immutable(char)* getPtr(string s) { return s.ptr; } If you pass a small string to getPtr, it'll be copied to the local stack frame and you'll be returning a pointer to that local copy. You could mitigate this by throwing an error when trying to get the pointer to a small string, but then you have to disallow taking the pointer of a const(char)[] pointing to it: const(char)* getPtr2(const(char)[] s) { return s.ptr; } const(char)* getAbcdPtr() { string s = "abcd"; // s implicitly converted to regular const(char)[] pointing to local stack frame const(char)* c = getPtr2(s); // c points to the storage of s, which is the local stack frame return c; } So it's sad, but I am of the opinion that the only way to implement small buffer optimization is to have a higher-level abstraction, a distinct type for such small strings. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Apr 08 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 9:59 AM, Michel Fortin wrote:
 But as soon as you take a pointer to that string, you break the
 immutability guaranty:

 immutable(char)[] s = "abcd";
 immutable(char)* p = s.ptr;
 s = "defg"; // assigns to where?

Taking .ptr will engender a copy. A small regression will be that address of individual chars cannot be taken. Andrei
Apr 08 2012
parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2012-04-08 15:06:13 +0000, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 On 4/8/12 9:59 AM, Michel Fortin wrote:
 But as soon as you take a pointer to that string, you break the
 immutability guaranty:
 
 immutable(char)[] s = "abcd";
 immutable(char)* p = s.ptr;
 s = "defg"; // assigns to where?

Taking .ptr will engender a copy. A small regression will be that address of individual chars cannot be taken.

You know, many people have been wary of hidden memory allocations in the past. That's not going to make them happy. I'm not complaining, but I think .ptr should return null in those cases. Let people use toStringz when they need a C string, and let people deal with the ugly details themselves if they're using .ptr to bypass array bound checking. Because if someone used .ptr somewhere to bypass bound checking and instead he gets a memory allocation at each loop iteration… it won't be pretty. And what about implicit conversions to const(char)[]? That too will require a copy, because otherwise it could point to the local stack frame where your immutable(char)[] resides. That said, maybe copies of small-string optimized immutable(char)[] could be small-string optimized const(char)[]. That'd not have any side effect since no one can have a mutable pointer/slice to the const copy anyway. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Apr 08 2012
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/8/12 10:48 AM, Michel Fortin wrote:
 On 2012-04-08 15:06:13 +0000, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 On 4/8/12 9:59 AM, Michel Fortin wrote:
 But as soon as you take a pointer to that string, you break the
 immutability guaranty:

 immutable(char)[] s = "abcd";
 immutable(char)* p = s.ptr;
 s = "defg"; // assigns to where?

Taking .ptr will engender a copy. A small regression will be that address of individual chars cannot be taken.

You know, many people have been wary of hidden memory allocations in the past.

Well, the optimization makes for fewer allocations total. In fact .ptr does the allocation that was formerly mandatory.
 That's not going to make them happy. I'm not complaining, but I
 think .ptr should return null in those cases.

That would be too large a regression I think. And it's not detectable during compilation.
 Let people use toStringz
 when they need a C string, and let people deal with the ugly details
 themselves if they're using .ptr to bypass array bound checking. Because
 if someone used .ptr somewhere to bypass bound checking and instead he
 gets a memory allocation at each loop iteration… it won't be pretty.

Only one allocation. First invocation of .ptr effectively changes representation.
 And what about implicit conversions to const(char)[]? That too will
 require a copy, because otherwise it could point to the local stack
 frame where your immutable(char)[] resides. That said, maybe copies of
 small-string optimized immutable(char)[] could be small-string optimized
 const(char)[]. That'd not have any side effect since no one can have a
 mutable pointer/slice to the const copy anyway.

I think casting to const(char)[] should work without allocation. Andrei
Apr 08 2012
prev sibling parent Somedude <lovelydear mailmetrash.com> writes:
Le 08/04/2012 17:48, Michel Fortin a écrit :
 On 2012-04-08 15:06:13 +0000, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:
 
 On 4/8/12 9:59 AM, Michel Fortin wrote:
 But as soon as you take a pointer to that string, you break the
 immutability guaranty:

 immutable(char)[] s = "abcd";
 immutable(char)* p = s.ptr;
 s = "defg"; // assigns to where?

Taking .ptr will engender a copy. A small regression will be that address of individual chars cannot be taken.

You know, many people have been wary of hidden memory allocations in the past. That's not going to make them happy. I'm not complaining, but I think .ptr should return null in those cases. Let people use toStringz when they need a C string, and let people deal with the ugly details themselves if they're using .ptr to bypass array bound checking. Because if someone used .ptr somewhere to bypass bound checking and instead he gets a memory allocation at each loop iteration… it won't be pretty. And what about implicit conversions to const(char)[]? That too will require a copy, because otherwise it could point to the local stack frame where your immutable(char)[] resides. That said, maybe copies of small-string optimized immutable(char)[] could be small-string optimized const(char)[]. That'd not have any side effect since no one can have a mutable pointer/slice to the const copy anyway.

You've raised some very good points. I also think this won't do. It is quite clear with your examples (especially the temporary copy as a by value function parameter) that the small strings have some very specific semantics that are completely different from normal string semantics. Therefore I also have the feeling that they should be another type, sstring or something. sstrings could be promoted to full string when necessary, but the whole thing would probably look very very ugly pretty soon. This to me smells like the idea of small string optimization should be put to rest for a while, requiring a little more thought.
Apr 08 2012
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:
--0023544718dc8943d004bd2c2f07
Content-Type: text/plain; charset=UTF-8

On 8 April 2012 17:52, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>wrote:

 On 4/8/12 4:54 AM, Manu wrote:

 On 8 April 2012 12:46, Vladimir Panteleev <vladimir thecybershadow.net
 <mailto:vladimir **thecybershadow.net <vladimir thecybershadow.net>>>
 wrote:

    On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu wrote:

        Walter and I discussed today about using the small string
        optimization in string and other arrays of immutable small objects.

        On 64 bit machines, string occupies 16 bytes. We could use the
        first byte as discriminator, which means that all strings under
        16 chars need no memory allocation at all.


    Don't use the first byte. Use the last byte.

    The last byte is the highest-order byte of the length. Limiting
    arrays to 18.37 exabytes, as opposed to 18.45 exabytes, is a much
    nicer limitation than making assumptions about the memory layout.


 What is the plan for 32bit?

We can experiment with making strings shorter than 8 chars in-situ. The drawback will be that length will be limited to 29 bits, i.e. 512MB.

29 bits? ...not 31? How does this implementation actually work? On 32/64 bits, and little/big endian? I can only imagine it working with a carefully placed 1 bit. bit-0 of the size on little endian, and bit-31 of the size on big endian. That should only halve the address range (leaving 31 bits)... where did the other 2 bits go? I also hope this only affects slices of chars? It will ignore this behaviour for anything other than char arrays right? --0023544718dc8943d004bd2c2f07 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable <div class=3D"gmail_quote">On 8 April 2012 17:52, Andrei Alexandrescu <span= dir=3D"ltr">&lt;<a href=3D"mailto:SeeWebsiteForEmail erdani.org">SeeWebsit= eForEmail erdani.org</a>&gt;</span> wrote:<br><blockquote class=3D"gmail_qu= ote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex= "> <div class=3D"im">On 4/8/12 4:54 AM, Manu wrote:<br> </div><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-l= eft:1px #ccc solid;padding-left:1ex"><div class=3D"im"> On 8 April 2012 12:46, Vladimir Panteleev &lt;<a href=3D"mailto:vladimir th= ecybershadow.net" target=3D"_blank">vladimir thecybershadow.net</a><br></di= v><div><div class=3D"h5"> &lt;mailto:<a href=3D"mailto:vladimir thecybershadow.net" target=3D"_blank"=
vladimir <u></u>thecybershadow.net</a>&gt;&gt; wrote:<br>

=C2=A0 =C2=A0On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu = wrote:<br> <br> =C2=A0 =C2=A0 =C2=A0 =C2=A0Walter and I discussed today about using the sm= all string<br> =C2=A0 =C2=A0 =C2=A0 =C2=A0optimization in string and other arrays of immu= table small objects.<br> <br> =C2=A0 =C2=A0 =C2=A0 =C2=A0On 64 bit machines, string occupies 16 bytes. W= e could use the<br> =C2=A0 =C2=A0 =C2=A0 =C2=A0first byte as discriminator, which means that a= ll strings under<br> =C2=A0 =C2=A0 =C2=A0 =C2=A016 chars need no memory allocation at all.<br> <br> <br> =C2=A0 =C2=A0Don&#39;t use the first byte. Use the last byte.<br> <br> =C2=A0 =C2=A0The last byte is the highest-order byte of the length. Limiti= ng<br> =C2=A0 =C2=A0arrays to 18.37 exabytes, as opposed to 18.45 exabytes, is a = much<br> =C2=A0 =C2=A0nicer limitation than making assumptions about the memory lay= out.<br> <br> <br></div></div><div class=3D"im"> What is the plan for 32bit?<br> </div></blockquote> <br> We can experiment with making strings shorter than 8 chars in-situ. The dra= wback will be that length will be limited to 29 bits, i.e. 512MB.=C2=A0</bl= ockquote><div><br></div><div>29 bits? ...not 31?</div><div>How does this im= plementation actually work? On 32/64 bits, and little/big endian?</div> <div>I can only imagine it working with a carefully placed 1 bit. bit-0 of = the size on little endian, and bit-31 of the size on big endian. That shoul= d only halve the address range (leaving 31 bits)... where did the other 2 b= its go?</div> <div><br></div><div>I also hope this only affects slices of chars? It will = ignore this behaviour for anything other than char arrays right?</div></div=

--0023544718dc8943d004bd2c2f07--
Apr 08 2012
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sun, Apr 08, 2012 at 10:00:37AM -0500, Andrei Alexandrescu wrote:
 On 4/8/12 8:54 AM, H. S. Teoh wrote:
Sorry, I've been busy at work and haven't had too much free time to work
on it.  The current code is available on github:

	https://github.com/quickfur/New-AA-implementation

Thanks for the update! Let me reiterate this is important work for many reasons, so it would be great if you got around to pushing it through.

I'll try my best to work on it when I have free time.
 Do you think someone else in the community could help you?

Well, I put the code up on github for a reason. :-) I'll only be too glad to have someone else contribute. The code itself isn't all that complicated; it's basically just a facsimile of aaA.d with many bugs fixed due to the fact that we now have direct access to key/value types. T -- "Life is all a great joke, but only the brave ever get the point." -- Kenneth Rexroth
Apr 08 2012
prev sibling next sibling parent reply Don <nospam nospam.com> writes:
On 08.04.2012 07:56, Andrei Alexandrescu wrote:

 For this to happen, we need to start an effort of migrating built-in
 arrays into runtime, essentially making them templates that the compiler
 lowers to. So I have two questions:

vote -= real.infinity. That would kill D.
Apr 09 2012
next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 04/09/2012 10:24 AM, Don wrote:
 On 08.04.2012 07:56, Andrei Alexandrescu wrote:

 For this to happen, we need to start an effort of migrating built-in
 arrays into runtime, essentially making them templates that the compiler
 lowers to. So I have two questions:

vote -= real.infinity. That would kill D.

Why does this even compile? void main(){ long vote; vote -= real.infinity; assert(vote == long.min); }
Apr 09 2012
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/9/12 3:24 AM, Don wrote:
 On 08.04.2012 07:56, Andrei Alexandrescu wrote:

 For this to happen, we need to start an effort of migrating built-in
 arrays into runtime, essentially making them templates that the compiler
 lowers to. So I have two questions:

vote -= real.infinity. That would kill D.

Why? Andrei
Apr 09 2012
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/9/12 4:21 AM, Manu wrote:
 After thinking on it a bit, I'm becoming a little worried about this
 move for 2 rarely considered reasons:
 Using lowering to a template, debug(/unoptimised) performance will
 probably get a lot slower, which is really annoying. And
 debugging/stepping might become considerably more annoying too, if every
 time I press F11 (step in) over a function call that happens to receive
 an arg from an array, the debugger then steps into the array templates
 index operator... We'd be no better off than with STL, unless the
 language has clever ways of hiding this magic from the debugger too, and
 optimising/inlining the index even in debug builds...? But this is the
 built-in array, and not a library we can optionally not use.

I agree. So we have the counterarguments: 1. Lowering would treat array primitives as sheer D code, subject to refusal of inlining. That means worse performance. 2. Unless the compiler takes special measures, source-level debuggers will trace through core, uninteresting code for array operations. 3. There are patterns that attempt to optimize by e.g. using .ptr, but end up pessimizing code because they trigger multiple memory allocations. Andrei
Apr 09 2012
parent "Nick Sabalausky" <a a.a> writes:
"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
news:jlut8k$j4o$1 digitalmars.com...
 I agree. So we have the counterarguments:

 1. Lowering would treat array primitives as sheer D code, subject to 
 refusal of inlining. That means worse performance.

So in other words, we need the forceinline that some people have strongly requested? Make it work even without -inline, and then add -noinline for any rare cases where someone might need to forcefully disable forceinline. Shouldn't that take care of it?
 2. Unless the compiler takes special measures, source-level debuggers will 
 trace through core, uninteresting code for array operations.

Would forceinline fix this?
 3. There are patterns that attempt to optimize by e.g. using .ptr, but end 
 up pessimizing code because they trigger multiple memory allocations.

Someone's suggestion of just making .ptr null instead of doing implicit allocations was an interesting idea.
Apr 09 2012
prev sibling next sibling parent "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Monday, 9 April 2012 at 14:55:16 UTC, Andrei Alexandrescu 
wrote:
 3. There are patterns that attempt to optimize by e.g. using 
 .ptr, but end up pessimizing code because they trigger multiple 
 memory allocations.


 Andrei

It's important to note that this pattern is probably most common in glue code to C libraries, not bounds-checking related optimizations. There are countless of C library functions which receive the equivalent of an array by taking a pointer and a length, and implicit allocation on `foo.ptr` is completely unacceptable in these cases. It's also common to avoid the `toStringz` function for strings you know are zero-terminated, using `.ptr` directly instead, as the toStringz function unconditionally appends a zero these days (and for good reasons, its previous optimization was extremely optimistic about its input).
Apr 09 2012
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 4/9/12, Jakob Ovrum <jakobovrum gmail.com> wrote:
 It's also common to avoid the `toStringz` function for strings
 you know are zero-terminated, using `.ptr` directly instead.

Yup. E.g. WinAPI text drawing functions take a wchar* and a length. I don't have to call toUTF16z but just pass a pointer, or even a pointer to a specific element via &arr[index] (after calling std.utf.stride, of course).
 the toStringz function unconditionally appends a zero these days

The one taking (const(char)[] s) does this, but not the other overload taking (string s). Whether or not that's safe I don't really know. I've had an argument over this on github, but I don't know if it was about toStringz or maybe toUTF16z. I haven't got the link to the discussion.
Apr 09 2012
prev sibling next sibling parent "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Monday, 9 April 2012 at 15:37:37 UTC, Andrej Mitrovic wrote:
 On 4/9/12, Jakob Ovrum <jakobovrum gmail.com> wrote:
 The one taking (const(char)[] s) does this, but not the other 
 overload
 taking (string s). Whether or not that's safe I don't really 
 know.
 I've had an argument over this on github, but I don't know if 
 it was
 about toStringz or maybe toUTF16z. I haven't got the link to the
 discussion.

You're right, I just confirmed the optimization is still in place for the `string` version. The documentation is identical for both functions. I think this is a mistake. It assumes that the string is either a compiler-generated literal or a GC allocated string, while the documentation does not mention such assumptions. With all the focus on manual memory management and pluggable allocators going on, I think the optimization must be removed or the documentation for the `string` overload changed. This optimization can always be put back in without narrowing the scope of the `string` overload once the above conditions can be reliably checked. Another option is to add a known-bug section to the `string` overload informing users that the function may fail on custom-allocated strings.
Apr 09 2012
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 4/9/12, Jakob Ovrum <jakobovrum gmail.com> wrote:
 With all the focus on manual memory
 management and pluggable allocators going on, I think the
 optimization must be removed or the documentation for the
 `string` overload changed.

Or add a compile-time argument: toStringz(bool ForceAllocate = true)(string s) Or split the unsafe version into another function.
Apr 09 2012
prev sibling parent Manu <turkeyman gmail.com> writes:
--00235452f83495073104bd43b39e
Content-Type: text/plain; charset=UTF-8

On 9 April 2012 17:55, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>wrote:

 On 4/9/12 4:21 AM, Manu wrote:

 After thinking on it a bit, I'm becoming a little worried about this
 move for 2 rarely considered reasons:
 Using lowering to a template, debug(/unoptimised) performance will
 probably get a lot slower, which is really annoying. And
 debugging/stepping might become considerably more annoying too, if every
 time I press F11 (step in) over a function call that happens to receive
 an arg from an array, the debugger then steps into the array templates
 index operator... We'd be no better off than with STL, unless the
 language has clever ways of hiding this magic from the debugger too, and
 optimising/inlining the index even in debug builds...? But this is the
 built-in array, and not a library we can optionally not use.

I agree. So we have the counterarguments: 1. Lowering would treat array primitives as sheer D code, subject to refusal of inlining. That means worse performance. 2. Unless the compiler takes special measures, source-level debuggers will trace through core, uninteresting code for array operations. 3. There are patterns that attempt to optimize by e.g. using .ptr, but end up pessimizing code because they trigger multiple memory allocations.

Indeed. I don't think the small array optimisation would benefit us in any way that could even come close to balancing the loss. I don't think it would benefit us much at all regardless, since we've already proven the need for a string class anyway, and we can make such small-string optimisations already. I am very wary of removing a fundamental language primitive, that is, the basic ability to express an array with absolutely no frills. This idea only really benefits utf-8 strings, so why not just make a real d-string class in phobos and be done with it? People complain about UTF issues with raw strings, and this is what you intend to do anyway to implement the optimisation. d-string could perhaps be made to appear as a first-class language feature via the 'string' keyword, and lower to the library in the way you describe. Just don't blanket this across all arrays in general. --00235452f83495073104bd43b39e Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable <div class=3D"gmail_quote">On 9 April 2012 17:55, Andrei Alexandrescu <span= dir=3D"ltr">&lt;<a href=3D"mailto:SeeWebsiteForEmail erdani.org">SeeWebsit= eForEmail erdani.org</a>&gt;</span> wrote:<br><blockquote class=3D"gmail_qu= ote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex= "> <div class=3D"im">On 4/9/12 4:21 AM, Manu wrote:<br> <blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p= x #ccc solid;padding-left:1ex"> After thinking on it a bit, I&#39;m becoming a little worried about this<br=

Using lowering to a template, debug(/unoptimised) performance will<br> probably get a lot slower, which is really annoying. And<br> debugging/stepping might become considerably more annoying too, if every<br=

an arg from an array, the debugger then steps into the array templates<br> index operator... We&#39;d be no better off than with STL, unless the<br> language has clever ways of hiding this magic from the debugger too, and<br=

built-in array, and not a library we can optionally not use.<br> </blockquote> <br></div> I agree. So we have the counterarguments:<br> <br> 1. Lowering would treat array primitives as sheer D code, subject to refusa= l of inlining. That means worse performance.<br> <br> 2. Unless the compiler takes special measures, source-level debuggers will = trace through core, uninteresting code for array operations.<br> <br> 3. There are patterns that attempt to optimize by e.g. using .ptr, but end = up pessimizing code because they trigger multiple memory allocations.</bloc= kquote><div><br></div><div>Indeed. I don&#39;t think the small array optimi= sation would benefit us in any way that could even come close to balancing = the loss. I don&#39;t think it would benefit us much at all regardless, sin= ce we&#39;ve already proven the need for a string class anyway, and we can = make such small-string optimisations already.</div> <div>I am very wary of removing a fundamental language primitive, that is, = the basic ability to express an array with absolutely no frills.</div><div>= <br></div><div>This idea only really benefits utf-8 strings, so why not jus= t make a real d-string class in phobos and be done with it? People complain= about UTF issues with raw strings, and this is what you intend to do anywa= y to implement the optimisation.</div> <div>d-string could perhaps be made to appear as a first-class language fea= ture via the &#39;string&#39; keyword, and lower to the library in the way = you describe. Just don&#39;t blanket this across all arrays in general.</di= v> </div> --00235452f83495073104bd43b39e--
Apr 09 2012
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:
--20cf302ef74ccf548404bd3b8802
Content-Type: text/plain; charset=UTF-8

On 9 April 2012 11:24, Don <nospam nospam.com> wrote:

 On 08.04.2012 07:56, Andrei Alexandrescu wrote:

  For this to happen, we need to start an effort of migrating built-in
 arrays into runtime, essentially making them templates that the compiler
 lowers to. So I have two questions:

vote -= real.infinity. That would kill D.

How do you figure? After thinking on it a bit, I'm becoming a little worried about this move for 2 rarely considered reasons: Using lowering to a template, debug(/unoptimised) performance will probably get a lot slower, which is really annoying. And debugging/stepping might become considerably more annoying too, if every time I press F11 (step in) over a function call that happens to receive an arg from an array, the debugger then steps into the array templates index operator... We'd be no better off than with STL, unless the language has clever ways of hiding this magic from the debugger too, and optimising/inlining the index even in debug builds...? But this is the built-in array, and not a library we can optionally not use. --20cf302ef74ccf548404bd3b8802 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable <div class=3D"gmail_quote">On 9 April 2012 11:24, Don <span dir=3D"ltr">&lt= ;<a href=3D"mailto:nospam nospam.com">nospam nospam.com</a>&gt;</span> wrot= e:<br><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-l= eft:1px #ccc solid;padding-left:1ex"> <div class=3D"im">On 08.04.2012 07:56, Andrei Alexandrescu wrote:<br> <br> <blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p= x #ccc solid;padding-left:1ex"> For this to happen, we need to start an effort of migrating built-in<br> arrays into runtime, essentially making them templates that the compiler<br=

</blockquote> <br></div> vote -=3D real.infinity.<br> <br> That would kill D.</blockquote><div><br></div><div>How do you figure?</div>= <div><br></div><div>After thinking on it a bit, I&#39;m becoming a little w= orried about this move for 2 rarely considered reasons:</div><div>Using low= ering to a template, debug(/unoptimised) performance will probably get a lo= t slower, which is really annoying. And debugging/stepping might become con= siderably more annoying too, if every time I press F11 (step in) over a fun= ction call that happens to receive an arg from an array, the debugger then = steps into the array templates index operator... We&#39;d be no better off = than with STL, unless the language has clever ways of hiding this magic fro= m the debugger too, and optimising/inlining the index even in debug builds.= ..? But this is the built-in array, and not a library we can optionally not= use.</div> </div> --20cf302ef74ccf548404bd3b8802--
Apr 09 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 08 Apr 2012 01:56:38 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Walter and I discussed today about using the small string optimization  
 in string and other arrays of immutable small objects.

 On 64 bit machines, string occupies 16 bytes. We could use the first  
 byte as discriminator, which means that all strings under 16 chars need  
 no memory allocation at all.

 It turns out statistically a lot of strings are small. According to a  
 variety of systems we use at Facebook, the small buffer optimization is  
 king - it just works great in all cases. In D that means better speed,  
 better locality, and less garbage.

 For this to happen, we need to start an effort of migrating built-in  
 arrays into runtime, essentially making them templates that the compiler  
 lowers to. So I have two questions:

 1. What happened to the new hash project? We need to take that to  
 completion.

 2. Is anyone willing to start the effort of migrating built-in slices  
 into templates?

No, this would suck. A better solution - make an *actual* string type that does this, and fixes all the shitty problems that we have from shoehorning arrays into UTF strings. Then alias that type to string. I'm so sick of phobos trying to pretend char[] is not an array, and this would just be another mark against D. -Steve
Apr 09 2012
prev sibling next sibling parent Artur Skawina <art.08.09 gmail.com> writes:
On 04/10/12 07:01, Nick Sabalausky wrote:
 "Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
 news:jlut8k$j4o$1 digitalmars.com...
 I agree. So we have the counterarguments:

 1. Lowering would treat array primitives as sheer D code, subject to 
 refusal of inlining. That means worse performance.

So in other words, we need the forceinline that some people have strongly requested? Make it work even without -inline, and then add -noinline for any rare cases where someone might need to forcefully disable forceinline. Shouldn't that take care of it?

Obviously, yes, but should wait until enough attribute support is in place and not be just a inline hack (no point in naming it forceinline - there's no other kind of inline).
 2. Unless the compiler takes special measures, source-level debuggers will 
 trace through core, uninteresting code for array operations.

Would forceinline fix this?

No, but the compiler could just omit the debuginfo for the lowerings, unless requested with a flag.
 3. There are patterns that attempt to optimize by e.g. using .ptr, but end 
 up pessimizing code because they trigger multiple memory allocations.

Someone's suggestion of just making .ptr null instead of doing implicit allocations was an interesting idea.

I hope that wasn't a serious suggestion. What might be possible is passing strings by reference and atomically changing the representation when (if) a pointer is needed. That would have problems too (ABI change, can't have RO strings, can't keep the small strings in registers etc). Doing the small buffer optimization for a new type may be possible, but trying to add it to the current char[] arrays is probably not a good idea. artur
Apr 10 2012
prev sibling next sibling parent Marco Leise <Marco.Leise gmx.de> writes:
Am Tue, 10 Apr 2012 10:50:24 +0200
schrieb Artur Skawina <art.08.09 gmail.com>:

 Obviously, yes, but should wait until enough attribute support is in place and
 not be just a  inline hack.

If you refer to the proposed user attributes, they wont change the operation of the compiler. Only your own program code will know how to use them. inline, safe, property, final, nothrow, ... on the other hand are keywords that directly map to flags and hard wired logic in the compiler. Correct me if I'm wrong. -- Marco
Apr 10 2012
prev sibling next sibling parent Artur Skawina <art.08.09 gmail.com> writes:
On 04/10/12 19:25, Marco Leise wrote:
 Am Tue, 10 Apr 2012 10:50:24 +0200
 schrieb Artur Skawina <art.08.09 gmail.com>:
 
 Obviously, yes, but should wait until enough attribute support is in place and
 not be just a  inline hack.

If you refer to the proposed user attributes, they wont change the operation of the compiler. Only your own program code will know how to use them. inline, safe, property, final, nothrow, ... on the other hand are keywords that directly map to flags and hard wired logic in the compiler. Correct me if I'm wrong.

I'm saying that introducing new function attributes like inline to the language, when there's a real possibility of "generic" attributes being invented in the near future, may not be a good idea. Any generic scheme should also work for inline and the many other attrs that i've mentioned before - there's no reason to artificially limit the support to *just* user attributes. artur
Apr 10 2012
prev sibling parent Marco Leise <Marco.Leise gmx.de> writes:
Am Tue, 10 Apr 2012 20:52:56 +0200
schrieb Artur Skawina <art.08.09 gmail.com>:

 I'm saying that introducing new function attributes like  inline to the
language,
 when there's a real possibility of "generic" attributes being invented in the
near
 future, may not be a good idea. Any generic scheme should also work for
 inline and
 the many other attrs that i've mentioned before - there's no reason to
artificially
 limit the support to *just* user attributes.
 
 artur

I had to read up on your older posts again. So you are not expecting compiler hooks that allow to change the language semantics and code gen through user attributes, but a common syntax especially for bundling multiple compiler/user attributes like " attr(safe, nothrow, userattr(abc), inline, ...) my_attr_alias" in the event that there will be a lot of platform specific and other pragmas/attributes/keywords like in GCC in the future? Then I tend to agree. -- Marco
Apr 10 2012