digitalmars.D - Small Buffer Optimization for string and friends

Andrei Alexandrescu (18/18) Apr 07 2012 Walter and I discussed today about using the small string optimization

Daniel Murphy (9/21) Apr 07 2012 This sounds like it would be a great addition to phobos.

Andrei Alexandrescu (9/14) Apr 08 2012 Probably not.

Walter Bright (2/5) Apr 18 2018 I don't know of a good generic way to do precise collection with unions.

Manu (9/14) Apr 18 2018 I wonder if precise collectors could leverage runtime support for ambigu...
Manu (3/20) Apr 18 2018 This would be useful too for applications that use bit-packed or

Andrej Mitrovic (5/6) Apr 08 2012 I just hope that doesn't cause:

Andrei Alexandrescu (9/15) Apr 08 2012 Walter and I agree that relying on sheer D for instead of

Vladimir Panteleev (6/12) Apr 08 2012 Don't use the first byte. Use the last byte.

Manu (2/14) Apr 08 2012 What is the plan for 32bit?

Andrei Alexandrescu (4/17) Apr 08 2012 We can experiment with making strings shorter than 8 chars in-situ. The

Manu (10/36) Apr 08 2012 29 bits? ...not 31?

Andrei Alexandrescu (9/18) Apr 08 2012 Essentially it will use either the first or the last bit of the

deadalnix (2/27) Apr 10 2012 As it is a flag, why not limit the string size to 2GB instead of 512MB ?

Vladimir Panteleev (10/24) Apr 08 2012 Erm... never mind, I thought the pointer was the first field.

Andrei Alexandrescu (5/28) Apr 08 2012 It's amazing how fast shift works :o).

Walter Bright (4/5) Apr 08 2012 That could get expensive. You cannot just point into the small string pa...

Andrei Alexandrescu (5/10) Apr 08 2012 As I mentioned, the first call to .ptr changes representation, thus

Michel Fortin (12/23) Apr 08 2012 This only works if the code has write access to that variable.

Andrei Alexandrescu (5/16) Apr 08 2012 I think there's a bit of a confusion. Could you please clarify with an

Andrei Alexandrescu (11/27) Apr 08 2012 I think I understand:

Michel Fortin (20/42) Apr 08 2012 Exactly.

Manu (15/33) Apr 08 2012 The only way I can see this working is if the marker happens to be the t...

Andrei Alexandrescu (7/14) Apr 08 2012 That does happen, but much more rarely than one might think.
Walter Bright (3/9) Apr 19 2018 Warp (a fast C preprocessor)

Manu (3/13) Apr 19 2018 Cool story... 6 years later! :P

Andrei Alexandrescu (3/14) Apr 08 2012 Hehe. Good idea. On big endian machines the last byte is the ticket!

Andrei Alexandrescu (2/17) Apr 08 2012 s/big/little/

Patrick Schluter (6/20) Apr 15 2018 If the length has multi purpose it would be even better to

Jacob Carlborg (4/7) Apr 08 2012 Just don't make the same mistake as with AA.

Andrei Alexandrescu (4/9) Apr 08 2012 The mistake with AAs was done long ago, but it was forced as AAs

Jacob Carlborg (5/16) Apr 08 2012 I'm referring to the new template implementation of AAs that got

H. S. Teoh (9/21) Apr 08 2012 [...]

H. S. Teoh (49/51) Apr 08 2012 [...]

Tove (3/74) Apr 08 2012 doesn't this work?

H. S. Teoh (21/50) Apr 08 2012 [...]

Andrei Alexandrescu (5/8) Apr 08 2012 Thanks for the update! Let me reiterate this is important work for many

H. S. Teoh (10/20) Apr 08 2012 [...]

Andrei Alexandrescu (3/11) Apr 08 2012 I wonder how frequently such code is used.

Michel Fortin (41/52) Apr 08 2012 Small buffer optimization is a very good thing to have indeed. But… ho...

Andrei Alexandrescu (4/9) Apr 08 2012 Taking .ptr will engender a copy. A small regression will be that

Michel Fortin (20/30) Apr 08 2012 You know, many people have been wary of hidden memory allocations in

Andrei Alexandrescu (9/36) Apr 08 2012 Well, the optimization makes for fewer allocations total. In fact .ptr
Somedude (11/40) Apr 08 2012 You've raised some very good points. I also think this won't do.

Don (3/6) Apr 09 2012 vote -= real.infinity.

Timon Gehr (7/13) Apr 09 2012 Why does this even compile?
Manu (13/20) Apr 09 2012 How do you figure?

Andrei Alexandrescu (9/20) Apr 09 2012 I agree. So we have the counterarguments:

Jakob Ovrum (13/17) Apr 09 2012 It's important to note that this pattern is probably most common

Andrej Mitrovic (10/13) Apr 09 2012 Yup. E.g. WinAPI text drawing functions take a wchar* and a length. I

Jakob Ovrum (16/25) Apr 09 2012 You're right, I just confirmed the optimization is still in place

Andrej Mitrovic (4/8) Apr 09 2012 Or add a compile-time argument:

Manu (15/35) Apr 09 2012 Indeed. I don't think the small array optimisation would benefit us in a...
Nick Sabalausky (9/16) Apr 09 2012 So in other words, we need the @forceinline that some people have strong...

Artur Skawina (13/36) Apr 10 2012 Obviously, yes, but should wait until enough attribute support is in pla...

Marco Leise (5/7) Apr 10 2012 If you refer to the proposed user attributes, they wont change the opera...

Artur Skawina (7/14) Apr 10 2012 I'm saying that introducing new function attributes like @inline to the ...

Marco Leise (5/12) Apr 10 2012 I had to read up on your older posts again. So you are not expecting com...

Andrei Alexandrescu (3/9) Apr 09 2012 Why?

Steven Schveighoffer (9/25) Apr 09 2012 No, this would suck.
Per =?UTF-8?B?Tm9yZGzDtnc=?= (6/7) Apr 15 2018 Have anybody put together code that implements this idea in a
Per =?UTF-8?B?Tm9yZGzDtnc=?= (181/184) Apr 17 2018 I put together SSOString at

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Walter and I discussed today about using the small string optimization 
in string and other arrays of immutable small objects.

On 64 bit machines, string occupies 16 bytes. We could use the first 
byte as discriminator, which means that all strings under 16 chars need 
no memory allocation at all.

It turns out statistically a lot of strings are small. According to a 
variety of systems we use at Facebook, the small buffer optimization is 
king - it just works great in all cases. In D that means better speed, 
better locality, and less garbage.

For this to happen, we need to start an effort of migrating built-in 
arrays into runtime, essentially making them templates that the compiler 
lowers to. So I have two questions:

1. What happened to the new hash project? We need to take that to 
completion.

2. Is anyone willing to start the effort of migrating built-in slices 
into templates?


Thanks,

Andrei

Apr 07 2012

"Daniel Murphy" <yebblies nospamgmail.com> writes:

"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
news:jlr9ak$28bv$1 digitalmars.com...
 Walter and I discussed today about using the small string optimization in 
 string and other arrays of immutable small objects.

 On 64 bit machines, string occupies 16 bytes. We could use the first byte 
 as discriminator, which means that all strings under 16 chars need no 
 memory allocation at all.

 It turns out statistically a lot of strings are small. According to a 
 variety of systems we use at Facebook, the small buffer optimization is 
 king - it just works great in all cases. In D that means better speed, 
 better locality, and less garbage.

This sounds like it would be a great addition to phobos.

 For this to happen, we need to start an effort of migrating built-in 
 arrays into runtime, essentially making them templates that the compiler 
 lowers to.

- This has been a disaster for AAs
- Is it worth doing for 32 bit?
- Would generate false pointers
- Run-time check on every array access?
- Why should this be in the language/compiler instead of phobos?

April 1st was last week!!

Apr 07 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/8/12 1:33 AM, Daniel Murphy wrote:
 - This has been a disaster for AAs

Making them "magic" has been the problem. We must revert that.

 - Is it worth doing for 32 bit?

Probably not.

 - Would generate false pointers

Fair point but we're also moving to precise collection :o).

 - Run-time check on every array access?

Cost is negligible in most cases according to the extensive measurements 
we've done.

 - Why should this be in the language/compiler instead of phobos?

People use built-in immutable(char)[] as strings. We want to benefit 
existing uses.


Andrei

Apr 08 2012

Walter Bright <newshound2 digitalmars.com> writes:

On 4/8/2012 7:29 AM, Andrei Alexandrescu wrote:
 On 4/8/12 1:33 AM, Daniel Murphy wrote:
 - Would generate false pointers

 Fair point but we're also moving to precise collection :o).


I don't know of a good generic way to do precise collection with unions.

Apr 18 2018

Manu <turkeyman gmail.com> writes:

On Wed., 18 Apr. 2018, 1:00 pm Walter Bright via Digitalmars-d, <
digitalmars-d puremagic.com> wrote:

 On 4/8/2012 7:29 AM, Andrei Alexandrescu wrote:
 On 4/8/12 1:33 AM, Daniel Murphy wrote:
 - Would generate false pointers

 Fair point but we're also moving to precise collection :o).


 I don't know of a good generic way to do precise collection with unions.

I wonder if precise collectors could leverage runtime support for ambiguous
cases? opPreciseCollect() which might return an array of pointers contained
in T, which would allow runtime logic to determine how the union should be
interpreted... Or maybe the function should receive a delegate which the
function should call on each embedded pointer.

I'm sure some standardised runtime support function can help out in these
cases...

Apr 18 2018

Manu <turkeyman gmail.com> writes:

On Wed., 18 Apr. 2018, 8:36 pm Manu, <turkeyman gmail.com> wrote:

 On Wed., 18 Apr. 2018, 1:00 pm Walter Bright via Digitalmars-d, <
 digitalmars-d puremagic.com> wrote:

 On 4/8/2012 7:29 AM, Andrei Alexandrescu wrote:
 On 4/8/12 1:33 AM, Daniel Murphy wrote:
 - Would generate false pointers

 Fair point but we're also moving to precise collection :o).


 I don't know of a good generic way to do precise collection with unions.

 I wonder if precise collectors could leverage runtime support for
 ambiguous cases? opPreciseCollect() which might return an array of pointers
 contained in T, which would allow runtime logic to determine how the union
 should be interpreted... Or maybe the function should receive a delegate
 which the function should call on each embedded pointer.

 I'm sure some standardised runtime support function can help out in these
 cases...

This would be useful too for applications that use bit-packed or
encoded/implied pointers...

Apr 18 2018

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 4/8/12, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 essentially making them templates

I just hope that doesn't cause:
1) Awful template errors
2) Slower build times
3) More ICEs

Apr 08 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/8/12 4:30 AM, Andrej Mitrovic wrote:
 On 4/8/12, Andrei Alexandrescu<SeeWebsiteForEmail erdani.org>  wrote:
 essentially making them templates

 I just hope that doesn't cause:
 1) Awful template errors
 2) Slower build times
 3) More ICEs

Walter and I agree that relying on sheer D for instead of 
compiler-generated magic (in this case, bitmaps etc) is the better 
solution. Implementing a generic marker as a template using 
introspection is very simple - I predict a few dozen lines. Once that is 
finished there will obviously be no template errors. I don't know 
whether build times would be impacted. There will be fewer ICEs because 
there will be less reliance on the compiler.

Andrei

Apr 08 2012

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu 
wrote:
 Walter and I discussed today about using the small string 
 optimization in string and other arrays of immutable small 
 objects.

 On 64 bit machines, string occupies 16 bytes. We could use the 
 first byte as discriminator, which means that all strings under 
 16 chars need no memory allocation at all.

Don't use the first byte. Use the last byte.

The last byte is the highest-order byte of the length. Limiting 
arrays to 18.37 exabytes, as opposed to 18.45 exabytes, is a much 
nicer limitation than making assumptions about the memory layout.

Apr 08 2012

Manu <turkeyman gmail.com> writes:

On 8 April 2012 12:46, Vladimir Panteleev <vladimir thecybershadow.net>wrote:

 On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu wrote:

 Walter and I discussed today about using the small string optimization in
 string and other arrays of immutable small objects.

 On 64 bit machines, string occupies 16 bytes. We could use the first byte
 as discriminator, which means that all strings under 16 chars need no
 memory allocation at all.

 Don't use the first byte. Use the last byte.

 The last byte is the highest-order byte of the length. Limiting arrays to
 18.37 exabytes, as opposed to 18.45 exabytes, is a much nicer limitation
 than making assumptions about the memory layout.

What is the plan for 32bit?

Apr 08 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/8/12 4:54 AM, Manu wrote:
 On 8 April 2012 12:46, Vladimir Panteleev <vladimir thecybershadow.net
 <mailto:vladimir thecybershadow.net>> wrote:

     On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu wrote:

         Walter and I discussed today about using the small string
         optimization in string and other arrays of immutable small objects.

         On 64 bit machines, string occupies 16 bytes. We could use the
         first byte as discriminator, which means that all strings under
         16 chars need no memory allocation at all.


     Don't use the first byte. Use the last byte.

     The last byte is the highest-order byte of the length. Limiting
     arrays to 18.37 exabytes, as opposed to 18.45 exabytes, is a much
     nicer limitation than making assumptions about the memory layout.


 What is the plan for 32bit?

We can experiment with making strings shorter than 8 chars in-situ. The 
drawback will be that length will be limited to 29 bits, i.e. 512MB.

Andrei

Apr 08 2012

Manu <turkeyman gmail.com> writes:

On 8 April 2012 17:52, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>wrote:

 On 4/8/12 4:54 AM, Manu wrote:

 On 8 April 2012 12:46, Vladimir Panteleev <vladimir thecybershadow.net
 <mailto:vladimir **thecybershadow.net <vladimir thecybershadow.net>>>
 wrote:

    On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu wrote:

        Walter and I discussed today about using the small string
        optimization in string and other arrays of immutable small objects.

        On 64 bit machines, string occupies 16 bytes. We could use the
        first byte as discriminator, which means that all strings under
        16 chars need no memory allocation at all.


    Don't use the first byte. Use the last byte.

    The last byte is the highest-order byte of the length. Limiting
    arrays to 18.37 exabytes, as opposed to 18.45 exabytes, is a much
    nicer limitation than making assumptions about the memory layout.


 What is the plan for 32bit?

 We can experiment with making strings shorter than 8 chars in-situ. The
 drawback will be that length will be limited to 29 bits, i.e. 512MB.


29 bits? ...not 31?
How does this implementation actually work? On 32/64 bits, and little/big
endian?
I can only imagine it working with a carefully placed 1 bit. bit-0 of the
size on little endian, and bit-31 of the size on big endian. That should
only halve the address range (leaving 31 bits)... where did the other 2
bits go?

I also hope this only affects slices of chars? It will ignore this
behaviour for anything other than char arrays right?

Apr 08 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/8/12 10:03 AM, Manu wrote:
 29 bits? ...not 31?

Sorry, 31 indeed.

 How does this implementation actually work? On 32/64 bits, and
 little/big endian?
 I can only imagine it working with a carefully placed 1 bit. bit-0 of
 the size on little endian, and bit-31 of the size on big endian. That
 should only halve the address range (leaving 31 bits)... where did the
 other 2 bits go?

Essentially it will use either the first or the last bit of the 
representation as discriminator. That bit is most likely "taken" from 
the length representation. Shifting and masking can easily account for 
it when computing length of large strings.

 I also hope this only affects slices of chars? It will ignore this
 behaviour for anything other than char arrays right?

It works for any arrays of sufficiently small immutable data type (e.g. 
immutable(byte)[]), but the most advantage is reaped for string.


Andrei

Apr 08 2012

deadalnix <deadalnix gmail.com> writes:

Le 08/04/2012 16:52, Andrei Alexandrescu a écrit :
 On 4/8/12 4:54 AM, Manu wrote:
 On 8 April 2012 12:46, Vladimir Panteleev <vladimir thecybershadow.net
 <mailto:vladimir thecybershadow.net>> wrote:

 On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu wrote:

 Walter and I discussed today about using the small string
 optimization in string and other arrays of immutable small objects.

 On 64 bit machines, string occupies 16 bytes. We could use the
 first byte as discriminator, which means that all strings under
 16 chars need no memory allocation at all.


 Don't use the first byte. Use the last byte.

 The last byte is the highest-order byte of the length. Limiting
 arrays to 18.37 exabytes, as opposed to 18.45 exabytes, is a much
 nicer limitation than making assumptions about the memory layout.


 What is the plan for 32bit?

 We can experiment with making strings shorter than 8 chars in-situ. The
 drawback will be that length will be limited to 29 bits, i.e. 512MB.

 Andrei

As it is a flag, why not limit the string size to 2GB instead of 512MB ?

Apr 10 2012

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Sunday, 8 April 2012 at 09:46:28 UTC, Vladimir Panteleev wrote:
 On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu 
 wrote:
 Walter and I discussed today about using the small string 
 optimization in string and other arrays of immutable small 
 objects.

 On 64 bit machines, string occupies 16 bytes. We could use the 
 first byte as discriminator, which means that all strings 
 under 16 chars need no memory allocation at all.

 Don't use the first byte. Use the last byte.

 The last byte is the highest-order byte of the length. Limiting 
 arrays to 18.37 exabytes, as opposed to 18.45 exabytes, is a 
 much nicer limitation than making assumptions about the memory 
 layout.

Erm... never mind, I thought the pointer was the first field.

Even so, how would you use the lowest-order byte of the length as 
a discriminator? Unless you allocated bytes 2-8 for the length 
(which would be an unaligned read, or a shift every time...)

"making assumptions about the memory layout" now seems like the 
only solution.

Also, what will be .ptr for such arrays? It can't point inside 
the string, because the type is immutable and the array is on the 
stack. Duplication can be expensive as well.

Apr 08 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/8/12 4:55 AM, Vladimir Panteleev wrote:
 On Sunday, 8 April 2012 at 09:46:28 UTC, Vladimir Panteleev wrote:
 On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu wrote:
 Walter and I discussed today about using the small string
 optimization in string and other arrays of immutable small objects.

 On 64 bit machines, string occupies 16 bytes. We could use the first
 byte as discriminator, which means that all strings under 16 chars
 need no memory allocation at all.

 Don't use the first byte. Use the last byte.

 The last byte is the highest-order byte of the length. Limiting arrays
 to 18.37 exabytes, as opposed to 18.45 exabytes, is a much nicer
 limitation than making assumptions about the memory layout.

 Erm... never mind, I thought the pointer was the first field.

Me too, actually.

 Even so, how would you use the lowest-order byte of the length as a
 discriminator? Unless you allocated bytes 2-8 for the length (which
 would be an unaligned read, or a shift every time...)

It's amazing how fast shift works :o).

 "making assumptions about the memory layout" now seems like the only
 solution.

 Also, what will be .ptr for such arrays? It can't point inside the
 string, because the type is immutable and the array is on the stack.
 Duplication can be expensive as well.

Once anyone asks for .ptr a conservative copy will be made.


Andrei

Apr 08 2012

Walter Bright <newshound2 digitalmars.com> writes:

On 4/8/2012 7:53 AM, Andrei Alexandrescu wrote:
 Once anyone asks for .ptr a conservative copy will be made.

That could get expensive. You cannot just point into the small string part, 
because that may only exist temporarily on the stack. There are some 
pathological cases for this.

Apr 08 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/8/12 12:05 PM, Walter Bright wrote:
 On 4/8/2012 7:53 AM, Andrei Alexandrescu wrote:
 Once anyone asks for .ptr a conservative copy will be made.

 That could get expensive. You cannot just point into the small string
 part, because that may only exist temporarily on the stack. There are
 some pathological cases for this.

As I mentioned, the first call to .ptr changes representation, thus 
making the allocation that the optimization had saved. Things are not 
worse off than before.

Andrei

Apr 08 2012

Michel Fortin <michel.fortin michelf.com> writes:

On 2012-04-08 17:14:37 +0000, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 On 4/8/12 12:05 PM, Walter Bright wrote:
 On 4/8/2012 7:53 AM, Andrei Alexandrescu wrote:
 Once anyone asks for .ptr a conservative copy will be made.

 
 That could get expensive. You cannot just point into the small string
 part, because that may only exist temporarily on the stack. There are
 some pathological cases for this.

 
 As I mentioned, the first call to .ptr changes representation, thus 
 making the allocation that the optimization had saved. Things are not 
 worse off than before.

This only works if the code has write access to that variable.

Also, if the variable is a temporary copy such as a by-value function 
parameter, only the representation of that temporary copy will be 
affected. Every time you call a function which calls .ptr a new copy 
will be allocated on the heap. If such a function call is part of a 
loop, it'll degrade performance noticeably.


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Apr 08 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/8/12 1:16 PM, Michel Fortin wrote:
 On 2012-04-08 17:14:37 +0000, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:
 As I mentioned, the first call to .ptr changes representation, thus
 making the allocation that the optimization had saved. Things are not
 worse off than before.

 This only works if the code has write access to that variable.

 Also, if the variable is a temporary copy such as a by-value function
 parameter, only the representation of that temporary copy will be
 affected. Every time you call a function which calls .ptr a new copy
 will be allocated on the heap. If such a function call is part of a
 loop, it'll degrade performance noticeably.

I think there's a bit of a confusion. Could you please clarify with an 
example?

Thanks,

Andrei

Apr 08 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/8/12 1:26 PM, Andrei Alexandrescu wrote:
 On 4/8/12 1:16 PM, Michel Fortin wrote:
 On 2012-04-08 17:14:37 +0000, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:
 As I mentioned, the first call to .ptr changes representation, thus
 making the allocation that the optimization had saved. Things are not
 worse off than before.

 This only works if the code has write access to that variable.

 Also, if the variable is a temporary copy such as a by-value function
 parameter, only the representation of that temporary copy will be
 affected. Every time you call a function which calls .ptr a new copy
 will be allocated on the heap. If such a function call is part of a
 loop, it'll degrade performance noticeably.

 I think there's a bit of a confusion. Could you please clarify with an
 example?

I think I understand:

string s;
foreach (lots)
   fun(s);

void fun(string s)
{
     ... s.ptr ...
}

I agree in this case there would be one allocation per call.


Andrei

Apr 08 2012

Michel Fortin <michel.fortin michelf.com> writes:

On 2012-04-08 18:30:49 +0000, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 On 4/8/12 1:26 PM, Andrei Alexandrescu wrote:
 On 4/8/12 1:16 PM, Michel Fortin wrote:
 Also, if the variable is a temporary copy such as a by-value function
 parameter, only the representation of that temporary copy will be
 affected. Every time you call a function which calls .ptr a new copy
 will be allocated on the heap. If such a function call is part of a
 loop, it'll degrade performance noticeably.

 
 I think there's a bit of a confusion. Could you please clarify with an
 example?

 
 I think I understand:
 
 string s;
 foreach (lots)
    fun(s);
 
 void fun(string s)
 {
      ... s.ptr ...
 }

Exactly.

 I agree in this case there would be one allocation per call.

Basically, it's the same problem as receiving a const(char)[] as a 
function parameter and .idup-ing it in the function. The recommendation 
is for such functions to accept only immutable(char)[] and let the 
caller find a way to send an immutable string. That way, if the caller 
is repeatedly using the same string, it can reuse the efficient 
representation.

The same principles apply here: if your function needs to use .ptr, it 
should request a type on which you can call .ptr efficiently and let 
the caller find a way to pass think kind of string. To me, your small 
buffer optimization proposal looks like an argument in favor of 
creating a separate string type free from the constrains of offering at 
all time bare access to the bytes. If only that type could be the 
default...


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Apr 08 2012

Manu <turkeyman gmail.com> writes:

On 8 April 2012 12:54, Manu <turkeyman gmail.com> wrote:

 On 8 April 2012 12:46, Vladimir Panteleev <vladimir thecybershadow.net>wrote:

 On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu wrote:

 Walter and I discussed today about using the small string optimization
 in string and other arrays of immutable small objects.

 On 64 bit machines, string occupies 16 bytes. We could use the first
 byte as discriminator, which means that all strings under 16 chars need no
 memory allocation at all.

 Don't use the first byte. Use the last byte.

 The last byte is the highest-order byte of the length. Limiting arrays to
 18.37 exabytes, as opposed to 18.45 exabytes, is a much nicer limitation
 than making assumptions about the memory layout.

 What is the plan for 32bit?

The only way I can see this working is if the marker happens to be the top
bit of the size (limiting arrays to 2gb on 32bit systems, which is probably
fine), and if set, the next 7 bits are the size, leaving 7 bytes for a
string... 7 bytes is pushing it, 15 bytes is very useful, 7 is borderline...

That all said, I'll ultimately end out with my own string type anyway which
is multiples of, and aligned to 16 bytes, which will support SSE string
opcodes. I wonder if these considerations can be factored into the built-in
string?

Is it realistic that anyone can actually use raw d-string's in an app that
performs a lot of string manipulation? I bet most people end up with a
custom string class anyway...
Who's written a string-heavy app without their own string helper class? I
ended up with a string class within about half an hour of trying to work
with D strings (initially just to support stack strings, then it grew).

Apr 08 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/8/12 9:26 AM, Manu wrote:
 Is it realistic that anyone can actually use raw d-string's in an app
 that performs a lot of string manipulation?

Yes.

 I bet most people end up
 with a custom string class anyway...

That does happen, but much more rarely than one might think.

 Who's written a string-heavy app without their own string helper class?
 I ended up with a string class within about half an hour of trying to
 work with D strings (initially just to support stack strings, then it grew).

A lot of people write string-heavy apps with the built-in strings. 
"Heavy" actually describes a continuum. No matter how you put it, 
improving the performance of built-in strings is beneficial.


Andrei

Apr 08 2012

Walter Bright <newshound2 digitalmars.com> writes:

On 4/8/2012 7:26 AM, Manu wrote:
 Is it realistic that anyone can actually use raw d-string's in an app that 
 performs a lot of string manipulation? I bet most people end up with a custom 
 string class anyway...
 Who's written a string-heavy app without their own string helper class? I
ended 
 up with a string class within about half an hour of trying to work with D 
 strings (initially just to support stack strings, then it grew).

Warp (a fast C preprocessor)

https://github.com/facebookarchive/warp

Apr 19 2018

Manu <turkeyman gmail.com> writes:

On 19 April 2018 at 02:22, Walter Bright via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On 4/8/2012 7:26 AM, Manu wrote:
 Is it realistic that anyone can actually use raw d-string's in an app that
 performs a lot of string manipulation? I bet most people end up with a
 custom string class anyway...
 Who's written a string-heavy app without their own string helper class? I
 ended up with a string class within about half an hour of trying to work
 with D strings (initially just to support stack strings, then it grew).


 Warp (a fast C preprocessor)

 https://github.com/facebookarchive/warp

Cool story... 6 years later! :P

Apr 19 2018

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/8/12 4:46 AM, Vladimir Panteleev wrote:
 On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu wrote:
 Walter and I discussed today about using the small string optimization
 in string and other arrays of immutable small objects.

 On 64 bit machines, string occupies 16 bytes. We could use the first
 byte as discriminator, which means that all strings under 16 chars
 need no memory allocation at all.

 Don't use the first byte. Use the last byte.

 The last byte is the highest-order byte of the length. Limiting arrays
 to 18.37 exabytes, as opposed to 18.45 exabytes, is a much nicer
 limitation than making assumptions about the memory layout.

Hehe. Good idea. On big endian machines the last byte is the ticket!

Andrei

Apr 08 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/8/12 9:49 AM, Andrei Alexandrescu wrote:
 On 4/8/12 4:46 AM, Vladimir Panteleev wrote:
 On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu wrote:
 Walter and I discussed today about using the small string optimization
 in string and other arrays of immutable small objects.

 On 64 bit machines, string occupies 16 bytes. We could use the first
 byte as discriminator, which means that all strings under 16 chars
 need no memory allocation at all.

 Don't use the first byte. Use the last byte.

 The last byte is the highest-order byte of the length. Limiting arrays
 to 18.37 exabytes, as opposed to 18.45 exabytes, is a much nicer
 limitation than making assumptions about the memory layout.

 Hehe. Good idea. On big endian machines the last byte is the ticket!

s/big/little/

Apr 08 2012

Patrick Schluter <Patrick.Schluter bbox.fr> writes:

On Sunday, 8 April 2012 at 09:46:28 UTC, Vladimir Panteleev wrote:
 On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu 
 wrote:
 Walter and I discussed today about using the small string 
 optimization in string and other arrays of immutable small 
 objects.

 On 64 bit machines, string occupies 16 bytes. We could use the 
 first byte as discriminator, which means that all strings 
 under 16 chars need no memory allocation at all.

 Don't use the first byte. Use the last byte.

 The last byte is the highest-order byte of the length. Limiting 
 arrays to 18.37 exabytes, as opposed to 18.45 exabytes, is a 
 much nicer limitation than making assumptions about the memory 
 layout.

If the length has multi purpose it would be even better to 
reserve more than just one bit. For all practical purpose 48 bits 
or 56 bits are more than enough to handle all possible lengths. 
This would liberate 8 or even 16 bits that can be used for other 
purposes.

Apr 15 2018

Jacob Carlborg <doob me.com> writes:

On 2012-04-08 07:56, Andrei Alexandrescu wrote:

 For this to happen, we need to start an effort of migrating built-in
 arrays into runtime, essentially making them templates that the compiler
 lowers to. So I have two questions:

Just don't make the same mistake as with AA.

-- 
/Jacob Carlborg

Apr 08 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/8/12 5:45 AM, Jacob Carlborg wrote:
 On 2012-04-08 07:56, Andrei Alexandrescu wrote:

 For this to happen, we need to start an effort of migrating built-in
 arrays into runtime, essentially making them templates that the compiler
 lowers to. So I have two questions:

 Just don't make the same mistake as with AA.

The mistake with AAs was done long ago, but it was forced as AAs 
predated templates.

Andrei

Apr 08 2012

Jacob Carlborg <doob me.com> writes:

On 2012-04-08 16:54, Andrei Alexandrescu wrote:
 On 4/8/12 5:45 AM, Jacob Carlborg wrote:
 On 2012-04-08 07:56, Andrei Alexandrescu wrote:

 For this to happen, we need to start an effort of migrating built-in
 arrays into runtime, essentially making them templates that the compiler
 lowers to. So I have two questions:

 Just don't make the same mistake as with AA.

 The mistake with AAs was done long ago, but it was forced as AAs
 predated templates.

 Andrei

I'm referring to the new template implementation of AAs that got 
reverted due everything breaking, if I recall correctly.

-- 
/Jacob Carlborg

Apr 08 2012

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Sun, Apr 08, 2012 at 05:35:50PM +0200, Jacob Carlborg wrote:
 On 2012-04-08 16:54, Andrei Alexandrescu wrote:
On 4/8/12 5:45 AM, Jacob Carlborg wrote:


[...]
Just don't make the same mistake as with AA.

The mistake with AAs was done long ago, but it was forced as AAs
predated templates.

Andrei

 
 I'm referring to the new template implementation of AAs that got
 reverted due everything breaking, if I recall correctly.

[...]

Huh? When was this?


T

-- 
"I suspect the best way to deal with procrastination is to put off the
procrastination itself until later. I've been meaning to try this, but
haven't gotten around to it yet. " -- swr

Apr 08 2012

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Sun, Apr 08, 2012 at 12:56:38AM -0500, Andrei Alexandrescu wrote:
[...]
 1. What happened to the new hash project? We need to take that to
 completion.

[...]

Sorry, I've been busy at work and haven't had too much free time to work
on it.  The current code is available on github:

	https://github.com/quickfur/New-AA-implementation

The major outstanding issues are:

- Qualified keys not fully working: the current code has a few corner
  cases that don't work with shared/immutable/inout keys. One major
  roadblock is how to implement this:

	alias someType T;
	inout(T) myFunc(inout(T) arg, ...) {
		int[inout(T)] aa;
		...
	}

  The problem is that inout gets carried over into the AA template,
  which breaks because it instantiates into something that has:

	struct Slot {
		hash_t hash;
		inout(T) key;	// <-- this causes a compile error
		Value value;
	}

  Ideally, AA keys should all be stored as immutable inside the AA, and
  automatically converted to/from the qualified type the user specified.

- Template bloat: the current code uses template member functions, and
  will instantiate a new function for every implicit conversion of input
  key types. This also depends on IFTI, which has some quirks (compiler
  bugs) that make the code ugly (e.g., strings and arrays not treated
  equally by the compiler, requiring hacks to make implicit conversion
  work). Timon has suggested an alternative way of handling implicit
  conversions, which I think is better, but I need to take some time to
  actually implement it.

- Static initialization of AA's (AA literals that compile directly into
  object code). This should be possible in principle, but I've run into
  what may be a CTFE bug that prevents it from working.

- A not-so-major issue is to finish the toHash() implementations for all
  native types (currently it works for some common key types, but
  coverage is still incomplete). Once this is done, we can finally get
  rid of getHash from TypeInfo; UFCS will let us simply write x.toHash()
  for pretty much any type x.

Once these issues are resolved, there remains the major task of actually
integrating this code with druntime/dmd. A lot of work is expected on
the dmd end, because of the current amount of hacks in dmd to make AA's
work.


T

-- 
Valentine's Day: an occasion for florists to reach into the wallets of
nominal lovers in dire need of being reminded to profess their
hypothetical love for their long-forgotten.

Apr 08 2012

"Tove" <tove fransson.se> writes:

On Sunday, 8 April 2012 at 13:53:07 UTC, H. S. Teoh wrote:
 On Sun, Apr 08, 2012 at 12:56:38AM -0500, Andrei Alexandrescu 
 wrote:
 [...]
 1. What happened to the new hash project? We need to take that 
 to
 completion.

 [...]

 Sorry, I've been busy at work and haven't had too much free 
 time to work
 on it.  The current code is available on github:

 	https://github.com/quickfur/New-AA-implementation

 The major outstanding issues are:

 - Qualified keys not fully working: the current code has a few 
 corner
   cases that don't work with shared/immutable/inout keys. One 
 major
   roadblock is how to implement this:

 	alias someType T;
 	inout(T) myFunc(inout(T) arg, ...) {
 		int[inout(T)] aa;
 		...
 	}

   The problem is that inout gets carried over into the AA 
 template,
   which breaks because it instantiates into something that has:

 	struct Slot {
 		hash_t hash;
 		inout(T) key;	// <-- this causes a compile error
 		Value value;
 	}

   Ideally, AA keys should all be stored as immutable inside the 
 AA, and
   automatically converted to/from the qualified type the user 
 specified.

 - Template bloat: the current code uses template member 
 functions, and
   will instantiate a new function for every implicit conversion 
 of input
   key types. This also depends on IFTI, which has some quirks 
 (compiler
   bugs) that make the code ugly (e.g., strings and arrays not 
 treated
   equally by the compiler, requiring hacks to make implicit 
 conversion
   work). Timon has suggested an alternative way of handling 
 implicit
   conversions, which I think is better, but I need to take some 
 time to
   actually implement it.

 - Static initialization of AA's (AA literals that compile 
 directly into
   object code). This should be possible in principle, but I've 
 run into
   what may be a CTFE bug that prevents it from working.

 - A not-so-major issue is to finish the toHash() 
 implementations for all
   native types (currently it works for some common key types, 
 but
   coverage is still incomplete). Once this is done, we can 
 finally get
   rid of getHash from TypeInfo; UFCS will let us simply write 
 x.toHash()
   for pretty much any type x.

 Once these issues are resolved, there remains the major task of 
 actually
 integrating this code with druntime/dmd. A lot of work is 
 expected on
 the dmd end, because of the current amount of hacks in dmd to 
 make AA's
 work.


 T

doesn't this work?
immutable std.traits.Unqual!(inout(T)) key;

Apr 08 2012

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Sun, Apr 08, 2012 at 04:11:10PM +0200, Tove wrote:
 On Sunday, 8 April 2012 at 13:53:07 UTC, H. S. Teoh wrote:
On Sun, Apr 08, 2012 at 12:56:38AM -0500, Andrei Alexandrescu
wrote:
[...]
1. What happened to the new hash project? We need to take that to
completion.

[...]


[...]
The major outstanding issues are:

- Qualified keys not fully working: the current code has a few corner
  cases that don't work with shared/immutable/inout keys. One major
  roadblock is how to implement this:

	alias someType T;
	inout(T) myFunc(inout(T) arg, ...) {
		int[inout(T)] aa;
		...
	}

  The problem is that inout gets carried over into the AA template,
  which breaks because it instantiates into something that has:

	struct Slot {
		hash_t hash;
		inout(T) key;	// <-- this causes a compile error
		Value value;
	}


[...]
 doesn't this work?
 immutable std.traits.Unqual!(inout(T)) key;

I suppose so, but the problem is more with the return type of various AA
functions. If the user writes int[inout(T)] aa, then he would expect
that aa.keys should return inout(T)[]. But the problem here is that this
is impossible to express in the current system, because inout has a
different meaning when you write it in the AA method, than its meaning
in the context of the calling function. For example, this doesn't quite
work:

	struct AA {
		...
		inout(Key) keys() {
			...
			// Problem: inout here doesn't mean what it
			// means in the context of the caller
		}
	}


T

-- 
Tech-savvy: euphemism for nerdy.

Apr 08 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/8/12 8:54 AM, H. S. Teoh wrote:
 Sorry, I've been busy at work and haven't had too much free time to work
 on it.  The current code is available on github:

 	https://github.com/quickfur/New-AA-implementation

Thanks for the update! Let me reiterate this is important work for many 
reasons, so it would be great if you got around to pushing it through. 
Do you think someone else in the community could help you?

Andrei

Apr 08 2012

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Sun, Apr 08, 2012 at 10:00:37AM -0500, Andrei Alexandrescu wrote:
 On 4/8/12 8:54 AM, H. S. Teoh wrote:
Sorry, I've been busy at work and haven't had too much free time to work
on it.  The current code is available on github:

	https://github.com/quickfur/New-AA-implementation

 
 Thanks for the update! Let me reiterate this is important work for
 many reasons, so it would be great if you got around to pushing it
 through.

I'll try my best to work on it when I have free time.


 Do you think someone else in the community could help you?

[...]

Well, I put the code up on github for a reason. :-) I'll only be too
glad to have someone else contribute. The code itself isn't all that
complicated; it's basically just a facsimile of aaA.d with many bugs
fixed due to the fact that we now have direct access to key/value types.


T

-- 
"Life is all a great joke, but only the brave ever get the point." -- Kenneth
Rexroth

Apr 08 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/8/12 8:54 AM, H. S. Teoh wrote:
 - Qualified keys not fully working: the current code has a few corner
    cases that don't work with shared/immutable/inout keys. One major
    roadblock is how to implement this:

 	alias someType T;
 	inout(T) myFunc(inout(T) arg, ...) {
 		int[inout(T)] aa;
 		...
 	}

I wonder how frequently such code is used.

Andrei

Apr 08 2012

Michel Fortin <michel.fortin michelf.com> writes:

On 2012-04-08 05:56:38 +0000, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Walter and I discussed today about using the small string optimization 
 in string and other arrays of immutable small objects.
 
 On 64 bit machines, string occupies 16 bytes. We could use the first 
 byte as discriminator, which means that all strings under 16 chars need 
 no memory allocation at all.
 
 It turns out statistically a lot of strings are small. According to a 
 variety of systems we use at Facebook, the small buffer optimization is 
 king - it just works great in all cases. In D that means better speed, 
 better locality, and less garbage.

Small buffer optimization is a very good thing to have indeed. But… how 
can you preserve existing semantics? For instance, let's say you have 
this:

	string s = "abcd";

which is easily representable as a small string. Do you use the small 
buffer optimization in the assignment? That seems like a definitive yes.

But as soon as you take a pointer to that string, you break the 
immutability guaranty:

	immutable(char)[] s = "abcd";
	immutable(char)* p = s.ptr;
	s = "defg"; // assigns to where?

There's also the issue of this code being legal currently:

	immutable(char)* getPtr(string s) {
		return s.ptr;
	}

If you pass a small string to getPtr, it'll be copied to the local 
stack frame and you'll be returning a pointer to that local copy.

You could mitigate this by throwing an error when trying to get the 
pointer to a small string, but then you have to disallow taking the 
pointer of a const(char)[] pointing to it:

	const(char)* getPtr2(const(char)[] s) {
		return s.ptr;
	}

	const(char)* getAbcdPtr() {
		string s = "abcd";
	
		// s implicitly converted to regular const(char)[] pointing to local 
stack frame
		const(char)* c = getPtr2(s);

		// c points to the storage of s, which is the local stack frame
		return c;
	}

So it's sad, but I am of the opinion that the only way to implement 
small buffer optimization is to have a higher-level abstraction, a 
distinct type for such small strings.


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Apr 08 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/8/12 9:59 AM, Michel Fortin wrote:
 But as soon as you take a pointer to that string, you break the
 immutability guaranty:

 immutable(char)[] s = "abcd";
 immutable(char)* p = s.ptr;
 s = "defg"; // assigns to where?

Taking .ptr will engender a copy. A small regression will be that 
address of individual chars cannot be taken.

Andrei

Apr 08 2012

Michel Fortin <michel.fortin michelf.com> writes:

On 2012-04-08 15:06:13 +0000, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 On 4/8/12 9:59 AM, Michel Fortin wrote:
 But as soon as you take a pointer to that string, you break the
 immutability guaranty:
 
 immutable(char)[] s = "abcd";
 immutable(char)* p = s.ptr;
 s = "defg"; // assigns to where?

 
 Taking .ptr will engender a copy. A small regression will be that 
 address of individual chars cannot be taken.

You know, many people have been wary of hidden memory allocations in 
the past. That's not going to make them happy. I'm not complaining, but 
I think .ptr should return null in those cases. Let people use 
toStringz when they need a C string, and let people deal with the ugly 
details themselves if they're using .ptr to bypass array bound 
checking. Because if someone used .ptr somewhere to bypass bound 
checking and instead he gets a memory allocation at each loop 
iteration… it won't be pretty.

And what about implicit conversions to const(char)[]? That too will 
require a copy, because otherwise it could point to the local stack 
frame where your immutable(char)[] resides. That said, maybe copies of 
small-string optimized immutable(char)[] could be small-string 
optimized const(char)[]. That'd not have any side effect since no one 
can have a mutable pointer/slice to the const copy anyway.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Apr 08 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/8/12 10:48 AM, Michel Fortin wrote:
 On 2012-04-08 15:06:13 +0000, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 On 4/8/12 9:59 AM, Michel Fortin wrote:
 But as soon as you take a pointer to that string, you break the
 immutability guaranty:

 immutable(char)[] s = "abcd";
 immutable(char)* p = s.ptr;
 s = "defg"; // assigns to where?

 Taking .ptr will engender a copy. A small regression will be that
 address of individual chars cannot be taken.

 You know, many people have been wary of hidden memory allocations in the
 past.

Well, the optimization makes for fewer allocations total. In fact .ptr 
does the allocation that was formerly mandatory.

 That's not going to make them happy. I'm not complaining, but I
 think .ptr should return null in those cases.

That would be too large a regression I think. And it's not detectable 
during compilation.

 Let people use toStringz
 when they need a C string, and let people deal with the ugly details
 themselves if they're using .ptr to bypass array bound checking. Because
 if someone used .ptr somewhere to bypass bound checking and instead he
 gets a memory allocation at each loop iteration… it won't be pretty.

Only one allocation. First invocation of .ptr effectively changes 
representation.

 And what about implicit conversions to const(char)[]? That too will
 require a copy, because otherwise it could point to the local stack
 frame where your immutable(char)[] resides. That said, maybe copies of
 small-string optimized immutable(char)[] could be small-string optimized
 const(char)[]. That'd not have any side effect since no one can have a
 mutable pointer/slice to the const copy anyway.

I think casting to const(char)[] should work without allocation.


Andrei

Apr 08 2012

Somedude <lovelydear mailmetrash.com> writes:

Le 08/04/2012 17:48, Michel Fortin a écrit :
 On 2012-04-08 15:06:13 +0000, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:
 
 On 4/8/12 9:59 AM, Michel Fortin wrote:
 But as soon as you take a pointer to that string, you break the
 immutability guaranty:

 immutable(char)[] s = "abcd";
 immutable(char)* p = s.ptr;
 s = "defg"; // assigns to where?

 Taking .ptr will engender a copy. A small regression will be that
 address of individual chars cannot be taken.

 
 You know, many people have been wary of hidden memory allocations in the
 past. That's not going to make them happy. I'm not complaining, but I
 think .ptr should return null in those cases. Let people use toStringz
 when they need a C string, and let people deal with the ugly details
 themselves if they're using .ptr to bypass array bound checking. Because
 if someone used .ptr somewhere to bypass bound checking and instead he
 gets a memory allocation at each loop iteration… it won't be pretty.
 
 And what about implicit conversions to const(char)[]? That too will
 require a copy, because otherwise it could point to the local stack
 frame where your immutable(char)[] resides. That said, maybe copies of
 small-string optimized immutable(char)[] could be small-string optimized
 const(char)[]. That'd not have any side effect since no one can have a
 mutable pointer/slice to the const copy anyway.
 

You've raised some very good points. I also think this won't do.
It is quite clear with your examples (especially the temporary copy as a
by value function parameter) that the small strings have some very
specific semantics that are completely different from normal string
semantics. Therefore I also have the feeling that they should be another
type, sstring or something.

sstrings could be promoted to full string when necessary, but the whole
thing would probably look very very ugly pretty soon. This to me smells
like the idea of small string optimization should be put to rest for a
while, requiring a little more thought.

Apr 08 2012

Don <nospam nospam.com> writes:

On 08.04.2012 07:56, Andrei Alexandrescu wrote:

 For this to happen, we need to start an effort of migrating built-in
 arrays into runtime, essentially making them templates that the compiler
 lowers to. So I have two questions:

vote -= real.infinity.

That would kill D.

Apr 09 2012

Timon Gehr <timon.gehr gmx.ch> writes:

On 04/09/2012 10:24 AM, Don wrote:
 On 08.04.2012 07:56, Andrei Alexandrescu wrote:

 For this to happen, we need to start an effort of migrating built-in
 arrays into runtime, essentially making them templates that the compiler
 lowers to. So I have two questions:

 vote -= real.infinity.

 That would kill D.

Why does this even compile?

void main(){
     long vote;
     vote -= real.infinity;
     assert(vote == long.min);
}

Apr 09 2012

Manu <turkeyman gmail.com> writes:

On 9 April 2012 11:24, Don <nospam nospam.com> wrote:

 On 08.04.2012 07:56, Andrei Alexandrescu wrote:

  For this to happen, we need to start an effort of migrating built-in
 arrays into runtime, essentially making them templates that the compiler
 lowers to. So I have two questions:

 vote -= real.infinity.

 That would kill D.


How do you figure?

After thinking on it a bit, I'm becoming a little worried about this move
for 2 rarely considered reasons:
Using lowering to a template, debug(/unoptimised) performance will probably
get a lot slower, which is really annoying. And debugging/stepping might
become considerably more annoying too, if every time I press F11 (step in)
over a function call that happens to receive an arg from an array, the
debugger then steps into the array templates index operator... We'd be no
better off than with STL, unless the language has clever ways of hiding
this magic from the debugger too, and optimising/inlining the index even in
debug builds...? But this is the built-in array, and not a library we can
optionally not use.

Apr 09 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/9/12 4:21 AM, Manu wrote:
 After thinking on it a bit, I'm becoming a little worried about this
 move for 2 rarely considered reasons:
 Using lowering to a template, debug(/unoptimised) performance will
 probably get a lot slower, which is really annoying. And
 debugging/stepping might become considerably more annoying too, if every
 time I press F11 (step in) over a function call that happens to receive
 an arg from an array, the debugger then steps into the array templates
 index operator... We'd be no better off than with STL, unless the
 language has clever ways of hiding this magic from the debugger too, and
 optimising/inlining the index even in debug builds...? But this is the
 built-in array, and not a library we can optionally not use.

I agree. So we have the counterarguments:

1. Lowering would treat array primitives as sheer D code, subject to 
refusal of inlining. That means worse performance.

2. Unless the compiler takes special measures, source-level debuggers 
will trace through core, uninteresting code for array operations.

3. There are patterns that attempt to optimize by e.g. using .ptr, but 
end up pessimizing code because they trigger multiple memory allocations.


Andrei

Apr 09 2012

"Jakob Ovrum" <jakobovrum gmail.com> writes:

On Monday, 9 April 2012 at 14:55:16 UTC, Andrei Alexandrescu 
wrote:
 3. There are patterns that attempt to optimize by e.g. using 
 .ptr, but end up pessimizing code because they trigger multiple 
 memory allocations.


 Andrei

It's important to note that this pattern is probably most common 
in glue code to C libraries, not bounds-checking related 
optimizations. There are countless of C library functions which 
receive the equivalent of an array by taking a pointer and a 
length, and implicit allocation on `foo.ptr` is completely 
unacceptable in these cases.

It's also common to avoid the `toStringz` function for strings 
you know are zero-terminated, using `.ptr` directly instead, as 
the toStringz function unconditionally appends a zero these days 
(and for good reasons, its previous optimization was extremely 
optimistic about its input).

Apr 09 2012

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 4/9/12, Jakob Ovrum <jakobovrum gmail.com> wrote:
 It's also common to avoid the `toStringz` function for strings
 you know are zero-terminated, using `.ptr` directly instead.

Yup. E.g. WinAPI text drawing functions take a wchar* and a length. I
don't have to call toUTF16z but just pass a pointer, or even a pointer
to a specific element via &arr[index] (after calling std.utf.stride,
of course).

 the toStringz function unconditionally appends a zero these days

The one taking (const(char)[] s) does this, but not the other overload
taking (string s). Whether or not that's safe I don't really know.
I've had an argument over this on github, but I don't know if it was
about toStringz or maybe toUTF16z. I haven't got the link to the
discussion.

Apr 09 2012

"Jakob Ovrum" <jakobovrum gmail.com> writes:

On Monday, 9 April 2012 at 15:37:37 UTC, Andrej Mitrovic wrote:
 On 4/9/12, Jakob Ovrum <jakobovrum gmail.com> wrote:
 The one taking (const(char)[] s) does this, but not the other 
 overload
 taking (string s). Whether or not that's safe I don't really 
 know.
 I've had an argument over this on github, but I don't know if 
 it was
 about toStringz or maybe toUTF16z. I haven't got the link to the
 discussion.

You're right, I just confirmed the optimization is still in place 
for the `string` version. The documentation is identical for both 
functions. I think this is a mistake.

It assumes that the string is either a compiler-generated literal 
or a GC allocated string, while the documentation does not 
mention such assumptions. With all the focus on manual memory 
management and pluggable allocators going on, I think the 
optimization must be removed or the documentation for the 
`string` overload changed.

This optimization can always be put back in without narrowing the 
scope of the `string` overload once the above conditions can be 
reliably checked.

Another option is to add a known-bug section to the `string` 
overload informing users that the function may fail on 
custom-allocated strings.

Apr 09 2012

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 4/9/12, Jakob Ovrum <jakobovrum gmail.com> wrote:
 With all the focus on manual memory
 management and pluggable allocators going on, I think the
 optimization must be removed or the documentation for the
 `string` overload changed.

Or add a compile-time argument:
toStringz(bool ForceAllocate = true)(string s)

Or split the unsafe version into another function.

Apr 09 2012

Manu <turkeyman gmail.com> writes:

On 9 April 2012 17:55, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>wrote:

 On 4/9/12 4:21 AM, Manu wrote:

 After thinking on it a bit, I'm becoming a little worried about this
 move for 2 rarely considered reasons:
 Using lowering to a template, debug(/unoptimised) performance will
 probably get a lot slower, which is really annoying. And
 debugging/stepping might become considerably more annoying too, if every
 time I press F11 (step in) over a function call that happens to receive
 an arg from an array, the debugger then steps into the array templates
 index operator... We'd be no better off than with STL, unless the
 language has clever ways of hiding this magic from the debugger too, and
 optimising/inlining the index even in debug builds...? But this is the
 built-in array, and not a library we can optionally not use.

 I agree. So we have the counterarguments:

 1. Lowering would treat array primitives as sheer D code, subject to
 refusal of inlining. That means worse performance.

 2. Unless the compiler takes special measures, source-level debuggers will
 trace through core, uninteresting code for array operations.

 3. There are patterns that attempt to optimize by e.g. using .ptr, but end
 up pessimizing code because they trigger multiple memory allocations.


Indeed. I don't think the small array optimisation would benefit us in any
way that could even come close to balancing the loss. I don't think it
would benefit us much at all regardless, since we've already proven the
need for a string class anyway, and we can make such small-string
optimisations already.
I am very wary of removing a fundamental language primitive, that is, the
basic ability to express an array with absolutely no frills.

This idea only really benefits utf-8 strings, so why not just make a real
d-string class in phobos and be done with it? People complain about UTF
issues with raw strings, and this is what you intend to do anyway to
implement the optimisation.
d-string could perhaps be made to appear as a first-class language feature
via the 'string' keyword, and lower to the library in the way you describe.
Just don't blanket this across all arrays in general.

Apr 09 2012

"Nick Sabalausky" <a a.a> writes:

"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
news:jlut8k$j4o$1 digitalmars.com...
 I agree. So we have the counterarguments:

 1. Lowering would treat array primitives as sheer D code, subject to 
 refusal of inlining. That means worse performance.

So in other words, we need the  forceinline that some people have strongly 
requested? Make it work even without -inline, and then add -noinline for any 
rare cases where someone might need to forcefully disable  forceinline. 
Shouldn't that take care of it?

 2. Unless the compiler takes special measures, source-level debuggers will 
 trace through core, uninteresting code for array operations.

Would  forceinline fix this?

 3. There are patterns that attempt to optimize by e.g. using .ptr, but end 
 up pessimizing code because they trigger multiple memory allocations.

Someone's suggestion of just making .ptr null instead of doing implicit 
allocations was an interesting idea.

Apr 09 2012

Artur Skawina <art.08.09 gmail.com> writes:

On 04/10/12 07:01, Nick Sabalausky wrote:
 "Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
 news:jlut8k$j4o$1 digitalmars.com...
 I agree. So we have the counterarguments:

 1. Lowering would treat array primitives as sheer D code, subject to 
 refusal of inlining. That means worse performance.

 
 So in other words, we need the  forceinline that some people have strongly 
 requested? Make it work even without -inline, and then add -noinline for any 
 rare cases where someone might need to forcefully disable  forceinline. 
 Shouldn't that take care of it?

Obviously, yes, but should wait until enough attribute support is in place and
not be just a  inline hack (no point in naming it forceinline - there's no
other kind of inline).

 2. Unless the compiler takes special measures, source-level debuggers will 
 trace through core, uninteresting code for array operations.

 
 Would  forceinline fix this?

No, but the compiler could just omit the debuginfo for the lowerings, unless
requested with a flag.

 3. There are patterns that attempt to optimize by e.g. using .ptr, but end 
 up pessimizing code because they trigger multiple memory allocations.

 
 Someone's suggestion of just making .ptr null instead of doing implicit 
 allocations was an interesting idea.

I hope that wasn't a serious suggestion. What might be possible is passing
strings by reference and atomically changing the representation when (if) a
pointer is needed. That would have problems too (ABI change, can't have RO
strings, can't keep the small strings in registers etc).
Doing the small buffer optimization for a new type may be possible, but trying
to add it to the current char[] arrays is probably not a good idea.

artur

Apr 10 2012

Marco Leise <Marco.Leise gmx.de> writes:

Am Tue, 10 Apr 2012 10:50:24 +0200
schrieb Artur Skawina <art.08.09 gmail.com>:

 Obviously, yes, but should wait until enough attribute support is in place and
 not be just a  inline hack.

If you refer to the proposed user attributes, they wont change the operation of
the compiler. Only your own program code will know how to use them.  inline,
 safe,  property, final, nothrow, ... on the other hand are keywords that
directly map to flags and hard wired logic in the compiler. Correct me if I'm
wrong.

-- 
Marco

Apr 10 2012

Artur Skawina <art.08.09 gmail.com> writes:

On 04/10/12 19:25, Marco Leise wrote:
 Am Tue, 10 Apr 2012 10:50:24 +0200
 schrieb Artur Skawina <art.08.09 gmail.com>:
 
 Obviously, yes, but should wait until enough attribute support is in place and
 not be just a  inline hack.

 
 If you refer to the proposed user attributes, they wont change the operation
of the compiler. Only your own program code will know how to use them.  inline,
 safe,  property, final, nothrow, ... on the other hand are keywords that
directly map to flags and hard wired logic in the compiler. Correct me if I'm
wrong.

I'm saying that introducing new function attributes like  inline to the
language,
when there's a real possibility of "generic" attributes being invented in the
near
future, may not be a good idea. Any generic scheme should also work for  inline
and
the many other attrs that i've mentioned before - there's no reason to
artificially
limit the support to *just* user attributes.

artur

Apr 10 2012

Marco Leise <Marco.Leise gmx.de> writes:

Am Tue, 10 Apr 2012 20:52:56 +0200
schrieb Artur Skawina <art.08.09 gmail.com>:

 I'm saying that introducing new function attributes like  inline to the
language,
 when there's a real possibility of "generic" attributes being invented in the
near
 future, may not be a good idea. Any generic scheme should also work for
 inline and
 the many other attrs that i've mentioned before - there's no reason to
artificially
 limit the support to *just* user attributes.
 
 artur

I had to read up on your older posts again. So you are not expecting compiler
hooks that allow to change the language semantics and code gen through user
attributes, but a common syntax especially for bundling multiple compiler/user
attributes like " attr(safe, nothrow, userattr(abc), inline, ...)
my_attr_alias" in the event that there will be a lot of platform specific and
other pragmas/attributes/keywords like in GCC in the future? Then I tend to
agree.

-- 
Marco

Apr 10 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/9/12 3:24 AM, Don wrote:
 On 08.04.2012 07:56, Andrei Alexandrescu wrote:

 For this to happen, we need to start an effort of migrating built-in
 arrays into runtime, essentially making them templates that the compiler
 lowers to. So I have two questions:

 vote -= real.infinity.

 That would kill D.

Why?

Andrei

Apr 09 2012

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sun, 08 Apr 2012 01:56:38 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Walter and I discussed today about using the small string optimization  
 in string and other arrays of immutable small objects.

 On 64 bit machines, string occupies 16 bytes. We could use the first  
 byte as discriminator, which means that all strings under 16 chars need  
 no memory allocation at all.

 It turns out statistically a lot of strings are small. According to a  
 variety of systems we use at Facebook, the small buffer optimization is  
 king - it just works great in all cases. In D that means better speed,  
 better locality, and less garbage.

 For this to happen, we need to start an effort of migrating built-in  
 arrays into runtime, essentially making them templates that the compiler  
 lowers to. So I have two questions:

 1. What happened to the new hash project? We need to take that to  
 completion.

 2. Is anyone willing to start the effort of migrating built-in slices  
 into templates?

No, this would suck.

A better solution - make an *actual* string type that does this, and fixes  
all the shitty problems that we have from shoehorning arrays into UTF  
strings.  Then alias that type to string.

I'm so sick of phobos trying to pretend char[] is not an array, and this  
would just be another mark against D.

-Steve

Apr 09 2012

Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:

On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu 
wrote:
 Andrei

Have anybody put together code that implements this idea in a 
library?

That is, a small strings up to length 15 bytes unioned with a 
`string`.

Apr 15 2018

Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:

On Sunday, 8 April 2012 at 05:56:36 UTC, Andrei Alexandrescu 
wrote:
 Walter and I discussed today about using the small string 
 optimization in string and other arrays of immutable small 
 objects.

I put together SSOString at

https://github.com/nordlow/phobos-next/blob/967eb1088fbfab8be5ccd811b66e7b5171b46acf/src/sso_string.d

that uses small-string-optimization on top of a normal D string 
(slice).

I'm satisfied with everything excepts that -dip1000 doesn't 
vorbids `f` from compiling.

I also don't understand why `x[0]` cannot be returned by ref in 
the function `g`.

Comments are welcome.

Contents of sso_string.d follows:

module sso_string;

/** Small-size-optimized string.
  *
  * Store on the stack if constructed with <= `smallCapacity` 
number of
  * characters, otherwise on the GC heap.
  */
struct SSOString
{
     private alias E = immutable(char); // immutable element type
     private alias ME = char;           // mutable element type

     pure nothrow:

     /** Construct from `elements`, with potential GC-allocation 
(iff
      * `elements.length > smallCapacity`).
      */
     this()(scope ME[] elements)  trusted // template-lazy
     {
         if (elements.length <= smallCapacity)
         {
             small.data[0 .. elements.length] = elements;
             small.length = 
cast(typeof(small.length))(2*elements.length);
         }
         else
         {
             large = elements.idup; // GC-allocate
             raw.length *= 2;  // shift up
             raw.length |= 1;  // tag as large
         }
     }

      nogc:

     // TODO add  nogc overload to construct from mutable static 
array <= smallCapacity

     /** Construct from `elements` without any kind of heap 
allocation.
      */
     this()(immutable(E)[] elements)  trusted // template-lazy
     {
         if (elements.length <= smallCapacity)
         {
             small.data[0 .. elements.length] = elements;
             small.length = 
cast(typeof(small.length))(2*elements.length);
         }
         else
         {
             large = elements;   //  nogc
             raw.length *= 2;    // shift up
             raw.length |= 1;    // tag as large
         }
     }

      property size_t length() const  trusted
     {
         if (isLarge)
         {
             return large.length/2; // skip first bit
         }
         else
         {
             return small.length/2; // skip fist bit
         }
     }

     scope ref inout(E) opIndex(size_t index) inout return  trusted
     {
         return opSlice()[index]; // automatic range checking
     }

     scope inout(E)[] opSlice() inout return  trusted
     {
         if (isLarge)
         {
             union RawLarge
             {
                 Raw raw;
                 Large large;
             }
             RawLarge copy = void;
             copy.large = cast(Large)large;
             copy.raw.length /= 2; // adjust length
             return copy.large;
         }
         else
         {
             return small.data[0 .. small.length/2]; // scoped
         }
     }

     private  property bool isLarge() const  trusted
     {
         return large.length & 1; // first bit discriminates small 
from large
     }

private:
     struct Raw                  // same memory layout as `E[]`
     {
         size_t length;          // can be bit-fiddled without GC 
allocation
         E* ptr;
     }

     alias Large = E[];

     enum smallCapacity = Large.sizeof - Small.length.sizeof;
     static assert(smallCapacity > 0, "No room for small elements 
for E being " ~ E.stringof);
     version(LittleEndian) // see: 
http://forum.dlang.org/posting/zifyahfohbwavwkwbgmw
     {
         struct Small
         {
             ubyte length;
             E[smallCapacity] data;
         }
     }
     else
     {
         static assert(0, "BigEndian support and test");
     }

     union
     {
         Raw raw;
         Large large;
         Small small;
     }
}

///
 safe pure nothrow  nogc unittest
{
     import container_traits : mustAddGCRange;
     alias S = SSOString;

     static assert(S.sizeof == 2*size_t.sizeof); // two words
     static assert(S.smallCapacity == 15);
     static assert(mustAddGCRange!S); // `Large large.ptr` must be 
scanned

     auto s0 = S.init;
     assert(s0.length == 0);
     assert(!s0.isLarge);
     assert(s0[] == []);

     const s7 = S("0123456");
     static assert(is(typeof(s7[]) == string));
     assert(!s7.isLarge);
     assert(s7.length == 7);
     assert(s7[] == "0123456");
     // TODO assert(s7[0 .. 4] == "0123");

     const s15 = S("012345678901234");
     static assert(is(typeof(s15[]) == string));
     assert(!s15.isLarge);
     assert(s15.length == 15);
     assert(s15[] == "012345678901234");

     const s16 = S("0123456789abcdef");
     static assert(is(typeof(s16[]) == string));
     assert(s16.isLarge);
     assert(s16.length == 16);
     assert(s16[] == "0123456789abcdef");
     assert(s16[0] == '0');
     assert(s16[10] == 'a');
     assert(s16[15] == 'f');

     // TODO static assert(!__traits(compiles, { auto _ = 
S((char[]).init); }));

     string f()  safe pure nothrow  nogc
     {
         S x;
         return x[];             // TODO should fail with -dip1000
     }

     // TODO activate
     // ref char g()  safe pure nothrow  nogc
     // {
     //     S x;
     //     return x[0];             // TODO should fail with 
-dip1000
     // }
}

Apr 17 2018

D Programming

C/C++ Programming

Other

digitalmars.D - Small Buffer Optimization for string and friends