www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Short strings optimization

I've found this simple blog post:
http://journal.thobe.org/2011/02/better-support-for-short-strings-in.html

It suggests the creation of a ShortArray struct, that like arrays is
size_t.sizeof*2, that's usable for short arrays and short strings.

In 64 bit mode that's 16 bytes long, just like a dynamic array. The higher bit
of the struct may denote a "short array". If it's 0, then the the second nibble
of the first byte represents the length (even if it's an array of bytes its
length can't be more than 15), otherwise the struct contains a length/ptr pair,
like a normal dynamic array. This is able to remove many heap allocations
(without any compression or tricks), because in programs a good percentage of
strings are short.

Every time an item of the array is accessed, the struct opIndex/opIndexAssign
has to test the first bit, this lowers the performance, but there are no
pointers to follow in other parts of the memory, so the overall performance is
probably not much different, and if the array is in RAM instead of caches, it
may be even faster.

The ptr is not always a pointer, so the GC needs to be aware of this and do not
follow the pointer if the first bit of the struct is 0. To manage this cleanly,
it's useful a standard method like onGCScan() I have suggested time ago, to
specify the runtime semantics for the garbage collector.

Bye,
bearophile
Feb 18 2011