digitalmars.D - Array append performance 2

bearophile (46/46) Aug 30 2008 Until some more basic solution is found (and/or answers from Walter rega...

bearophile (4/4) Aug 30 2008 IsReferenceType!() isn't fully correct because it sees fixed size arrays...

dsimcha (15/18) Aug 30 2008 reference types. I have posted a more correct (and cleaned, and single-t...

Sergey Gromov (21/29) Aug 31 2008 I may be missing something, but the following seems to work:

Sergey Gromov (28/45) Aug 31 2008 Fixed arrays: dynamic and associative arrays always contain references,

dsimcha (24/24) Aug 30 2008 Nice work on the reference template. I had thought that the typeid(T).f...

bearophile (11/17) Aug 30 2008 Right, I have just read the D docs that say it.
Sean Kelly (3/10) Aug 30 2008 Not with Tango.

Fawzi Mohamed (9/23) Aug 31 2008 I would have done it quite differently: keep a linked list (if you want

bearophile (10/17) Aug 31 2008 Yes, my code was experimental, my main purpose was to ask if it's correc...

Fawzi Mohamed (7/40) Aug 31 2008 I had not thought about an important thing: the content of the array

bearophile (4/6) Aug 31 2008 Yes, and there are other problems, that make the situation/code a mess, ...

Denis Koroskin (51/59) Sep 01 2008 Please, test my version, too!

bearophile (49/52) Sep 01 2008 Sorry, I'm not using D 2.x. I (hopefully) will soon show an improved ver...

bearophile (5/5) Sep 01 2008 This is the first beta version, it seems to work well enough (despite to...

bearophile (5/5) Sep 01 2008 Well, a finished version of ArrayBuilder is in my libs, in the 'builders...

Benji Smith (9/16) Sep 01 2008 Very cool.

bearophile (5/9) Sep 01 2008 My libs are meant to improve std libs, if/where/when the compiler can't ...

Dave (8/21) Sep 02 2008 There have been quite a few examples of contributors modifying the D run...

bearophile (6/11) Sep 03 2008 It seems to work, it speeds up the code where I have used it (I'm using ...

BCS (8/35) Sep 01 2008 I think that in many cases the information needed for the compiler to pr...

Denis Koroskin (2/33) Aug 31 2008 Static doesn't apply to a struct, only to inner classes, IIRC.

Christian Kamm (4/14) Aug 31 2008 Actually, I don't think the D spec requires typeof(T.init) == T. And T i...

bearophile (23/25) Aug 31 2008 Right, but the following program:

downs (3/8) Sep 01 2008 This should do what you want:

Nick B (34/34) Sep 01 2008 Hi

Lars Ivar Igesund (13/56) Sep 01 2008 That is well and fine, but you shouldn't hijack a thread for it, rather
Brian Price (24/58) Sep 01 2008 Hate to say it but this is yet another case of reinventing the wheel. T...

Leandro Lucarella (19/23) Sep 01 2008 Different problems has differents solutions. What you say is really nice

Brian Price (19/35) Sep 01 2008 In a (near) realtime embedded system with a high wire cost (cellphone, s...

Mitja Slenc (23/81) Sep 02 2008 Can you explain where you see violation of DRY exactly? Also, can you

bearophile <bearophileHUGS lycos.com> writes:

Until some more basic solution is found (and/or answers from Walter regarding
how to solve this append performance problem and regarding the idea of adding a
third capacity field to dynamical arrays), someone has suggested to create an
array builder class, like the string builder of Java. So I have created one,
the code is in the attach. Note that code is raw still, not commented, and it
does the bare minimum (append one item and extract the array, I'd like to add
generic extend methods, and tye possibility to change the capacity/length), its
tests are messy still, etc, once I know it's correct I'll improve it and I'll
add it to my libs (of course, I can offer a version of this code to
Phobos/Tango too).

It's not easy for me to write such GC-based code, so my questions are:
Is ArrayBuilder!() correct in every situation?
Why are the performance of the benchmark3 so low (a bit worse than normal array
append)? Can it be improved?


Extra note: while creating it I have found another bug in DMD I didn't know
about:

import std.stdio: writefln;
void main() {
    alias int[2] T;
    writefln(T.init);
}

This program has to print [0,0] instead of 0.


Timings, on a Core Duo   2 GHz, 2 GB RAM:

benchmark 1, N=2_000_000:
  Array append: 0.160 s
  ArrayBuilder: 0.026 s
benchmark 1, N=20_000_000:
  Array append: 1.837 s
  ArrayBuilder: 0.325 s
benchmark 1, N=50_000_000:
  Array append: 4.489 s
  ArrayBuilder: 0.731 s
benchmark 1, N=200_000_000:
  Array append: 32.293 s
  ArrayBuilder: Out of memory

benchmark 2, N=200_000:
  Array append: 0.099 s
  ArrayBuilder: 0.004 s
benchmark 2, N=2_000_000:
  Array append: 6.896 s
  ArrayBuilder: 0.050 s

benchmark 3, N=200_000:
  Array append: 0.090 s
  ArrayBuilder: 0.076 s
benchmark 3, N=2_000_000:
  Array append: 1.014 s
  ArrayBuilder: 0.923 s (why is this so slow?)
benchmark 3, N=6_000_000:
  Array append: 5.295 s
  ArrayBuilder: 4.431 s

benchmark 4, N=500_000:
  Array append: 1.109 s
  ArrayBuilder: 0.414 s
benchmark 4, N=1_000_000:
  Array append: 3.103 s

Bye,
bearophile

Aug 30 2008

bearophile <bearophileHUGS lycos.com> writes:

IsReferenceType!() isn't fully correct because it sees fixed size arrays as
reference types. I have posted a more correct (and cleaned, and
single-template) version here:

http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=75813

Bye,
bearophile

Aug 30 2008

dsimcha <dsimcha yahoo.com> writes:

== Quote from bearophile (bearophileHUGS lycos.com)'s article
 IsReferenceType!() isn't fully correct because it sees fixed size arrays as

reference types. I have posted a more correct (and cleaned, and single-template)
version here:

http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=75813
 Bye,
 bearophile

One small issue:  Your IsReference template doesn't work with invariant, and
presumably const, types.  For example,

IsReferenceType!(typeof("Foo"[0]));  //Returns true.

The easiest way to fix this is to add a Mutable!():

static if (IsType!(Mutable!(Types[0]), bool, byte, ubyte, short, ushort, int,
uint,
                           long, ulong, float, double, real, ifloat, idouble,
                           ireal, cfloat, cdouble, creal, char, dchar, wchar) )

Also, although it may be the least bad way to do this (I can't think of a better
one), the process of elimination method this relies on seems kind of brittle
because it would have to be modified for any new type that is added.  If anyone
knows of a less kludge-y way, please post.

Aug 30 2008

Sergey Gromov <snake.scaly gmail.com> writes:

dsimcha <dsimcha yahoo.com> wrote:
 static if (IsType!(Mutable!(Types[0]), bool, byte, ubyte, short, ushort, int,
uint,
                            long, ulong, float, double, real, ifloat, idouble,
                            ireal, cfloat, cdouble, creal, char, dchar, wchar) )
 
 Also, although it may be the least bad way to do this (I can't think of a
better
 one), the process of elimination method this relies on seems kind of brittle
 because it would have to be modified for any new type that is added.  If anyone
 knows of a less kludge-y way, please post.

I may be missing something, but the following seems to work:

template Category(T)
{
	static if (
		is(T == interface) ||
		is(T == class) ||
		is(T Base : Base*))
	{
		const Category = "reference";
	}
	else
		const Category = "value";
}

template Category(T : T[U], U)
{
	const Category = Category!(T);
}

You need to extend this for structs and unions but that seems routine...

-- 
SnakE

Aug 31 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Sergey Gromov <snake.scaly gmail.com> wrote:
 template Category(T)
 {
 	static if (
 		is(T == interface) ||
 		is(T == class) ||
 		is(T Base : Base*))
 	{
 		const Category = "reference";
 	}
 	else
 		const Category = "value";
 }
 
 template Category(T : T[U], U)
 {
 	const Category = Category!(T);
 }

Fixed arrays: dynamic and associative arrays always contain references, 
static arrays must be checked recursively:

template Category(T)
{
	static if (
		is(T == interface) ||
		is(T == class) ||
		is(T Elem : Elem[]) ||	// dyn array
		is(T Base : Base*))
	{
		const Category = "reference";
	}
	else
		const Category = "value";
}

template Category(T : T[U], U)
{
	// AAs always contain references
	const Category = "reference";
}

template Category(T : T[U], size_t U)
{
	// Static arrays may or may not contain references
	const Category = Category!(T);
}

-- 
SnakE

Aug 31 2008

dsimcha <dsimcha yahoo.com> writes:

Nice work on the reference template.  I had thought that the typeid(T).flags()
would be inlined, then constant folded, but CTFE doesn't work on it, so I guess
not.  Probably pretty negligible, but if I have a template laying around to do
this at compile time, I may as well use it.

Also, to answer your question, you do need to call hasNoPointers() each time you
realloc.

import std.stdio, std.gc, std.process;

void main() {
    uint[] foo = (cast(uint[]) malloc(uint.sizeof))[0..1];
    hasNoPointers(foo.ptr);
    foo = cast(uint[]) realloc(foo.ptr, 100_000 * uint.sizeof)[0..100_000];
    //Commenting next line out causes severe mem leaks.
    hasNoPointers(foo.ptr);
    foreach(ref f; foo) {
        f = cast(uint) test();  //Create false pointers.
    }
    system("pause");  //So mem usage can be checked.
}

uint* test() {
    return (new uint[1_000]).ptr;
}

Oh, one more thing:  What's a static struct as opposed to a regular struct? 
I've
never seen it used before.  I had assumed that this means a struct composed only
of static members, but apparently not.

Aug 30 2008

bearophile <bearophileHUGS lycos.com> writes:

dsimcha:

Nice work on the reference template.<

It comes from several expansion and compression cycles, has requires few hours
to be written ;-)


Also, to answer your question, you do need to call hasNoPointers() each time
you realloc.<

Right, I have just read the D docs that say it.
But what's the rationale behind it? If I have a pointer and I want to realloc
it, to make its space bigger, I never want to change the fact that the GC scans
its pointers or not. Stating it once must suffice.


Oh, one more thing:  What's a static struct as opposed to a regular struct?<

I have taken a look at the documentation, and I have performed some benchmarks,
now I think it does nothing, I have removed that attribute.
I have flagged that struct fields as private because originally that was a
class, but a struct is better and faster, so I have removed those "private" too.


One small issue:  Your IsReference template doesn't work with invariant, and
presumably const, types.<

Because I (as most people here, I presume) use D 1. Sorry for not stating it
clearly.


 The easiest way to fix this is to add a Mutable!():
 static if (IsType!(Mutable!(Types[0]), bool, byte, [...]

Thank you, I have added a note into my libs, even if they are for D 1.

Hopefully some one else will tell me if the ArrayBuilder is correct (even if
this group isn't D.learn).

Bye and thank you,
bearophile

Aug 30 2008

Sean Kelly <sean invisibleduck.org> writes:

dsimcha wrote:
 Nice work on the reference template.  I had thought that the typeid(T).flags()
 would be inlined, then constant folded, but CTFE doesn't work on it, so I guess
 not.  Probably pretty negligible, but if I have a template laying around to do
 this at compile time, I may as well use it.
 
 Also, to answer your question, you do need to call hasNoPointers() each time
you
 realloc.

Not with Tango.


Sean

Aug 30 2008

Fawzi Mohamed <fmohamed mac.com> writes:

On 2008-08-31 06:01:26 +0200, Sean Kelly <sean invisibleduck.org> said:

 dsimcha wrote:
 Nice work on the reference template.  I had thought that the typeid(T).flags()
 would be inlined, then constant folded, but CTFE doesn't work on it, so I guess
 not.  Probably pretty negligible, but if I have a template laying around to do
 this at compile time, I may as well use it.
 
 Also, to answer your question, you do need to call hasNoPointers() each 
 time you
 realloc.

 
 Not with Tango.
 
 
 Sean

I would have done it quite differently: keep a linked list (if you want 
to improve of batches of 10 arrays), with a pointer to the end, append 
in O(1), and keep the total length.

only upon a call to toArray allocate an array of the requested length 
and fill it.

An array (even with tricks) is not a good structure to append to, so it 
seems a bad idea to me to use it.

Fawzi

Aug 31 2008

bearophile <bearophileHUGS lycos.com> writes:

Fawzi Mohamed:
 I would have done it quite differently: keep a linked list (if you want 
 to improve of batches of 10 arrays), with a pointer to the end, append 
 in O(1), and keep the total length.
 only upon a call to toArray allocate an array of the requested length 
 and fill it.
 An array (even with tricks) is not a good structure to append to, so it 
 seems a bad idea to me to use it.

Yes, my code was experimental, my main purpose was to ask if it's correct, in
its GC management, etc. Algorithmic and code tuning is something I think about
later.

I'm now implementing ArrayBuilder with a deque-like, to see how it performs in
comparison.
There are 3 possible things you may want to join to such ArrayBuilder:
- Fixed size arrays
- Single items
- Dynamic arrays

For the dynamic arrays can be copied just a reference, for the other is better
to copy the data. The single items can be added to short (64 KB) structs with
fixed size arrays inside, organized into a linked list. The static arrays can
equally be copied in pieces into such blocks. The dynamic arrays instead
require short arrays filled with the struct of the dynamic arrays. This makes
things messy. If the user keeps alternating single items to dynamic arrays the
data structure has to avoid wasting too much memory. So I don't see an easy
solution yet, beside my original one or a similar one (that consists in copying
all items of dynamic arrays too).

Bye and thank you,
bearophile

Aug 31 2008

Fawzi Mohamed <fmohamed mac.com> writes:

On 2008-08-31 17:37:13 +0200, bearophile <bearophileHUGS lycos.com> said:

 Fawzi Mohamed:
 I would have done it quite differently: keep a linked list (if you want
 to improve of batches of 10 arrays), with a pointer to the end, append
 in O(1), and keep the total length.
 only upon a call to toArray allocate an array of the requested length
 and fill it.
 An array (even with tricks) is not a good structure to append to, so it
 seems a bad idea to me to use it.

 
 Yes, my code was experimental, my main purpose was to ask if it's 
 correct, in its GC management, etc. Algorithmic and code tuning is 
 something I think about later.
 
 I'm now implementing ArrayBuilder with a deque-like, to see how it 
 performs in comparison.
 There are 3 possible things you may want to join to such ArrayBuilder:
 - Fixed size arrays
 - Single items
 - Dynamic arrays
 
 For the dynamic arrays can be copied just a reference, for the other is 
 better to copy the data. The single items can be added to short (64 KB) 
 structs with fixed size arrays inside, organized into a linked list. 
 The static arrays can equally be copied in pieces into such blocks. The 
 dynamic arrays instead require short arrays filled with the struct of 
 the dynamic arrays. This makes things messy. If the user keeps 
 alternating single items to dynamic arrays the data structure has to 
 avoid wasting too much memory. So I don't see an easy solution yet, 
 beside my original one or a similar one (that consists in copying all 
 items of dynamic arrays too).
 
 Bye and thank you,
 bearophile

I had not thought about an important thing: the content of the array 
can change... so copying is safer.
Probably using (as you said) a linked list of fixed arrays is the best 
approach for many small appends, maybe append might have a bool 
argument "fixed=false" to know if copying can be avoided.

Fawzi

Aug 31 2008

bearophile <bearophileHUGS lycos.com> writes:

Fawzi Mohamed:
 I had not thought about an important thing: the content of the array 
 can change... so copying is safer.

Yes, and there are other problems, that make the situation/code a mess, so I
copy all items now, keeping code simpler. I'm debugging/benchmarking it now...

Later,
bearophile

Aug 31 2008

"Denis Koroskin" <2korden gmail.com> writes:

On Mon, 01 Sep 2008 01:11:46 +0400, bearophile <bearophileHUGS lycos.com>  
wrote:

 Fawzi Mohamed:
 I had not thought about an important thing: the content of the array
 can change... so copying is safer.

 Yes, and there are other problems, that make the situation/code a mess,  
 so I copy all items now, keeping code simpler. I'm  
 debugging/benchmarking it now...

 Later,
 bearophile

Please, test my version, too!
While it is slighly slower, it allows appending invariant arrays (i.e.  
strings) without copying data (although it is currently not implemented).

An idea is to store an array (or a list) of invariant elements.
Current buffer is mutable until it is completely full. It then casted to  
invariant and appended to the list and an new mutable buffer (of a higher  
size) created.

struct ArrayBuilder(T) {
     //private BearophileArrayBuilder!(invariant(T[])) subArrays;
     private invariant(T[])[] subArrays;
     private int lengthSoFar;
	
     private T[] tmpArray;
     private int curOffset = 0;

     void opCatAssign(T x) {
         int capacity = tmpArray.length;
         if (curOffset >= capacity) {
             int old_capacity = capacity;

             if (capacity == 0)
                 capacity = 4; // starting point not too much small
             else if (capacity < (1 << 26)) // 67_108_864
                 capacity *= 2; // for amortized O(1) and less RAM  
fragmentation
             else
                 capacity *= 1.3; // slower growing rate to save RAM

             if (old_capacity != 0) {
                 subArrays ~= cast(invariant(T[]))tmpArray;
             }

             tmpArray = new T[capacity];
			curOffset = 0;
         }

         tmpArray[curOffset] = x;
         assert(tmpArray[curOffset] == x);
         ++curOffset;
         ++lengthSoFar;
     }

     ///
     T[] toarray() {
         T[] array = new T[lengthSoFar];
         int offset = 0;
         foreach (subArray; subArrays) {
             int newOffset = offset+subArray.length;
             array[offset..newOffset] = subArray[];
             offset = newOffset;
         }
         array[offset..$] = tmpArray[0..curOffset];
         return array;
     }
}

Sep 01 2008

bearophile <bearophileHUGS lycos.com> writes:

Denis Koroskin:
 Please, test my version, too!
 While it is slighly slower, it allows appending invariant arrays (i.e.  
 strings) without copying data (although it is currently not implemented).

Sorry, I'm not using D 2.x. I (hopefully) will soon show an improved version of
my code, so you can test your version.

Note: I have benchmarked the version with the deque-like data structure, and
it's slower, the code is a bit more complex (and I may have hit a bug in the GC
(I'd like to know if Tango behaves at the same way)), so I'll probably just use
the first version, with the reallocs.

----------------

With Phobos:

import std.stdio: writefln;
import std.gc: malloc;

void main() {
    int M = 1 << 16;
    int N = M / int.sizeof;
    const int N_POINTERS = 18;

    int*[N_POINTERS] pointers;
    foreach (ref ptr; pointers) {
        ptr = cast(int*)malloc(N * int.sizeof);
        if (ptr is null) {
            throw new Error("OUT OF MEM");
            return 1;
        }
    }

    int last = 0;
    foreach (i, ptr; pointers) {
        int int_ptr = cast(int)ptr;
        int diff = int_ptr - last;
        writefln(i, " ", int_ptr, " ", diff, " ", M - diff);
        last = int_ptr;
    }
}

Prints:

0 9052160 9052160 -8986624
1 9117696 65536 0
2 9183232 65536 0
3 9248768 65536 0
4 9314304 65536 0
5 9379840 65536 0
6 9445376 65536 0
7 9510912 65536 0
8 9576448 65536 0
9 9641984 65536 0
10 9707520 65536 0
11 9773056 65536 0
12 9838592 65536 0
13 9904128 65536 0
14 9969664 65536 0
15 10092544 122880 -57344
16 10158080 65536 0
17 10223616 65536 0

Is that result at n = 15 normal/correct?

Bye,
bearophile

Sep 01 2008

bearophile <bearophileHUGS lycos.com> writes:

This is the first beta version, it seems to work well enough (despite too much
easy memory overflows), I'll add ddoc comments, better comments, I'll clean the
code, improve unit tests, etc. If you spot problems I'd like to know them.
You will probably find its finished version inside my libs in one or two days.
I presume Walter isn't interested to put this into Phobos, nor to 'fix' the
built-in arrays  (or maybe he's just busy).

Bye,
bearophile

Sep 01 2008

bearophile <bearophileHUGS lycos.com> writes:

Well, a finished version of ArrayBuilder is in my libs, in the 'builders'
module:
http://www.fantascienza.net/leonardo/so/libs_d.zip

I have already used it to replace the old string appending used in the putr()
(printing functions) and now printing is about 2.5 times faster (and using it
for that purpose is a way to perform hundreds of unit tests on ArrayBuilder),
so I'm quite pleased.

Bye, and thank you for all the help and answers (but more bugs can be present
still),
bearophile

Sep 01 2008

Benji Smith <dlanguage benjismith.net> writes:

bearophile wrote:
 Well, a finished version of ArrayBuilder is in my libs, in the 'builders'
module:
 http://www.fantascienza.net/leonardo/so/libs_d.zip
 
 I have already used it to replace the old string appending used in the putr()
(printing functions) and now printing is about 2.5 times faster (and using it
for that purpose is a way to perform hundreds of unit tests on ArrayBuilder),
so I'm quite pleased.
 
 Bye, and thank you for all the help and answers (but more bugs can be present
still),
 bearophile

Very cool.

Philosophically, though, isn't the whole purpose of having dynamic 
arrays in D to avoid having to create library implementations like this?

I can definitely appreciate how something like this provides 
proof-of-concept for compiler & stdlib improvements, but in the end, 
having to rely on an array-builder class seems very non-D-ish.

Thoughts?

--benji

Sep 01 2008

bearophile <bearophileHUGS lycos.com> writes:

Benji Smith:
 Philosophically, though, isn't the whole purpose of having dynamic 
 arrays in D to avoid having to create library implementations like this?

I agree, but so far I am not able to modify/improve the language. And an
ArrayBuilder isn't meant to replace dynamic arrays, it's just a way to build
them faster, "patching" what I perceive as one of their problems.


 I can definitely appreciate how something like this provides 
 proof-of-concept for compiler & stdlib improvements,

My libs are meant to improve std libs, if/where/when the compiler can't be
modified. They are un-standard but they are largish libs anyway, and I'm
developing them doing my best :-)

Bye,
bearophile

Sep 01 2008

"Dave" <Dave_member pathlink.com> writes:

"bearophile" <bearophileHUGS lycos.com> wrote in message 
news:g9i1nb$26pd$1 digitalmars.com...
 Benji Smith:
 Philosophically, though, isn't the whole purpose of having dynamic
 arrays in D to avoid having to create library implementations like this?

 I agree, but so far I am not able to modify/improve the language. And an 
 ArrayBuilder isn't meant to replace dynamic arrays, it's just a way to 
 build them faster, "patching" what I perceive as one of their problems.

There have been quite a few examples of contributors modifying the D runtime 
to improve things like array concatenation. If it generally performs better 
and is demonstrably correct, I'm pretty sure Walter and the Tango crew would 
incorporate it.

In other words, have a go at improving the arraycat code in internal/gc/gc.d 
if you have the time. It's simple enough to modify and build.

 I can definitely appreciate how something like this provides
 proof-of-concept for compiler & stdlib improvements,

 My libs are meant to improve std libs, if/where/when the compiler can't be 
 modified. They are un-standard but they are largish libs anyway, and I'm 
 developing them doing my best :-)

 Bye,
 bearophile

Sep 02 2008

bearophile <bearophileHUGS lycos.com> writes:

Dave:
 There have been quite a few examples of contributors modifying the D runtime 
 to improve things like array concatenation.

This requires three fields in the dynamic array struct, so maybe it requires
changes in several points of the front-end.


If it generally performs better and is demonstrably correct, I'm pretty sure
Walter and the Tango crew would incorporate it.<

It seems to work, it speeds up the code where I have used it (I'm using it more
and more in my libs, it has speed up the select() two times, it contains an
array append because in general you don't know how many items you filter away
from the input iterable) but I am not able to demostrate it correct. Its main
fault so far is that it causes MemoryOverflow if you try to add something like
200 million size_t to it.


 In other words, have a go at improving the arraycat code in internal/gc/gc.d 
 if you have the time. It's simple enough to modify and build.

I'll take a look, but I presume it's not easy as you say :-)

Bye,
bearophile

Sep 03 2008

BCS <ao pathlink.com> writes:

Reply to Benji,

 bearophile wrote:
 
 Well, a finished version of ArrayBuilder is in my libs, in the
 'builders' module: http://www.fantascienza.net/leonardo/so/libs_d.zip
 
 I have already used it to replace the old string appending used in
 the putr() (printing functions) and now printing is about 2.5 times
 faster (and using it for that purpose is a way to perform hundreds of
 unit tests on ArrayBuilder), so I'm quite pleased.
 
 Bye, and thank you for all the help and answers (but more bugs can be
 present still), bearophile
 

 Very cool.
 
 Philosophically, though, isn't the whole purpose of having dynamic
 arrays in D to avoid having to create library implementations like
 this?
 
 I can definitely appreciate how something like this provides
 proof-of-concept for compiler & stdlib improvements, but in the end,
 having to rely on an array-builder class seems very non-D-ish.
 
 Thoughts?
 
 --benji
 

I think that in many cases the information needed for the compiler to produce 
the needed optimizations is to hard to encode in D like languages. D can 
handle the individual cats but when to do the rest is a problem that can 
span to much code to expect the compiler to solve it. A human with knowledge 
of what is supposed to happen can use libs to improve the situation.

OTOH the best solution would be to describe programs in a way that lets a 
computer do that work. We can dream.

Sep 01 2008

"Denis Koroskin" <2korden gmail.com> writes:

On Sun, 31 Aug 2008 04:58:10 +0400, dsimcha <dsimcha yahoo.com> wrote:

 Nice work on the reference template.  I had thought that the  
 typeid(T).flags()
 would be inlined, then constant folded, but CTFE doesn't work on it, so  
 I guess
 not.  Probably pretty negligible, but if I have a template laying around  
 to do
 this at compile time, I may as well use it.

 Also, to answer your question, you do need to call hasNoPointers() each  
 time you
 realloc.

 import std.stdio, std.gc, std.process;

 void main() {
     uint[] foo = (cast(uint[]) malloc(uint.sizeof))[0..1];
     hasNoPointers(foo.ptr);
     foo = cast(uint[]) realloc(foo.ptr, 100_000 *  
 uint.sizeof)[0..100_000];
     //Commenting next line out causes severe mem leaks.
     hasNoPointers(foo.ptr);
     foreach(ref f; foo) {
         f = cast(uint) test();  //Create false pointers.
     }
     system("pause");  //So mem usage can be checked.
 }

 uint* test() {
     return (new uint[1_000]).ptr;
 }

 Oh, one more thing:  What's a static struct as opposed to a regular  
 struct?  I've
 never seen it used before.  I had assumed that this means a struct  
 composed only
 of static members, but apparently not.

Static doesn't apply to a struct, only to inner classes, IIRC.

Aug 31 2008

Christian Kamm <kamm-incasoftware removethis.de> writes:

 Extra note: while creating it I have found another bug in DMD I didn't
 know about:
 
 import std.stdio: writefln;
 void main() {
     alias int[2] T;
     writefln(T.init);
 }
 
 This program has to print [0,0] instead of 0.

Actually, I don't think the D spec requires typeof(T.init) == T. And T is a
valid initializer for T[N]:

int[3] arr = 5; // sets all elements to 5

Christian

Aug 31 2008

bearophile <bearophileHUGS lycos.com> writes:

Christian Kamm:
 Actually, I don't think the D spec requires typeof(T.init) == T.

Maybe the D specs have to be fixed then.


 And T is a valid initializer for T[N]:

Right, but the following program:

import std.stdio: writefln;
void main() {
    alias int[2] T;
    writefln(T.init);

    int[2] T_init;
    writefln(T_init);
}

prints:
0
[0,0]

So if I have a dynamic array of static arrays (where T is the static array) I
can't do this:
array[0 .. $][] = T.init;

I have to do:

T T_init;
array[0 .. $][] = T_init;

While if T is a dynamic array I can do:

array[0 .. $] = T.init;

So even if they aren't bugs I don't like all such differences much...

Bye,
bearophile

Aug 31 2008

downs <default_357-line yahoo.de> writes:

bearophile wrote:
 Christian Kamm:
 Actually, I don't think the D spec requires typeof(T.init) == T.

 
 Maybe the D specs have to be fixed then.
 

This should do what you want:

template Init(T) { const T Init; }

Sep 01 2008

Nick B <nick.barbalich gmail.com> writes:

Hi

I came across this the other day, and no one has mentioned this, on this 
news group before, I thought I would bring the subject to the 
communities attention, so to speak.

Google has very recently, open sourced "Protocol Buffers".

What is it you ask ?  In a couple of lines it is a language-neutral, 
platform-neutral, extensible way of serializing structured data for use 
in communications protocols, data storage, and more.

Think XML, but smaller, faster, and simpler.

Why not just use XML ?

Protocol buffers have many advantages over XML for serializing 
structured data.  Protocol buffers:
* are simpler
* are 3 to 10 times smaller
* are 20 to 100 times faster
* are less ambiguous
* generate data access classes that are easier to use programmatically.

PB supports Java, Python,and C++ currently.

A more detailed overview can be found here:
http://code.google.com/apis/protocolbuffers/docs/overview.html

and the FAQ here:
http://code.google.com/apis/protocolbuffers/docs/faq.html

See the answer to the question "Can I add support for a new language to 
protocol buffers?" inside the FAQ.

Some Tips and comments can be found here:
http://zunger.livejournal.com/164024.html


My questions.
Does the D community see this of interest ?
Is this something they might use ?
Do they see value in it being added to the respective
Tango or Phobos frameworks ?

any other comments ?

cheers

Nick B.

Sep 01 2008

Lars Ivar Igesund <larsivar igesund.net> writes:

Nick B wrote:

 Hi
 
 I came across this the other day, and no one has mentioned this, on this
 news group before, I thought I would bring the subject to the
 communities attention, so to speak.

That is well and fine, but you shouldn't hijack a thread for it, rather
start a new one.

 Google has very recently, open sourced "Protocol Buffers".
 
 What is it you ask ?  In a couple of lines it is a language-neutral,
 platform-neutral, extensible way of serializing structured data for use
 in communications protocols, data storage, and more.
 
 Think XML, but smaller, faster, and simpler.
 
 Why not just use XML ?
 
 Protocol buffers have many advantages over XML for serializing
 structured data.  Protocol buffers:
 * are simpler
 * are 3 to 10 times smaller
 * are 20 to 100 times faster

Not necessarily a valid point if you use the right parser, say, one from
Tango.

 * are less ambiguous
 * generate data access classes that are easier to use programmatically.
 
 PB supports Java, Python,and C++ currently.
 
 A more detailed overview can be found here:
 http://code.google.com/apis/protocolbuffers/docs/overview.html
 
 and the FAQ here:
 http://code.google.com/apis/protocolbuffers/docs/faq.html
 
 See the answer to the question "Can I add support for a new language to
 protocol buffers?" inside the FAQ.
 
 Some Tips and comments can be found here:
 http://zunger.livejournal.com/164024.html
 
 
 My questions.
 Does the D community see this of interest ?
 Is this something they might use ?
 Do they see value in it being added to the respective
 Tango or Phobos frameworks ?

If the protocol is widely used, then it would certainly be beneficial to
support it, and I would assume that it is fairly easy to implement on top
of Tango. 

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango

Sep 01 2008

Brian Price <blprice61 yahoo.com> writes:

== Quote from Nick B (nick.barbalich gmail.com)'s article
 Hi
 I came across this the other day, and no one has mentioned this, on this
 news group before, I thought I would bring the subject to the
 communities attention, so to speak.
 Google has very recently, open sourced "Protocol Buffers".
 What is it you ask ?  In a couple of lines it is a language-neutral,
 platform-neutral, extensible way of serializing structured data for use
 in communications protocols, data storage, and more.
 Think XML, but smaller, faster, and simpler.
 Why not just use XML ?
 Protocol buffers have many advantages over XML for serializing
 structured data.  Protocol buffers:
 * are simpler
 * are 3 to 10 times smaller
 * are 20 to 100 times faster
 * are less ambiguous
 * generate data access classes that are easier to use programmatically.
 PB supports Java, Python,and C++ currently.
 A more detailed overview can be found here:
 http://code.google.com/apis/protocolbuffers/docs/overview.html
 and the FAQ here:
 http://code.google.com/apis/protocolbuffers/docs/faq.html
 See the answer to the question "Can I add support for a new language to
 protocol buffers?" inside the FAQ.
 Some Tips and comments can be found here:
 http://zunger.livejournal.com/164024.html
 My questions.
 Does the D community see this of interest ?
 Is this something they might use ?
 Do they see value in it being added to the respective
 Tango or Phobos frameworks ?
 any other comments ?
 cheers
 Nick B.

Hate to say it but this is yet another case of reinventing the wheel.  The worst
thing about this throwback to the early 90s is its inherent violation of DRY.
This package intermingles and confuses three separate issues, treating them as a
monolithic whole, namely: serialization, marshalling, and versioning.

Binary solutions such as this, while more efficient byte-wise, run into
portability problems especially with floating point values.  They also lose the
human readability of the data (sans the use of special tools).  Often the use
of a
binary solution is a case of premature optimization and indicates bad design at
a
higher level.  The marshalling strategy should be 'pluggable' so that one can
use
an easier to debug, human readable, data format during development.

Versioning problems can be (and have been) addressed in a number of ways over
the
years, the simplest and imo often the best, is to sidestep the problem by
serializing a map of the data rather than just the raw data.  That is, each
value
has associated metadata indicating its field name (a key-value mapping iow).

If XML is too heavy a hammer, JSON (with or without embedded metadata) is a good
alternative with quite a bit of industry support.  In fact, the use of JSON
encoding with embedded metadata gives you a lightweight solution that works with
nearly every language in present use.  Even better, any language with good
reflection support (such as Java) allows an implementation that does not
violate DRY.

This "Protocol Buffers" solution is a dinosaur, a throwback, yet another wheel
when we need ground effects or anti-gravity.

My 2c,
Brian

Sep 01 2008

Leandro Lucarella <llucax gmail.com> writes:

Brian Price, el  1 de septiembre a las 14:31 me escribiste:
 Hate to say it but this is yet another case of reinventing the wheel.  The
worst
 thing about this throwback to the early 90s is its inherent violation of DRY.
 This package intermingles and confuses three separate issues, treating them as
a
 monolithic whole, namely: serialization, marshalling, and versioning.

Different problems has differents solutions. What you say is really nice
and have sense when *latency* and other performance issues are not a problem
to you (it's obviously a problem for Google and for a lot of other people,
like me, I was doing a little "framework", similar to Google's PB at work,
just before they release it =).

When you deal with (almost) realtime systems, you can't pay the price for
human readability, extra parsing time, extra abstraction and extra bytes
thrown throgh the wire. You just can't.

You don't even care about portability (this kind of things usually run in
a very controlled environment).

I just wanted to clarify this.

-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
PADRES DENUNCIAN QUE SU HIJA SE ESCAPO CON UN PARAGUAYITO
	-- Crónica TV

Sep 01 2008

Brian Price <blprice61 yahoo.com> writes:

== Quote from Leandro Lucarella (llucax gmail.com)'s article
 Brian Price, el  1 de septiembre a las 14:31 me escribiste:
 Hate to say it but this is yet another case of reinventing the wheel.  The
worst
 thing about this throwback to the early 90s is its inherent violation of DRY.
 This package intermingles and confuses three separate issues, treating them as
a
 monolithic whole, namely: serialization, marshalling, and versioning.

 Different problems has differents solutions. What you say is really nice
 and have sense when *latency* and other performance issues are not a problem
 to you (it's obviously a problem for Google and for a lot of other people,
 like me, I was doing a little "framework", similar to Google's PB at work,
 just before they release it =).
 When you deal with (almost) realtime systems, you can't pay the price for
 human readability, extra parsing time, extra abstraction and extra bytes
 thrown throgh the wire. You just can't.
 You don't even care about portability (this kind of things usually run in
 a very controlled environment).
 I just wanted to clarify this.

In a (near) realtime embedded system with a high wire cost (cellphone, smart
meters, etc) I'd have to agree with you.  However, my initial impression was
that
this was being proposed/offered as a general purpose object streaming system.

Multicore/hyperthreaded processors with multiple levels of caching have a huge
latency (comparitively) between core and wire.  Even in a high demand system
there's generally plenty of cpu cycles left over to do quite a bit of extra
parsing and data massaging (compression).  A compressed text message (XML or
JSON
format) is very competitive in size with binary format (depending of course on
the
actual makeup of the data payload).  I expect that in a few years the embedded
systems will be sporting processors with similar capabilities to today's
desktops
and servers.  At that time, it may be worthwhile re-evaluating the decision to
go
with a 'brittle' binary format in new designs.

On a personal note, some of my friends are laughing at me over this - ten or
twelve years ago I was vehemently against the idea of using human readable data
formats for transmission - for the same reasons you give above.  Of course my
cellphone today probably has nearly the power of my desktop back then.

Sorry for any misunderstanding,
Brian

Sep 01 2008

Mitja Slenc <mslenc gmail.com> writes:

Brian Price wrote:
 == Quote from Nick B (nick.barbalich gmail.com)'s article
 Hi
 I came across this the other day, and no one has mentioned this, on this
 news group before, I thought I would bring the subject to the
 communities attention, so to speak.
 Google has very recently, open sourced "Protocol Buffers".
 What is it you ask ?  In a couple of lines it is a language-neutral,
 platform-neutral, extensible way of serializing structured data for use
 in communications protocols, data storage, and more.
 Think XML, but smaller, faster, and simpler.
 Why not just use XML ?
 Protocol buffers have many advantages over XML for serializing
 structured data.  Protocol buffers:
 * are simpler
 * are 3 to 10 times smaller
 * are 20 to 100 times faster
 * are less ambiguous
 * generate data access classes that are easier to use programmatically.
 PB supports Java, Python,and C++ currently.
 A more detailed overview can be found here:
 http://code.google.com/apis/protocolbuffers/docs/overview.html
 and the FAQ here:
 http://code.google.com/apis/protocolbuffers/docs/faq.html
 See the answer to the question "Can I add support for a new language to
 protocol buffers?" inside the FAQ.
 Some Tips and comments can be found here:
 http://zunger.livejournal.com/164024.html
 My questions.
 Does the D community see this of interest ?
 Is this something they might use ?
 Do they see value in it being added to the respective
 Tango or Phobos frameworks ?
 any other comments ?
 cheers
 Nick B.

 
 Hate to say it but this is yet another case of reinventing the wheel.  The
worst
 thing about this throwback to the early 90s is its inherent violation of DRY.
 This package intermingles and confuses three separate issues, treating them as
a
 monolithic whole, namely: serialization, marshalling, and versioning.

Can you explain where you see violation of DRY exactly? Also, can you 
explain the big difference between serialization and marshalling? I find 
it hard to draw any meaningful line between the two..

 Binary solutions such as this, while more efficient byte-wise, run into
 portability problems especially with floating point values.

So, what sort of problems do you see with this specific solution? To me, 
it looks like a text based solution would have more problems with 
floating point values, because they typically get converted to/from 
decimal..

 They also lose the
 human readability of the data (sans the use of special tools).  Often the use
of a
 binary solution is a case of premature optimization and indicates bad design
at a
 higher level.  The marshalling strategy should be 'pluggable' so that one can
use
 an easier to debug, human readable, data format during development.

Well, in my experience, most data tends to live somewhere not readily 
accessible and/or human readable, so you need a tool one way or the 
other. Making generic tools for PBs shouldn't be too hard, as far as I 
can tell they do have support for generic message handling, 
'reflection', there's a DebugString() method on each of them, etc..

 Versioning problems can be (and have been) addressed in a number of ways over
the
 years, the simplest and imo often the best, is to sidestep the problem by
 serializing a map of the data rather than just the raw data.  That is, each
value
 has associated metadata indicating its field name (a key-value mapping iow).

And the PB encoding differs from this how exactly? Each value has a key 
preceding it, which identifies which field it is..

 If XML is too heavy a hammer, JSON (with or without embedded metadata) is a
good
 alternative with quite a bit of industry support.  In fact, the use of JSON
 encoding with embedded metadata gives you a lightweight solution that works
with
 nearly every language in present use.  Even better, any language with good
 reflection support (such as Java) allows an implementation that does not
violate DRY.

I haven't tried it, but it looks like generic mapping to/from JSON would 
be very easily done on top of PBs, for when you need it (like when 
publishing a JavaScript API). But why would you want to use JSON in any 
other case? Compared to PBs, it's harder/slower to parse, definitely 
takes more space/bandwidth, has fewer types, and you don't get the nice 
free API protocol buffers provide you for each message type.

 This "Protocol Buffers" solution is a dinosaur, a throwback, yet another wheel
 when we need ground effects or anti-gravity.

Well, you haven't convinced me yet :)


Mitja

Sep 02 2008

D Programming

C/C++ Programming

Other

digitalmars.D - Array append performance 2