www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - string concatenation idea.

reply Regan Heath <regan netwin.co.nz> writes:
The thread "string performance issues" by "Daniel Horn" got me thinking of 
an idea for a change to arrays.

Bascially, concatenation can be slow, as it causes reallocations of the 
array. If you could pre-allocate the array then it wouldn't be as slow.

I tried this:

char[] p = "regan"

p.length = 10;
p ~= "fred";

and ended up with a string containing

'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd'

which was not what I was after :)

I remembered a thread on arrays requesting renaming the 'length' property 
to 'reserve' or something like that, and the idea for the addition of a 
reserve property that simply allocated memory to the array without 
changing it's length came to me. If we could go:

char[] p = "regan";

p.reserve = 10;
p ~= "fred";

and end up with a string containing

'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0'

and a length of 9. Then we could do fast concatenation. Otherwise we're 
left writing a String class that achieves this by setting length and using 
memcpy. I thought a design goal of D was to avoid this.

Thoughts?

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 14 2004
next sibling parent reply "Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:
It means adding another member to slices, and would complicate it in other ways
also, since one would have to distinguish between slices that own and slices
that
view.

I prefer a language based approach, similar to that for Java, whereby the
elements in a single concatenation statement are actually appended to a
StringBuffer. This boosts string concatenation performance enormously.

I showed how to achieve this in a similar vein, and with similarly significant
performance benefits, for C++ in my recent (June's CUJ) article "Fast,
Non-intrusive String Concatenation". Walter was one of the reviewers, so he's au
fait with the technique.

I suggest something similar can be done for D, by transcribing ~ sequences into
calls to an underlying implementation class/API. It wouldn't be hard to do, and
the restriction would just be the same for Java and C++ (using my
fast_string_concatenator<>), in that it would only work for a single statement.
Not that that's a particularly onerous restriction, of course.

The alternative is just to have a StringBuilder class, which doesn't seem too
much of a burden either.

"Regan Heath" <regan netwin.co.nz> wrote in message
news:opr9l56hws5a2sq9 digitalmars.com...
 The thread "string performance issues" by "Daniel Horn" got me thinking of
 an idea for a change to arrays.

 Bascially, concatenation can be slow, as it causes reallocations of the
 array. If you could pre-allocate the array then it wouldn't be as slow.

 I tried this:

 char[] p = "regan"

 p.length = 10;
 p ~= "fred";

 and ended up with a string containing

 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd'

 which was not what I was after :)

 I remembered a thread on arrays requesting renaming the 'length' property
 to 'reserve' or something like that, and the idea for the addition of a
 reserve property that simply allocated memory to the array without
 changing it's length came to me. If we could go:

 char[] p = "regan";

 p.reserve = 10;
 p ~= "fred";

 and end up with a string containing

 'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0'

 and a length of 9. Then we could do fast concatenation. Otherwise we're
 left writing a String class that achieves this by setting length and using
 memcpy. I thought a design goal of D was to avoid this.

 Thoughts?

 -- 
 Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

Jun 14 2004
next sibling parent reply Regan Heath <regan netwin.co.nz> writes:
On Tue, 15 Jun 2004 12:31:48 +1000, Matthew 
<admin stlsoft.dot.dot.dot.dot.org> wrote:
 It means adding another member to slices, and would complicate it in 
 other ways
 also, since one would have to distinguish between slices that own and 
 slices that
 view.

Why do you need to add another member to slices? Don't we already have to distinguish between slices that own and slices that view?
 I prefer a language based approach, similar to that for Java, whereby the
 elements in a single concatenation statement are actually appended to a
 StringBuffer. This boosts string concatenation performance enormously.

 I showed how to achieve this in a similar vein, and with similarly 
 significant
 performance benefits, for C++ in my recent (June's CUJ) article "Fast,
 Non-intrusive String Concatenation". Walter was one of the reviewers, so 
 he's au
 fait with the technique.

 I suggest something similar can be done for D, by transcribing ~ 
 sequences into
 calls to an underlying implementation class/API. It wouldn't be hard to 
 do, and
 the restriction would just be the same for Java and C++ (using my
 fast_string_concatenator<>), in that it would only work for a single 
 statement.
 Not that that's a particularly onerous restriction, of course.

 The alternative is just to have a StringBuilder class, which doesn't 
 seem too
 much of a burden either.

I thought the point of adding strings (i.e. char[]) to D was to avoid having other String classes? Regan
 "Regan Heath" <regan netwin.co.nz> wrote in message
 news:opr9l56hws5a2sq9 digitalmars.com...
 The thread "string performance issues" by "Daniel Horn" got me thinking 
 of
 an idea for a change to arrays.

 Bascially, concatenation can be slow, as it causes reallocations of the
 array. If you could pre-allocate the array then it wouldn't be as slow.

 I tried this:

 char[] p = "regan"

 p.length = 10;
 p ~= "fred";

 and ended up with a string containing

 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd'

 which was not what I was after :)

 I remembered a thread on arrays requesting renaming the 'length' 
 property
 to 'reserve' or something like that, and the idea for the addition of a
 reserve property that simply allocated memory to the array without
 changing it's length came to me. If we could go:

 char[] p = "regan";

 p.reserve = 10;
 p ~= "fred";

 and end up with a string containing

 'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0'

 and a length of 9. Then we could do fast concatenation. Otherwise we're
 left writing a String class that achieves this by setting length and 
 using
 memcpy. I thought a design goal of D was to avoid this.

 Thoughts?

 --
 Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/


-- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 14 2004
parent Stewart Gordon <smjg_1998 yahoo.com> writes:
Regan Heath wrote:
 On Tue, 15 Jun 2004 12:31:48 +1000, Matthew 
 <admin stlsoft.dot.dot.dot.dot.org> wrote:
 
 It means adding another member to slices, and would complicate it in 
 other ways
 also, since one would have to distinguish between slices that own and 
 slices that
 view.

Why do you need to add another member to slices? Don't we already have to distinguish between slices that own and slices that view?

My experiments show that arrays are allocated in powers of 2 or multiples of 2K. (At least 1D arrays of an atomic type, otherwise I haven't experimented.) There obviously is a reserve property, though looking at the ABI it must be of the allocated block rather than of the array reference itself. My experiments with slicing showed that a slice becomes a newly allocated array if the length is changed, even if it doesn't exceed the originally sliced length. But there are two cases I haven't tested yet: int[] qwert = new int[10]; int[] yuiop = qwert; qwert.length = 14; // remaining within allocated block // ... put some data in qwert yuiop.length = 13; It would follow that qwert and yuiop still have the same start address, and setting the length of yuiop would overwrite qwert[10..13]. int[] qwert = new int[10]; int[] yuiop = qwert[0..5]; yuiop.length = 8; If the distinction between an owner and a viewer is simply a matter of whether the start address is the start of the allocated block, then likewise here. Hang on ... better check if length setting and concatenation really have the same effect.... Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
Jun 15 2004
prev sibling parent reply Hauke Duden <H.NS.Duden gmx.net> writes:
Matthew wrote:
 It means adding another member to slices, and would complicate it in other ways
 also, since one would have to distinguish between slices that own and slices
that
 view.

I don't think it means that. It has already been mentioned that s.reserve(x) is the same as s.length=x; s.length=oldLength; The compiler could simply rewrite it. Note that the idea of a global template function is not quite equivalent, since the function is not in the "namespace" of the char type, but rather in global scope. Which means that multiple such functions will only overload if they are aliased. Hauke
Jun 15 2004
next sibling parent Regan Heath <regan netwin.co.nz> writes:
On Tue, 15 Jun 2004 10:24:21 +0200, Hauke Duden <H.NS.Duden gmx.net> wrote:
 Matthew wrote:
 It means adding another member to slices, and would complicate it in 
 other ways
 also, since one would have to distinguish between slices that own and 
 slices that
 view.

I don't think it means that. It has already been mentioned that s.reserve(x) is the same as s.length=x; s.length=oldLength; The compiler could simply rewrite it.

Or rather simply do the allocation the above steps would cause. The only worry I have is that if you do this..
 s.length=x;
 s.length=oldLength;

when, if ever does it free the memory you are no longer using? Because if it does it sometime in the middle of a few thousand concat operations then were almost back at square one.
 Note that the idea of a global template function is not quite 
 equivalent, since the function is not in the "namespace" of the char 
 type, but rather in global scope. Which means that multiple such 
 functions will only overload if they are aliased.

While it's a nice workaround I cannot see why we cant have a reserve property, IMO it makes sense for all array types, it's the obvious way to do it (so people looking for a way will find it), it's dead simple to change, it does not cause any problems with existing code... Regan. -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 15 2004
prev sibling parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <cambih$16cs$1 digitaldaemon.com>, Hauke Duden says...
I don't think it means that. It has already been mentioned that

s.reserve(x)

is the same as

s.length=x;
s.length=oldLength;

The compiler could simply rewrite it.

I think it's more like:
 if (x > s.length)
 {
    uint t = s.length;
    s.length = x;
    s.length = t;
 }

I'm not convinced, though, that this is defined behavior. It seems possible to me that future (or even third-party) D-compilers would be at liberty to NOT make this equivalence, unless it were actually part of the spec. Also, it seems to me that a really supercharged optimizer might decide that the above code leaves everything unchanged, and so could be removed altogether. (Optimizers don't always have the same notion as we coders of what is intended and what is merely a side-effect). Finally, note that you can do:
 s = s[0..0];

instead of:
 s = null;

To set a string's length to zero without freeing the allocated memory. But again, not only is this not documented, but the documentation actually implies that this SHOULDN'T work. There is debate over whether this is a feature or a bug. Arcane Jill PS. It occurs to me that the statement: char[] s = a ~ b ~ c ~ d ~ e ~ f ~ g ~ h; would run a lot faster if it were rewritten as: ((((((((s ~= a) ~= b) ~= c) ~= d) ~= e) ~= f) ~= g) ~= h); (Sorry about the brackets - couldn't remember the associativity order of ~=).
Jun 16 2004
next sibling parent reply Hauke Duden <H.NS.Duden gmx.net> writes:
Arcane Jill wrote:
 In article <cambih$16cs$1 digitaldaemon.com>, Hauke Duden says...
 
I don't think it means that. It has already been mentioned that

s.reserve(x)

is the same as

s.length=x;
s.length=oldLength;

The compiler could simply rewrite it.

I think it's more like:
if (x > s.length)
{
   uint t = s.length;
   s.length = x;
   s.length = t;
}

I'm not convinced, though, that this is defined behavior. It seems possible to me that future (or even third-party) D-compilers would be at liberty to NOT make this equivalence, unless it were actually part of the spec.

I agree that it should be that way. But Walter himself has recommended the length-juggling technique in this newsgroup. I don't like it either. That's why I think there should be a reserve property. The compiler is the best choice for deciding how it should be implemented. If arrays can never shrink fine, it can be implemented by changing the length property. Otherwise the compiler has to figure out the best way, depending on the implementation. If there is no reserve property then people will simply use the undocumented behaviour because it's the simplest way. Hauke
Jun 16 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <caovi5$23qt$1 digitaldaemon.com>, Hauke Duden says...

But Walter himself has recommended 
the length-juggling technique in this newsgroup.

I haven't heard Walter say that myself, but if he did, then he should know better. Nobody should EVER rely on undocumented behavior, and it would be extremely bad practice to encourage others so to do. What Walter says on this newsgroup does not constitute the D standard. Only the published manual may make that claim. Every time the DMD compiler deviates from the documentation, it ceases to be standards-compliant. (Of course, Walter has the power to change the standard as well as the implementation, but he still has to actually DO it, not just talk about doing it). There is nothing to stop anyone from writing a second D compiler tomorrow, and I do not wish to go back to the bad old days of BASIC, when every implementation of BASIC was entirely non-portable to any other because of various stupid little quirks and features unique to each. If it were officially documented that decreasing the length of a array is guaranteed not to change the address of the string, and is further guaranteed not to release any of the memory within the array's former bounds, then, and only then, could this trick be expected to work on all present and future standards-compliant D compilers. Arcane Jill (PS. Walter - sorry if this sounds rude. That wasn't my intention, I just come across that way sometimes by accident).
Jun 16 2004
parent J Anderson <REMOVEanderson badmama.com.au> writes:
Arcane Jill wrote:

In article <caovi5$23qt$1 digitaldaemon.com>, Hauke Duden says...

  

But Walter himself has recommended 
the length-juggling technique in this newsgroup.
    

I haven't heard Walter say that myself, but if he did, then he should know better. Nobody should EVER rely on undocumented behavior, and it would be extremely bad practice to encourage others so to do. What Walter says on this newsgroup does not constitute the D standard. Only the published manual may make that claim. Every time the DMD compiler deviates from the documentation, it ceases to be standards-compliant. (Of course, Walter has the power to change the standard as well as the implementation, but he still has to actually DO it, not just talk about doing it). There is nothing to stop anyone from writing a second D compiler tomorrow, and I do not wish to go back to the bad old days of BASIC, when every implementation of BASIC was entirely non-portable to any other because of various stupid little quirks and features unique to each. If it were officially documented that decreasing the length of a array is guaranteed not to change the address of the string, and is further guaranteed not to release any of the memory within the array's former bounds, then, and only then, could this trick be expected to work on all present and future standards-compliant D compilers. Arcane Jill (PS. Walter - sorry if this sounds rude. That wasn't my intention, I just come across that way sometimes by accident).

vendors can optimise there own and users can also write their own versions. -- -Anderson: http://badmama.com.au/~anderson/
Jun 16 2004
prev sibling parent Ben Hinkle <bhinkle4 juno.com> writes:
Arcane Jill wrote:

 In article <cambih$16cs$1 digitaldaemon.com>, Hauke Duden says...
I don't think it means that. It has already been mentioned that

s.reserve(x)

is the same as

s.length=x;
s.length=oldLength;

The compiler could simply rewrite it.

I think it's more like:
 if (x > s.length)
 {
    uint t = s.length;
    s.length = x;
    s.length = t;
 }

I'm not convinced, though, that this is defined behavior. It seems possible to me that future (or even third-party) D-compilers would be at liberty to NOT make this equivalence, unless it were actually part of the spec. Also, it seems to me that a really supercharged optimizer might decide that the above code leaves everything unchanged, and so could be removed altogether. (Optimizers don't always have the same notion as we coders of what is intended and what is merely a side-effect). Finally, note that you can do:
 s = s[0..0];

instead of:
 s = null;

To set a string's length to zero without freeing the allocated memory. But again, not only is this not documented, but the documentation actually implies that this SHOULDN'T work. There is debate over whether this is a feature or a bug.

The length-juggling trick also has limitations: setting the length to 0 or from 0 will wipe out the data ptr. Look in src/internal/gc/gc.d the function _d_arraysetlength. For example s.length = 10; // reserve some space s.length = 0; // sets data ptr to null s.length = 10; // reserve some more space s = s[0..0]; // preserves the data ptr s.length = 5; // allocates new data ptr
 Arcane Jill
 
 
 PS. It occurs to me that the statement:
 
 char[] s = a ~ b ~ c ~ d ~ e ~ f ~ g ~ h;
 
 would run a lot faster if it were rewritten as:
 
 ((((((((s ~= a) ~= b) ~= c) ~= d) ~= e) ~= f) ~= g) ~= h);
 
 (Sorry about the brackets - couldn't remember the associativity order of
 ~=).

Jun 16 2004
prev sibling parent reply J Anderson <REMOVEanderson badmama.com.au> writes:
Regan Heath wrote:

 The thread "string performance issues" by "Daniel Horn" got me 
 thinking of an idea for a change to arrays.

 Bascially, concatenation can be slow, as it causes reallocations of 
 the array. If you could pre-allocate the array then it wouldn't be as 
 slow.

 I tried this:

 char[] p = "regan"

 p.length = 10;
 p ~= "fred";

 and ended up with a string containing

 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd'

 which was not what I was after :)

 I remembered a thread on arrays requesting renaming the 'length' 
 property to 'reserve' or something like that, and the idea for the 
 addition of a reserve property that simply allocated memory to the 
 array without changing it's length came to me. If we could go:

 char[] p = "regan";

 p.reserve = 10;
 p ~= "fred";

 and end up with a string containing

 'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0'

 and a length of 9. Then we could do fast concatenation. Otherwise 
 we're left writing a String class that achieves this by setting length 
 and using memcpy. I thought a design goal of D was to avoid this.

 Thoughts?

char[] p = "regan" p.length = 10; p.length = 5; p ~= "fred"; as D doesn't clean up the memory straight away. -- -Anderson: http://badmama.com.au/~anderson/
Jun 14 2004
parent reply Derek Parnell <derek psych.ward> writes:
On Tue, 15 Jun 2004 10:35:20 +0800, J Anderson wrote:

 Regan Heath wrote:
 
 The thread "string performance issues" by "Daniel Horn" got me 
 thinking of an idea for a change to arrays.

 Bascially, concatenation can be slow, as it causes reallocations of 
 the array. If you could pre-allocate the array then it wouldn't be as 
 slow.

 I tried this:

 char[] p = "regan"

 p.length = 10;
 p ~= "fred";

 and ended up with a string containing

 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd'

 which was not what I was after :)

 I remembered a thread on arrays requesting renaming the 'length' 
 property to 'reserve' or something like that, and the idea for the 
 addition of a reserve property that simply allocated memory to the 
 array without changing it's length came to me. If we could go:

 char[] p = "regan";

 p.reserve = 10;
 p ~= "fred";

 and end up with a string containing

 'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0'

 and a length of 9. Then we could do fast concatenation. Otherwise 
 we're left writing a String class that achieves this by setting length 
 and using memcpy. I thought a design goal of D was to avoid this.

 Thoughts?

char[] p = "regan" p.length = 10; p.length = 5; p ~= "fred"; as D doesn't clean up the memory straight away.

How about ... char[] p = "regan" p.length = 10; p[5..9] = "fred"; -- Derek Melbourne, Australia 15/Jun/04 1:04:56 PM
Jun 14 2004
next sibling parent reply J Anderson <REMOVEanderson badmama.com.au> writes:
Derek Parnell wrote:

On Tue, 15 Jun 2004 10:35:20 +0800, J Anderson wrote:

  

Regan Heath wrote:
    

Bascially, concatenation can be slow, as it causes reallocations of 
the array. If you could pre-allocate the array then it wouldn't be as 
slow.

I tried this:

char[] p = "regan"

p.length = 10;
p ~= "fred";

and ended up with a string containing

'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd'

      

char[] p = "regan" p.length = 10; p.length = 5; p ~= "fred"; as D doesn't clean up the memory straight away.

How about ... char[] p = "regan" p.length = 10; p[5..9] = "fred";

-- -Anderson: http://badmama.com.au/~anderson/
Jun 14 2004
parent Regan Heath <regan netwin.co.nz> writes:
On Tue, 15 Jun 2004 11:15:16 +0800, J Anderson 
<REMOVEanderson badmama.com.au> wrote:
 Derek Parnell wrote:

 On Tue, 15 Jun 2004 10:35:20 +0800, J Anderson wrote:


 Regan Heath wrote:

 Bascially, concatenation can be slow, as it causes reallocations of 
 the array. If you could pre-allocate the array then it wouldn't be as 
 slow.

 I tried this:

 char[] p = "regan"

 p.length = 10;
 p ~= "fred";

 and ended up with a string containing

 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd'

char[] p = "regan" p.length = 10; p.length = 5; p ~= "fred"; as D doesn't clean up the memory straight away.

How about ... char[] p = "regan" p.length = 10; p[5..9] = "fred";


Yes, but not quite as neat as it could be, see the real problem in the thread "string performance issues" by "Daniel Horn". Regan. -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 14 2004
prev sibling parent reply Regan Heath <regan netwin.co.nz> writes:
On Tue, 15 Jun 2004 13:11:11 +1000, Derek Parnell <derek psych.ward> wrote:
 On Tue, 15 Jun 2004 10:35:20 +0800, J Anderson wrote:

 Regan Heath wrote:

 The thread "string performance issues" by "Daniel Horn" got me
 thinking of an idea for a change to arrays.

 Bascially, concatenation can be slow, as it causes reallocations of
 the array. If you could pre-allocate the array then it wouldn't be as
 slow.

 I tried this:

 char[] p = "regan"

 p.length = 10;
 p ~= "fred";

 and ended up with a string containing

 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd'

 which was not what I was after :)

 I remembered a thread on arrays requesting renaming the 'length'
 property to 'reserve' or something like that, and the idea for the
 addition of a reserve property that simply allocated memory to the
 array without changing it's length came to me. If we could go:

 char[] p = "regan";

 p.reserve = 10;
 p ~= "fred";

 and end up with a string containing

 'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0'

 and a length of 9. Then we could do fast concatenation. Otherwise
 we're left writing a String class that achieves this by setting length
 and using memcpy. I thought a design goal of D was to avoid this.

 Thoughts?

char[] p = "regan" p.length = 10; p.length = 5; p ~= "fred"; as D doesn't clean up the memory straight away.

How about ... char[] p = "regan" p.length = 10; p[5..9] = "fred";

This works, but see the other thread "string performance issues" for an example of the real problem. Regan -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 14 2004
parent reply J Anderson <REMOVEanderson badmama.com.au> writes:
Regan Heath wrote:

 On Tue, 15 Jun 2004 13:11:11 +1000, Derek Parnell <derek psych.ward> 
 wrote:

 On Tue, 15 Jun 2004 10:35:20 +0800, J Anderson wrote:

 Regan Heath wrote:

 The thread "string performance issues" by "Daniel Horn" got me
 thinking of an idea for a change to arrays.

 Bascially, concatenation can be slow, as it causes reallocations of
 the array. If you could pre-allocate the array then it wouldn't be as
 slow.

 I tried this:

 char[] p = "regan"

 p.length = 10;
 p ~= "fred";

 and ended up with a string containing

 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd'

 which was not what I was after :)

 I remembered a thread on arrays requesting renaming the 'length'
 property to 'reserve' or something like that, and the idea for the
 addition of a reserve property that simply allocated memory to the
 array without changing it's length came to me. If we could go:

 char[] p = "regan";

 p.reserve = 10;
 p ~= "fred";

 and end up with a string containing

 'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0'

 and a length of 9. Then we could do fast concatenation. Otherwise
 we're left writing a String class that achieves this by setting length
 and using memcpy. I thought a design goal of D was to avoid this.

 Thoughts?

char[] p = "regan" p.length = 10; p.length = 5; p ~= "fred"; as D doesn't clean up the memory straight away.

How about ... char[] p = "regan" p.length = 10; p[5..9] = "fred";

This works, but see the other thread "string performance issues" for an example of the real problem. Regan

what you'd do in C. Then after woods you simply trim the array to the size you really want (or use length = block; length = 0; beforehand). I don't think strings should be treated as a specific type of array. Whatever applies to char array should also apply to every other type of array, except for the automatic conversion to zero terminate arrays of course. Adding a reserve property could increase the string overhead, unless all it did was: template reserveT(T) { void reserve(inout char [] array, uint length) { int oldlen = array.length; array.length = length; array.length = oldlen; } } alias reserveT!(char).reserve reserve; Hay, what do you know - I just solved your problem *grin*. Now you can write: array.reserve(10); -- -Anderson: http://badmama.com.au/~anderson/
Jun 14 2004
next sibling parent reply "Ivan Senji" <ivan.senji public.srce.hr> writes:
"J Anderson" <REMOVEanderson badmama.com.au> wrote in message
news:calte9$fm0$1 digitaldaemon.com...
 Regan Heath wrote:

 On Tue, 15 Jun 2004 13:11:11 +1000, Derek Parnell <derek psych.ward>
 wrote:

 On Tue, 15 Jun 2004 10:35:20 +0800, J Anderson wrote:

 Regan Heath wrote:

 The thread "string performance issues" by "Daniel Horn" got me
 thinking of an idea for a change to arrays.

 Bascially, concatenation can be slow, as it causes reallocations of
 the array. If you could pre-allocate the array then it wouldn't be as
 slow.

 I tried this:

 char[] p = "regan"

 p.length = 10;
 p ~= "fred";

 and ended up with a string containing

 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd'

 which was not what I was after :)

 I remembered a thread on arrays requesting renaming the 'length'
 property to 'reserve' or something like that, and the idea for the
 addition of a reserve property that simply allocated memory to the
 array without changing it's length came to me. If we could go:

 char[] p = "regan";

 p.reserve = 10;
 p ~= "fred";

 and end up with a string containing

 'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0'

 and a length of 9. Then we could do fast concatenation. Otherwise
 we're left writing a String class that achieves this by setting





 and using memcpy. I thought a design goal of D was to avoid this.

 Thoughts?

char[] p = "regan" p.length = 10; p.length = 5; p ~= "fred"; as D doesn't clean up the memory straight away.

How about ... char[] p = "regan" p.length = 10; p[5..9] = "fred";

This works, but see the other thread "string performance issues" for an example of the real problem. Regan

what you'd do in C. Then after woods you simply trim the array to the size you really want (or use length = block; length = 0; beforehand). I don't think strings should be treated as a specific type of array. Whatever applies to char array should also apply to every other type of array, except for the automatic conversion to zero terminate arrays of course. Adding a reserve property could increase the string overhead, unless all it did was: template reserveT(T) { void reserve(inout char [] array, uint length)

Did you mean to write: void reserve(inout T [] array, uint length) :)
    {
        int oldlen = array.length;
        array.length = length;
        array.length = oldlen;
    }
 }

 alias reserveT!(char).reserve reserve;

 Hay, what do you know - I just solved your problem *grin*.

 Now you can write:

 array.reserve(10);

Nice :)
 --
 -Anderson: http://badmama.com.au/~anderson/

Jun 14 2004
parent J Anderson <REMOVEanderson badmama.com.au> writes:
Ivan Senji wrote:

"J Anderson" <REMOVEanderson badmama.com.au> wrote in message
news:calte9$fm0$1 digitaldaemon.com...
  

Regan Heath wrote:

    

On Tue, 15 Jun 2004 13:11:11 +1000, Derek Parnell <derek psych.ward>
wrote:

      

On Tue, 15 Jun 2004 10:35:20 +0800, J Anderson wrote:

        

Regan Heath wrote:

          

The thread "string performance issues" by "Daniel Horn" got me
thinking of an idea for a change to arrays.

Bascially, concatenation can be slow, as it causes reallocations of
the array. If you could pre-allocate the array then it wouldn't be as
slow.

I tried this:

char[] p = "regan"

p.length = 10;
p ~= "fred";

and ended up with a string containing

'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd'

which was not what I was after :)

I remembered a thread on arrays requesting renaming the 'length'
property to 'reserve' or something like that, and the idea for the
addition of a reserve property that simply allocated memory to the
array without changing it's length came to me. If we could go:

char[] p = "regan";

p.reserve = 10;
p ~= "fred";

and end up with a string containing

'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0'

and a length of 9. Then we could do fast concatenation. Otherwise
we're left writing a String class that achieves this by setting
            





and using memcpy. I thought a design goal of D was to avoid this.

Thoughts?

            

char[] p = "regan" p.length = 10; p.length = 5; p ~= "fred"; as D doesn't clean up the memory straight away.

char[] p = "regan" p.length = 10; p[5..9] = "fred";

an example of the real problem. Regan

what you'd do in C. Then after woods you simply trim the array to the size you really want (or use length = block; length = 0; beforehand). I don't think strings should be treated as a specific type of array. Whatever applies to char array should also apply to every other type of array, except for the automatic conversion to zero terminate arrays of course. Adding a reserve property could increase the string overhead, unless all it did was: template reserveT(T) { void reserve(inout char [] array, uint length)

Did you mean to write: void reserve(inout T [] array, uint length) :)

Good catch.
  

   {
       int oldlen = array.length;
       array.length = length;
       array.length = oldlen;
   }
}

alias reserveT!(char).reserve reserve;

Hay, what do you know - I just solved your problem *grin*.

Now you can write:

array.reserve(10);

    

Nice :)
--
-Anderson: http://badmama.com.au/~anderson/
    


-- -Anderson: http://badmama.com.au/~anderson/
Jun 14 2004
prev sibling parent reply Regan Heath <regan netwin.co.nz> writes:
On Tue, 15 Jun 2004 12:20:31 +0800, J Anderson 
<REMOVEanderson badmama.com.au> wrote:

 Regan Heath wrote:

 On Tue, 15 Jun 2004 13:11:11 +1000, Derek Parnell <derek psych.ward> 
 wrote:

 On Tue, 15 Jun 2004 10:35:20 +0800, J Anderson wrote:

 Regan Heath wrote:

 The thread "string performance issues" by "Daniel Horn" got me
 thinking of an idea for a change to arrays.

 Bascially, concatenation can be slow, as it causes reallocations of
 the array. If you could pre-allocate the array then it wouldn't be as
 slow.

 I tried this:

 char[] p = "regan"

 p.length = 10;
 p ~= "fred";

 and ended up with a string containing

 'r' 'e' 'g' 'a' 'n' '0' '0' '0' '0' '0' 'f' 'r' 'e' 'd'

 which was not what I was after :)

 I remembered a thread on arrays requesting renaming the 'length'
 property to 'reserve' or something like that, and the idea for the
 addition of a reserve property that simply allocated memory to the
 array without changing it's length came to me. If we could go:

 char[] p = "regan";

 p.reserve = 10;
 p ~= "fred";

 and end up with a string containing

 'r' 'e' 'g' 'a' 'n' 'f' 'r' 'e' 'd' '0'

 and a length of 9. Then we could do fast concatenation. Otherwise
 we're left writing a String class that achieves this by setting 
 length
 and using memcpy. I thought a design goal of D was to avoid this.

 Thoughts?

char[] p = "regan" p.length = 10; p.length = 5; p ~= "fred"; as D doesn't clean up the memory straight away.

How about ... char[] p = "regan" p.length = 10; p[5..9] = "fred";

This works, but see the other thread "string performance issues" for an example of the real problem. Regan

what you'd do in C.

Yes, the poster was simply stating that it should be easier than that in D, after all that is the point of having a concatenation operator.
 Then after woods you simply trim the array to the size you really want 
 (or use length = block; length = 0; beforehand).

 I don't think strings should be treated as a specific type of array.

Neither do I, nor was I suggesting that. I want this property on all arrays, the example I gave used a char[] as that is what the original poster wanted it for.
 Whatever applies to char array should also apply to every other type of 
 array, except for the automatic conversion to zero terminate arrays of 
 course.

Agreed.
 Adding a reserve property could increase the string overhead, unless all 
 it did was:

 template reserveT(T)
 {
    void reserve(inout char [] array, uint length)
    {
        int oldlen = array.length;
        array.length = length;
        array.length = oldlen;
    }
 }

 alias reserveT!(char).reserve reserve;

 Hay, what do you know - I just solved your problem *grin*.

 Now you can write:

 array.reserve(10);

This should work. :) So why not add this to arrays then? Let me re-iterate what I am suggesting... I think all arrays should have a property which you can use to specify the # of elements worth of memory for it to grab right now, this should be different to the length property which specifies the number of elements in the array. Example.. char[] p; p.length = 10; the above adds 10 elements to the array, initializing them to the default value. This is the current behaviour. char[] p; p.reserve = 10; the above grabs enough memory for 10 elements, but does not add/initialize any. Regan. -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 14 2004
parent reply J Anderson <REMOVEanderson badmama.com.au> writes:
Regan Heath wrote:

 Yes, the poster was simply stating that it should be easier than that 
 in D, after all that is the point of having a concatenation operator.

 Then after woods you simply trim the array to the size you really 
 want (or use length = block; length = 0; beforehand).

 I don't think strings should be treated as a specific type of array.

Neither do I, nor was I suggesting that. I want this property on all arrays, the example I gave used a char[] as that is what the original poster wanted it for.
 Whatever applies to char array should also apply to every other type 
 of array, except for the automatic conversion to zero terminate 
 arrays of course.

Agreed.
 Adding a reserve property could increase the string overhead, unless 
 all it did was:

 template reserveT(T)
 {
    void reserve(inout char [] array, uint length)
    {
        int oldlen = array.length;
        array.length = length;
        array.length = oldlen;
    }
 }

 alias reserveT!(char).reserve reserve;

 Hay, what do you know - I just solved your problem *grin*.

 Now you can write:

 array.reserve(10);

This should work. :) So why not add this to arrays then? Let me re-iterate what I am suggesting... I think all arrays should have a property which you can use to specify the # of elements worth of memory for it to grab right now, this should be different to the length property which specifies the number of elements in the array.

You can make your own class to do that. The reason against that approach is that it requires more overhead (requiring both memory and CPU). Having the most primitive array type is essential because then we can do things like you suggest. If this overhead was included in the primitive type then we would have to lug it around even when we don't need the overhead. Most of the time you can write your program so that you don't even need reserve tricks for efficiency by determining the amount of memory nessary. An array with a reserve would look like: struct array { uint length; uint reserve; T * pointer } compare that to: struct array { uint length; T * pointer } Not to mention all the extra operations to maintain the reserved size. Its not to hard to write an array class that has a reserve (10 lines or so), but it would be much harder the other way round.
 Example..

 char[] p;
 p.length = 10;

 the above adds 10 elements to the array, initializing them to the 
 default value.
 This is the current behaviour.

 char[] p;
 p.reserve = 10;

 the above grabs enough memory for 10 elements, but does not 
 add/initialize any.

Initialization is pretty cheap for arrays. That is initialization only take a fraction of the time it requires to allocate the memory for the array. So whatever the time saved with initialization, the user will most probably not notice it.
 Regan.

All this talk just makes me like D arrays even more. -- -Anderson: http://badmama.com.au/~anderson/
Jun 15 2004
next sibling parent reply J Anderson <REMOVEanderson badmama.com.au> writes:
J Anderson wrote:

 This should work. :)

 So why not add this to arrays then?


include it). I probably didn't outline that enough in the previous email. That way different vendors can optimise the reserve to what works best for their program and individuals can overload it with there own optimised versions. -- -Anderson: http://badmama.com.au/~anderson/
Jun 15 2004
next sibling parent "Ivan Senji" <ivan.senji public.srce.hr> writes:
"J Anderson" <REMOVEanderson badmama.com.au> wrote in message
news:cam7n9$vbv$2 digitaldaemon.com...
 J Anderson wrote:

 This should work. :)

 So why not add this to arrays then?



I agree. D lets us add "properties" to arrays so why not show how this feature can be useful!
 include it).  I probably didn't outline that enough in the previous
 email.  That way different vendors can optimise the reserve to what
 works best for their program and individuals can overload it with there
 own optimised versions.

 --
 -Anderson: http://badmama.com.au/~anderson/

Jun 15 2004
prev sibling parent Regan Heath <regan netwin.co.nz> writes:
On Tue, 15 Jun 2004 15:15:59 +0800, J Anderson 
<REMOVEanderson badmama.com.au> wrote:

 J Anderson wrote:

 This should work. :)

 So why not add this to arrays then?


include it). I probably didn't outline that enough in the previous email. That way different vendors can optimise the reserve to what works best for their program and individuals can overload it with there own optimised versions.

FYI Walter mentioned that std.outbuffer has a struct/class for doing what I want also. Regan. -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 15 2004
prev sibling parent reply Regan Heath <regan netwin.co.nz> writes:
On Tue, 15 Jun 2004 15:12:59 +0800, J Anderson 
<REMOVEanderson badmama.com.au> wrote:
 Regan Heath wrote:

 Yes, the poster was simply stating that it should be easier than that 
 in D, after all that is the point of having a concatenation operator.

 Then after woods you simply trim the array to the size you really want 
 (or use length = block; length = 0; beforehand).

 I don't think strings should be treated as a specific type of array.

Neither do I, nor was I suggesting that. I want this property on all arrays, the example I gave used a char[] as that is what the original poster wanted it for.
 Whatever applies to char array should also apply to every other type 
 of array, except for the automatic conversion to zero terminate arrays 
 of course.

Agreed.
 Adding a reserve property could increase the string overhead, unless 
 all it did was:

 template reserveT(T)
 {
    void reserve(inout char [] array, uint length)
    {
        int oldlen = array.length;
        array.length = length;
        array.length = oldlen;
    }
 }

 alias reserveT!(char).reserve reserve;

 Hay, what do you know - I just solved your problem *grin*.

 Now you can write:

 array.reserve(10);

This should work. :) So why not add this to arrays then? Let me re-iterate what I am suggesting... I think all arrays should have a property which you can use to specify the # of elements worth of memory for it to grab right now, this should be different to the length property which specifies the number of elements in the array.

You can make your own class to do that.

I know I can, I have.
 The reason against that approach is that it requires more overhead 
 (requiring both memory and CPU).

I dont think so. For example current array behaviour if you go. a.length = 1000; a.length = 1; is to allocate space for 1000 entries and not free it. Meaning the array already stores (or calculates) it's size in memory in a seperate place to the length. Meaning all the reserve property needs to do, is the same thing the above 2 lines already does. With one possible exception, I am not sure when/if the array frees the un-used memory, so it's possible that after those lines sometime in the middle of 1000 or so concatentation ops it decides to free the excess and I'd be back at square one (almost).
 Having the most primitive array type is essential because then we can do 
 things like you suggest.  If this overhead was included in the primitive 
 type then we would have to lug it around even when we don't need the 
 overhead.  Most of the time you can write your program so that you don't 
 even need reserve tricks for efficiency by determining the amount of 
 memory nessary.

I agree totally, I don't believe my suggestion adds any overhead whatsoever.
 An array with a reserve would look like:

 struct array
 {
     uint length;
     uint reserve;
     T * pointer }

 compare that to:

 struct array
 {
     uint length;
     T * pointer }

 Not to mention all the extra operations to maintain the reserved size.

I think you'll find they're already there in some form as arrays already display the behaviour I want, just not in an easy to use/understand/find form. It looks like a hack currently.
 Its not to hard to write an array class that has a reserve (10 lines or 
 so), but it would be much harder the other way round.
 Example..

 char[] p;
 p.length = 10;

 the above adds 10 elements to the array, initializing them to the 
 default value.
 This is the current behaviour.

 char[] p;
 p.reserve = 10;

 the above grabs enough memory for 10 elements, but does not 
 add/initialize any.

Initialization is pretty cheap for arrays. That is initialization only take a fraction of the time it requires to allocate the memory for the array. So whatever the time saved with initialization, the user will most probably not notice it.

Everything counts if you do it enough times. Why do something you do not need?
 Regan.

All this talk just makes me like D arrays even more.

I like em too. Writing the class/struct to give me this behaviour was trivial, easier than in C/C++. But I still believe this change is a good one, it has pro's and no cons. Regan. -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 15 2004
parent J Anderson <REMOVEanderson badmama.com.au> writes:
Regan Heath wrote:

 On Tue, 15 Jun 2004 15:12:59 +0800, J Anderson 
 <REMOVEanderson badmama.com.au> wrote:

 Regan Heath wrote:

 Yes, the poster was simply stating that it should be easier than 
 that in D, after all that is the point of having a concatenation 
 operator.

 Then after woods you simply trim the array to the size you really 
 want (or use length = block; length = 0; beforehand).

 I don't think strings should be treated as a specific type of array.

Neither do I, nor was I suggesting that. I want this property on all arrays, the example I gave used a char[] as that is what the original poster wanted it for.
 Whatever applies to char array should also apply to every other 
 type of array, except for the automatic conversion to zero 
 terminate arrays of course.

Agreed.
 Adding a reserve property could increase the string overhead, 
 unless all it did was:

 template reserveT(T)
 {
    void reserve(inout char [] array, uint length)
    {
        int oldlen = array.length;
        array.length = length;
        array.length = oldlen;
    }
 }

 alias reserveT!(char).reserve reserve;

 Hay, what do you know - I just solved your problem *grin*.

 Now you can write:

 array.reserve(10);

This should work. :) So why not add this to arrays then? Let me re-iterate what I am suggesting... I think all arrays should have a property which you can use to specify the # of elements worth of memory for it to grab right now, this should be different to the length property which specifies the number of elements in the array.

You can make your own class to do that.

I know I can, I have.
 The reason against that approach is that it requires more overhead 
 (requiring both memory and CPU).

I dont think so. For example current array behaviour if you go. a.length = 1000; a.length = 1; is to allocate space for 1000 entries and not free it. Meaning the array already stores (or calculates) it's size in memory in a seperate place to the length. Meaning all the reserve property needs to do, is the same thing the above 2 lines already does. With one possible exception, I am not sure when/if the array frees the un-used memory, so it's possible that after those lines sometime in the middle of 1000 or so concatentation ops it decides to free the excess and I'd be back at square one (almost).

I already mentioned that way of doing it without overhead. After some time though, that memory will be re-claimed, which I don't mind.
 Its not to hard to write an array class that has a reserve (10 lines 
 or so), but it would be much harder the other way round.

 Example..

 char[] p;
 p.length = 10;

 the above adds 10 elements to the array, initializing them to the 
 default value.
 This is the current behaviour.

 char[] p;
 p.reserve = 10;

 the above grabs enough memory for 10 elements, but does not 
 add/initialize any.

Initialization is pretty cheap for arrays. That is initialization only take a fraction of the time it requires to allocate the memory for the array. So whatever the time saved with initialization, the user will most probably not notice it.

Everything counts if you do it enough times. Why do something you do not need?

Because the more rules you have the more complex things become. -- -Anderson: http://badmama.com.au/~anderson/
Jun 16 2004