D - .length modification question

John Boucher (12/12) Sep 19 2003 If

J Anderson (18/31) Sep 19 2003 It's not necessarily a poor design decision. There's been quite a bit of

John Boucher (2/91) Sep 19 2003

Charles Sanders (6/111) Sep 19 2003 You guess you'll stick with C#, what is that supposed to mean ?
Andrew Edwards (4/5) Sep 19 2003 You still here? Don't let the door hit you in the ass on the way out!

Helmut Leitner (54/96) Sep 19 2003 Is it really "to discourage bad programming"?

J Anderson (12/114) Sep 20 2003 The reserve thing was debated in much detail as well. I think there
Walter (5/5) Sep 22 2003 Actually, the implementation of D arrays does have a bit of a 'reserve',

J C Calvarese (7/12) Sep 22 2003 But how do you keep up with the currently-filled length using this

Walter (7/18) Sep 23 2003 is

Hauke Duden (15/18) Sep 23 2003 Hmmm. Does that mean that the amount of memory used up by arrays can

Antti Sykäri (9/20) Sep 23 2003 reserve(int i)
Sean L. Palmer (10/27) Sep 24 2003 to

Hauke Duden (4/7) Sep 24 2003 But what happens if the GC does this in the first iteration of the loop?...

Sean L. Palmer (8/15) Sep 24 2003 What, you want cake and want to eat it too? ;)

Hauke Duden (12/15) Sep 24 2003

J Anderson (5/51) Sep 25 2003 What about other platforms. It may not always be the case that a reserve...

J C Calvarese (3/7) Sep 23 2003 OK. That sounds like just the kind of feature I was requesting. Thanks...

Helmut Leitner (72/79) Sep 23 2003 This will help little. Let's assume someone decides to read a file

Walter (23/59) Sep 23 2003 No. For each line, maybe 2 to 4 times.

Helmut Leitner (23/37) Sep 23 2003 It may not be that simple, because I would hope that

Helmut Leitner (7/19) Sep 23 2003 This should read:

Julio César Carrascal Urquijo (9/14) Sep 23 2003 A question: Does the GC respects any number of buckets allocated like th...

Walter (4/13) Dec 10 2003 this?

Vathix (7/12) Sep 23 2003 fill

Walter (4/16) Sep 23 2003 is

Riccardo De Agostini (6/14) Sep 23 2003 You have my vote! BTW, what if length grows above reserve? Should there ...

John Boucher <John_member pathlink.com> writes:

If 
args.length = args.length - 1 ;
is OK, why do
args.length -= 1 ;
and
args.length-- ;
produce the compilation error
'args.length' is not an lvalue
?

I hope it's a bug rather than a (poor) design decision.

John Boucher
The King had Humpty pushed.

Sep 19 2003

J Anderson <anderson badmama.com.au.REMOVE> writes:

John Boucher wrote:

If 
args.length = args.length - 1 ;
is OK, why do
args.length -= 1 ;
and
args.length-- ;
produce the compilation error
'args.length' is not an lvalue
?

I hope it's a bug rather than a (poor) design decision.

John Boucher
The King had Humpty pushed.
  

It's not necessarily a poor design decision.  There's been quite a bit of 
debate about this.  It's really to keep programmers from doing:

for (int n=0; n<x; n++)
{
    args.length++; //or args.length--;
    ...
}

which is much less efficient than:

args.length = args.length + x;
for (int n=0; n<x; n++)
{
    ...
}

As I see it, most of the time, you should be increasing/decreasing an 
array size in large blocks. Code should very rarely need to increase an 
array size by one.  The longer syntax is to discourage bad programming.
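
For illustration, here is a minimal sketch of the two patterns as complete functions. It assumes a current D compiler and standard library rather than the 2003 ones discussed in this thread, and the names `growOneByOne` and `growOnce` are made up for the example. Only the long `length = length + ...` form is used.

```d
import std.stdio;

// Grows the array by one element per iteration: one resize request each time.
int[] growOneByOne(int x)
{
    int[] a;
    for (int n = 0; n < x; n++)
    {
        a.length = a.length + 1;
        a[n] = n;
    }
    return a;
}

// Grows the array once up front, then only fills it.
int[] growOnce(int x)
{
    int[] a;
    a.length = a.length + x;
    for (int n = 0; n < x; n++)
        a[n] = n;
    return a;
}

void main()
{
    // Both produce the same contents; they differ in how often length is set.
    assert(growOneByOne(1000) == growOnce(1000));
    writeln(growOnce(5));
}
```

How large the cost difference is depends on how the runtime allocates, which is the subject of the rest of the thread.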

-Anderson

Sep 19 2003

John Boucher <John_member pathlink.com> writes:



In article <bkfq6p$5mt$1 digitaldaemon.com>, J Anderson says...

Sep 19 2003

"Charles Sanders" <sanders-consulting comcast.net> writes:



You guess you'll stick with C#, what is that supposed to mean?

C

"John Boucher" <John_member pathlink.com> wrote in message
news:bkfr7m$8qm$1 digitaldaemon.com...


 In article <bkfq6p$5mt$1 digitaldaemon.com>, J Anderson says...

Sep 19 2003

"Andrew Edwards" <edwardsac spamfreeusa.com> writes:

"John Boucher" <John_member pathlink.com> wrote in message
news:bkfr7m$8qm$1 digitaldaemon.com...


You still here? Don't let the door hit you in the ass on the way out!

Andrew

Sep 19 2003

Helmut Leitner <helmut.leitner chello.at> writes:

 J Anderson wrote:
 As I see it, most of the time, you should be increasing/decreasing an array
 size in large blocks. Code should very rarely need to increase an array size
 by one.  The longer syntax is to discourage bad programming.

Is it really "to discourage bad programming"?

I would have assumed that this is an arbitrary implementation detail.

Any programming language that is sufficiently complex will have all
doors open for "bad programming" and there is no way to stop this.
Grandmothering was never C's style, at least. Is it in D?

==

One idea that haunts me is the Java String/StringBuffer problem, which
is somehow reflected in the D OutBuffer class. 

It basically means that there are situations where you want to reserve 
space for an array which will then grow and shrink below this limit 
without reallocations.

In Java - and currently in D - you need to create special classes for
this purpose.

If every array had a .reserve field (Walter's C++ Array class even has one),
lots of ways would open up for efficient programming.
And we could drop the OutBuffer class completely.

We could also write
   args.reserve=100;
   args.length++;
without being inefficient.

The cost of 4 bytes per array seems steeper than it is. It would only hurt
with small strings. Even with them (allocated with N*16 bytes) the 
effect would be small.

On the positive side, you could just write:

   alias char [] string; 
   string buffer;
   buffer.reserve=100000;
   foreach(file; sourcefiles) {
      FileGetStr(file,buffer);
      ...
   }

without any reallocation inefficiencies in typical conditions.

This would also go a long way towards solving the problem
of efficient formatted output, which is still hindered by the
fact that string reallocation is unavoidable. 

With a .reserve you could just write
   string s;
   s.length=100;
   StrFormat(s,"Name=",name);
   StrCatFormat(s,"Age=%d",age);
so that you can handle this like a normal string, while
no reallocation or object creation has to happen inside.

====

Therefore, please consider the suggestion to add a property
   .reserve
to the array type in this way:
   -  reserve can be read and set like length
   -  reallocations happen according to max(reserve,length)
advantage:
   -  if length changes below or equal reserve,
      no reallocations need to happen.
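
The proposed rule can be sketched as a small wrapper type. This is only an illustration of the semantics listed above, assuming a current D compiler: `ReservedArray`, its `resizes` counter, and `fit` are hypothetical names, not part of D or its library.

```d
import std.stdio;

// Hypothetical wrapper: the backing storage always has
// max(reserve, length) elements, as in the proposal above.
struct ReservedArray(T)
{
    private T[] store;      // backing storage
    private size_t len;     // elements in use
    private size_t res;     // requested minimum allocation
    size_t resizes;         // how often the backing storage changed size

    @property size_t length() const { return len; }
    @property void length(size_t n) { len = n; fit(); }

    @property size_t reserve() const { return res; }
    @property void reserve(size_t n) { res = n; fit(); }

    // The elements currently in use.
    T[] opSlice() { return store[0 .. len]; }

    private void fit()
    {
        immutable want = len > res ? len : res;
        if (store.length != want)
        {
            store.length = want;
            ++resizes;
        }
    }
}

void main()
{
    ReservedArray!char s;
    s.reserve = 100;                 // the only resize of the backing storage
    foreach (i; 0 .. 100)
        s.length = s.length + 1;     // stays within the reserve
    writeln(s.length, " ", s.resizes);   // prints "100 1"
}
```

The wrapper pays for this with two extra fields and a comparison on every change of length.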

-- 
Helmut Leitner    leitner hls.via.at
Graz, Austria   www.hls-software.com

Sep 19 2003

J Anderson <anderson badmama.com.au.REMOVE> writes:

Helmut Leitner wrote:

  

If every array had a .reserve field (Walter's C++ Array class even has one),
lots of ways would open up for efficient programming.
And we could drop the OutBuffer class completely.

  

C++'s vector class does also.

The reserve thing was debated in much detail as well. I think there 
were nine or ten different "reserve" techniques offered.  Actually a 
couple of "reserve" techniques required no extra memory at all.  
However, they all had overhead (not just memory overhead).  There's an 
extra check at each resize.  The resulting consensus was to have another 
type, defined in the standard lib. 

A programmer should be given the option of using a reserve or not. 
That's exactly what leaving it out of the standard D array does because 
they can use the standard lib one, or any other allocator scheme they like.

-Anderson

Sep 20 2003

"Walter" <walter digitalmars.com> writes:

Actually, the implementation of D arrays does have a bit of a 'reserve',
since the garbage collector allocates by using power of 2 buckets. This is
an implementation detail, though. It still makes for much more efficient
code to set the .length to some reasonably expected reserve value, then fill
up the array, then reset the .length to the final size.
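
A minimal sketch of that pattern, assuming a current D compiler. The function name `collectEvens` and the choice of the input length as the "reasonably expected" size are made up for this example.

```d
import std.stdio;

// Keeps the even values of input. The result is sized once up front,
// filled, and then trimmed to the number of elements actually written.
int[] collectEvens(const int[] input)
{
    int[] result;
    result.length = input.length;   // generous guess: at most this many results
    size_t filled = 0;              // how much of result is actually in use
    foreach (v; input)
    {
        if (v % 2 == 0)
            result[filled++] = v;
    }
    result.length = filled;         // reset .length to the final size
    return result;
}

void main()
{
    writeln(collectEvens([1, 2, 3, 4, 6]));   // prints "[2, 4, 6]"
}
```

Note that the filled count has to be tracked in a separate variable while the array is being filled.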

Sep 22 2003

J C Calvarese <jcc7 cox.net> writes:

Walter wrote:
 Actually, the implementation of D arrays does have a bit of a 'reserve',
 since the garbage collector allocates by using power of 2 buckets. This is
 an implementation detail, though. It still makes for much more efficient
 code to set the .length to some reasonably expected reserve value, then fill
 up the array, then reset the .length to the final size.

But how do you keep up with the currently-filled length using this 
method?  I'm guessing you have an extra variable whose use may not be 
obvious.  I'd rather use .length for the current length and use .reserve 
to set a reasonable expected length.  I think Helmut's idea makes a lot 
of sense.  Would it be difficult to implement?

Justin

Sep 22 2003

"Walter" <walter digitalmars.com> writes:

"J C Calvarese" <jcc7 cox.net> wrote in message
news:bko3oe$l8q$1 digitaldaemon.com...
 Walter wrote:
 Actually, the implementation of D arrays does have a bit of a 'reserve',
 since the garbage collector allocates by using power of 2 buckets. This is
 an implementation detail, though. It still makes for much more efficient
 code to set the .length to some reasonably expected reserve value, then fill
 up the array, then reset the .length to the final size.

 But how do you keep up with the currently-filled length using this
 method?  I'm guessing you have an extra variable whose use may not be
 obvious.  I'd rather use .length for the current length and use .reserve
 to set a reasonable expected length.  I think Helmut's idea makes a lot
 of sense.  Would it be difficult to implement?

After setting .length to the max reasonable size, then set .length back to
zero. You now have a reserve that is something like the max reasonable size
rounded up to the next bucket size.

Sep 23 2003

Hauke Duden <H.NS.Duden gmx.net> writes:

Walter wrote:
 After setting .length to the max reasonable size, then set .length back to
 zero. You now have a reserve that is something like the max reasonable size
 rounded up to the next bucket size.

Hmmm. Does that mean that the amount of memory used up by arrays can 
never decrease?

I think you should at least leave the door open for later optimization 
of the array memory handling so that some of the memory is freed if the 
used length is below a certain threshold.

That would mean, though, that the kind of reserving memory that you 
proposed may not work in the future. Increasing the length and then 
decreasing it again also looks a bit like a dirty hack - the semantics 
depend on the internal compiler architecture. Nothing I would want to 
use and certainly nothing that is easy to understand for an uninitiated 
programmer reading someone else's code.

Is there a specific reason why you don't want to expose the reserve 
field? It looks like a much cleaner solution to me.

Hauke

Sep 23 2003

Antti Sykäri <jsykari gamma.hut.fi> writes:

In article <bkq1lf$r5v$1 digitaldaemon.com>, Hauke Duden wrote:
 Walter wrote:
 After setting .length to the max reasonable size, then set .length back to
 zero. You now have a reserve that is something like the max reasonable size
 rounded up to the next bucket size.

 
 That would mean, though, that the kind of reserving memory that you 
 proposed may not work in the future. Increasing the length and then 
 decreasing it again also looks a bit like a dirty hack - the semantics 
 depend on the internal compiler architecture.

 Is there a specific reason why you don't want to expose the reserve 
 field? It looks like a much cleaner solution to me.

reserve(int i)
{
    int old_length = length;
    length = i;
    length = old_length;
}

Voilà! A self-documenting, reusable solution.

-Antti

Sep 23 2003

"Sean L. Palmer" <palmer.sean verizon.net> writes:

"Hauke Duden" <H.NS.Duden gmx.net> wrote in message
news:bkq1lf$r5v$1 digitaldaemon.com...
 Walter wrote:
 After setting .length to the max reasonable size, then set .length back to
 zero. You now have a reserve that is something like the max reasonable size
 rounded up to the next bucket size.

 Hmmm. Does that mean that the amount of memory used up by arrays can
 never decrease?

 I think you should at least leave the door open for later optimization
 of the array memory handling so that some of the memory is freed if the
 used length is below a certain threshold.

I suppose the GC might (if fed enough info) know that a block holds an
array, know where to get the size, and free up blocks beyond the power of
two above the current use.

 That would mean, though, that the kind of reserving memory that you
 proposed may not work in the future. Increasing the length and then
 decreasing it again also looks a bit like a dirty hack - the semantics
 depend on the internal compiler architecture. Nothing I would want to
 use and certainly nothing that is easy to understand for an uninitiated
 programmer reading someone else's code.

 Is there a specific reason why you don't want to expose the reserve
 field? It looks like a much cleaner solution to me.

Yes, I dislike depending on implicit semantics.  Walter, are you going to
set this reserve behavior in stone, or is it unspecified?

Sean

Sep 24 2003

Hauke Duden <H.NS.Duden gmx.net> writes:

Sean L. Palmer wrote:
 I suppose the GC might (if fed enough info) know that a block holds an
 array, know where to get the size, and free up blocks beyond the power of
 two above the current use.

But what happens if the GC does this in the first iteration of the loop? 
Then we lose our "reservation" and end up having the inefficient case again.

Hauke

Sep 24 2003

"Sean L. Palmer" <palmer.sean verizon.net> writes:

What, you want cake and want to eat it too?  ;)

I don't have an answer at present.  D could store the reserved size;  this
would take a bit of space, nothing that would kill anyone.

Sean

"Hauke Duden" <H.NS.Duden gmx.net> wrote in message
news:bkrml2$4a7$2 digitaldaemon.com...
 Sean L. Palmer wrote:
 I suppose the GC might (if fed enough info) know that a block holds an
 array, know where to get the size, and free up blocks beyond the power of
 two above the current use.

 But what happens if the GC does this in the first iteration of the loop?
 Then we lose our "reservation" and end up having the inefficient case again.

 Hauke

Sep 24 2003

Hauke Duden <H.NS.Duden gmx.net> writes:

Sean L. Palmer wrote:
 What, you want cake and want to eat it too?  ;)

<homer>

Mmmmmh. Caaake.

</homer>

 I don't have an answer at present.  D could store the reserved size;  this
 would take a bit of space, nothing that would kill anyone.

As I understand it, the current implementation already does this.

Also, I cannot think of an efficient implementation of dynamic arrays 
that can work with only a length field. You could only allocate exactly 
the length and would have to reallocate with each change of the length. 
Not what I'd call efficient.

So if a reserve field will be needed anyway, why not expose it to the 
programmer?

Hauke

Sep 24 2003

J Anderson <anderson badmama.com.au.REMOVE> writes:

Sean L. Palmer wrote:

Yes, I dislike depending on implicit semantics.  Walter, are you going to
set this reserve behavior in stone, or is it unspecified?

What about other platforms? It may not always be the case that a reserve 
field is the best technique.  Although, if there were a reserve 
field (but the implementation technique wasn't set in stone), then it 
could be ignored, I suppose.

Sep 25 2003

J C Calvarese <jcc7 cox.net> writes:

Walter wrote:
 
 After setting .length to the max reasonable size, then set .length back to
 zero. You now have a reserve that is something like the max reasonable size
 rounded up to the next bucket size.

OK.  That sounds like just the kind of feature I was requesting.  Thanks.

Justin

Sep 23 2003

Helmut Leitner <leitner hls.via.at> writes:

Walter wrote:
 
 Actually, the implementation of D arrays does have a bit of a 'reserve',
 since the garbage collector allocates by using power of 2 buckets. 

This will help little. Let's assume someone decides to read a file
line by line (or reads it in one sweep but ends up with a line in a string 
instead of a slice). The line will be reallocated hundreds of times.

  char line[];
  line.reserve=1024;

would set aside enough space so that reallocations would happen almost never.

 This is
 an implementation detail, though. It still makes for much more efficient
 code to set the .length to some reasonably expected reserve value, then fill
 up the array, then reset the .length to the final size.

Yes, I know that. But very often this is a repeated process and reallocation
can only be avoided with a .reserve.

How does your dmd compiler avoid reallocation of the "current source buffer"?
If it does, it must use some kind of .reserve. 
If it doesn't, there would be room for making it faster.

This reserve thing is not theoretical. After 20 years of C-programming I've
based all my C dynamic array handling on a "generic" Buffer structure, that
looks like (translated to fit):

  typedef struct buffer {
     void *ptr;           /* start of the allocated block */
     int length;          /* elements in use */
     int reserve;         /* elements allocated */
     int element_size;    /* sometimes implicit */
  } BUFFER;

There are generic functions for reallocations, insertions, deletions, sorting...

The reserve is an essential feature for optimization, because you can often use 
heuristics. For example, you build a hash-bucket table for a dictionary. You
don't know how many entries you will have (perhaps 2000 or 10000), but you
definitely don't want to start at the default (maybe 50?) and reallocate and
rehash 5-8 times to reach the final size. A simple .reserve=1000 will give you a
much better start and give you the feeling that you have incorporated your
partial knowledge as well as possible.

How do you (Walter) build symbol tables during DMD compilation? Do you
reallocate them as the symbols come in? You can't know beforehand how large
these tables will grow. But I suppose you won't think much about small sources
or about minimizing the compiler's memory needs for them. So you will reserve
moderate space for these arrays, won't you?

====

I see two problems with reserve.

The first is an implementation problem. An array is not an object. It is stored
and passed as a (ptr, size) pair. And this is also how slices are handled (I
think). So adding reserve is not just adding a field to an object. 

The second is a performance issue. Typically compiler builders have different
performance interests than programmers. Compiler builders want to look good 
in benchmarks. Programmers want to write fast applications easily. A .reserve 
might worsen benchmarks (I don't think so, but we don't know), but would be 
an enormous benefit for the application programmer, because he needn't write 
special classes (which he often won't do) to get optimum performance.

An example of this conflict of interest is the official wc sample program, 
which will perform but not scale, because
        if (inword)
        {   char[] word = input[wstart .. input.length];
            dictionary[word]++;
        }
will keep the basic input buffer (the whole file) in memory, because every 
dictionary key is a slice of it. So you would be surprised by this code 
if you extended it to count the symbols in a larger set of files. It will 
eat all your memory. And it would be a good "test for experienced D 
programming capability" to change this into a performing and scaling example. 

====

I also want to restate, that "basic IO" is still missing in D. 

This means, you can't write a tutorial that makes sense, because no 
sensible IO is available (you have to go back to printf and its friends) that
you want to show anyone new to the language.

Why is this so? Because "char []" can't be used for IO in a performing way!
So this decision and implementation has been pushed and pushed ....
And you can't work with "char buffer[N]" because 
  - you can never be sure about the buffer size
  - you will need conversions all the time
The obvious solution, the final, powerful and performing String class
is a pure vision. I suppose it will never come. 

.reserve would be a pragmatic solution to this.

-- 
Helmut Leitner    leitner hls.via.at
Graz, Austria   www.hls-software.com

Sep 23 2003

"Walter" <walter digitalmars.com> writes:

"Helmut Leitner" <leitner hls.via.at> wrote in message
news:3F700109.7EDDD804 hls.via.at...
 Walter wrote:
 Actually, the implementation of D arrays does have a bit of a 'reserve',
 since the garbage collector allocates by using power of 2 buckets.

 This will help little. Let's assume someone decides to read a file
 line by line (or reads it in one sweep but ends up with a line in a string
 instead of a slice). The line will be reallocated hundreds of times.

No. For each line, maybe 2 to 4 times.

   char line[];
   line.reserve=1024;
 would set aside enough space so that reallocations would happen almost never.

You can do the same thing with:
    char line[];
    line.length = 1024;
    line.length = 0;

Now fill the array.

 How does your dmd compiler avoid reallocation of the "current source buffer"?

The compiler doesn't. The garbage collector runtime does. It does it by
figuring out by the address which bucket it is in, and how much room is left
in the bucket.
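
One way to observe this is to count how often an array's storage actually moves while it grows. The sketch below assumes a current D compiler and runtime, whose growth policy may differ from the 2003 one described here, so no particular count is claimed. The name `countRelocations` is made up for the example.

```d
import std.stdio;

// Appends n characters one at a time and counts how often the array's
// storage ended up at a different address afterwards.
size_t countRelocations(size_t n)
{
    char[] line;
    size_t moves = 0;
    foreach (i; 0 .. n)
    {
        auto before = line.ptr;
        line ~= 'x';
        if (line.ptr !is before)
            ++moves;
    }
    return moves;
}

void main()
{
    // The first append always counts as a move, from null to the first block.
    writeln(countRelocations(1024));
}
```

If the runtime had to move the array on every append, the function would return n; any room left in the current block shows up as a smaller count.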

 This reserve thing is not theoretical. After 20 years of C-programming I've
 based all my C dynamic array handling on a "generic" Buffer structure, that
 looks like (translated to fit):

You're right. I use something almost exactly the same in my C programming.
But with the garbage collector, I don't need to carry around the extra
'reserve' size, as it is implicit in the address of the buffer.

 An example of this conflict of interest is the wc official sample program,
 which will perform  but not scale, because
         if (inword)
         {   char[] word = input[wstart .. input.length];
             dictionary[word]++;
         }
 will keep the basic input buffer (the whole file) in memory. So you would
 be surprised about this code, if you would extend it to count the symbols
 in a larger set of files. It will eat all your memory. And it would be a
 good "test for experienced D programming capability" to change this to a
 performing and scaling example.

To process hundreds of megabytes of source with wc, just change the line to:
    char[] word = input[wstart .. input.length].dup;

 I also want to restate, that "basic IO" is still missing in D.

 This means, you can't write a tutorial that makes sense, because no
 sensible IO is available (you have to go back to printf and its friends) that
 you want to show anyone new to the language.

 Why is this so? Because "char []" can't be used for IO in a performing way!

I think the reason is more my sloth than that!

 So this decision and implementation has been pushed and pushed ....
 And you can't work with "char buffer[N]" because
   - you can never be sure about the buffer size
   - you will need conversions all the time
 The obvious solution, the final, powerful and performing String class
 is a pure vision. I suppose it will never come.

 .reserve would be a pragmatic solution to this.

Sep 23 2003

Helmut Leitner <helmut.leitner chello.at> writes:

Walter wrote:
 An example of this conflict of interest is the wc official sample program,
 which will perform but not scale, because
         if (inword)
         {   char[] word = input[wstart .. input.length];
             dictionary[word]++;
         }
 will keep the basic input buffer (the whole file) in memory. So you would
 be surprised about this code, if you would extend it to count the symbols
 in a larger set of files. It will eat all your memory. And it would be a
 good "test for experienced D programming capability" to change this to a
 performing and scaling example.

 
 To process hundreds of megabytes of source with wc, just change the line to:
     char[] word = input[wstart .. input.length].dup;

It may not be that simple, because I would hope that

      char[] word = input[wstart .. input.length];
      if(word in dictionary) {
          dictionary[word]++;
      } else {
          dictionary[word.dup]++;
      }

performs better, because only those words are created as objects that
are actually needed.

But it may also be, that

      char[] word = input[wstart .. input.length];
      int *p=dictionary[word];
      if(p) {
          dictionary[word]++;
      } else {
          dictionary[word.dup]++;
      }

performs even better, because it avoids a hash access that we can't
be sure the compiler optimizes away.
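
A minimal sketch of this lookup-first variant, assuming current D, where the
`in` expression on an associative array yields a pointer to the stored value
(or null). The helper name `addWord` and the use of `string` keys with `.idup`
are assumptions made for the sketch, not taken from the wc sample.

```d
// Sketch only: look the word up first and copy it only when it is new.
// `addWord` is a hypothetical helper, not part of any library.
void addWord(ref int[string] dictionary, const(char)[] word)
{
    if (auto p = word in dictionary)
        ++*p;                        // existing key: one lookup, no allocation
    else
        dictionary[word.idup] = 1;   // new key: copy it out of the input buffer
}
```

Only the first occurrence of each word allocates. Repeated words touch the
hash table once and never copy.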

-- 
Helmut Leitner    leitner hls.via.at
Graz, Austria   www.hls-software.com

Sep 23 2003

Helmut Leitner <helmut.leitner chello.at> writes:

Sorry for:

Helmut Leitner wrote:
 But it may also be, that
 
       char[] word = input[wstart .. input.length];
       int *p=dictionary[word];
       if(p) {
           dictionary[word]++;

This should read:      

            (*p)++;

       } else {
           dictionary[word.dup]++;
       }
 
 performs even better, because it avoids a hash access that we can't
 be sure the compiler optimizes away.


-- 
Helmut Leitner    leitner hls.via.at
Graz, Austria   www.hls-software.com

Sep 23 2003

"Julio César Carrascal Urquijo" <adnoctum phreaker.net> writes:

 You can do the same thing with:
     char line[];
     line.length = 1024;
     line.length = 0;

 Now fill the array.

A question: Does the GC respect any number of buckets allocated like this?

char line1[];
line1.length = 4096;
line1.length = 0;

char line2[];
line2.length = 4096;
line2.length = 0;

I mean, if I reserve more than 1 bucket (of 1024), will it still be available
after allocating another array?

Sep 23 2003

"Walter" <walter digitalmars.com> writes:

"Julio César Carrascal Urquijo" <adnoctum phreaker.net> wrote in message
news:bkqd4j$1c4m$1 digitaldaemon.com...
 A question: Does the GC respect any number of buckets allocated like this?
 char line1[];
 line1.length = 4096;
 line1.length = 0;

 char line2[];
 line2.length = 4096;
 line2.length = 0;

Yes.

 I mean, if I reserve more than 1 bucket (of 1024), will it still be available
 after allocating another array?

Dec 10 2003

"Vathix" <vathix dprogramming.com> writes:

"Walter" <walter digitalmars.com> wrote in message
news:bko2u7$is2$1 digitaldaemon.com...
 Actually, the implementation of D arrays does have a bit of a 'reserve',
 since the garbage collector allocates by using power of 2 buckets. This is
 an implementation detail, though. It still makes for much more efficient
 code to set the .length to some reasonably expected reserve value, then
 fill up the array, then reset the .length to the final size.

What about this:

char[] s = new char[100];
s = s[0 .. 5];

would .length changes create a new buffer or use that 100 up first?

Sep 23 2003

"Walter" <walter digitalmars.com> writes:

"Vathix" <vathix dprogramming.com> wrote in message
news:bkp8j2$2pmi$1 digitaldaemon.com...
 "Walter" <walter digitalmars.com> wrote in message
 news:bko2u7$is2$1 digitaldaemon.com...
 Actually, the implementation of D arrays does have a bit of a 'reserve',
 since the garbage collector allocates by using power of 2 buckets. This is
 an implementation detail, though. It still makes for much more efficient
 code to set the .length to some reasonably expected reserve value, then
 fill up the array, then reset the .length to the final size.

 What about this:

 char[] s = new char[100];
 s = s[0 .. 5];

 would .length changes create a new buffer or use that 100 up first?

The latter. It would not reallocate it.
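
A small sketch for checking this, with one caveat: it assumes a current D
runtime, which as far as I know behaves differently from the 2003
implementation described here. Today a slice that ends before the used part
of its block is copied on growth, unless `assumeSafeAppend` (module `object`)
is called first to say the data past the slice may be overwritten. The
function name `growsInPlace` is hypothetical.

```d
// Sketch only: shrink to a slice, then grow again inside the original block.
// `growsInPlace` is a hypothetical helper, not part of any library.
bool growsInPlace()
{
    char[] s = new char[100];
    char* original = s.ptr;

    s = s[0 .. 5];
    s.assumeSafeAppend();   // allow the runtime to reuse s[5 .. 100] (current D only)
    s.length = 50;          // fits inside the 100-char block

    return s.ptr is original;   // true when no new buffer was allocated
}
```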

Sep 23 2003

"Riccardo De Agostini" <riccardo.de.agostini email.it> writes:

"Helmut Leitner" <helmut.leitner chello.at> wrote in message
news:3F6BF6C5.AB02DBDD chello.at...

 Therefore, please consider the suggestion to add a property
    .reserve
 to the array type in this way:
    -  reserve can be read and set like length
    -  reallocations happen according to max(reserve,length)
 advantage:
    -  if length changes below or equal reserve,
       no reallocations need to happen.

You have my vote! BTW, what if length grows above reserve? Should there also
be a property to specify subsequent allocation "steps"? Say, reserve 100
elements at first, then grow by 10 at a time?
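
One way to picture the combination of an initial reserve and a fixed growth
step is the sketch below. It is not a proposal for the language: the `Buffer`
struct, its member names and the default step of 10 are all assumptions made
for illustration.

```d
// Sketch only: a hypothetical buffer whose storage is reserved up front and
// then grows by a fixed step once the logical length exceeds the reserve.
struct Buffer
{
    private char[] store;   // allocated storage
    private size_t used;    // logical length
    size_t step = 10;       // growth increment beyond the reserve (assumed policy)

    // Make sure at least n elements of storage exist.
    void reserve(size_t n)
    {
        if (n > store.length)
            store.length = n;
    }

    // Append one character, growing by `step` only when storage is full.
    void put(char c)
    {
        if (used == store.length)
            store.length = store.length + step;
        store[used++] = c;
    }

    size_t length() const { return used; }
    size_t capacity() const { return store.length; }
    const(char)[] data() const { return store[0 .. used]; }
}
```

With `reserve(100)` and the default step, the first 100 `put` calls leave the
storage untouched and the 101st extends it to 110 elements.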

Ric

Sep 23 2003
