digitalmars.D - string performance issues

Daniel Horn <hellcatv hotmail.com> writes:
I'm writing a program that spits out an .obj file.
I'm doing:

char[] buf;
for (int i=0;i<mvert;i++)
    buf~="v"~" "~ftoa(x[i])~" "~ftoa(y[i])~" "~ftoa(z[i])~"\n";

for (int i=0;i<mface;i++)
    buf~="f"~" "~itoa(a[i])~" "~itoa(b[i])~" "~itoa(c[i])~"\n";

The performance is simply abysmal: writing a 50 MB file takes upwards
of 2 minutes.

Is there any way to make this fast without sacrificing readability?
Part of the problem seems to be the realloc for every face (instead of
intelligently doubling the allocated memory), but I also suspect that
allocating so many small strings with ftoa and itoa isn't helping.
Jun 14 2004
Ben Hinkle <bhinkle4 juno.com> writes:
Instead of building one huge string in memory, how about
processing line by line:

 char[128] line; // 128 is max str len
 for (int i=0;i<mface;i++) {
    sprintf(line.ptr,"f %d %d %d\n",a[i],b[i],c[i]);
    ... do something with line ...
 }

If the end result is going to a file, the temporary buffer
might not even be needed: an fprintf can go straight to the file.

-Ben
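A minimal sketch of the same line-by-line idea in present-day D (an assumption; nothing here is from the thread or the 2004 library): std.format.sformat writes into a caller-supplied buffer like sprintf, but %s formats each argument according to its own static type, so switching from float to double, or int to long, needs no format-string edit. The helper names faceLine and vertexLine are hypothetical.

```d
import std.format : sformat;

/// Hypothetical helper: format one OBJ face line into a caller-supplied
/// buffer; no heap allocation happens when the line fits.
char[] faceLine(char[] buf, int a, int b, int c)
{
    // %s picks the formatting from each argument's type.
    return sformat(buf, "f %s %s %s\n", a, b, c);
}

/// Hypothetical helper for a vertex line; T may be float, double or real.
char[] vertexLine(T)(char[] buf, T x, T y, T z)
{
    return sformat(buf, "v %s %s %s\n", x, y, z);
}
```

The returned slice points into buf, so it is only valid until the buffer is reused. As far as I know, sformat throws instead of overrunning when the buffer is too small.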

Jun 14 2004
Daniel Horn <hellcatv hotmail.com> writes:
That's the code I was expecting to see, and exactly the code I was
hoping to avoid:
a) What if the type changes (double to real)? Suddenly the string takes
too much memory and the buffer overruns.
b) It's not type-safe (what if I say %d but pass in a float?).
c) You still have to realloc for every face.
d) sprintf isn't part of D; it's a nasty hanging chad from C.
I'd like to see a clean solution entirely in D.

Jun 14 2004
Regan Heath <regan netwin.co.nz> writes:
What about...

f = fopen("file.txt","w");
if (!f) ..barf..

for (int i = 0; i < mvert; i++) {
	fprintf(f,"f %s %s %s\n",toString(x[i]),toString(y[i]),toString(z[i]));
}

fclose(f);

Jun 14 2004
Daniel Horn <hellcatv hotmail.com> writes:
That uses a string and is not type-safe.
And what if I want to send it over the net?

Basically I want to do string ops internally :-) and I want to do it the
"D" way.

I'm debating whether there should be a struct String that does the
resizing appropriately so appends are fast, or a class String (I'm
leaning towards a struct, since it would be a wrapper around char[] with
an extra length field). That way I could size it dynamically (on each
overrun, multiplying the allocated length by some constant >= 1.5).
Walter, would this be a good idea? Or is there some magic you can pull so
that assigning .length or appending won't call realloc or some other
slow function?

i.e. does a string always hold exactly .length (rounded up to some
constant mallocable size), or does it really double the length when you
overrun, to get decent aggregate performance?

Jun 14 2004
Regan Heath <regan netwin.co.nz> writes:
On Mon, 14 Jun 2004 16:29:06 -0700, Daniel Horn <hellcatv hotmail.com> 
wrote:
 That uses a string

Yep, but so did your example. What am I missing? Please explain.
 and is not typesafe

Why not? Change x from float[] to double[] and it still works; change it to int[] and it still works. Sorry, one correction: the fprintf should have had %.*s in it, e.g.

fprintf(f,"f %.*s %.*s %.*s\n",toString(x[i]),toString(y[i]),toString(z[i]));
 and what if I want to send it over the net.

The f at the start of the line says it's a float; you split the string on spaces and parse accordingly. Isn't that what the f is there for?
 Basically I want to do stringops internally :-) and I want to do it the 
 "D" way.

I'm not sure I understand what you mean. You can set the length (if you know what you need) and you can assign to a slice, e.g.

char[] test = "a guy named jones walked down the street".dup;
char[] foo = "regan".dup;
test[12..17] = foo[];   // test is now "a guy named regan walked down the street"

So, assuming your values always have a fixed length, you can set the length of the string and then assign the data to the appropriate slices.
 I'm debating whether there should be a struct String that did the 
 resizing appropriately so appends work fast... or a class String (I'm 
 leaning towards struct since it would be a wrapper around char[] with an 
 xtra length field)
 that way I could dynamically size it appropriately (each overrun 
 multiplying allocated length by some constant >= 1.5)  walter would this 
 be a good idea? or is there some magic you can pull so that when you 
 assign .length or append it won't call realloc or some other slow 
 function.

If you set the length and then append, it actually appends after the end of the new length, e.g.

char[] test = "regan".dup;
test.length = 10;
test ~= "fred";
printf("%d:= ",test.length);
foreach(char c; test) printf("%02x ",c);

outputs

14:= 72 65 67 61 6e 00 00 00 00 00 66 72 65 64
 i.e. does a string always hold exactly .length (rounded up to some 
 constant mallocable size) or does it really double the length when you 
 overrun to aggregate decent performance out of things

Jun 14 2004
Daniel Horn <hellcatv hotmail.com> writes:
I don't know the length of my numbers, so slice assignment is painful.
Basically I'd need some sort of ftoa and itoa functions (which must
avoid allocating memory, returning statically sized structs and lengths
instead), then assign the result to the slice, after keeping track of the
last length and finding the next one.

Then I need a separate counter to track how much I've allocated... it's
just a mess, and this is exactly what D was supposed to avoid. This is
the old-fashioned "C buffer-safe" way, and I'm not happy with it, or I'd
still be using C.

And the bottom line is that I don't want to print to a file; I want to
keep it in a string. The code for concatenation is so clear, so why is
it so slow?

PS: make sure to use toStringz when passing strings to printf with %s.
toString is not guaranteed to zero-terminate.
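For reference, a minimal sketch of keeping the concatenation style while avoiding a reallocation per line, assuming present-day Phobos rather than the 2004 library: std.array.Appender grows its capacity by a multiplicative factor, and std.format.formattedWrite formats each number straight into it instead of building a separate string per number. buildObj and its parameters are hypothetical stand-ins for the arrays in the original post, and the 32-characters-per-line reserve is only a guess.

```d
import std.array : appender;
import std.format : formattedWrite;

/// Hypothetical: build the whole .obj text in memory.
string buildObj(const(float)[] x, const(float)[] y, const(float)[] z,
                const(int)[] a, const(int)[] b, const(int)[] c)
{
    auto app = appender!string();
    // Optional up-front guess (assumed ~32 chars per line) to skip
    // most of the early regrowth.
    app.reserve(32 * (x.length + a.length));

    foreach (i; 0 .. x.length)
        app.formattedWrite("v %s %s %s\n", x[i], y[i], z[i]);
    foreach (i; 0 .. a.length)
        app.formattedWrite("f %s %s %s\n", a[i], b[i], c[i]);

    return app.data;
}
```

The result is an ordinary string, so it can be written to a file or sent over a socket in one go.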
Jun 14 2004
Regan Heath <regan netwin.co.nz> writes:
On Mon, 14 Jun 2004 17:02:25 -0700, Daniel Horn <hellcatv hotmail.com> 
wrote:
 I don't know the length of my numbers...
 so slice assignment is painful... basically I have to have some sort of 
 ftoa and itoa function (that need to avoid assigning memory, returning 
 statically sized structs and lengths) then assign it to the slice, after 
 keeping track of the last length and finding the next length.

 then I need a separate counter to see how much I've allocated...it's 
 just a mess...and this is exactly what D was supposed to avoid.  This is 
 the old fashioned "C buffer safe" way...and I'm not happy with it or I'd 
 still be using C.

I think the bottom line is that efficient is not always easy:
 - to type
 - to understand
 - to implement
etc., or even possible in all situations. If it were, everything would be efficient.

Perhaps the solution in your case is to write a string class: one that assigns a length to a char[] and then uses memcpy to copy data into slices of it.

I thought one of the design goals for D was to avoid this being necessary. I do remember a discussion on arrays in general, and how it would be nice to be able to set a property called 'reserve' which allocated the indicated space. This property could be independent of length and not affect the append operation, such that...

char[] line;
line.reserve = 1000;  // allocates space for 1000 chars
line ~= "boo";        // appends to string at length (which == 0)
// length is now 3, but string has 1000 chars allocated to it.

Not sure if this is a big change or not.
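Present-day D (not the 2004 compiler in this thread) provides essentially this: arrays have a reserve function and a capacity property, both independent of length. A minimal sketch contrasting that with setting .length, as in Regan's earlier experiment; in current D the padding added by a length increase is char.init (0xFF) rather than zero bytes. The function names are hypothetical.

```d
/// Setting .length pads with char.init, so a later ~= lands after the padding.
char[] growThenAppend()
{
    char[] test = "regan".dup;
    test.length = 10;   // adds 5 chars of char.init (0xFF)
    test ~= "fred";     // appended starting at index 10
    return test;
}

/// reserve() only sets aside capacity: length stays 0, so ~= starts at
/// index 0 and, up to the reserved capacity, needs no reallocation.
char[] reserveThenAppend(out size_t capacityBefore)
{
    char[] line;
    line.reserve(1000);             // capacity becomes >= 1000
    capacityBefore = line.capacity;
    line ~= "boo";
    return line;
}
```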
 And the bottom line is that I don't want to print to a file, I want to 
 keep it in a string.

That was my mistake; I thought you were printing to a file.
 the code for concat is so clear--but why is it so slow?

I'm not sure, given Vathix's statement:

"Actual allocations are the smallest power of 2 that holds the requested size. I don't think they reallocate when shrinking because you could have sliced that memory to use somewhere else."

If you could set a reasonable initial length AND append to the end of the data in the string, rather than to the end of the string's length (as it currently does), then it would not need to reallocate and so would be much faster. The statement above suggests a string has a stored length and also a stored size of allocated memory, in which case the change I proposed above would be very easy.
 PS: make sure to use toStringz when using printf with %s.  toString is 
 not guaranteed to zero terminate.

Or %.*s (though this relies on the implementation of char[]).

Regan
Jun 14 2004
Regan Heath <regan netwin.co.nz> writes:

Try this FastString class I just wrote quickly; it's very bare-bones as
yet.

You use it just like the string in your first example, except that the
constructor takes an initial buffer size, which it sets the buffer
length to.

Does it increase performance?

[Attachment: faststring.d]

module regan.faststring;

class FastString {
    char[] buffer;
    int blen;

    this(int _length = 0) {
        buffer.length = _length;
        blen = 0;
    }
    this(FastString str) {
        buffer = str.buffer.dup;
        blen = str.blen;
    }

    FastString opCat(char[] str) {
        FastString f = new FastString(this);
        f ~= str;
        return f;
    }

    FastString opCatAssign(char[] str) {
        if (blen + str.length > buffer.length)
            buffer.length = buffer.length + (buffer.length/2);
        buffer[blen..blen+str.length] = str;
        blen += str.length;
        return this;
    }

    char[] toString() {
        return buffer;
    }

    uint size() {
        return buffer.length;
    }

    uint length() {
        return blen;
    }
}
Jun 14 2004
Derek Parnell <derek psych.ward> writes:
On Tue, 15 Jun 2004 16:05:50 +1200, Regan Heath wrote:

     FastString opCatAssign(char[] str) {
         if (blen + str.length > buffer.length)
             buffer.length = buffer.length + (buffer.length/2);
         buffer[blen..blen+str.length] = str;
         blen += str.length;
         return this;
     }

LOL. I just created a class (a struct, actually) that does almost the same as this FastString.

I noticed one small issue with your opCatAssign() function. When expanding the buffer, you need to ensure that the expansion is at least large enough to hold the new data. So I just added the length of the new data as well as the extra 50% growth factor:

  if (blen + str.length > buffer.length)
    buffer.length = buffer.length + (buffer.length/2) + str.length ;

You might also consider this routine to clear the buffer:

  void clear()
  {
     blen = 0;
     buffer.length = 0;
  }

-- 
Derek
Melbourne, Australia
15/Jun/04 2:44:13 PM
Jun 14 2004
Regan Heath <regan netwin.co.nz> writes:
On Tue, 15 Jun 2004 14:53:01 +1000, Derek Parnell <derek psych.ward> wrote:
LOL. I just created a class (struct actually) that did almost the same as this FastString. I noticed one small issue with your opCatAssign() function. When expanding the size of the buffer, you need to ensure that the expansion amount is at least able to hold the new data. So I just added the length of the new data as well as the extra 50% growth factor.

Thanks. In fact, without this my code is badly broken when buffer.length == 0, since growing by 50% of zero still leaves zero.
   if (blen + str.length > buffer.length)
     buffer.length = buffer.length + (buffer.length/2)
                      + str.length ;


 You might also consider this routine to clear the buffer.

  void clear()
  {
     blen = 0;
     buffer.length = 0;
  }

Good idea. I wouldn't set buffer.length to 0, though; that will free the memory associated with it (I believe), and we may as well keep it until we are deleted.

Once Daniel gets back to me about whether it is actually faster, I'll polish the class up (maybe make it a struct).

Regan.
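Folding the fixes from this exchange into one place, a minimal sketch in present-day D, written as a struct as Regan suggests. Besides Derek's growth fix, it assumes two changes not made in the thread: toString returns only the used part, buffer[0 .. blen] (the attached version returns the whole buffer, including the unused tail), and clear() keeps the allocation, as Regan prefers. The name FastBuf is hypothetical.

```d
/// Hypothetical append buffer: tracks used length separately from capacity.
struct FastBuf
{
    private char[] buffer;  // allocated storage; buffer.length is the capacity
    private size_t blen;    // number of chars actually in use

    this(size_t initialSize)
    {
        buffer.length = initialSize;
    }

    void opOpAssign(string op : "~")(const(char)[] str)
    {
        if (blen + str.length > buffer.length)
            // Derek's fix: grow by 50% *plus* the new data, so the data
            // always fits, even when the buffer starts empty.
            buffer.length = buffer.length + buffer.length / 2 + str.length;
        buffer[blen .. blen + str.length] = str[];
        blen += str.length;
    }

    /// Only the used part; the rest of the buffer is spare capacity.
    const(char)[] toString() const { return buffer[0 .. blen]; }

    size_t size() const { return buffer.length; }
    size_t length() const { return blen; }

    /// Forget the contents but keep the memory for reuse.
    void clear() { blen = 0; }
}
```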
Jun 14 2004
Daniel Horn <hellcatv hotmail.com> writes:
I'm glad you're all looking at this.
I'll have to investigate both your struct and Walter's OutBuffer idea;
unfortunately I may not have time until Thursday. I'll get back to you
when I have numbers, but this has been an enlightening discussion.
I'm not convinced that the realloc idea actually does anything
performance-wise: the allocation process itself (realloc) usually has
some GC overhead, and that could well cause problems even if the memory
were there. Of course, I'll throw that into a benchmarking suite and let
you know the numbers.

I do insist that the "C" way of doing things is difficult and
error-prone. After writing many buffer-safe C classes to do just that, I
can tell you that I moved to D to avoid exactly this. And don't get me
started on snprintf. If I use snprintf and my buffer runs out of space,
it's still a RUNTIME error and I don't get the correct information into
my buffer. Sure, it's not an EXPLOIT, but it's hardly better to have my
program crash or do incorrect and mysterious things because of an
snprintf; we've all been there.
And the concat operator has been perfect aside from the performance hit.
--Daniel

Jun 14 2004
Daniel Horn <hellcatv hotmail.com> writes:
If people are so concerned about the cost of storing the reserved amount...
if we always know that the reserved amount is going to be the next power
of two above our current length, we could literally compute it each time
instead of storing it (on current hardware it's usually better to
recompute) :-)

That way the amortized cost of an append is constant rather than
(in this case) O(n) :-/
i.e. if I concatenate n strings of size 1, I'm going to spend O(n^2)
time just in reallocation.
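Daniel's arithmetic can be spelled out. Assume, as in his example, n appends of length-1 strings. A naive scheme allocates a new block and copies the whole current contents on every append; a growing scheme multiplies the capacity by a constant factor $r > 1$ only when it runs out. The power-of-two rule is the case $r = 2$, where the capacity for a length $\ell \ge 1$ is $2^{\lceil \log_2 \ell \rceil}$.

$$
C_{\text{copy every time}}(n) = \sum_{k=0}^{n-1} k = \frac{n(n-1)}{2} = \Theta(n^2)
$$

$$
C_{\text{geometric}}(n) \le n + \frac{n}{r} + \frac{n}{r^2} + \cdots = \frac{r}{r-1}\,n = \Theta(n)
$$

So with $r = 2$ the total copying is at most $2n$ characters, a constant amortized cost per append; $r = 1.5$ gives at most $3n$.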

 Good idea. I wouldn't set buffer.length to 0 tho, that will free the 
 memory associated with it (I believe) and we may as well keep it till 
 we are deleted.

 Once Daniel gets back to me about whether it is actually faster or 
 not, then I'll polish the class(maybe make it a struct) up.

 Regan.


Jun 14 2004
Arcane Jill <Arcane_member pathlink.com> writes:
Why is everyone reinventing the wheel?

Isn't this exactly what std.outbuffer.OutBuffer is for?

Jill
Jun 15 2004
Sean Kelly <sean f4.ca> writes:
In article <calcc2$2nq5$1 digitaldaemon.com>, Daniel Horn says...
i.e. does a string always hold exactly .length (rounded up to some 
constant mallocable size) or does it really double the length when you 
overrun to aggregate decent performance out of things

I've been meaning to ask this exact question :) How much memory do dynamic arrays allocate when they grow, and do they ever reallocate when they shrink?

Sean
Jun 14 2004
"Vathix" <vathixSpamFix dprogramming.com> writes:
"Sean Kelly" <sean f4.ca> wrote in message
news:calekm$2r2h$1 digitaldaemon.com...
 In article <calcc2$2nq5$1 digitaldaemon.com>, Daniel Horn says...
i.e. does a string always hold exactly .length (rounded up to some
constant mallocable size) or does it really double the length when you
overrun to aggregate decent performance out of things

I've been meaning to ask this exact question :) How much memory do dynamic
arrays allocate when they grow and do they ever reallocate when they
shrink?

Actual allocations are the smallest power of 2 that holds the requested size. I don't think they reallocate when shrinking because you could have sliced that memory to use somewhere else.
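Whether the power-of-two rule still holds depends on the runtime version, so rather than rely on it, here is a minimal sketch that measures it. It assumes present-day D, whose .capacity property did not exist in 2004, and records each distinct capacity an appended char[] passes through. The function name capacitySteps is hypothetical, and the exact numbers are runtime implementation details, so only their general shape is checked.

```d
/// Append n chars one at a time and record every capacity change.
size_t[] capacitySteps(size_t n)
{
    char[] s;
    size_t[] steps;
    size_t last = s.capacity;   // 0 for an empty, unallocated array
    foreach (i; 0 .. n)
    {
        s ~= 'x';
        if (s.capacity != last)
        {
            last = s.capacity;
            steps ~= last;
        }
    }
    return steps;
}
```

Printing capacitySteps(10_000) with writeln shows the growth policy of whatever runtime you are on.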
Jun 14 2004
"Matthew" <admin stlsoft.dot.dot.dot.dot.org> writes:
"Sean Kelly" <sean f4.ca> wrote in message
news:calekm$2r2h$1 digitaldaemon.com...
 In article <calcc2$2nq5$1 digitaldaemon.com>, Daniel Horn says...
i.e. does a string always hold exactly .length (rounded up to some
constant mallocable size) or does it really double the length when you
overrun to aggregate decent performance out of things

I've been meaning to ask this exact question :) How much memory do dynamic arrays allocate when they grow and do they ever reallocate when they shrink?

AFAIK, which might be woefully out-of-date, they allocate exactly what they need, up to a rounding factor.
Jun 14 2004
Ben Hinkle <bhinkle4 juno.com> writes:
Daniel Horn wrote:

 
 That's the code I was expecting to see and exactly the code I was
 wishing to avoid:

Oh well, you could have warned me :-)
 a) what if the type changes (double to real and suddenly string takes
 too much memory and buffer overruns)

Another (possibly more common) case is switching to a template where the type isn't known. Casting is a way out:

sprintf(buf,"f %g\n",cast(double)a[i]);

If casting is too ugly, then sprintf probably isn't the way to go. If overflow is a concern, snprintf is an option.

Now that I think about it, how about a D wrapper around the printf family that takes a dynamic array as the candidate output buffer? If the string fits in the array, it fills it and returns the slice holding the result; otherwise it allocates a new dynamic array and fills that. It would probably be a few lines of snprintf and array allocation. The declaration would be

char[] sprintf(char[], char*, ...)
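Ben's wrapper can be sketched in present-day D without snprintf, assuming current Phobos: std.format.formattedWrite accepts a delegate as its output, so the delegate can fill the caller's buffer while it has room and spill into a GC array only if it runs out. formatInto is a hypothetical name, not a library function, and it takes a D format string rather than the char* of the proposed declaration.

```d
import std.format : formattedWrite;

/// Hypothetical helper (not in Phobos): format into `buf` if the result
/// fits, otherwise into a freshly allocated array. Returns the result slice.
char[] formatInto(Args...)(char[] buf, const(char)[] fmt, Args args)
{
    size_t used = 0;
    char[] heap;          // only used after the buffer overflows
    bool spilled = false;

    void sink(const(char)[] s)
    {
        if (!spilled && used + s.length <= buf.length)
        {
            buf[used .. used + s.length] = s[];  // still fits: copy in place
            used += s.length;
            return;
        }
        if (!spilled)
        {
            heap = buf[0 .. used].dup;  // move what we have so far to the heap
            spilled = true;
        }
        heap ~= s;
    }

    formattedWrite(&sink, fmt, args);
    return spilled ? heap : buf[0 .. used];
}
```

Callers can tell which case happened by comparing the result's .ptr with the buffer's.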
 b) not type safe (what if I say %d but pass in a float)

Yup, true. As above, casting is an option if that is a concern.
 c) you still have to realloc every face

I'm not exactly sure what you mean here, but I'm now guessing you really do want to concatenate all the strings into one huge 50 MB string in memory. Preallocation could help here.
 d) sprintf isn't part of D--it's a nasty hanging chad from C...
 I'd like to see a clean solution in D entirely

It is a matter of personal preference. I use C functions whenever it makes sense, since I know them well and users reading my code will know them well.
Jun 14 2004
"Walter" <newshound digitalmars.com> writes:
Try building the string with std.outbuffer. That way, you'll be avoiding the
reallocations and copying.
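A minimal sketch of this suggestion using std.outbuffer.OutBuffer as it exists in present-day Phobos (the 2004 interface may have differed). Numbers are converted with std.conv.to, which still allocates one small string per number; that is the simplest choice here, not necessarily the fastest. Note that OutBuffer.write(float) would append the raw binary bytes, which is why the text form is written instead. The names putLine and objWithOutBuffer are hypothetical.

```d
import std.conv : to;
import std.outbuffer : OutBuffer;

/// Hypothetical helper: append "<tag> a b c\n" as text.
void putLine(T)(OutBuffer buf, char tag, T a, T b, T c)
{
    buf.write(tag);
    foreach (v; [a, b, c])
    {
        buf.write(' ');
        buf.write(to!string(v));  // text form, not raw bytes
    }
    buf.write('\n');
}

/// Hypothetical: build the .obj text in a single growing buffer.
string objWithOutBuffer(const(float)[] x, const(float)[] y, const(float)[] z,
                        const(int)[] a, const(int)[] b, const(int)[] c)
{
    auto buf = new OutBuffer();
    buf.reserve(32 * (x.length + a.length));  // rough guess at the final size
    foreach (i; 0 .. x.length)
        putLine(buf, 'v', x[i], y[i], z[i]);
    foreach (i; 0 .. a.length)
        putLine(buf, 'f', a[i], b[i], c[i]);
    return buf.toString();
}
```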

Jun 14 2004
"Ivan Senji" <ivan.senji public.srce.hr> writes:
"Walter" <newshound digitalmars.com> wrote in message
news:calt3r$evc$2 digitaldaemon.com...
 Try building the string with std.outbuffer. That way, you'll be avoiding the
 reallocations and copying.

Regarding OutBuffer: InBuffer is mentioned. Does it exist?
Jun 14 2004