digitalmars.D.learn - Implementation of char[] std.string.toString(char)

Stefan (19/19) Aug 01 2005 I recently noticed that char[] std.string.toString(char) in

David L. Davis (47/66) Aug 01 2005 At first I thought it was because 'char' and 'int' (int are 2 bytes long...
Derek Parnell (11/34) Aug 01 2005 I believe its because Walter is trying to be 'C' friendly. The returned

Stefan (6/40) Aug 01 2005 Hhm, I initially thought the same. But as I understand it, there are a

Ben Hinkle (6/42) Aug 01 2005 Since the GC allocates in blocks of 16 bytes or more allocating a single...

Stefan (4/48) Aug 01 2005 Yes, that might explain it. Thanks a lot.

Russ Lewis (10/14) Aug 01 2005 Nearly correct. toString() is not required to return something that has...

Stefan (18/32) Aug 01 2005 In my Phobos source (DMD 0.127) that code is commented out.

Russ Lewis (5/22) Aug 01 2005 It appears you are right; I guess I missed the change. Looks to me like...

Stefan <Stefan_member pathlink.com> writes:

I recently noticed that char[] std.string.toString(char) in
Phobos (DMD 0.127) is implemented this way:










Why is it not simply









Can anyone shed some light on this?

Thanks in advance,
Stefan

Aug 01 2005

David L. Davis <SpottedTiger yahoo.com> writes:

In article <dckpo7$23vs$1 digitaldaemon.com>, Stefan says...
I recently noticed that char[] std.string.toString(char) in
Phobos (DMD 0.127) is implemented this way:










Why is it not simply









Can anyone shed some light on this?

Thanks in advance,
Stefan

At first I thought it was because 'char' and 'int' (in D a 'char' is 1 byte and
an 'int' is 4 bytes long) are implicitly converted to one another as needed.
Below is an example of toString(char) converting both a 'char' and an 'int'
without a cast().
































C:\dmd>dmd int2char.d
C:\dmd\bin\..\..\dm\bin\link.exe int2char,,,user32+kernel32/noi;

C:\dmd>int2char
toString1(c)="C" toString1(i)="C"
toString2(c)="C" toString2(i)="C"

C:\dmd>

But that's clearly not the case...umm...not sure at this point. Sorry I wasn't
more helpful on the matter.

David L.

-------------------------------------------------------------------
"Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"
-------------------------------------------------------------------

MKoD: http://spottedtiger.tripod.com/D_Language/D_Main_XP.html

Aug 01 2005

Derek Parnell <derek psych.ward> writes:

On Mon, 1 Aug 2005 09:24:23 +0000 (UTC), Stefan wrote:

 I recently noticed that char[] std.string.toString(char) in
 Phobos (DMD 0.127) is implemented this way:
 







 
 
 Why is it not simply
 






 
 
 Can anyone shed some light on this?

I believe it's because Walter is trying to be 'C' friendly. The returned
'string' must have a length of 1, because it only holds one char, but it
must own a 2-byte memory allocation because the byte after the string must
be zero for potential C usage.

Your alternate routine certainly returns a 1-byte string, but the byte
after the string is undetermined.
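
The two listings from the original post are missing from this archive. As a rough sketch of the difference described here (reconstructed from this description, not copied from the Phobos source; both function names are made up for illustration), the two variants would look something like this:

```d
// Variant described as the Phobos one: allocate two chars, store the
// character and a terminating zero, then return a slice of length 1.
char[] toStringPadded(char c)
{
    char[] result = new char[2];
    result[0] = c;
    result[1] = 0;          // the "hidden" zero sits just past the slice
    return result[0 .. 1];
}

// The simpler alternative: allocate exactly one char.
char[] toStringSimple(char c)
{
    char[] result = new char[1];
    result[0] = c;
    return result;          // nothing is guaranteed about the next byte
}
```

Both functions return a slice whose length is 1. Only the first one guarantees that the byte at `s.ptr + 1` is zero.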

-- 
Derek Parnell
Melbourne, Australia
1/08/2005 9:47:37 PM

Aug 01 2005

Stefan <Stefan_member pathlink.com> writes:

In article <gu39ywiarmwp.1vayamiha3tm3.dlg 40tude.net>, Derek Parnell says...
On Mon, 1 Aug 2005 09:24:23 +0000 (UTC), Stefan wrote:

 I recently noticed that char[] std.string.toString(char) in
 Phobos (DMD 0.127) is implemented this way:
 







 
 
 Why is it not simply
 






 
 
 Can anyone shed some light on this?

I believe it's because Walter is trying to be 'C' friendly. The returned
'string' must have a length of 1, because it only holds one char, but it
must own a 2-byte memory allocation because the byte after the string must
be zero for potential C usage.


Hmm, I initially thought the same. But as I understand it, there are a
lot of toString() routines in there that don't zero-terminate (e.g. char[]
toString(uint u)). So I thought I must have missed something.

Thanks for your reply,
Stefan


Your alternate routine certainly returns a 1-byte string, but the byte
after the string is undetermined.

-- 
Derek Parnell
Melbourne, Australia
1/08/2005 9:47:37 PM

Aug 01 2005

"Ben Hinkle" <bhinkle mathworks.com> writes:

"Stefan" <Stefan_member pathlink.com> wrote in message 
news:dcl59f$2jhr$1 digitaldaemon.com...
 In article <gu39ywiarmwp.1vayamiha3tm3.dlg 40tude.net>, Derek Parnell 
 says...
On Mon, 1 Aug 2005 09:24:23 +0000 (UTC), Stefan wrote:

 I recently noticed that char[] std.string.toString(char) in
 Phobos (DMD 0.127) is implemented this way:










 Why is it not simply









 Can anyone shed some light on this?

I believe it's because Walter is trying to be 'C' friendly. The returned
'string' must have a length of 1, because it only holds one char, but it
must own a 2-byte memory allocation because the byte after the string must
be zero for potential C usage.


 Hmm, I initially thought the same. But as I understand it, there are a
 lot of toString() routines in there that don't zero-terminate (e.g. char[]
 toString(uint u)). So I thought I must have missed something.

Since the GC allocates in blocks of 16 bytes or more, allocating a single
byte will actually allocate 16, so it doesn't hurt space-wise to ask for 2.
Other functions probably don't know they'll always fit in one block. Note
that different GCs might not behave that way.
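
One way to check this on a given runtime is to ask the GC how large a block it actually returned. This is only a sketch: it uses `GC.malloc` and `GC.sizeOf` from `core.memory` in present-day druntime, which is newer than the Phobos discussed in this thread, and `blockSizeFor` is a made-up helper name. The sizes it reports depend on the GC implementation in use.

```d
import core.memory : GC;

// Ask the GC for `request` bytes and report the size of the block
// that was actually handed out.
size_t blockSizeFor(size_t request)
{
    void* p = GC.malloc(request);
    return GC.sizeOf(p);
}
```

If `blockSizeFor(1)` and `blockSizeFor(2)` report the same size, asking for the extra byte costs nothing on that runtime.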

Aug 01 2005

Stefan <Stefan_member pathlink.com> writes:

In article <dclc4k$2qgv$1 digitaldaemon.com>, Ben Hinkle says...
"Stefan" <Stefan_member pathlink.com> wrote in message 
news:dcl59f$2jhr$1 digitaldaemon.com...
 In article <gu39ywiarmwp.1vayamiha3tm3.dlg 40tude.net>, Derek Parnell 
 says...
On Mon, 1 Aug 2005 09:24:23 +0000 (UTC), Stefan wrote:

 I recently noticed that char[] std.string.toString(char) in
 Phobos (DMD 0.127) is implemented this way:










 Why is it not simply









 Can anyone shed some light on this?

I believe it's because Walter is trying to be 'C' friendly. The returned
'string' must have a length of 1, because it only holds one char, but it
must own a 2-byte memory allocation because the byte after the string must
be zero for potential C usage.


 Hmm, I initially thought the same. But as I understand it, there are a
 lot of toString() routines in there that don't zero-terminate (e.g. char[]
 toString(uint u)). So I thought I must have missed something.

Since the GC allocates in blocks of 16 bytes or more, allocating a single
byte will actually allocate 16, so it doesn't hurt space-wise to ask for 2.
Other functions probably don't know they'll always fit in one block. Note
that different GCs might not behave that way.

Yes, that might explain it. Thanks a lot.

Best regards,
Stefan

Aug 01 2005

Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:

Derek Parnell wrote:
 I believe it's because Walter is trying to be 'C' friendly. The returned
 'string' must have a length of 1, because it only holds one char, but it
 must own a 2-byte memory allocation because the byte after the string must
 be zero for potential C usage.

Nearly correct.  toString() is not required to return something that has 
the "hidden" zero trailing it, but it's useful when it does.  Look at 
the implementation of toStringz() (convert to zero-terminated string). 
That will look at the trailing character and see if it just happens to 
be 0; if so, then it can convert the string without any copying.

Of course, that implementation of toStringz() is controversial, and when
you're talking about a string of length 1, the cost of copying is very
small.  But I suppose that even a copy that small might kick off a GC
sweep, so it's probably not a bad idea that it works the way it does.
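
The peek-past-the-end conversion described above can be sketched roughly as follows. This is an illustration written from the description, not the Phobos implementation, and `toStringzPeek` is a hypothetical name.

```d
// Return a zero-terminated pointer for `s`, copying only when the byte
// just past the end of the slice is not already zero.
const(char)* toStringzPeek(const(char)[] s)
{
    if (s.length == 0)
        return "".ptr;              // string literals are zero-terminated

    // Peek one byte past the end of the slice. This read is only valid
    // if that byte belongs to the same allocation, which is the
    // controversial part.
    const(char)* pastEnd = s.ptr + s.length;
    if (*pastEnd == 0)
        return s.ptr;               // already terminated, no copy needed

    // Otherwise make a copy with room for the terminating zero.
    auto copy = new char[s.length + 1];
    copy[0 .. s.length] = s[];
    copy[s.length] = 0;
    return copy.ptr;
}
```

With a length-1 slice taken from a 2-char buffer whose second char is zero, this returns the original pointer and allocates nothing.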

Aug 01 2005

Stefan <Stefan_member pathlink.com> writes:

In article <dcldo3$2s6r$1 digitaldaemon.com>, Russ Lewis says...
Derek Parnell wrote:
 I believe it's because Walter is trying to be 'C' friendly. The returned
 'string' must have a length of 1, because it only holds one char, but it
 must own a 2-byte memory allocation because the byte after the string must
 be zero for potential C usage.

Nearly correct.  toString() is not required to return something that has 
the "hidden" zero trailing it, but it's useful when it does.  Look at 
the implementation of toStringz() (convert to zero-terminated string). 
That will look at the trailing character and see if it just happens to 
be 0; if so, then it can convert the string without any copying.

In my Phobos source (DMD 0.127) that code is commented out.
The implementation is essentially:
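
The listing itself is missing from this archive. A conversion that always copies, which is what the post describes, might look like the following sketch. The name `toStringzCopy` is hypothetical, and the code is written from the description rather than taken from the Phobos source.

```d
// Always allocate a new buffer one char longer than `s`, copy the
// contents, and append the terminating zero. Never reads past `s`.
const(char)* toStringzCopy(const(char)[] s)
{
    auto copy = new char[s.length + 1];
    copy[0 .. s.length] = s[];
    copy[s.length] = 0;
    return copy.ptr;
}
```

With a version like this, a zero byte placed after the slice by toString(char) is never looked at.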














Or are we talking about different things here?

Best regards,
Stefan


Of course, that implementation of toStringz() is controversial, and when
you're talking about a string of length 1, the cost of copying is very
small.  But I suppose that even a copy that small might kick off a GC
sweep, so it's probably not a bad idea that it works the way it does.

Aug 01 2005

Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:

Stefan wrote:
 In article <dcldo3$2s6r$1 digitaldaemon.com>, Russ Lewis says...
 
Derek Parnell wrote:

I believe it's because Walter is trying to be 'C' friendly. The returned
'string' must have a length of 1, because it only holds one char, but it
must own a 2-byte memory allocation because the byte after the string must
be zero for potential C usage.

Nearly correct.  toString() is not required to return something that has 
the "hidden" zero trailing it, but it's useful when it does.  Look at 
the implementation of toStringz() (convert to zero-terminated string). 
That will look at the trailing character and see if it just happens to 
be 0; if so, then it can convert the string without any copying.

 
 
 In my Phobos source (DMD 0.127) that code is commented out.

It appears you are right; I guess I missed the change.  Looks to me like 
it was commented out in version 0.113.  My thought is that, then, the 
implementation of toString(char) can be simplified.  At least, I don't 
perceive any reason not to...

Aug 01 2005
