
digitalmars.D.learn - Implementation of char[] std.string.toString(char)

reply Stefan <Stefan_member pathlink.com> writes:
I recently noticed that char[] std.string.toString(char) in
Phobos (DMD 0.127) is implemented this way:

# char[] toString(char c)
# {
#   char[] result = new char[2];
#   result[0] = c;
#   result[1] = 0;
#   return result[0 .. 1];
# }


Why is it not simply

# char[] toString(char c)
# {
#  char[] result = new char[1];
#  result[0] = c;
#  return result;
# }


Can anyone shed some light on this?

Thanks in advance,
Stefan
Aug 01 2005
next sibling parent David L. Davis <SpottedTiger yahoo.com> writes:
At first I thought it was because 'char' and 'int' (an int being 2 bytes long) are implicitly converted to one another as needed. Below is an example of toString(char) converting both a 'char' and an 'int' without a cast().

# //int2char.d
# private import std.stdio;
#
# char[] toString1(char c)
# {
#   char[] result = new char[2];
#   result[0] = c;
#   result[1] = 0;
#   return result[0 .. 1];
# }
#
# char[] toString2(char c)
# {
#   char[] result = new char[1];
#   result[0] = c;
#   return result;
# }
#
# int main()
# {
#   char c;
#   int i = 67;
#
#   c = i; // no cast() needed
#   writefln("toString1(c)=\"%s\" toString1(i)=\"%s\"",
#            .toString1(c), .toString1(i));
#   writefln("toString2(c)=\"%s\" toString2(i)=\"%s\"",
#            .toString2(c), .toString2(i));
#   return 0;
# }

C:\dmd>dmd int2char.d
C:\dmd\bin\..\..\dm\bin\link.exe int2char,,,user32+kernel32/noi;

C:\dmd>int2char
toString1(c)="C" toString1(i)="C"
toString2(c)="C" toString2(i)="C"

C:\dmd>

But that's clearly not the case... umm... not sure at this point. Sorry I wasn't more helpful on the matter.

David L.

-------------------------------------------------------------------
"Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"
-------------------------------------------------------------------

MKoD: http://spottedtiger.tripod.com/D_Language/D_Main_XP.html
Aug 01 2005
prev sibling parent reply Derek Parnell <derek psych.ward> writes:
I believe it's because Walter is trying to be 'C' friendly. The returned 'string' must have a length of 1, because it only holds one char, but it must own a 2-byte memory allocation because the byte after the string must be zero for potential C usage. Your alternate routine certainly returns a 1-byte string, but the byte after the string is undetermined.

-- 
Derek Parnell
Melbourne, Australia
1/08/2005 9:47:37 PM
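
The point about length versus allocation can be checked with a small program. This is only a sketch, written for a current D compiler rather than DMD 0.127; `charToString` is a hypothetical name chosen so it does not clash with anything in Phobos, and its body mirrors the routine quoted at the top of the thread.

```d
import std.stdio;
import core.stdc.string : strlen;

// Same shape as the Phobos routine quoted in this thread (hypothetical name).
char[] charToString(char c)
{
    char[] result = new char[2];
    result[0] = c;
    result[1] = 0;          // terminator sits just past the returned slice
    return result[0 .. 1];  // D callers see a string of length 1
}

void main()
{
    char[] s = charToString('C');
    writeln(s.length);        // length as seen by D code
    writeln(strlen(s.ptr));   // length as seen by C code scanning for the zero byte
}
```

Both lines should report 1: the slice hides the second byte from D code, while a C function walking from `s.ptr` still finds the terminator.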
Aug 01 2005
next sibling parent reply Stefan <Stefan_member pathlink.com> writes:
In article <gu39ywiarmwp.1vayamiha3tm3.dlg 40tude.net>, Derek Parnell says...
I believe it's because Walter is trying to be 'C' friendly. The returned 'string' must have a length of 1, because it only holds one char, but it must own a 2-byte memory allocation because the byte after the string must be zero for potential C usage.

Hmm, I initially thought the same. But as I understand it, there are a lot of toString() routines in there that don't zero-terminate (e.g. char[] toString(uint u)). So I thought I must have missed something?

Thanks for your reply,
Stefan
Aug 01 2005
parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"Stefan" <Stefan_member pathlink.com> wrote in message 
news:dcl59f$2jhr$1 digitaldaemon.com...
Hmm, I initially thought the same. But as I understand it, there are a lot of toString() routines in there that don't zero-terminate (e.g. char[] toString(uint u)). So I thought I must have missed something?

Since the GC allocates in blocks of 16 bytes or more, allocating a single byte will actually allocate 16, so it doesn't hurt space-wise to ask for 2. Other functions probably don't know they'll always fit in one block. Note that different GCs might not behave that way.
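
The block-size argument can be probed on a current D runtime with `core.memory.GC.sizeOf`, which reports the size of the GC block behind a pointer. This is a sketch under assumptions: `blockSize` is a hypothetical helper, the runtime is a modern druntime rather than the 2005 one, and the numbers printed depend on the GC in use, so none are promised here.

```d
import core.memory : GC;
import std.stdio;

// Hypothetical helper: allocate n chars and ask the GC how large the
// underlying block really is.
size_t blockSize(size_t n)
{
    char[] a = new char[n];
    return GC.sizeOf(a.ptr);
}

void main()
{
    writeln("requested 1, block is ", blockSize(1));
    writeln("requested 2, block is ", blockSize(2));
}
```

If both requests land in the same smallest block size, asking for 2 bytes instead of 1 costs nothing extra, which is the claim made above.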
Aug 01 2005
parent Stefan <Stefan_member pathlink.com> writes:
Yes, that might explain it. Thanks a lot.

Best regards,
Stefan
Aug 01 2005
prev sibling parent reply Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
Derek Parnell wrote:
 I believe it's because Walter is trying to be 'C' friendly. The returned
 'string' must have a length of 1, because it only holds one char, but it
 must own a 2-byte memory allocation because the byte after the string must
 be zero for potential C usage.

Nearly correct. toString() is not required to return something that has the "hidden" zero trailing it, but it's useful when it does. Look at the implementation of toStringz() (convert to zero-terminated string). That will look at the trailing character and see if it just happens to be 0; if so, then it can convert the string without any copying.

Of course, that implementation of toStringz() is controversial, and when you're talking about a string of length 1, the cost of copying is very small. But I suppose that even that small a copy might kick off a GC sweep, so it's probably not a bad idea that it works the way it does.
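
The peek-then-maybe-copy behaviour described above can be sketched roughly as follows. This is an illustration of the idea only, not the Phobos source: `toStringzPeek` is a hypothetical name, and the code targets a current D compiler.

```d
import std.stdio;

// Hypothetical sketch of a toStringz that peeks past the end of the slice.
const(char)* toStringzPeek(const(char)[] s)
{
    if (s.length == 0)
        return "".ptr;

    // The controversial step: read one byte beyond the slice. Nothing in the
    // language guarantees that this byte belongs to the same allocation.
    const(char)* past = s.ptr + s.length;
    if (*past == 0)
        return s.ptr;               // already terminated, no copy needed

    // Otherwise make a terminated copy.
    char[] copy = new char[s.length + 1];
    copy[0 .. s.length] = s[];
    copy[s.length] = 0;
    return copy.ptr;
}

void main()
{
    // Build a slice the way toString(char) does: 2 bytes owned, 1 visible.
    char[] buf = new char[2];
    buf[0] = 'C';
    buf[1] = 0;
    char[] s = buf[0 .. 1];

    // True when the original pointer was returned without copying.
    writeln(toStringzPeek(s) is s.ptr);
}
```

With a string produced by the 2-byte toString(char), the peek finds the zero and the original pointer comes back unchanged, which is the saving being discussed.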
Aug 01 2005
parent reply Stefan <Stefan_member pathlink.com> writes:
In article <dcldo3$2s6r$1 digitaldaemon.com>, Russ Lewis says...
Nearly correct. toString() is not required to return something that has the "hidden" zero trailing it, but it's useful when it does. Look at the implementation of toStringz() (convert to zero-terminated string). That will look at the trailing character and see if it just happens to be 0; if so, then it can convert the string without any copying.

In my Phobos source (DMD 0.127) that code is commented out. The impl is essentially:

# char* toStringz(char[] string)
# {
#   char[] copy;
#   if (string.length == 0)
#     return "";
#
#   // Need to make a copy
#   copy = new char[string.length + 1];
#   copy[0..string.length] = string;
#   copy[string.length] = 0;
#   return copy;
# }

Or are we talking about different things here?

Best regards,
Stefan
Aug 01 2005
parent Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
Stefan wrote:
In my Phobos source (DMD 0.127) that code is commented out.

It appears you are right; I guess I missed the change. Looks to me like it was commented out in version 0.113. My thought, then, is that the implementation of toString(char) can be simplified. At least, I don't perceive any reason not to...
Aug 01 2005