digitalmars.D.learn - Why are string literals zero-terminated?

awishformore <awishformore gmail.com> writes:
Following this discussion on announce, I was wondering why string 
literals are zero-terminated. Or to re-formulate, why only string 
literals are zero-terminated. Why that inconsistency? What's the 
rationale behind it? Does anyone know?

/Max

 Did you test with a string that was not in the code itself, e.g. from a
 config file?
 String literals are null-terminated, so you wouldn't have had an issue if
 all your strings were literals.
 UTF-8 data doesn't contain the string length, so you will run into problems
 eventually.

 You have to use toStringz or add your own null terminator, unless of course
 you know that the function will always be taking string literals. But even
 then, leaving something like that up to the programmer to remember is not
 exactly foolproof.

 Enjoy.
 ~Rory
Hey again and thanks for the hint. I tried finding something on the DM page about string literals being null-terminated, and while the section about string literals didn't even mention it, it was stated somewhere else. That explains why using string literals works even though I expected it to fail. It's indeed good to know, and adding std.string.toStringz is probably a good idea ;). Thanks. Greetings, Max.
Sure, I must admit it is annoying when the same code can do different things just because of where the data came from. It would be easier to notice the bug if D never added a null after literals, but then there would also be a lot more uses of toStringz. I think if you want to test it you can do:
 auto s = "blah";
 open(s[0..$].dup.ptr);
 // duplicating it should put it somewhere else
 // just slicing will not test
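What follows a duplicated array in memory is not specified, so the dup test above may or may not fail on a given run. A deterministic way to see the same mismatch, as a minimal sketch: take a slice that ends in the middle of a longer literal and compare D's length with what C's strlen reports. The helper name cLength is made up for this illustration.

```d
import core.stdc.string : strlen;

// Hypothetical helper: the length as a C function would see it,
// i.e. the number of bytes before the first '\0'.
size_t cLength(const(char)[] s)
{
    return strlen(s.ptr);
}

void main()
{
    string whole = "blah blah";
    string part = whole[0 .. 4];   // runtime slice, shares memory with whole

    // D tracks the length itself, so the slice is exactly "blah".
    assert(part == "blah");
    assert(part.length == 4);

    // C only stops at the terminator, which sits after the whole literal.
    assert(cLength(part) == whole.length);   // 9, not 4

    // A complete literal is terminated right after its last character.
    assert(cLength("blah") == 4);
}
```

The slice and the literal compare equal in D, yet a C function sees different data, which is why the origin of the string matters when passing .ptr directly.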
When thinking about it, it makes sense to have string literals null
terminated in order to have C functions work with them. However, I wonder about some stuff, for instance:
 string s = "string";
 // is s == "string\0" now?
 char[] c = cast(char[])s;
 // is c[6] == '\0' now?
 char* p = s.ptr;
 // is *(p+6) == '\0' now?

 I think use of the zero terminator should be consistent. Either make 
every string (and char[] for that matter) zero terminated in the underlying memory for backwards compatibility with C or leave it to the user in all cases.
 /Max
Perhaps the null is there because it's there in the executable file? A zero byte also often follows a dynamic array simply because D always initializes memory, and an allocation is often larger than requested, with the extra bytes left zeroed.
Jul 20 2010
"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Tue, 20 Jul 2010 14:59:18 +0200, awishformore wrote:

 Following this discussion on announce, I was wondering why string
 literals are zero-terminated. Or to re-formulate, why only string
 literals are zero-terminated. Why that inconsistency? What's the
 rationale behind it? Does anyone know?
So you can pass them to C functions. -Lars
Jul 20 2010
"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Tue, 20 Jul 2010 13:26:56 +0000, Lars T. Kyllingstad wrote:

 On Tue, 20 Jul 2010 14:59:18 +0200, awishformore wrote:
 
 Following this discussion on announce, I was wondering why string
 literals are zero-terminated. Or to re-formulate, why only string
 literals are zero-terminated. Why that inconsistency? What's the
 rationale behind it? Does anyone know?
So you can pass them to C functions.
Note that even though string literals are zero terminated, the actual string (the array, that is) doesn't contain the zero character. It's located at the memory position immediately following the string.

 string s = "hello";
 assert (s[$-1] != '\0'); // Last character of s is 'o', not '\0'
 assert (s.ptr[s.length] == '\0');

Why is it only so for literals? That is because the compiler can only guarantee the zero-termination of string literals. The memory following a string in general could contain anything.

 string s = getStringFromSomewhere();
 // I have no idea where s is coming from, so I don't
 // know whether it is zero-terminated or not. Better
 // make sure.
 someCFunction(toStringz(s));

-Lars
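The two snippets above can be combined into one program that compiles on its own. This is only a sketch: getStringFromSomewhere is given a made-up body that builds its result at runtime, and core.stdc.string.strlen stands in for someCFunction.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical stand-in for a string that does not come from a literal,
// for example one read from a config file. idup makes a runtime copy.
string getStringFromSomewhere()
{
    char[] buf = ['h', 'e', 'l', 'l', 'o'];
    return buf.idup;
}

void main()
{
    string lit = "hello";

    // The terminator is not part of the array itself.
    assert(lit.length == 5);
    assert(lit[$ - 1] == 'o');

    // For a literal, the byte just past the end is guaranteed to be '\0'.
    assert(lit.ptr[lit.length] == '\0');

    // For any other string nothing is guaranteed about that byte, so
    // convert explicitly before handing the pointer to C code.
    string s = getStringFromSomewhere();
    auto z = toStringz(s);   // pointer to zero-terminated data
    assert(strlen(z) == s.length);
}
```

Calling toStringz on a string that happens to be a literal is harmless, so using it at every C boundary avoids having to track where each string came from.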
Jul 20 2010
awishformore <awishformore gmail.com> writes:
On 20.07.2010 15:38, Lars T. Kyllingstad wrote:
 On Tue, 20 Jul 2010 13:26:56 +0000, Lars T. Kyllingstad wrote:

 On Tue, 20 Jul 2010 14:59:18 +0200, awishformore wrote:

 Following this discussion on announce, I was wondering why string
 literals are zero-terminated. Or to re-formulate, why only string
 literals are zero-terminated. Why that inconsistency? What's the
 rationale behind it? Does anyone know?
So you can pass them to C functions.
 Note that even though string literals are zero terminated, the actual string (the array, that is) doesn't contain the zero character. It's located at the memory position immediately following the string.

  string s = "hello";
  assert (s[$-1] != '\0'); // Last character of s is 'o', not '\0'
  assert (s.ptr[s.length] == '\0');

 Why is it only so for literals? That is because the compiler can only guarantee the zero-termination of string literals. The memory following a string in general could contain anything.

  string s = getStringFromSomewhere();
  // I have no idea where s is coming from, so I don't
  // know whether it is zero-terminated or not. Better
  // make sure.
  someCFunction(toStringz(s));

 -Lars
Hey. Yes, that indeed makes a lot of sense. I didn't actually try those asserts because I'm currently not on a dev machine, but what you point out basically is the behaviour I was hoping for. Thanks for clearing this up. /Max
Jul 20 2010