
digitalmars.D.learn - String Theory Questions

"WhatMeWorry" <kheaser gmail.com> writes:
The name string is aliased to immutable(char)[]

Why was immutable chosen? Why not mutable? Or why not just make another
alias called strung, aliased to mutable(char)[]?

Also, since strings are arrays and arrays are structs with a length and ptr
field, I ran the following code for both an empty string and a null string.

import std.stdio;

void main()
{
    string emptyStr = "";
    writeln("emptyStr.ptr is ", emptyStr.ptr);
    writeln("emptyStr.length is ", emptyStr.length);

    string nullStr = null;
    writeln("nullStr.ptr is ", nullStr.ptr);
    writeln("nullStr.length is ", nullStr.length);
}

and got the following results:

emptyStr.ptr is 42F080
emptyStr.length is 0
nullStr.ptr is null
nullStr.length is 0

I guess I was expecting them to be equivalent. I can understand why both
lengths are zero. But what is emptyStr.ptr doing with the 42F080 value? I
presume this is an address? If so, what does this address contain and
what is it used for?

Or maybe a more succinct question: why not just set emptyStr.ptr to null
and be done with it?
Sep 13 2014
ketmar via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
On Sat, 13 Sep 2014 17:09:56 +0000
WhatMeWorry via Digitalmars-d-learn <digitalmars-d-learn puremagic.com>
wrote:

 I guess I was expecting them to be equivalent.  I can understand
 why both lengths are zero.  But what is emptyStr.ptr doing with
 the 42F080 value? I presume this is an address?  If so, what does
 this address contain and what is it used for?
It's used to hold the "empty string". ;-) Note that a "null string" and an "empty string" aren't the same thing. Arrays refer to their data through a pointer (a slice is just a ptr/length pair), and the compiler knows that a null array is simply an empty array with no data behind it (you can assign `null` to an array to clear it).

But strings are special in one funny way: when the compiler sees a string literal (i.e. a quoted string) in the source code, it actually emits a C-like zero-terminated string. This eases C interop, so we can call C functions like this: `printf("my string!\n");` instead of this: `printf("my string!\n".toStringz);`. So your "empty string" actually points to a zero byte (and has zero length, 'cause D strings aren't zero-terminated), while the "null string" really is null, i.e. it refers to no data at all.

As for "immutable": it's done this way so the compiler can place string literals in a read-only section of the resulting binary. Without immutability, calling `void foo(string s);` as `foo("wow!")` would require copying the string to the heap first ('cause `foo()` would be allowed to change the contents of `s`). Adding implicit copy-on-write semantics would increase compiler complexity and the size of the hidden dynamic-array struct for virtually nothing.
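A minimal D sketch of the distinction described above, assuming a current DMD or LDC. `describe` is a hypothetical helper (not part of Phobos) that classifies a string by its `ptr` and `length` fields. It shows that `==` compares contents, so `""` and `null` compare equal, while `is` also compares the pointer, so they are not identical, and that the empty literal points at its hidden zero terminator.

```d
import std.stdio;

/// Hypothetical helper: classifies a string by its ptr and length fields.
string describe(string s)
{
    if (s.ptr is null)
        return "null";            // refers to no data at all
    if (s.length == 0)
        return "empty, non-null"; // points somewhere, but holds no chars
    return "non-empty";
}

void main()
{
    string emptyStr = "";   // literal: ptr points at its hidden trailing '\0'
    string nullStr = null;  // ptr is null

    // == compares contents (both have none), so they compare equal...
    assert(emptyStr == nullStr);
    // ...but `is` compares ptr and length, so they are not identical.
    assert(emptyStr !is nullStr);

    // The byte the empty literal points at is the zero terminator.
    assert(*emptyStr.ptr == '\0');

    writeln(describe(emptyStr)); // empty, non-null
    writeln(describe(nullStr));  // null
}
```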
Sep 13 2014
"AsmMan" <jckj33 gmail.com> writes:
On Saturday, 13 September 2014 at 17:31:18 UTC, ketmar via 
Digitalmars-d-learn wrote:
 On Sat, 13 Sep 2014 17:09:56 +0000
 WhatMeWorry via Digitalmars-d-learn 
 <digitalmars-d-learn puremagic.com>
 wrote:

 I guess I was expecting them to be equivalent.  I can 
 understand why both lengths are zero.  But what is 
 emptyStr.ptr doing with the 42F080 value? I presume this is an
 address?  If so, what does this address contain and what is it
 used for?
 It's used to hold the "empty string". ;-) Note that a "null string" and
 an "empty string" aren't the same thing. Arrays refer to their data
 through a pointer (a slice is just a ptr/length pair), and the compiler
 knows that a null array is simply an empty array with no data behind it
 (you can assign `null` to an array to clear it). But strings are special
 in one funny way: when the compiler sees a string literal (i.e. a quoted
 string) in the source code, it actually emits a C-like zero-terminated
 string. This eases C interop, so we can call C functions like this:
 `printf("my string!\n");` instead of this:
 `printf("my string!\n".toStringz);`.
D strings are actually C strings?
Sep 13 2014
"David Nadlinger" <code klickverbot.at> writes:
On Saturday, 13 September 2014 at 22:41:39 UTC, AsmMan wrote:
 D strings are actually C strings?
No. But string *literals* are guaranteed to be 0-terminated for easier interoperability with C code.

David
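To illustrate the C-interop point, here is a small sketch, assuming `core.stdc.string.strlen` stands in for an arbitrary C function on the receiving end: a literal can be handed to C as-is, while a string assembled at run time should go through `std.string.toStringz`, which returns a pointer to zero-terminated data.

```d
import core.stdc.stdio : printf;
import core.stdc.string : strlen;
import std.conv : to;
import std.string : toStringz;

void main()
{
    // A literal converts implicitly to const(char)*; its trailing '\0' is
    // what lets C's printf and strlen find the end of the text.
    printf("literal passed directly\n");
    assert(strlen("hello".ptr) == 5);

    // A string built at run time has no such guarantee, so convert it with
    // toStringz before passing it to C.
    string built = "abc" ~ to!string(42);
    printf("built: %s\n", built.toStringz);
    assert(strlen(built.toStringz) == built.length);
}
```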
Sep 13 2014
ketmar via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
On Sat, 13 Sep 2014 22:41:38 +0000
AsmMan via Digitalmars-d-learn <digitalmars-d-learn puremagic.com>
wrote:

 D strings are actually C strings?
In no way. Only string *LITERALS* are zero-terminated.
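A quick way to see that only the literal itself carries the terminator. The `@system` helper `byteAfter` is hypothetical and reads one byte past the end of a slice, which is only valid when memory actually exists there (as it does for a slice of a literal). Slicing a literal shares its memory but not its terminator.

```d
import std.stdio;

/// Hypothetical helper: reads the byte just past the end of s.
/// Only valid when memory exists there (e.g. s is a slice of a literal).
char byteAfter(string s) @system
{
    return s.ptr[s.length];
}

void main()
{
    string lit = "hello";      // backed by the literal, which ends in '\0'
    string head = lit[0 .. 2]; // "he": same memory, shorter length

    assert(byteAfter(lit) == '\0'); // the literal's hidden terminator
    assert(byteAfter(head) == 'l'); // the slice is not zero-terminated

    // Passing head.ptr to a C function would therefore read "hello".
    writeln(head); // he
}
```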
Sep 13 2014
"WhatMeWorry" <kheaser gmail.com> writes:
On Saturday, 13 September 2014 at 23:22:40 UTC, ketmar via 
Digitalmars-d-learn wrote:
 On Sat, 13 Sep 2014 22:41:38 +0000
 AsmMan via Digitalmars-d-learn 
 <digitalmars-d-learn puremagic.com>
 wrote:

 D strings are actually C strings?
 In no way. Only string *LITERALS* are zero-terminated.
OK. So I wrote the following:

char c = *(emptyStr.ptr);
if (c == '\0')
    writeln("emptyStr consists only of a zero terminator");

and sure enough, the writeln() was executed. OK, so an empty string literal has a pointer that just points to C's string-terminating zero ('\0') character.

So is one form (Empty strings versus null strings) considered better than the other? Or does it depend on the context?

Also, as an aside (and I'm not trying to be flippant here), aren't all strings literals? I mean, can someone give me an example of a string non-literal?
Sep 13 2014
ketmar via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
On Sun, 14 Sep 2014 00:34:54 +0000
WhatMeWorry via Digitalmars-d-learn <digitalmars-d-learn puremagic.com>
wrote:

 So is one form (Empty strings versus null strings) considered
 better than the other?  Or does it depend on the context?
One is better than the other in the sense that blue is better than green (or vice versa). ;-) Don't count on that trailing zero, and don't count on an empty string being null or pointing somewhere. `.length` is all that matters.
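Following the advice to rely on `.length`, a minimal sketch: the hypothetical `isBlank` treats `null` and `""` identically, and Phobos' `std.range.primitives.empty` behaves the same way for arrays, since it also looks only at the length.

```d
import std.range.primitives : empty;
import std.stdio;

/// Hypothetical helper: treats null and "" alike by checking only length.
bool isBlank(string s)
{
    return s.length == 0;
}

void main()
{
    string[] samples = [null, "", "x"];
    foreach (s; samples)
        writeln(isBlank(s), " ", s.empty); // both checks use the length
}
```

Both checks agree on every line: `true true` for `null` and for `""`, and `false false` for `"x"`.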
 Also as an aside (and I'm not trying to be flippant here), aren't
 all strings literals?  I mean, can someone give me an example of
 a string non-literal?
string foo()
{
    import std.conv;
    string s;
    foreach (i; 0 .. 10)
        s ~= to!string(i);
    return s;
}

This function returns a string, but that string is in no way built from a literal. Note that it's the string's *contents* that are immutable, not the whole string structure: there is a difference between `immutable(char[])` and `immutable(char)[]`. That is why you can use `~=` on strings.
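The difference between `immutable(char)[]` and `immutable(char[])` can be checked at compile time. This sketch uses `__traits(compiles, ...)` to show which operations the compiler accepts; it compiles and runs silently if the claims hold.

```d
void main()
{
    // string is immutable(char)[]: the characters are fixed, but the slice
    // variable itself can be reassigned, so appending just rebinds it.
    string s = "ab";
    s ~= "c";
    assert(s == "abc");
    static assert(!__traits(compiles, s[0] = 'x')); // chars can't change

    // immutable(char[]) also freezes the slice (ptr and length), so even
    // appending, which would rebind t, is rejected.
    immutable(char[]) t = "ab";
    static assert(!__traits(compiles, t ~= "c"));
}
```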
Sep 13 2014
Ali Çehreli <acehreli yahoo.com> writes:
On 09/13/2014 05:34 PM, WhatMeWorry wrote:

 aren't all strings literals?
Literals are values that are typed as-is in source code:

http://en.wikipedia.org/wiki/Literal_%28computer_programming%29

Ali
Sep 13 2014
"Kagamin" <spam here.lot> writes:
On Sunday, 14 September 2014 at 00:34:56 UTC, WhatMeWorry wrote:
 So is one form (Empty strings versus null strings) considered 
 better than the other?  Or does it depend on the context?
For all practical purposes they should be equivalent in D code. I suppose the distinction exists because somebody claimed he could make sense of it. Some APIs may rely on the distinction between null and empty strings, like the XML DOM, though I don't think such an interface is very useful. Also, for some reason the boolean value of a string is derived from ptr instead of length... meh.
Sep 14 2014
"Marc Schütz" <schuetzm gmx.net> writes:
On Sunday, 14 September 2014 at 09:07:26 UTC, Kagamin wrote:
 On Sunday, 14 September 2014 at 00:34:56 UTC, WhatMeWorry wrote:
 So is one form (Empty strings versus null strings) considered 
 better than the other?  Or does it depend on the context?
 For all practical purposes they should be equivalent in D code. I
 suppose the distinction exists because somebody claimed he could make
 sense of it. Some APIs may rely on the distinction between null and
 empty strings, like the XML DOM, though I don't think such an interface
 is very useful. Also, for some reason the boolean value of a string is
 derived from ptr instead of length... meh.
Which makes sense given that the distinction exists, IMO. Compare, for example, with Ruby, where empty strings and `0` integers also evaluate to true, and only `nil` and `false` evaluate to false.
Sep 14 2014
"AsmMan" <jckj33 gmail.com> writes:
On Saturday, 13 September 2014 at 23:21:09 UTC, David Nadlinger 
wrote:
 On Saturday, 13 September 2014 at 22:41:39 UTC, AsmMan wrote:
 D strings are actually C strings?
 No. But string *literals* are guaranteed to be 0-terminated for easier
 interoperability with C code.
 David
Ah, makes sense.

On Sunday, 14 September 2014 at 12:07:16 UTC, Marc Schütz wrote:
 On Sunday, 14 September 2014 at 09:07:26 UTC, Kagamin wrote:
 On Sunday, 14 September 2014 at 00:34:56 UTC, WhatMeWorry 
 wrote:
 So is one form (Empty strings versus null strings) considered 
 better than the other?  Or does it depend on the context?
 For all practical purposes they should be equivalent in D code. I
 suppose the distinction exists because somebody claimed he could make
 sense of it. Some APIs may rely on the distinction between null and
 empty strings, like the XML DOM, though I don't think such an interface
 is very useful. Also, for some reason the boolean value of a string is
 derived from ptr instead of length... meh.
 Which makes sense given that the distinction exists, IMO. Compare, for
 example, with Ruby, where empty strings and `0` integers also evaluate
 to true, and only `nil` and `false` evaluate to false.
That's why I don't like most dynamic languages... the type system is a mess. I don't even like the fact that one can do `x = "abc"; f(x); x = 10; g(x);` and have it work.
Sep 14 2014
ketmar via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
On Sun, 14 Sep 2014 09:07:25 +0000
Kagamin via Digitalmars-d-learn <digitalmars-d-learn puremagic.com>
wrote:

 Also, for some reason the boolean value of a string is derived from
 ptr instead of length... meh.
For the reason that all reference objects are either "null" or "non-null". An empty string is non-null, so... it's a C leftover, actually. There are a lot of such leftovers in D.
Sep 14 2014
"Kagamin" <spam here.lot> writes:
On Sunday, 14 September 2014 at 13:48:01 UTC, ketmar via 
Digitalmars-d-learn wrote:
 For the reason that all reference objects are either "null" or
 "non-null". An empty string is non-null, so... it's a C leftover,
 actually. There are a lot of such leftovers in D.
For pointers it's logical, but it doesn't work as well for slices: they're better thought of as either empty or non-empty.
Sep 14 2014
Mike Parker <aldacron gmail.com> writes:
On 9/14/2014 2:09 AM, WhatMeWorry wrote:
 The name string is aliased to immutable(char)[]

 Why was immutable chosen? Why not mutable? Or why not just make another
 alias called strung, aliased to mutable(char)[]?
If you want a mutable array of characters, just use char[].
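A minimal sketch of the `char[]` route, using a hypothetical helper `replaceFirst`: `.dup` gives a mutable `char[]` copy of a string, and `.idup` turns the edited buffer back into an immutable `string`, leaving the original literal untouched.

```d
import std.stdio;

/// Hypothetical helper: returns a copy of s with its first character replaced.
string replaceFirst(string s, char c)
{
    if (s.length == 0)
        return s;
    char[] buf = s.dup; // mutable copy on the GC heap
    buf[0] = c;         // fine: char[] elements are mutable
    return buf.idup;    // immutable copy, safe to hand out as a string
}

void main()
{
    string greeting = "hello";
    writeln(replaceFirst(greeting, 'j')); // jello
    writeln(greeting);                    // hello
}
```

`idup` makes a second copy; when no other reference to `buf` survives, `std.exception.assumeUnique` can skip that copy, at the cost of the compiler no longer checking the guarantee.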
Sep 14 2014