
digitalmars.D.learn - why is string not implicitly convertible to const(char*)?

mta`chrono <chrono mta-international.net> writes:
Does anyone know why string is not implicitly convertible to const(char*)?

-------
import core.sys.posix.unistd;

void main()
{
        // ok
        unlink("foo.txt");

        // failed
        string file = "bar.txt";
        unlink(file);
}

test.d(10): Error: function core.sys.posix.unistd.unlink (const(char*))
is not callable using argument types (string)
test.d(10): Error: cannot implicitly convert expression (file) of type
string to const(char*)
Jun 29 2012
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, June 30, 2012 02:12:22 mta`chrono wrote:
 Does anyone know why string is not implicitly convertible to const(char*)?
 
 -------
 import core.sys.posix.unistd;
 
 void main()
 {
         // ok
         unlink("foo.txt");
 
         // failed
         string file = "bar.txt";
         unlink(file);
 }
 
 test.d(10): Error: function core.sys.posix.unistd.unlink (const(char*))
 is not callable using argument types (string)
 test.d(10): Error: cannot implicitly convert expression (file) of type
 string to const(char*)

Because it's _not_ const char*. It's an array. And passing a string directly to a C function (which is almost the only reason that you'd want a string to convert to a const char*) is generally _wrong_. Strings in D are _not_ zero-terminated. String _literals_ are (they have a '\0' one character past their end), so as it just so happens, if string implicitly converted to const char*, your code would work, but if your string had been created from anything other than a string literal, it would _not_ be zero-terminated. Even concatenating two strings at runtime results in a string which isn't guaranteed to be zero-terminated. So, implicitly converting string to const char* would just cause bugs (in fact, it _used_ to work, and it was changed so that it doesn't, precisely because it's behavior which just causes bugs).

What you need to do is use std.string.toStringz. It converts a string to a zero-terminated string. It appends '\0' to the end of the string if it has to (which could result in the string having to be reallocated to make room for it), but if it can determine that it's unnecessary (which it can do at least some of the time with string literals), it'll just return the string's ptr property without doing any allocating. But since you _need_ that '\0', that's the best that you can do. Simply passing the string's ptr property to a C function would be wrong, since it's not zero-terminated. Your function call should look like

unlink(toStringz(file));

Of course, you could just do std.file.remove(file), which ultimately does the same thing and does so on all platforms rather than just POSIX, but that's a separate issue from converting a string to a const char*.

- Jonathan M Davis
Jun 29 2012
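A minimal sketch of the fix Jonathan describes, assuming a POSIX target (core.sys.posix.unistd) and a writable current directory; the helper name `unlinkFile` and the scratch file names are hypothetical. It calls unlink through std.string.toStringz and, for comparison, uses the portable std.file.remove.

```d
import core.sys.posix.unistd : unlink;
import std.file : exists, remove, write;
import std.string : toStringz;

/// Hypothetical helper: deletes `path` through the POSIX C API.
/// toStringz guarantees the trailing '\0' that unlink() reads up to;
/// passing path.ptr directly would not.
bool unlinkFile(string path)
{
    return unlink(path.toStringz()) == 0; // unlink returns 0 on success
}

unittest
{
    write("bar.txt", "x");         // create a scratch file
    assert(unlinkFile("bar.txt")); // C API with an explicit conversion
    assert(!exists("bar.txt"));

    write("baz.txt", "x");
    remove("baz.txt");             // portable std.file equivalent
    assert(!exists("baz.txt"));
}
```

One way to run the embedded unittest is `dmd -unittest -main -run file.d`, where `-main` supplies an empty main function.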
mta`chrono <chrono mta-international.net> writes:
Your answers are remarkably elaborate. Thanks for your great effort,
Jonathan!! ;-)
Jul 02 2012
Timon Gehr <timon.gehr gmx.ch> writes:
On 07/05/2012 09:32 PM, dcoder wrote:
 Thanks for the thorough explanation, but it begs the question why not
 make strings be arrays of chars that have \0 at the end?

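Because that is inefficient. It rules out slicing a string without copying it (a slice cannot be terminated without overwriting the original), and it is completely redundant, since the array already stores its length. BTW: String literals are guaranteed to be zero-terminated.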
 Since, lots of D programmers were/are probably C/C++ programmers, why should D
 be different here?

Because it is a superior model.
 Wouldn't it facilitate more C/C++ programmers to come to D?

Why would that matter?
Jul 05 2012
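A rough illustration of Timon's two points, assuming a standard D compiler: a literal is followed by '\0', but a slice of it shares the same memory, so nothing terminates the slice. The helper `prefix` is hypothetical; reading one character past a slice is done here only because that byte is known to belong to the literal.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

/// Hypothetical helper: the first `n` characters of `s` as a slice.
/// No characters are copied; only a new (pointer, length) pair is made.
string prefix(string s, size_t n)
{
    return s[0 .. n];
}

unittest
{
    string lit = "hello world";
    // String literals are guaranteed a '\0' just past their last character.
    assert(lit.ptr[lit.length] == '\0');

    string hello = prefix(lit, 5);
    assert(hello.ptr == lit.ptr);           // same memory, nothing copied
    assert(hello.ptr[hello.length] == ' '); // the slice is not terminated

    // A C function given hello.ptr would see all of "hello world".
    assert(strlen(hello.ptr) == 11);
    // toStringz yields a pointer to a properly terminated "hello".
    assert(strlen(hello.toStringz()) == 5);
}
```

If every slice had to end in '\0', `prefix` would have to copy its characters into new memory instead of just returning a view.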
Timon Gehr <timon.gehr gmx.ch> writes:
On 07/06/2012 02:57 AM, Wouter Verhelst wrote:
 Jonathan M Davis<jmdavisProg gmx.com>  writes:

 On Thursday, July 05, 2012 21:32:11 dcoder wrote:
 Thanks for the thorough explanation, but it begs the question why
 not make strings be arrays of chars that have \0 at the end?
    Since, lots of D programmers were/are probably C/C++
 programmers, why should D be different here?  Wouldn't it
 facilitate more C/C++ programmers to come to D?

 Just curious.

 Are you serious? I'm shocked to hear anyone suggest that. Zero-terminated strings are one of the largest mistakes in programming history. They're insanely inefficient. In fact, IIRC Walter Bright has stated that he thinks that having arrays without a length property was C's greatest mistake (and if they'd had that, they wouldn't have created zero-terminated strings). C++ tried to fix it with std::string, but C compatibility bites you everywhere with that, so it only halfway works. C++ programmers in general would probably have thought that the designers of D were idiots if they had gone with zero-terminated strings. You don't do what another language did just to match. You do it because what they did works and you have no reason to change it. Zero-terminated strings were a horrible idea, and we're not about to copy it.

 To be fair, there are a _few_ areas in which zero-terminated strings may possibly outperform bounded strings (appending data in the case where you know the memory block is large enough, for instance).

It is impossible to know that the memory block is large enough unless the length of the string is known. But it isn't.
 But they're few and far between, and it would indeed be silly to switch to
 zero-terminated strings.

There is no string manipulation that is significantly faster with zero-terminated strings.
Jul 05 2012
Timon Gehr <timon.gehr gmx.ch> writes:
On 07/06/2012 03:40 AM, Wouter Verhelst wrote:
 Timon Gehr<timon.gehr gmx.ch>  writes:

 On 07/06/2012 02:57 AM, Wouter Verhelst wrote:
 To be fair, there are a _few_ areas in which zero-terminated strings may
 possibly outperform bounded strings (appending data in the case
 where you know the memory block is large enough, for instance).

 It is impossible to know that the memory block is large enough unless the length of the string is known. But it isn't.

 Sure it is, but not by looking at the string itself. Say you have a string that contains some data you need, and some other data you don't. I.e., you want to throw out parts of the string. You could allocate a memory block that's as large as the original string (so you're sure you've got enough space), and then start memcpy'ing stuff into the new memory block from the old string.

This incurs the cost of determining the original string's length, which is higher than computing the new string length for the data&length representation.
 This way you're sure you won't overrun your zero-terminated string, and
 you'll be a slight bit faster than you would be with a bounded string.

Are you talking about differences of a few operations that are completely hidden on a modern out-of-order CPU? I don't think the zero-terminated string method will even perform fewer operations.
 I'll readily admit I haven't done this all that often, though :-)

 But they're few and far between, and it would indeed be silly to switch to
 zero-terminated strings.

 There is no string manipulation that is significantly faster with zero-terminated strings.

 Correct -- but only because you said "significantly".

I meant to say, 'measurably'.
Jul 05 2012
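Wouter's filtering scenario and Timon's objection can be sketched as two hypothetical functions (the names are illustrative, not from any library). Both allocate an output buffer as large as the input; the zero-terminated version has to scan the input with strlen first to learn that size, while the slice version reads it from `.length`.

```d
import core.stdc.string : strlen;

/// Hypothetical C-style filter. To allocate a buffer "as large as the
/// original", it must first walk the whole input with strlen (O(n)).
char[] dropCharZ(const(char)* src, char unwanted)
{
    auto buf = new char[strlen(src) + 1]; // +1 for the terminator
    size_t n = 0;
    for (; *src != '\0'; ++src)
        if (*src != unwanted)
            buf[n++] = *src;
    buf[n] = '\0';                        // keep the result C-compatible
    return buf[0 .. n];
}

/// The same job on a D slice: the input length is already stored, so
/// the buffer can be sized without any extra pass over the data.
char[] dropChar(const(char)[] src, char unwanted)
{
    auto buf = new char[src.length];
    size_t n = 0;
    foreach (c; src)
        if (c != unwanted)
            buf[n++] = c;
    return buf[0 .. n];
}

unittest
{
    assert(dropCharZ("a-b-c".ptr, '-') == "abc");
    assert(dropChar("a-b-c", '-') == "abc");
}
```

Both copy loops do the same work; the difference is the extra strlen pass, which is the cost Timon points to.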
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday, July 06, 2012 12:56:36 Wouter Verhelst wrote:
 However, I also happen
 to think that saying "zero-terminated strings were a horrendous design
 decision" is a bit short-sighted.

Well, then we're going to have to agree to disagree on that one. While some design decisions may have made more sense at the time they were made or the ultimate pros and cons may not have been clear at the time, I think that zero-terminated strings are one of the design decisions which was truly short-sighted and an enormous mistake all around, and all C/C++ programmers have had to pay for it ever since.

- Jonathan M Davis
Jul 06 2012
"akaz" <nemo utopia.com> writes:
 Well, then we're going to have to agree to disagree on that 
 one. While some
 design decisions may have made more sense at the time they were 
 made or the
 ultimate pros and cons may not have been clear at the time, I 
 think that zero-
 terminated strings are one of the design decisions which was 
 truly short-
 sighted and an enormous mistake all around, and all C/C++ 
 programmers have had
 to pay for it ever since.

 - Jonathan M Davis

I agree, even though zero-terminated strings do allow, in principle, strings of any length at a constant cost: just one byte is sacrificed for the terminator, instead of the one, two, three, four, etc. bytes required to represent the length. Besides, ignoring machine addressing issues, zero-terminated strings impose, in principle, no upper bound on the length of a string. But, OTOH, it was also the only way to do it once the decision not to incorporate a length in arrays (basically, under the assumption that an array is a pointer and nothing more) had been made.
Jul 07 2012
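To put akaz's byte counting in concrete terms, here is a sketch that assumes we count only the handle a program keeps plus the character data, ignoring allocator padding; both helper names are hypothetical. In D the handle is a pointer plus a fixed-size size_t length, so carrying the length costs size_t.sizeof - 1 bytes more than a single terminator, regardless of how long the string is (7 bytes on a typical 64-bit target).

```d
/// Hypothetical helpers: bytes needed to keep a string of `len`
/// characters, counting the handle the program holds plus the character
/// data, and ignoring allocator padding (an assumption for illustration).
size_t bytesZeroTerminated(size_t len)
{
    return (char*).sizeof + len + 1;             // pointer + data + '\0'
}

size_t bytesLengthCarrying(size_t len)
{
    return (char*).sizeof + size_t.sizeof + len; // pointer + length + data
}

// A D array really is a pointer plus a size_t length.
static assert(string.sizeof == (char*).sizeof + size_t.sizeof);

unittest
{
    // The extra cost of storing the length does not grow with the string.
    assert(bytesLengthCarrying(0) - bytesZeroTerminated(0) == size_t.sizeof - 1);
    assert(bytesLengthCarrying(1000) - bytesZeroTerminated(1000) == size_t.sizeof - 1);
}
```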
"dcoder" <dcoder nowhere.com> writes:
On Saturday, 30 June 2012 at 00:27:46 UTC, Jonathan M Davis wrote:
 On Saturday, June 30, 2012 02:12:22 mta`chrono wrote:
 Does anyone know why string is not implicitly convertible to
 const(char*)?
 
 -------
 import core.sys.posix.unistd;
 
 void main()
 {
         // ok
         unlink("foo.txt");
 
         // failed
         string file = "bar.txt";
         unlink(file);
 }
 
 test.d(10): Error: function core.sys.posix.unistd.unlink 
 (const(char*))
 is not callable using argument types (string)
 test.d(10): Error: cannot implicitly convert expression (file) 
 of type
 string to const(char*)

 Because it's _not_ const char*. It's an array. And passing a string directly to a C function (which is almost the only reason that you'd want a string to convert to a const char*) is generally _wrong_. Strings in D are _not_ zero-terminated. String _literals_ are (they have a '\0' one character past their end), so as it just so happens, if string implicitly converted to const char*, your code would work, but if your string had been created from anything other than a string literal, it would _not_ be zero-terminated. Even concatenating two strings at runtime results in a string which isn't guaranteed to be zero-terminated. So, implicitly converting string to const char* would just cause bugs (in fact, it _used_ to work, and it was changed so that it doesn't, precisely because it's behavior which just causes bugs). What you need to do is use std.string.toStringz. It converts a string to a zero-terminated string. It appends '\0' to the end of the string if it has to (which could result in the string having to be reallocated to make room for it), but if it can determine that it's unnecessary (which it can do at least some of the time with string literals), it'll just return the string's ptr property without doing any allocating. But since you _need_ that '\0', that's the best that you can do. Simply passing the string's ptr property to a C function would be wrong, since it's not zero-terminated. Your function call should look like unlink(toStringz(file)); Of course, you could just do std.file.remove(file), which ultimately does the same thing and does so on all platforms rather than just POSIX, but that's a separate issue from converting a string to a const char*. - Jonathan M Davis

Thanks for the thorough explanation, but it begs the question: why not make strings be arrays of chars that have \0 at the end? Since, lots of D programmers were/are probably C/C++ programmers, why should D be different here? Wouldn't it facilitate more C/C++ programmers to come to D? Just curious.
Jul 05 2012
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday, July 05, 2012 21:32:11 dcoder wrote:
 Thanks for the thorough explanation, but it begs the question why
 not make strings be arrays of chars that have \0 at the end?
   Since, lots of D programmers were/are probably C/C++
 programmers, why should D be different here?  Wouldn't it
 facilitate more C/C++ programmers to come to D?
 
 Just curious.

Are you serious? I'm shocked to hear anyone suggest that. Zero-terminated strings are one of the largest mistakes in programming history. They're insanely inefficient. In fact, IIRC Walter Bright has stated that he thinks that having arrays without a length property was C's greatest mistake (and if they'd had that, they wouldn't have created zero-terminated strings). C++ tried to fix it with std::string, but C compatibility bites you everywhere with that, so it only halfway works. C++ programmers in general would probably have thought that the designers of D were idiots if they had gone with zero-terminated strings. You don't do what another language did just to match. You do it because what they did works and you have no reason to change it. Zero-terminated strings were a horrible idea, and we're not about to copy it.

- Jonathan M Davis
Jul 05 2012
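Part of what makes zero-terminated strings slow is that their length has to be discovered by scanning. A minimal sketch, with a hypothetical `lengthZ` written the way C's strlen is conventionally described, next to the O(1) `.length` of a D array:

```d
/// Hypothetical re-implementation of the idea behind C's strlen: the only
/// way to learn a zero-terminated string's length is to walk to the '\0'.
size_t lengthZ(const(char)* s)
{
    size_t n = 0;
    while (s[n] != '\0') // one comparison per character: O(n)
        ++n;
    return n;
}

/// A D array stores its length, so asking for it is a field read: O(1).
size_t lengthD(const(char)[] s)
{
    return s.length;
}

unittest
{
    string s = "zero-terminated";
    assert(lengthZ(s.ptr) == lengthD(s)); // a literal, so s.ptr is terminated
}
```

Code that asks for a zero-terminated string's length repeatedly, for example in a loop condition, pays for that scan every time, while `.length` stays a single field read.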
Wouter Verhelst <wouter grep.be> writes:
Jonathan M Davis <jmdavisProg gmx.com> writes:

 On Thursday, July 05, 2012 21:32:11 dcoder wrote:
 Thanks for the thorough explanation, but it begs the question why
 not make strings be arrays of chars that have \0 at the end?
   Since, lots of D programmers were/are probably C/C++
 programmers, why should D be different here?  Wouldn't it
 facilitate more C/C++ programmers to come to D?
 
 Just curious.

 Are you serious? I'm shocked to hear anyone suggest that. Zero-terminated strings are one of the largest mistakes in programming history. They're insanely inefficient. In fact, IIRC Walter Bright has stated that he thinks that having arrays without a length property was C's greatest mistake (and if they'd had that, they wouldn't have created zero-terminated strings). C++ tried to fix it with std::string, but C compatibility bites you everywhere with that, so it only halfway works. C++ programmers in general would probably have thought that the designers of D were idiots if they had gone with zero-terminated strings. You don't do what another language did just to match. You do it because what they did works and you have no reason to change it. Zero-terminated strings were a horrible idea, and we're not about to copy it.

To be fair, there are a _few_ areas in which zero-terminated strings may possibly outperform bounded strings (appending data in the case where you know the memory block is large enough, for instance). But they're few and far between, and it would indeed be silly to switch to zero-terminated strings.

--
The volume of a pizza of thickness a and radius z can be described by the following formula: pi zz a
Jul 05 2012
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday, July 05, 2012 18:57:05 Wouter Verhelst wrote:
 Jonathan M Davis <jmdavisProg gmx.com> writes:
 On Thursday, July 05, 2012 21:32:11 dcoder wrote:
 Thanks for the thorough explanation, but it begs the question why
 not make strings be arrays of chars that have \0 at the end?
 
   Since, lots of D programmers were/are probably C/C++
 
 programmers, why should D be different here?  Wouldn't it
 facilitate more C/C++ programmers to come to D?
 
 Just curious.

 Are you serious? I'm shocked to hear anyone suggest that. Zero-terminated strings are one of the largest mistakes in programming history. They're insanely inefficient. In fact, IIRC Walter Bright has stated that he thinks that having arrays without a length property was C's greatest mistake (and if they'd had that, they wouldn't have created zero-terminated strings). C++ tried to fix it with std::string, but C compatibility bites you everywhere with that, so it only halfway works. C++ programmers in general would probably have thought that the designers of D were idiots if they had gone with zero-terminated strings. You don't do what another language did just to match. You do it because what they did works and you have no reason to change it. Zero-terminated strings were a horrible idea, and we're not about to copy it.

 To be fair, there are a _few_ areas in which zero-terminated strings may possibly outperform bounded strings (appending data in the case where you know the memory block is large enough, for instance). But they're few and far between, and it would indeed be silly to switch to zero-terminated strings.

Actually, I'd expect a string that maintains its length to beat a zero-terminated string at that, especially because you'd have to already know the string's length to pull that off, which is O(n) for zero-terminated strings. The _only_ time that the zero-terminated string might outperform the one which maintained its length when you go to append is if you already happen to know the length of the string being appended to and the string being appended (which you wouldn't normally with zero-terminated strings), because then the zero-terminated string would have one more byte to copy as part of its memcpy than the other string would, but the other string would have to adjust its length, making its append cost _slightly_ more. But really, given the overall costs of zero-terminated strings, it would be ridiculous to even count that extra bit of performance given the _huge_ performance losses everywhere else with them.

The _only_ valid excuse that I'm aware of for picking such a horrid design is the fact that it costs extra memory to maintain the length of an array along with the array, and when C was created, they cared a _lot_ more about memory usage than we do today. So, regardless of what the pros or cons were in the short run, in the long run, their decision was a very poor one that pretty much no one has duplicated. I really see no reason to cut them any slack for such a horrible design decision.

- Jonathan M Davis
Jul 05 2012
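Jonathan's comparison can be made concrete with two hypothetical append routines (the names are illustrative, not library functions). The C-style one, modeled on how strcat is usually described, must first find the end of `dst`, which is O(strlen(dst)), and then copies the terminator as one extra byte; the length-carrying one starts writing at a known offset and just returns the updated count.

```d
/// Hypothetical C-style append: walk dst to its '\0', then copy src
/// including its terminator. The caller must ensure dst has room for
/// both strings plus one '\0'.
void appendZ(char* dst, const(char)* src)
{
    while (*dst != '\0')
        ++dst;                            // O(strlen(dst)) just to find the end
    while ((*dst++ = *src++) != '\0')
    {
        // copy up to and including src's '\0'
    }
}

/// Length-carrying append: `used` says where the data ends, so only
/// src.length characters are copied and the new length is returned.
size_t appendLen(char[] buf, size_t used, const(char)[] src)
{
    buf[used .. used + src.length] = src[]; // bounds-checked slice copy
    return used + src.length;
}

unittest
{
    char[16] z = '\0';
    z[0 .. 3] = "abc";
    appendZ(z.ptr, "def".ptr);
    assert(z[0 .. 7] == "abcdef\0");

    char[16] d;
    size_t n = appendLen(d[], 0, "abc");
    n = appendLen(d[], n, "def");
    assert(d[0 .. n] == "abcdef");
}
```

The length-carrying version also gets a bounds check for free, since the slice assignment knows both lengths.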
Wouter Verhelst <wouter grep.be> writes:
Timon Gehr <timon.gehr gmx.ch> writes:

 On 07/06/2012 02:57 AM, Wouter Verhelst wrote:
 To be fair, there are a _few_ areas in which zero-terminated strings may
 possibly outperform bounded strings (appending data in the case
 where you know the memory block is large enough, for instance).

 It is impossible to know that the memory block is large enough unless the length of the string is known. But it isn't.

Sure it is, but not by looking at the string itself. Say you have a string that contains some data you need, and some other data you don't; i.e., you want to throw out parts of the string. You could allocate a memory block that's as large as the original string (so you're sure you've got enough space), and then start memcpy'ing stuff into the new memory block from the old string. This way you're sure you won't overrun your zero-terminated string, and you'll be a slight bit faster than you would be with a bounded string. I'll readily admit I haven't done this all that often, though :-)
 But they're few and far between, and it would indeed be silly to switch to
 zero-terminated strings.

 There is no string manipulation that is significantly faster with zero-terminated strings.

Correct -- but only because you said "significantly".
Jul 05 2012
Wouter Verhelst <wouter grep.be> writes:
Jonathan M Davis <jmdavisProg gmx.com> writes:

 On Thursday, July 05, 2012 18:57:05 Wouter Verhelst wrote:
 To be fair, there are a _few_ areas in which zero-terminated strings may
 possibly outperform bounded strings (appending data in the case
 where you know the memory block is large enough, for instance). But
 they're few and far between, and it would indeed be silly to switch to
 zero-terminated strings.

 Actually, I'd expect a string that maintains its length to beat a zero-terminated string at that, especially because you'd have to already know the string's length to pull that off, which is O(n) for zero-terminated strings. The _only_ time that the zero-terminated string might outperform the one which maintained its length when you go to append is if you already happen to know the length of the string being appended to and the string being appended (which you wouldn't normally with zero-terminated strings), because then the zero-terminated string would have one more byte to copy as part of its memcpy than the other string would, but the other string would have to adjust its length, making its append cost _slightly_ more.

That's what I meant, yes.
 But really, given the overall costs of zero-terminated
 strings, it would be ridiculous to even count that extra bit of
 performance given the _huge_ performance losses everywhere else with them.

Absolutely.
 The _only_ valid excuse that I'm aware of for picking such a horrid design is 
 the fact that it costs extra memory to maintain the length of an array along 
 with the array, and when C was created, they cared a _lot_ more about memory 
 usage than we do today. So, regardless of what the pros or cons were in the 
 short run, in the long run, their decision was a very poor one that pretty 
 much no one has duplicated.

Well, really, strings in C are just a special case of arrays (as is true in D as well), and arrays in C are just a special case of pointers (which isn't true in D). That means the language is fairly compact, which also means the compiler has much lower resource requirements. I think that, much more than any requirements at runtime, has driven the choice for zero-terminated strings.

Just for comparison, what happens to DMD's memory usage when you do extensive templating wouldn't have been possible back in 1969 ;-)
Jul 05 2012
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday, July 05, 2012 19:59:26 Wouter Verhelst wrote:
 Well, really, strings in C are just a special case of arrays (as is true
 in D as well), and arrays in C are just a special case of pointers
 (which isn't true in D). That means the language is fairly compact,
 which also means the compiler has much lower resource
 requirements. I think that, much more than any requirements at runtime,
 has driven the choice for zero-terminated strings.
 
 Just for comparison, what happens to DMD's memory usage when you do
 extensive templating wouldn't have been possible back in 1969 ;-)

There are a number of things that we do now with programming languages that you couldn't do when C was created. Having arrays that know their length is not one of them. Other languages in that time frame did it. C made the horrendous mistake of not doing it.

- Jonathan M Davis
Jul 05 2012
"dcoder" <dcoder nowhere.com> writes:
Thanks for the lengthy threaded explanations.  I just use the 
language to write applications, I have no idea of the challenges 
that you must face to implement/design a language.  Hence the 
stupid questions.  :)  Anyways, fascinating stuff.
Jul 06 2012
Wouter Verhelst <wouter grep.be> writes:
Timon Gehr <timon.gehr gmx.ch> writes:
 This incurs the cost of determining the original string's length,

There are ways to know the original string's length without having to calculate it. E.g., read() returns the number of bytes it actually read, so if the "original" string is one you read in from a file, there'll be no need to count characters.

Anyway, this is all beside the point. I think it's safe to say we agree that bounded strings and arrays are superior to zero-terminated strings (at least in all but a few corner cases). However, I also happen to think that saying "zero-terminated strings were a horrendous design decision" is a bit short-sighted. That's all.
Jul 06 2012
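Wouter's point about read() can be sketched in D using std.stdio.File.rawRead as a stand-in for the raw POSIX read() he mentions: rawRead returns the part of the buffer it actually filled, so the length comes from the I/O call itself, with no scan for a terminator. The helper name `readPrefix` and the file names are hypothetical.

```d
import std.file : remove, write;
import std.stdio : File;

/// Hypothetical helper: reads at most buf.length bytes from `path`.
/// rawRead returns the slice of `buf` it actually filled, so the
/// result's .length is known without counting characters.
char[] readPrefix(string path, char[] buf)
{
    auto f = File(path, "rb");
    return f.rawRead(buf);
}

unittest
{
    write("sample.txt", "hello");
    scope (exit) remove("sample.txt");

    char[64] buf;
    auto got = readPrefix("sample.txt", buf[]);
    assert(got.length == 5); // short read: length reported, not scanned
    assert(got == "hello");
}
```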