digitalmars.D.learn - why is string not implicit convertable to const(char*) ?

mta`chrono <chrono mta-international.net> writes:

Does anyone know why string is not implicitly convertible to const(char*)?

-------
import core.sys.posix.unistd;

void main()
{
        // ok
        unlink("foo.txt");

        // failed
        string file = "bar.txt";
        unlink(file);
}

test.d(10): Error: function core.sys.posix.unistd.unlink (const(char*))
is not callable using argument types (string)
test.d(10): Error: cannot implicitly convert expression (file) of type
string to const(char*)

Jun 29 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

Because it's _not_ const char*. It's an array. And passing a string directly 
to a C function (which is almost the only reason that you'd want a string to 
convert to a const char*) is generally _wrong_. Strings in D are _not_ 
zero-terminated. String _literals_ are (they have a '\0' one character past 
their end), so as it just so happens, if string implicitly converted to 
const char*, your code would work, but if your string had been created from 
anything other than a string literal, it would _not_ be zero-terminated. Even 
concatenating two string literals results in a string which isn't 
zero-terminated. So, implicitly converting string to const char* would just 
cause bugs (in fact, it _used_ to work, and it was fixed so that it doesn't, 
precisely because it's behavior which just causes bugs).

What you need to do is use std.string.toStringz. It converts a string to a 
zero-terminated string. It appends '\0' to the end of the string if it has to 
(which could result in the string having to be reallocated to make room for 
it), but if it can determine that it's unnecessary (which it can do at least 
some of the time with string literals), it'll just return the string's ptr 
property without doing any allocating. But since you _need_ that '\0', that's 
the best that you can do. Simply passing the string's ptr property to a C 
function would be wrong, since it's not zero-terminated.

Your function call should look like

unlink(toStringz(file));

Of course, you could just do std.file.remove(file), which ultimately does the 
same thing and does so on all platforms rather than just POSIX, but that's a 
separate issue from converting a string to a const char*.

- Jonathan M Davis

Jun 29 2012
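
A minimal sketch of the toStringz advice in the reply above. It uses strlen from core.stdc.string as a stand-in for unlink so that no file is touched; cLength is a hypothetical helper name for illustration, not part of Phobos.

```d
import std.string : toStringz;
import core.stdc.string : strlen;

// Hypothetical helper: the length of `s` as a C function would see it.
size_t cLength(string s)
{
    // toStringz yields a pointer to a '\0'-terminated copy (or to the
    // original data when it can tell a terminator is already there).
    return strlen(toStringz(s));
}

void main()
{
    string a = "bar";
    string b = ".txt";
    string file = a ~ b; // built at run time: no terminator is promised

    assert(file.length == 7);
    assert(cLength(file) == 7);

    // A slice ends in the middle of the data, so only toStringz makes
    // it safe to hand to C.
    assert(cLength(file[0 .. 3]) == 3);
}
```

Passing file.ptr directly instead would compile, but the C side would read until it happened to find a zero byte somewhere in memory.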

mta`chrono <chrono mta-international.net> writes:

Your answers are remarkably elaborate. Thanks for your great effort,
Jonathan!! ;-)

Jul 02 2012

"dcoder" <dcoder nowhere.com> writes:

Thanks for the thorough explanation, but it begs the question: why 
not make strings arrays of chars that have a \0 at the end? 
Since lots of D programmers were/are probably C/C++ 
programmers, why should D be different here?  Wouldn't it 
help more C/C++ programmers come to D?

Just curious.

Jul 05 2012

Timon Gehr <timon.gehr gmx.ch> writes:

On 07/05/2012 09:32 PM, dcoder wrote:
 Thanks for the thorough explanation, but it begs the question: why not
 make strings arrays of chars that have a \0 at the end?

Because that is inefficient. It disables string slicing and is 
completely redundant.

BTW: String literals are guaranteed to be zero-terminated.

 Since lots of D programmers were/are probably C/C++ programmers, why should D
 be different here?

Because it is a superior model.

 Wouldn't it help more C/C++ programmers come to D?

Why would that matter?

Jul 05 2012
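
To illustrate the slicing point in the reply above, here is a small sketch. It assumes nothing beyond core D array semantics; afterLastSlash is a hypothetical helper name chosen for the example.

```d
// Hypothetical helper: everything after the last '/' in `path`,
// returned as a slice of the input rather than a copy.
string afterLastSlash(string path)
{
    foreach_reverse (i, c; path)
    {
        if (c == '/')
            return path[i + 1 .. $];
    }
    return path;
}

void main()
{
    string path = "dir/bar.txt";

    string name = afterLastSlash(path);
    string dir = path[0 .. 3];

    assert(name == "bar.txt");
    assert(dir == "dir");

    // Both slices are (pointer, length) views into the same memory.
    assert(name.ptr == path.ptr + 4);
    assert(dir.ptr == path.ptr);

    // The character right after `dir` is '/', not '\0'. A zero-terminated
    // design could not express this slice without copying or overwriting.
    assert(path[dir.length] == '/');
}
```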

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Thursday, July 05, 2012 21:32:11 dcoder wrote:
 Thanks for the thorough explanation, but it begs the question: why
 not make strings arrays of chars that have a \0 at the end?
 Since lots of D programmers were/are probably C/C++
 programmers, why should D be different here?  Wouldn't it
 help more C/C++ programmers come to D?
 
 Just curious.

Are you serious? I'm shocked to hear anyone suggest that. Zero-terminated 
strings are one of the largest mistakes in programming history. They're 
insanely inefficient. In fact, IIRC Walter Bright has stated that he thinks 
that having arrays without a length property was C's greatest mistake (and if 
they'd had that, they wouldn't have created zero-terminated strings).

C++ tried to fix it with std::string, but C compatibility bites you everywhere 
with that, so it only halfway works. C++ programmers in general would probably 
have thought that the designers of D were idiots if they had gone with 
zero-terminated strings.

You don't do what another language did just to match. You do it because what 
they did works and you have no reason to change it. Zero-terminated strings 
were a horrible idea, and we're not about to copy it.

- Jonathan M Davis

Jul 05 2012

Wouter Verhelst <wouter grep.be> writes:

To be fair, there are a _few_ areas in which zero-terminated strings may
possibly outperform bounded strings (appending data in the case
where you know the memory block is large enough, for instance). But
they're few and far between, and it would indeed be silly to switch to
zero-terminated strings.

-- 
The volume of a pizza of thickness a and radius z can be described by
the following formula:

pi zz a

Jul 05 2012

Timon Gehr <timon.gehr gmx.ch> writes:

On 07/06/2012 02:57 AM, Wouter Verhelst wrote:
 To be fair, there are a _few_ areas in which zero-terminated strings may
 possibly outperform bounded strings (appending data in the case
 where you know the memory block is large enough, for instance).

It is impossible to know that the memory block is large enough unless
the length of the string is known. But it isn't.


 But they're few and far between, and it would indeed be silly to switch to
 zero-terminated strings.

There is no string manipulation that is significantly faster with
zero-terminated strings.

Jul 05 2012

Wouter Verhelst <wouter grep.be> writes:

Timon Gehr <timon.gehr gmx.ch> writes:

 On 07/06/2012 02:57 AM, Wouter Verhelst wrote:
 To be fair, there are a _few_ areas in which zero-terminated strings may
 possibly outperform bounded strings (appending data in the case
 where you know the memory block is large enough, for instance).

 It is impossible to know that the memory block is large enough unless
 the length of the string is known. But it isn't.

Sure it is, but not by looking at the string itself.

Say you have a string that contains some data you need, and some other
data you don't. I.e., you want to throw out parts of the string.

You could allocate a memory block that's as large as the original string
(so you're sure you've got enough space), and then start memcpy'ing
stuff into the new memory block from the old string.

This way you're sure you won't overrun your zero-terminated string, and
you'll be a slight bit faster than you would be with a bounded string.

I'll readily admit I haven't done this all that often, though :-)

 But they're few and far between, and it would indeed be silly to switch to
 zero-terminated strings.

 There is no string manipulation that is significantly faster with
 zero-terminated strings.

Correct -- but only because you said "significantly".


Jul 05 2012
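
A sketch of the filtering scenario from the post above, written against D's length-carrying arrays rather than C strings. stripChar is a hypothetical helper name; the point is only that the output buffer can be sized from the source's length, which a D array already knows.

```d
// Hypothetical helper: copy every character except `unwanted` into a
// buffer that is allocated once, sized from the source's known length.
char[] stripChar(const(char)[] src, char unwanted)
{
    auto buf = new char[](src.length); // can never be too small
    size_t n = 0;
    foreach (c; src)
    {
        if (c != unwanted)
            buf[n++] = c;
    }
    return buf[0 .. n]; // the result's length is just the write position
}

void main()
{
    assert(stripChar("a-b-c", '-') == "abc");
    assert(stripChar("---", '-').length == 0);
    assert(stripChar("abc", '-') == "abc");
}
```

With a zero-terminated source, the same approach first needs the source length from somewhere else (a prior read() count, for example) or a full scan for the terminator.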

Timon Gehr <timon.gehr gmx.ch> writes:

On 07/06/2012 03:40 AM, Wouter Verhelst wrote:
 Sure it is, but not by looking at the string itself.

 Say you have a string that contains some data you need, and some other
 data you don't. I.e., you want to throw out parts of the string.

 You could allocate a memory block that's as large as the original string
 (so you're sure you've got enough space), and then start memcpy'ing
 stuff into the new memory block from the old string.

This incurs the cost of determining the original string's length, which 
is higher than computing the new string length for the data&length
representation.

 This way you're sure you won't overrun your zero-terminated string, and
 you'll be a slight bit faster than you would be with a bounded string.

Are you talking about differences of a few operations that are
completely hidden on a modern out-of-order CPU? I don't think the
zero-terminated string method will even perform fewer operations.

 I'll readily admit I haven't done this all that often, though :-)

 But they're few and far between, and it would indeed be silly to switch to
 zero-terminated strings.

 There is no string manipulation that is significantly faster with
 zero-terminated strings.

 Correct -- but only because you said "significantly".

I meant to say, 'measurably'.

Jul 05 2012

Wouter Verhelst <wouter grep.be> writes:

Timon Gehr <timon.gehr gmx.ch> writes:
 This incurs the cost of determining the original string's length,

There are ways to know the original string's length without having to
calculate it. E.g., if you do a read() with some length and you don't
get an error message indicating that there aren't as many characters
available, you can be pretty sure of the string's length. If the
"original" string is a string you read in from a file, there'll be no
need to count characters.

Anyway, this is all beside the point. I think it's safe to say we agree
that bounded strings and arrays are superior to zero-terminated
strings (at least in all but a few corner cases). However, I also happen
to think that saying "zero-terminated strings were a horrendous design
decision" is a bit short-sighted.

That's all.


Jul 06 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Friday, July 06, 2012 12:56:36 Wouter Verhelst wrote:
 However, I also happen
 to think that saying "zero-terminated strings were a horrendous design
 decision" is a bit short-sighted.

Well, then we're going to have to agree to disagree on that one. While some 
design decisions may have made more sense at the time they were made or the 
ultimate pros and cons may not have been clear at the time, I think that zero-
terminated strings are one of the design decisions which was truly short-
sighted and an enormous mistake all around, and all C/C++ programmers have had 
to pay for it ever since.

- Jonathan M Davis

Jul 06 2012

"akaz" <nemo utopia.com> writes:

 Well, then we're going to have to agree to disagree on that 
 one. While some
 design decisions may have made more sense at the time they were 
 made or the
 ultimate pros and cons may not have been clear at the time, I 
 think that zero-
 terminated strings are one of the design decisions which was 
 truly short-
 sighted and an enormous mistake all around, and all C/C++ 
 programmers have had
 to pay for it ever since.

 - Jonathan M Davis

I agree, despite the fact that it allows, in principle, creating 
strings as long as desired with constant cost (just one byte is 
sacrificed, instead of the one, two, three, four etc. required to 
represent the length). Besides, using zero-terminated strings did 
not impose, in principle (forget about machine addressing issues), 
any upper bound on the length of a string.

But, OTOH, it was also the only way to do it once the decision to 
not incorporate length in arrays (basically, under the 
assumption: an array is a pointer and nothing more) was made.

Jul 07 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Thursday, July 05, 2012 18:57:05 Wouter Verhelst wrote:
 To be fair, there are a _few_ areas in which zero-terminated strings may
 possibly outperform bounded strings (appending data in the case
 where you know the memory block is large enough, for instance). But
 they're few and far between, and it would indeed be silly to switch to
 zero-terminated strings.

Actually, I'd expect a string that maintains its length to beat a 
zero-terminated string at that - especially because you'd have to already know 
the string's length to pull that off, which is O(n) for zero-terminated 
strings. The _only_ time that the zero-terminated string might outperform the 
one which maintains its length when you append is if you already happen to 
know the length of the string being appended to and the string being appended 
(which you wouldn't normally with zero-terminated strings), because then the 
zero-terminated string would have one more byte to copy as part of its memcpy 
than the other string would, but the other string would have to adjust its 
length, making it cost _slightly_ more. But really, given the overall costs of 
zero-terminated strings, it would be ridiculous to even count that extra bit 
of performance given the _huge_ performance losses everywhere else with them.

The _only_ valid excuse that I'm aware of for picking such a horrid design is 
the fact that it costs extra memory to maintain the length of an array along 
with the array, and when C was created, they cared a _lot_ more about memory 
usage than we do today. So, regardless of what the pros or cons were in the 
short run, in the long run, their decision was a very poor one that pretty 
much no one has duplicated.

I really see no reason to cut them any slack for such a horrible design 
decision.

- Jonathan M Davis

Jul 05 2012
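
A small sketch of the O(n) versus O(1) length point made in the post above. scanLength is a hypothetical re-implementation of what a strlen-style function has to do; the example relies on string literals being zero-terminated, as stated earlier in the thread.

```d
// Hypothetical helper: find the length of a zero-terminated string the
// only way possible, by walking it until the terminator. This is O(n).
size_t scanLength(const(char)* p)
{
    size_t n = 0;
    while (p[n] != '\0')
        ++n;
    return n;
}

void main()
{
    string lit = "hello, world"; // a literal, so a '\0' follows the data

    // O(n): every call rescans the whole string.
    assert(scanLength(lit.ptr) == 12);

    // O(1): a D array stores its length alongside the pointer.
    assert(lit.length == 12);

    // Appending with ~ starts from the stored length, with no scan needed.
    string longer = lit ~ "!";
    assert(longer.length == 13);
}
```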

Wouter Verhelst <wouter grep.be> writes:

Jonathan M Davis <jmdavisProg gmx.com> writes:

 On Thursday, July 05, 2012 18:57:05 Wouter Verhelst wrote:
 To be fair, there are a _few_ areas in which zero-terminated strings may
 possibly outperform bounded strings (appending data in the case
 where you know the memory block is large enough, for instance). But
 they're few and far between, and it would indeed be silly to switch to
 zero-terminated strings.

 Actually, I'd expect a string that maintains its length to beat a 
 zero-terminated string at that - especially because you'd have to already know 
 the string's length to pull that off, which is O(n) for zero-terminated 
 strings. The _only_ time that the zero-terminated string might outperform the 
 one which maintains its length when you append is if you already happen to 
 know the length of the string being appended to and the string being appended 
 (which you wouldn't normally with zero-terminated strings), because then the 
 zero-terminated string would have one more byte to copy as part of its memcpy 
 than the other string would, but the other string would have to adjust its 
 length, making it cost _slightly_ more.

That's what I meant, yes.

 But really, given the overall costs of zero-terminated strings, it would be 
 ridiculous to even count that extra bit of performance given the _huge_ 
 performance losses everywhere else with them.

Absolutely.

 The _only_ valid excuse that I'm aware of for picking such a horrid design is 
 the fact that it costs extra memory to maintain the length of an array along 
 with the array, and when C was created, they cared a _lot_ more about memory 
 usage than we do today. So, regardless of what the pros or cons were in the 
 short run, in the long run, their decision was a very poor one that pretty 
 much no one has duplicated.

Well, really, strings in C are just a special case of arrays (as is true
in D as well), and arrays in C are just a special case of pointers
(which isn't true in D). That means the language is fairly compact,
which also means the compiler has much lower resource
requirements. I think that, much more than any requirements at runtime,
has driven the choice for zero-terminated strings.

Just for comparison, what happens to DMD's memory usage when you do
extensive templating wouldn't have been possible back in 1969 ;-)


Jul 05 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Thursday, July 05, 2012 19:59:26 Wouter Verhelst wrote:
 Well, really, strings in C are just a special case of arrays (as is true
 in D as well), and arrays in C are just a special case of pointers
 (which isn't true in D). That means the language is fairly compact,
 which also means the compiler has much lower resource
 requirements. I think that, much more than any requirements at runtime,
 has driven the choice for zero-terminated strings.
 
 Just for comparison, what happens to DMD's memory usage when you do
 extensive templating wouldn't have been possible back in 1969 ;-)

There are a number of things that we do now with programming languages that 
you couldn't do when C was created. Having arrays that know their length is 
not one of them. Other languages in that time frame did it. C made the 
horrendous mistake of not doing it.

- Jonathan M Davis

Jul 05 2012

"dcoder" <dcoder nowhere.com> writes:

Thanks for the lengthy threaded explanations.  I just use the 
language to write applications, I have no idea of the challenges 
that you must face to implement/design a language.  Hence the 
stupid questions.  :)  Anyways, fascinating stuff.

Jul 06 2012
