digitalmars.D - Idea: Introduce zero-terminated string specifier

reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
I've noticed I'm having to do a lot of to!string calls when I want to
call the versatile writef() function. So I was thinking, why not
introduce a special zero-terminated string specifier which would both
alleviate the need to call to!string and would probably save on
needless memory allocation. If all we want to do is print something,
why waste time duplicating a string?

Let's say we call the new specifier %zs (the actual name is up for debate):

extern(C) const(char)* GetName();  // e.g. some C API functions
extern(C) const(char)* GetLastName();

Before:
writefln("Name %s, Last Name %s", to!string(GetName()),
to!string(GetLastName()));

After:
writefln("Name %zs, Last Name %zs", GetName(), GetLastName());

Of course in this simple case you could just use printf(), but
remember that writef() is much more versatile and allows you to
specify %s to match any type. It would be great to match printf's
original meaning of %s with another specifier.
Sep 28 2012
next sibling parent reply Alex Rønne Petersen <alex lycus.org> writes:
On 29-09-2012 04:08, Andrej Mitrovic wrote:
 I've noticed I'm having to do a lot of to!string calls when I want to
 call the versatile writef() function. So I was thinking, why not
 introduce a special zero-terminated string specifier which would both
 alleviate the need to call to!string and would probably save on
 needless memory allocation. If all we want to do is print something,
 why waste time duplicating a string?

 Let's say we call the new specifier %zs (we can debate for the actual name):

 extern(C) const(char)* GetName();  // e.g. some C API functions
 extern(C) const(char)* GetLastName();

 Before:
 writefln("Name %s, Last Name %s", to!string(GetName()),
 to!string(GetLastName()));

 After:
 writefln("Name %zs, Last Name %zs", GetName(), GetLastName());

 Of course in this simple case you could just use printf(), but
 remember that writef() is much more versatile and allows you to
 specify %s to match any type. It would be great to match printf's
 original meaning of %s with another specifier.

While the idea is reasonable, the problem then becomes that if you accidentally pass a non-zero-terminated char* to %zs, all hell breaks loose, just like with printf.

--
Alex Rønne Petersen
alex lycus.org
http://lycus.org
Sep 28 2012
parent reply Piotr Szturmaj <bncrbme jadamspam.pl> writes:
Adam D. Ruppe wrote:
 On Saturday, 29 September 2012 at 02:11:12 UTC, Alex Rønne Petersen wrote:
 While the idea is reasonable, the problem then becomes that if you
 accidentally pass a non-zero terminated char* to %sz, all hell breaks
 loose just like with printf.

That's the same risk with to!string(), yes? We aren't really losing anything by adding it. Also this reminds me of the utter uselessness of the current behavior of "%s" and a pointer - it prints the address.

Why not specialize current "%s" for character pointer types so it will print null terminated strings? It's always possible to cast to void* to print an address.
Oct 01 2012
next sibling parent reply Piotr Szturmaj <bncrbme jadamspam.pl> writes:
Jakob Ovrum wrote:
 On Monday, 1 October 2012 at 09:17:52 UTC, Piotr Szturmaj wrote:
 Adam D. Ruppe wrote:
 On Saturday, 29 September 2012 at 02:11:12 UTC, Alex Rønne Petersen
 wrote:
 Also this reminds me of the utter uselessness of the current behavior of
 "%s" and a pointer - it prints the address.

Why not specialize current "%s" for character pointer types so it will print null terminated strings? It's always possible to cast to void* to print an address.

It's not safe to assume that pointers to characters are generally null terminated.

Yes, but the programmer should know what he's passing anyway.
Oct 01 2012
parent reply Piotr Szturmaj <bncrbme jadamspam.pl> writes:
Paulo Pinto wrote:
 On Monday, 1 October 2012 at 09:42:08 UTC, Piotr Szturmaj wrote:
 Jakob Ovrum wrote:
 On Monday, 1 October 2012 at 09:17:52 UTC, Piotr Szturmaj wrote:
 Adam D. Ruppe wrote:
 On Saturday, 29 September 2012 at 02:11:12 UTC, Alex Rønne Petersen
 wrote:
 Also this reminds me of the utter uselessness of the current
 behavior of
 "%s" and a pointer - it prints the address.

Why not specialize current "%s" for character pointer types so it will print null terminated strings? It's always possible to cast to void* to print an address.

It's not safe to assume that pointers to characters are generally null terminated.

Yes, but programmer should know what he's passing anyway.

The thinking "the programmer should" only works in one man teams. As soon as you start having teams with disparate programming knowledge among team members, you can forget everything about "the programmer should".

I experienced such a team at my previous job and I know what you mean. My original thought was about telling writef that I want to print a null-terminated string rather than an address. to!string will surely work, but it implies double iteration: one pass in to!string to calculate the length (seeking the 0 char) and one in writef (printing). With long strings this is suboptimal.

What about something like this:

struct CString(T)
    if (isSomeChar!T)
{
    T* str;
}

@property
auto cstring(S : T*, T)(S str)
    if (isSomeChar!T)
{
    return CString!T(str);
}

string test = "abc";
immutable(char)* p = test.ptr;

writefln("%s", p.cstring); // prints "abc"

Here the char pointer type is "annotated" as a null-terminated string and writefln can use this information.
Oct 01 2012
parent reply Piotr Szturmaj <bncrbme jadamspam.pl> writes:
Johannes Pfau wrote:
 struct CString(T)
       if (isSomeChar!T)
 {
       T* str;
 }

 @property
 auto cstring(S : T*, T)(S str)
       if (isSomeChar!T)
 {
       return CString!T(str);
 }

 string test = "abc";
 immutable(char)* p = test.ptr;

 writefln("%s", p.cstring); // prints "abc"

 Here the char pointer type is "annotated" as null terminated string
 and writefln can use this information.

If CString implemented a toString method (probably the variant taking a sink delegate), this would already work.
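
A minimal sketch of that suggestion, assuming a non-templated wrapper restricted to char for brevity. The names CString and cstring follow the post; the body of toString is an illustration, not code from the thread:

```d
import core.stdc.string : strlen;
import std.format : format;
import std.stdio : writefln;

/// Marks a char pointer as pointing to a zero-terminated string.
struct CString
{
    const(char)* str;

    /// Sink-based toString: hands the characters to the formatter as one
    /// slice, without allocating. Assumes str is null or zero-terminated.
    void toString(scope void delegate(const(char)[]) sink) const
    {
        if (str !is null)
            sink(str[0 .. strlen(str)]);
    }
}

/// Helper so that a pointer can be annotated at the call site.
CString cstring(const(char)* str)
{
    return CString(str);
}

void main()
{
    // D string literals are zero-terminated, so .ptr stands in for a
    // pointer returned by a C API here.
    const(char)* p = "abc".ptr;
    writefln("%s", p.cstring);
}
```

The unsafety does not go away: it moves from the format specifier to the place where the pointer is wrapped.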

I reworked this example to form a forward range: http://dpaste.dzfl.pl/7ab1eeec

The major advantage over "%zs" is that it could be used anywhere, not only with writef(). For example, C binding writers may change:

extern(C) char* getstr();

to

extern(C) cstring getstr();

so the string may be used immediately with writef().
 I'm not sure about performance
 though: Isn't writing out bigger buffers a lot faster than writing
 single chars? You could print every char individually, but wouldn't a
 p[0 .. strlen(p)] usually be faster?

I think it internally prints single characters anyway. At the very least it must test each character for the zero value, which is what strlen() does.
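
For readers who cannot reach the paste, here is a rough sketch of what a forward range over a C string could look like. It is not the linked code; CStringRange is a hypothetical name, and the range assumes the pointer is null or zero-terminated:

```d
import std.algorithm.comparison : equal;
import std.range.primitives : isForwardRange;

/// Hypothetical forward range that walks a zero-terminated char sequence.
struct CStringRange
{
    const(char)* p;

    /// The range ends at a null pointer or at the terminator.
    @property bool empty() const { return p is null || *p == '\0'; }
    @property char front() const { return *p; }
    void popFront() { ++p; }
    /// Copying the struct copies the position, which is all save needs.
    @property CStringRange save() { return this; }
}

static assert(isForwardRange!CStringRange);

void main()
{
    // A literal's .ptr stands in for a pointer obtained from C.
    auto r = CStringRange("abc".ptr);
    assert(equal(r, "abc"));
}
```

Each character is tested against zero exactly once as the range is consumed, which is the single-pass behaviour the post is after.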
Oct 01 2012
parent Piotr Szturmaj <bncrbme jadamspam.pl> writes:
Andrej Mitrovic wrote:
 On 10/1/12, Piotr Szturmaj <bncrbme jadamspam.pl> wrote:
 For example C binding writers may change:

 extern(C) char* getstr();

 to

 extern(C) cstring getstr();

I don't think you can reliably do that because of semantics w.r.t. passing parameters on the stack vs in registers based on whether a type is a pointer or not. I've had this sort of bug when wrapping C++ where the C++ compiler was passing a parameter in one way but the D compiler expected the parameters to be passed in another way, simply because I tried to be clever and fake a return type.

See: http://forum.dlang.org/thread/mailman.1547.1346632732.31962.d.gnu puremagic.com#post-mailman.1557.1346690320.31962.d.gnu:40puremagic.com

I think that align(1) structs that wrap a single value should be treated as its type. After all they have the same size and representation. I don't know how this works now, though.
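
The "same size and representation" part can be checked at compile time. A small sketch, with CStr as a hypothetical stand-in for the wrapper type; note that it says nothing about how a given ABI passes or returns such a struct, which is the concern raised in the quoted post:

```d
/// Hypothetical single-member wrapper around a C string pointer.
struct CStr
{
    const(char)* ptr;
}

// Same size and alignment as the bare pointer.
static assert(CStr.sizeof == (const(char)*).sizeof);
static assert(CStr.alignof == (const(char)*).alignof);

void main()
{
    auto p = "abc".ptr;
    auto w = CStr(p);
    // Reinterpreting the struct's storage as a pointer yields the same address.
    assert(*cast(const(char)**) &w is p);
}
```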
Oct 02 2012
prev sibling parent Piotr Szturmaj <bncrbme jadamspam.pl> writes:
Jonathan M Davis wrote:
 On Monday, October 01, 2012 11:18:16 Piotr Szturmaj wrote:
 Adam D. Ruppe wrote:
 On Saturday, 29 September 2012 at 02:11:12 UTC, Alex Rønne Petersen wrote:
 While the idea is reasonable, the problem then becomes that if you
 accidentally pass a non-zero terminated char* to %sz, all hell breaks
 loose just like with printf.

That's the same risk with to!string(), yes? We aren't really losing anything by adding it. Also this reminds me of the utter uselessness of the current behavior of "%s" and a pointer - it prints the address.

Why not specialize current "%s" for character pointer types so it will print null terminated strings? It's always possible to cast to void* to print an address.

Honestly? One of Phobos' best features is the fact that %s works for _everything_. Specializing it for _anything_ would be horrible. It would also break a _ton_ of code. Who even uses %d, %f, etc. if they don't need to use format specifiers? It's just way simpler to always use %s.

OK, I think you're right.
 I'm not completely against the idea of %zs, but I confess that I have to
 wonder what someone is doing if they really need to print zero-terminated
 strings all that often in D for anything other than quick debugging (in which
 case to!string works just fine), since only stuff directly interacting with C
 code will even care. And if it's really that big a deal, and you're constantly
 interacting with C code like that, you can always use the appropriate C
 function - printf - and then it's a non-issue.

Imagine you're serializing a great amount of text, some of which comes from a C library (as null-terminated char*), and you're using format() with %s specifiers. Direct handling of C strings would simply be faster because it avoids double iteration.
Oct 01 2012
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Saturday, 29 September 2012 at 02:11:12 UTC, Alex Rønne 
Petersen wrote:
 While the idea is reasonable, the problem then becomes that if 
 you accidentally pass a non-zero terminated char* to %sz, all 
 hell breaks loose just like with printf.

That's the same risk with to!string(), yes? We aren't really losing anything by adding it.

Also this reminds me of the utter uselessness of the current behavior of "%s" and a pointer - it prints the address. I think this should simply be disallowed. If you want that, you can use %x, and if you want it printed, that's where the new %z comes in.
Sep 28 2012
prev sibling next sibling parent reply deadalnix <deadalnix gmail.com> writes:
If you know that a string is 0 terminated, you can easily create a slice
from it as follows:

char* myZeroTerminatedString;
char[] slice = myZeroTerminatedString[0 .. strlen(myZeroTerminatedString)];

It is clean and avoids modifying the stdlib in an unsafe way.
Sep 30 2012
next sibling parent reply deadalnix <deadalnix gmail.com> writes:
Le 30/09/2012 21:58, Vladimir Panteleev a écrit :
 On Sunday, 30 September 2012 at 18:31:00 UTC, deadalnix wrote:
 If you know that a string is 0 terminated, you can easily create a
 slice from it as follow :

 char* myZeroTerminatedString;
 char[] myZeroTerminatedString[0 .. strlen(myZeroTerminatedString)];

 It is clean and avoid to modify the stdlib in an unsafe way.

That's what to!string already does.

How does to!string know that the string is 0 terminated ?
Oct 01 2012
parent reply deadalnix <deadalnix gmail.com> writes:
Le 01/10/2012 13:29, Vladimir Panteleev a écrit :
 On Monday, 1 October 2012 at 10:56:36 UTC, deadalnix wrote:
 Le 30/09/2012 21:58, Vladimir Panteleev a écrit :
 On Sunday, 30 September 2012 at 18:31:00 UTC, deadalnix wrote:
 If you know that a string is 0 terminated, you can easily create a
 slice from it as follow :

 char* myZeroTerminatedString;
 char[] myZeroTerminatedString[0 .. strlen(myZeroTerminatedString)];

 It is clean and avoid to modify the stdlib in an unsafe way.

That's what to!string already does.

How does to!string know that the string is 0 terminated ?

By convention (it doesn't).

It is unsafe as hell oO
Oct 01 2012
parent deadalnix <deadalnix gmail.com> writes:
Le 01/10/2012 22:33, Vladimir Panteleev a écrit :
 On Monday, 1 October 2012 at 12:12:52 UTC, deadalnix wrote:
 Le 01/10/2012 13:29, Vladimir Panteleev a écrit :
 On Monday, 1 October 2012 at 10:56:36 UTC, deadalnix wrote:
 How does to!string know that the string is 0 terminated ?

By convention (it doesn't).

It is unsafe as hell oO

Forcing the programmer to put strlen calls everywhere in his code is not any safer.

It makes the library safer. If the programmer manipulates unsafe constructs (like C strings), it is up to the programmer to ensure safety, not the library.
Oct 02 2012
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
On 9/30/2012 11:31 AM, deadalnix wrote:
 If you know that a string is 0 terminated, you can easily create a slice
 from it as follow :

 char* myZeroTerminatedString;
 char[] myZeroTerminatedString[0 .. strlen(myZeroTerminatedString)];

 It is clean and avoid to modify the stdlib in an unsafe way.

Of course, using strlen() is always going to be unsafe. But having %zs is equally unsafe for the same reason. deadalnix's example shows that adding a new format specifier %zs adds little value, but it gets much worse. Since %zs is inherently unsafe, it hides such unsafety in a commonly used library function, which will infect everything else that transitively calls writefln with unsafety. This makes %zs an unacceptable feature.
Oct 01 2012
next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
On 10/1/2012 7:22 PM, Steven Schveighoffer wrote:
 However, we can't require an import to use a bizarre
 specifier, and you can't link un-@safe code to a specifier, so the zstr
 concept is far superior in requiring the user to know what he is doing,
 and having the compiler enforce that.

Yup.
 Does it make sense for Phobos to provide such a shortcut in an obscure
 header somewhere? Like std.cstring? Or should we just say "roll your own
 if you need it"?

As a matter of principle, I really don't like gobs of Phobos functions that are literally one liners. Phobos should not become a mile wide but inch deep library of trivia. It should consist of non-trivial, useful, and relatively deep functions.
Oct 02 2012
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/2/12 4:09 AM, Walter Bright wrote:
 On 10/1/2012 7:22 PM, Steven Schveighoffer wrote:
 Does it make sense for Phobos to provide such a shortcut in an obscure
 header somewhere? Like std.cstring? Or should we just say "roll your own
 if you need it"?

As a matter of principle, I really don't like gobs of Phobos functions that are literally one liners. Phobos should not become a mile wide but inch deep library of trivia. It should consist of non-trivial, useful, and relatively deep functions.

Well there are some possible reasons. Clearly useful functionality that's nontrivial deserves being abstracted in a function. On the other hand, even a short function is valuable if frequent enough and deserving of a name. We have e.g. s.strip even though it's equivalent to s.stripLeft.stripRight.

Andrei
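
The strip equivalence is easy to check with the std.string functions named in the post; the sample string is only an illustration:

```d
import std.string : strip, stripLeft, stripRight;

void main()
{
    string s = "  hello \t";
    // strip removes leading and trailing whitespace in one call...
    assert(s.strip == "hello");
    // ...which matches chaining the two one-sided functions.
    assert(s.strip == s.stripLeft.stripRight);
}
```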
Oct 02 2012
prev sibling parent deadalnix <deadalnix gmail.com> writes:
Le 02/10/2012 03:13, Walter Bright a écrit :
 On 9/30/2012 11:31 AM, deadalnix wrote:
 If you know that a string is 0 terminated, you can easily create a slice
 from it as follow :

 char* myZeroTerminatedString;
 char[] myZeroTerminatedString[0 .. strlen(myZeroTerminatedString)];

 It is clean and avoid to modify the stdlib in an unsafe way.

Of course, using strlen() is always going to be unsafe. But having %zs is equally unsafe for the same reason. deadalnix's example shows that adding a new format specifier %zs adds little value, but it gets much worse. Since %zs is inherently unsafe, it hides such unsafety in a commonly used library function, which will infect everything else that transitively calls writefln with unsafety. This makes %zs an unacceptable feature.

Exactly my point.
Oct 02 2012
prev sibling next sibling parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Sunday, 30 September 2012 at 18:31:00 UTC, deadalnix wrote:
 If you know that a string is 0 terminated, you can easily 
 create a slice from it as follow :

 char* myZeroTerminatedString;
 char[]  myZeroTerminatedString[0 .. 
 strlen(myZeroTerminatedString)];

 It is clean and avoid to modify the stdlib in an unsafe way.

+1

We don't need to preserve C's design errors regarding strings and vectors.

--
Paulo
Sep 30 2012
prev sibling next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Sunday, 30 September 2012 at 18:31:00 UTC, deadalnix wrote:
 If you know that a string is 0 terminated, you can easily 
 create a slice from it as follow :

 char* myZeroTerminatedString;
 char[]  myZeroTerminatedString[0 .. 
 strlen(myZeroTerminatedString)];

 It is clean and avoid to modify the stdlib in an unsafe way.

That's what to!string already does.
Sep 30 2012
prev sibling next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Saturday, 29 September 2012 at 02:07:38 UTC, Andrej Mitrovic 
wrote:
 I've noticed I'm having to do a lot of to!string calls when I 
 want to
 call the versatile writef() function. So I was thinking, why not
 introduce a special zero-terminated string specifier which 
 would both
 alleviate the need to call to!string and would probably save on
 needless memory allocation. If all we want to do is print 
 something,
 why waste time duplicating a string?

I just checked and std.conv.to always allocates a copy, even when constness doesn't require it. It should not reallocate when constness doesn't change, or is a safe conversion (e.g. immutable -> const).

A discussion on a related topic (formatting of C strings results in unexpected behavior) is here: http://d.puremagic.com/issues/show_bug.cgi?id=8384
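
A short sketch of the difference being described, assuming a literal's .ptr as a stand-in for a pointer from C. Slicing gives a view onto the original memory, while to!string produces a separate string with the same contents:

```d
import core.stdc.string : strlen;
import std.conv : to;
import std.stdio : writeln;

void main()
{
    immutable(char)* p = "hello".ptr;   // literals are zero-terminated

    string view = p[0 .. strlen(p)];    // slice: no allocation, same memory
    string copy = to!string(p);         // conversion via std.conv

    assert(view == copy);               // same contents either way
    assert(view.ptr is p);              // the slice still points at p

    // According to the post, to!string allocates, so this would report
    // a different address for the copy.
    writeln(copy.ptr is p);
}
```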
Sep 30 2012
prev sibling next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Sunday, 30 September 2012 at 18:58:11 UTC, Paulo Pinto wrote:
 +1

 We don't need to preserve C's design errors regarding strings 
 and vectors.

The problem is that, unsurprisingly, most C APIs (not just libc, but also most C libraries and OS APIs) use zero-terminated strings. The philosophy of ignoring the existence of C strings throughout all of D makes working with such APIs needlessly verbose (and sometimes annoying, as D code will compile and produce unexpected results).
Sep 30 2012
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 9/30/12, deadalnix <deadalnix gmail.com> wrote:
 If you know that a string is 0 terminated, you can easily create a slice
 from it as follow :

 char* myZeroTerminatedString;
 char[]  myZeroTerminatedString[0 .. strlen(myZeroTerminatedString)];

 It is clean and avoid to modify the stdlib in an unsafe way.

What does that have to do with writef()? You can call to!string, but that's beside the point. The point was getting rid of this verbosity when using C APIs.
Sep 30 2012
prev sibling next sibling parent "Muhtar" <lone gmail.com> writes:
On Sunday, 30 September 2012 at 19:58:16 UTC, Vladimir Panteleev 
wrote:
 On Sunday, 30 September 2012 at 18:31:00 UTC, deadalnix wrote:
 If you know that a string is 0 terminated, you can easily 
 create a slice from it as follow :

 char* myZeroTerminatedString;
 char[]  myZeroTerminatedString[0 .. 
 strlen(myZeroTerminatedString)];

 It is clean and avoid to modify the stdlib in an unsafe way.

That's what to!string already does.

I agree with you...
Sep 30 2012
prev sibling next sibling parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Sunday, 30 September 2012 at 20:27:16 UTC, Andrej Mitrovic 
wrote:
 On 9/30/12, deadalnix <deadalnix gmail.com> wrote:
 If you know that a string is 0 terminated, you can easily 
 create a slice
 from it as follow :

 char* myZeroTerminatedString;
 char[]  myZeroTerminatedString[0 .. 
 strlen(myZeroTerminatedString)];

 It is clean and avoid to modify the stdlib in an unsafe way.

What does that have to do with writef()? You can call to!string, but that's beside the point. The point was getting rid of this verbosity when using C APIs.

You should wrap those APIs anyway, so that D code is not polluted with calls to lower-level APIs. As such, I don't find the verbosity, as you put it, that much of an issue.

Then again, I favor the Pascal family of languages for systems programming.

--
Paulo
Sep 30 2012
prev sibling next sibling parent "Rob T" <rob ucora.com> writes:
On Monday, 1 October 2012 at 06:58:41 UTC, Paulo Pinto wrote:
 You should anyway wrap those APIs not to pollute D call with 
 lower level APIs.

I have to agree, especially when it applies to pointers. We should not forget that one of the objectives of D is to make coding "safe" by getting rid of the need to use pointers and other unsafe features. It encourages safe practice by making safe practice much easier than unsafe practice. It does allow unsafe practice where necessary, but the programmer has to intentionally do something extra to make that happen.

I think the suggestion of introducing a null string specifier fundamentally goes against the objectives of D, and if introduced will ultimately degrade the quality of the language.

--rt
Oct 01 2012
prev sibling next sibling parent "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Monday, 1 October 2012 at 09:17:52 UTC, Piotr Szturmaj wrote:
 Adam D. Ruppe wrote:
 On Saturday, 29 September 2012 at 02:11:12 UTC, Alex Rønne 
 Petersen wrote:
 Also this reminds me of the utter uselessness of the current 
 behavior of
 "%s" and a pointer - it prints the address.

Why not specialize current "%s" for character pointer types so it will print null terminated strings? It's always possible to cast to void* to print an address.

It's not safe to assume that pointers to characters are generally null terminated.
Oct 01 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday, October 01, 2012 11:18:16 Piotr Szturmaj wrote:
 Adam D. Ruppe wrote:
 On Saturday, 29 September 2012 at 02:11:12 UTC, Alex Rønne Petersen wrote:
 While the idea is reasonable, the problem then becomes that if you
 accidentally pass a non-zero terminated char* to %sz, all hell breaks
 loose just like with printf.

That's the same risk with to!string(), yes? We aren't really losing anything by adding it. Also this reminds me of the utter uselessness of the current behavior of "%s" and a pointer - it prints the address.

Why not specialize current "%s" for character pointer types so it will print null terminated strings? It's always possible to cast to void* to print an address.

Honestly? One of Phobos' best features is the fact that %s works for _everything_. Specializing it for _anything_ would be horrible. It would also break a _ton_ of code. Who even uses %d, %f, etc. if they don't need to use format specifiers? It's just way simpler to always use %s.

I'm not completely against the idea of %zs, but I confess that I have to wonder what someone is doing if they really need to print zero-terminated strings all that often in D for anything other than quick debugging (in which case to!string works just fine), since only stuff directly interacting with C code will even care. And if it's really that big a deal, and you're constantly interacting with C code like that, you can always use the appropriate C function - printf - and then it's a non-issue.

- Jonathan M Davis
Oct 01 2012
prev sibling next sibling parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Monday, 1 October 2012 at 09:42:08 UTC, Piotr Szturmaj wrote:
 Jakob Ovrum wrote:
 On Monday, 1 October 2012 at 09:17:52 UTC, Piotr Szturmaj 
 wrote:
 Adam D. Ruppe wrote:
 On Saturday, 29 September 2012 at 02:11:12 UTC, Alex Rønne 
 Petersen
 wrote:
 Also this reminds me of the utter uselessness of the current 
 behavior of
 "%s" and a pointer - it prints the address.

Why not specialize current "%s" for character pointer types so it will print null terminated strings? It's always possible to cast to void* to print an address.

It's not safe to assume that pointers to characters are generally null terminated.

Yes, but programmer should know what he's passing anyway.

The thinking "the programmer should" only works in one man teams. As soon as you start having teams with disparate programming knowledge among team members, you can forget everything about "the programmer should".

--
Paulo
Oct 01 2012
prev sibling next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Monday, 1 October 2012 at 10:56:36 UTC, deadalnix wrote:
 Le 30/09/2012 21:58, Vladimir Panteleev a écrit :
 On Sunday, 30 September 2012 at 18:31:00 UTC, deadalnix wrote:
 If you know that a string is 0 terminated, you can easily 
 create a
 slice from it as follow :

 char* myZeroTerminatedString;
 char[] myZeroTerminatedString[0 .. 
 strlen(myZeroTerminatedString)];

 It is clean and avoid to modify the stdlib in an unsafe way.

That's what to!string already does.

How does to!string know that the string is 0 terminated ?

By convention (it doesn't).
Oct 01 2012
prev sibling next sibling parent Johannes Pfau <nospam example.com> writes:
Am Mon, 01 Oct 2012 13:22:46 +0200
schrieb Piotr Szturmaj <bncrbme jadamspam.pl>:

 Paulo Pinto wrote:
 On Monday, 1 October 2012 at 09:42:08 UTC, Piotr Szturmaj wrote:
 Jakob Ovrum wrote:
 On Monday, 1 October 2012 at 09:17:52 UTC, Piotr Szturmaj wrote:
 Adam D. Ruppe wrote:
 On Saturday, 29 September 2012 at 02:11:12 UTC, Alex Rønne
 Petersen wrote:
 Also this reminds me of the utter uselessness of the current
 behavior of
 "%s" and a pointer - it prints the address.

Why not specialize current "%s" for character pointer types so it will print null terminated strings? It's always possible to cast to void* to print an address.

It's not safe to assume that pointers to characters are generally null terminated.

Yes, but programmer should know what he's passing anyway.

The thinking "the programmer should" only works in one man teams. As soon as you start having teams with disparate programming knowledge among team members, you can forget everything about "the programmer should".

I experienced such a team at my previous job and I know what you mean. My original thought was about telling writef that I want to print a null-terminated string rather than an address. to!string will surely work, but it implies double iteration: one pass in to!string to calculate the length (seeking the 0 char) and one in writef (printing). With long strings this is suboptimal.

What about something like this:

struct CString(T)
    if (isSomeChar!T)
{
    T* str;
}

@property
auto cstring(S : T*, T)(S str)
    if (isSomeChar!T)
{
    return CString!T(str);
}

string test = "abc";
immutable(char)* p = test.ptr;

writefln("%s", p.cstring); // prints "abc"

Here the char pointer type is "annotated" as a null-terminated string and writefln can use this information.

If CString implemented a toString method (probably the variant taking a sink delegate), this would already work. I'm not sure about performance though: Isn't writing out bigger buffers a lot faster than writing single chars? You could print every char individually, but wouldn't a p[0 .. strlen(p)] usually be faster?
Oct 01 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 01 Oct 2012 05:54:30 -0400, Jonathan M Davis <jmdavisProg gmx.com>  
wrote:

 I'm not completely against the idea of %zs, but I confess that I have to
 wonder what someone is doing if they really need to print zero-terminated
 strings all that often in D for anything other than quick debugging (in  
 which
 case to!string works just fine)

to!string necessarily allocates; I think that is not a small problem.

I think %s should treat char * as if it is zero-terminated.

Invariably, you will have two approaches to this problem:

1. writefln("%s", mycstring); => 0xptrlocation
2. hm.., I guess I'll just use to!string => vulnerable to non-zero-terminated strings!

or

2. hm.., to!string will allocate, I guess I'll just use writefln("%s", mycstring[0..strlen(mycstring)]); => vulnerable to non-zero-terminated strings!

So how is forcing the user to use one of these methods any safer? I don't see any casts in there...
 , since only stuff directly interacting with C
 code will even care. And if it's really that big a deal, and you're  
 constantly
 interacting with C code like that, you can always use the appropriate C
 function - printf - and then it's a non-issue.

Nobody should ever *ever* use printf, unless you are debugging druntime.

It's not a non-issue. printf has no type checking whatsoever. Using it means 1) non-typechecked code (i.e., accidentally pass an int instead of a string, or forget to pass an arg for a specifier, and you've crashed your code), and 2) you have locked yourself into using C's streams (something I hope to remedy in the future).

Besides, it doesn't *gain* you anything over having writef(ln) just support char *.

Bottom line -- if to!string(arg) is supported, writefln("%s", arg) should be supported, and do the same thing.

-Steve
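
To illustrate the type-checking point on the D side, a small sketch using std.format.format, which shares its formatting machinery with writef. With a run-time format string, a missing argument is reported as a FormatException instead of reading garbage as C's printf may:

```d
import std.exception : assertThrown;
import std.format : format, FormatException;

void main()
{
    // %s adapts to whatever argument type it is given.
    assert(format("%s and %s", 42, "text") == "42 and text");

    // One specifier too many: the mismatch is detected and reported
    // as an exception at run time.
    assertThrown!FormatException(format("%s %s", 1));
}
```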
Oct 01 2012
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 10/1/12, Piotr Szturmaj <bncrbme jadamspam.pl> wrote:
 For example C binding writers may change:

 extern(C) char* getstr();

 to

 extern(C) cstring getstr();

I don't think you can reliably do that because of semantics w.r.t. passing parameters on the stack vs in registers based on whether a type is a pointer or not. I've had this sort of bug when wrapping C++ where the C++ compiler was passing a parameter in one way but the D compiler expected the parameters to be passed, simply because I tried to be clever and fake a return type. See: http://forum.dlang.org/thread/mailman.1547.1346632732.31962.d.gnu puremagic.com#post-mailman.1557.1346690320.31962.d.gnu:40puremagic.com
Oct 01 2012
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 10/1/12, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 but the D
 compiler expected the parameters to be passed

missing "in another way" there.
Oct 01 2012
prev sibling next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Monday, 1 October 2012 at 12:12:52 UTC, deadalnix wrote:
 Le 01/10/2012 13:29, Vladimir Panteleev a écrit :
 On Monday, 1 October 2012 at 10:56:36 UTC, deadalnix wrote:
 How does to!string know that the string is 0 terminated ?

By convention (it doesn't).

It is unsafe as hell oO

Forcing the programmer to put strlen calls everywhere in his code is not any safer.
Oct 01 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 01 Oct 2012 21:13:47 -0400, Walter Bright  
<newshound1 digitalmars.com> wrote:

 On 9/30/2012 11:31 AM, deadalnix wrote:
 If you know that a string is 0 terminated, you can easily create a slice
 from it as follow :

 char* myZeroTerminatedString;
 char[] myZeroTerminatedString[0 .. strlen(myZeroTerminatedString)];

 It is clean and avoid to modify the stdlib in an unsafe way.

Of course, using strlen() is always going to be unsafe. But having %zs is equally unsafe for the same reason. deadalnix's example shows that adding a new format specifier %zs adds little value, but it gets much worse. Since %zs is inherently unsafe, it hides such unsafety in a commonly used library function, which will infect everything else that transitively calls writefln with unsafety. This makes %zs an unacceptable feature.

What about %s just working with zero-terminated strings? I was going to argue this point, but I just thought of a very very good counter-case for this.

string x = "abc".idup; // no zero-terminator!

writefln("%s", x.ptr);

What we don't want is for writefln to try and interpret the pointer as a C string. Not only is it bad, but even the code seems to suggest "Hey, this should print a pointer!"

The large underlying issue here is that C considers char * to be a zero-terminated string, and D considers it to be a pointer. This means any code which uses C calls heavily will have to awkwardly dance between both worlds.

I think there is some value in providing something that is *not* common to do the above work (convert char * to char[]).

Hm...

@system char[] zstr(char *s) { return s[0..strlen(s)]; }

provides:

writefln("%s", zstr(s));

vs.

writefln("%zs", s);

Arguably, nobody uses %zs, so even though writefln is common, the specifier is not. However, we can't require an import to use a bizarre specifier, and you can't link un-@safe code to a specifier, so the zstr concept is far superior in requiring the user to know what he is doing, and having the compiler enforce that.

Does it make sense for Phobos to provide such a shortcut in an obscure header somewhere? Like std.cstring? Or should we just say "roll your own if you need it"?

-Steve
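
A sketch of how the compiler enforcement mentioned above plays out, using the zstr one-liner from the post as is. The static assert and the test buffer are additions for illustration only:

```d
import core.stdc.string : strlen;
import std.stdio : writefln;

/// The helper from the post: view a zero-terminated C string as a slice.
/// It is @system because nothing verifies that a terminator exists.
@system char[] zstr(char* s) { return s[0 .. strlen(s)]; }

// @safe code cannot call zstr directly, so the unsafety stays visible
// at the call site instead of hiding inside writefln.
static assert(!__traits(compiles, () @safe { char* p; auto s = zstr(p); }));

void main()
{
    // A mutable buffer with an explicit terminator stands in for C data.
    char[] buf = "abc\0".dup;
    writefln("%s", zstr(buf.ptr));
}
```

A caller that has verified the terminator can still wrap the call in @trusted code, which keeps the responsibility explicit.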
Oct 01 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 02 Oct 2012 04:09:43 -0400, Walter Bright  
<newshound1 digitalmars.com> wrote:

 On 10/1/2012 7:22 PM, Steven Schveighoffer wrote:
 Does it make sense for Phobos to provide such a shortcut in an obscure
 header somewhere? Like std.cstring? Or should we just say "roll your own
 if you need it"?

As a matter of principle, I really don't like gobs of Phobos functions that are literally one liners. Phobos should not become a mile wide but inch deep library of trivia. It should consist of non-trivial, useful, and relatively deep functions.

This, arguably, is one of the most important aspects of C to support. There are lots of C functions which provide C strings. Yes, we don't want to promote using C strings, but to have one point of conversion so you *can* use safe strings is a good thing. In other words, the sooner you convert your zero-terminated strings to char slices, the better off you are. And if we label it @system code, it can't be misused in @safe code.

Why support zero-terminated strings as literals if it wasn't important? You could argue that things like system calls which return zero-terminated strings are as safe to use as string literals which you know have zero terminated values.

The only other alternative is to wrap those C functions with D ones that convert to char[]. I don't find this any more appealing.

-Steve
Oct 02 2012
prev sibling next sibling parent "David Nadlinger" <see klickverbot.at> writes:
On Tuesday, 2 October 2012 at 02:22:33 UTC, Steven Schveighoffer 
wrote:
  @system char[] zstr(char *s) { return s[0..strlen(s)]; }

 […]

 Does it make sense for Phobos to provide such a shortcut in an 
 obscure header somewhere?  Like std.cstring?  Or should we just 
 say "roll your own if you need it"?

I didn't look it up, so I could be making quite a fool of myself right now, but doesn't to!string(char*) provide exactly that? David
Oct 02 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 02 Oct 2012 15:17:42 -0400, David Nadlinger <see klickverbot.at>
wrote:

 On Tuesday, 2 October 2012 at 02:22:33 UTC, Steven Schveighoffer wrote:

  @system char[] zstr(char *s) { return s[0..strlen(s)]; }

 […]

 Does it make sense for Phobos to provide such a shortcut in an obscure
 header somewhere?  Like std.cstring?  Or should we just say "roll your
 own if you need it"?

I didn't look it up, so I could be making quite a fool of myself right now, but doesn't to!string(char*) provide exactly that?

string is immutable. Must allocate.

You fool :)

just kidding, honest mistake.

-Steve
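
The difference can be made concrete with a small sketch. The buffer and variable names below are made up for illustration; the only library calls are std.conv.to and C's strlen.

```d
import core.stdc.string : strlen;
import std.conv : to;

void main()
{
    char[6] buf = "hello\0";           // mutable, zero-terminated buffer
    char* p = buf.ptr;

    string copy = to!string(p);        // immutable result, so the data is copied
    char[] view = p[0 .. strlen(p)];   // slice over the same memory, no copy

    assert(copy == "hello" && view == "hello");
    assert(view.ptr is p);             // the slice aliases the C buffer

    buf[0] = 'j';                      // mutate the original buffer
    assert(view == "jello");           // the slice sees the change
    assert(copy == "hello");           // the copy does not
}
```

The copy is what makes an immutable string sound here: the C side stays free to modify or release its buffer afterwards.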
Oct 02 2012
prev sibling next sibling parent "David Nadlinger" <see klickverbot.at> writes:
On Tuesday, 2 October 2012 at 19:31:33 UTC, Steven Schveighoffer 
wrote:
 On Tue, 02 Oct 2012 15:17:42 -0400, David Nadlinger 
 <see klickverbot.at> wrote:

 On Tuesday, 2 October 2012 at 02:22:33 UTC, Steven 
 Schveighoffer wrote:
  @system char[] zstr(char *s) { return s[0..strlen(s)]; }

 […]

 Does it make sense for Phobos to provide such a shortcut in 
 an obscure header somewhere?  Like std.cstring?  Or should we 
 just say "roll your own if you need it"?

I didn't look it up, so I could be making quite a fool of myself right now, but doesn't to!string(char*) provide exactly that?

string is immutable. Must allocate. You fool :)

Well, make it to!char(char*) then! ;) David
Oct 02 2012
prev sibling next sibling parent "David Nadlinger" <see klickverbot.at> writes:
On Tuesday, 2 October 2012 at 19:34:31 UTC, David Nadlinger wrote:
 Well, make it to!char(char*) then! ;)

Oh dear, this doesn't get better: Of course, I've meant to write »to!(char[])(char*)«. David
Oct 02 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 02 Oct 2012 15:35:47 -0400, David Nadlinger <see klickverbot.at>
wrote:

 On Tuesday, 2 October 2012 at 19:34:31 UTC, David Nadlinger wrote:
 Well, make it to!char(char*) then! ;)

Oh dear, this doesn't get better: Of course, I've meant to write »to!(char[])(char*)«.

Right. I agree, this should not allocate (I think someone said it does, but it's probably not necessary to).

But still, what looks better?

auto x = SomeSystemCallThatReturnsACString();
writefln("%s", to!(char[])(x));
writefln("%s", zstr(x));

I want something easy to type, and not too difficult to visually parse.

In fact, a better solution would be to define a C string type (other than char *), and just pretend those system calls return that. Then support that C string type in writef.

-Steve
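
One way the dedicated C string type could look is sketched below. The name CString and its layout are assumptions for illustration, not an existing Phobos type; the sketch relies only on std.format's documented support for a toString that writes into a sink delegate.

```d
import core.stdc.string : strlen;
import std.format : format;

// Hypothetical wrapper type. It only records the pointer; the
// characters are read when the value is formatted.
struct CString
{
    const(char)* ptr;

    // std.format picks up a toString that writes into a sink delegate,
    // so "%s" prints the characters without building a D string first.
    void toString(scope void delegate(const(char)[]) sink) const
    {
        if (ptr !is null)
            sink(ptr[0 .. strlen(ptr)]);
    }
}

void main()
{
    // D string literals are zero-terminated, so .ptr is a valid C string.
    auto name = CString("Andrej".ptr);
    assert(format("Name %s", name) == "Name Andrej");
}
```

With such a type, plain %s is enough and no new specifier is needed, but every C prototype would have to be declared as returning CString, and the wrapper is still only as trustworthy as the pointer it was built from.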
Oct 02 2012
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 10/2/12, Walter Bright <newshound1 digitalmars.com> wrote:
 On 9/30/2012 11:31 AM, deadalnix wrote:
 If you know that a string is 0 terminated, you can easily create a slice
 from it as follows:

 char* myZeroTerminatedString;
 char[] slice = myZeroTerminatedString[0 .. strlen(myZeroTerminatedString)];

hides such unsafety in a commonly used library function, which will infect everything else that transitively calls writefln with unsafety. This makes %zs an unacceptable feature.

How does it hide anything if you have to explicitly mark the format specifier as %zs? It would be documented, just like it's documented that passing pointers to garbage-collected memory to the C side is inherently unsafe.
 deadalnix's example shows that adding a new format specifier %zs adds
 little value.

It adds convenience, which is an important trait in this day and age. If that's not a concern, why is printf a symbol you can get your hands on as soon as you import std.stdio? And if safety is a concern why is printf used in Phobos at all? I count 427 lines of printf calls in Phobos and 843 lines in Druntime (druntime might have a good excuse since it shouldn't import Phobos functions). Many of these calls in Phobos are not simple D string literal printf calls either. Btw, some weeks ago when dstep was announced you were jumping for joy and were instantly proposing language changes to add better support for wrapping C. But asking for better library support is somehow controversial. I don't understand the double-standard.
Oct 02 2012
prev sibling next sibling parent "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Tuesday, 2 October 2012 at 21:30:35 UTC, Andrej Mitrovic wrote:
 On 10/2/12, Walter Bright <newshound1 digitalmars.com> wrote:
 On 9/30/2012 11:31 AM, deadalnix wrote:
 If you know that a string is 0 terminated, you can easily
 create a slice from it as follows:

 char* myZeroTerminatedString;
 char[] slice = myZeroTerminatedString[0 ..
 strlen(myZeroTerminatedString)];

hides such unsafety in a commonly used library function, which will infect everything else that transitively calls writefln with unsafety. This makes %zs an unacceptable feature.

How does it hide anything if you have to explicitly mark the format specifier as %zs? It would be documented, just like it's documented that passing pointers to garbage-collected memory to the C side is inherently unsafe.

writefln cannot be @safe if it has to support an unsafe format specifier. It's "hidden" because it affects every call to writefln, even if it doesn't use the unsafe format specifier.
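
A rough sketch of this effect, using a made-up printedLength function as a stand-in for a formatter with a run-time format string (it is not how writefln is implemented):

```d
import core.stdc.string : strlen;

// Hypothetical stand-in for a formatter driven by a run-time format string.
// It has no attribute and is not a template, so it is @system by default.
size_t printedLength(string spec, const(char)* p)
{
    if (spec == "zs")
        return strlen(p);   // scans memory until a zero byte turns up
    return 0;               // every other specifier is harmless in this sketch
}

void main()
{
    const(char)* p = "abc".ptr;   // D literals are zero-terminated

    // Fine from @system code.
    assert(printedLength("zs", p) == 3);
    assert(printedLength("s", p) == 0);

    // Rejected from @safe code even with the harmless "s": the compiler
    // looks at the function's attribute, not at the run-time value of spec.
    static assert(!__traits(compiles, () @safe { return printedLength("s", p); }));
}
```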
Oct 02 2012
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 10/3/12, Jakob Ovrum <jakobovrum gmail.com> wrote:
 writefln cannot be @safe if it has to support an unsafe format
 specifier. It's "hidden" because it affects every call to
 writefln, even if it doesn't use the unsafe format specifier.

Ah damn I completely forgot about @safe. I tend to avoid recent features.. OK then I think my arguments are moot. Nevertheless I can always define a helper function for my own purposes I guess. Sorry Walter for not taking @safe into account. :)
Oct 02 2012
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Oct 03, 2012 at 03:07:14AM +0200, Andrej Mitrovic wrote:
 On 10/3/12, Jakob Ovrum <jakobovrum gmail.com> wrote:
 writefln cannot be @safe if it has to support an unsafe format
 specifier. It's "hidden" because it affects every call to writefln,
 even if it doesn't use the unsafe format specifier.

Hmm, this seems to impose unnecessary limitations on @safe. I guess the current language doesn't allow for a "conditionally-safe" tag where something can be implicitly marked @safe if it's provable at compile-time that it's safe?

T

--
Elegant or ugly code as well as fine or rude sentences have something in common: they don't depend on the language. -- Luca De Vitis
Oct 02 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, October 02, 2012 18:21:30 H. S. Teoh wrote:
 On Wed, Oct 03, 2012 at 03:07:14AM +0200, Andrej Mitrovic wrote:
 On 10/3/12, Jakob Ovrum <jakobovrum gmail.com> wrote:
 writefln cannot be @safe if it has to support an unsafe format
 specifier. It's "hidden" because it affects every call to writefln,
 even if it doesn't use the unsafe format specifier.

[...] Hmm, this seems to impose unnecessary limitations on @safe. I guess the current language doesn't allow for a "conditionally-safe" tag where something can be implicitly marked @safe if it's provable at compile-time that it's safe?

The format string is a runtime argument, so nothing can be proven about it at compile time. If you want any kind of @safe inference, you need to use a template. If writefln took the format string as a template argument and generated different code (which was @safe or not depending on what it did) based on what was in the format string, then inference could take place, but otherwise no.

- Jonathan M Davis
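
The inference described here can be sketched with a toy formatter that takes its specifier as a template argument. The names printedLengthCT and safeCaller are hypothetical, and this is not how writefln is written.

```d
import core.stdc.string : strlen;

// Hypothetical formatter that takes its specifier as a template argument.
// Templates have their attributes inferred separately per instantiation.
size_t printedLengthCT(string spec, T)(T arg)
{
    static if (spec == "zs")
        return strlen(arg);   // this instantiation calls @system code
    else
        return arg.length;    // this one only reads a slice length
}

// Accepted: the "s" instantiation is inferred @safe.
@safe size_t safeCaller(string s)
{
    return printedLengthCT!"s"(s);
}

void main()
{
    const(char)* p = "abc".ptr;   // D literals are zero-terminated

    assert(safeCaller("abc") == 3);
    assert(printedLengthCT!"zs"(p) == 3);   // fine from @system code

    // The "zs" instantiation cannot be called from @safe code.
    static assert(!__traits(compiles, () @safe { return printedLengthCT!"zs"(p); }));
}
```

The cost is the one raised in this thread: the specifier has to be known at compile time, so a format string built at run time no longer fits this interface.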
Oct 02 2012
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Oct 02, 2012 at 07:50:09PM -0700, Jonathan M Davis wrote:
 On Tuesday, October 02, 2012 18:21:30 H. S. Teoh wrote:
 On Wed, Oct 03, 2012 at 03:07:14AM +0200, Andrej Mitrovic wrote:
 On 10/3/12, Jakob Ovrum <jakobovrum gmail.com> wrote:
 writefln cannot be @safe if it has to support an unsafe format
 specifier. It's "hidden" because it affects every call to
 writefln, even if it doesn't use the unsafe format specifier.

[...] Hmm, this seems to impose unnecessary limitations on @safe. I guess the current language doesn't allow for a "conditionally-safe" tag where something can be implicitly marked @safe if it's provable at compile-time that it's safe?

The format string is a runtime argument, so nothing can be proven about it at compile time. If you want any kind of @safe inference, you need to use a template. If writefln took the format string as a template argument and generated different code (which was @safe or not depending on what it did) based on what was in the format string, then inference could take place, but otherwise no.

Yes that's what I mean. If the format string is known at compile-time and known to involve only @safe code, then this would work. Something like this might work if CTFE is used to parse the format string piecemeal (i.e., translate something like writefln("%d %s",x,y) into write!int(x); write!string(" "); write!string(y)). The safe instances of write!T(...) will be marked @safe.

But it does seem like a lot of work just so we can use @safe, though. I suppose we could just use @trusted and call it a day.

T

--
Claiming that your operating system is the best in the world because more people use it is like saying McDonalds makes the best food in the world. -- Carl B. Constantine
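
As a side note for later readers: newer Phobos releases than the one discussed in this thread accept the format string as a template argument, next to the run-time form. The sketch below only shows that the two forms produce the same text; it makes no claim about which attributes get inferred.

```d
import std.format : format;

void main()
{
    int x = 42;
    string y = "abc";

    // The format string is a template argument here, so it is parsed and
    // checked against the argument types during compilation.
    assert(format!"%d %s"(x, y) == "42 abc");

    // The run-time form from the thread still works and gives the same text.
    assert(format("%d %s", x, y) == "42 abc");
}
```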
Oct 02 2012
prev sibling next sibling parent "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Wednesday, 3 October 2012 at 05:04:01 UTC, H. S. Teoh wrote:
 Yes that's what I mean. If the format string is known at compile-time
 and known to involve only @safe code, then this would work. Something
 like this might work if CTFE is used to parse the format string
 piecemeal (i.e., translate something like writefln("%d %s",x,y) into
 write!int(x); write!string(" "); write!string(y)). The safe instances of
 write!T(...) will be marked @safe.

It doesn't matter if the argument is known at compile-time or not, because there's no way to know that without receiving the format string as a template parameter, in which case it must *always* be known at compile-time (runtime format string would not be supported), and then the syntax is no longer writefln("%d %s", x, y). Obviously, such a change is not acceptable.
 I suppose we could just use @trusted
 and call it a day.

No, that would be abusing @trusted. The function would no longer be safe, *because it contains possibly unsafe code*. @trusted is for safe functions that the compiler cannot prove safe.
Oct 02 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, October 03, 2012 07:35:23 Jakob Ovrum wrote:
 I suppose we could just use @trusted
 and call it a day.

No, that would be abusing @trusted. The function would no longer be safe, *because it contains possibly unsafe code*. @trusted is for safe functions that the compiler cannot prove safe.

Yeah. You basically _never_ just mark @trusted and call it a day. You only mark something @trusted if you've verified that _everything_ that that function does which is @system is done in a way that's ultimately safe. In particular, marking much of anything which is templated as @trusted is almost always just plain wrong.

- Jonathan M Davis
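
For contrast, here is a sketch of a function where @trusted looks defensible, because every @system operation in it stays inside the bounds of a slice the caller already holds. The name untilZero is made up for illustration.

```d
import core.stdc.string : memchr;

// Returns the part of buf before the first zero byte, or all of buf if
// it contains none. memchr never reads past buf.length, so any valid
// slice is handled safely; that is the argument for @trusted here.
@trusted const(char)[] untilZero(const(char)[] buf)
{
    if (buf.length == 0)
        return buf;
    auto hit = cast(const(char)*) memchr(buf.ptr, 0, buf.length);
    return hit is null ? buf : buf[0 .. hit - buf.ptr];
}

@safe void main()
{
    char[] buf = "abc\0def".dup;          // heap copy with an embedded zero
    assert(untilZero(buf) == "abc");
    assert(untilZero("xyz") == "xyz");    // no zero inside the slice
}
```

A zstr that takes a bare char* cannot make the same argument, since nothing in the function can check that the terminator exists, so it has to stay @system.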
Oct 02 2012
prev sibling next sibling parent "Regan Heath" <regan netmail.co.nz> writes:
On Tue, 02 Oct 2012 21:44:11 +0100, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:
 In fact, a better solution would be to define a C string type (other  
 than char *), and just pretend those system calls return that.  Then  
 support that C string type in writef.

 -Steve

:D http://comments.gmane.org/gmane.comp.lang.d.general/97793 -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Oct 03 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 03 Oct 2012 08:37:14 -0400, Regan Heath <regan netmail.co.nz>  
wrote:

 On Tue, 02 Oct 2012 21:44:11 +0100, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:
 In fact, a better solution would be to define a C string type (other  
 than char *), and just pretend those system calls return that.  Then  
 support that C string type in writef.

 -Steve

:D http://comments.gmane.org/gmane.comp.lang.d.general/97793

Almost what I was thinking. :) Though, at that point, I don't think we need a special specifier for writef. %s works. However, looking at the vast reach of these changes, I wonder if it's worth it. That's a lot of prototypes to C functions that have to change, and a large compiler change (treating string literals as CString instead of char *), just so C strings print out with writef. Not to mention code that will certainly break... -Steve
Oct 03 2012
prev sibling next sibling parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Tuesday, 2 October 2012 at 13:07:46 UTC, deadalnix wrote:
 Le 01/10/2012 22:33, Vladimir Panteleev a écrit :
 On Monday, 1 October 2012 at 12:12:52 UTC, deadalnix wrote:
 Le 01/10/2012 13:29, Vladimir Panteleev a écrit :
 On Monday, 1 October 2012 at 10:56:36 UTC, deadalnix wrote:
 How does to!string know that the string is 0 terminated ?

By convention (it doesn't).

It is unsafe as hell oO

Forcing the programmer to put strlen calls everywhere in his code is not any safer.

I make the library safer. If the programmer manipulates unsafe constructs (like C strings), it is up to the programmer to ensure safety, not the lib.

Trusting the programmer is what brought upon us the wrath of security exploits via buffer overflows.

-- Paulo
Oct 04 2012
prev sibling parent "Regan Heath" <regan netmail.co.nz> writes:
On Thu, 04 Oct 2012 01:05:14 +0100, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Wed, 03 Oct 2012 08:37:14 -0400, Regan Heath <regan netmail.co.nz>  
 wrote:

 On Tue, 02 Oct 2012 21:44:11 +0100, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:
 In fact, a better solution would be to define a C string type (other  
 than char *), and just pretend those system calls return that.  Then  
 support that C string type in writef.

 -Steve

:D http://comments.gmane.org/gmane.comp.lang.d.general/97793

Almost what I was thinking. :) Though, at that point, I don't think we need a special specifier for writef. %s works.

True.
 However, looking at the vast reach of these changes, I wonder if it's  
 worth it.  That's a lot of prototypes to C functions that have to  
 change, and a large compiler change (treating string literals as CString  
 instead of char *), just so C strings print out with writef.

That's not the only motivation. The change brings more type safety in general and should help to catch bugs, like for example the common one made by people just starting out with D (from a C/C++ background).
 Not to mention code that will certainly break...

Some code will definitely stop compiling, but it's debatable as to whether this code is not already "broken" to some degree.. it's likely not as safe/robust as it could be. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Oct 04 2012