
D - Bug in docs and phobos (kill printf)

"Sandor Hojtsy" <hojtsy index.hu> writes:
Hi,
The D spec says (note the order of the pointer and the dimension):

Memory Model:
A dynamic array consists of:
 0: pointer to array data
 4: array dimension

Interfacing to C:
Although printf is designed to handle 0 terminated strings, not D dynamic
arrays of chars, it turns out that since D dynamic arrays are a length
followed by a pointer to the data, the %.*s format works perfectly

First, the documentation on the memory model is buggy. But here are some
more important conceptual errors:
It seems that originally the pointer came before the dimension (as is
conventional), and it was later changed to conform to the legacy printf.
This shouldn't have caused any trouble in client source code, because
that should *not* rely on an implementation detail such as the order of
hidden fields inside a built-in type. But now it does. I consider this bad
practice.

OTOH, this usage *does not* conform to the specification of printf. The C
standard says:

"As noted above, a field width, or precision, or both, may be indicated by
an asterisk. In this case, an int argument supplies the field width or
precision. The arguments specifying field width, or precision, or both,
shall appear (in that order) before the argument (if any) to be converted."
"If any argument is not the correct type for the corresponding
conversion specification, the behavior is undefined."

Does the standard say that the next 4 bytes on the stack are to be treated
as an integer encoding the precision, and the 4 bytes after them as a
pointer? NO! So the example results in undefined behaviour in any compiler
strictly conforming to the specification. Yes, I understand that in this
particular case it will work, because with your compiler the undefined
behaviour always turns out to do the right thing. But from the
specification side this is still undefined behaviour, and any other
compiler can claim to be fully conformant to the specs and still freeze on
this printf. I think you should not encourage code that results in
undefined behaviour.
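A minimal C sketch of the conforming call, assuming a hypothetical `struct slice` that mirrors the length-then-pointer layout described above: the precision is passed as an explicit `int` argument and the data pointer as a separate `char *`, in that order, as the quoted standard text requires, so the struct's field order never matters. The name `format_slice` is invented for illustration, and `snprintf` (C99) is used only so the result can be checked.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical length-then-pointer pair. The field order is irrelevant
   here because the struct itself is never handed to printf. */
struct slice {
    size_t length;
    const char *ptr;
};

/* Formats s as "[...]" into out using "%.*s" the way the C standard
   requires: an int precision argument first, then the char * to convert. */
int format_slice(char *out, size_t cap, struct slice s)
{
    assert(s.length <= INT_MAX);  /* the precision argument must be an int */
    return snprintf(out, cap, "[%.*s]", (int)s.length, s.ptr);
}
```

With a precision given, `%s` may be handed an array that has no terminating 0, as long as the precision does not exceed the array's size; the `assert` only guards the narrowing of the length to `int`.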

Now on to the format string of printf. It is a string literal ("const"
char[]), implicitly cast to (const char *). All string literals are
terminated by a 0, which is stored past the end. This is an interesting
idea. In short, it means that string literals and string variables with
the same contents are not interchangeable as function parameters. It is
the same nuisance that occurred in C++, when a fine type (std::string) was
introduced, but string literals remained (const char *), or even worse
(char *). I was told that D doesn't need to conform to any legacy, so it
can be more effective. This is just the opposite. Why don't we have a
function:

int dprintf(char [] format_str, ...)

which you can call as:

char [] d = "aa %s cc %d", e = "bb";
dprintf(d, e, 3);

resulting in "aa bb cc 3"?
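One way such a function could look, sketched in C rather than D so it stays independent of Phobos: the format string and each `%s` argument travel as a (length, pointer) pair, so neither needs a trailing 0. Everything here is hypothetical (`struct slice`, `dsprintf`, writing into a caller-supplied buffer instead of stdout), and only `%s`, `%d` and `%%` are supported.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical length-delimited string: no trailing 0 required. */
struct slice {
    size_t length;
    const char *ptr;
};

/* Appends up to n bytes from src, always leaving room for the final 0. */
static void put(char *out, size_t cap, size_t *pos, const char *src, size_t n)
{
    for (size_t i = 0; i < n && *pos + 1 < cap; i++)
        out[(*pos)++] = src[i];
}

/* Formats into out (truncating to cap - 1 chars) and returns the number of
   chars written. %s takes a struct slice, %d an int; for "%%" and any
   unsupported spec the character after the % is emitted as-is. */
size_t dsprintf(char *out, size_t cap, struct slice fmt, ...)
{
    assert(out != NULL || cap == 0);
    size_t pos = 0;
    va_list ap;
    va_start(ap, fmt);
    for (size_t i = 0; i < fmt.length; i++) {
        char c = fmt.ptr[i];
        if (c != '%' || i + 1 == fmt.length) {  /* plain char or trailing % */
            put(out, cap, &pos, &c, 1);
            continue;
        }
        char spec = fmt.ptr[++i];
        if (spec == 's') {
            struct slice arg = va_arg(ap, struct slice);
            put(out, cap, &pos, arg.ptr, arg.length);
        } else if (spec == 'd') {
            char num[24];
            int n = snprintf(num, sizeof num, "%d", va_arg(ap, int));
            if (n > 0)
                put(out, cap, &pos, num, (size_t)n);
        } else {
            put(out, cap, &pos, &spec, 1);
        }
    }
    va_end(ap);
    if (cap > 0)
        out[pos] = '\0';
    return pos;
}
```

Called as `dsprintf(buf, sizeof buf, d, e, 3)`, with `d` and `e` holding "aa %s cc %d" and "bb", it leaves "aa bb cc 3" in `buf`. Like printf, it still trusts the caller to pass a `struct slice` for every `%s` and an `int` for every `%d`, because C varargs carry no type information.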

I promised bugs in phobos. Well, I am not sure that this is a bug:

stream.d:
void writeLine(char[] s)
{
  writeString(s);
  write(cast(char)"\n"); // <-------
}

What is this supposed to do? Can you cast a string literal into a char?! Is
it done implicitly too? Where is it documented? Is it useful?
Another one:

string.toStringz():
...
p = &string[0] + string.length;

// Peek past end of string[], if it's 0, no conversion necessary.
// Note that the compiler will put a 0 past the end of static
// strings, and the storage allocator will put a 0 past the end
// of newly allocated char[]'s.
if (*p == 0)        // <--------
    return string;

This is undefined behaviour again. Is reading past the end of an allocated
memory block valid and guaranteed not to produce a General Protection Fault?
If yes, can you please document this interesting property of the memory
model?
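If the runtime makes no such guarantee, the portable alternative is never to look at `string[length]` at all and always copy. A minimal C sketch (the name `to_stringz` is hypothetical, not Phobos's API): it trades one allocation and copy per call for defined behaviour, which is exactly the cost the peek in toStringz() is trying to avoid.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a freshly allocated, 0-terminated copy of the len bytes at s.
   Only s[0] .. s[len-1] are ever read, so no byte past the caller's
   buffer is touched. Returns NULL if malloc fails; the caller frees it. */
char *to_stringz(const char *s, size_t len)
{
    assert(s != NULL);
    char *z = malloc(len + 1);
    if (z == NULL)
        return NULL;
    memcpy(z, s, len);
    z[len] = '\0';
    return z;
}
```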

Yours,
Sandor
Jun 25 2002
"Martin M. Pedersen" <mmp www.moeller-pedersen.dk> writes:
Hi,

"Sandor Hojtsy" <hojtsy index.hu> wrote in message
news:af9bra$16te$1 digitaldaemon.com...
 turns out to be doing the right thing. But from the specification side, this
 is still undefined behaviour, and any other compiler can claim to be fully
 conformant to the specs, and still freeze on this printf. I think you should
 not encourage code that results in undefined behaviour.
Very good points :-)
 I promised bugs in phobos. Well I am not sure that this is a bug:
   write(cast(char)"\n"); // <-------

 What is this supposed to do? Can you cast a string literal into a char?!
You don't have character literals in D, so this is a way to write this kind of thing.
 string.toStringz():
 ...
 p = &string[0] + string.length;

 // Peek past end of string[], if it's 0, no conversion necessary.
 // Note that the compiler will put a 0 past the end of static
 // strings, and the storage allocator will put a 0 past the end
 // of newly allocated char[]'s.
 if (*p == 0)        // <--------
     return string;

 This is undefined behaviour again. Is reading past the end of an allocated
 memory block valid and guaranteed not to produce a General Protection Fault?
 If yes, can you please document this interesting property of the memory
 model?
On some platforms, it certainly could cause a GPF. Perhaps Walter can
guarantee that it does not with his Windows implementation, and if that is
the case, he is free to use dirty tricks like this. The spec needs to be
portable, but the RTL behind the scenes does not.

This one reminds me of a bug I once saw under SunOS (pre-Solaris):
printf("%.*s", len, str) internally read (at least) one byte more than
specified by 'len', and thereby caused a GPF. Whether this was a bug in
Sun's printf() or in the user code, I cannot tell (I didn't care, as I had
to fix it anyway). But it shows that one cannot portably rely on peeking a
single byte beyond the string data.

Regards,
Martin M. Pedersen
Jun 25 2002
"Sandor Hojtsy" <hojtsy index.hu> writes:
   write(cast(char)"\n"); // <-------

 What is this supposed to do? Can you cast a string literal into a char?!
You don't have character literals in D, so this is a way to write this kind
of thing.
Hey, I missed that point: D doesn't have character literals! Oh no! You
have to explicitly cast a string literal to a char type to get a literal
char value? I have searched the docs, and there is nothing about casting
char[] to char. So if you go on like:

char [] a = "long long text here";
for(int i = 0; i < a.length; i++)
{
  if(a[i] == "a") // <----
    printf("%c", a[i]);
}

Will the compare in the noted line be a nasty string compare, constructing
a temporary string value from the character in a[i] and then comparing all
the (one) characters of the two strings? That would ruin the performance.
Or is this string literal converted into a char literal, just because it is
1 character long and/or compared to a char type? Does this mean you can
also cast int[] to int? And int[][] to int[] to int? Implicitly and/or
explicitly?

Yours,
Sandor
Jun 26 2002
Pavel Minayev <evilone omen.ru> writes:
On Wed, 26 Jun 2002 16:53:10 +0200 "Sandor Hojtsy" <hojtsy index.hu> wrote:

> Hey I missed that point: D doesn't have character literals!
> Oh no! You have to explicitly cast a string literal to a char type to get a
> literal char value?

No. It is done implicitly whenever possible. Only when you have two
versions of an overloaded function, one taking char[] and another taking
just char, do you have to use a cast.

> So if you go on like:
>
> char [] a = "long long text here";
> for(int i = 0; i < a.length; i++)
> {
>   if(a[i] == "a") // <----
>     printf("%c", a[i]);
> }
>
> Will the compare in the noted line be a nasty string compare, with
> constructing a temporary string value from the character in a[i], and then
> comparing all the (one) characters of the two strings? That would ruin the
> performance.

No, it'll be a char compare. I'll give another example where it is
clearer. You can write:

	char c = getc();
	int n = c - "0";

It'll work. Obviously, "0" is a char here, not a string.

Also, this only applies to string _literals_: variables, char or not,
cannot be cast to arrays, or vice versa.
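For comparison, here is what both of Pavel's examples amount to in C, where single quotes denote character literals: one `char` comparison per element and a plain integer subtraction, with no temporary string anywhere. The function names are made up for this sketch; that D compiles `a[i] == "a"` to the same thing is Pavel's claim, which the sketch illustrates but does not demonstrate.

```c
#include <assert.h>
#include <stddef.h>

/* Counts occurrences of 'a' in the first len chars of s. Each test is a
   single char comparison; no temporary string is built. */
size_t count_a(const char *s, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (s[i] == 'a')
            n++;
    return n;
}

/* Converts a decimal digit character to its value. The C standard
   guarantees '0'..'9' are contiguous, so c - '0' is portable. */
int digit_value(char c)
{
    assert(c >= '0' && c <= '9');
    return c - '0';
}
```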
Jun 26 2002
"Matthew Wilson" <matthew thedjournal.com> writes:
Pavel

Do you know what the reasoning behind the prohibition of char literals was?

Matthew
Jun 26 2002
Pavel Minayev <evilone omen.ru> writes:
On Thu, 27 Jun 2002 11:20:58 +1000 "Matthew Wilson" <matthew thedjournal.com>
wrote:

> Pavel
>
> Do you know what the reasoning behind the prohibition of char literals
> was?

The reason was to simplify the language (no need to remember where to use
single quotes and where double ones are required), and to free single
quotes for another purpose, I think. Just to remind you, in D single-quoted
string literals don't support escape characters, so they are good for
writing pathnames:

was	"C:\\bla\\bla\\bla"
now	'C:\bla\bla\bla'
Jun 27 2002
"Sean L. Palmer" <seanpalmer earthlink.net> writes:
How do you embed a single quote into a single-quoted string then?

And this difference alone makes people have to remember the difference
between a single- and double-quoted string.

I liked C's char literals better.  At least you always knew what you were
getting.

Sean
Jun 27 2002
Pavel Minayev <evilone omen.ru> writes:
On Thu, 27 Jun 2002 02:04:31 -0700 "Sean L. Palmer" <seanpalmer earthlink.net> 
wrote:

 How do you embed a single quote into a single-quoted string then?
'this is a' \' 'quoted' \' 'string'
 And this difference alone makes people have to remember the difference
 between a single- and double-quoted string.
But it's quite common. I've seen it in PHP, and somewhere else I just don't
remember. Besides, it's just so convenient for pathnames under Windows. And
if you don't like it, you can always just use double quotes, after all.
 I liked C's char literals better.  At least you always knew what you were
 getting.
It _might_ seem confusing (it did to me at first), but it turns out you get
used to it rather quickly. BTW, Pascal programmers have been using that kind
of thing for more than fifteen years already, and I haven't heard anyone
complain!
Jun 27 2002
"Martin M. Pedersen" <mmp www.moeller-pedersen.dk> writes:
Hi,

"Pavel Minayev" <evilone omen.ru> wrote in message
news:CFN374346986391204 news.digitalmars.com...
 But it's quite common. I've seen it in PHP before, and somewhere else,
The UNIX shell does the same thing.

Regards,
Martin M. Pedersen
Jun 27 2002
"OddesE" <OddesE_XYZ hotmail.com> writes:
"Pavel Minayev" <evilone omen.ru> wrote in message
news:CFN374346986391204 news.digitalmars.com...
 On Thu, 27 Jun 2002 02:04:31 -0700 "Sean L. Palmer" <seanpalmer earthlink.net>
 wrote:

 How do you embed a single quote into a single-quoted string then?
'this is a' \' 'quoted' \' 'string'
 And this difference alone makes people have to remember the difference
 between a single- and double-quoted string.
But it's quite common. I've seen it in PHP before, and somewhere else, I just don't remember. Besides, it's just so convenient for pathnames under Windows. And if you don't like it, you can just always use double quotes, after all.
 I liked C's char literals better.  At least you always knew what you were
 getting.
It _might_ seem confusing (it did so to me at first), but it turns out you get used to it rather quickly. BTW, Pascal programmers use that kind of thing for more than fifteen years already, and I didn't hear anyone complain!
I started with Pascal before I used C and was rather surprised that you had
to use different quotes for characters than for strings. Pascal handles
strings a lot better than C, so I thought that it was like an 'advanced'
feature! :)

I kinda liked the C style though, but this is even better. I do a little
PHP programming, and being able to turn escaping on and off at will is
really convenient!

--
Stijn
OddesE_XYZ hotmail.com
http://OddesE.cjb.net
Remove _XYZ from my address when replying by mail
Jun 27 2002
"Walter" <walter digitalmars.com> writes:
You're correct about the reasoning. -Walter

"Pavel Minayev" <evilone omen.ru> wrote in message
news:CFN374344654990394 news.digitalmars.com...
On Thu, 27 Jun 2002 11:20:58 +1000 "Matthew Wilson"
<matthew thedjournal.com>
wrote:
 Do you know what the reasoning behind the prohibition of char literals
 was?
The reason was to simplify the language (no need to remember where to use
single quotes and where double ones are required), and to free single
quotes for another purpose, I think. Just to remind, in D single-quoted
string literals don't support escape characters, so they are good for
writing pathnames:

was "C:\\bla\\bla\\bla"
now 'C:\bla\bla\bla'
Jul 10 2002