D - Bug in docs and phobos (kill printf)
- Sandor Hojtsy (79/79) Jun 25 2002 Hi,
- Martin M. Pedersen (21/41) Jun 25 2002 Hi,
- Sandor Hojtsy (23/28) Jun 26 2002 Hey I missed that point: D doesn't have character literals!
- Pavel Minayev (27/27) Jun 26 2002 On Wed, 26 Jun 2002 16:53:10 +0200 "Sandor Hojtsy" <hojtsy...
- Matthew Wilson (17/31) Jun 26 2002 Pavel
- Pavel Minayev (12/12) Jun 27 2002 On Thu, 27 Jun 2002 11:20:58 +1000 "Matthew Wilson" <matthew...
- Sean L. Palmer (19/21) Jun 27 2002 How do you embed a single quote into a single-quoted string then?
- Pavel Minayev (13/18) Jun 27 2002 'this is a' \' 'quoted' \' 'string'
- Martin M. Pedersen (6/7) Jun 27 2002 Hi,
- OddesE (20/36) Jun 27 2002
- Walter (14/15) Jul 10 2002 You're correct about the reasoning. -Walter
Hi,

The D spec says (note the order of the pointer and the dimension):

Memory Model:
    A dynamic array consists of:
    0: pointer to array data
    4: array dimension

Interfacing to C:
    Although printf is designed to handle 0 terminated strings, not D dynamic arrays of chars, it turns out that since D dynamic arrays are a length followed by a pointer to the data, the %.*s format works perfectly

First, the docs on the memory model are buggy. But here are some more important conceptual errors:

It seems that originally the pointer came before the dimension (as is conventional), and the order was later changed to conform to the legacy printf. This shouldn't have caused any trouble in client source code, because that should *not* rely on an implementation detail such as the order of hidden fields inside a built-in type. But now it does. I consider this bad practice.

OTOH, this usage *does not* conform to the specification of printf. The C standard says:

"As noted above, a field width, or precision, or both, may be indicated by an asterisk. In this case, an int argument supplies the field width or precision. The arguments specifying field width, or precision, or both, shall appear (in that order) before the argument (if any) to be converted."

"If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined."

Does the standard say that the next 4 bytes on the stack are treated as an integer encoding the precision, and the 4 bytes after them as a pointer? NO! So the example results in undefined behaviour in any compiler strictly conforming to the specification. Yes, I understand that in this particular case it will work, because in your compiler the undefined behaviour always turns out to be doing the right thing. But from the specification side, this is still undefined behaviour, and any other compiler can claim to be fully conformant to the specs, and still freeze on this printf. I think you should not encourage code that results in undefined behaviour.

Now on to the format string of printf. It is a string literal (?const? char[]), implicitly cast to (const char *). All string literals are terminated by a 0, which is stored past-the-end. This is an interesting idea. In short, this means that string literals and string variables with the same contents are not interchangeable as function parameters. It is the same nuisance that occurred in C++, when a fine (std::string) type was introduced, but string literals remained (const char *), or even worse (char *). I was told that D doesn't need to conform to any legacy, so it can be more effective. This is just the opposite. Why don't we have a function:

    int dprintf(char [] format_str, ...)

which you can call as:

    char [] d = "aa %s cc %d", e = "bb";
    dprintf(d, e, 3);

resulting in "aa bb cc 3"?

I promised bugs in phobos. Well, I am not sure that this is a bug:

stream.d:
    void writeLine(char[] s)
    {
        writeString(s);
        write(cast(char)"\n"); // <-------
    }

What is this supposed to do? Can you cast a string literal into a char?! Is it done implicitly too? Where is it documented? Is it useful?

Another one:

string.toStringz():
    ...
    p = &string[0] + string.length;
    // Peek past end of string[], if it's 0, no conversion necessary.
    // Note that the compiler will put a 0 past the end of static
    // strings, and the storage allocator will put a 0 past the end
    // of newly allocated char[]'s.
    if (*p == 0) // <--------
        return string;

This is undefined behaviour again. Is reading past the end of an allocated memory block valid and guaranteed not to produce a General Protection Fault? If yes, can you please document this interesting property of the memory model?

Yours,
Sandor
Jun 25 2002
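The conforming way to print a length-counted string with %.*s is to pass the precision and the pointer as two separate arguments, an int first and then the char pointer. Below is a minimal C sketch of that idea. The `counted_str` type and the `format_counted` function are hypothetical names used only for illustration; they are not part of D, phobos, or the C library.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a length-counted string such as a D char[].
   The field order does not matter here, because the fields are handed
   to the printf family one at a time, never as a whole struct. */
struct counted_str {
    size_t length;
    const char *data;
};

/* Formats at most s.length bytes of s.data into out.
   "%.*s" takes an int precision followed by the pointer, so the
   length is converted explicitly and passed before the data. */
int format_counted(char *out, size_t out_size, struct counted_str s)
{
    assert(s.length <= INT_MAX); /* the precision argument must be an int */
    return snprintf(out, out_size, "%.*s", (int)s.length, s.data);
}
```

Because each argument has exactly the type the conversion specification asks for, the call does not depend on how any particular compiler lays out a struct or an array descriptor on the stack.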
Hi,

"Sandor Hojtsy" <hojtsy index.hu> wrote in message news:af9bra$16te$1 digitaldaemon.com...
> turns out to be doing the right thing. But from the specification side, this
> is still undefined behaviour, and any other compiler can claim to be fully
> conformant to the specs, and still freeze on this printf. I think you should
> not encourage code that results in undefined behaviour.

Very good points :-)

> I promised bugs in phobos. Well I am not sure that this is a bug:
>     write(cast(char)"\n"); // <-------
> What is this supposed to do? Can you cast a string literal into a char?!

You don't have character literals in D, so this is a way to write this kind of thing.

> string.toStringz():
>     ...
>     p = &string[0] + string.length;
>     // Peek past end of string[], if it's 0, no conversion necessary.
>     // Note that the compiler will put a 0 past the end of static
>     // strings, and the storage allocator will put a 0 past the end
>     // of newly allocated char[]'s.
>     if (*p == 0) // <--------
>         return string;
> This is undefined behaviour again. Is reading past the end of an allocated
> memory block valid and guaranteed not to produce a General Protection Fault?
> If yes, can you please document this interesting property of the memory model?

On some platforms, it certainly could cause a GPF. Perhaps Walter can guarantee that it does not with his Windows implementation, and if this is the case, he is free to use dirty tricks like this. The spec needs to be portable, but the RTL does not need to be, behind the scenes.

This one reminds me of a bug I once saw under SunOS (pre-Solaris): printf("%.*s", len, str) internally read (at least) one byte more than specified by 'len', and thereby caused a GPF. Whether this was a bug in Sun's printf() or in the user code, I cannot tell (I didn't care, as I had to fix it anyway). But it shows that one cannot portably rely on peeking a single byte beyond the string data.

Regards,
Martin M. Pedersen
Jun 25 2002
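One portable alternative to peeking past the end is to copy the data every time and append the terminator. The following is a minimal C sketch of that approach; `to_stringz` here is a hypothetical C function, not the phobos implementation, and it trades an allocation on every call for never reading outside the given range.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated, 0-terminated copy of the first `length`
   bytes of `data`, or NULL if allocation fails. Only bytes inside
   [data, data + length) are read. The caller frees the result. */
char *to_stringz(const char *data, size_t length)
{
    assert(data != NULL || length == 0);
    char *copy = malloc(length + 1);
    if (copy == NULL)
        return NULL;
    if (length > 0)
        memcpy(copy, data, length);
    copy[length] = '\0'; /* terminator goes into memory we own */
    return copy;
}
```

The peek in the quoted code avoids this copy when a 0 already happens to follow the data, which is an optimisation that only holds where the allocator makes that byte readable.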
> >     write(cast(char)"\n"); // <-------
> > What is this supposed to do? Can you cast a string literal into a char?!
>
> You don't have character literals in D, so this is a way write this kind of
> thing.

Hey, I missed that point: D doesn't have character literals! Oh no! You have to explicitly cast a string literal to a char type to get a literal char value? I have searched the docs, and there is nothing about casting char[] to char.

So if you go on like:

    char [] a = "long long text here";
    for(int i = 0; i < a.length; i++)
    {
        if(a[i] == "a") // <----
            printf("%c", a[i]);
    }

Will the compare in the noted line be a nasty string compare, constructing a temporary string value from the character in a[i], and then comparing all the (one) characters of the two strings? That would ruin the performance. Or is this string literal converted into a char literal, just because it is 1 character long and/or compared to a char type?

Does this mean you can also cast int[] to int? And int[][] to int[] to int? Implicitly and/or explicitly?

Yours,
Sandor
Jun 26 2002
On Wed, 26 Jun 2002 16:53:10 +0200 "Sandor Hojtsy" <hojtsy index.hu> wrote:
> Hey I missed that point: D doesn't have character literals!
> Oh no! You have to explicitly cast a string literal to a char type to get a
> literal char value?

No. It is done implicitly whenever possible. Only when you have two versions of an overloaded function, one taking char[] and another just char, do you have to use a cast.

> So if you go on like:
>
>     char [] a = "long long text here";
>     for(int i = 0; i < a.length; i++)
>     {
>         if(a[i] == "a") // <----
>             printf("%c", a[i]);
>     }
>
> Will the compare in the noted line be a nasty string compare, with
> constructing a temporary string value from the character in a[i], and then
> comparing all the (one) characters of the two strings? That would ruin the
> performance.

No, it'll be a char compare. I'll give another example where it is clearer. You can write:

	char c = getc();
	int n = c - "0";

It'll work. Obviously, "0" is a char here, not a string.

Also, this only applies to string _literals_ - variables, char or not, cannot be cast to arrays, or vice versa.
Jun 26 2002
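For comparison, here is a minimal C sketch of the same two operations written with C's dedicated character literals. The function names `count_char` and `digit_value` are hypothetical and chosen only for this illustration. Each comparison is a single char-to-char compare, which is the behaviour described above for a one-character D string literal used in a char context.

```c
#include <assert.h>
#include <stddef.h>

/* Counts occurrences of `wanted` in the first `length` bytes of `text`.
   Every test is one char compare; no temporary string is built. */
size_t count_char(const char *text, size_t length, char wanted)
{
    size_t count = 0;
    for (size_t i = 0; i < length; i++) {
        if (text[i] == wanted)
            count++;
    }
    return count;
}

/* Converts a decimal digit character to its numeric value.
   C guarantees that '0'..'9' are contiguous, so subtraction works. */
int digit_value(char c)
{
    assert(c >= '0' && c <= '9');
    return c - '0';
}
```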
Pavel

Do you know what the reasoning behind the prohibition of char literals was?

Matthew

"Pavel Minayev" <evilone omen.ru> wrote in message news:CFN374338570968982 news.digitalmars.com...
> On Wed, 26 Jun 2002 16:53:10 +0200 "Sandor Hojtsy" <hojtsy index.hu> wrote:
> > Hey I missed that point: D doesn't have character literals!
> > Oh no! You have to explicitely cast a string literal to a char type to get a
> > literal char value?
>
> No. It is done implicitly whenever possible. Only when you have
> two versions of overloaded function, one taking char[], and another
> just char, then you have to use a cast.
>
> > So if you go on like:
> >
> >     char [] a = "long long text here";
> >     for(int i = 0; i < a.length; i++)
> >     {
> >         if(a[i] == "a") // <----
> >             printf("%c", a[i]);
> >     }
> >
> > Will the compare in the noted line be a nasty string compare, with
> > constructing a temporary string value from the character in a[i], and then
> > comparing all the (one) characters of the two strings? That would ruin the
> > performance.
>
> No, it'll be a char compare. I'll give another example where it is
> clearer. You can write:
>
>     char c = getc();
>     int n = c - "0";
>
> It'll work. Obviously, "0" is a char here, not a string.
>
> Also, this only applies to string _literals_ - variables, char or
> not, cannot be casted to arrays, or vice versa.
Jun 26 2002
On Thu, 27 Jun 2002 11:20:58 +1000 "Matthew Wilson" <matthew thedjournal.com> wrote:
> Pavel
>
> Do you know what the reasoning behind the prohibition of char literals was?

The reason was to simplify the language (no need to remember where to use single quotes and where double ones are required), and to free single quotes for another purpose, I think.

Just as a reminder, in D single-quoted string literals don't support escape characters, so they are good for writing pathnames:

was	"C:\\bla\\bla\\bla"
now	'C:\bla\bla\bla'
Jun 27 2002
How do you embed a single quote into a single-quoted string then?

And this difference alone makes people have to remember the difference between a single- and double-quoted string.

I liked C's char literals better. At least you always knew what you were getting.

Sean

"Pavel Minayev" <evilone omen.ru> wrote in message news:CFN374344654990394 news.digitalmars.com...
> On Thu, 27 Jun 2002 11:20:58 +1000 "Matthew Wilson" <matthew thedjournal.com> wrote:
> > Pavel
> >
> > Do you know what the reasoning behind the prohibition of char literals was?
>
> The reason was to simplify the language (no need to remember where to
> use single quotes and where double ones are required), and to free
> single quotes for another purpose, I think.
>
> Just to remind, in D single-quoted string literals don't support escape
> characters, so they are good for writing pathnames:
>
> was "C:\\bla\\bla\\bla"
> now 'C:\bla\bla\bla'
Jun 27 2002
On Thu, 27 Jun 2002 02:04:31 -0700 "Sean L. Palmer" <seanpalmer earthlink.net> wrote:
> How do you embed a single quote into a single-quoted string then?

'this is a' \' 'quoted' \' 'string'

> And this difference alone makes people have to remember the difference
> between a single- and double-quoted string.

But it's quite common. I've seen it in PHP before, and somewhere else, I just don't remember. Besides, it's just so convenient for pathnames under Windows. And if you don't like it, you can always just use double quotes, after all.

> I liked C's char literals better. At least you always knew what you were
> getting.

It _might_ seem confusing (it did so to me at first), but it turns out you get used to it rather quickly. BTW, Pascal programmers have used that kind of thing for more than fifteen years already, and I haven't heard anyone complain!
Jun 27 2002
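The 'this is a' \' 'quoted' \' 'string' form relies on neighbouring literals being joined into one string. C has the same rule for adjacent string literals, so a rough C analogue can show what the joined result looks like. This is only a sketch: it assumes D's juxtaposition behaves like C's adjacent-literal concatenation, and `quoted` and `count_single_quotes` are hypothetical names for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Adjacent string literals are joined at compile time with nothing
   inserted between them, so each quote character lands directly
   between the neighbouring words. */
static const char quoted[] = "this is a" "'" "quoted" "'" "string";

/* Returns the number of single-quote characters in a 0-terminated string. */
size_t count_single_quotes(const char *s)
{
    size_t count = 0;
    for (const char *p = strchr(s, '\''); p != NULL; p = strchr(p + 1, '\''))
        count++;
    return count;
}
```

Note that no spaces are added at the joins, so any wanted spacing around the quotes has to be written inside the neighbouring literals.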
Hi,

"Pavel Minayev" <evilone omen.ru> wrote in message news:CFN374346986391204 news.digitalmars.com...
> But it's quite common. I've seen it in PHP before, and somewhere else,

The UNIX shell does the same thing.

Regards,
Martin M. Pedersen
Jun 27 2002
"Pavel Minayev" <evilone omen.ru> wrote in message news:CFN374346986391204 news.digitalmars.com...
> On Thu, 27 Jun 2002 02:04:31 -0700 "Sean L. Palmer" <seanpalmer earthlink.net> wrote:
> > How do you embed a single quote into a single-quoted string then?
>
> 'this is a' \' 'quoted' \' 'string'
>
> > And this difference alone makes people have to remember the difference
> > between a single- and double-quoted string.
>
> But it's quite common. I've seen it in PHP before, and somewhere else,
> I just don't remember. Besides, it's just so convenient for pathnames
> under Windows. And if you don't like it, you can just always use double
> quotes, after all.
>
> > I liked C's char literals better. At least you always knew what you were
> > getting.
>
> It _might_ seem confusing (it did so to me at first), but it turns out
> you get used to it rather quickly. BTW, Pascal programmers use that
> kind of thing for more than fifteen years already, and I didn't hear
> anyone complain!

I started with Pascal before I used C and was rather surprised that you had to use different quotes for characters than for strings. Pascal handles strings a lot better than C, so I thought that it was like an 'advanced' feature! :)

I kinda liked the C style though, but this is even better. I do a little PHP programming, and being able to turn escaping on and off at will is really convenient!

--
Stijn
OddesE_XYZ hotmail.com
http://OddesE.cjb.net
_________________________________________________
Remove _XYZ from my address when replying by mail
Jun 27 2002
You're correct about the reasoning.

-Walter

"Pavel Minayev" <evilone omen.ru> wrote in message news:CFN374344654990394 news.digitalmars.com...
> On Thu, 27 Jun 2002 11:20:58 +1000 "Matthew Wilson" <matthew thedjournal.com> wrote:
> > Do you know what the reasoning behind the prohibition of char literals was?
>
> The reason was to simplify the language (no need to remember where to
> use single quotes and where double ones are required), and to free
> single quotes for another purpose, I think.
>
> Just to remind, in D single-quoted string literals don't support escape
> characters, so they are good for writing pathnames:
>
> was "C:\\bla\\bla\\bla"
> now 'C:\bla\bla\bla'
Jul 10 2002