
D - Could not believe it!

"Bob W" <nospam aol.com> writes:
/*

Strange Bug:

This program produces erroneous outputs,
if the number of characters joined together
in the printf() function matches those in the
examples shown.


System:
  dmd V0.81 on WinXP


Output:

Look at - this!
Look at           <<<<< joined array contents duplicated

Test2: Look + at  <<<<< correct - unless 1 char removed
this!


Now look at that#!  <<<<< control character inserted

*/

import std.string;

char[][] a;
char[][] b;

int main (char[][] args) {

// This produces an incorrect output:

  a~="Look";  a~="at";
  printf("\n");
  printf(join(a," ") ~ " - this!\n");


// This is fine, but when for example
// 'Test2' is changed to 'Test' the
// output will be incorrect

  printf("\n\n");
  printf("Test2: " ~ join(a," + ") ~ "\n");
  printf("this!\n");


// This puts a control character between
// 'that' and the exclamation mark

  printf("\n\n");
  b~="Now";  b~="look";  b~="at";  b~="that";
  printf(join(b," "));
  printf("!\n");

  return(0);
}
Mar 16 2004
Ilya Minkov <minkov cs.tum.edu> writes:
Nothing strange. printf is a C function, and in C strings are 0-terminated. 
In D, only string literals are 0-terminated (for compatibility); generic 
strings resulting from array operations are not.

See http://www.prowiki.org/wiki4d/wiki.cgi?HowTo/printf

You can use something like toStringz to make a null-terminated string 
from a D string.
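
A minimal sketch of the toStringz route, assuming a current D compiler rather than dmd 0.81. The `core.stdc.stdio` import and the `string` type are modern spellings that do not come from this thread, and `buildMessage` is a hypothetical helper:

```d
import core.stdc.stdio : printf;
import std.array : join;
import std.string : toStringz;

// Hypothetical helper: builds the same runtime string as the first
// example in the original post.
string buildMessage(string[] words)
{
    return join(words, " ") ~ " - this!\n";
}

void main()
{
    string msg = buildMessage(["Look", "at"]);

    // msg was built at runtime, so nothing guarantees a 0 behind it.
    // toStringz hands printf a pointer to 0-terminated data instead.
    printf("%s", toStringz(msg));
}
```

Passing the text through a fixed "%s" format also keeps any '%' characters in the data from being read as conversion specifiers.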

In general, printf will either get "banned" (from public examples etc., 
not from the library) or we will put a D-specific version together. But we 
haven't decided yet. Stream IO from Phobos is strongly recommended over 
C IO unless you know exactly what you are doing.

So welcome here and good luck.

-eye



Mar 16 2004
"Bob W" <nospam aol.com> writes:
Thanks for your reply.

If I understand you correctly the following would happen:


// This works because a literal is used as parameter1

    printf("Simple string 1\n");


// This will still work because 's' points to a literal

    char[] s="Simple string 2\n";
    printf(s);


// This is ok, because both literals are merged during compile-time

    printf("Simple " ~ "string 3\n");


// Just this one is a problem because variable and literal seem
// to be merged at runtime, so a 'genuine' D string is created.
// Furthermore the end of the new string is sitting just before a
// 16-byte boundary, which prevents any zero padding from
// coming to the rescue.

    char[] s="Simple ";
    printf(s ~ "string 4\n");      // quick fix: add '\0' after '\n' ?
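
On the quick fix asked about in the last comment: a sketch, again assuming a current D compiler, of what appending "\0" by hand does. `withTerminator` is a hypothetical helper, not a library function. The terminator becomes an ordinary array element, so it counts towards `.length`:

```d
import core.stdc.stdio : printf;
import core.stdc.string : strlen;

// Hypothetical helper: appends an explicit 0 so the data can be handed to C.
string withTerminator(string s)
{
    return s ~ "\0";
}

void main()
{
    string s = "Simple ";
    string c = withTerminator(s ~ "string 4\n");

    // Safe for printf: the terminator is part of the array itself.
    printf("%s", c.ptr);

    // The D length counts the terminator, the C length does not.
    assert(c.length == strlen(c.ptr) + 1);
}
```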


Mar 17 2004
Ilya Minkov <minkov cs.tum.edu> writes:
Bob W wrote:
 Thanks for your reply.

You're welcome!
 If I understand you correctly the following would happen:

Everything is right except for:
 // This is ok, because both literals are merged during compile-time
 
     printf("Simple " ~ "string 3\n");

With DMD it's true, but I'm not sure whether it is defined that it should be so. That is, the current compiler does it this way, but if we get popular and get rivals, there this might mean a non-null-terminated array. It is not even defined whether this concatenation happens at compile time or at execution time.
 // Just this one is a problem because variable and literal seem
 // to be merged at runtime, so a 'genuine' D string is created.
 // Furthermore the end of the new string is sitting just before a
 // 16-byte boundary, which prevents any zero padding from
 // coming to the rescue.
 
     char[] s="Simple ";
     printf(s ~ "string 4\n");      // quick fix: add '\0' after '\n' ?

True. But you cannot rely on zero padding to work either. Adding \0 works, but such a string only makes sense for C functions. In D functions, this might cause problems, because when, for example, you concatenate something else to the end, you get a string with an embedded 0!

The thing is, when constant strings are emitted, they are padded with zeroes at the end. At runtime, a slice (a slice is a value consisting of a data pointer and a length) into the constant area is assigned to the array variable. So there is a 0 right behind the array bound. As soon as any operation increasing the length is done, the array data is required to be copied. A new memory area is allocated. Thus the zeroes are lost. In fact, it is a convention to copy on any change except for slicing.

There are also other funny things which may happen, including:

* You slice into a string literal. The simplest case is that you have a string and decrease its length. You printf it, and the output does not stop at its real end but runs on to the 0 terminator, i.e. the original length.
* You printf using a format string which contains %s and something afterwards. This something gets replaced by noise... %s is the wrong format, you should use ... can't remember, see the link.
* Functions can write into the arrays they get as input, but if you change the length, the change is not propagated back to the caller. In other words, the semantics are semi-constant: you have to make sure you either copy an array (array.dup) before changing it, if the change need not be propagated, or use the inout modifier.

I think this should be in some newbie FAQ; please someone add it if it's not. It's too late here.

-eye
Mar 17 2004
"Bob W" <nospam aol.com> writes:
Got the info that my last reply to this thread was misposted.
I guess this one should work:



----- Original Message ----- 
From: "Ilya Minkov" <minkov cs.tum.edu>
Newsgroups: D
Sent: Thursday, 18 March, 2004 01:02
Subject: Re: Could not believe it!


 Bob W wrote:
 Thanks for your reply.

You're welcome!
 If I understand you correctly the following would happen:

Everything is right except for:
 // This is ok, because both literals are merged during compile-time

     printf("Simple " ~ "string 3\n");

With DMD it's true, but I'm not sure whether it is defined that it should be so. That is, the current compiler does it this way, but if we get popular and get rivals, there this might mean a non-null-terminated array. It is not even defined whether this concatenation happens at compile time or at execution time.

I am aware of this, but I just wanted to correctly understand the behavior I've experienced.
 // Just this one is a problem because variable and literal seem
 // to be merged at runtime, so a 'genuine' D string is created.
 // Furthermore the end of the new string is sitting just before a
 // 16-byte boundary, which prevents any zero padding from
 // coming to the rescue.

     char[] s="Simple ";
     printf(s ~ "string 4\n");      // quick fix: add '\0' after '\n' ?

True. But you cannot rely on zero padding to work either. Adding \0 works, but such a string only makes sense for C functions. In D functions, this might cause problems, because when, for example, you concatenate something else to the end, you get a string with an embedded 0!

As long as the string is used just to be displayed, I personally would not worry about a '\0' being added. Otherwise it is a potential pitfall, I agree to that.
 The thing is, when constant strings are emitted, they are padded with
 zeroes at the end. At runtime, a slice (a slice is a value consisting
 of a data pointer and a length) into the constant area is assigned to the
 array variable. So there is a 0 right behind the array bound. As soon as
 any operation increasing the length is done, the array data is
 required to be copied. A new memory area is allocated. Thus the
 zeroes are lost. In fact, it is a convention to copy on any change
 except for slicing.

 There are also other funny things which may happen, including:

 * You slice into a string literal. The simplest case is that you have a
 string and decrease its length. You printf it, and the output does not
 stop at its real end but runs on to the 0 terminator, i.e. the original
 length.
 * You printf using a format string which contains %s and something
 afterwards. This something gets replaced by noise... %s is the wrong
 format, you should use ... can't remember, see the link.

If you are referring to the '%.*s' crutch, I was quite astonished that no other measures were found to get printf() to work with D strings. I know that D is still in the alpha stage, but offering something like '%t' instead of '%.*s' to handle D-style strings would help, because printf is something almost everyone will be using during an initial evaluation of D and beyond. Obscuring one of the most popular conversion specifiers for printf() probably does not really help in getting D promoted. Besides, a recent survey has shown that for some reason C++ is losing market share to good old C, so printf() is here to stay for the next 100 years or so anyway ..... : )
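
For reference, a sketch of how '%.*s' is used, assuming a current D compiler (`firstN` is a hypothetical helper). The precision argument is an int giving the number of characters to print, followed by the data pointer, so no terminator is needed and a slice prints only its own characters:

```d
import core.stdc.stdio : printf;

// Hypothetical helper: a slice shares the data pointer of the original
// string and only carries a shorter length.
string firstN(string s, size_t n)
{
    return s[0 .. n];
}

void main()
{
    string full = "Look at this!";
    string part = firstN(full, 4);

    // printf("%s\n", part.ptr) would ignore part.length and run on to the
    // 0 that follows the literal, printing all of `full`.
    // With %.*s the length is passed explicitly as an int.
    printf("%.*s\n", cast(int) part.length, part.ptr);
}
```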
 * Functions can write into the arrays they get as input, but if you change
 the length, the change is not propagated back to the caller. In other
 words, the semantics are semi-constant: you have to make sure you
 either copy an array (array.dup) before changing it, if the change
 need not be propagated, or use the inout modifier.

That is good to know, thanks.
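
The point about array parameters can be sketched as follows, assuming a current D compiler, where the inout parameter modifier mentioned above is spelled `ref`. The three function names are hypothetical:

```d
import std.stdio : writeln;

// The callee receives its own copy of the (pointer, length) pair, so
// growing the array here does not change the caller's length.
void appendByValue(char[] arr)
{
    arr ~= '!';
}

// With ref, the caller's slice itself is updated.
void appendByRef(ref char[] arr)
{
    arr ~= '!';
}

// Element writes go through the shared data and are visible to the caller.
void upperFirst(char[] arr)
{
    if (arr.length > 0 && arr[0] >= 'a' && arr[0] <= 'z')
        arr[0] = cast(char)(arr[0] - 32);
}

void main()
{
    char[] s = "look".dup;   // .dup gives a private, mutable copy
    appendByValue(s);
    writeln(s);              // look
    appendByRef(s);
    upperFirst(s);
    writeln(s);              // Look!
}
```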
 I think this should be in some newbie FAQ; please someone add it if it's
 not. It's too late here.

 -eye

Mar 18 2004