digitalmars.D.learn - Problems using strings in D

Grzegorz Adam Hankiewicz <fake dont.use> writes:

I am trying to run this little program:

    import std.stdio;
    import std.path;
    
    int main()
    {
        char[] test_string = null;
        char[] original = "/home/.resource";
        test_string = getBaseName(original);
        test_string[2] = 'a';
        writefln("is %s like %s?", original, test_string);
        return 0;
    }

But I get a core dump. gdb points at the line where getBaseName is
being called.

 (gdb) bt


 (gdb) f 0

 8           test_string = getBaseName(original);

Why does this happen and how do I prevent this?

Jan 24 2006

"Chris Miller" <chris dprogramming.com> writes:

On Tue, 24 Jan 2006 17:18:04 -0500, Grzegorz Adam Hankiewicz  
<fake dont.use> wrote:

> I am trying to run this little program:
>
>     import std.stdio;
>     import std.path;
>
>     int main()
>     {
>         char[] test_string = null;
>         char[] original = "/home/.resource";
>         test_string = getBaseName(original);
>         test_string[2] = 'a';
>         writefln("is %s like %s?", original, test_string);
>         return 0;
>     }
>
> But I get a core dump. gdb points at the line where getBaseName is
> being called.
>
>  (gdb) bt
>
>
>  (gdb) f 0
>
>  8           test_string = getBaseName(original);
>
> Why does this happen and how do I prevent this?

Use copy-on-write unless you know you are the sole owner of a string.  
getBaseName() returns a slice of original.

test_string = getBaseName(original);
test_string = test_string.dup; // Get my own copy.
test_string[2] = 'a';
test_string[3] = 'b'; // I'm still the sole owner.
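
A minimal sketch of the same ownership rule for readers on a current D2 compiler. It assumes std.path.baseName, the present-day name for getBaseName; ownedBaseName is a hypothetical helper for illustration, not a Phobos function.

```d
import std.path : baseName;

// Hypothetical helper: returns a private, mutable copy of the base name.
// baseName yields a slice of its argument, so .dup is what makes the
// result safe to modify without touching the caller's data.
char[] ownedBaseName(const(char)[] path)
{
    return baseName(path).dup;
}
```

Calling ownedBaseName("/home/.resource") and then assigning to element 2 of the result changes only the copy; the literal itself is never written to.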

Jan 24 2006

"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:

"Grzegorz Adam Hankiewicz" <fake dont.use> wrote in message 
news:pan.2006.01.24.22.18.02.498385 dont.use...
> I am trying to run this little program:
>
>     import std.stdio;
>     import std.path;
>
>     int main()
>     {
>         char[] test_string = null;
>         char[] original = "/home/.resource";
>         test_string = getBaseName(original);
>         test_string[2] = 'a';
>         writefln("is %s like %s?", original, test_string);
>         return 0;
>     }

The equivalent Windows code (changing / to \ in the path name) doesn't 
segfault.  Try putting a .dup on the end of that string literal; I know 
there's a problem (?) in Linux where string literals are stored in a 
read-only segment, so trying to modify them (which is what your code will 
do) will cause a .. problem.  Maybe gdb or DMD got the line off by one, as I 
would expect the segfault to happen on line 9.

Jan 24 2006

"Walter Bright" <newshound digitalmars.com> writes:

"Grzegorz Adam Hankiewicz" <fake dont.use> wrote in message 
news:pan.2006.01.24.22.18.02.498385 dont.use...
> I am trying to run this little program:
>
>     import std.stdio;
>     import std.path;
>
>     int main()
>     {
>         char[] test_string = null;
>         char[] original = "/home/.resource";
>         test_string = getBaseName(original);
>         test_string[2] = 'a';
>         writefln("is %s like %s?", original, test_string);
>         return 0;
>     }
>
> But I get a core dump. gdb points at the line where getBaseName is
> being called.

String literals are read-only. std.path.getBaseName() is returning a slice of 
its argument, which points into read-only data. The seg fault comes from 
attempting to write into that read-only data.

The COW (copy-on-write) fix to your code would be:

    test_string = getBaseName(original).dup;
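
The aliasing described above can be checked directly. This is a sketch for a current D2 compiler, assuming std.path.baseName (the successor to getBaseName) returns a slice of its argument as the std.path documentation describes; aliases is a hypothetical helper written for this note.

```d
import std.path : baseName;

// Hypothetical helper: true when `slice` lies entirely inside the
// memory occupied by `whole`, i.e. no copy was made.
bool aliases(const(char)[] slice, const(char)[] whole)
{
    return slice.ptr >= whole.ptr
        && slice.ptr + slice.length <= whole.ptr + whole.length;
}
```

If the argument is a heap copy such as "/home/.resource".dup, the slice returned by baseName aliases that buffer, so writing 'a' to element 2 of the slice turns the buffer into "/home/.rasource". When the argument is a literal, the same write lands in read-only data instead.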

Jan 25 2006

James Dunne <james.jdunne gmail.com> writes:

Walter Bright wrote:
> "Grzegorz Adam Hankiewicz" <fake dont.use> wrote in message
> news:pan.2006.01.24.22.18.02.498385 dont.use...
>> I am trying to run this little program:
>>
>>    import std.stdio;
>>    import std.path;
>>
>>    int main()
>>    {
>>        char[] test_string = null;
>>        char[] original = "/home/.resource";
>>        test_string = getBaseName(original);
>>        test_string[2] = 'a';
>>        writefln("is %s like %s?", original, test_string);
>>        return 0;
>>    }
>>
>> But I get a core dump. gdb points at the line where getBaseName is
>> being called.
>
> String literals are read-only. std.path.getBaseName() is returning a slice of
> its argument, which points into read-only data. The seg fault comes from
> attempting to write into that read-only data.
>
> The COW (copy-on-write) fix to your code would be:
>
>     test_string = getBaseName(original).dup;

const might've told us this. =D

Jan 25 2006

Sean Kelly <sean f4.ca> writes:

James Dunne wrote:
> Walter Bright wrote:
>> "Grzegorz Adam Hankiewicz" <fake dont.use> wrote in message
>> news:pan.2006.01.24.22.18.02.498385 dont.use...
>>> I am trying to run this little program:
>>>
>>>    import std.stdio;
>>>    import std.path;
>>>
>>>    int main()
>>>    {
>>>        char[] test_string = null;
>>>        char[] original = "/home/.resource";
>>>        test_string = getBaseName(original);
>>>        test_string[2] = 'a';
>>>        writefln("is %s like %s?", original, test_string);
>>>        return 0;
>>>    }
>>>
>>> But I get a core dump. gdb points at the line where getBaseName is
>>> being called.
>>
>> String literals are read-only. std.path.getBaseName() is returning a
>> slice of its argument, which points into read-only data. The seg
>> fault comes from attempting to write into that read-only data.
>>
>> The COW (copy-on-write) fix to your code would be:
>>
>>     test_string = getBaseName(original).dup;
>
> const might've told us this. =D

The irritating thing is that the string literal is merely used for 
initialization in the above case.  This almost has me wishing such cases 
would always cause an allocation/memcpy instead of referencing the 
original string.  Perhaps this could be a rule when non-const arrays are 
initialized with const data?  What happens if a static initializer is 
used for an int[] array and then someone attempts an in-place modification?


Sean

Jan 25 2006

Sean Kelly <sean f4.ca> writes:

Sean Kelly wrote:
> James Dunne wrote:
>> Walter Bright wrote:
>>> "Grzegorz Adam Hankiewicz" <fake dont.use> wrote in message
>>> news:pan.2006.01.24.22.18.02.498385 dont.use...
>>>> I am trying to run this little program:
>>>>
>>>>    import std.stdio;
>>>>    import std.path;
>>>>
>>>>    int main()
>>>>    {
>>>>        char[] test_string = null;
>>>>        char[] original = "/home/.resource";
>>>>        test_string = getBaseName(original);
>>>>        test_string[2] = 'a';
>>>>        writefln("is %s like %s?", original, test_string);
>>>>        return 0;
>>>>    }
>>>>
>>>> But I get a core dump. gdb points at the line where getBaseName is
>>>> being called.
>>>
>>> String literals are read-only. std.path.getBaseName() is returning a
>>> slice of its argument, which points into read-only data. The seg
>>> fault comes from attempting to write into that read-only data.
>>>
>>> The COW (copy-on-write) fix to your code would be:
>>>
>>>     test_string = getBaseName(original).dup;
>>
>> const might've told us this. =D
>
> The irritating thing is that the string literal is merely used for
> initialization in the above case.  This almost has me wishing such cases
> would always cause an allocation/memcpy instead of referencing the
> original string.  Perhaps this could be a rule when non-const arrays are
> initialized with const data?  What happens if a static initializer is
> used for an int[] array and then someone attempts an in-place modification?

Alternately, perhaps it should be a popular D idiom to do the following:

char[] original = "/home/.resource".dup;

This would allow for efficiency when it is desired (and eliminate the 
need for a language change), but should dramatically reduce the chance 
of such errors.


Sean

Jan 25 2006

"Kris" <fu bar.com> writes:

"Sean Kelly" <sean f4.ca> wrote ...
> The irritating thing is that the string literal is merely used for
> initialization in the above case.  This almost has me wishing such cases
> would always cause an allocation/memcpy instead of referencing the
> original string.  Perhaps this could be a rule when non-const arrays are
> initialized with const data?  What happens if a static initializer is
> used for an int[] array and then someone attempts an in-place
> modification?
>
> Alternately, perhaps it should be a popular D idiom to do the following:


Alternatively, the compiler should support the notion that /some/ data is 
actually read-only; and report it as such. That would solve many problems.

CoW may very well look OK on paper ~ yet in my experience, when applying it 
to anything but trivialities, it's actually full of hollow promise. Reality 
rarely follows academic theory.

The true problem here is not convention per se. Instead it is the lack of 
compiler enforcement with respect to one convention or another. It's easy to 
say "Oh, one should follow the gentleman's agreement of copy upon write" ~ 
that's just cheap talk. It would be quite another thing if the compiler 
would enforce this. I rather suspect such enforcement would be more 
difficult than providing a limited, language-supported, read-only attribute.

Jan 25 2006

Sean Kelly <sean f4.ca> writes:

Kris wrote:
> "Sean Kelly" <sean f4.ca> wrote ...
>> The irritating thing is that the string literal is merely used for
>> initialization in the above case.  This almost has me wishing such cases
>> would always cause an allocation/memcpy instead of referencing the
>> original string.  Perhaps this could be a rule when non-const arrays are
>> initialized with const data?  What happens if a static initializer is
>> used for an int[] array and then someone attempts an in-place
>> modification?
>>
>> Alternately, perhaps it should be a popular D idiom to do the following:
>
> Alternatively, the compiler should support the notion that /some/ data is
> actually read-only; and report it as such. That would solve many problems.

Agreed :-)  And now that I think about it, the compiler should be able 
to detect such problems, as it does not seem terribly difficult to 
determine whether a write is being performed on something in the const 
data area vs. somewhere else.

> The true problem here is not convention per se. Instead it is the lack of
> compiler enforcement with respect to one convention or another. It's easy to
> say "Oh, one should follow the gentleman's agreement of copy upon write" ~
> that's just cheap talk. It would be quite another thing if the compiler
> would enforce this. I rather suspect such enforcement would be more
> difficult than providing a limited, language-supported, read-only attribute.

See above.  I think such a flag may not actually be necessary in this 
case, simply because code generation for const data tends to be somewhat 
distinct.  Perhaps some late stage analysis could be performed to detect 
this problem?  I'm kind of guessing here, but in the small amount of 
compiler work I've done in the past I think this would have been fairly 
simple to implement.


Sean

Jan 25 2006

Sean Kelly <sean f4.ca> writes:

Sean Kelly wrote:
> See above.  I think such a flag may not actually be necessary in this
> case, simply because code generation for const data tends to be somewhat
> distinct.  Perhaps some late stage analysis could be performed to detect
> this problem?  I'm kind of guessing here, but in the small amount of
> compiler work I've done in the past I think this would have been fairly
> simple to implement.

I take it back :-P.  Passing through an opaque function call as in the 
original example tosses the possibility of code analysis out the window. 
  But some detection might be better than none in this case.  Also, it 
would be nice if the system reported a meaningful error message if this 
occurs--perhaps something indicating that the segfault occurred from an 
attempted write to const data?  But once you're stuck with runtime 
detection, I don't really care if the problem is first noticed by a 
software flag or a hardware fault.  In fact, loading a core dump makes 
reproducing the problem fairly simple in most cases.


Sean

Jan 25 2006

Sean Kelly <sean f4.ca> writes:

Okay, I've given this some thought and perhaps the best approach would 
be to reconsider bounds checking under the looser category of "data 
access checking."  Bounds checking would be a required minimum and 
anything beyond that would be left as a QOI issue for the compiler 
developers.  Adding "write to static data" checking should be a trivial 
modification of the existing bounds checking code.  If you assume the 
existing bounds checking code is this:

// assume p is a pointer to the write
// location and a is the array object
if( p < &a[0] || p >= &a[$] ) {
     onArrayBoundsError( __FILE__, __LINE__ );
}

Then it would simply be a matter of adding two new constant variables to 
store the top and bottom of the static area (or determining the 
locations dynamically as in the current DMD GC code) and adding an 
additional check:

// assume sb is a pointer to the base of the const data area
// and st is a pointer to one past the top of that area
if( p >= sb && p < st ) {
     onInvalidWriteError( __FILE__, __LINE__ );
}

This eliminates the need for per-variable flag maintenance and offers an 
easy way to turn off the checking if it is not desired.  And since this 
is conceptually (and functionally) quite similar to bounds checking 
anyway, it should be a fairly painless extension of established practice.
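
The two checks can be combined into a single library-level sketch. This is only an illustration of the idea in current D2, not how DMD implements bounds checking: ReadOnlyRegion and checkedWrite are hypothetical names, the region limits are supplied by the caller rather than discovered by the runtime, and plain Exception stands in for the onArrayBoundsError and onInvalidWriteError handlers.

```d
// Hypothetical descriptor for a range of memory that must not be written.
struct ReadOnlyRegion
{
    const(void)* base; // sb: first byte of the region
    const(void)* top;  // st: one past the last byte of the region

    // True when p points inside [base, top).
    bool contains(const(void)* p) const
    {
        return p >= base && p < top;
    }
}

// Hypothetical checked store: performs the ordinary bounds check first,
// then refuses the write if the target lies in the read-only region.
void checkedWrite(char[] a, size_t i, char value, ReadOnlyRegion ro)
{
    if (i >= a.length)
        throw new Exception("array index out of bounds");
    if (ro.contains(&a[i]))
        throw new Exception("write to read-only data");
    a[i] = value;
}
```

Because both tests live in one place, compiling them out together (as release builds do for bounds checks) would remove the extra cost in the same way.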


Sean

Jan 25 2006

Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:

Sean Kelly wrote:
> Okay, I've given this some thought and perhaps the best approach would
> be to reconsider bounds checking under the looser category of "data
> access checking."  Bounds checking would be a required minimum and
> anything beyond that would be left as a QOI issue for the compiler
> developers.  Adding "write to static data" checking should be a trivial
> modification of the existing bounds checking code.  If you assume the
> existing bounds checking code is this:
>
> // assume p is a pointer to the write
> // location and a is the array object
> if( p < &a[0] || p >= &a[$] ) {
>     onArrayBoundsError( __FILE__, __LINE__ );
> }
>
> Then it would simply be a matter of adding two new constant variables to
> store the top and bottom of the static area (or determining the
> locations dynamically as in the current DMD GC code) and adding an
> additional check:
>
> // assume sb is a pointer to the base of the const data area
> // and st is a pointer to one past the top of that area
> if( p >= sb && p < st ) {
>     onInvalidWriteError( __FILE__, __LINE__ );
> }
>
> This eliminates the need for per-variable flag maintenance and offers an
> easy way to turn off the checking if it is not desired.  And since this
> is conceptually (and functionally) quite similar to bounds checking
> anyway, it should be a fairly painless extension of established practice.

IMHO, this is a very good idea!  Assuming that it is part of bounds 
checking, and thus it would disappear on release builds, then this would 
be a very good thing to do on debug builds.

Jan 26 2006

"Ameer Armaly" <ameer_armaly hotmail.com> writes:

"Russ Lewis" <spamhole-2001-07-16 deming-os.org> wrote in message 
news:drbdai$20ns$1 digitaldaemon.com...
> Sean Kelly wrote:
>> Okay, I've given this some thought and perhaps the best approach would be
>> to reconsider bounds checking under the looser category of "data access
>> checking."  Bounds checking would be a required minimum and anything
>> beyond that would be left as a QOI issue for the compiler developers.
>> Adding "write to static data" checking should be a trivial modification
>> of the existing bounds checking code.  If you assume the existing bounds
>> checking code is this:
>>
>> // assume p is a pointer to the write
>> // location and a is the array object
>> if( p < &a[0] || p >= &a[$] ) {
>>     onArrayBoundsError( __FILE__, __LINE__ );
>> }
>>
>> Then it would simply be a matter of adding two new constant variables to
>> store the top and bottom of the static area (or determining the locations
>> dynamically as in the current DMD GC code) and adding an additional
>> check:
>>
>> // assume sb is a pointer to the base of the const data area
>> // and st is a pointer to one past the top of that area
>> if( p >= sb && p < st ) {
>>     onInvalidWriteError( __FILE__, __LINE__ );
>> }
>>
>> This eliminates the need for per-variable flag maintenance and offers an
>> easy way to turn off the checking if it is not desired.  And since this
>> is conceptually (and functionally) quite similar to bounds checking
>> anyway, it should be a fairly painless extension of established practice.
>
> IMHO, this is a very good idea!  Assuming that it is part of bounds
> checking, and thus it would disappear on release builds, then this would
> be a very good thing to do on debug builds.

I agree; integrating this with bounds checks would be real nice.

Jan 26 2006

"Walter Bright" <newshound digitalmars.com> writes:

"Sean Kelly" <sean f4.ca> wrote in message 
news:dr9642$rh$1 digitaldaemon.com...
> Sean Kelly wrote:
>> See above.  I think such a flag may not actually be necessary in this
>> case, simply because code generation for const data tends to be somewhat
>> distinct.  Perhaps some late stage analysis could be performed to detect
>> this problem?  I'm kind of guessing here, but in the small amount of
>> compiler work I've done in the past I think this would have been fairly
>> simple to implement.
>
> I take it back :-P.  Passing through an opaque function call as in the
> original example tosses the possibility of code analysis out the window.
> But some detection might be better than none in this case.

It is getting some detection - a seg fault. The whole reason for putting 
const data into a read-only segment is to get hardware detection and 
enforcement.

> Also, it would be nice if the system reported a meaningful error message
> if this occurs--perhaps something indicating that the segfault occurred
> from an attempted write to const data?

You should get such an indication if you're running it under a decent 
debugger.

> But once you're stuck with runtime detection, I don't really care if the
> problem is first noticed by a software flag or a hardware fault.  In fact,
> loading a core dump makes reproducing the problem fairly simple in most
> cases.

Seg faults are simply the hardware doing the checking for you rather than 
having to do it by adding instructions. Along with a good debugger, it's 
pretty good, and has the nice characteristic that it doesn't bloat the code 
or slow the execution.

Jan 25 2006

Sean Kelly <sean f4.ca> writes:

Walter Bright wrote:
> "Sean Kelly" <sean f4.ca> wrote in message
> news:dr9642$rh$1 digitaldaemon.com...
>> I take it back :-P.  Passing through an opaque function call as in the
>> original example tosses the possibility of code analysis out the window.
>> But some detection might be better than none in this case.
>
> It is getting some detection - a seg fault. The whole reason for putting
> const data into a read-only segment is to get hardware detection and
> enforcement.

Is there any way to trap such a write attempt in Windows?  For example, 
this code:

import std.c.stdio;

const char[] c = "hello";

void main()
{
     c[1] = 'a';
     printf( "%.*s\n", c );
}

runs to completion in Windows and prints "hello" (ie. the assignment is 
effectively ignored).  Removing the 'const' prints "hallo" as expected. 
  But while this is better than having the const data altered by a 
write, it also doesn't make bugs known.  All in all, I do really prefer 
to rely on the hardware to signal this, but if that's not possible I 
still want to have *some* indication that such a write was 
attempted--this was one reason I suggested extending bounds checking. 
Is it simply that Windows doesn't have a trap set up for this situation?

> Seg faults are simply the hardware doing the checking for you rather than
> having to do it by adding instructions. Along with a good debugger, it's
> pretty good, and has the nice characteristic that it doesn't bloat the code
> or slow the execution.

Agreed.  And every debugger I've used can halt on such errors to allow 
the problem to be debugged.  But I'm not sure whether a debugger would 
catch the above situation in Windows (I'll admit I've never tried it).


Sean

Jan 25 2006

"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:

"Sean Kelly" <sean f4.ca> wrote in message 
news:dr9teh$hmt$1 digitaldaemon.com...
>> Seg faults are simply the hardware doing the checking for you rather
>> than having to do it by adding instructions. Along with a good debugger,
>> it's pretty good, and has the nice characteristic that it doesn't bloat
>> the code or slow the execution.
>
> Agreed.  And every debugger I've used can halt on such errors to allow the
> problem to be debugged.  But I'm not sure whether a debugger would catch
> the above situation in Windows (I'll admit I've never tried it).

Using Visual Studio 6 (which uses WinDbg), and converting the given code 
into a WinMain() function, the debugger does indeed catch the access 
violation, but most of the time it breaks at.. some disassembly in the 
middle of NTDLL.  Which is useless.  And the call stack in that case doesn't 
help either - WinDbg doesn't really seem to like the D calling convention, 
so it somehow just hides calls to D functions, making the call stack 
something like

NTDLL
WinMain
NTKERNEL

When there are supposed to be calls to any number of D functions between 
NTDLL and WinMain.

In fact, I've tried several different scenarios, and have yet to get VS6 to 
break to the line of the access violation.  It always breaks to the middle 
of NTDLL.

Jan 26 2006

"Derek Parnell" <derek psych.ward> writes:

On Thu, 26 Jan 2006 12:45:20 +1100, Walter Bright  
<newshound digitalmars.com> wrote:

> "Sean Kelly" <sean f4.ca> wrote in message
> news:dr9642$rh$1 digitaldaemon.com...
>> Sean Kelly wrote:
>>> See above.  I think such a flag may not actually be necessary in this
>>> case, simply because code generation for const data tends to be
>>> somewhat distinct.  Perhaps some late stage analysis could be
>>> performed to detect this problem?  I'm kind of guessing here, but in
>>> the small amount of compiler work I've done in the past I think this
>>> would have been fairly simple to implement.
>>
>> I take it back :-P.  Passing through an opaque function call as in the
>> original example tosses the possibility of code analysis out the window.
>> But some detection might be better than none in this case.
>
> It is getting some detection - a seg fault. The whole reason for putting
> const data into a read-only segment is to get hardware detection and
> enforcement.

Would it be possible to detect this at compile time rather than run time?

-- 
Derek Parnell
Melbourne, Australia

Jan 29 2006

Grzegorz Adam Hankiewicz <fake dont.use> writes:

The Wed, 25 Jan 2006 11:48:30 -0800, Walter Bright wrote:
> String literals are read-only. [...] The seg fault comes from
> attempting to write into that read-only data.

Why does D allow assignment of read only data to a read/write
variable (without an explicit cast)?  Why is the following code
allowed to compile with no warning and crash at runtime?

    int main()
    {
        const char[] test = "this is a test";
        test[2] = 'b';
        return 0;
    }

Why does the following not crash, and instead yield "String is thbs is a"?

    import std.stdio;

    int main()
    {
        const char[10] test = "this is a ";
        test[2] = 'b';
        writefln("String is %s", test);
        return 0;
    }
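
For comparison, a sketch of how a current D2 compiler treats the same two patterns. This reflects the later language, not the compiler used in this thread: string literals there are typed immutable(char)[], so both the silent conversion and the write through const are rejected at compile time. mutableCopy is a hypothetical helper for illustration.

```d
// Each static assert passes when the enclosed snippet fails to compile.

// Writing to an element of a const array is rejected.
static assert(!__traits(compiles, {
    const char[] test = "this is a test";
    test[2] = 'b';
}));

// A string literal does not implicitly convert to mutable char[].
static assert(!__traits(compiles, {
    char[] original = "/home/.resource";
}));

// Hypothetical helper: an explicit copy is accepted and safe to modify.
char[] mutableCopy(string s)
{
    return s.dup;
}
```

Modifying element 2 of mutableCopy("this is a test") gives "thbs is a test" and leaves the literal untouched.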

Jan 28 2006
