www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 2418] New: Same-value string (char[]) literals get overwritten (unlike array literals)

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2418

           Summary: Same-value string (char[]) literals get overwritten
                    (unlike array literals)
           Product: D
           Version: 1.035
          Platform: PC
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: bugzilla digitalmars.com
        ReportedBy: business3 twistedpairgaming.com


When a string is initialized with a string literal, and part of the string is
then changed, all other string literals with the same value reflect the same
change.

For the following code:

------------------
module test;
import tango.io.Stdout;

class FooChar
{
        char[] str;

        this(char[] _str)
        {
                this.str = _str;
        }
}

class FooInt
{
        int[] ints;

        this(int[] _ints)
        {
                this.ints = _ints;
        }
}

void main()
{
        FooChar a;
        FooChar b;
        FooChar c;
        FooChar d;

        a = new FooChar("AAA");
        Stdout.formatln("a.str: {}", a.str);
        a.str[0] = '!';
        Stdout.formatln("a.str: {}", a.str);
        a = new FooChar("AAA");
        Stdout.formatln("a.str: {}", a.str);
        Stdout.formatln("");

        b = new FooChar("AAA");
        Stdout.formatln("b.str: {}", b.str);
        b.str[0] = '!';
        Stdout.formatln("b.str: {}", b.str);
        b = new FooChar("AAA");
        Stdout.formatln("b.str: {}", b.str);
        Stdout.formatln("");

        c = new FooChar("AA1");
        Stdout.formatln("c.str: {}", c.str);
        c.str[0] = '!';
        Stdout.formatln("c.str: {}", c.str);
        c = new FooChar("AA1");
        Stdout.formatln("c.str: {}", c.str);
        Stdout.formatln("");

        const char[] dInit = "AA2";
        d = new FooChar(dInit);
        Stdout.formatln("d.str: {}", d.str);
        d.str[0] = '!';
        Stdout.formatln("d.str: {}", d.str);
        d = new FooChar(dInit);
        Stdout.formatln("d.str: {}", d.str);
        Stdout.formatln("");

        FooInt i;

        i = new FooInt([1, 2]);
        Stdout.formatln("i.ints: {}", i.ints);
        i.ints[0] = 77;
        Stdout.formatln("i.ints: {}", i.ints);
        i = new FooInt([1, 2]);
        Stdout.formatln("i.ints: {}", i.ints);
}
------------------

Expected output:
a.str: AAA
a.str: !AA
a.str: AAA

b.str: AAA
b.str: !AA
b.str: AAA

c.str: AA1
c.str: !A1
c.str: AA1

d.str: AA2
d.str: !A2
d.str: AA2

i.ints: [1, 2]
i.ints: [77, 2]
i.ints: [1, 2]

Actual output:
a.str: AAA
a.str: !AA
a.str: !AA

b.str: !AA
b.str: !AA
b.str: !AA

c.str: AA1
c.str: !A1
c.str: !A1

d.str: AA2
d.str: !A2
d.str: !A2

i.ints: [1, 2]
i.ints: [77, 2]
i.ints: [1, 2]

This might be a side-effect of the fix for #817:
http://d.puremagic.com/issues/show_bug.cgi?id=817


-- 
Oct 14 2008
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2418





------- Comment #1 from business3 twistedpairgaming.com  2008-10-14 17:59
-------
Sorry, I guess those classes are not needed:

---------------
module test;
import tango.io.Stdout;

void main()
{
        char[] a;
        char[] b;
        char[] c;
        char[] d;

        a = "AAA";
        Stdout.formatln("a: {}", a);
        a[0] = '!';
        Stdout.formatln("a: {}", a);
        a = "AAA";
        Stdout.formatln("a: {}", a);
        Stdout.formatln("");

        b = "AAA";
        Stdout.formatln("b: {}", b);
        b[0] = '!';
        Stdout.formatln("b: {}", b);
        b = "AAA";
        Stdout.formatln("b: {}", b);
        Stdout.formatln("");

        c = "AA1";
        Stdout.formatln("c: {}", c);
        c[0] = '!';
        Stdout.formatln("c: {}", c);
        c = "AA1";
        Stdout.formatln("c: {}", c);
        Stdout.formatln("");

        const char[] dInit = "AA2";
        d = dInit;
        Stdout.formatln("d: {}", d);
        d[0] = '!';
        Stdout.formatln("d: {}", d);
        d = dInit;
        Stdout.formatln("d: {}", d);
        Stdout.formatln("");

        int[] i;

        i = [1, 2];
        Stdout.formatln("i: {}", i);
        i[0] = 77;
        Stdout.formatln("i: {}", i);
        i = [1, 2];
        Stdout.formatln("i: {}", i);
}
---------------

Expected output:
a: AAA
a: !AA
a: AAA

b: AAA
b: !AA
b: AAA

c: AA1
c: !A1
c: AA1

d: AA2
d: !A2
d: AA2

i: [1, 2]
i: [77, 2]
i: [1, 2]

Actual output:
a: AAA
a: !AA
a: !AA

b: !AA
b: !AA
b: !AA

c: AA1
c: !A1
c: !A1

d: AA2
d: !A2
d: !A2

i: [1, 2]
i: [77, 2]
i: [1, 2]


-- 
Oct 14 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2418


2korden gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID




------- Comment #2 from 2korden gmail.com  2008-10-14 18:01 -------
The bug is at your site. You don't make string copies and work on a single
string.

char[] a = "AAA";
char[] b = a; // both point to the same location
a[0] = '!';
assert(b[0] == '!'); // should be true since they both point to the same
location

Same example in C (this may help understand better):
char* str = "AAA";
char* ptr = str;
str[0] = '!';
assert(ptr[0] == '!');

The actual bug is that the following line should not compile:
char[] a = "AAA";

Problem is, D1 doesn't have invariant type modifier and the spec is frozen, so
this will never be fixed. D2, however, doesn't allow you to do this.

Solution would be to make copies explicitly:
a = new FooChar("AAA".dup);


-- 
Oct 14 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2418


business3 twistedpairgaming.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |




------- Comment #3 from business3 twistedpairgaming.com  2008-10-14 20:34
-------
(In reply to comment #2)
 The bug is at your site. You don't make string copies and work on a single
 string.
 
 char[] a = "AAA";
 char[] b = a; // both point to the same location
 a[0] = '!';
 assert(b[0] == '!'); // should be true since they both point to the same
 location
 
 Same example in C (this may help understand better):
 char* str = "AAA";
 char* ptr = str;
 str[0] = '!';
 assert(ptr[0] == '!');
 

I don't think that's applicable (or at least, it shouldn't be). Here's a shortened example: char[] a = "AAA"; a[0] = '!'; char[] b = "AAA"; It's absurd that 'b' should become "!AA". It's clear that the compiler is "optimizing" both "AAA" string literals into the same reference: char[] _tmp1 = "AAA"; char[] a = _tmp1; a[0] = '!'; char[] b = _tmp1; But I'm saying that transformation should not be occurring. The two "AAA" literals should not share the same reference. In D2, array literals are immutable, so yes, in D2 it works out fine. But in D1, array literals are *not* immutable, so the compiler shouldn't assume that array literals with a equal *value* can safely share the same *reference*.
 The actual bug is that the following line should not compile:
 char[] a = "AAA";

Again, this is for D1, so the idea of strings being immutable doesn't apply. Besides, at the very least there's still an inconsistency: char[] a = "AAA"; a[0] = '!'; char[] b = "AAA"; // b is "!AA" char[] c = ['A', 'A', 'A']; c[0] = '!'; char[] d = ['A', 'A', 'A']; // d is "AAA" The above shows that two occurrences of "AAA" share the same reference, but two occurrences of ['A', 'A', 'A'] have separate references. The former leads to odd side-effects, the latter does not. --
Oct 14 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2418


2korden gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |INVALID




------- Comment #4 from 2korden gmail.com  2008-10-15 00:02 -------
 ------- Comment #3 from business3 twistedpairgaming.com  2008-10-14  

 I don't think that's applicable (or at least, it shouldn't be). Here's a
 shortened example:

 char[] a = "AAA";
 a[0] = '!';
 char[] b = "AAA";

 It's absurd that 'b' should become "!AA".

 It's clear that the compiler is
 "optimizing" both "AAA" string literals into the same reference:

 char[] _tmp1 = "AAA";
 char[] a = _tmp1;
 a[0] = '!';
 char[] b = _tmp1;

Of course it does! This string literal is put into executable. "AAA" is just a pointer to it. Please, understand that char[] is nothing more than a pointer to a string and an associated length. When you change the string via pointer, everyone who has that pointer gets the change. Compare to object: class A { int i = 15; } A aaa = new A(); // aaa is an analog of "AAA" A a = aaa; writefln(a); // prints 15 a.i = 42; A b = aaa; writefln(b); // why the hell it prints 42???? This is because you don't make a copy of aaa. Here is another example: while (true) { char[] a = "AAA"; a[0] = '!'; // in the next iteration, should a be "AAA" again? } If answer is yes, then there should be a memory allocation in line 2 under the hood. You can't get new string every time without a memory allocation. This, however, definitely should not occur. If you want to make an allocation - do it yourself explicitely. Code with the behaviour you expect should be as follows: while (true) { char[] a = "AAA".dup; writefln(a); a[0] = '!'; writefln(a); } This is by design, it works as it should. It can't be 'fixed' because it is not broken.
 Besides, at the very least there's still an inconsistency:

 char[] a = "AAA";
 a[0] = '!';
 char[] b = "AAA";  // b is "!AA"

 char[] c = ['A', 'A', 'A'];
 c[0] = '!';
 char[] d = ['A', 'A', 'A'];  // d is "AAA"

I agree, this might be a bug, d really ought to be "!AAA"! Do you want to create a new bugreport so that the latter case would be 'fixed'? :) --
Oct 14 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2418





------- Comment #5 from kamm-removethis incasoftware.de  2008-10-15 02:36
-------
The D spec says "String literals are immutable (read only)." even in D1. So
code like this

char[] a = "ABC";
a[0] = '!';

is illegal (and will segfault on linux!).

Array literals are allocated on the heap (the D spec explicitly says so) and
will thus behave fundamentally different from string literals. 


-- 
Oct 15 2008
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2418





------- Comment #6 from business3 twistedpairgaming.com  2008-10-15 05:13
-------
(In reply to comment #5)
 The D spec says "String literals are immutable (read only)." even in D1. So
 code like this
 
 char[] a = "ABC";
 a[0] = '!';
 
 is illegal (and will segfault on linux!).
 
 Array literals are allocated on the heap (the D spec explicitly says so) and
 will thus behave fundamentally different from string literals. 
 

Alright, I see now. It does seem odd though that array literals would carry a type that lacks any sort of const/read-only/immutable/etc qualifier, but is still immutable anyway. Just another of D2's improvements I guess. Out of curiosity, any idea if the lack of a run-time error on windows is due to windows itself or just something DMD/OPTLINK does differently between the platforms? --
Oct 15 2008