www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 2093] New: string concatenation modifies original

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2093

           Summary: string concatenation modifies original
           Product: D
           Version: 2.014
          Platform: PC
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: bugzilla digitalmars.com
        ReportedBy: bartosz relisoft.com


I will attach source code for this example. It's an XML parser. It should
produce the following output:
c:\D\Work>xml
root
    child
     color=red

         Text=foo bar baz
Instead it produces this:
c:\D\Work>xml
root
    rootd
     rootd=red

         Text=rootdar baz
The problem is that strings are modified after being copied, when the original
is concatenated upon. The problem goes away if I idup strings:
  _name = name.idup;
  _value = value.idup;
or when I replace 
  a ~= b;
with
  a = a ~ b;


-- 
May 10 2008
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2093






Created an attachment (id=256)
 --> (http://d.puremagic.com/issues/attachment.cgi?id=256&action=view)
Test case


-- 
May 10 2008
prev sibling next sibling parent "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
<d-bugmail puremagic.com> wrote in message 
news:bug-2093-3 http.d.puremagic.com/issues/...

 or when I replace
  a ~= b;
 with
  a = a ~ b;
~ always creates a copy, but ~= will attempt to expand the array in-place. Now, if this is D2, and ~= is expanding an invariant(char)[] in-place, then _that_ is definitely an issue.
May 10 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2093


smjg iname.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |smjg iname.com





Welcome to the world of bug reporting.

The way to report a bug isn't to attach a 695-line program that contains some
functionality somewhere that exhibits the problem.

The correct manner is to post a small example that illustrates the problem,
typically either by writing a test program from scratch or by simplifying
little by little the program in which you found it.

If done well, the result will be small enough to post straight into the bug
report rather than attaching it.  DMD's code coverage analysis is a useful tool
for identifying unused parts of a program in order to cut them out, among other
things.


-- 
Nov 21 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2093


smjg iname.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |wrong-code





I think I've finally managed to figure out what was going on.

----------
import std.stdio;

void main() {
    string s1, s2;

    s1 ~= "hello";
    s2 = s1;

    writefln(s1);
    writefln(s2);

    s1.length = 0;
    s1 ~= "Hi";

    writefln(s1);
    writefln(s2);
}
----------
hello
hello
Hi
Hillo
----------

This is the kind of testcase we like here.  Walter is more likely to fix a bug
if you make life easier for him by supplying something on which the cause can
easily be seen.


-- 
Nov 21 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2093






This is a known bug and is a major array design flow. Arrays has no determined
owner (the only one who can grow without a reallocation if capacity permits):

import std.stdio;

void main()
{
    char[] s1, s2;
    s1.length = 100; // reserve the capacity
    s1.length = 0;

    s2 = s1; // both are pointing to an empty string with the capacity of 100

    s1 ~= "Hello"; // array is not reallocated, it is grown in-place
    writefln(s1);
    writefln(s2); // prints empty string. s2 still points to the same string
(which is now "Hello") and carries length of 0

    s2 ~= "Hi"; // overwrites s1
    writefln(s2); // "Hi"
    writefln(s1); // "Hillo"
}

s1 is the array owner and s2 is a slice (even though it really points to the
entire array), i.e. it should reallocate and take the ownership of the
reallocated array on append, but it doesn't happen.

Currently an 'owner' is anyone who has a pointer to array's beginning:

char[] s = "hello".dup;
char[] s1 = s[0..4];
s1 ~= "!";
assert(s != s1); // fails, both are "hell!", s is overwritten

s = "_hello".dup;
char[] s2 = s[1..5];
s2 ~= "!";
assert(s != s1); // succeeds, s1 is not changed


-- 
Nov 21 2008
prev sibling next sibling parent reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2093







I thought 'string' types were immutable and thus ...

  s1.length = 0;

should fail as it updates the string (trucates it to zero characters).


-- 
Nov 22 2008
parent "Denis Koroskin" <2korden gmail.com> writes:
22.11.08 в 12:58  в своём письме писал(а):

 http://d.puremagic.com/issues/show_bug.cgi?id=2093







 I thought 'string' types were immutable and thus ...

   s1.length = 0;

 should fail as it updates the string (trucates it to zero characters).
No, string is a mutable array of immutable chars: string == const(char)[]
Nov 22 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2093






No, string is aliased to invariant(char)[], i.e. an array of invariant
characters. You can change its length (usually, decreasing) but not contents.


-- 
Nov 22 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2093







 Currently an 'owner' is anyone who has a pointer to array's beginning:
 
 char[] s = "hello".dup;
 char[] s1 = s[0..4];
 s1 ~= "!";
 assert(s != s1); // fails, both are "hell!", s is overwritten
A simple char[] is fully mutable, so that doesn't violate any established rule, but whether it's desirable is another matter. With const(char)[] or invariant(char)[], obviously this isn't going to work, so ~= should always reallocate (unless the optimiser can be sure that no other reference to the data can possibly exist). Alternatively, the GC could maintain a note of the actual length of every heap-allocated array. Ownership would be determined by matching in both start pointer and length. When the length is increased, whether by .length or ~=, either update this actual length (if it's the owner that we're extending, IWC all other references to the same data lose ownership) or reallocate the array. --
Nov 22 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2093


schveiguy yahoo.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX





Note that this behavior is defined in the spec.  See
http://www.digitalmars.com/d/2.0/arrays.html#resize

"To maximize efficiency, the runtime always tries to resize the array in place
to avoid extra copying. It will always do a copy if the new size is larger and
the array was not allocated via the new operator or a previous resize
operation.

This means that if there is an array slice immediately following the array
being resized, the resized array could overlap the slice"

The fact that it violates invariantness is a side effect that Walter has not
yet dealt with.  There have been proposals to fix this, two of which I have
proposed:

1. As you said, store the requested length along with the block length in the
GC.  Only appending to an array that ends at the end of the allocated memory
will realloc in place.  Original proposal:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=63146

Note nobody responded to this one

2. Store the length of the allocated array in the first element of the array. 
Then modify the meaning of the length member of the array struct to flag
whether it is pointing to the beginning of the array or not.  Original
proposal:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=77437

Some people had questions, but nobody proved the proposal wouldn't work.

I don't think Walter is interested in fixing this issue, as it has been a
'feature' for a while, and he never has responded positively to any decent
proposals to fix this.


-- 
Nov 22 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2093


smjg iname.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|WONTFIX                     |





Steven, you have no authority to mark this WONTFIX.


-- 
Nov 22 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2093


schveiguy yahoo.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |schveiguy yahoo.com
           Severity|normal                      |enhancement
           Keywords|wrong-code                  |





Sorry, I was thinking wontfix because the compiler functions as designed.  I
marked it as an enhancment instead.  Removing wrong-code keyword also, as this
is intended behavior.


-- 
Nov 22 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2093


smjg iname.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|enhancement                 |normal
           Keywords|                            |accepts-invalid, spec





So the problem is that it _always_ leaves the decision to resize in place or
reallocate to the runtime.  The only way in which this can coexist with the
principle of invariant is that it is illegal to increase the length of a
const/invariant array.  Therefore, going by the current spec, the bug is that
DMD accepts the code.


-- 
Nov 22 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2093






It seems to me then that this is a design choice  - does the string length
belong to the string or to the reference? For slices it must be the reference
but for arrays? hmmm... Curently in D, a dynamic array and a slice are
indistinguishable and I'm not so sure that should be the case. There are good
arguments for the current design and also for the separation of slices and
dynamic arrays.

Common sense seems to say that if I change the length of a string that
therefore every other reference to the same string should also honour the new
length, and that this should also have no effect on previously captured slices
of the string. 


-- 
Nov 22 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2093







 It seems to me then that this is a design choice  - does the string length
 belong to the string or to the reference? For slices it must be the reference
 but for arrays? hmmm... Curently in D, a dynamic array and a slice are
 indistinguishable and I'm not so sure that should be the case. There are good
 arguments for the current design and also for the separation of slices and
 dynamic arrays.

 Common sense seems to say that if I change the length of a string that
 therefore every other reference to the same string should also honour the new
 length, and that this should also have no effect on previously captured slices
 of the string. 
Arrays should not be typed differently than slices IMO, they should be able to be passed to the same functions. I think one of the two solutions I proposed would place the 'allocated length' of an array on the heap with the array data, thereby having the length stored in a shared location. Slices should respect this length, and if they cannot see the length, they should be reallocated as a full-blown array. --
Nov 22 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2093






see also bug 2095 comment 6


-- 
Feb 19 2009
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2093


Steven Schveighoffer <schveiguy yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED



19:55:38 PDT ---
This is fixed by the patch in bug 3637.  It is in dmd 2.041.

Compiling the attached file results in the desired output.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 16 2010