
digitalmars.D.bugs - [Issue 2964] New: Reading string into associative array key garbles string

d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2964

           Summary: Reading string into associative array key garbles
                    string
           Product: D
           Version: 1.043
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: bugzilla digitalmars.com
        ReportedBy: djd mailinator.com


Created an attachment (id=363)
 --> (http://d.puremagic.com/issues/attachment.cgi?id=363)
.tar.gz file with D1 code illustrating bug and one-line sample input text file

Either I'm doing something dumb, or I've found a bug where a string gets
trashed between storing it as key in an associative array and then getting it
back out.

The weird thing is it only happens when the string is read in from a file. 
Adding the same string as a literal doesn't trigger it.  

The attached D1 code simply reads in each line from a BufferedFile, storing it
as a key in a uint[string] AA that counts how many times each line occurred.  It
verifies that the line is valid UTF-8 going in.  It then loops over the keys in
the AA, verifying that they're valid UTF-8 and printing them out.  Except that
at that point the string fails validation and gives an error if you try to
print it out.  I don't think there's anything special about the particular
string that I'm using.

I verified this with three compilers on two operating systems:
DMD 1.043 on Ubuntu 8.10 x86_64
gcc version 4.1.3 20070831 (prerelease gdc 0.25, using dmd 1.021) (Ubuntu
0.25-4.1.2-16ubuntu1)
gdcmac trunk r229 (based on gcc 4.0.1) on Mac OS X 10.5.5 x86_64 

Here is some sample output:

Reading data...
Matched bad input.
Read 1 lines, 1 unique (0 non-UTF).
Checking...
2nd validate: string
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\80\245\34\158\255\127\0\0\144\180\123\1\0\0\0\0\112\243\34\158\255\127
didn't validate as UTF
Error: 4invalid UTF-8 sequence

The Unicode string printed out (as decimal chars) varies each time under Linux,
perhaps suggesting it's reading some memory it oughtn't?

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
May 11 2009
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2964


Frits van Bommel <fvbommel wxs.nl> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID





---

> Either I'm doing something dumb, or I've found a bug where a string gets
> trashed between storing it as key in an associative array and then getting it
> back out.

I'm afraid it's the former. From the InputStream.opApply() documentation at
<http://www.digitalmars.com/d/1.0/phobos/std_stream.html>: "The string passed in
line may be reused between calls to the delegate."

This means you can't keep a line around after the current iteration without
duplicating it, because it'll get overwritten. Changing the last line of your
file-reading loop to "data[line.dup]++;" fixes the problem you're seeing.
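
For illustration, here is a minimal D1 sketch of the aliasing described above. It is not the attached code: the helper name keyAfterReuse and the sample strings are made up for the example, and a plain char[] buffer stands in for the stream's reused line buffer. It assumes a D1 compiler, as in the report.

```d
import std.stdio;

// Hypothetical helper (not from the attachment): stores the buffer's
// contents as an AA key, then overwrites the buffer the way a reused
// line buffer would be overwritten by the next line.
char[] keyAfterReuse(bool copyKey)
{
    char[] buffer = "alpha".dup;   // stands in for the stream's line buffer
    uint[char[]] counts;

    if (copyKey)
        counts[buffer.dup]++;      // key owns its own copy of the characters
    else
        counts[buffer]++;          // key is only a slice into buffer

    buffer[] = "omega";            // the "next line" overwrites buffer in place

    return counts.keys[0];         // the single key now stored in the AA
}

void main()
{
    writefln("without dup: %s", keyAfterReuse(false));
    writefln("with dup:    %s", keyAfterReuse(true));
}
```

Without the .dup the stored key should follow whatever ends up in the buffer, while with the .dup it keeps the text it was inserted with. An associative array stores the slice (pointer and length) it is given, not a copy of the characters.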
May 11 2009
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2964






Sorry; thanks.

May 13 2009