www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Foreach and string to string assignment (maybe a bug..)

reply Andrej Mitrovic <none none.com> writes:
I'm not sure if this is a legitimate bug or one of my brainfarts.

I know that if I'm using a foreach loop with a char array as a reusable
datatype I definitely have to create a copy if I want to store it to a string.

But this code is a little more subtle, check it out (This is Windows only
because it uses the registry, sorry):

module mymodule;

import std.stdio : writeln, write;
import std.windows.registry;

void main()
{
    Key HKLM = Registry.localMachine;
    Key SFW = HKLM.getKey("software");

    string[] names;

    foreach (Key key; SFW.keys())
    {
        string name = key.name();
        // string name = key.name().idup; // workaround for the issue
        names ~= name;
    }

    writeln("results:");
    foreach (name; names)
    {
        write(name, ", ");
    }
}

The results are quite unexpected. The strings get overwritten with each other,
and in my case the results are similar to this:

Sun Microsystems, Sun Micros, Sun , Sun Micr, Sun, Sun Mic,...

And it goes like that for a hundred or so values, then switches to the next
name and writes more garbage like that.

If I use .idup, the problem goes away. What I don't understand is why assigning
a string to a string isn't safe in this case? They're both immutable, so I was
expecting the contents of the strings never to change. 

If it's not a bug, it certainly is a subtle issue. The original foreach loop
was quite big, and it took some time to figure out the problem. Are we *always*
supossed to be using .idup in a foreach loop? Of course, the Key key variable
is reused in the foreach loop, so I guess this has something to do with the
results.
Oct 14 2010
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 14 Oct 2010 13:01:36 -0400, Andrej Mitrovic <none none.com> wrote:

 I'm not sure if this is a legitimate bug or one of my brainfarts.

 I know that if I'm using a foreach loop with a char array as a reusable  
 datatype I definitely have to create a copy if I want to store it to a  
 string.

 But this code is a little more subtle, check it out (This is Windows  
 only because it uses the registry, sorry):

 module mymodule;

 import std.stdio : writeln, write;
 import std.windows.registry;

 void main()
 {
     Key HKLM = Registry.localMachine;
     Key SFW = HKLM.getKey("software");

     string[] names;

     foreach (Key key; SFW.keys())
     {
         string name = key.name();
         // string name = key.name().idup; // workaround for the issue
         names ~= name;
     }

     writeln("results:");
     foreach (name; names)
     {
         write(name, ", ");
     }
 }

 The results are quite unexpected. The strings get overwritten with each  
 other, and in my case the results are similar to this:

 Sun Microsystems, Sun Micros, Sun , Sun Micr, Sun, Sun Mic,...

 And it goes like that for a hundred or so values, then switches to the  
 next name and writes more garbage like that.

 If I use .idup, the problem goes away. What I don't understand is why  
 assigning a string to a string isn't safe in this case? They're both  
 immutable, so I was expecting the contents of the strings never to  
 change.

 If it's not a bug, it certainly is a subtle issue. The original foreach  
 loop was quite big, and it took some time to figure out the problem. Are  
 we *always* supossed to be using .idup in a foreach loop? Of course, the  
 Key key variable is reused in the foreach loop, so I guess this has  
 something to do with the results.

I'm not familiar with std.windows.registry, but there are two things here. First, unequivocally, this is a bug. If key.name is returning an immutable(char)[], and later on that value is being overwritten, this is a violation of the type system (immutable data must never change again). Second, I think the bug may not be that it's violating immutability, but rather that it's typed key.name as string. It could probably be const(char)[]. Essentially, I think from the behavior described that the foreach loop is reusing the name buffer for each iteration of the loop. This is probably to save extra heap allocations in case you don't use them after the foreach loop is over. I think your workaround is the correct way to deal with it. This is a common problem in defining an opApply loop which streams data -- do you make things safe or efficient? The only reasonable solution IMO is to make them efficient, because safety can be had by duping the data. -Steve
Oct 14 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
Steven Schveighoffer:

 This is a common problem in defining an opApply loop which streams data --  
 do you make things safe or efficient?  The only reasonable solution IMO is  
 to make them efficient, because safety can be had by duping the data.

The only reasonable solution is to make it safe, and then allow the unsafe and more efficient behaviour only on explicit request. This follows D philosophy and in practice it's the only design that allows you to write programs that actually work, instead of being just a nest for bugs. Bye, bearophile
Oct 14 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
Steven Schveighoffer:

 How do you request in a foreach loop that it doesn't dup?

There are several ways to do this. You may add a boolean value in the struct/class template, that switches the safe/unsafe behaviour (defaults to safe), or you may let the basic opApply to dup, plus you add another method with a name that warns against its non-copying behaviour that doesn't dup. In my dlibs1 I have used the first solution for the lazy xpermutations/xcombinations.
 Whether the foreach loop should auto-dup for you is another question, I  
 think it's fine to require you to dup each key's name if you plan on using  
 it later, but this should be in the documentation.

D Zen follows another design philosophy, documentation is not enough, people forget things.
 Note also, by "unsafe" we just mean surprising :)  It's still safe in  
 terms of memory safety.

"Safe" has many different meanings. Something that on default acts in a bug-prone way is not "safe". Bye, bearophile
Oct 14 2010
parent bearophile <bearophileHUGS lycos.com> writes:
Jonathan M Davis:

 The "safe" that Walter and Andrei are almost always concerned with is memory 
 safety.

Please help me help Walter understand that for a modern system language memory safety isn't the only safety worth having :-) Bye, bearophile
Oct 14 2010
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
So do I file this as a bug? The current behavior is way too subtle
imo, at least the compiler could issue a warning if this behavior is
intended.

On 10/14/10, bearophile <bearophileHUGS lycos.com> wrote:
 Steven Schveighoffer:

 This is a common problem in defining an opApply loop which streams data --

 do you make things safe or efficient?  The only reasonable solution IMO is

 to make them efficient, because safety can be had by duping the data.

The only reasonable solution is to make it safe, and then allow the unsafe and more efficient behaviour only on explicit request. This follows D philosophy and in practice it's the only design that allows you to write programs that actually work, instead of being just a nest for bugs. Bye, bearophile

Oct 14 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 14 Oct 2010 13:49:13 -0400, bearophile <bearophileHUGS lycos.com>  
wrote:

 Steven Schveighoffer:

 This is a common problem in defining an opApply loop which streams data  
 --
 do you make things safe or efficient?  The only reasonable solution IMO  
 is
 to make them efficient, because safety can be had by duping the data.

The only reasonable solution is to make it safe, and then allow the unsafe and more efficient behaviour only on explicit request. This follows D philosophy and in practice it's the only design that allows you to write programs that actually work, instead of being just a nest for bugs.

How do you request in a foreach loop that it doesn't dup? That's why I think the only one which allows both is the by default unsafe version. Note also, by "unsafe" we just mean surprising :) It's still safe in terms of memory safety. On Thu, 14 Oct 2010 13:55:38 -0400, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 So do I file this as a bug? The current behavior is way too subtle
 imo, at least the compiler could issue a warning if this behavior is
 intended.

Well, at the very least, this violates immutability. You can file a bug based on those grounds alone. Whether the foreach loop should auto-dup for you is another question, I think it's fine to require you to dup each key's name if you plan on using it later, but this should be in the documentation. -Steve
Oct 14 2010
prev sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday 14 October 2010 12:20:06 bearophile wrote:
 Steven Schveighoffer:
 How do you request in a foreach loop that it doesn't dup?

There are several ways to do this. You may add a boolean value in the struct/class template, that switches the safe/unsafe behaviour (defaults to safe), or you may let the basic opApply to dup, plus you add another method with a name that warns against its non-copying behaviour that doesn't dup. In my dlibs1 I have used the first solution for the lazy xpermutations/xcombinations.
 Whether the foreach loop should auto-dup for you is another question, I
 think it's fine to require you to dup each key's name if you plan on
 using it later, but this should be in the documentation.

D Zen follows another design philosophy, documentation is not enough, people forget things.
 Note also, by "unsafe" we just mean surprising :)  It's still safe in
 terms of memory safety.

"Safe" has many different meanings. Something that on default acts in a bug-prone way is not "safe". Bye, bearophile

The "safe" that Walter and Andrei are almost always concerned with is memory safety. That doesn't mean that other types of safety aren't important, but it does mean that that's the sort of safety that dmd and Phobos is generally concerned about. - Jonathan M Davis
Oct 14 2010