www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Associative Array - where's my error?

reply Adrian Mercieca <amercieca gmail.com> writes:
Hi,

I am currently learning D, but have ran into a problem.

I wrote this simple program:

import std.stdio;
import std.string;
import std.conv;

void main()
{
	int[string] dataMap;
	
	foreach(line; stdin.byLine() ) {
		auto s = line.split("\t");
		auto anum = to!int(s[0]);
		auto aname = cast(string)s[1];
		dataMap[aname] = anum;
	}
	
	foreach(s; dataMap.keys.sort) {
		writeln(s);
	}
}

The code is in a file named 'rl.d'.

I am running this on Linux (Linux 2.6.32-31-generic-pae #61-Ubuntu SMP i686 
GNU/Linux)

The compiler flags used are: dmd -w -wi -O -inline -noboundscheck -release 
rl.d.

I invoke the program from the command line with this command:

cat data.txt | ./rl

Basically, this is piping the contents of the data.txt file (below) into 
the program.

data.txt contents:

1	ABC
2	DEF
3	GHI
4	JKL
5	MNO
6	PQR
7	STU
8	VWX
9	YZ

There are two 'fields' per line, a number and a string separated by a tab.
Now, when I run the program, I get this output:


YZ
YZ

YZ

YZ

YZ

YZ

YZ

YZ

YZ


Which is nowhere close to what I was expecting.

The program is supposed to read the input line by line, split each line on 
tab character and populate an associative array with the key being the 2nd 
string value and set the corresponding value to the value of the 1st 
field. Then it goes into a loop printing out the sorted keys of the 
associative array.

I was expecting output like:

ABC
DEF
GHI
JKL
MNO
PQR
STU
VWX
YZ

but, as shown, I get something completely different and unexpected.

Can anyone shed any light on this for me please? What am I doing wrong?

Thanks.
- Adrian.
Apr 27 2011
next sibling parent reply Jesse Phillips <jessekphillips+D gmail.com> writes:
Adrian Mercieca Wrote:

 Hi,
 
 I am currently learning D, but have ran into a problem.
 
 I wrote this simple program:
 
 import std.stdio;
 import std.string;
 import std.conv;
 
 void main()
 {
 	int[string] dataMap;
 	
 	foreach(line; stdin.byLine() ) {
 		auto s = line.split("\t");
 		auto anum = to!int(s[0]);
 		auto aname = cast(string)s[1];
 		dataMap[aname] = anum;
 	}
 	
 	foreach(s; dataMap.keys.sort) {
 		writeln(s);
 	}
 }
 

Quite simple your error is in using cast(string). Solution use to!string(s[1]) The explanation: byLine reuses the buffer as it iterates, this is why you see YZ, though I would expect remains of some other words in there. Associative Array keys must be immutable (so that their value is not changed by another reference). By casting to string you have told the type system "This value cannot be changed by me or any other reference to this data." But in reality you do continue to modify it because byLine still has the reference. Casting types is making statements about the type, it is not converting the type (in general). You must have inserted the cast because the compiler said s[1] wasn't immutable and figured you could "force it" into being so. In such cases the compiler is telling something is wrong and you are saying "I don't care. I know I'm right." Don't use cast, to!() is much safer. Also you could use s[1].idup instead of cast or to!().
Apr 27 2011
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Jesse Phillips:

 Casting types is making statements about the type, it is not converting the
type (in general). You must have inserted the cast because the compiler said
s[1] wasn't immutable and figured you could "force it" into being so. In such
cases the compiler is telling something is wrong and you are saying "I don't
care. I know I'm right." Don't use cast, to!() is much safer.

You have written a nice explanation of the situation, and indeed if you take care of using casts correctly, this situation is not bad. But in practice I think people will not use casts so carefully. This is why I have suggested a different design, essentially: the by line dups on default and doesn't do it on request (this difference is expressed with two different function names or a template argument boolean, etc). So when you don't know what you are doing the code works correctly. Bye, bearophile
Apr 28 2011
parent reply Jesse Phillips <jessekphillips+D gmail.com> writes:
bearophile Wrote:

 You have written a nice explanation of the situation, and indeed if you take
care of using casts correctly, this situation is not bad. But in practice I
think people will not use casts so carefully. This is why I have suggested a
different design, essentially: the by line dups on default and doesn't do it on
request (this difference is expressed with two different function names or a
template argument boolean, etc). So when you don't know what you are doing the
code works correctly.
 
 Bye,
 bearophile

Thank you. I think this just hides the issue. To fix the issue you would need byLine to return a string, since user code my still save and modify the returned value and store it in an AA. Even then this is "fixing" one small area in Phobos. To really make this better you would need to change the semantics of cast to be what new users would "expect." However cast is supposed to be a very blunt tool, Java and C# have gone with changing the semantics and adding new features to provide some of the behavior. My friend recently discovered the 'as' keyword in C#. It was great because he could check for null instead of catching cast exceptions. D on the other hand left cast as a "dangerous" language feature and provided a "safe" library function. So yes new users will get it wrong. I believe the correct solution is to educate at emphasize to!() over cast in examples.
Apr 28 2011
parent reply bearophile <bearophileHUGS lycos.com> writes:
Jesse Phillips:

 I think this just hides the issue. To fix the issue you would need byLine to
return a string, since user code my still save and modify the returned value
and store it in an AA.

I don't agree. If you perform a cast of the dup-ed char array to a string you will not have problems. The main problem is the missed duplication (and the bad usage of casts in general, for people coming from C/C++). Bye, bearophile
Apr 28 2011
parent reply Jesse Phillips <jessekphillips+D gmail.com> writes:
bearophile Wrote:

 Jesse Phillips:
 
 I think this just hides the issue. To fix the issue you would need byLine to
return a string, since user code my still save and modify the returned value
and store it in an AA.

I don't agree. If you perform a cast of the dup-ed char array to a string you will not have problems. The main problem is the missed duplication (and the bad usage of casts in general, for people coming from C/C++). Bye, bearophile

_This_ code wouldn't not have problems, that doesn't mean code won't be written that does have problems.
Apr 28 2011
parent Jesse Phillips <jessekphillips+D gmail.com> writes:
Andrej Mitrovic Wrote:

 There's more issues with string assignments. For example see this:
 http://d.puremagic.com/issues/show_bug.cgi?id=5059
 
 In that example, key.name() returns a string. But "key" itself will be
 reused on every loop, and the string that key holds will be
 overwritten on each pass.

Commented on your bug. It is because of a cast(string) inside the KeySequence's opApply. My point still stands, to!() is safer.
 I haven't gotten any comments on that bug report yet. I'm not sure if
 it really is a bug, but it certainly is unexpected that an immutable
 string gets overwritten.

Apr 28 2011
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
There's more issues with string assignments. For example see this:
http://d.puremagic.com/issues/show_bug.cgi?id=5059

In that example, key.name() returns a string. But "key" itself will be
reused on every loop, and the string that key holds will be
overwritten on each pass.

I haven't gotten any comments on that bug report yet. I'm not sure if
it really is a bug, but it certainly is unexpected that an immutable
string gets overwritten.
Apr 28 2011
prev sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
Thanks. I'm glad it wasn't a language implementation issue.
Apr 28 2011
prev sibling parent Adrian Mercieca <amercieca gmail.com> writes:
Hi everyone,

Ok - I think I got the gist of what you guys are saying.

Thanks for the help.
Coming from C/C++ and Java, there's quite a bit for me to learn.
But I'll get there; so far, I am very impressed with D.

Regards.
- Adrian.
Apr 28 2011