www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Principled method of lookup-or-insert in associative arrays?

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
TDPL has an example that can be reduced as follows:

void main() {
   uint[string] map;
   foreach (line; stdin.byLine()) {
     ++map[line];
   }
}

byLine reuses its buffer so it exposes it as char[]. Therefore, 
attempting to use map[line] will fail. The program compiled and did the 
wrong thing because of a bug.

The challenge is devising a means to make the program compile and work 
as expected. Looking up an uint[string] with a char[] should work, and 
if the char[] is to be inserted, a copy should be made (via .idup or 
to!string). The rule needs to be well defined and reasonably general.

The effect is something like this:

void main() {
   uint[string] map;
   foreach (line; stdin.byLine()) {
     auto p = line in map;
     if (p) ++*p;
     else map[line.idup] = 1;
   }
}

Ideally the programmer would write the simple code (first snippet) and 
achieve the semantics of the more complex code (second snippet). Any ideas?

Andrei
Nov 20 2010
next sibling parent Lutger Blijdestijn <lutger.blijdestijn gmail.com> writes:
Andrei Alexandrescu wrote:

 TDPL has an example that can be reduced as follows:
 
 void main() {
    uint[string] map;
    foreach (line; stdin.byLine()) {
      ++map[line];
    }
 }
 
 byLine reuses its buffer so it exposes it as char[]. Therefore,
 attempting to use map[line] will fail. The program compiled and did the
 wrong thing because of a bug.
 
 The challenge is devising a means to make the program compile and work
 as expected. Looking up an uint[string] with a char[] should work, and
 if the char[] is to be inserted, a copy should be made (via .idup or
 to!string). The rule needs to be well defined and reasonably general.
 
 The effect is something like this:
 
 void main() {
    uint[string] map;
    foreach (line; stdin.byLine()) {
      auto p = line in map;
      if (p) ++*p;
      else map[line.idup] = 1;
    }
 }
 
 Ideally the programmer would write the simple code (first snippet) and
 achieve the semantics of the more complex code (second snippet). Any
 ideas?
 
 Andrei
What I don' t understand is why it is allowed to use mutable types where the key type is immutable. This way you can get immutable references to mutable data back without explicit casting (via the .keys property for example). The example would not compile if this wasn't allowed.
Nov 20 2010
prev sibling next sibling parent Fawzi Mohamed <fawzi gmx.ch> writes:
On 20-nov-10, at 09:07, Andrei Alexandrescu wrote:

 TDPL has an example that can be reduced as follows:

 void main() {
  uint[string] map;
  foreach (line; stdin.byLine()) {
    ++map[line];
  }
 }
 byLine reuses its buffer so it exposes it as char[]. Therefore,  
 attempting to use map[line] will fail. The program compiled and did  
 the wrong thing because of a bug.

 The challenge is devising a means to make the program compile and  
 work as expected. Looking up an uint[string] with a char[] should  
 work, and if the char[] is to be inserted, a copy should be made  
 (via .idup or to!string). The rule needs to be well defined and  
 reasonably general.
I think that you basically stated the important rules: lookup, update should use const(char[]), set (i.e. potentially add a new key) immutable(char[]).
 The effect is something like this:

 void main() {
  uint[string] map;
  foreach (line; stdin.byLine()) {
    auto p = line in map;
    if (p) ++*p;
    else map[line.idup] = 1;
  }
 }

 Ideally the programmer would write the simple code (first snippet)  
 and achieve the semantics of the more complex code (second snippet).  
 Any ideas?
I would consider the first program as written invalid (at runtime) because the initial value is not set, so you cannot update it with ++. Also potential hidden idup should not be added IMHO. I don't find the second program so bad... Fawzi
Nov 20 2010
prev sibling next sibling parent reply spir <denis.spir gmail.com> writes:
On Sat, 20 Nov 2010 00:07:57 -0800
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 TDPL has an example that can be reduced as follows:
=20
 void main() {
    uint[string] map;
    foreach (line; stdin.byLine()) {
      ++map[line];
    }
 }
=20
 byLine reuses its buffer so it exposes it as char[]. Therefore,=20
 attempting to use map[line] will fail. The program compiled and did the=20
 wrong thing because of a bug.
=20
 The challenge is devising a means to make the program compile and work=20
 as expected. Looking up an uint[string] with a char[] should work, and=20
 if the char[] is to be inserted, a copy should be made (via .idup or=20
 to!string). The rule needs to be well defined and reasonably general.
=20
 The effect is something like this:
=20
 void main() {
    uint[string] map;
    foreach (line; stdin.byLine()) {
      auto p =3D line in map;
      if (p) ++*p;
      else map[line.idup] =3D 1;
    }
 }
=20
 Ideally the programmer would write the simple code (first snippet) and=20
 achieve the semantics of the more complex code (second snippet). Any idea=
s? I find this proposal really necessary. But aren't there two issues here? * Comparison (for lookup) by value equality should not care about qualifier= s (ie compare raw content, here plain array memory areas). * Assignment should perform "qualification conversion" automatically, eg char[] chars =3D "abc"; string s =3D chars; This involves no implicit magic here, as target qualification is explicit. = So, why not? There is a repetitive programming pattern in D: * play around with *string's in general * as soon as text processing is needed, convert to *char[] * when finished, convert back to *string This should not be implicit, but automatic. Is there anything risky here? Denis -- -- -- -- -- -- -- vit esse estrany =E2=98=A3 spir.wikidot.com
Nov 20 2010
parent reply =?iso-8859-2?B?VG9tZWsgU293afFza2k=?= <just ask.me> writes:
Dnia 20-11-2010 o 13:33:29 spir <denis.spir gmail.com> napisa=B3(a):

 I find this proposal really necessary. But aren't there two issues her=
e?
 * Comparison (for lookup) by value equality should not care about  =
 qualifiers (ie compare raw content, here plain array memory areas).
 * Assignment should perform "qualification conversion" automatically, =
eg
 	char[] chars =3D "abc";
 	string s =3D chars;
 This involves no implicit magic here, as target qualification is  =
 explicit. So, why not?
It's busting the whole const system to smithereens. char[] chars =3D "abc"; char[] backdoor =3D chars; string s =3D chars; assert (s =3D=3D "abc"); backdoor.front =3D 'k'; assert (s =3D=3D "abc"); // fails. not so immutable, huh?
 There is a repetitive programming pattern in D:
 * play around with *string's in general
 * as soon as text processing is needed, convert to *char[]
 * when finished, convert back to *string
Meh, immutable strings make life easier. -- = Tomek
Nov 20 2010
next sibling parent reply spir <denis.spir gmail.com> writes:
On Sat, 20 Nov 2010 15:22:37 +0100
Tomek Sowi=C5=84ski <just ask.me> wrote:

 I find this proposal really necessary. But aren't there two issues here?
 * Comparison (for lookup) by value equality should not care about =20
 qualifiers (ie compare raw content, here plain array memory areas).
 * Assignment should perform "qualification conversion" automatically, eg
 	char[] chars =3D "abc";
 	string s =3D chars;
 This involves no implicit magic here, as target qualification is =20
 explicit. So, why not? =20
=20 It's busting the whole const system to smithereens. =20 char[] chars =3D "abc"; char[] backdoor =3D chars; string s =3D chars; assert (s =3D=3D "abc"); backdoor.front =3D 'k'; assert (s =3D=3D "abc"); // fails. not so immutable, huh?
??? backdoor and s do not denote the same element. One is a mutable array, the = other is immutable. Why should changing backdoor affect s? Whether backdoor= and chars denote the same array depends on whether "=3D" copies or not dyn= arrays. But from immutable string to mutable array, there must be a copy (= read: dup). Anyway, the most annoying issue is not about assignments inside a given sco= pe, but parameter passing (including implicit ones like in Andrei's example= of foreach). void f (char[] chars) {} void g (string str) {} ... string str =3D "abc"; char[] chars =3D "abc".dup; f(str); g(chars); __trials__.d(30): Error: function __trials__.f (char[] chars) is not callab= le using argument types (string) __trials__.d(31): Error: function __trials__.g (string str) is not callable= using argument types (char[]) ... f(str.dup); // ok g(chars.idup); // ditto
 There is a repetitive programming pattern in D:
 * play around with *string's in general
 * as soon as text processing is needed, convert to *char[]
 * when finished, convert back to *string =20
=20 Meh, immutable strings make life easier.
For sure, I'm 100% for immutable strings by default, esp that literals prod= uce immutable elements. But one cannot perform text processing on immutable= thingies: s[3] =3D 'x'; s.replace("abc","ABC"); So that we constantly have to juggle between mutable and immutable versions= of, conceptually, the _same value_. By the way, why isn't the definition of string immutable(char[]), instead o= f immutable(char)[]? Denis -- -- -- -- -- -- -- vit esse estrany =E2=98=A3 spir.wikidot.com
Nov 20 2010
next sibling parent Adam Burton <adz21c gmail.com> writes:
spir wrote:

 On Sat, 20 Nov 2010 15:22:37 +0100
 Tomek SowiƄski <just ask.me> wrote:
 
 I find this proposal really necessary. But aren't there two issues
 here? * Comparison (for lookup) by value equality should not care about
 qualifiers (ie compare raw content, here plain array memory areas).
 * Assignment should perform "qualification conversion" automatically,
 eg char[] chars = "abc";
 string s = chars;
 This involves no implicit magic here, as target qualification is
 explicit. So, why not?
It's busting the whole const system to smithereens. char[] chars = "abc"; char[] backdoor = chars; string s = chars; assert (s == "abc"); backdoor.front = 'k'; assert (s == "abc"); // fails. not so immutable, huh?
??? backdoor and s do not denote the same element. One is a mutable array, the other is immutable. Why should changing backdoor affect s? Whether backdoor and chars denote the same array depends on whether "=" copies or not dyn arrays. But from immutable string to mutable array, there must be a copy (read: dup). Anyway, the most annoying issue is not about assignments inside a given scope, but parameter passing (including implicit ones like in Andrei's example of foreach). void f (char[] chars) {} void g (string str) {} ... string str = "abc"; char[] chars = "abc".dup; f(str); g(chars); __trials__.d(30): Error: function __trials__.f (char[] chars) is not callable using argument types (string) __trials__.d(31): Error: function __trials__.g (string str) is not callable using argument types (char[]) ... f(str.dup); // ok g(chars.idup); // ditto
 There is a repetitive programming pattern in D:
 * play around with *string's in general
 * as soon as text processing is needed, convert to *char[]
 * when finished, convert back to *string
Meh, immutable strings make life easier.
For sure, I'm 100% for immutable strings by default, esp that literals produce immutable elements. But one cannot perform text processing on immutable thingies: s[3] = 'x'; s.replace("abc","ABC"); So that we constantly have to juggle between mutable and immutable versions of, conceptually, the _same value_. By the way, why isn't the definition of string immutable(char[]), instead of immutable(char)[]?
I believe it is so you can reassign. Consider string s = "hello"; s = "good bye";
 
 
 Denis
 -- -- -- -- -- -- --
 vit esse estrany ☣
 
 spir.wikidot.com
Nov 20 2010
prev sibling next sibling parent =?UTF-8?B?UGVsbGUgTcOlbnNzb24=?= <pelle.mansson gmail.com> writes:
On 11/20/2010 05:09 PM, spir wrote:
 ???
 backdoor and s do not denote the same element. One is a mutable array, the
other is immutable. Why should changing backdoor affect s? Whether backdoor and
chars denote the same array depends on whether "=" copies or not dyn arrays.
But from immutable string to mutable array, there must be a copy (read: dup).
Must also be a copy the other way. Secret heap allocations are not fun.
 Anyway, the most annoying issue is not about assignments inside a given scope,
but parameter passing (including implicit ones like in Andrei's example of
foreach).
 	void f (char[] chars) {}
 	void g (string str) {}
 	...
 	string str = "abc";
 	char[] chars = "abc".dup;
 	f(str);
 	g(chars);
 __trials__.d(30): Error: function __trials__.f (char[] chars) is not callable
using argument types (string)
 __trials__.d(31): Error: function __trials__.g (string str) is not callable
using argument types (char[])
 	...
 	f(str.dup);	// ok
 	g(chars.idup);	// ditto
I do not understand the alternative.
 By the way, why isn't the definition of string immutable(char[]), instead of
immutable(char)[]?
string s = "abc"; s = "bde"; // fails with immutable(char[]), rightly so.
Nov 20 2010
prev sibling parent reply =?iso-8859-2?B?VG9tZWsgU293afFza2k=?= <just ask.me> writes:
Dnia 20-11-2010 o 17:09:00 spir <denis.spir gmail.com> napisa=B3(a):

 It's busting the whole const system to smithereens.

 char[] chars =3D "abc";
 char[] backdoor =3D chars;
 string s =3D chars;
 assert (s =3D=3D "abc");
 backdoor.front =3D 'k';
 assert (s =3D=3D "abc"); // fails. not so immutable, huh?
??? backdoor and s do not denote the same element. One is a mutable array,=
=
 the other is immutable. Why should changing backdoor affect s? Whether=
=
 backdoor and chars denote the same array depends on whether "=3D" copi=
es =
 or not dyn arrays. But from immutable string to mutable array, there  =
 must be a copy (read: dup).
OK, I get it. I still like it the way it is. Having worked with C++ code= = with overloaded assignment ops and implicit conversions, I see =3D that = = sometimes dups and sometimes aliases as a source of confusion and = performance bugs. All this just to remove .idup (it's 5 chars!). -- = Tomek
Nov 20 2010
parent spir <denis.spir gmail.com> writes:
On Sat, 20 Nov 2010 21:21:58 +0100
Tomek Sowi=C5=84ski <just ask.me> wrote:

 Dnia 20-11-2010 o 17:09:00 spir <denis.spir gmail.com> napisa=C5=82(a):
=20
 It's busting the whole const system to smithereens.

 char[] chars =3D "abc";
 char[] backdoor =3D chars;
 string s =3D chars;
 assert (s =3D=3D "abc");
 backdoor.front =3D 'k';
 assert (s =3D=3D "abc"); // fails. not so immutable, huh?
??? backdoor and s do not denote the same element. One is a mutable array, =
=20
 the other is immutable. Why should changing backdoor affect s? Whether =
=20
 backdoor and chars denote the same array depends on whether "=3D" copie=
s =20
 or not dyn arrays. But from immutable string to mutable array, there =20
 must be a copy (read: dup).
=20 OK, I get it. I still like it the way it is. Having worked with C++ code =
=20
 with overloaded assignment ops and implicit conversions, I see =3D that =
=20
 sometimes dups and sometimes aliases as a source of confusion and =20
 performance bugs. All this just to remove .idup (it's 5 chars!).
=20
Yes, you may be right on this ("explicit is better than implicit", as pytho= n folks say). Denis -- -- -- -- -- -- -- vit esse estrany =E2=98=A3 spir.wikidot.com
Nov 20 2010
prev sibling parent reply "Tyro[a.c.edwards]" <nospam home.com> writes:
On 11/20/2010 11:22 PM, Tomek Sowiński wrote:
 Dnia 20-11-2010 o 13:33:29 spir <denis.spir gmail.com> napisał(a):

 I find this proposal really necessary. But aren't there two issues here?
 * Comparison (for lookup) by value equality should not care about
 qualifiers (ie compare raw content, here plain array memory areas).
 * Assignment should perform "qualification conversion" automatically, eg
 char[] chars = "abc";
 string s = chars;
 This involves no implicit magic here, as target qualification is
 explicit. So, why not?
It's busting the whole const system to smithereens. char[] chars = "abc"; char[] backdoor = chars; string s = chars; assert (s == "abc"); backdoor.front = 'k'; assert (s == "abc"); // fails. not so immutable, huh?
What would be the harm if upon seeing your code the compiler did this: char[] chars = "abc".dup; char[] backdoor = chars; string s = chars.idup; assert (s == "abc"); backdoor.front = 'k'; // [1] assert (s == "abc"); Slightly magical but works according to expectation. I don't see the problem. [1] Assuming your are using std.array, this would fail because backdoor.front is not an lvalue.
 There is a repetitive programming pattern in D:
 * play around with *string's in general
 * as soon as text processing is needed, convert to *char[]
 * when finished, convert back to *string
Meh, immutable strings make life easier.
Nov 20 2010
parent reply =?iso-8859-2?B?VG9tZWsgU293afFza2k=?= <just ask.me> writes:
Tyro[a.c.edwards] <nospam home.com> napisa=B3(a):

 What would be the harm if upon seeing your code the compiler did this:=
  char[] chars =3D "abc".dup;
 char[] backdoor =3D chars;
 string s =3D chars.idup;
 assert (s =3D=3D "abc");
 backdoor.front =3D 'k';	// [1]
 assert (s =3D=3D "abc");
  Slightly magical but works according to expectation. I don't see the =
=
 problem.
  [1] Assuming your are using std.array, this would fail because  =
 backdoor.front is not an lvalue.
The harm is confusion. Now =3D on arrays always means aliasing, but with= = your proposal, it may *sometimes* mean dupping. Imagine real-life code = with type aliasing and type inference in play, and trying to determine = whether some line makes a dupping string<->char[] conversion or not. What would be the harm if everyone just put .(i)dup where it belongs? -- = Tomek
Nov 20 2010
parent reply "Tyro[a.c.edwards]" <nospam home.com> writes:
On 11/21/2010 9:23 AM, Tomek Sowiński wrote:
 Tyro[a.c.edwards] <nospam home.com> napisał(a):

 What would be the harm if upon seeing your code the compiler did this:
 char[] chars = "abc".dup;
 char[] backdoor = chars;
 string s = chars.idup;
 assert (s == "abc");
 backdoor.front = 'k'; // [1]
 assert (s == "abc");
 Slightly magical but works according to expectation. I don't see the
 problem.
 [1] Assuming your are using std.array, this would fail because
 backdoor.front is not an lvalue.
The harm is confusion. Now = on arrays always means aliasing, but with your proposal, it may *sometimes* mean dupping. Imagine real-life code with type aliasing and type inference in play, and trying to determine whether some line makes a dupping string<->char[] conversion or not.
alias char[] firstClassString; alias firstClassString stringGold; alias stringGold lightAtEndOfTunnel; alias lightAtEndOfTunnel daBomb; alias daBomb iAmThe; iAmThe man; if I want to know the actual type of shiznit: writeln(typeid(man)); // outputs char[]
 What would be the harm if everyone just put .(i)dup where it belongs?
Actually, I don't see any harm in using .(i)dup. However, my limited experience does not allow me to think on the same level as you so I'm just trying to understand your point a little better. That being said, I still don't understand the gravity of the implied problem.
Nov 20 2010
parent reply =?iso-8859-2?B?VG9tZWsgU293afFza2k=?= <just ask.me> writes:
Dnia 21-11-2010 o 02:02:35 Tyro[a.c.edwards] <nospam home.com> napisa=B3=
(a):

 The harm is confusion. Now =3D on arrays always means aliasing, but w=
ith
 your proposal, it may *sometimes* mean dupping. Imagine real-life cod=
e
 with type aliasing and type inference in play, and trying to determin=
e
 whether some line makes a dupping string<->char[] conversion or not.
alias char[] firstClassString; alias firstClassString stringGold; alias stringGold lightAtEndOfTunnel; alias lightAtEndOfTunnel daBomb; alias daBomb iAmThe; iAmThe man; if I want to know the actual type of shiznit: writeln(typeid(man)); // outputs char[]
  What would be the harm if everyone just put .(i)dup where it belongs=
?
   Actually, I don't see any harm in using .(i)dup. However, my limited=
=
 experience does not allow me to think on the same level as you so I'm =
=
 just trying to understand your point a little better. That being said,=
I =
 still don't understand the gravity of the implied problem.
The status quo is imposing a tiny upfront tax on the writer of typing = .(i)dup. The proposal is imposing a minor tax on the reader of knowing another ru= le = and, in case of convoluted code like above, minor tax of determining if = = this rule applies. Plus, some tax on the implementor to put this rule in= to = the compiler. The point is the aggregated tax imposed by the proposal is much greater = = because although minor it grows linearly with the number of readers (cod= e = is read much more often then written). And even with finite set of = readers, the tax grows till infinity because with time people forget wha= t = they read, therefore the tax is collected anew. And think of the gains -- dropping .(i)dup doesn't really buy you much = anyway. -- = Tomek$
Nov 20 2010
parent "Tyro[a.c.edwards]" <nospam home.com> writes:
On 11/21/2010 10:39 AM, Tomek Sowiński wrote:
 Dnia 21-11-2010 o 02:02:35 Tyro[a.c.edwards] <nospam home.com> napisał(a):

 The harm is confusion. Now = on arrays always means aliasing, but with
 your proposal, it may *sometimes* mean dupping. Imagine real-life code
 with type aliasing and type inference in play, and trying to determine
 whether some line makes a dupping string<->char[] conversion or not.
alias char[] firstClassString; alias firstClassString stringGold; alias stringGold lightAtEndOfTunnel; alias lightAtEndOfTunnel daBomb; alias daBomb iAmThe; iAmThe man; if I want to know the actual type of shiznit: writeln(typeid(man)); // outputs char[]
 What would be the harm if everyone just put .(i)dup where it belongs?
Actually, I don't see any harm in using .(i)dup. However, my limited experience does not allow me to think on the same level as you so I'm just trying to understand your point a little better. That being said, I still don't understand the gravity of the implied problem.
The status quo is imposing a tiny upfront tax on the writer of typing .(i)dup. The proposal is imposing a minor tax on the reader of knowing another rule and, in case of convoluted code like above, minor tax of determining if this rule applies. Plus, some tax on the implementor to put this rule into the compiler.
OK! Makes sense.
 The point is the aggregated tax imposed by the proposal is much greater
 because although minor it grows linearly with the number of readers
 (code is read much more often then written). And even with finite set of
 readers, the tax grows till infinity because with time people forget
 what they read, therefore the tax is collected anew.

 And think of the gains -- dropping .(i)dup doesn't really buy you much
 anyway.
Fair enough. Like I said, I'm not proposing that this happens. Just trying to understand why it shouldn't. Thanks for entertaining my questions.
Nov 20 2010
prev sibling next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2010-11-20 03:07:57 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 TDPL has an example that can be reduced as follows:
 
 void main() {
    uint[string] map;
    foreach (line; stdin.byLine()) {
      ++map[line];
    }
 }
 
 byLine reuses its buffer so it exposes it as char[]. Therefore, 
 attempting to use map[line] will fail. The program compiled and did the 
 wrong thing because of a bug.
 
 The challenge is devising a means to make the program compile and work 
 as expected. Looking up an uint[string] with a char[] should work, and 
 if the char[] is to be inserted, a copy should be made (via .idup or 
 to!string). The rule needs to be well defined and reasonably general.
 
 The effect is something like this:
 
 void main() {
    uint[string] map;
    foreach (line; stdin.byLine()) {
      auto p = line in map;
      if (p) ++*p;
      else map[line.idup] = 1;
    }
 }
 
 Ideally the programmer would write the simple code (first snippet) and 
 achieve the semantics of the more complex code (second snippet). Any 
 ideas?
But didn't you agree with me yesterday in another thread that it's best to make the caller responsible for the idup in cases where you need a string to be immutable? Now you want the reverse for AAs? Consider this example: uint[string][100] maps; foreach (line; stdin.byLine()) { foreach (map; maps) { ++map[line]; } } With your proposal, you'd be silently iduping 100 times each line to produce 100 times the same immutable string. It's true that it's difficult to make the caller responsible for iduping the string since whether you need a true immutable string or not depends on a test, but I'm still wary about functions that idup things on your behalf. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Nov 20 2010
parent reply "Tyro[a.c.edwards]" <nospam home.com> writes:
On 11/20/2010 9:39 PM, Michel Fortin wrote:
 On 2010-11-20 03:07:57 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 TDPL has an example that can be reduced as follows:

 void main() {
 uint[string] map;
 foreach (line; stdin.byLine()) {
 ++map[line];
 }
 }

 byLine reuses its buffer so it exposes it as char[]. Therefore,
 attempting to use map[line] will fail. The program compiled and did
 the wrong thing because of a bug.

 The challenge is devising a means to make the program compile and work
 as expected. Looking up an uint[string] with a char[] should work, and
 if the char[] is to be inserted, a copy should be made (via .idup or
 to!string). The rule needs to be well defined and reasonably general.

 The effect is something like this:

 void main() {
 uint[string] map;
 foreach (line; stdin.byLine()) {
 auto p = line in map;
 if (p) ++*p;
 else map[line.idup] = 1;
 }
 }

 Ideally the programmer would write the simple code (first snippet) and
 achieve the semantics of the more complex code (second snippet). Any
 ideas?
But didn't you agree with me yesterday in another thread that it's best to make the caller responsible for the idup in cases where you need a string to be immutable? Now you want the reverse for AAs? Consider this example: uint[string][100] maps; foreach (line; stdin.byLine()) { foreach (map; maps) { ++map[line]; } } With your proposal, you'd be silently iduping 100 times each line to produce 100 times the same immutable string.
Maybe I'm not knowledgeable enough to understand but, if I need to do this under the current constraints I would have to write ++map[line.idup]. In effect I end up duping 100 times anyway. Or is there another solution that I'm just not seeing?
 It's true that it's difficult to make the caller responsible for iduping
 the string since whether you need a true immutable string or not depends
 on a test, but I'm still wary about functions that idup things on your
 behalf.
Nov 20 2010
parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2010-11-20 18:04:29 -0500, "Tyro[a.c.edwards]" <nospam home.com> said:

 On 11/20/2010 9:39 PM, Michel Fortin wrote:
 But didn't you agree with me yesterday in another thread that it's best
 to make the caller responsible for the idup in cases where you need a
 string to be immutable? Now you want the reverse for AAs?
 
 Consider this example:
 
 uint[string][100] maps;
 foreach (line; stdin.byLine()) {
 foreach (map; maps) {
 ++map[line];
 }
 }
 
 With your proposal, you'd be silently iduping 100 times each line to
 produce 100 times the same immutable string.
Maybe I'm not knowledgeable enough to understand but, if I need to do this under the current constraints I would have to write ++map[line.idup]. In effect I end up duping 100 times anyway. Or is there another solution that I'm just not seeing?
One solution is to just idup the line before entering the inner loop. uint[string][100] maps; foreach (line; stdin.byLine()) { string iline = line.idup; foreach (map; maps) { ++map[iline]; } } The best solution avoid unnecessary allocations in the problem above to would be to idup the line only once for the first map that doesn't already contain the key. This way if all maps already have the key you don't do any idup, but you are still sure that you won't idup the same line twice. uint[string][100] maps; foreach (line; stdin.byLine()) { string iline = null; foreach (map; maps) { uint* value = line in map; if (value) ++(*value); else { if (iline == null) iline = line.idup; ++map[iline]; } } } The key point is that by making the caller responsible of calling idup, the inefficiency appears more clearly and can be acted upon more easily, if deemed necessary. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Nov 20 2010
parent Fawzi Mohamed <fawzi gmx.ch> writes:
I was thinking about this, and revised my idea.
While I think that in general hiding idup is a bad practice, I think  
that for AA one could make the case that it might be acceptable.
idup is needed only when assigning a value using a key that isn't  
immutable.
Now I think that this use in AA is relatively seldom, one normally  
wants to read more than write, so some magic might be warranted.
If one has a use case where the the idup really becomes an issue he  
should expend the extra thinking to avoid it.
The advantage of this is that idup is not needed for basic operations,  
which I think is good, as one doesn't have to confront himself with  
the const complexity so early when using the language. const should be  
kid of avoidable in several tasks if one doesn't care about it.
What I am still uneasy about the original example is its reliance on  
the automatic initialization to 0 for undefined keys, an AA normally  
throws when the key is not present, it doesn't return the default  
value, why should updating be different?.

Fawzi
Nov 21 2010
prev sibling next sibling parent =?iso-8859-2?B?VG9tZWsgU293afFza2k=?= <just ask.me> writes:
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> napisa=B3(a):

 TDPL has an example that can be reduced as follows:

 void main() {
    uint[string] map;
    foreach (line; stdin.byLine()) {
      ++map[line];
    }
 }

 byLine reuses its buffer so it exposes it as char[]. Therefore,  =
 attempting to use map[line] will fail. The program compiled and did th=
e =
 wrong thing because of a bug.

 The challenge is devising a means to make the program compile and work=
=
 as expected. Looking up an uint[string] with a char[] should work, and=
=
 if the char[] is to be inserted, a copy should be made (via .idup or  =
 to!string). The rule needs to be well defined and reasonably general.

 The effect is something like this:

 void main() {
    uint[string] map;
    foreach (line; stdin.byLine()) {
      auto p =3D line in map;
      if (p) ++*p;
      else map[line.idup] =3D 1;
    }
 }

 Ideally the programmer would write the simple code (first snippet) and=
=
 achieve the semantics of the more complex code (second snippet). Any  =
 ideas?
To be honest I find the 2nd snippet more readable, it's just one line mo= re: uint[string] map; foreach (line; stdin.byLine()) { if (auto p =3D line in map) ++*p; else map[line.idup] =3D 1; } You know exactly what it does and it's const correct. I don't see value = in = implicit cloning. The first example smells of STL map[key] -- if key is not found, it = implicitly inserts key, initializes the value, and returns a reference. = = Let's not go there. -- = Tomek
Nov 20 2010
prev sibling parent reply Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
On 20/11/2010 08:07, Andrei Alexandrescu wrote:
 TDPL has an example that can be reduced as follows:

 void main() {
 uint[string] map;
 foreach (line; stdin.byLine()) {
 ++map[line];
 }
 }

 byLine reuses its buffer so it exposes it as char[]. Therefore,
 attempting to use map[line] will fail. The program compiled and did the
 wrong thing because of a bug.

 The challenge is devising a means to make the program compile and work
 as expected. Looking up an uint[string] with a char[] should work, and
 if the char[] is to be inserted, a copy should be made (via .idup or
 to!string). The rule needs to be well defined and reasonably general.

 The effect is something like this:

 void main() {
 uint[string] map;
 foreach (line; stdin.byLine()) {
 auto p = line in map;
 if (p) ++*p;
 else map[line.idup] = 1;
 }
 }

 Ideally the programmer would write the simple code (first snippet) and
 achieve the semantics of the more complex code (second snippet). Any ideas?

 Andrei
An idea: import std.conv; //... uint[string] map; foreach (line; stdin.byLine()) { map.getOrPut(line, 0, to!string) ++; } Explanation: getOrPut retrieves entry under key "line", returns a ref value if found. If not found, it creates a new entry with value 0 (second parameter), and with key to!string(line), and again returns the ref value. So the third parameter is a delegate/function that creates a new key from the lookup key given as the first parameter. There is a contract in this creator function that the created key must equal the lookup key. Thus there may be some minor optimization versus the second snippet of the OP, since getOrPut doesn't need to calculate hashCode twice. getOrPut might implicitly know how to convert certain objects, like const(char) to string, so map.getOrPut(line, 0, to!string) ++; could be just: map.getOrPut(line, 0) ++; There should also be another signature of getOrPut in which the second parameter is delegate/function (a value creator), for situations where the entry value is not unexpensive to create. For example: map.getOrPut(line, { return new Integer(0); }, to!string) ++; -- Bruno Medeiros - Software Engineer
Dec 02 2010
parent Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
On 02/12/2010 14:18, Bruno Medeiros wrote:
 On 20/11/2010 08:07, Andrei Alexandrescu wrote:
 TDPL has an example that can be reduced as follows:

 void main() {
 uint[string] map;
 foreach (line; stdin.byLine()) {
 ++map[line];
 }
 }

 byLine reuses its buffer so it exposes it as char[]. Therefore,
 attempting to use map[line] will fail. The program compiled and did the
 wrong thing because of a bug.

 The challenge is devising a means to make the program compile and work
 as expected. Looking up an uint[string] with a char[] should work, and
 if the char[] is to be inserted, a copy should be made (via .idup or
 to!string). The rule needs to be well defined and reasonably general.

 The effect is something like this:

 void main() {
 uint[string] map;
 foreach (line; stdin.byLine()) {
 auto p = line in map;
 if (p) ++*p;
 else map[line.idup] = 1;
 }
 }

 Ideally the programmer would write the simple code (first snippet) and
 achieve the semantics of the more complex code (second snippet). Any
 ideas?

 Andrei
An idea: import std.conv; //... uint[string] map; foreach (line; stdin.byLine()) { map.getOrPut(line, 0, to!string) ++; }
Also, it doesn't use pointers, which I guess may be a bit of an advantage. -- Bruno Medeiros - Software Engineer
Dec 02 2010