digitalmars.D - Principled method of lookup-or-insert in associative arrays?

Andrei Alexandrescu (26/26) Nov 20 2010 TDPL has an example that can be reduced as follows:

Lutger Blijdestijn (5/39) Nov 20 2010 What I don' t understand is why it is allowed to use mutable types where...
Fawzi Mohamed (9/36) Nov 20 2010 I think that you basically stated the important rules: lookup, update
spir (21/52) Nov 20 2010 s?

=?iso-8859-2?B?VG9tZWsgU293afFza2k=?= (13/25) Nov 20 2010 eg

spir (38/61) Nov 20 2010 ???

Adam Burton (4/69) Nov 20 2010 I believe it is so you can reassign. Consider
=?UTF-8?B?UGVsbGUgTcOlbnNzb24=?= (5/21) Nov 20 2010 I do not understand the alternative.
=?iso-8859-2?B?VG9tZWsgU293afFza2k=?= (12/26) Nov 20 2010 =

spir (13/35) Nov 20 2010 =20

Tyro[a.c.edwards] (12/33) Nov 20 2010 What would be the harm if upon seeing your code the compiler did this:

=?iso-8859-2?B?VG9tZWsgU293afFza2k=?= (10/21) Nov 20 2010 The harm is confusion. Now =3D on arrays always means aliasing, but with...

Tyro[a.c.edwards] (13/30) Nov 20 2010 alias char[] firstClassString;

=?iso-8859-2?B?VG9tZWsgU293afFza2k=?= (30/47) Nov 20 2010 ith

Tyro[a.c.edwards] (4/38) Nov 20 2010 Fair enough. Like I said, I'm not proposing that this happens. Just

Michel Fortin (22/54) Nov 20 2010 But didn't you agree with me yesterday in another thread that it's best

Tyro[a.c.edwards] (5/55) Nov 20 2010 Maybe I'm not knowledgeable enough to understand but, if I need to do

Michel Fortin (35/56) Nov 20 2010 One solution is to just idup the line before entering the inner loop.

Fawzi Mohamed (18/18) Nov 21 2010 I was thinking about this, and revised my idea.

=?iso-8859-2?B?VG9tZWsgU293afFza2k=?= (21/47) Nov 20 2010 =
Bruno Medeiros (28/54) Dec 02 2010 An idea:

Bruno Medeiros (4/46) Dec 02 2010 Also, it doesn't use pointers, which I guess may be a bit of an advantag...

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

TDPL has an example that can be reduced as follows:

void main() {
   uint[string] map;
   foreach (line; stdin.byLine()) {
     ++map[line];
   }
}

byLine reuses its buffer so it exposes it as char[]. Therefore, 
attempting to use map[line] will fail. The program compiled and did the 
wrong thing because of a bug.

The challenge is devising a means to make the program compile and work 
as expected. Looking up an uint[string] with a char[] should work, and 
if the char[] is to be inserted, a copy should be made (via .idup or 
to!string). The rule needs to be well defined and reasonably general.

The effect is something like this:

void main() {
   uint[string] map;
   foreach (line; stdin.byLine()) {
     auto p = line in map;
     if (p) ++*p;
     else map[line.idup] = 1;
   }
}

Ideally the programmer would write the simple code (first snippet) and 
achieve the semantics of the more complex code (second snippet). Any ideas?

Andrei

Nov 20 2010

Lutger Blijdestijn <lutger.blijdestijn gmail.com> writes:

Andrei Alexandrescu wrote:

 TDPL has an example that can be reduced as follows:
 
 void main() {
    uint[string] map;
    foreach (line; stdin.byLine()) {
      ++map[line];
    }
 }
 
 byLine reuses its buffer so it exposes it as char[]. Therefore,
 attempting to use map[line] will fail. The program compiled and did the
 wrong thing because of a bug.
 
 The challenge is devising a means to make the program compile and work
 as expected. Looking up an uint[string] with a char[] should work, and
 if the char[] is to be inserted, a copy should be made (via .idup or
 to!string). The rule needs to be well defined and reasonably general.
 
 The effect is something like this:
 
 void main() {
    uint[string] map;
    foreach (line; stdin.byLine()) {
      auto p = line in map;
      if (p) ++*p;
      else map[line.idup] = 1;
    }
 }
 
 Ideally the programmer would write the simple code (first snippet) and
 achieve the semantics of the more complex code (second snippet). Any
 ideas?
 
 Andrei

What I don' t understand is why it is allowed to use mutable types where the 
key type is immutable. This way you can get immutable references to mutable 
data back without explicit casting (via the .keys property for example). The 
example would not compile if this wasn't allowed.

Nov 20 2010

Fawzi Mohamed <fawzi gmx.ch> writes:

On 20-nov-10, at 09:07, Andrei Alexandrescu wrote:

 TDPL has an example that can be reduced as follows:

 void main() {
  uint[string] map;
  foreach (line; stdin.byLine()) {
    ++map[line];
  }
 }

 byLine reuses its buffer so it exposes it as char[]. Therefore,  
 attempting to use map[line] will fail. The program compiled and did  
 the wrong thing because of a bug.

 The challenge is devising a means to make the program compile and  
 work as expected. Looking up an uint[string] with a char[] should  
 work, and if the char[] is to be inserted, a copy should be made  
 (via .idup or to!string). The rule needs to be well defined and  
 reasonably general.

I think that you basically stated the important rules: lookup, update  
should use const(char[]), set (i.e. potentially add a new key)  
immutable(char[]).

 The effect is something like this:

 void main() {
  uint[string] map;
  foreach (line; stdin.byLine()) {
    auto p = line in map;
    if (p) ++*p;
    else map[line.idup] = 1;
  }
 }

 Ideally the programmer would write the simple code (first snippet)  
 and achieve the semantics of the more complex code (second snippet).  
 Any ideas?

I would consider the first program as written invalid (at runtime)  
because the initial value is not set, so you cannot update it with ++.
Also potential hidden idup should not be added IMHO.
I don't find the second program so bad...

Fawzi

Nov 20 2010

spir <denis.spir gmail.com> writes:

On Sat, 20 Nov 2010 00:07:57 -0800
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 TDPL has an example that can be reduced as follows:
=20
 void main() {
    uint[string] map;
    foreach (line; stdin.byLine()) {
      ++map[line];
    }
 }
=20
 byLine reuses its buffer so it exposes it as char[]. Therefore,=20
 attempting to use map[line] will fail. The program compiled and did the=20
 wrong thing because of a bug.
=20
 The challenge is devising a means to make the program compile and work=20
 as expected. Looking up an uint[string] with a char[] should work, and=20
 if the char[] is to be inserted, a copy should be made (via .idup or=20
 to!string). The rule needs to be well defined and reasonably general.
=20
 The effect is something like this:
=20
 void main() {
    uint[string] map;
    foreach (line; stdin.byLine()) {
      auto p =3D line in map;
      if (p) ++*p;
      else map[line.idup] =3D 1;
    }
 }
=20
 Ideally the programmer would write the simple code (first snippet) and=20
 achieve the semantics of the more complex code (second snippet). Any idea=

s?

I find this proposal really necessary. But aren't there two issues here?
* Comparison (for lookup) by value equality should not care about qualifier=
s (ie compare raw content, here plain array memory areas).
* Assignment should perform "qualification conversion" automatically, eg
	char[] chars =3D "abc";
	string s =3D chars;
This involves no implicit magic here, as target qualification is explicit. =
So, why not?

There is a repetitive programming pattern in D:
* play around with *string's in general
* as soon as text processing is needed, convert to *char[]
* when finished, convert back to *string
This should not be implicit, but automatic.
Is there anything risky here?


Denis
-- -- -- -- -- -- --
vit esse estrany =E2=98=A3

spir.wikidot.com

Nov 20 2010

=?iso-8859-2?B?VG9tZWsgU293afFza2k=?= <just ask.me> writes:

Dnia 20-11-2010 o 13:33:29 spir <denis.spir gmail.com> napisa=B3(a):

 I find this proposal really necessary. But aren't there two issues her=

e?
 * Comparison (for lookup) by value equality should not care about  =

 qualifiers (ie compare raw content, here plain array memory areas).
 * Assignment should perform "qualification conversion" automatically, =

eg
 	char[] chars =3D "abc";
 	string s =3D chars;
 This involves no implicit magic here, as target qualification is  =

 explicit. So, why not?

It's busting the whole const system to smithereens.

char[] chars =3D "abc";
char[] backdoor =3D chars;
string s =3D chars;
assert (s =3D=3D "abc");
backdoor.front =3D 'k';
assert (s =3D=3D "abc"); // fails. not so immutable, huh?

 There is a repetitive programming pattern in D:
 * play around with *string's in general
 * as soon as text processing is needed, convert to *char[]
 * when finished, convert back to *string

Meh, immutable strings make life easier.

-- =

Tomek

Nov 20 2010

spir <denis.spir gmail.com> writes:

On Sat, 20 Nov 2010 15:22:37 +0100
Tomek Sowi=C5=84ski <just ask.me> wrote:

 I find this proposal really necessary. But aren't there two issues here?
 * Comparison (for lookup) by value equality should not care about =20
 qualifiers (ie compare raw content, here plain array memory areas).
 * Assignment should perform "qualification conversion" automatically, eg
 	char[] chars =3D "abc";
 	string s =3D chars;
 This involves no implicit magic here, as target qualification is =20
 explicit. So, why not? =20

=20
 It's busting the whole const system to smithereens.
=20
 char[] chars =3D "abc";
 char[] backdoor =3D chars;
 string s =3D chars;
 assert (s =3D=3D "abc");
 backdoor.front =3D 'k';
 assert (s =3D=3D "abc"); // fails. not so immutable, huh?

???
backdoor and s do not denote the same element. One is a mutable array, the =
other is immutable. Why should changing backdoor affect s? Whether backdoor=
 and chars denote the same array depends on whether "=3D" copies or not dyn=
 arrays. But from immutable string to mutable array, there must be a copy (=
read: dup).

Anyway, the most annoying issue is not about assignments inside a given sco=
pe, but parameter passing (including implicit ones like in Andrei's example=
 of foreach).
	void f (char[] chars) {}
	void g (string str) {}
	...
	string str =3D "abc";
	char[] chars =3D "abc".dup;
	f(str);
	g(chars);
__trials__.d(30): Error: function __trials__.f (char[] chars) is not callab=
le using argument types (string)
__trials__.d(31): Error: function __trials__.g (string str) is not callable=
 using argument types (char[])
	...
	f(str.dup);	// ok
	g(chars.idup);	// ditto

 There is a repetitive programming pattern in D:
 * play around with *string's in general
 * as soon as text processing is needed, convert to *char[]
 * when finished, convert back to *string =20

=20
 Meh, immutable strings make life easier.

For sure, I'm 100% for immutable strings by default, esp that literals prod=
uce immutable elements. But one cannot perform text processing on immutable=
 thingies:
	s[3] =3D 'x';
	s.replace("abc","ABC");
So that we constantly have to juggle between mutable and immutable versions=
 of, conceptually, the _same value_.

By the way, why isn't the definition of string immutable(char[]), instead o=
f immutable(char)[]?


Denis
-- -- -- -- -- -- --
vit esse estrany =E2=98=A3

spir.wikidot.com

Nov 20 2010

Adam Burton <adz21c gmail.com> writes:

spir wrote:

 On Sat, 20 Nov 2010 15:22:37 +0100
 Tomek Sowiński <just ask.me> wrote:
 
 I find this proposal really necessary. But aren't there two issues
 here? * Comparison (for lookup) by value equality should not care about
 qualifiers (ie compare raw content, here plain array memory areas).
 * Assignment should perform "qualification conversion" automatically,
 eg char[] chars = "abc";
 string s = chars;
 This involves no implicit magic here, as target qualification is
 explicit. So, why not?

 
 It's busting the whole const system to smithereens.
 
 char[] chars = "abc";
 char[] backdoor = chars;
 string s = chars;
 assert (s == "abc");
 backdoor.front = 'k';
 assert (s == "abc"); // fails. not so immutable, huh?

 
 ???
 backdoor and s do not denote the same element. One is a mutable array, the
 other is immutable. Why should changing backdoor affect s? Whether
 backdoor and chars denote the same array depends on whether "=" copies or
 not dyn arrays. But from immutable string to mutable array, there must be
 a copy (read: dup).
 
 Anyway, the most annoying issue is not about assignments inside a given
 scope, but parameter passing (including implicit ones like in Andrei's
 example of foreach). void f (char[] chars) {} void g (string str) {}
 ...
 string str = "abc";
 char[] chars = "abc".dup;
 f(str);
 g(chars);
 __trials__.d(30): Error: function __trials__.f (char[] chars) is not
 callable using argument types (string) __trials__.d(31): Error: function
 __trials__.g (string str) is not callable using argument types (char[])
 ...
 f(str.dup);	// ok
 g(chars.idup);	// ditto
 
 There is a repetitive programming pattern in D:
 * play around with *string's in general
 * as soon as text processing is needed, convert to *char[]
 * when finished, convert back to *string

 
 Meh, immutable strings make life easier.

 
 For sure, I'm 100% for immutable strings by default, esp that literals
 produce immutable elements. But one cannot perform text processing on
 immutable thingies: s[3] = 'x'; s.replace("abc","ABC");
 So that we constantly have to juggle between mutable and immutable
 versions of, conceptually, the _same value_.
 
 By the way, why isn't the definition of string immutable(char[]), instead
 of immutable(char)[]?

I believe it is so you can reassign. Consider

string s = "hello";
s = "good bye";
 
 
 Denis
 -- -- -- -- -- -- --
 vit esse estrany ☣
 
 spir.wikidot.com

Nov 20 2010

=?UTF-8?B?UGVsbGUgTcOlbnNzb24=?= <pelle.mansson gmail.com> writes:

On 11/20/2010 05:09 PM, spir wrote:
 ???
 backdoor and s do not denote the same element. One is a mutable array, the
other is immutable. Why should changing backdoor affect s? Whether backdoor and
chars denote the same array depends on whether "=" copies or not dyn arrays.
But from immutable string to mutable array, there must be a copy (read: dup).

Must also be a copy the other way. Secret heap allocations are not fun.

 Anyway, the most annoying issue is not about assignments inside a given scope,
but parameter passing (including implicit ones like in Andrei's example of
foreach).
 	void f (char[] chars) {}
 	void g (string str) {}
 	...
 	string str = "abc";
 	char[] chars = "abc".dup;
 	f(str);
 	g(chars);
 __trials__.d(30): Error: function __trials__.f (char[] chars) is not callable
using argument types (string)
 __trials__.d(31): Error: function __trials__.g (string str) is not callable
using argument types (char[])
 	...
 	f(str.dup);	// ok
 	g(chars.idup);	// ditto

I do not understand the alternative.

 By the way, why isn't the definition of string immutable(char[]), instead of
immutable(char)[]?

string s = "abc";
s = "bde"; // fails with immutable(char[]), rightly so.

Nov 20 2010

=?iso-8859-2?B?VG9tZWsgU293afFza2k=?= <just ask.me> writes:

Dnia 20-11-2010 o 17:09:00 spir <denis.spir gmail.com> napisa=B3(a):

 It's busting the whole const system to smithereens.

 char[] chars =3D "abc";
 char[] backdoor =3D chars;
 string s =3D chars;
 assert (s =3D=3D "abc");
 backdoor.front =3D 'k';
 assert (s =3D=3D "abc"); // fails. not so immutable, huh?

 ???
 backdoor and s do not denote the same element. One is a mutable array,=

  =

 the other is immutable. Why should changing backdoor affect s? Whether=

  =

 backdoor and chars denote the same array depends on whether "=3D" copi=

es  =

 or not dyn arrays. But from immutable string to mutable array, there  =

 must be a copy (read: dup).

OK, I get it. I still like it the way it is. Having worked with C++ code=
  =

with overloaded assignment ops and implicit conversions, I see =3D that =
 =

sometimes dups and sometimes aliases as a source of confusion and  =

performance bugs. All this just to remove .idup (it's 5 chars!).

-- =

Tomek

Nov 20 2010

spir <denis.spir gmail.com> writes:

On Sat, 20 Nov 2010 21:21:58 +0100
Tomek Sowi=C5=84ski <just ask.me> wrote:

 Dnia 20-11-2010 o 17:09:00 spir <denis.spir gmail.com> napisa=C5=82(a):
=20
 It's busting the whole const system to smithereens.

 char[] chars =3D "abc";
 char[] backdoor =3D chars;
 string s =3D chars;
 assert (s =3D=3D "abc");
 backdoor.front =3D 'k';
 assert (s =3D=3D "abc"); // fails. not so immutable, huh?

 ???
 backdoor and s do not denote the same element. One is a mutable array, =


=20
 the other is immutable. Why should changing backdoor affect s? Whether =


=20
 backdoor and chars denote the same array depends on whether "=3D" copie=


s =20
 or not dyn arrays. But from immutable string to mutable array, there =20
 must be a copy (read: dup).

=20
 OK, I get it. I still like it the way it is. Having worked with C++ code =

=20
 with overloaded assignment ops and implicit conversions, I see =3D that =

=20
 sometimes dups and sometimes aliases as a source of confusion and =20
 performance bugs. All this just to remove .idup (it's 5 chars!).
=20

Yes, you may be right on this ("explicit is better than implicit", as pytho=
n folks say).

Denis
-- -- -- -- -- -- --
vit esse estrany =E2=98=A3

spir.wikidot.com

Nov 20 2010

"Tyro[a.c.edwards]" <nospam home.com> writes:

On 11/20/2010 11:22 PM, Tomek Sowi�ski wrote:
 Dnia 20-11-2010 o 13:33:29 spir <denis.spir gmail.com> napisa�(a):

 I find this proposal really necessary. But aren't there two issues here?
 * Comparison (for lookup) by value equality should not care about
 qualifiers (ie compare raw content, here plain array memory areas).
 * Assignment should perform "qualification conversion" automatically, eg
 char[] chars = "abc";
 string s = chars;
 This involves no implicit magic here, as target qualification is
 explicit. So, why not?

 It's busting the whole const system to smithereens.

 char[] chars = "abc";
 char[] backdoor = chars;
 string s = chars;
 assert (s == "abc");
 backdoor.front = 'k';
 assert (s == "abc"); // fails. not so immutable, huh?

What would be the harm if upon seeing your code the compiler did this:

char[] chars = "abc".dup;
char[] backdoor = chars;
string s = chars.idup;
assert (s == "abc");
backdoor.front = 'k';	// [1]
assert (s == "abc");

Slightly magical but works according to expectation. I don't see the 
problem.

[1] Assuming your are using std.array, this would fail because 
backdoor.front is not an lvalue.

 There is a repetitive programming pattern in D:
 * play around with *string's in general
 * as soon as text processing is needed, convert to *char[]
 * when finished, convert back to *string

 Meh, immutable strings make life easier.

Nov 20 2010

=?iso-8859-2?B?VG9tZWsgU293afFza2k=?= <just ask.me> writes:

Tyro[a.c.edwards] <nospam home.com> napisa=B3(a):

 What would be the harm if upon seeing your code the compiler did this:=

  char[] chars =3D "abc".dup;
 char[] backdoor =3D chars;
 string s =3D chars.idup;
 assert (s =3D=3D "abc");
 backdoor.front =3D 'k';	// [1]
 assert (s =3D=3D "abc");
  Slightly magical but works according to expectation. I don't see the =

 =

 problem.
  [1] Assuming your are using std.array, this would fail because  =

 backdoor.front is not an lvalue.

The harm is confusion. Now =3D on arrays always means aliasing, but with=
  =

your proposal, it may *sometimes* mean dupping. Imagine real-life code  =

with type aliasing and type inference in play, and trying to determine  =

whether some line makes a dupping string<->char[] conversion or not.

What would be the harm if everyone just put .(i)dup where it belongs?

-- =

Tomek

Nov 20 2010

"Tyro[a.c.edwards]" <nospam home.com> writes:

On 11/21/2010 9:23 AM, Tomek Sowi�ski wrote:
 Tyro[a.c.edwards] <nospam home.com> napisa�(a):

 What would be the harm if upon seeing your code the compiler did this:
 char[] chars = "abc".dup;
 char[] backdoor = chars;
 string s = chars.idup;
 assert (s == "abc");
 backdoor.front = 'k'; // [1]
 assert (s == "abc");
 Slightly magical but works according to expectation. I don't see the
 problem.
 [1] Assuming your are using std.array, this would fail because
 backdoor.front is not an lvalue.

 The harm is confusion. Now = on arrays always means aliasing, but with
 your proposal, it may *sometimes* mean dupping. Imagine real-life code
 with type aliasing and type inference in play, and trying to determine
 whether some line makes a dupping string<->char[] conversion or not.

alias char[] firstClassString;
alias firstClassString stringGold;
alias stringGold lightAtEndOfTunnel;
alias lightAtEndOfTunnel daBomb;
alias daBomb iAmThe;

iAmThe man;

if I want to know the actual type of shiznit:

writeln(typeid(man)); // outputs char[]

 What would be the harm if everyone just put .(i)dup where it belongs?

Actually, I don't see any harm in using .(i)dup. However, my limited 
experience does not allow me to think on the same level as you so I'm 
just trying to understand your point a little better. That being said, I 
still don't understand the gravity of the implied problem.

Nov 20 2010

=?iso-8859-2?B?VG9tZWsgU293afFza2k=?= <just ask.me> writes:

Dnia 21-11-2010 o 02:02:35 Tyro[a.c.edwards] <nospam home.com> napisa=B3=
(a):

 The harm is confusion. Now =3D on arrays always means aliasing, but w=


ith
 your proposal, it may *sometimes* mean dupping. Imagine real-life cod=


e
 with type aliasing and type inference in play, and trying to determin=


e
 whether some line makes a dupping string<->char[] conversion or not.

  alias char[] firstClassString;
 alias firstClassString stringGold;
 alias stringGold lightAtEndOfTunnel;
 alias lightAtEndOfTunnel daBomb;
 alias daBomb iAmThe;
  iAmThe man;
  if I want to know the actual type of shiznit:
  writeln(typeid(man)); // outputs char[]

  What would be the harm if everyone just put .(i)dup where it belongs=


?
   Actually, I don't see any harm in using .(i)dup. However, my limited=

  =

 experience does not allow me to think on the same level as you so I'm =

 =

 just trying to understand your point a little better. That being said,=

 I  =

 still don't understand the gravity of the implied problem.

The status quo is imposing a tiny upfront tax on the writer of typing  =

.(i)dup.
The proposal is imposing a minor tax on the reader of knowing another ru=
le  =

and, in case of convoluted code like above, minor tax of determining if =
 =

this rule applies. Plus, some tax on the implementor to put this rule in=
to  =

the compiler.

The point is the aggregated tax imposed by the proposal is much greater =
 =

because although minor it grows linearly with the number of readers (cod=
e  =

is read much more often then written). And even with finite set of  =

readers, the tax grows till infinity because with time people forget wha=
t  =

they read, therefore the tax is collected anew.

And think of the gains -- dropping .(i)dup doesn't really buy you much  =

anyway.

-- =

Tomek$

Nov 20 2010

"Tyro[a.c.edwards]" <nospam home.com> writes:

On 11/21/2010 10:39 AM, Tomek Sowi�ski wrote:
 Dnia 21-11-2010 o 02:02:35 Tyro[a.c.edwards] <nospam home.com> napisa�(a):

 The harm is confusion. Now = on arrays always means aliasing, but with
 your proposal, it may *sometimes* mean dupping. Imagine real-life code
 with type aliasing and type inference in play, and trying to determine
 whether some line makes a dupping string<->char[] conversion or not.

 alias char[] firstClassString;
 alias firstClassString stringGold;
 alias stringGold lightAtEndOfTunnel;
 alias lightAtEndOfTunnel daBomb;
 alias daBomb iAmThe;
 iAmThe man;
 if I want to know the actual type of shiznit:
 writeln(typeid(man)); // outputs char[]

 What would be the harm if everyone just put .(i)dup where it belongs?

 Actually, I don't see any harm in using .(i)dup. However, my limited
 experience does not allow me to think on the same level as you so I'm
 just trying to understand your point a little better. That being said,
 I still don't understand the gravity of the implied problem.

 The status quo is imposing a tiny upfront tax on the writer of typing
 .(i)dup.
 The proposal is imposing a minor tax on the reader of knowing another
 rule and, in case of convoluted code like above, minor tax of
 determining if this rule applies. Plus, some tax on the implementor to
 put this rule into the compiler.

OK! Makes sense.

 The point is the aggregated tax imposed by the proposal is much greater
 because although minor it grows linearly with the number of readers
 (code is read much more often then written). And even with finite set of
 readers, the tax grows till infinity because with time people forget
 what they read, therefore the tax is collected anew.

 And think of the gains -- dropping .(i)dup doesn't really buy you much
 anyway.

Fair enough. Like I said, I'm not proposing that this happens. Just 
trying to understand why it shouldn't. Thanks for entertaining my questions.

Nov 20 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-11-20 03:07:57 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 TDPL has an example that can be reduced as follows:
 
 void main() {
    uint[string] map;
    foreach (line; stdin.byLine()) {
      ++map[line];
    }
 }
 
 byLine reuses its buffer so it exposes it as char[]. Therefore, 
 attempting to use map[line] will fail. The program compiled and did the 
 wrong thing because of a bug.
 
 The challenge is devising a means to make the program compile and work 
 as expected. Looking up an uint[string] with a char[] should work, and 
 if the char[] is to be inserted, a copy should be made (via .idup or 
 to!string). The rule needs to be well defined and reasonably general.
 
 The effect is something like this:
 
 void main() {
    uint[string] map;
    foreach (line; stdin.byLine()) {
      auto p = line in map;
      if (p) ++*p;
      else map[line.idup] = 1;
    }
 }
 
 Ideally the programmer would write the simple code (first snippet) and 
 achieve the semantics of the more complex code (second snippet). Any 
 ideas?

But didn't you agree with me yesterday in another thread that it's best 
to make the caller responsible for the idup in cases where you need a 
string to be immutable? Now you want the reverse for AAs?

Consider this example:

	uint[string][100] maps;
	foreach (line; stdin.byLine()) {
		foreach (map; maps) {
			++map[line];
		}
	}

With your proposal, you'd be silently iduping 100 times each line to 
produce 100 times the same immutable string.

It's true that it's difficult to make the caller responsible for 
iduping the string since whether you need a true immutable string or 
not depends on a test, but I'm still wary about functions that idup 
things on your behalf.


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Nov 20 2010

"Tyro[a.c.edwards]" <nospam home.com> writes:

On 11/20/2010 9:39 PM, Michel Fortin wrote:
 On 2010-11-20 03:07:57 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 TDPL has an example that can be reduced as follows:

 void main() {
 uint[string] map;
 foreach (line; stdin.byLine()) {
 ++map[line];
 }
 }

 byLine reuses its buffer so it exposes it as char[]. Therefore,
 attempting to use map[line] will fail. The program compiled and did
 the wrong thing because of a bug.

 The challenge is devising a means to make the program compile and work
 as expected. Looking up an uint[string] with a char[] should work, and
 if the char[] is to be inserted, a copy should be made (via .idup or
 to!string). The rule needs to be well defined and reasonably general.

 The effect is something like this:

 void main() {
 uint[string] map;
 foreach (line; stdin.byLine()) {
 auto p = line in map;
 if (p) ++*p;
 else map[line.idup] = 1;
 }
 }

 Ideally the programmer would write the simple code (first snippet) and
 achieve the semantics of the more complex code (second snippet). Any
 ideas?

 But didn't you agree with me yesterday in another thread that it's best
 to make the caller responsible for the idup in cases where you need a
 string to be immutable? Now you want the reverse for AAs?

 Consider this example:

 uint[string][100] maps;
 foreach (line; stdin.byLine()) {
 foreach (map; maps) {
 ++map[line];
 }
 }

 With your proposal, you'd be silently iduping 100 times each line to
 produce 100 times the same immutable string.

Maybe I'm not knowledgeable enough to understand but, if I need to do 
this under the current constraints I would have to write 
++map[line.idup]. In effect I end up duping 100 times anyway. Or is 
there another solution that I'm just not seeing?

 It's true that it's difficult to make the caller responsible for iduping
 the string since whether you need a true immutable string or not depends
 on a test, but I'm still wary about functions that idup things on your
 behalf.

Nov 20 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-11-20 18:04:29 -0500, "Tyro[a.c.edwards]" <nospam home.com> said:

 On 11/20/2010 9:39 PM, Michel Fortin wrote:
 But didn't you agree with me yesterday in another thread that it's best
 to make the caller responsible for the idup in cases where you need a
 string to be immutable? Now you want the reverse for AAs?
 
 Consider this example:
 
 uint[string][100] maps;
 foreach (line; stdin.byLine()) {
 foreach (map; maps) {
 ++map[line];
 }
 }
 
 With your proposal, you'd be silently iduping 100 times each line to
 produce 100 times the same immutable string.

 
 Maybe I'm not knowledgeable enough to understand but, if I need to do 
 this under the current constraints I would have to write 
 ++map[line.idup]. In effect I end up duping 100 times anyway. Or is 
 there another solution that I'm just not seeing?

One solution is to just idup the line before entering the inner loop.

	uint[string][100] maps;
	foreach (line; stdin.byLine()) {
		string iline = line.idup;
		foreach (map; maps) {
			++map[iline];
		}
	}

The best solution avoid unnecessary allocations in the problem above to 
would be to idup the line only once for the first map that doesn't 
already contain the key. This way if all maps already have the key you 
don't do any idup, but you are still sure that you won't idup the same 
line twice.

	uint[string][100] maps;
	foreach (line; stdin.byLine()) {
		string iline = null;
		foreach (map; maps) {
			uint* value = line in map;
			if (value)
				++(*value);
			else {
				if (iline == null)
					iline = line.idup;
				++map[iline];
			}
		}
	}

The key point is that by making the caller responsible of calling idup, 
the inefficiency appears more clearly and can be acted upon more 
easily, if deemed necessary.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Nov 20 2010

Fawzi Mohamed <fawzi gmx.ch> writes:

I was thinking about this, and revised my idea.
While I think that in general hiding idup is a bad practice, I think  
that for AA one could make the case that it might be acceptable.
idup is needed only when assigning a value using a key that isn't  
immutable.
Now I think that this use in AA is relatively seldom, one normally  
wants to read more than write, so some magic might be warranted.
If one has a use case where the the idup really becomes an issue he  
should expend the extra thinking to avoid it.
The advantage of this is that idup is not needed for basic operations,  
which I think is good, as one doesn't have to confront himself with  
the const complexity so early when using the language. const should be  
kid of avoidable in several tasks if one doesn't care about it.
What I am still uneasy about the original example is its reliance on  
the automatic initialization to 0 for undefined keys, an AA normally  
throws when the key is not present, it doesn't return the default  
value, why should updating be different?.

Fawzi

Nov 21 2010

=?iso-8859-2?B?VG9tZWsgU293afFza2k=?= <just ask.me> writes:

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> napisa=B3(a):

 TDPL has an example that can be reduced as follows:

 void main() {
    uint[string] map;
    foreach (line; stdin.byLine()) {
      ++map[line];
    }
 }

 byLine reuses its buffer so it exposes it as char[]. Therefore,  =

 attempting to use map[line] will fail. The program compiled and did th=

e  =

 wrong thing because of a bug.

 The challenge is devising a means to make the program compile and work=

  =

 as expected. Looking up an uint[string] with a char[] should work, and=

  =

 if the char[] is to be inserted, a copy should be made (via .idup or  =

 to!string). The rule needs to be well defined and reasonably general.

 The effect is something like this:

 void main() {
    uint[string] map;
    foreach (line; stdin.byLine()) {
      auto p =3D line in map;
      if (p) ++*p;
      else map[line.idup] =3D 1;
    }
 }

 Ideally the programmer would write the simple code (first snippet) and=

  =

 achieve the semantics of the more complex code (second snippet). Any  =

 ideas?

To be honest I find the 2nd snippet more readable, it's just one line mo=
re:

   uint[string] map;
   foreach (line; stdin.byLine()) {
     if (auto p =3D line in map) ++*p;
     else map[line.idup] =3D 1;
   }

You know exactly what it does and it's const correct. I don't see value =
in  =

implicit cloning.

The first example smells of STL map[key] -- if key is not found, it  =

implicitly inserts key, initializes the value, and returns a reference. =
 =

Let's not go there.

-- =

Tomek

Nov 20 2010

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

On 20/11/2010 08:07, Andrei Alexandrescu wrote:
 TDPL has an example that can be reduced as follows:

 void main() {
 uint[string] map;
 foreach (line; stdin.byLine()) {
 ++map[line];
 }
 }

 byLine reuses its buffer so it exposes it as char[]. Therefore,
 attempting to use map[line] will fail. The program compiled and did the
 wrong thing because of a bug.

 The challenge is devising a means to make the program compile and work
 as expected. Looking up an uint[string] with a char[] should work, and
 if the char[] is to be inserted, a copy should be made (via .idup or
 to!string). The rule needs to be well defined and reasonably general.

 The effect is something like this:

 void main() {
 uint[string] map;
 foreach (line; stdin.byLine()) {
 auto p = line in map;
 if (p) ++*p;
 else map[line.idup] = 1;
 }
 }

 Ideally the programmer would write the simple code (first snippet) and
 achieve the semantics of the more complex code (second snippet). Any ideas?

 Andrei

An idea:

	import std.conv;
	//...
	uint[string] map;
	foreach (line; stdin.byLine()) {
		map.getOrPut(line, 0, to!string) ++;
	}


Explanation:
getOrPut retrieves entry under key "line", returns a ref value if found. 
If not found, it creates a new entry with value 0 (second parameter), 
and with key to!string(line), and again returns the ref value. So the 
third parameter is a delegate/function that creates a new key from the 
lookup key given as the first parameter. There is a contract in this 
creator function that the created key must equal the lookup key. Thus 
there may be some minor optimization versus the second snippet of the 
OP, since getOrPut doesn't need to calculate hashCode twice.

getOrPut might implicitly know how to convert certain objects, like 
const(char) to string, so
	map.getOrPut(line, 0, to!string) ++;
could be just:
	map.getOrPut(line, 0) ++;


There should also be another signature of getOrPut in which the second 
parameter is delegate/function (a value creator), for situations where 
the entry value is not unexpensive to create. For example:

	map.getOrPut(line, { return new Integer(0); }, to!string) ++;


-- 
Bruno Medeiros - Software Engineer

Dec 02 2010

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

On 02/12/2010 14:18, Bruno Medeiros wrote:
 On 20/11/2010 08:07, Andrei Alexandrescu wrote:
 TDPL has an example that can be reduced as follows:

 void main() {
 uint[string] map;
 foreach (line; stdin.byLine()) {
 ++map[line];
 }
 }

 byLine reuses its buffer so it exposes it as char[]. Therefore,
 attempting to use map[line] will fail. The program compiled and did the
 wrong thing because of a bug.

 The challenge is devising a means to make the program compile and work
 as expected. Looking up an uint[string] with a char[] should work, and
 if the char[] is to be inserted, a copy should be made (via .idup or
 to!string). The rule needs to be well defined and reasonably general.

 The effect is something like this:

 void main() {
 uint[string] map;
 foreach (line; stdin.byLine()) {
 auto p = line in map;
 if (p) ++*p;
 else map[line.idup] = 1;
 }
 }

 Ideally the programmer would write the simple code (first snippet) and
 achieve the semantics of the more complex code (second snippet). Any
 ideas?

 Andrei

 An idea:

 import std.conv;
 //...
 uint[string] map;
 foreach (line; stdin.byLine()) {
 map.getOrPut(line, 0, to!string) ++;
 }

Also, it doesn't use pointers, which I guess may be a bit of an advantage.

-- 
Bruno Medeiros - Software Engineer

Dec 02 2010

D Programming

C/C++ Programming

Other

digitalmars.D - Principled method of lookup-or-insert in associative arrays?