
digitalmars.D - Consistency

reply "Steve D" <whatchcallamit weedline.com> writes:
Python (built-in)

dict1 = {"a":1,"b":2}
tup1  = (0,1,2,3)
arr1  = [0,1,2,3]  # list type
str1  = "abcd"

print "b" in dict1    # True
print 3 in tup1       # True
print 3 in arr1       # True
print "c" in str1     # True

print tup1.index(2)   # 2
print arr1.index(2)   # 2
print str1.index("c") # 2

There is some sort of consistency of use, though they are 
different sequence/collection types.

Now try the same in D

auto dict1 = ["a":1,"b":2];
auto tup1  = tuple(0,1,2,3);
auto arr1  = [0,1,2,3];
auto str1  = "abcd";

Having some consistency involving sequences/arrays/strings/collections etc., which are the foundations of any language, makes programming much easier, more intuitive and pleasant. It shouldn't be too difficult for a super bare-metal language like D.
I'm honestly not knocking D (I love it), just some thoughts (maybe to provoke some discussion, for D3?)
Feb 15 2015
next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Sunday, 15 February 2015 at 17:18:08 UTC, Steve D wrote:
 Having some consistency involving 
 sequences/arrays/strings/collections etc, which are the 
 foundations of any language makes programming much easier, 
 intuitive and pleasant.
And also more easily wrong: 1 in [1,2,3] is a slow search, so the D strategy is to make it look different to draw your attention to the potential problem.
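To make the cost difference concrete, here is a small Python sketch (the `Probe` wrapper is a contrived instrument, not anything from the thread) that counts how many equality comparisons each kind of `in` actually performs:

```python
class Probe:
    """Wraps an int and counts how many equality comparisons it takes part in."""
    calls = 0

    def __init__(self, v):
        self.v = v

    def __hash__(self):
        return hash(self.v)

    def __eq__(self, other):
        Probe.calls += 1
        return self.v == getattr(other, "v", other)

data = list(range(1000))

Probe.calls = 0
found = Probe(999) in data        # list `in`: linear scan, one comparison per element
list_comparisons = Probe.calls

lookup = set(data)
Probe.calls = 0
found2 = Probe(999) in lookup     # set `in`: hashed lookup, straight to the bucket
set_comparisons = Probe.calls

print(found, found2, list_comparisons, set_comparisons)
```

Both searches succeed, but the list search does on the order of a thousand comparisons where the set does a handful, which is exactly the difference the uniform syntax hides.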
Feb 15 2015
prev sibling next sibling parent reply "Xinok" <xinok live.com> writes:
On Sunday, 15 February 2015 at 17:18:08 UTC, Steve D wrote:
 Python (built-in)

 dict1 = {"a":1,"b":2}
 tup1  = (0,1,2,3)
 arr1  = [0,1,2,3]  # list type
 str1  = "abcd"

 print "b" in dict1    # True
 print 3 in tup1       # True
 print 3 in arr1       # True
 print "c" in str1     # True

 print tup1.index(2)   # 2
 print arr1.index(2)   # 2
 print str1.index("c") # 2

 There is some sort of consistency of use, though they are 
 different sequence/collection types.

 Now try the same in D

 auto dict1 = ["a":1,"b":2];
 auto tup1  = tuple(0,1,2,3);
 auto arr1  = [0,1,2,3];
 auto str1  = "abcd";

 Having some consistency involving 
 sequences/arrays/strings/collections etc, which are the 
 foundations of any language makes programming much easier, 
 intuitive and pleasant. It shouldn't be too difficult for a 
 super bare-metal language like D.
 I'm honestly not knocking D (I love it), just some thoughts 
 (maybe provoke some discussion?, for D3?)
Part of the issue is that "in" means to search by key, not by value. In that respect, it only makes sense for dictionaries (and perhaps sets). I guess the idea is that this generic code should be valid for any container which implements the in operator:

    if(a in r) writeln(r[a]);

On a different note, I do wish D had tuple literals like Python has.
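Python's own containers show the same key-vs-value split; a minimal illustration:

```python
d = {"a": 1, "b": 2}

# For dicts, `in` tests keys, exactly as for D's associative arrays:
assert "a" in d
assert 1 not in d            # 1 is a value, not a key
assert 1 in d.values()       # searching values must be asked for explicitly

# The generic idiom from the post, in Python form:
if "a" in d:
    print(d["a"])            # safe: membership was checked on the key
```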
Feb 15 2015
parent "bearophile" <bearophileHUGS lycos.com> writes:
Xinok:

 In that respect, it only makes sense for dictionaries
Values as in Python are much more useful for arrays :-)

Meta:
 D is better off than Python in this case,
I've never subscribed to that point of view :-)

Bye,
bearophile
Feb 15 2015
prev sibling parent reply "Meta" <jared771 gmail.com> writes:
On Sunday, 15 February 2015 at 17:18:08 UTC, Steve D wrote:
 Python (built-in)

 dict1 = {"a":1,"b":2}
 tup1  = (0,1,2,3)
 arr1  = [0,1,2,3]  # list type
 str1  = "abcd"

 print "b" in dict1    # True
O(1) lookup
 print 3 in tup1       # True
O(n) lookup
 print 3 in arr1       # True
O(n) lookup
 print "c" in str1     # True
O(n) lookup

This has been discussed to death here. D is better off than Python in this case, as it doesn't hide the fact that it's doing an O(n) operation sometimes and an O(1) operation other times.
 print tup1.index(2)   # 2
No way to do this in D
 print arr1.index(2)   # 2
arr1.countUntil(2)
 print str1.index("c") # 2
str1.countUntil("c")
 There is some sort of consistency of use, though they are 
 different sequence/collection types.
The problem is mainly with Tuple, which is not a range (arrays and strings are). You can, however, write a small utility function for this. Now that I think about it, it can probably be made O(1) as well.
Feb 15 2015
next sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Sunday, 15 February 2015 at 18:15:13 UTC, Meta wrote:
 print tup1.index(2)   # 2
No way to do this in D
tup1.expand.only.countUntil(2).writeln;

Admittedly, it's a little longer than expected :)
Feb 15 2015
parent "bearophile" <bearophileHUGS lycos.com> writes:
John Colvin:

 tup1.expand.only.countUntil(2).writeln;

 Admittedly, it's a little longer than expected :)
A little shorter:

    tup1[].only.countUntil(2).writeln;

Bye,
bearophile
Feb 15 2015
prev sibling parent reply "Xinok" <xinok live.com> writes:
On Sunday, 15 February 2015 at 18:15:13 UTC, Meta wrote:
 On Sunday, 15 February 2015 at 17:18:08 UTC, Steve D wrote:
 Python (built-in)

 dict1 = {"a":1,"b":2}
 tup1  = (0,1,2,3)
 arr1  = [0,1,2,3]  # list type
 str1  = "abcd"

 print "b" in dict1    # True
O(1) lookup
A small nitpick, but I'm sure that should be O(log n). Dictionaries don't have constant lookup time.
Feb 15 2015
parent reply "Meta" <jared771 gmail.com> writes:
On Sunday, 15 February 2015 at 18:20:10 UTC, Xinok wrote:
 On Sunday, 15 February 2015 at 18:15:13 UTC, Meta wrote:
 On Sunday, 15 February 2015 at 17:18:08 UTC, Steve D wrote:
 Python (built-in)

 dict1 = {"a":1,"b":2}
 tup1  = (0,1,2,3)
 arr1  = [0,1,2,3]  # list type
 str1  = "abcd"

 print "b" in dict1    # True
O(1) lookup
A small nitpick, but I'm sure that should be O(log n). Dictionaries don't have constant lookup time.
Oh, whoops. I mixed up average-case complexity with worst-case. Although, isn't lookup O(n) in the worst case for hash tables?
Feb 15 2015
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Meta:

 Oh, whoops. I mixed up average-case complexity with worst-case. 
 Although, isn't lookup O(n) in the worst case for hash tables?
D associative arrays used to be O(1) amortized and O(n ln n) in the worst case. Now they are O(1) amortized and O(n) in the worst case. And it's not too hard for an adversary to find and hit that O(n) case.

Bye,
bearophile
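The adversarial case bearophile mentions is easy to demonstrate in Python with a deliberately degenerate hash (the `Evil` wrapper below is a contrived illustration, not D's AA implementation):

```python
class Evil:
    """Adversarial key: every instance hashes to the same bucket."""
    eq_calls = 0

    def __init__(self, v):
        self.v = v

    def __hash__(self):
        return 42                    # constant hash: all keys collide

    def __eq__(self, other):
        Evil.eq_calls += 1
        return isinstance(other, Evil) and self.v == other.v

n = 200
table = {Evil(i): i for i in range(n)}

Evil.eq_calls = 0
_ = table[Evil(n - 1)]               # worst case: walks the whole collision chain
print(Evil.eq_calls)                 # roughly n comparisons instead of ~1
```

With honest hashing a lookup does about one comparison; here it degrades to a linear walk, which is the O(n) case an attacker can force with prepared keys.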
Feb 15 2015
next sibling parent reply "Meta" <jared771 gmail.com> writes:
On Sunday, 15 February 2015 at 18:45:45 UTC, bearophile wrote:
 Meta:

 Oh, whoops. I mixed up average-case complexity with 
 worst-case. Although, isn't lookup O(n) in the worst case for 
 hash tables?
D associative arrays used to be O(1) amortized and O(n ln n) in worst case. Now they are O(1) amortized and O(n) worst case. And for an adversary it's not too much hard to find and hit that O(n) case. Bye, bearophile
Hmm, if that's the case, then the logic for not allowing "in" for arrays and other collections seems to be quite weak. However, what if the AA implementation is changed again so that lookup is worst-case O(n*logn)? Should we then again disable "in" for arrays, etc.?
Feb 15 2015
parent "Baz" <vv.temp gmx.com> writes:
On Sunday, 15 February 2015 at 19:04:57 UTC, Meta wrote:
 On Sunday, 15 February 2015 at 18:45:45 UTC, bearophile wrote:
 Meta:

 Oh, whoops. I mixed up average-case complexity with 
 worst-case. Although, isn't lookup O(n) in the worst case for 
 hash tables?
D associative arrays used to be O(1) amortized and O(n ln n) in worst case. Now they are O(1) amortized and O(n) worst case. And for an adversary it's not too much hard to find and hit that O(n) case. Bye, bearophile
Hmm, if that's the case, then the logic for not allowing "in" for arrays and other collections seems to be quite weak. However, what if the AA implementation is changed again so that lookup is worst-case O(n*logn)? Should we then again disable "in" for arrays, etc.?
Where does the language spec say that "in" must only be used if the complexity is O this or O that? (Seriously, show me the link...) It's understandable that Phobos respects this consensus, but please don't deactivate opIn_r for users... "in" makes sense in many contexts.
Feb 15 2015
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/15/15 10:45 AM, bearophile wrote:
 Meta:

 Oh, whoops. I mixed up average-case complexity with worst-case.
 Although, isn't lookup O(n) in the worst case for hash tables?
D associative arrays used to be O(1) amortized and O(n ln n) in worst case. Now they are O(1) amortized and O(n) worst case. And for an adversary it's not too much hard to find and hit that O(n) case.
I remember associative arrays were a lot slower and bigger before. -- Andrei
Feb 15 2015
parent reply "Steve D" <whatchcallamit weedline.com> writes:
Well, it provoked a little discussion, and I remain unconvinced.

Why differentiate between 'in' for an associative array and 'in' for an unordered sequence/array? This implies that the programmer is an idiot who believes that trawling through an unordered sequence is as fast as a dictionary key lookup. Such programmers should not be coding, never went to school, or are probably one of those new kids who only fanny-about with javascript frameworks.

The point is that Python's 'in' or 'index()', whatever, gives the fastest implementation for the underlying data type (the same as canFind will probably give the quickest result). The coder can trust this, and then use the common idiom instead of wondering (as a new D coder): is it an 'in'? a 'canFind'? an 'indexOf'? a 'countUntil'? Is it built in? Is it in std.algorithm? What's the typical lookup for this sequence?

The second point is that common idioms across data types make for consistent, simpler, more intuitive coding, whilst also trusting that the language implementors have spent time optimizing the different underlying implementations.

Think about new D coders, or those coming from other languages or planets.

Apparently this has all been done to death so, yeah, like whatever..

Arguing for inconsistency means you are all retarded :)
Feb 15 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/15/15 12:29 PM, Steve D wrote:
 Well, it provoked a little discussion, and I remain unconvinced.

 Why differentiate between 'in' for an associative-array and 'in' for an
 unordered sequence/array? This implies that the programmer is an idiot who
 believes that trawling through an unordered sequence is as fast as a
 dictionary
 key lookup. Such programmers should not be coding, never went to school
 or are
 probably one of those new kids who only fanny-about with javascript
 frameworks.

 The point is that python's 'in' or 'index()' .. whatever, give the fastest
 implementation for the underlying data type. (the same as canFind will
 probably
 give the quickest result)
 The coder can trust this, and then use the common idiom instead of
 wondering
 (as a new D coder) -  is it an 'in'?, a 'canFind'? an 'indexOf'? a
 'countUntil'?
 Is it builtin? Is it in std.algorithm? What's the typical lookup for
 this seq?

 The second point is that common idioms across datatypes, make for
 consistent,
 simpler intuitive coding, whilst also trusting that the language
 implementors
 have spent time optimizing the different underlying implementations.

 Think about new D coders, or those coming from other languages or planets.

 Apparently this has all been done to death so, yeah, like whatever..

 Arguing for inconsistency means you are all retarded :)
We're good as we are. D's standard library goes for a proportional response in syntax space to the cost of search. This is STL's influence, where this approach has worked better than "best effort under uniform syntax" (e.g. std::distance()). std.container follows the same convention, see e.g. APIs with "linear" in their name.

This is not enforced or prescriptive. User code is free to choose different conventions.

The one possible change would be to allow "in" with statically-sized arrays, under the assumption that the amount of data searched is fixed and proportional to the size of the program.

Andrei
Feb 15 2015
parent reply "Marc Schütz" <schuetzm gmx.net> writes:
On Sunday, 15 February 2015 at 21:23:13 UTC, Andrei Alexandrescu 
wrote:
 The one possible change would be to allow "in" with 
 statically-sized arrays, under the assumption that the amount 
 of data searched is fixed and proportional to the size of the 
 program.
To be really consistent,

    x in arr

would need to be equivalent to:

    (x >= 0) && (x < arr.length)

`in` tests for the presence of a _key_ in AAs, and the equivalent notion of a key for arrays is an index.
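Marc's analogy can be made concrete in Python by treating a list's indices as its keys (the `enumerate`-based dict below is just an illustration):

```python
arr = [10, 20, 30, 40]
as_map = dict(enumerate(arr))    # {0: 10, 1: 20, 2: 30, 3: 40}

# Under key semantics, `in` on the dict view tests indices, not values:
assert 2 in as_map               # index 2 exists, i.e. 0 <= 2 < len(arr)
assert 30 not in as_map          # 30 is a value, not an index
assert 30 in arr                 # Python's list `in` searches values instead
```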
Feb 16 2015
parent reply "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Marc Schütz" wrote in message 
news:iftpzvhoyxxqhbfsxull forum.dlang.org...

 To be really consistent,
      x in arr
 would need to be equivalent to:
      (x >= 0) && (x < arr.length)

 `in` tests for the presence of a _key_ in AAs, and the equivalent notion 
 of a key for arrays is an index.
It's called 'in', not 'haskey'. Is 3 in the array? Is 7 in the map? Everybody understands what it means and the whole argument is nonsense. Next somebody will be arguing that float/float and int/int aren't the same operation and should have different syntax.
Feb 16 2015
next sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Monday, 16 February 2015 at 13:10:44 UTC, Daniel Murphy wrote:
 "Marc Schütz" wrote in message 
 news:iftpzvhoyxxqhbfsxull forum.dlang.org...

 To be really consistent,
     x in arr
 would need to be equivalent to:
     (x >= 0) && (x < arr.length)

 `in` tests for the presence of a _key_ in AAs, and the 
 equivalent notion of a key for arrays is an index.
It's called 'in', not 'haskey'. Is 3 in the array? Is 7 in the map? Everybody understands what it means and the whole argument is nonsense. Next somebody will be arguing that float/float and int/int aren't the same operation and should have different syntax.
I'm quite a fan of python's // operator for integer division, especially when combined with python 3's choice to make / always mean floating point division (i.e. 3/2 == float(3)/2, 3//2 == 1). It recognises that integer division is a weird thing and separates it from the much less weird floating point division.
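For reference, the Python 3 behaviour John describes, with floor (not truncating) semantics for `//`:

```python
import math

assert 3 / 2 == 1.5              # / is always true division in Python 3
assert 3 // 2 == 1               # // is integer (floor) division
assert -3 // 2 == -2             # floors toward negative infinity...
assert math.trunc(-3 / 2) == -1  # ...unlike C/D division, which truncates toward zero
assert 7.0 // 2 == 3.0           # // works on floats too, still flooring
```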
Feb 16 2015
parent "bearophile" <bearophileHUGS lycos.com> writes:
John Colvin:

 I'm quite a fan of python's // operator for integer division, 
 especially when combined with python 3's choice to make / 
 always mean floating point division (i.e. 3/2 == float(3)/2, 
 3//2 == 1). It recognises that integer division is a weird 
 thing and separates it from the much less weird floating point 
 division.
The C/D division operator semantics is a bug-prone design mistake, and this design was fixed in Python 3 even though that broke a lot of Python 2 code. It's a pitfall that a modern language must avoid. (I don't know how Rust handles division; I hope they have removed this problem.)

Bye,
bearophile
Feb 16 2015
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/16/15 5:10 AM, Daniel Murphy wrote:
 "Marc Schütz" wrote in message
 news:iftpzvhoyxxqhbfsxull forum.dlang.org...

 To be really consistent,
      x in arr
 would need to be equivalent to:
      (x >= 0) && (x < arr.length)

 `in` tests for the presence of a _key_ in AAs, and the equivalent
 notion of a key for arrays is an index.
It's called 'in', not 'haskey'. Is 3 in the array? Is 7 in the map? Everybody understands what it means and the whole argument is nonsense.
Yah, I agree. FWIW many times "consistency" is used, well, inconsistently. In a consistency argument the choice of benchmark (= what's the standard that new stuff must be consistent with?) is crucial. Oftentimes in entities as complex as programming languages, it's easy to make consistency arguments for numerous artifacts depending on the choice of benchmark. This is painfully evident in C++ - consistency-based arguments got diluted quite a bit since it became obvious they're so easy to make in favor of anything.
 Next somebody will be arguing that float/float and int/int aren't the
 same operation and should have different syntax.
I guess you just got taken downtown over that :o). Andrei
Feb 16 2015
prev sibling parent "Marc Schütz" <schuetzm gmx.net> writes:
On Monday, 16 February 2015 at 13:10:44 UTC, Daniel Murphy wrote:
 "Marc Schütz" wrote in message 
 news:iftpzvhoyxxqhbfsxull forum.dlang.org...

 To be really consistent,
     x in arr
 would need to be equivalent to:
     (x >= 0) && (x < arr.length)

 `in` tests for the presence of a _key_ in AAs, and the 
 equivalent notion of a key for arrays is an index.
It's called 'in', not 'haskey'. Is 3 in the array? Is 7 in the map? Everybody understands what it means and the whole argument is nonsense.
I disagree with that; the difference between keys and values is pretty obvious to me, as is the analogy with indices (both go inside the []). But the entire discussion is pointless, because `in` will not be changed (and for good reasons!).
Feb 17 2015
prev sibling next sibling parent "Craig Dillabaugh" <craig.dillabaugh gmail.com> writes:
On Sunday, 15 February 2015 at 18:45:45 UTC, bearophile wrote:
 Meta:

 Oh, whoops. I mixed up average-case complexity with 
 worst-case. Although, isn't lookup O(n) in the worst case for 
 hash tables?
D associative arrays used to be O(1) amortized and O(n ln n) in worst case. Now they are O(1) amortized and O(n) worst case. And for an adversary it's not too much hard to find and hit that O(n) case. Bye, bearophile
How is it possible to have a search structure that takes O(n ln n) time ... you would have to go out of your way to design something particularly bad.
Feb 15 2015
prev sibling parent reply "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"bearophile"  wrote in message news:jufdlgyynxiwbqubbbkx forum.dlang.org...

 D associative arrays used to be O(1) amortized and O(n ln n) in worst 
 case.
No, they were still O(n) worst case, for a single bucket with a degenerate binary tree.
Feb 15 2015
next sibling parent reply "Casper Færgemand" <shorttail hotmail.com> writes:
On Monday, 16 February 2015 at 06:23:06 UTC, Daniel Murphy wrote:
 "bearophile"  wrote in message 
 news:jufdlgyynxiwbqubbbkx forum.dlang.org...

 D associative arrays used to be O(1) amortized and O(n ln n) 
 in worst case.
No, they were still O(n) worst case, for a single bucket with a degenerate binary tree.
Java 8 changed its HashMap implementation to use red-black trees instead of linked lists for its buckets, if the keys can be sorted. That puts the worst-case lookup time from O(n) to O(log n) for sortable keys. Maybe that's worth considering for AAs?
Feb 16 2015
parent "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Casper Færgemand" wrote in message 
news:aenkruurfmfanarkeicm forum.dlang.org...

 Java 8 changed its HashMap implementation to use red-black trees 
 instead of linked lists for its buckets, if the keys can be sorted. 
 That puts the worst-case lookup time from O(n) to O(log n) for 
 sortable keys. Maybe that's worth considering for AAs?
I don't think that would pay off for most uses of builtin AAs. One day we'll probably get a fully customizable hash table in phobos that can do this kind of stuff.
Feb 16 2015
prev sibling next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Daniel Murphy:

 No, they were still O(n) worst case, for a single bucket with a 
 degenerate binary tree.
I see. I was unable to hit this degenerate case in my testing code, but I guess it was possible. Thank you.

Bye,
bearophile
Feb 16 2015
parent reply FG <home fgda.pl> writes:
 No, they were still O(n) worst case, for a single bucket with a degenerate
binary tree.
What binary tree? I don't see any.
 I see. I was unable to hit this degenerate case in my testing code, but I
guess that was possible. Thank you.
From what I can see in druntime/rt/aaA.d, the bucket is chosen by taking the key hash modulo the number of buckets. The number of buckets is a prime number (x in [31, 97, 389, 1543, ...]), so it is very unlikely to write a hash function that gives different results but the same (i % x) value.

The most probable cause would therefore be a hash function implementation that gives the exact same hash for a prepared set of keys. Then you'd get the worst-case scenario, in which you have to walk a list and compare keys one by one in O(n). But that's not a problem with the associative array implementation. It's a bad hash problem.
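The bucket choice FG describes can be sketched in a few lines of Python (the prime sizes are the ones quoted from aaA.d; the helper itself is illustrative, not D's actual code):

```python
PRIME_SIZES = [31, 97, 389, 1543]    # first entries of the druntime bucket-size table

def bucket_of(key_hash, nbuckets):
    """Bucket selection as described: key hash modulo the (prime) bucket count."""
    return key_hash % nbuckets

# Two different hashes land in the same bucket only if they agree mod nbuckets:
assert bucket_of(5, 31) == bucket_of(5 + 3 * 31, 31)

# Growing to the next prime size separates them again - which is why only
# identical hashes (i.e. a bad hash function) collide at every table size:
assert bucket_of(5, 97) != bucket_of(5 + 3 * 31, 97)
```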
Feb 16 2015
parent "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"FG"  wrote in message news:mbsk0i$1ukb$1 digitalmars.com...

 No, they were still O(n) worst case, for a single bucket with a 
 degenerate binary tree.
What binary tree? I don't see any.
There isn't one any more, we're talking about the previous implementation.
Feb 16 2015
prev sibling parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 2/16/15 1:23 AM, Daniel Murphy wrote:
 "bearophile"  wrote in message news:jufdlgyynxiwbqubbbkx forum.dlang.org...

 D associative arrays used to be O(1) amortized and O(n ln n) in worst
 case.
No, they were still O(n) worst case, for a single bucket with a degenerate binary tree.
IIRC, the AA code tried to make balanced trees, at least on rehash, but I thought it did so proactively on insertion too.

But in any case, the reason AAs are so slow is that the load factor is 4. As in, the AA will have 4x as many elements as buckets before it increases the number of buckets. I've brought it up before. Typically you want to keep the number of elements per bucket near 1 for O(1) lookup.

-Steve
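The load-factor arithmetic behind Steve's point, using the textbook separate-chaining estimates (a sketch; nothing here beyond the stated factor of 4 comes from druntime):

```python
def load_factor(num_elements, num_buckets):
    """Average number of elements per bucket under uniform hashing."""
    return num_elements / num_buckets

def expected_probes_successful(alpha):
    """Textbook separate-chaining estimate: ~1 + alpha/2 comparisons per hit."""
    return 1 + alpha / 2

# At a load factor of 4, an average successful lookup touches about three
# entries; near a load factor of 1, it touches about 1.5:
assert load_factor(4000, 1000) == 4.0
assert expected_probes_successful(4.0) == 3.0
assert expected_probes_successful(1.0) == 1.5
```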
Feb 16 2015