
digitalmars.D - Consistency

reply "Steve D" <whatchcallamit weedline.com> writes:
Python (built-in)

dict1 = {"a":1,"b":2}
tup1  = (0,1,2,3)
arr1  = [0,1,2,3]  # list type
str1  = "abcd"

print "b" in dict1    # True
print 3 in tup1       # True
print 3 in arr1       # True
print "c" in str1     # True

print tup1.index(2)   # 2
print arr1.index(2)   # 2
print str1.index("c") # 2

There is some sort of consistency of use, though they are 
different sequence/collection types.

Now try the same in D

auto dict1 = ["a":1,"b":2];
auto tup1  = tuple(0,1,2,3);
auto arr1  = [0,1,2,3];
auto str1  = "abcd";

Having some consistency involving sequences/arrays/strings/collections etc., which are the foundations of any language, makes programming much easier, more intuitive and pleasant. It shouldn't be too difficult for a super bare-metal language like D.
I'm honestly not knocking D (I love it), just some thoughts (maybe to provoke some discussion, for D3?)
Feb 15 2015
next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Sunday, 15 February 2015 at 17:18:08 UTC, Steve D wrote:
 Having some consistency involving 
 sequences/arrays/strings/collections etc, which are the 
 foundations of any language makes programming much easier, 
 intuitive and pleasant.
And also more easily wrong: 1 in [1,2,3] is a slow search, so the D strategy is to make it look different to draw your attention to the potential problem.
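To make the cost difference concrete, here is a small Python sketch (the `Probe` wrapper is a contrived instrument, not anything from the thread) that counts how many equality comparisons each kind of `in` actually performs:

```python
class Probe:
    """Wraps an int and counts how many equality comparisons it takes part in."""
    calls = 0

    def __init__(self, v):
        self.v = v

    def __hash__(self):
        return hash(self.v)

    def __eq__(self, other):
        Probe.calls += 1
        return self.v == getattr(other, "v", other)

data = list(range(1000))

Probe.calls = 0
found = Probe(999) in data        # list `in`: linear scan, one comparison per element
list_comparisons = Probe.calls

lookup = set(data)
Probe.calls = 0
found2 = Probe(999) in lookup     # set `in`: hashed lookup, straight to the bucket
set_comparisons = Probe.calls

print(found, found2, list_comparisons, set_comparisons)
```

Both searches succeed, but the list search does on the order of a thousand comparisons where the set does a handful, which is exactly the difference the uniform syntax hides.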
Feb 15 2015
prev sibling next sibling parent reply "Xinok" <xinok live.com> writes:
On Sunday, 15 February 2015 at 17:18:08 UTC, Steve D wrote:
 Python (built-in)

 dict1 = {"a":1,"b":2}
 tup1  = (0,1,2,3)
 arr1  = [0,1,2,3]  # list type
 str1  = "abcd"

 print "b" in dict1    # True
 print 3 in tup1       # True
 print 3 in arr1       # True
 print "c" in str1     # True

 print tup1.index(2)   # 2
 print arr1.index(2)   # 2
 print str1.index("c") # 2

 There is some sort of consistency of use, though they are 
 different sequence/collection types.

 Now try the same in D

 auto dict1 = ["a":1,"b":2];
 auto tup1  = tuple(0,1,2,3);
 auto arr1  = [0,1,2,3];
 auto str1  = "abcd";

 Having some consistency involving 
 sequences/arrays/strings/collections etc, which are the 
 foundations of any language makes programming much easier, 
 intuitive and pleasant. It shouldn't be too difficult for a 
 super bare-metal language like D.
 I'm honestly not knocking D (I love it), just some thoughts 
 (maybe provoke some discussion?, for D3?)
Part of the issue is that "in" means to search by key, not by value. In that respect, it only makes sense for dictionaries (and perhaps sets). I guess the idea is that this generic code should be valid for any container which implements the in operator:

    if(a in r) writeln(r[a]);

On a different note, I do wish D had tuple literals like Python has.
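Python's own containers show the same key-vs-value split; a minimal illustration:

```python
d = {"a": 1, "b": 2}

# For dicts, `in` tests keys, exactly as for D's associative arrays:
assert "a" in d
assert 1 not in d            # 1 is a value, not a key
assert 1 in d.values()       # searching values must be asked for explicitly

# The generic idiom from the post, in Python form:
if "a" in d:
    print(d["a"])            # safe: membership was checked on the key
```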
Feb 15 2015
parent "bearophile" <bearophileHUGS lycos.com> writes:
Xinok:

 In that respect, it only makes sense for dictionaries
Values as in Python are much more useful for arrays :-)

Meta:
 D is better off than Python in this case,
I've never subscribed to that point of view :-)

Bye,
bearophile
Feb 15 2015
prev sibling parent reply "Meta" <jared771 gmail.com> writes:
On Sunday, 15 February 2015 at 17:18:08 UTC, Steve D wrote:
 Python (built-in)

 dict1 = {"a":1,"b":2}
 tup1  = (0,1,2,3)
 arr1  = [0,1,2,3]  # list type
 str1  = "abcd"

 print "b" in dict1    # True
O(1) lookup
 print 3 in tup1       # True
O(n) lookup
 print 3 in arr1       # True
O(n) lookup
 print "c" in str1     # True
O(n) lookup

This has been discussed to death here. D is better off than Python in this case, as it doesn't hide the fact that it's doing an O(n) operation sometimes and an O(1) operation other times.
 print tup1.index(2)   # 2
No way to do this in D
 print arr1.index(2)   # 2
arr1.countUntil(2)
 print str1.index("c") # 2
str1.countUntil("c")
 There is some sort of consistency of use, though they are 
 different sequence/collection types.
The problem is mainly with Tuple, which is not a range (arrays and strings are). You can, however, write a small utility function for this. Now that I think about it, it can probably be made O(1) as well.
Feb 15 2015
next sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Sunday, 15 February 2015 at 18:15:13 UTC, Meta wrote:
 print tup1.index(2)   # 2
No way to do this in D
tup1.expand.only.countUntil(2).writeln;

Admittedly, it's a little longer than expected :)
Feb 15 2015
parent "bearophile" <bearophileHUGS lycos.com> writes:
John Colvin:

 tup1.expand.only.countUntil(2).writeln;

 Admittedly, it's a little longer than expected :)
A little shorter:

    tup1[].only.countUntil(2).writeln;

Bye,
bearophile
Feb 15 2015
prev sibling parent reply "Xinok" <xinok live.com> writes:
On Sunday, 15 February 2015 at 18:15:13 UTC, Meta wrote:
 On Sunday, 15 February 2015 at 17:18:08 UTC, Steve D wrote:
 Python (built-in)

 dict1 = {"a":1,"b":2}
 tup1  = (0,1,2,3)
 arr1  = [0,1,2,3]  # list type
 str1  = "abcd"

 print "b" in dict1    # True
O(1) lookup
A small nitpick, but I'm sure that should be O(log n). Dictionaries don't have constant lookup time.
Feb 15 2015
parent reply "Meta" <jared771 gmail.com> writes:
On Sunday, 15 February 2015 at 18:20:10 UTC, Xinok wrote:
 On Sunday, 15 February 2015 at 18:15:13 UTC, Meta wrote:
 On Sunday, 15 February 2015 at 17:18:08 UTC, Steve D wrote:
 Python (built-in)

 dict1 = {"a":1,"b":2}
 tup1  = (0,1,2,3)
 arr1  = [0,1,2,3]  # list type
 str1  = "abcd"

 print "b" in dict1    # True
O(1) lookup
A small nitpick, but I'm sure that should be O(log n). Dictionaries don't have constant lookup time.
Oh, whoops. I mixed up average-case complexity with worst-case. Although, isn't lookup O(n) in the worst case for hash tables?
Feb 15 2015
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Meta:

 Oh, whoops. I mixed up average-case complexity with worst-case. 
 Although, isn't lookup O(n) in the worst case for hash tables?
D associative arrays used to be O(1) amortized and O(n ln n) in the worst case. Now they are O(1) amortized and O(n) in the worst case. And it's not too hard for an adversary to find and hit that O(n) case.

Bye,
bearophile
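The adversarial case bearophile mentions is easy to demonstrate in Python with a deliberately degenerate hash (the `Evil` wrapper below is a contrived illustration, not D's AA implementation):

```python
class Evil:
    """Adversarial key: every instance hashes to the same bucket."""
    eq_calls = 0

    def __init__(self, v):
        self.v = v

    def __hash__(self):
        return 42                    # constant hash: all keys collide

    def __eq__(self, other):
        Evil.eq_calls += 1
        return isinstance(other, Evil) and self.v == other.v

n = 200
table = {Evil(i): i for i in range(n)}

Evil.eq_calls = 0
_ = table[Evil(n - 1)]               # worst case: walks the whole collision chain
print(Evil.eq_calls)                 # roughly n comparisons instead of ~1
```

With honest hashing a lookup does about one comparison; here it degrades to a linear walk, which is the O(n) case an attacker can force with prepared keys.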
Feb 15 2015
next sibling parent reply "Meta" <jared771 gmail.com> writes:
On Sunday, 15 February 2015 at 18:45:45 UTC, bearophile wrote:
 Meta:

 Oh, whoops. I mixed up average-case complexity with 
 worst-case. Although, isn't lookup O(n) in the worst case for 
 hash tables?
D associative arrays used to be O(1) amortized and O(n ln n) in worst case. Now they are O(1) amortized and O(n) worst case. And for an adversary it's not too much hard to find and hit that O(n) case. Bye, bearophile
Hmm, if that's the case, then the logic for not allowing "in" for arrays and other collections seems to be quite weak. However, what if the AA implementation is changed again so that lookup is worst-case O(n*logn)? Should we then again disable "in" for arrays, etc.?
Feb 15 2015
parent "Baz" <vv.temp gmx.com> writes:
On Sunday, 15 February 2015 at 19:04:57 UTC, Meta wrote:
 On Sunday, 15 February 2015 at 18:45:45 UTC, bearophile wrote:
 Meta:

 Oh, whoops. I mixed up average-case complexity with 
 worst-case. Although, isn't lookup O(n) in the worst case for 
 hash tables?
D associative arrays used to be O(1) amortized and O(n ln n) in worst case. Now they are O(1) amortized and O(n) worst case. And for an adversary it's not too much hard to find and hit that O(n) case. Bye, bearophile
Hmm, if that's the case, then the logic for not allowing "in" for arrays and other collections seems to be quite weak. However, what if the AA implementation is changed again so that lookup is worst-case O(n*logn)? Should we then again disable "in" for arrays, etc.?
Where does the language spec say that "in" must only be used if the complexity is O this or O that? (Seriously, show me the link...) It's understandable that Phobos respects this consensus, but please don't deactivate opIn_r for users... "in" makes sense in many contexts.
Feb 15 2015
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/15/15 10:45 AM, bearophile wrote:
 Meta:

 Oh, whoops. I mixed up average-case complexity with worst-case.
 Although, isn't lookup O(n) in the worst case for hash tables?
D associative arrays used to be O(1) amortized and O(n ln n) in worst case. Now they are O(1) amortized and O(n) worst case. And for an adversary it's not too much hard to find and hit that O(n) case.
I remember associative arrays were a lot slower and bigger before. -- Andrei
Feb 15 2015
parent reply "Steve D" <whatchcallamit weedline.com> writes:
Well, it provoked a little discussion, and I remain unconvinced.

Why differentiate between 'in' for an associative array and 'in' for an unordered sequence/array? This implies that the programmer is an idiot who believes that trawling through an unordered sequence is as fast as a dictionary key lookup. Such programmers should not be coding, never went to school, or are probably one of those new kids who only fanny-about with javascript frameworks.

The point is that Python's 'in' or 'index()', whatever, gives the fastest implementation for the underlying data type (the same as canFind will probably give the quickest result). The coder can trust this, and then use the common idiom instead of wondering (as a new D coder): is it an 'in'? a 'canFind'? an 'indexOf'? a 'countUntil'? Is it built in? Is it in std.algorithm? What's the typical lookup for this sequence?

The second point is that common idioms across data types make for consistent, simpler, more intuitive coding, whilst also trusting that the language implementors have spent time optimizing the different underlying implementations.

Think about new D coders, or those coming from other languages or planets.

Apparently this has all been done to death so, yeah, like whatever..

Arguing for inconsistency means you are all retarded :)
Feb 15 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/15/15 12:29 PM, Steve D wrote:
 Well, it provoked a little discussion, and I remain unconvinced.

 Why differentiate between 'in' for an associative-array and 'in' for an
 unordered sequence/array? This implies that the programmer is an idiot who
 believes that trawling through an unordered sequence is as fast as a
 dictionary
 key lookup. Such programmers should not be coding, never went to school
 or are
 probably one of those new kids who only fanny-about with javascript
 frameworks.

 The point is that python's 'in' or 'index()' .. whatever, give the fastest
 implementation for the underlying data type. (the same as canFind will
 probably
 give the quickest result)
 The coder can trust this, and then use the common idiom instead of
 wondering
 (as a new D coder) -  is it an 'in'?, a 'canFind'? an 'indexOf'? a
 'countUntil'?
 Is it builtin? Is it in std.algorithm? What's the typical lookup for
 this seq?

 The second point is that common idioms across datatypes, make for
 consistent,
 simpler intuitive coding, whilst also trusting that the language
 implementors
 have spent time optimizing the different underlying implementations.

 Think about new D coders, or those coming from other languages or planets.

 Apparently this has all been done to death so, yeah, like whatever..

 Arguing for inconsistency means you are all retarded :)
We're good as we are. D's standard library goes for a proportional response in syntax space to the cost of search. This is STL's influence, where this approach has worked better than "best effort under uniform syntax" (e.g. std::distance()). std.container follows the same convention, see e.g. APIs with "linear" in their name.

This is not enforced or prescriptive. User code is free to choose different conventions.

The one possible change would be to allow "in" with statically-sized arrays, under the assumption that the amount of data searched is fixed and proportional to the size of the program.

Andrei
Feb 15 2015
parent reply "Marc Schütz" <schuetzm gmx.net> writes:
On Sunday, 15 February 2015 at 21:23:13 UTC, Andrei Alexandrescu 
wrote:
 The one possible change would be to allow "in" with 
 statically-sized arrays, under the assumption that the amount 
 of data searched is fixed and proportional to the size of the 
 program.
To be really consistent,

    x in arr

would need to be equivalent to:

    (x >= 0) && (x < arr.length)

`in` tests for the presence of a _key_ in AAs, and the equivalent notion of a key for arrays is an index.
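Marc's analogy can be made concrete in Python by treating a list's indices as its keys (the `enumerate`-based dict below is just an illustration):

```python
arr = [10, 20, 30, 40]
as_map = dict(enumerate(arr))    # {0: 10, 1: 20, 2: 30, 3: 40}

# Under key semantics, `in` on the dict view tests indices, not values:
assert 2 in as_map               # index 2 exists, i.e. 0 <= 2 < len(arr)
assert 30 not in as_map          # 30 is a value, not an index
assert 30 in arr                 # Python's list `in` searches values instead
```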
Feb 16 2015
parent reply "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Marc Schütz" wrote in message 
news:iftpzvhoyxxqhbfsxull forum.dlang.org...

 To be really consistent,
      x in arr
 would need to be equivalent to:
      (x >= 0) && (x < arr.length)

 `in` tests for the presence of a _key_ in AAs, and the equivalent notion 
 of a key for arrays is an index.
It's called 'in', not 'haskey'. Is 3 in the array? Is 7 in the map? Everybody understands what it means and the whole argument is nonsense. Next somebody will be arguing that float/float and int/int aren't the same operation and should have different syntax.
Feb 16 2015
next sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Monday, 16 February 2015 at 13:10:44 UTC, Daniel Murphy wrote:
 "Marc Schütz" wrote in message 
 news:iftpzvhoyxxqhbfsxull forum.dlang.org...

 To be really consistent,
     x in arr
 would need to be equivalent to:
     (x >= 0) && (x < arr.length)

 `in` tests for the presence of a _key_ in AAs, and the 
 equivalent notion of a key for arrays is an index.
It's called 'in', not 'haskey'. Is 3 in the array? Is 7 in the map? Everybody understands what it means and the whole argument is nonsense. Next somebody will be arguing that float/float and int/int aren't the same operation and should have different syntax.
I'm quite a fan of python's // operator for integer division, especially when combined with python 3's choice to make / always mean floating point division (i.e. 3/2 == float(3)/2, 3//2 == 1). It recognises that integer division is a weird thing and separates it from the much less weird floating point division.
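For reference, the Python 3 behaviour John describes, with floor (not truncating) semantics for `//`:

```python
import math

assert 3 / 2 == 1.5              # / is always true division in Python 3
assert 3 // 2 == 1               # // is integer (floor) division
assert -3 // 2 == -2             # floors toward negative infinity...
assert math.trunc(-3 / 2) == -1  # ...unlike C/D division, which truncates toward zero
assert 7.0 // 2 == 3.0           # // works on floats too, still flooring
```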
Feb 16 2015
parent "bearophile" <bearophileHUGS lycos.com> writes:
John Colvin:

 I'm quite a fan of python's // operator for integer division, 
 especially when combined with python 3's choice to make / 
 always mean floating point division (i.e. 3/2 == float(3)/2, 
 3//2 == 1). It recognises that integer division is a weird 
 thing and separates it from the much less weird floating point 
 division.
The C/D division operator semantics is a bug-prone design mistake, and this design was fixed in Python 3 even though that broke a lot of Python 2 code. It's a pitfall that a modern language must avoid. (I don't know how Rust handles division; I hope they have removed this problem.)

Bye,
bearophile
Feb 16 2015
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/16/15 5:10 AM, Daniel Murphy wrote:
 "Marc Schütz" wrote in message
 news:iftpzvhoyxxqhbfsxull forum.dlang.org...

 To be really consistent,
      x in arr
 would need to be equivalent to:
      (x >= 0) && (x < arr.length)

 `in` tests for the presence of a _key_ in AAs, and the equivalent
 notion of a key for arrays is an index.
It's called 'in', not 'haskey'. Is 3 in the array? Is 7 in the map? Everybody understands what it means and the whole argument is nonsense.
Yah, I agree. FWIW many times "consistency" is used, well, inconsistently. In a consistency argument the choice of benchmark (= what's the standard that new stuff must be consistent with?) is crucial. Oftentimes in entities as complex as programming languages, it's easy to make consistency arguments for numerous artifacts depending on the choice of benchmark. This is painfully evident in C++ - consistency-based arguments got diluted quite a bit since it became obvious they're so easy to make in favor of anything.
 Next somebody will be arguing that float/float and int/int aren't the
 same operation and should have different syntax.
I guess you just got taken downtown over that :o). Andrei
Feb 16 2015
prev sibling parent "Marc Schütz" <schuetzm gmx.net> writes:
On Monday, 16 February 2015 at 13:10:44 UTC, Daniel Murphy wrote:
 "Marc Schütz" wrote in message 
 news:iftpzvhoyxxqhbfsxull forum.dlang.org...

 To be really consistent,
     x in arr
 would need to be equivalent to:
     (x >= 0) && (x < arr.length)

 `in` tests for the presence of a _key_ in AAs, and the 
 equivalent notion of a key for arrays is an index.
It's called 'in', not 'haskey'. Is 3 in the array? Is 7 in the map? Everybody understands what it means and the whole argument is nonsense.
I disagree with that; the difference between keys and values is pretty obvious to me, as is the analogy with indices (both go inside the []). But the entire discussion is pointless, because `in` will not be changed (and for good reasons!).
Feb 17 2015
prev sibling next sibling parent "Craig Dillabaugh" <craig.dillabaugh gmail.com> writes:
On Sunday, 15 February 2015 at 18:45:45 UTC, bearophile wrote:
 Meta:

 Oh, whoops. I mixed up average-case complexity with 
 worst-case. Although, isn't lookup O(n) in the worst case for 
 hash tables?
D associative arrays used to be O(1) amortized and O(n ln n) in worst case. Now they are O(1) amortized and O(n) worst case. And for an adversary it's not too much hard to find and hit that O(n) case. Bye, bearophile
How is it possible to have a search structure that takes O(n ln n) time ... you would have to go out of your way to design something particularly bad.
Feb 15 2015
prev sibling parent reply "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"bearophile"  wrote in message news:jufdlgyynxiwbqubbbkx forum.dlang.org...

 D associative arrays used to be O(1) amortized and O(n ln n) in worst 
 case.
No, they were still O(n) worst case, for a single bucket with a degenerate binary tree.
Feb 15 2015
next sibling parent reply "Casper Færgemand" <shorttail hotmail.com> writes:
On Monday, 16 February 2015 at 06:23:06 UTC, Daniel Murphy wrote:
 "bearophile"  wrote in message 
 news:jufdlgyynxiwbqubbbkx forum.dlang.org...

 D associative arrays used to be O(1) amortized and O(n ln n) 
 in worst case.
No, they were still O(n) worst case, for a single bucket with a degenerate binary tree.
Java 8 changed its HashMap implementation to use red-black trees instead of linked lists for its buckets, if the keys can be sorted. That puts the worst-case lookup time from O(n) to O(log n) for sortable keys. Maybe that's worth considering for AAs?
Feb 16 2015
parent "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Casper Færgemand" wrote in message 
news:aenkruurfmfanarkeicm forum.dlang.org...

 Java 8 changed its HashMap implementation to use red-black trees 
 instead of linked lists for its buckets, if the keys can be sorted. 
 That puts the worst-case lookup time from O(n) to O(log n) for 
 sortable keys. Maybe that's worth considering for AAs?
I don't think that would pay off for most uses of builtin AAs. One day we'll probably get a fully customizable hash table in phobos that can do this kind of stuff.
Feb 16 2015
prev sibling next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Daniel Murphy:

 No, they were still O(n) worst case, for a single bucket with a 
 degenerate binary tree.
I see. I was unable to hit this degenerate case in my testing code, but I guess it was possible. Thank you.

Bye,
bearophile
Feb 16 2015
parent reply FG <home fgda.pl> writes:
 No, they were still O(n) worst case, for a single bucket with a degenerate
binary tree.
What binary tree? I don't see any.
 I see. I was unable to hit this degenerate case in my testing code, but I
guess that was possible. Thank you.
From what I can see in druntime/rt/aaA.d, the bucket is chosen by taking the key hash modulo the number of buckets. The number of buckets is a prime number (x in [31, 97, 389, 1543, ...]), so it is very unlikely to write a hash function that gives different results but the same (i % x) value.

The most probable cause would therefore be a hash function implementation that gives the exact same hash for a prepared set of keys. Then you'd get the worst-case scenario, in which you have to walk a list and compare keys one by one in O(n). But that's not a problem with the associative array implementation. It's a bad hash problem.
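The bucket choice FG describes can be sketched in a few lines of Python (the prime sizes are the ones quoted from aaA.d; the helper itself is illustrative, not D's actual code):

```python
PRIME_SIZES = [31, 97, 389, 1543]    # first entries of the druntime bucket-size table

def bucket_of(key_hash, nbuckets):
    """Bucket selection as described: key hash modulo the (prime) bucket count."""
    return key_hash % nbuckets

# Two different hashes land in the same bucket only if they agree mod nbuckets:
assert bucket_of(5, 31) == bucket_of(5 + 3 * 31, 31)

# Growing to the next prime size separates them again - which is why only
# identical hashes (i.e. a bad hash function) collide at every table size:
assert bucket_of(5, 97) != bucket_of(5 + 3 * 31, 97)
```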
Feb 16 2015
parent "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"FG"  wrote in message news:mbsk0i$1ukb$1 digitalmars.com...

 No, they were still O(n) worst case, for a single bucket with a 
 degenerate binary tree.
What binary tree? I don't see any.
There isn't one any more, we're talking about the previous implementation.
Feb 16 2015
prev sibling parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 2/16/15 1:23 AM, Daniel Murphy wrote:
 "bearophile"  wrote in message news:jufdlgyynxiwbqubbbkx forum.dlang.org...

 D associative arrays used to be O(1) amortized and O(n ln n) in worst
 case.
No, they were still O(n) worst case, for a single bucket with a degenerate binary tree.
IIRC, the AA code tried to make balanced trees, at least on rehash, but I thought it did so proactively on insertion too.

But in any case, the reason AAs are so slow is that the load factor is 4. As in, the AA will have 4x as many elements as buckets before it increases the number of buckets. I've brought it up before. Typically you want to keep the number of elements per bucket near 1 for O(1) lookup.

-Steve
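The load-factor arithmetic behind Steve's point, using the textbook separate-chaining estimates (a sketch; nothing here beyond the stated factor of 4 comes from druntime):

```python
def load_factor(num_elements, num_buckets):
    """Average number of elements per bucket under uniform hashing."""
    return num_elements / num_buckets

def expected_probes_successful(alpha):
    """Textbook separate-chaining estimate: ~1 + alpha/2 comparisons per hit."""
    return 1 + alpha / 2

# At a load factor of 4, an average successful lookup touches about three
# entries; near a load factor of 1, it touches about 1.5:
assert load_factor(4000, 1000) == 4.0
assert expected_probes_successful(4.0) == 3.0
assert expected_probes_successful(1.0) == 1.5
```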
Feb 16 2015