digitalmars.D - Suggestion: Operator `in` for slices

Quirin Schroll (35/35) Dec 18 2021 The operator `in` between `T` and `T[]` is requested a lot of

Adam Ruppe (8/10) Dec 18 2021 Nah, the general rule against it is due to speed/computational

Quirin Schroll (6/13) Dec 18 2021 How is that even a criterion? `foreach` on arrays has O(n)

Dukc (14/23) Dec 19 2021 `foreach` is not supposed to give that guarantee. `in` is. Just

Dennis (36/38) Dec 18 2021 Who do people keep repeating that?

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (4/6) Dec 18 2021 Also, if the array is sorted it would be O(log N). You can
Dukc (11/22) Dec 19 2021 These are bugs. The O(log n) assumption still makes sense when

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (21/25) Dec 19 2021 I know this is a common abuse of notation in C++ circles, but

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (14/17) Dec 19 2021 By this I mean that using "O(…)" alone means worst case.

Steven Schveighoffer (9/14) Dec 19 2021 D AAs do not use linked lists any more. It uses open addressing. But

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (8/10) Dec 19 2021 Yes, if you don't shrink.

Steven Schveighoffer (3/15) Dec 21 2021 Please see existing research, this is not a new concept.

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (6/21) Dec 21 2021 What do you mean by research, this is a trivial topic. In order

Paul Backus (8/18) Dec 22 2021 To guarantee amortized O(1) deletion, it is sufficient to ensure

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (62/64) Dec 29 2021 I haven't looked at the implementation of AAs or the allocator

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (8/15) Dec 29 2021 To put numbers on this, when deleting a long series of keys the

Paul Backus (3/19) Dec 29 2021 Seems like an issue worth filing a bugzilla report about.

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (3/4) Jan 02 2022 Probably, I don't use AAs, but I can file an issue if nobody else
Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (3/4) Jan 07 2022 I am not able to register for issues.dlang.org, does not not

bauss (3/8) Jan 07 2022 Is that real? That gmail accounts don't work? That's so odd.
Mike Parker (3/8) Jan 07 2022 I registered with a gmail account. What error do you see?

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (7/16) Jan 07 2022 «The e-mail address you entered

=?UTF-8?Q?Ali_=c3=87ehreli?= (3/6) Dec 19 2021 That.
Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (6/9) Dec 19 2021 But your proposal would kinda work like in Python?

Dennis (5/10) Dec 20 2021 In my proposal `'a' in ['a', 'b']` will return `null` because it

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (5/8) Dec 20 2021 Oh yeah, I'm sorry, you looked at the index as a key. I'm not

Steven Schveighoffer (27/53) Dec 19 2021 Well, yeah. But that doesn't mean it happens often or even regularly.

Dennis (11/14) Dec 20 2021 For the record, I'm not arguing in favor of `in` for slices. I'm

Araq (5/20) Dec 20 2021 And how come it cannot store the result of the "in" operation in

Biotronic (15/27) Dec 20 2021 bool isSubset(T1, T2)(T1 needles, T2 haystack) {

Dennis (2/3) Dec 20 2021 Thanks, that's an illuminating example.

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (10/17) Dec 20 2021 Yes, I agree with this. When I use "in" in Python I am very much

Paul Backus (8/26) Dec 20 2021 ```d

Stanislav Blinov (9/24) Dec 20 2021 TBH, for that particular case, this looks nicer, and IMO
Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (4/10) Dec 20 2021 You can always create a library construct, but then you should

Paul Backus (8/19) Dec 20 2021 There are a lot of features one could potentially add to D to
Nick Treleaven (9/12) Dec 21 2021 No, `canFind` and `among` search for a given value in a range.

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (5/8) Dec 22 2021 What it is conceptually, depends on how you conceptualize it. The

Nick Treleaven (9/12) Jan 01 2022 `in` is defined for RedBlackTree, but it returns bool:

Elronnd (3/10) Jan 01 2022 A redblacktree is not a sequence, though; its elements are not

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (7/9) Jan 02 2022 Right, "in" is a set operator. You can implement a set as a dict

Steven Schveighoffer (7/14) Dec 21 2021 One thing not really being discussed is that there is a difference

Era Scarecrow (10/16) Jan 02 2022 Then maybe we in should be implemented; Have it check if you

Nick Treleaven (7/15) Jan 04 2022 It would make sense to implement `in` for SortedRange, but there

Era Scarecrow (14/18) Jan 05 2022 Sorry, i was somehow thinking there was a key/value pair in this

Nick Treleaven (7/13) Jan 07 2022 Ok, you're right. The search key then wouldn't need to contain

Biotronic (12/15) Dec 19 2021 It's in

Quirin Schroll <qs.il.paperinik gmail.com> writes:

The operator `in` between `T` and `T[]` is requested a lot of 
times. Most people suggest it should return a `bool` signifying 
if some value of the array on the right-hand side is equal (`==`) 
to the `T` value on the left-hand side. This is the obvious thing 
to do.

I guess it has never been implemented because this information 
(true/false) is so much less than it could be. If D used 1-based 
indexing, `in` could return the index if found or 0 if not found. 
Because 0 == false, one could use `if (auto x in array)`. 
However, D's indexing 0-based, so 0 is a valid index. `in` cannot 
return a plain `size_t`.

My idea is inspired by how AAs `in` operator does not return a 
(near-useless) `bool`, but a value that can be used like a 
`bool`, but provides much more valuable information: a `T*`. It 
would work returning a `T*` for slices' `in` as well, but it is 
much less useful than the index. (One could get the index of 
something from the pointer to it using pointer subtraction, but 
pointer arithmetic not considered ` safe`.) The index practically 
*is* the pointer.

So, can we return an object that is, for almost all intents and 
purposes, usable as an index, but evaluating it via `if` returns 
`true` or `false` depending on whether the value was found or not.

So, my suggestion is that the `in` operator for slices returns a 
value of this type:
```D
struct InResult
{
     size_t index = size_t.max;
     alias index this;
     bool opCast(T : bool)() const { return index != size_t.max; }
}
```

In fact, this is very much a simple optional type. It takes a 
very unlikely valid value of its underlying type and reinterprets 
it as the invalid value.

Dec 18 2021

Adam Ruppe <destructionator gmail.com> writes:

On Saturday, 18 December 2021 at 18:29:56 UTC, Quirin Schroll 
wrote:
 I guess it has never been implemented because this information 
 (true/false) is so much less than it could be.

Nah, the general rule against it is due to speed/computational 
complexity. The `in` operator on AAs has a speed of O(log n) or 
O(1). `in` on slices would exceed that being O(n).

I'm not sure the speed rule is written or not, but generally `in` 
is supposed to have the same speed requirement as `[n]` which is 
sub-linear.

Dec 18 2021

Quirin Schroll <qs.il.paperinik gmail.com> writes:

On Saturday, 18 December 2021 at 18:47:33 UTC, Adam Ruppe wrote:
 On Saturday, 18 December 2021 at 18:29:56 UTC, Quirin Schroll 
 wrote:
 I guess it has never been implemented because this information 
 (true/false) is so much less than it could be.

 Nah, the general rule against it is due to speed/computational 
 complexity. The `in` operator on AAs has a speed of O(log n) or 
 O(1). `in` on slices would exceed that being O(n).

How is that even a criterion? `foreach` on arrays has O(n) 
complexity as well. You cannot say it's okay for `foreach` 
because it obviously has to be that way. The same can be said for 
linear search. It doesn't take a genius that searching an 
unordered structure cannot be better than O(n) in the worst case.

Dec 18 2021

Dukc <ajieskola gmail.com> writes:

On Saturday, 18 December 2021 at 20:17:51 UTC, Quirin Schroll 
wrote:
 Nah, the general rule against it is due to speed/computational 
 complexity. The `in` operator on AAs has a speed of O(log n) 
 or O(1). `in` on slices would exceed that being O(n).

 How is that even a criterion? `foreach` on arrays has O(n) 
 complexity as well. You cannot say it's okay for `foreach` 
 because it obviously has to be that way.

`foreach` is not supposed to give that guarantee. `in` is. Just 
like the indexing operator (`range[index]`). That does not say 
you can't define a function that checks for an element in O(n) 
time, but it should be called something else than `in` then.

It's the same reason as why we don't define `range[index]` as 
`range.save.dropExactly(index).front` for all forward ranges that 
do not explicitly declare `opIndex`. It's not that you can't 
index a range like that, but that we usually want a compile error 
rather than silent O(n) indexing if we are expecting O(1) or 
O(log n) indexing.

 The same can be said for linear search. It doesn't take a 
 genius that searching an unordered structure cannot be better 
 than O(n) in the worst case.

But it isn't always immediately clear to the reader what the type 
of the data structure is.

Dec 19 2021

Dennis <dkorpel gmail.com> writes:

On Saturday, 18 December 2021 at 18:47:33 UTC, Adam Ruppe wrote:
 The `in` operator on AAs has a speed of O(log n) or O(1). `in` 
 on slices would exceed that being O(n).

Who do people keep repeating that?

- Worst case time complexity of the `in` operator in AA's is 
actually O(n). This can easily happen in practice when your 
`toHash` has a poor implementation, which I have had before

- In custom types you can overload the `in` operator to something 
with arbitrarily bad time complexity.

- Despite scaling with O(n), a linear scan is pretty fast. You 
can probably scan your entire RAM in a matter of seconds.

The whole "it ruins performance guarantees" so far seems 
completely hypothetical and is not something that came from 
experience with actual code.

The problem with `in` for slices to me is that it would be 
inconsistent with how it works for AAs, where `in` searches for a 
*key* and returns a pointer to a *value*. The same for slices 
would look like this:
```D
T* opIn(size_t i, T[] arr)
{
     return i >= arr.length ? null : &arr[i];
}

void main()
{
     int[3] a = [10, 20, 30];
     assert((10 in a) == null); // index 10 is out of bounds
     assert((2 in a) == &a[2]);
}
```

But people suggest `in` to search a *value* in a slice.

I've once overloaded the `in` operator for a custom 2D array type 
used to represent a maze in which path finding was done. There it 
was useful for bounds checking since writing comparisons with 
`width` and `height` all the time is cumbersome. I wouldn't mind 
giving `in` this meaning so you can do easy bounds checks in 
slices as well, but I think it will be to confusing for newcomers 
expecting `in` to work like in Python.

Dec 18 2021

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:

On Saturday, 18 December 2021 at 20:46:06 UTC, Dennis wrote:
 - Despite scaling with O(n), a linear scan is pretty fast. You 
 can probably scan your entire RAM in a matter of seconds.

Also, if the array is sorted it would be O(log N). You can 
achieve this either by adding a high level IR that can deduce 
that the array is sorted or making it part of the type system.

Dec 18 2021

Dukc <ajieskola gmail.com> writes:

On Saturday, 18 December 2021 at 20:46:06 UTC, Dennis wrote:
 On Saturday, 18 December 2021 at 18:47:33 UTC, Adam Ruppe wrote:
 The `in` operator on AAs has a speed of O(log n) or O(1). `in` 
 on slices would exceed that being O(n).

 Who do people keep repeating that?

 - Worst case time complexity of the `in` operator in AA's is 
 actually O(n). This can easily happen in practice when your 
 `toHash` has a poor implementation, which I have had before

 - In custom types you can overload the `in` operator to 
 something with arbitrarily bad time complexity.

These are bugs. The O(log n) assumption still makes sense when 
your code works as it's supposed to.

 - Despite scaling with O(n), a linear scan is pretty fast. You 
 can probably scan your entire RAM in a matter of seconds.

A few seconds is by far too much. Remember, the data structure 
might be fed to function designed by introspection. The function 
sees "Ah, an `in` operator. `in` is on average O(log N) at worst, 
I can call it hundreds of times without significant slowdown", 
and compiles such that it takes a quarter of an hour to execute.

Had the `in` operator respected the O(log N) assumption, the 
introspecting function would have failed to compile alerting the 
programmer, or used an alternative, quicker implementation.

Dec 19 2021

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:

On Sunday, 19 December 2021 at 10:10:21 UTC, Dukc wrote:
 function sees "Ah, an `in` operator. `in` is on average O(log 
 N) at worst, I can call it hundreds of times without 
 significant slowdown", and compiles such that it takes a 
 quarter of an hour to execute.

I know this is a common abuse of notation in C++ circles, but 
O(…) means worst case.

Also if something is O(log N), then it is also O(N) or any other 
bigger bound. The notations O(N) means that N is sufficient to 
put an upper bound on the computation growth (in terms of input 
size) for each and every possible input of any size.

Average case is not given with Big-Oh-notation, but done 
"accurately" by presenting a model for the distribution of input 
data and then solving it. (usually a very big task)

When some C++ libraries talk of amortized complexity, they 
probably should not use O-notation, but we get what they mean. 
E.g. if you do a long series of operations then the costly 
reallocations will be amortized, so they therefore ignore it in 
the analysis and "pretend" it does not happen.

But it isn't on average. To do on-average-calculations you need a 
model of the input and the operations, and the calculcations can 
take a scholar many days to figure out. E.g. one strategy is to 
translate the model and patterns of operations to recurrence 
relations and solving it as integrals… a very tedious and time 
consuming job.

Dec 19 2021

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:

On Sunday, 19 December 2021 at 11:07:30 UTC, Ola Fosheim Grøstad 
wrote:
 Average case is not given with Big-Oh-notation, but done 
 "accurately" by presenting a model for the distribution of 
 input data and then solving it. (usually a very big task)

By this I mean that using "O(…)" alone means worst case.

We can say that quicksort is O(N²). Not qualifying the statements 
implies worst case.

If we consider all inputs equally likely then we can also claim 
that the average case complexity is limited by O(N log N).

Now, if AA was implemented as a hashtable with a constant 
upper-bound on the linked lists (say 100) then we could say that 
"in" would be O(1).

We could also claim that inserts would be O(1) amortized (ignore 
rehashing, assuming a distribution that is relatively flat).

So in general, we should be able to have AAs where all operations 
(insert, delete and membership tests) are O(1) amortized.

Dec 19 2021

Steven Schveighoffer <schveiguy gmail.com> writes:

On 12/19/21 6:45 AM, Ola Fosheim Grøstad wrote:
 Now, if AA was implemented as a hashtable with a constant upper-bound on 
 the linked lists (say 100) then we could say that "in" would be O(1).

D AAs do not use linked lists any more. It uses open addressing. But 
having a bound on how many collisions can happen is ineffective against 
a bad hash function (say, one that always returns 0).

The expectation of the hash table is that the hash function is good. 
Without a good hash function, the complexity goes to crap.

 
 We could also claim that inserts would be O(1) amortized (ignore 
 rehashing, assuming a distribution that is relatively flat).

That is what we claim (and what hash algorithms in general claim). 
amortized O(1) lookups and inserts.

-Steve

Dec 19 2021

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:

On Sunday, 19 December 2021 at 18:36:12 UTC, Steven Schveighoffer 
wrote:
 That is what we claim (and what hash algorithms in general 
 claim). amortized O(1) lookups and inserts.

Yes, if you don't shrink.

You need to prove that the claim holds for all possible orderings 
of inserts, deletions and lookups.

So you need to prove that you always do O(N) operations between 
rehashing, so you get O(N) operations of complexity O(1) + one 
rehash of complexity O(N) = O(N).

Dec 19 2021

Steven Schveighoffer <schveiguy gmail.com> writes:

On 12/19/21 2:34 PM, Ola Fosheim Grøstad wrote:
 On Sunday, 19 December 2021 at 18:36:12 UTC, Steven Schveighoffer wrote:
 That is what we claim (and what hash algorithms in general claim). 
 amortized O(1) lookups and inserts.

 
 Yes, if you don't shrink.
 
 You need to prove that the claim holds for all possible orderings of 
 inserts, deletions and lookups.
 
 So you need to prove that you always do O(N) operations between 
 rehashing, so you get O(N) operations of complexity O(1) + one rehash of 
 complexity O(N) = O(N).

Please see existing research, this is not a new concept.

-Steve

Dec 21 2021

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:

On Tuesday, 21 December 2021 at 17:02:22 UTC, Steven 
Schveighoffer wrote:
 On 12/19/21 2:34 PM, Ola Fosheim Grøstad wrote:
 On Sunday, 19 December 2021 at 18:36:12 UTC, Steven 
 Schveighoffer wrote:
 That is what we claim (and what hash algorithms in general 
 claim). amortized O(1) lookups and inserts.

 
 Yes, if you don't shrink.
 
 You need to prove that the claim holds for all possible 
 orderings of inserts, deletions and lookups.
 
 So you need to prove that you always do O(N) operations 
 between rehashing, so you get O(N) operations of complexity 
 O(1) + one rehash of complexity O(N) = O(N).

 Please see existing research, this is not a new concept.

What do you mean by research, this is a trivial topic. In order 
to get O(1) amortised you have to delay shrinkage, you must prove 
that any possible sequence does not reallocate more frequently 
than N operations.

Dec 21 2021

Paul Backus <snarwin gmail.com> writes:

On Sunday, 19 December 2021 at 19:34:18 UTC, Ola Fosheim Grøstad 
wrote:
 On Sunday, 19 December 2021 at 18:36:12 UTC, Steven 
 Schveighoffer wrote:
 That is what we claim (and what hash algorithms in general 
 claim). amortized O(1) lookups and inserts.

 Yes, if you don't shrink.

 You need to prove that the claim holds for all possible 
 orderings of inserts, deletions and lookups.

 So you need to prove that you always do O(N) operations between 
 rehashing, so you get O(N) operations of complexity O(1) + one 
 rehash of complexity O(N) = O(N).

To guarantee amortized O(1) deletion, it is sufficient to ensure 
that `shrink_threshold * growth_factor < growth_threshold`. And 
if you look at the source code for D's built-in AAs, you will see 
that this is, in fact, ensured:

https://github.com/dlang/druntime/blob/v2.098.1/src/rt/aaA.d#L17-L27

So, we can safely put this discussion to rest, I think.

Dec 22 2021

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:

On Wednesday, 22 December 2021 at 14:27:32 UTC, Paul Backus wrote:
 https://github.com/dlang/druntime/blob/v2.098.1/src/rt/aaA.d#L17-L27

 So, we can safely put this discussion to rest, I think.

I haven't looked at the implementation of AAs or the allocator 
used, so I cannot tell whether it is O(1) amortized or not. You 
can test it by repeatedly adding/removing items near the 
thresholds.

A preliminary test suggests that the observed behaviour is rather 
wasteful, which should be a warning for anyone who considers 
using it in production.

It grows by a factor of 4, that is a lot of waste. It also 
appears to grow on deletion:

adding
changed 0 : 0
changed 6 : 8
changed 25 : 32
changed 102 : 128
changed 409 : 512
changed 1638 : 2048
changed 6553 : 8192

removing
changed 4095 : 32768
changed 1023 : 8192
changed 255 : 2048
changed 63 : 512
changed 15 : 128
changed 3 : 32


```
import std.stdio;

struct AA
{
     Impl* impl;
};

struct Impl {
     void*[] buckets;
};


void main(){
     immutable N = 6554;
     int[int] a;
     ulong nn = 0;
     writeln("adding");
     for(int i = 0; i<N; i++){
         auto b = a;
         a[i] = i;
     	auto aa = cast(AA*) &a;
     	auto n = aa.impl.buckets.length;
         if (n!=nn) {
             writeln("changed ", i, " : ", nn);
             nn = n;
         }
     }

     writeln("\nremoving");
     for(int i = N-1; i>=0; i--){
         auto b = a;
         a.remove(i);
     	auto aa = cast(AA*) &a;
     	auto n = aa.impl.buckets.length;
         if (n!=nn) {
             writeln("changed ", i, " : ", nn);
             nn = n;
         }
     }
}
```

Dec 29 2021

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:

On Wednesday, 29 December 2021 at 10:53:13 UTC, Ola Fosheim 
Grøstad wrote:
 removing
 changed 4095 : 32768
 changed 1023 : 8192
 changed 255 : 2048
 changed 63 : 512
 changed 15 : 128
 changed 3 : 32

To put numbers on this, when deleting a long series of keys the 
hash table has consistently less than 13% filled slots, and right 
before shrinking it has 3% filled slots (1024/32768). That is a 
lot of wasted memory, 97%. Hash tables should not be worse than 
75% waste as a rule of thumb.

With such high constant factors O(1) notation is of little value.

Dec 29 2021

Paul Backus <snarwin gmail.com> writes:

On Wednesday, 29 December 2021 at 11:26:48 UTC, Ola Fosheim 
Grøstad wrote:
 On Wednesday, 29 December 2021 at 10:53:13 UTC, Ola Fosheim 
 Grøstad wrote:
 removing
 changed 4095 : 32768
 changed 1023 : 8192
 changed 255 : 2048
 changed 63 : 512
 changed 15 : 128
 changed 3 : 32

 To put numbers on this, when deleting a long series of keys the 
 hash table has consistently less than 13% filled slots, and 
 right before shrinking it has 3% filled slots (1024/32768). 
 That is a lot of wasted memory, 97%. Hash tables should not be 
 worse than 75% waste as a rule of thumb.

 With such high constant factors O(1) notation is of little 
 value.

Seems like an issue worth filing a bugzilla report about.

Dec 29 2021

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:

On Wednesday, 29 December 2021 at 21:18:30 UTC, Paul Backus wrote:
 Seems like an issue worth filing a bugzilla report about.

Probably, I don't use AAs, but I can file an issue if nobody else 
has already.

Jan 02 2022

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:

On Wednesday, 29 December 2021 at 21:18:30 UTC, Paul Backus wrote:
 Seems like an issue worth filing a bugzilla report about.

I am not able to register for issues.dlang.org, does not not 
accept gmail accounts. I guess someone else has to file it.

Jan 07 2022

bauss <jj_1337 live.dk> writes:

On Friday, 7 January 2022 at 10:07:02 UTC, Ola Fosheim Grøstad 
wrote:
 On Wednesday, 29 December 2021 at 21:18:30 UTC, Paul Backus 
 wrote:
 Seems like an issue worth filing a bugzilla report about.

 I am not able to register for issues.dlang.org, does not not 
 accept gmail accounts. I guess someone else has to file it.

Is that real? That gmail accounts don't work? That's so odd.

Jan 07 2022

Mike Parker <aldacron gmail.com> writes:

On Friday, 7 January 2022 at 10:07:02 UTC, Ola Fosheim Grøstad 
wrote:
 On Wednesday, 29 December 2021 at 21:18:30 UTC, Paul Backus 
 wrote:
 Seems like an issue worth filing a bugzilla report about.

 I am not able to register for issues.dlang.org, does not not 
 accept gmail accounts. I guess someone else has to file it.

I registered with a gmail account. What error do you see?

Jan 07 2022

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:

On Friday, 7 January 2022 at 10:45:39 UTC, Mike Parker wrote:
 On Friday, 7 January 2022 at 10:07:02 UTC, Ola Fosheim Grøstad 
 wrote:
 On Wednesday, 29 December 2021 at 21:18:30 UTC, Paul Backus 
 wrote:
 Seems like an issue worth filing a bugzilla report about.

 I am not able to register for issues.dlang.org, does not not 
 accept gmail accounts. I guess someone else has to file it.

 I registered with a gmail account. What error do you see?

«The e-mail address you entered 
(firstname.middlename.lastname gmail.com) didn't pass our syntax 
checking for a legal email address. A legal address must contain 
exactly one ' ', and at least one '.' after the  . Currently, 
registering using Gmail addresses is not allowed due to spam. It 
also must not contain any illegal characters.»

Jan 07 2022

=?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:

On 12/18/21 12:46 PM, Dennis wrote:

 The problem with `in` for slices to me is that it would be inconsistent
 with how it works for AAs, where `in` searches for a *key* and returns a
 pointer to a *value*.

That.

Ali

Dec 19 2021

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:

On Saturday, 18 December 2021 at 20:46:06 UTC, Dennis wrote:
 I wouldn't mind giving `in` this meaning so you can do easy 
 bounds checks in slices as well, but I think it will be to 
 confusing for newcomers expecting `in` to work like in Python.

But your proposal would kinda work like in Python?

In [3]: 'a' in {'a':1, 'b':2}
Out[3]: True
In [4]: 'a' in ['a','b']
Out[4]: True

Dec 19 2021

Dennis <dkorpel gmail.com> writes:

On Sunday, 19 December 2021 at 17:11:23 UTC, Ola Fosheim Grøstad 
wrote:
 But your proposal would kinda work like in Python?

 In [3]: 'a' in {'a':1, 'b':2}
 Out[3]: True
 In [4]: 'a' in ['a','b']
 Out[4]: True

In my proposal `'a' in ['a', 'b']` will return `null` because it 
looks for the key 'a', and since the ASCII value of 'a' is `0x61` 
and the array length 2, it will return `null`.

Dec 20 2021

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:

On Monday, 20 December 2021 at 10:34:44 UTC, Dennis wrote:
 In my proposal `'a' in ['a', 'b']` will return `null` because 
 it looks for the key 'a', and since the ASCII value of 'a' is 
 `0x61` and the array length 2, it will return `null`.

Oh yeah, I'm sorry, you looked at the index as a key. I'm not 
sure how I managed to mix that up…

But yeah, just doing what Python does, but with a pointer to a 
value or null instead of boolean, makes sense in D.

Dec 20 2021

Steven Schveighoffer <schveiguy gmail.com> writes:

On 12/18/21 3:46 PM, Dennis wrote:
 On Saturday, 18 December 2021 at 18:47:33 UTC, Adam Ruppe wrote:
 The `in` operator on AAs has a speed of O(log n) or O(1). `in` on 
 slices would exceed that being O(n).

 
 Who do people keep repeating that?
 
 - Worst case time complexity of the `in` operator in AA's is actually 
 O(n). This can easily happen in practice when your `toHash` has a poor 
 implementation, which I have had before

Well, yeah. But that doesn't mean it happens often or even regularly. 
You need to have a bad `toHash` function.

We shouldn't lower the bar for `in` just because it's possible to make 
it perform poorly.

 - In custom types you can overload the `in` operator to something with 
 arbitrarily bad time complexity.

There's your answer -- make a custom type that allows `in` with an 
array-like structure.

Or make a UFCS function.

 - Despite scaling with O(n), a linear scan is pretty fast. You can 
 probably scan your entire RAM in a matter of seconds.

Complexity is not some old-fashioned idea that has no relevance on 
today's hardware. Linear searching is a real bottleneck, and identifying 
`in` to be sub-linear is a useful design decision.

If you are searching your entire memory by using an `opCmp`, I bet it 
will be more than a few seconds.

 The whole "it ruins performance guarantees" so far seems completely 
 hypothetical and is not something that came from experience with actual 
 code.

The design is sound. You use `in`, you get sub-linear (i.e. fast) 
performance. That kind of expectation is useful from a generic point of 
view. It's similar to opIndex for a linked list -- it's possible, it 
makes it so you can used linked lists in places that might expect 
arrays, but generic code that uses it might perform really badly as the 
code expects opIndex to be fast.

 The problem with `in` for slices to me is that it would be inconsistent 
 with how it works for AAs, where `in` searches for a *key* and returns a 
 pointer to a *value*. The same for slices would look like this:

[snip]

 But people suggest `in` to search a *value* in a slice.

Yes, this is a good point against. But the thing is, we also have 
`find`, which works fine, I'm not sure why we need `in` support.

 I've once overloaded the `in` operator for a custom 2D array type used 
 to represent a maze in which path finding was done. There it was useful 
 for bounds checking since writing comparisons with `width` and `height` 
 all the time is cumbersome. I wouldn't mind giving `in` this meaning so 
 you can do easy bounds checks in slices as well, but I think it will be 
 to confusing for newcomers expecting `in` to work like in Python.

Yes, that's not a terrible use case for it. But you are right that `in` 
will be confusing to newcomers, especially when you learn that an array 
is a collection of data. `if(x in arr)` doesn't seem to suggest that it 
will be looking at indexes.

-Steve

Dec 19 2021

Dennis <dkorpel gmail.com> writes:

On Sunday, 19 December 2021 at 18:31:46 UTC, Steven Schveighoffer 
wrote:
 There's your answer -- make a custom type that allows `in` with 
 an array-like structure.

For the record, I'm not arguing in favor of `in` for slices. I'm 
just baffled that whenever this comes up, the first thing people 
bring up is the 'time complexity' argument.

 That kind of expectation is useful from a generic point of view.

And this is my biggest problem with that argument: it's about 
this mystical function that takes a generic data structure and 
uses the `in` operator multiple times on it, which can now 
suddenly be called on an array, ruining performance to the 
surprise of the programmer.

What is this function? Does it exist in Phobos currently?

Dec 20 2021

Araq <rumpf_a web.de> writes:

On Monday, 20 December 2021 at 11:04:00 UTC, Dennis wrote:
 On Sunday, 19 December 2021 at 18:31:46 UTC, Steven 
 Schveighoffer wrote:
 There's your answer -- make a custom type that allows `in` 
 with an array-like structure.

 For the record, I'm not arguing in favor of `in` for slices. 
 I'm just baffled that whenever this comes up, the first thing 
 people bring up is the 'time complexity' argument.

 That kind of expectation is useful from a generic point of 
 view.

 And this is my biggest problem with that argument: it's about 
 this mystical function that takes a generic data structure and 
 uses the `in` operator multiple times on it, which can now 
 suddenly be called on an array, ruining performance to the 
 surprise of the programmer.

 What is this function? Does it exist in Phobos currently?

And how come it cannot store the result of the "in" operation in 
a local variable for re-use? And how exactly is calling an 
oh-so-efficient sub-linear O(log N) function repeatedly when you 
can save the result in a local *not* a problem?

Dec 20 2021

Biotronic <simen.kjaras gmail.com> writes:

On Monday, 20 December 2021 at 11:28:52 UTC, Araq wrote:
 On Monday, 20 December 2021 at 11:04:00 UTC, Dennis wrote:
 And this is my biggest problem with that argument: it's about 
 this mystical function that takes a generic data structure and 
 uses the `in` operator multiple times on it, which can now 
 suddenly be called on an array, ruining performance to the 
 surprise of the programmer.

 What is this function? Does it exist in Phobos currently?

 And how come it cannot store the result of the "in" operation 
 in a local variable for re-use? And how exactly is calling an 
 oh-so-efficient sub-linear O(log N) function repeatedly when 
 you can save the result in a local *not* a problem?


     bool isSubset(T1, T2)(T1 needles, T2 haystack) {
         foreach (needle; needles) {
             if (!(needle in haystack)) return false;
         }
         return true;
     }

If 'in' is O(1), that's a fairly sensible implementation. If it's 
O(N), it's not. Since the lookup uses different arguments each 
time, local variables can't help you.

Note that, of course, there's no problem if `x in arr` is a 
simple bounds check - if your bounds check on an array is O(N), 
you're doing it wrong.

--
   Simen

Dec 20 2021

Dennis <dkorpel gmail.com> writes:

On Monday, 20 December 2021 at 13:07:40 UTC, Biotronic wrote:
 If 'in' is O(1), that's a fairly sensible implementation.

Thanks, that's an illuminating example.

Dec 20 2021

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:

On Monday, 20 December 2021 at 11:04:00 UTC, Dennis wrote:
 On Sunday, 19 December 2021 at 18:31:46 UTC, Steven 
 Schveighoffer wrote:
 There's your answer -- make a custom type that allows `in` 
 with an array-like structure.

 For the record, I'm not arguing in favor of `in` for slices. 
 I'm just baffled that whenever this comes up, the first thing 
 people bring up is the 'time complexity' argument.

Yes, I agree with this. When I use "in" in Python I am very much 
aware that it does a linear scan, but it is a very useful 
shorthand that can replaces a long and confusing sequence of 
or-expressions.

E.g.

```
if keyword in ['if', 'while', 'for]
```

is a very useful and clear syntactical construct.

Dec 20 2021

Paul Backus <snarwin gmail.com> writes:

On Monday, 20 December 2021 at 13:54:56 UTC, Ola Fosheim Grøstad 
wrote:
 On Monday, 20 December 2021 at 11:04:00 UTC, Dennis wrote:
 On Sunday, 19 December 2021 at 18:31:46 UTC, Steven 
 Schveighoffer wrote:
 There's your answer -- make a custom type that allows `in` 
 with an array-like structure.

 For the record, I'm not arguing in favor of `in` for slices. 
 I'm just baffled that whenever this comes up, the first thing 
 people bring up is the 'time complexity' argument.

 Yes, I agree with this. When I use "in" in Python I am very 
 much aware that it does a linear scan, but it is a very useful 
 shorthand that can replaces a long and confusing sequence of 
 or-expressions.

 E.g.

 ```
 if keyword in ['if', 'while', 'for]
 ```

 is a very useful and clear syntactical construct.

```d
import std.algorithm.searching;

if (["if", "while", "for"].canFind(keyword)) {
     // ...
}
```

Dec 20 2021

Stanislav Blinov <stanislav.blinov gmail.com> writes:

On Monday, 20 December 2021 at 14:21:08 UTC, Paul Backus wrote:
 On Monday, 20 December 2021 at 13:54:56 UTC, Ola Fosheim 
 Grøstad wrote:
 E.g.

 ```
 if keyword in ['if', 'while', 'for]
 ```

 is a very useful and clear syntactical construct.

 ```d
 import std.algorithm.searching;

 if (["if", "while", "for"].canFind(keyword)) {
     // ...
 }
 ```

TBH, for that particular case, this looks nicer, and IMO 
documents the intent better than the original Pythonism:

```d
import std.algorithm.comparison : among;

if (keyword.among("if", "while", "for")) { /* ... */ }
```

It also doesn't let work go to waste, as it returns a 1-based 
index.

Dec 20 2021

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:

On Monday, 20 December 2021 at 14:21:08 UTC, Paul Backus wrote:
 ```d
 import std.algorithm.searching;

 if (["if", "while", "for"].canFind(keyword)) {
     // ...
 }
 ```

You can always create a library construct, but then you should 
also argue in favour of removing AAs and "in". You either do it, 
or you don't. Why the half-measures?

Dec 20 2021

Paul Backus <snarwin gmail.com> writes:

On Monday, 20 December 2021 at 21:11:28 UTC, Ola Fosheim Grøstad 
wrote:
 On Monday, 20 December 2021 at 14:21:08 UTC, Paul Backus wrote:
 ```d
 import std.algorithm.searching;

 if (["if", "while", "for"].canFind(keyword)) {
     // ...
 }
 ```

 You can always create a library construct, but then you should 
 also argue in favour of removing AAs and "in". You either do 
 it, or you don't. Why the half-measures?

There are a lot of features one could potentially add to D to 
improve it, and limited resources to devote to doing so. I'm not 
necessarily *against* adding `in` for slices, but since we 
already have a perfectly good library solution, I don't think it 
deserves a high priority.

Feel free to write the DIP yourself if you disagree. :)

Dec 20 2021

Nick Treleaven <nick geany.org> writes:

On Monday, 20 December 2021 at 21:11:28 UTC, Ola Fosheim Grøstad 
wrote:
 You can always create a library construct, but then you should 
 also argue in favour of removing AAs and "in". You either do 
 it, or you don't. Why the half-measures?

No, `canFind` and `among` search for a given value in a range. 
`in` checks if a given key exists and returns a pointer to the 
associated data (not a pointer to the matching key). This is why 
Phobos SortedRange does not implement opIn even though it would 
be sub-linear. Dennis is right that conceptually a key for an 
array is an index. An array element value is the data associated 
with an index.

Dec 21 2021

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:

On Tuesday, 21 December 2021 at 17:06:04 UTC, Nick Treleaven 
wrote:
 Dennis is right that conceptually a key for an array is an 
 index. An array element value is the data associated with an 
 index.

What it is conceptually, depends on how you conceptualize it. The 
most common conceptualization of "in" is as a set operation and 
you cannot represent a set with array indexes.

Dec 22 2021

Nick Treleaven <nick geany.org> writes:

On Wednesday, 22 December 2021 at 12:43:26 UTC, Ola Fosheim 
Grøstad wrote:
 What it is conceptually, depends on how you conceptualize it. 
 The most common conceptualization of "in" is as a set operation 
 and you cannot represent a set with array indexes.

`in` is defined for RedBlackTree, but it returns bool:
https://dlang.org/phobos/std_container_rbtree.html#.RedBlackTree.opBinaryRight

So aside from the complexity argument, `element in arr` could be 
defined if it returned bool, not a pointer (in order to be 
consistent with AAs not returning a pointer to the *key*). But 
due to time complexity it makes more sense to define `index in 
arr`.

Jan 01 2022

Elronnd <elronnd elronnd.net> writes:

On Saturday, 1 January 2022 at 16:40:02 UTC, Nick Treleaven wrote:
 On Wednesday, 22 December 2021 at 12:43:26 UTC, Ola Fosheim 
 Grøstad wrote:
 What it is conceptually, depends on how you conceptualize it. 
 The most common conceptualization of "in" is as a set 
 operation and you cannot represent a set with array indexes.

 `in` is defined for RedBlackTree, but it returns bool:
 https://dlang.org/phobos/std_container_rbtree.html#.RedBlackTree.opBinaryRight

A redblacktree is not a sequence, though; its elements are not 
keyed in any way.

Jan 01 2022

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:

On Sunday, 2 January 2022 at 02:33:22 UTC, Elronnd wrote:
 A redblacktree is not a sequence, though; its elements are not 
 keyed in any way.

Right, "in" is a set operator. You can implement a set as a dict 
with null-values, then you can think of the keys as values. If 
you think of the ADT as the set-concept then what is a key and 
what is a value is a matter of interpretation. If you think of a 
binary tree (or rbtree) as a set, then the distinction between 
key and value becomes blurred.

Jan 02 2022

Steven Schveighoffer <schveiguy gmail.com> writes:

On 12/20/21 6:04 AM, Dennis wrote:
 On Sunday, 19 December 2021 at 18:31:46 UTC, Steven Schveighoffer wrote:
 There's your answer -- make a custom type that allows `in` with an 
 array-like structure.

 
 For the record, I'm not arguing in favor of `in` for slices. I'm just 
 baffled that whenever this comes up, the first thing people bring up is 
 the 'time complexity' argument.

One thing not really being discussed is that there is a difference 
between "some library" defining slow `in` operators, or slow `opIndex`, 
and dlang/Phobos doing it.

D picked the path of trying to ensure complexity consistency for `in`, 
but it's more of a stylistic rule, not necessarily a requirement.

-Steve

Dec 21 2021

Era Scarecrow <rtcvb32 yahoo.com> writes:

On Tuesday, 21 December 2021 at 17:00:49 UTC, Steven 
Schveighoffer wrote:
 One thing not really being discussed is that there is a 
 difference between "some library" defining slow `in` operators, 
 or slow `opIndex`, and dlang/Phobos doing it.

 D picked the path of trying to ensure complexity consistency 
 for `in`, but it's more of a stylistic rule, not necessarily a 
 requirement.

  Then maybe we in should be implemented; Have it check if you 
have it sorted (*unless you do assumesorted template*). If it is 
sorted naturally use BinarySearch, and if not it would probably 
do a linear search **BUT** give a warning message and file/line 
number so it can be fixed/traced? (*Or just make it have to be 
Binary Search and asserts out if it isn't sorted*)

  As for bool vs pointer return... probably return a pointer as 
that can easily be tested as bool for no extra cost.

Jan 02 2022

Nick Treleaven <nick geany.org> writes:

On Monday, 3 January 2022 at 00:15:53 UTC, Era Scarecrow wrote:
  Then maybe we in should be implemented; Have it check if you 
 have it sorted (*unless you do assumesorted template*). If it 
 is sorted naturally use BinarySearch, and if not it would 
 probably do a linear search **BUT** give a warning message and 
 file/line number so it can be fixed/traced? (*Or just make it 
 have to be Binary Search and asserts out if it isn't sorted*)

  As for bool vs pointer return... probably return a pointer as 
 that can easily be tested as bool for no extra cost.

It would make sense to implement `in` for SortedRange, but there 
is no use case for returning a pointer. You already have the 
value on the left hand side, the only reason for a pointer would 
be to mutate the element in the range. Mutating that would 
potentially stop the range being in order, violating the point of 
SortedRange.

Jan 04 2022

Era Scarecrow <rtcvb32 yahoo.com> writes:

On Tuesday, 4 January 2022 at 18:23:33 UTC, Nick Treleaven wrote:
 It would make sense to implement `in` for SortedRange, but 
 there is no use case for returning a pointer. You already have 
 the value on the left hand side, the only reason for a pointer 
 would be to mutate the element in the range.

  Sorry, i was somehow thinking there was a key/value pair in this 
and not just a normal sorted block.

  But yes there may still be use cases. Consider i do opCmp on a 
struct, and it only does comparisons on say a name portion (*Or 
UPC or some other immutable property ID*), but no other part of 
the stored data. The name/ID would be how it's sorted; But any 
attached data which is non-sorted would be inaccessible 
otherwise. This could be phone number, address, grades, just 
about anything (*and those items could be changed without 
changing the sorting order*)

  Though you can probably do 2 steps it seems like a bit of extra 
pointless work to me. Or maybe AA's are a better choice in that 
case. I'm not sure.

Jan 05 2022

Nick Treleaven <nick geany.org> writes:

On Wednesday, 5 January 2022 at 23:16:48 UTC, Era Scarecrow wrote:
 Consider i do opCmp on a struct, and it only does comparisons 
 on say a name portion

Ok, you're right. The search key then wouldn't need to contain 
the other data which is not used for comparisons.

 But any attached data which is non-sorted would be inaccessible 
 otherwise. This could be phone number, address, grades, just 
 about anything (*and those items could be changed without 
 changing the sorting order*)

True, but if SortedRange exposed a pointer to the matching 
element, it should probably be const so the data that *is* 
compared is not changed. (It could perhaps alternatively return a 
one-based index instead, which would convert to bool as expected).

Jan 07 2022

Biotronic <simen.kjaras gmail.com> writes:

On Saturday, 18 December 2021 at 18:47:33 UTC, Adam Ruppe wrote:
 I'm not sure the speed rule is written or not, but generally 
 `in` is supposed to have the same speed requirement as `[n]` 
 which is sub-linear.

It's in 
[std.container](https://dlang.org/phobos/std_container.html), 
bottom of the page.

I [remember a discussion on 
complexity](https://forum.dlang.org/post/n3qtgq$2frv$1 digitalmars.com) that
prompted Andrei to create a library for marking functions with their
complexity. I'm not sure if this was the original source for the complexities
given in std.container.

My brain is also telling me the values were sourced from 
Alexander Stepanov's [Elements of 
Programming](http://elementsofprogramming.com/), but I can't find 
a source for that, and I'm too lazy to look very deep.

--
   Biotronic

Dec 19 2021

D Programming

C/C++ Programming

Other

digitalmars.D - Suggestion: Operator `in` for slices