digitalmars.D - Shortcomings of D-Style Fused Operator Overloading

Ahmet Sait (61/63) Jul 07 2023 For the past few weeks I've been working on a simple compiler for

Richard (Rikki) Andrew Cattermole (6/6) Jul 07 2023 This does beg the question, should we be supporting Unicode mathematical...

Quirin Schroll (6/8) Jul 07 2023 No. If an editor, font, or terminal doesn’t support this, you’re

Richard (Rikki) Andrew Cattermole (8/16) Jul 07 2023 We are not talking about basic comparison operators that can be

IchorDev (14/21) Jul 07 2023 I hope that would never happen. I'm a programmer, I have no idea

Richard (Rikki) Andrew Cattermole (5/5) Jul 07 2023 I do not know how to read them either. I doubt you'd see much code using...

IchorDev (6/12) Jul 07 2023 They could always use mixins that replace those characters with

Ahmet Sait (5/6) Jul 07 2023 I didn't post this to discuss unicode characters so I would

Dennis (15/17) Jul 07 2023 The thing is, D's operator overloading was designed to support
Quirin Schroll (93/100) Jul 07 2023 **TL;DR: Getting orderings right in a programming language is

Ahmet Sait (8/17) Jul 07 2023 This is exactly how I define equality (which differs from how

Timon Gehr (12/13) Jul 07 2023 VRP is a kind of abstract interpretation, and the natural definition for...

Dom DiSc (9/24) Jul 08 2023 You're all aware that opCmp can return a float, with the values

Ahmet Sait (2/7) Jul 08 2023 You're more than welcome to prove us wrong ;)
Timon Gehr (8/19) Jul 08 2023 I am very aware of this. OTOH, you may not be aware that your solution

Timon Gehr (4/5) Jul 08 2023 Typo. This is of course only an implication:

deadalnix (3/24) Jul 07 2023 Thanks god. If you are doing something non standard, do not use

Jonathan M Davis (17/51) Jul 07 2023 Yeah. The limitations on opCmp are very much on purpose. We already argu...

monkyyy (15/16) Jul 08 2023 I dislike the current scheme because its not very meta

Ahmet Sait <nightmarex1337 hotmail.com> writes:

For the past few weeks I've been working on a simple compiler for 
my grad project, and I was trying to make use of operator 
overloading in some of my types named Interval and Constant.

Interval supposed to be a type that supports all kinds of 
arithmetic operations with another Interval which would be used 
for implementing [Value Range 
Propagation](https://digitalmars.com/articles/b62.html).

Now to the operator overloading part. As I was trying to 
implement the Interval type, I realized an interesting issue. The 
way operator overloading works in D is with a unified `opCmp()` 
method that returns -1, 0 or 1 depending on the inequality, and 
`a < b` gets lowered to `a.opCmp(b) < 0`. However, implementing 
inequality correctly for intervals requires us to know the exact 
operator used.

Let's define two intervals x and y:
{x ∈ ℝ │ a ≤ x ≤ b}
{y ∈ ℝ │ c ≤ y ≤ d}

And the inequality operators:
```
x < y = ⎧ 1  if b < c
         ⎨
         ⎩ 0  otherwise

x ≤ y = ⎧ 1  if b ≤ c
         ⎨
         ⎩ 0  otherwise

x > y = ⎧ 1  if a > d
         ⎨
         ⎩ 0  otherwise

x ≥ y = ⎧ 1  if a ≥ d
         ⎨
         ⎩ 0  otherwise
```

Exercise: Write an `opCmp()` method that implements the rules 
above for `struct Interval { int lower, upper; }`. (Hint: It's 
not possible.)

I didn't fully implement or used this Interval type I was cooking 
because I was running out of time for the project, so I opted for 
something simpler which is to just do constant folding, hence the 
Constant type. Constant is basically a tagged union like 
`bool|integer|real`, and it doesn't have the fuzzy ordering 
issues like in the Interval type above.

For Constant, I wanted to implement all unary operators, one of 
which is logical not `!`. The only way to do this in D is by 
writing a `bool opCast(T:bool)`, except I don't want to return a 
bool! I want to return another Constant that represents the 
logical inverse of what the current Constant contains, and be 
able write this:
```
Constant c2 = !c1; // Lowered to !c1.opCast!bool
```

As you can guess, returning a Constant from `opCast()` instead of 
`bool` results in this error message:
 Error: cannot implicitly convert expression `c1.opCast()` of 
 type `Constant` to `bool`

***

Now you might be thinking, I could just define a bunch of regular 
functions such as `logicalNot()`, `lowerOrEquals()` etc. and you 
would be right (which is what I ended up doing anyway). I guess 
the point here is that the way D merges these operators to reduce 
potential bugs actually end up making certain things 
inexpressible naturally.

So what do you guys think? Would you say distinct operators are 
better?

Jul 07 2023

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

This does beg the question, should we be supporting Unicode mathematical 
symbols for operators?

In theory this should be easy enough to do, its just some Unicode tables 
that the lexer just needs a bit of info on how to categorize after all.

Will need input from Walter if this is an acceptable path to look into 
further though.

Jul 07 2023

Quirin Schroll <qs.il.paperinik gmail.com> writes:

On Friday, 7 July 2023 at 09:38:37 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 This does beg the question, should we be supporting Unicode 
 mathematical symbols for operators?

No. If an editor, font, or terminal doesn’t support this, you’re 
in a bad spot. We had this discussion already. Anyone can use a 
OTF/TTF font that displays <= as a wider ≤ and call it a day if 
they want to.

Jul 07 2023

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 07/07/2023 10:34 PM, Quirin Schroll wrote:
 On Friday, 7 July 2023 at 09:38:37 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 This does beg the question, should we be supporting Unicode 
 mathematical symbols for operators?

 
 No. If an editor, font, or terminal doesn’t support this, you’re in a 
 bad spot. We had this discussion already. Anyone can use a OTF/TTF font 
 that displays <= as a wider ≤ and call it a day if they want to.

We are not talking about basic comparison operators that can be 
represented in a glyph by a ligature.

http://www.unicode.org/charts/PDF/U2200.pdf

http://www.unicode.org/charts/PDF/U27C0.pdf

http://www.unicode.org/charts/PDF/U2A00.pdf

There is a lot of mathematical operators which have no equivalent 
operator overload.

Jul 07 2023

IchorDev <zxinsworld gmail.com> writes:

On Friday, 7 July 2023 at 10:39:40 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 We are not talking about basic comparison operators that can be 
 represented in a glyph by a ligature.

 http://www.unicode.org/charts/PDF/U2200.pdf

 http://www.unicode.org/charts/PDF/U27C0.pdf

 http://www.unicode.org/charts/PDF/U2A00.pdf

 There is a lot of mathematical operators which have no 
 equivalent operator overload.

I hope that would never happen. I'm a programmer, I have no idea 
how to read mathematical operators. I don't want to be forced to 
decipher the code of others that uses them. It's already enough 
of a headache when I want to research a *computer science* topic 
and everything gets explained exclusively in complicated 
mathematical notation. I'm not a mathematician, I work 
exclusively with numbers of finite complexity.
Additionally, mathematical operators present a huge readability 
issue. People had enough trouble differentiating (e.g.) `5l` and 
`51` that `5l` was removed. Many mathematical operators look way 
too similar unless they're magnified. I think functions with 
shortened names are just fine...

Jul 07 2023

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

I do not know how to read them either. I doubt you'd see much code using 
them.

It would only be the people who are very familiar with their usage who 
would find any use of it. If for no other reason than everyone else do 
not have quick access to those characters already setup.

Jul 07 2023

IchorDev <zxinsworld gmail.com> writes:

On Friday, 7 July 2023 at 12:09:19 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 I do not know how to read them either. I doubt you'd see much 
 code using them.

 It would only be the people who are very familiar with their 
 usage who would find any use of it. If for no other reason than 
 everyone else do not have quick access to those characters 
 already setup.

They could always use mixins that replace those characters with 
function calls. What I worry about is: as a language feature, it 
might become common enough that I have to read through it in the 
code of others to understand what their code is doing.

Jul 07 2023

Ahmet Sait <nightmarex1337 hotmail.com> writes:

On Friday, 7 July 2023 at 10:39:40 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 [snip]

I didn't post this to discuss unicode characters so I would 
suggest people to stop derailing discussion with any of that. 
Thanks.

Jul 07 2023

Dennis <dkorpel gmail.com> writes:

On Friday, 7 July 2023 at 09:18:38 UTC, Ahmet Sait wrote:
 So what do you guys think? Would you say distinct operators are 
 better?

The thing is, D's operator overloading was designed to support 
library numeric types, such as half float, fixed point, complex 
numbers, vector3 etc. It's intentionally limited in order to 
prevent Domain Specific Languages embedded in D code directly. 
Templates / string mixins can still be used for that.

Now there are other mathematical objects that could *almost* be 
modeled as a D type with operator overloading, but then it only 
gets you 90% there before you hit a limitation like  `opCmp` for 
your interval. I have a similar case where my symbolic number 
type could use an `opCmp` returning another `typeof(this)` 
instead of an `int` or `float`, but alas.

I don't think a proposal to make operator overloading a lot more 
powerful has much chance, but only making `opCmp` a bit more 
powerful might if you can demonstrate valid use cases for it.

Jul 07 2023

Quirin Schroll <qs.il.paperinik gmail.com> writes:

**TL;DR: Getting orderings right in a programming language is 
hard.**

On Friday, 7 July 2023 at 09:18:38 UTC, Ahmet Sait wrote:
 […]
 Exercise: Write an `opCmp()` method that implements the rules 
 above for `struct Interval { int lower, upper; }`. (Hint: It's 
 not possible.)

One can’t because D’s philosophy behind `opCmp` is a total 
ordering and that `a <= b` is the same as `a < b || a == b` 
(which both isn’t the case for your intervals).

The total ordering part can be stretched to a partial ordering by 
making `opCmp` return `float` and use `float.nan` to indicate 
that the elements are unrelated.

 […]

 So what do you guys think? Would you say distinct operators are 
 better?

Yes, 100%. Orderings cannot really be done using D’s approach 
with `opXXX`. Ideally, the programmer provides minimal details 
about the ordering and the compiler generates the rest so that 
ordering is consistent. As an example, C++ up until C++20 
requires *you* to write both `==` and `!=`. In almost every 
circumstance, one would define `!=` in terms of `==` completely 
mechanical. I see no reason why `!=` should even be overloadable. 
Maybe there are examples where `a != b` is not the same as `!(a 
== b)`, I’m open enough to accept that, even though I’m not 
creative enough to find something where this is reasonable. C++20 
also added the `<=>` operator to allow for D’s approach because 
in a lot of cases, it serves the programmer well.

I agree with you that `opCmp` is simplistic design; partial 
orderings feel like a hack. I’ve thought about orderings a lot 
and I’m glad you’ve exposed me to this, because now I need to 
re-consider what the most general properties of ordering 
relations could be. However, I think that your ordering relations 
make little sense.

What properties should we minimally expect of `==`, `<` and `<=`, 
`>`, and `>=` to be reasonable?
1. Rewrites
    * `a != b` is the same as `!(a == b)`, therefore `!=` should 
not be user-definable, `==` suffices.
    * `a > b` is the same as `b < a`, therefore `>` should not be 
user-definable, `<` suffices.
    * `a >= b` is the same as `b <= a`, therefore `>=` should not 
be user-definable, `<=` suffices.

2. Individual properties
    * `<`, `<=`, and `==` are 
[transitive](https://en.wikipedia.org/wiki/Transitive_relation): 
E.g. for `<`: If `a < b` and `b < c` then `a < c`.
    * `<` is totally irreflexive: `a < a` is always false.
    * `<` is asymmetric: If `a < b`, then `b < a` is false.
    * `==` is symmetrical: `a == b` is the same as `b == a`.
    * `<=` is partially reflexive: For any `a` and `b`, if `a <= 
b`, then `a <= a` and `b <= b`.
    * `==` is partially reflexive as a consequence from 
transitivity and symmetry: If `a == b`, by symmetry, also `b == 
a`, thus by transitivity, `a == a` and `b == b`.

3. Interactions
    * `a == b` and `a < b` are mutually exclusive; in particular:
       * If `a == b`, then `a < b` is false. (This is a 
strengthening of irreflexivity.)
       * If `a < b`, then `a != b`.
       * Because `==` is symmetrical, the same for `b < a`.
    * If `a < b` or `a == b`, then `a <= b` (definition of 
*less-or-equal*).
    * If `a <= b`, then `a < b` or `a == b` (definition of 
*less-or-equal*).
    * If `a <= b` and `a >= b`, then `a == b` (Anti-symmetry).

The last two or three are debatable; I have not put them in 
Rewrites intentionally. As presented, the intervals don’t adhere 
to the last two. It’s not even clear what `==` on the intervals 
would be. It looks like the intervals are supposed to represent 
*any value in the bounds,* which means that – unless the lower 
and bound is the same, i.e. the interval is a single number – `I 
== I` should actually fail.

In the axioms I presented, reflexivity is partial because `==` 
and `<=` on IEEE floats are not reflexive: `float.nan != 
float.nan`, but any value that is equal to any other value is 
equal to itself. And, of course, the axioms are supposed to be 
true for build-in types.

While partial in the sense that not all elements are included, 
IEEE floats are partially totally ordered:
* We indeed have `a <= b` or `b <= a` for all `a` and `b` such 
that `a == a` and `b == b`.

IEEE floats are also anti-symmetric, and in general, `==` should 
be the partial equivalence relation induced by `<=`: `a == b` if 
and only if `a <= b` and `a >= b`.
For IEEE floats, positive and negative zero are equivalent, `+0.0 
== -0.0`, but they are not equal (in bit pattern, i.e. `+0.0 !is 
-0.0` and behavior: `1/+0.0 != 1/-0.0`). As another example, the 
not-a-number values are behaviorally the same, they’re only 
technically distinct.

I’ve written an [answer on 
StackOverflow](https://stackoverflow.com/a/71844213/3273130) to 
the question *What is the difference between equality and 
equivalence?* A TL;DR is: D’s `is` operator is (at least close 
to) mathematical equality, whereas the `==` operator is merely 
the default equivalence relation, which can be anything, and 
(think of `x` and `y` as `ref` parameters) `&x == &y` in D 
represents identity (any mutation of `x` is a mutation of `y`).

The C++ proposal 
[P0515R0](https://open-std.org/JTC1/SC22/WG21/docs/papers/2017/p0515r0.pdf) is
also a good read. It was rejected, but introduces relevant notions. Look at
least at § 1.4.1 *Guidance: What we teach.*

Jul 07 2023

Ahmet Sait <nightmarex1337 hotmail.com> writes:

On Friday, 7 July 2023 at 13:28:23 UTC, Quirin Schroll wrote:
 The last two or three are debatable; I have not put them in 
 Rewrites intentionally. As presented, the intervals don’t 
 adhere to the last two. It’s not even clear what `==` on the 
 intervals would be. It looks like the intervals are supposed to 
 represent *any value in the bounds,* which means that – unless 
 the lower and bound is the same, i.e. the interval is a single 
 number – `I == I` should actually fail.

This is exactly how I define equality (which differs from how 
math folks define it); if X and Y are intervals then the 
comparison between them must hold for all possible values of 
these intervals which means equality only holds if they converge 
on a single point on the numerical axis. Seems like the only 
definition that's useful for implementing VRP.

 The C++ proposal 
 [P0515R0](https://open-std.org/JTC1/SC22/WG21/docs/papers/2017/p0515r0.pdf) is
also a good read. It was rejected, but introduces relevant notions. Look at
least at § 1.4.1 *Guidance: What we teach.*

That's an interesting read, thanks for the link.

Jul 07 2023

Timon Gehr <timon.gehr gmx.ch> writes:

On 7/7/23 17:49, Ahmet Sait wrote:
 Seems like the only definition that's useful for implementing VRP.

VRP is a kind of abstract interpretation, and the natural definition for 
that is the perfect transformer:

`I op I' = { I op I' | x∈I, x'∈I' }`

I.e., I'd say it is more useful to have three-valued logic operating on 
`{false}`, `{true}` and `{false,true}`.

Alternatively,

`I <=> I' = { I <=> I' | x∈I, x'∈I' }`

producing a non-empty subset of `{-1,0,1}`.

In any case, implementing abstract interpretation as operations on 
independent abstract elements in the first place is not particularly 
natural and only works in very special cases such as the interval domain.

Jul 07 2023

Dom DiSc <dominikus scherkl.de> writes:

On Friday, 7 July 2023 at 17:15:45 UTC, Timon Gehr wrote:
 On 7/7/23 17:49, Ahmet Sait wrote:
 Seems like the only definition that's useful for implementing 
 VRP.

 VRP is a kind of abstract interpretation, and the natural 
 definition for that is the perfect transformer:

 `I op I' = { I op I' | x∈I, x'∈I' }`

 I.e., I'd say it is more useful to have three-valued logic 
 operating on `{false}`, `{true}` and `{false,true}`.

 Alternatively,

 `I <=> I' = { I <=> I' | x∈I, x'∈I' }`

 producing a non-empty subset of `{-1,0,1}`.

 In any case, implementing abstract interpretation as operations 
 on independent abstract elements in the first place is not 
 particularly natural and only works in very special cases such 
 as the interval domain.

You're all aware that opCmp can return a float, with the values 
0,1,-1 and NaN?
This can be used for intervals very good (returning NaN if the 
intervals overlap, 0 if they are equal and 1 or -1 if they are 
disjunct).
Unfortunately float is 32 bit, so pretty much the first thing I 
did after I found D was to implement a 2-bit type with exactly 
those 4 values.

Jul 08 2023

Ahmet Sait <nightmarex1337 hotmail.com> writes:

On Saturday, 8 July 2023 at 08:32:00 UTC, Dom DiSc wrote:
 You're all aware that opCmp can return a float, with the values 
 0,1,-1 and NaN?
 This can be used for intervals very good (returning NaN if the 
 intervals overlap, 0 if they are equal and 1 or -1 if they are 
 disjunct).

You're more than welcome to prove us wrong ;)

Jul 08 2023

Timon Gehr <timon.gehr gmx.ch> writes:

On 7/8/23 10:32, Dom DiSc wrote:
 In any case, implementing abstract interpretation as operations on 
 independent abstract elements in the first place is not particularly 
 natural and only works in very special cases such as the interval domain.

 
 You're all aware that opCmp can return a float, with the values 0,1,-1 
 and NaN?
 This can be used for intervals very good (returning NaN if the intervals 
 overlap, 0 if they are equal and 1 or -1 if they are disjunct).
 Unfortunately float is 32 bit, so pretty much the first thing I did 
 after I found D was to implement a 2-bit type with exactly those 4 values.

I am very aware of this. OTOH, you may not be aware that your solution 
does not satisfy the OP's constraints.

It cannot do so because D's operator overloading while not enforcing
`a <= b || b < a` still enforces `a <= b <==> a < b || b <= a`, but
while `[x,y] <= [y,z]` holds, neither `[x,y] < [y,z]` nor
`[y,z] <= [x,y]` hold for x<y<z`.

This means the OP is indeed right that it is impossible.

Jul 08 2023

Timon Gehr <timon.gehr gmx.ch> writes:

On 7/8/23 19:22, Timon Gehr wrote:
 `a <= b <==> a < b || b <= a`

Typo. This is of course only an implication:

`a <= b ==> a < b || b <= a`.

In any case, the argument stands.

Jul 08 2023

deadalnix <deadalnix gmail.com> writes:

On Friday, 7 July 2023 at 09:18:38 UTC, Ahmet Sait wrote:
 Let's define two intervals x and y:
 {x ∈ ℝ │ a ≤ x ≤ b}
 {y ∈ ℝ │ c ≤ y ≤ d}

 And the inequality operators:
 ```
 x < y = ⎧ 1  if b < c
         ⎨
         ⎩ 0  otherwise

 x ≤ y = ⎧ 1  if b ≤ c
         ⎨
         ⎩ 0  otherwise

 x > y = ⎧ 1  if a > d
         ⎨
         ⎩ 0  otherwise

 x ≥ y = ⎧ 1  if a ≥ d
         ⎨
         ⎩ 0  otherwise
 ```

 Exercise: Write an `opCmp()` method that implements the rules 
 above for `struct Interval { int lower, upper; }`. (Hint: It's 
 not possible.)

Thanks god. If you are doing something non standard, do not use 
the builtin operators, give you operation an explicit name.

Jul 07 2023

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Friday, July 7, 2023 10:18:56 AM MDT deadalnix via Digitalmars-d wrote:
 On Friday, 7 July 2023 at 09:18:38 UTC, Ahmet Sait wrote:
 Let's define two intervals x and y:
 {x ∈ ℝ │ a ≤ x ≤ b}
 {y ∈ ℝ │ c ≤ y ≤ d}

 And the inequality operators:
 ```
 x < y = ⎧ 1  if b < c

         ⎨
         ⎩ 0  otherwise

 x ≤ y = ⎧ 1  if b ≤ c

         ⎨
         ⎩ 0  otherwise

 x > y = ⎧ 1  if a > d

         ⎨
         ⎩ 0  otherwise

 x ≥ y = ⎧ 1  if a ≥ d

         ⎨
         ⎩ 0  otherwise

 ```

 Exercise: Write an `opCmp()` method that implements the rules
 above for `struct Interval { int lower, upper; }`. (Hint: It's
 not possible.)

 Thanks god. If you are doing something non standard, do not use
 the builtin operators, give you operation an explicit name.

Yeah. The limitations on opCmp are very much on purpose. We already arguably
have issues due to how the comparison operations work with NaN (it's a
useful feature, but it does potentially cause problems with generic code).
The comparison operators are very much intended to work in the standard
manner even on user-defined types, and types where that isn't going to work
should be using a different set of functions.

Operator overloading in general is provided so that user-defined types can
be used like built-in types and work with generic code that uses those
operators. Domain-specific language stuff should probably be left to either
a domain-specific language or just use properly named functions. Trying to
alter the built-in operators to operate in a way that differs like this from
the built-in types is arguably an abuse of operator overloading.

So, while obviously, if someone is looking to do this like they might have
done in C++, that can be annoying for them, it's really not something that D
is looking to support.

- Jonathan M Davis

Jul 07 2023

monkyyy <crazymonkyyy gmail.com> writes:

On Friday, 7 July 2023 at 09:18:38 UTC, Ahmet Sait wrote:
 

I dislike the current scheme because its not very meta 
programmable

like, id like to write Nullable!T opcmp in 3 lines of code

```d
opCmp(typeof(this) a){
   if(a.isnull != isnull){
     return isnull.opCmp(a.isnull);}
   return value.opCmp(a.value);
}
```
That is very very much NOT how it works; maybe it should act like 
opBinary and
`a < b` is rewritten as `a.opCmp!"<"(b)` with some list of 
prefered overloads

Jul 08 2023

D Programming

C/C++ Programming

Other

digitalmars.D - Shortcomings of D-Style Fused Operator Overloading