www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Shortcomings of D-Style Fused Operator Overloading

reply Ahmet Sait <nightmarex1337 hotmail.com> writes:
For the past few weeks I've been working on a simple compiler for 
my grad project, and I was trying to make use of operator 
overloading in some of my types named Interval and Constant.

Interval supposed to be a type that supports all kinds of 
arithmetic operations with another Interval which would be used 
for implementing [Value Range 
Propagation](https://digitalmars.com/articles/b62.html).

Now to the operator overloading part. As I was trying to 
implement the Interval type, I realized an interesting issue. The 
way operator overloading works in D is with a unified `opCmp()` 
method that returns -1, 0 or 1 depending on the inequality, and 
`a < b` gets lowered to `a.opCmp(b) < 0`. However, implementing 
inequality correctly for intervals requires us to know the exact 
operator used.

Let's define two intervals x and y:
{x ∈ ℝ │ a ≤ x ≤ b}
{y ∈ ℝ │ c ≤ y ≤ d}

And the inequality operators:
```
x < y = ⎧ 1  if b < c
         ⎨
         ⎩ 0  otherwise

x ≤ y = ⎧ 1  if b ≤ c
         ⎨
         ⎩ 0  otherwise

x > y = ⎧ 1  if a > d
         ⎨
         ⎩ 0  otherwise

x ≥ y = ⎧ 1  if a ≥ d
         ⎨
         ⎩ 0  otherwise
```

Exercise: Write an `opCmp()` method that implements the rules 
above for `struct Interval { int lower, upper; }`. (Hint: It's 
not possible.)

I didn't fully implement or used this Interval type I was cooking 
because I was running out of time for the project, so I opted for 
something simpler which is to just do constant folding, hence the 
Constant type. Constant is basically a tagged union like 
`bool|integer|real`, and it doesn't have the fuzzy ordering 
issues like in the Interval type above.

For Constant, I wanted to implement all unary operators, one of 
which is logical not `!`. The only way to do this in D is by 
writing a `bool opCast(T:bool)`, except I don't want to return a 
bool! I want to return another Constant that represents the 
logical inverse of what the current Constant contains, and be 
able write this:
```
Constant c2 = !c1; // Lowered to !c1.opCast!bool
```

As you can guess, returning a Constant from `opCast()` instead of 
`bool` results in this error message:
 Error: cannot implicitly convert expression `c1.opCast()` of 
 type `Constant` to `bool`
*** Now you might be thinking, I could just define a bunch of regular functions such as `logicalNot()`, `lowerOrEquals()` etc. and you would be right (which is what I ended up doing anyway). I guess the point here is that the way D merges these operators to reduce potential bugs actually end up making certain things inexpressible naturally. So what do you guys think? Would you say distinct operators are better?
Jul 07 2023
next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
This does beg the question, should we be supporting Unicode mathematical 
symbols for operators?

In theory this should be easy enough to do, its just some Unicode tables 
that the lexer just needs a bit of info on how to categorize after all.

Will need input from Walter if this is an acceptable path to look into 
further though.
Jul 07 2023
parent reply Quirin Schroll <qs.il.paperinik gmail.com> writes:
On Friday, 7 July 2023 at 09:38:37 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 This does beg the question, should we be supporting Unicode 
 mathematical symbols for operators?
No. If an editor, font, or terminal doesn’t support this, you’re in a bad spot. We had this discussion already. Anyone can use a OTF/TTF font that displays <= as a wider ≤ and call it a day if they want to.
Jul 07 2023
parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 07/07/2023 10:34 PM, Quirin Schroll wrote:
 On Friday, 7 July 2023 at 09:38:37 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 This does beg the question, should we be supporting Unicode 
 mathematical symbols for operators?
No. If an editor, font, or terminal doesn’t support this, you’re in a bad spot. We had this discussion already. Anyone can use a OTF/TTF font that displays <= as a wider ≤ and call it a day if they want to.
We are not talking about basic comparison operators that can be represented in a glyph by a ligature. http://www.unicode.org/charts/PDF/U2200.pdf http://www.unicode.org/charts/PDF/U27C0.pdf http://www.unicode.org/charts/PDF/U2A00.pdf There is a lot of mathematical operators which have no equivalent operator overload.
Jul 07 2023
next sibling parent reply IchorDev <zxinsworld gmail.com> writes:
On Friday, 7 July 2023 at 10:39:40 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 We are not talking about basic comparison operators that can be 
 represented in a glyph by a ligature.

 http://www.unicode.org/charts/PDF/U2200.pdf

 http://www.unicode.org/charts/PDF/U27C0.pdf

 http://www.unicode.org/charts/PDF/U2A00.pdf

 There is a lot of mathematical operators which have no 
 equivalent operator overload.
I hope that would never happen. I'm a programmer, I have no idea how to read mathematical operators. I don't want to be forced to decipher the code of others that uses them. It's already enough of a headache when I want to research a *computer science* topic and everything gets explained exclusively in complicated mathematical notation. I'm not a mathematician, I work exclusively with numbers of finite complexity. Additionally, mathematical operators present a huge readability issue. People had enough trouble differentiating (e.g.) `5l` and `51` that `5l` was removed. Many mathematical operators look way too similar unless they're magnified. I think functions with shortened names are just fine...
Jul 07 2023
parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
I do not know how to read them either. I doubt you'd see much code using 
them.

It would only be the people who are very familiar with their usage who 
would find any use of it. If for no other reason than everyone else do 
not have quick access to those characters already setup.
Jul 07 2023
parent IchorDev <zxinsworld gmail.com> writes:
On Friday, 7 July 2023 at 12:09:19 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 I do not know how to read them either. I doubt you'd see much 
 code using them.

 It would only be the people who are very familiar with their 
 usage who would find any use of it. If for no other reason than 
 everyone else do not have quick access to those characters 
 already setup.
They could always use mixins that replace those characters with function calls. What I worry about is: as a language feature, it might become common enough that I have to read through it in the code of others to understand what their code is doing.
Jul 07 2023
prev sibling parent Ahmet Sait <nightmarex1337 hotmail.com> writes:
On Friday, 7 July 2023 at 10:39:40 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 [snip]
I didn't post this to discuss unicode characters so I would suggest people to stop derailing discussion with any of that. Thanks.
Jul 07 2023
prev sibling next sibling parent Dennis <dkorpel gmail.com> writes:
On Friday, 7 July 2023 at 09:18:38 UTC, Ahmet Sait wrote:
 So what do you guys think? Would you say distinct operators are 
 better?
The thing is, D's operator overloading was designed to support library numeric types, such as half float, fixed point, complex numbers, vector3 etc. It's intentionally limited in order to prevent Domain Specific Languages embedded in D code directly. Templates / string mixins can still be used for that. Now there are other mathematical objects that could *almost* be modeled as a D type with operator overloading, but then it only gets you 90% there before you hit a limitation like `opCmp` for your interval. I have a similar case where my symbolic number type could use an `opCmp` returning another `typeof(this)` instead of an `int` or `float`, but alas. I don't think a proposal to make operator overloading a lot more powerful has much chance, but only making `opCmp` a bit more powerful might if you can demonstrate valid use cases for it.
Jul 07 2023
prev sibling next sibling parent reply Quirin Schroll <qs.il.paperinik gmail.com> writes:
**TL;DR: Getting orderings right in a programming language is 
hard.**

On Friday, 7 July 2023 at 09:18:38 UTC, Ahmet Sait wrote:
 […]
 Exercise: Write an `opCmp()` method that implements the rules 
 above for `struct Interval { int lower, upper; }`. (Hint: It's 
 not possible.)
One can’t because D’s philosophy behind `opCmp` is a total ordering and that `a <= b` is the same as `a < b || a == b` (which both isn’t the case for your intervals). The total ordering part can be stretched to a partial ordering by making `opCmp` return `float` and use `float.nan` to indicate that the elements are unrelated.
 […]

 So what do you guys think? Would you say distinct operators are 
 better?
Yes, 100%. Orderings cannot really be done using D’s approach with `opXXX`. Ideally, the programmer provides minimal details about the ordering and the compiler generates the rest so that ordering is consistent. As an example, C++ up until C++20 requires *you* to write both `==` and `!=`. In almost every circumstance, one would define `!=` in terms of `==` completely mechanical. I see no reason why `!=` should even be overloadable. Maybe there are examples where `a != b` is not the same as `!(a == b)`, I’m open enough to accept that, even though I’m not creative enough to find something where this is reasonable. C++20 also added the `<=>` operator to allow for D’s approach because in a lot of cases, it serves the programmer well. I agree with you that `opCmp` is simplistic design; partial orderings feel like a hack. I’ve thought about orderings a lot and I’m glad you’ve exposed me to this, because now I need to re-consider what the most general properties of ordering relations could be. However, I think that your ordering relations make little sense. What properties should we minimally expect of `==`, `<` and `<=`, `>`, and `>=` to be reasonable? 1. Rewrites * `a != b` is the same as `!(a == b)`, therefore `!=` should not be user-definable, `==` suffices. * `a > b` is the same as `b < a`, therefore `>` should not be user-definable, `<` suffices. * `a >= b` is the same as `b <= a`, therefore `>=` should not be user-definable, `<=` suffices. 2. Individual properties * `<`, `<=`, and `==` are [transitive](https://en.wikipedia.org/wiki/Transitive_relation): E.g. for `<`: If `a < b` and `b < c` then `a < c`. * `<` is totally irreflexive: `a < a` is always false. * `<` is asymmetric: If `a < b`, then `b < a` is false. * `==` is symmetrical: `a == b` is the same as `b == a`. * `<=` is partially reflexive: For any `a` and `b`, if `a <= b`, then `a <= a` and `b <= b`. * `==` is partially reflexive as a consequence from transitivity and symmetry: If `a == b`, by symmetry, also `b == a`, thus by transitivity, `a == a` and `b == b`. 3. Interactions * `a == b` and `a < b` are mutually exclusive; in particular: * If `a == b`, then `a < b` is false. (This is a strengthening of irreflexivity.) * If `a < b`, then `a != b`. * Because `==` is symmetrical, the same for `b < a`. * If `a < b` or `a == b`, then `a <= b` (definition of *less-or-equal*). * If `a <= b`, then `a < b` or `a == b` (definition of *less-or-equal*). * If `a <= b` and `a >= b`, then `a == b` (Anti-symmetry). The last two or three are debatable; I have not put them in Rewrites intentionally. As presented, the intervals don’t adhere to the last two. It’s not even clear what `==` on the intervals would be. It looks like the intervals are supposed to represent *any value in the bounds,* which means that – unless the lower and bound is the same, i.e. the interval is a single number – `I == I` should actually fail. In the axioms I presented, reflexivity is partial because `==` and `<=` on IEEE floats are not reflexive: `float.nan != float.nan`, but any value that is equal to any other value is equal to itself. And, of course, the axioms are supposed to be true for build-in types. While partial in the sense that not all elements are included, IEEE floats are partially totally ordered: * We indeed have `a <= b` or `b <= a` for all `a` and `b` such that `a == a` and `b == b`. IEEE floats are also anti-symmetric, and in general, `==` should be the partial equivalence relation induced by `<=`: `a == b` if and only if `a <= b` and `a >= b`. For IEEE floats, positive and negative zero are equivalent, `+0.0 == -0.0`, but they are not equal (in bit pattern, i.e. `+0.0 !is -0.0` and behavior: `1/+0.0 != 1/-0.0`). As another example, the not-a-number values are behaviorally the same, they’re only technically distinct. I’ve written an [answer on StackOverflow](https://stackoverflow.com/a/71844213/3273130) to the question *What is the difference between equality and equivalence?* A TL;DR is: D’s `is` operator is (at least close to) mathematical equality, whereas the `==` operator is merely the default equivalence relation, which can be anything, and (think of `x` and `y` as `ref` parameters) `&x == &y` in D represents identity (any mutation of `x` is a mutation of `y`). The C++ proposal [P0515R0](https://open-std.org/JTC1/SC22/WG21/docs/papers/2017/p0515r0.pdf) is also a good read. It was rejected, but introduces relevant notions. Look at least at § 1.4.1 *Guidance: What we teach.*
Jul 07 2023
parent reply Ahmet Sait <nightmarex1337 hotmail.com> writes:
On Friday, 7 July 2023 at 13:28:23 UTC, Quirin Schroll wrote:
 The last two or three are debatable; I have not put them in 
 Rewrites intentionally. As presented, the intervals don’t 
 adhere to the last two. It’s not even clear what `==` on the 
 intervals would be. It looks like the intervals are supposed to 
 represent *any value in the bounds,* which means that – unless 
 the lower and bound is the same, i.e. the interval is a single 
 number – `I == I` should actually fail.
This is exactly how I define equality (which differs from how math folks define it); if X and Y are intervals then the comparison between them must hold for all possible values of these intervals which means equality only holds if they converge on a single point on the numerical axis. Seems like the only definition that's useful for implementing VRP.
 The C++ proposal 
 [P0515R0](https://open-std.org/JTC1/SC22/WG21/docs/papers/2017/p0515r0.pdf) is
also a good read. It was rejected, but introduces relevant notions. Look at
least at § 1.4.1 *Guidance: What we teach.*
That's an interesting read, thanks for the link.
Jul 07 2023
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 7/7/23 17:49, Ahmet Sait wrote:
 Seems like the only definition that's useful for implementing VRP.
VRP is a kind of abstract interpretation, and the natural definition for that is the perfect transformer: `I op I' = { I op I' | x∈I, x'∈I' }` I.e., I'd say it is more useful to have three-valued logic operating on `{false}`, `{true}` and `{false,true}`. Alternatively, `I <=> I' = { I <=> I' | x∈I, x'∈I' }` producing a non-empty subset of `{-1,0,1}`. In any case, implementing abstract interpretation as operations on independent abstract elements in the first place is not particularly natural and only works in very special cases such as the interval domain.
Jul 07 2023
parent reply Dom DiSc <dominikus scherkl.de> writes:
On Friday, 7 July 2023 at 17:15:45 UTC, Timon Gehr wrote:
 On 7/7/23 17:49, Ahmet Sait wrote:
 Seems like the only definition that's useful for implementing 
 VRP.
VRP is a kind of abstract interpretation, and the natural definition for that is the perfect transformer: `I op I' = { I op I' | x∈I, x'∈I' }` I.e., I'd say it is more useful to have three-valued logic operating on `{false}`, `{true}` and `{false,true}`. Alternatively, `I <=> I' = { I <=> I' | x∈I, x'∈I' }` producing a non-empty subset of `{-1,0,1}`. In any case, implementing abstract interpretation as operations on independent abstract elements in the first place is not particularly natural and only works in very special cases such as the interval domain.
You're all aware that opCmp can return a float, with the values 0,1,-1 and NaN? This can be used for intervals very good (returning NaN if the intervals overlap, 0 if they are equal and 1 or -1 if they are disjunct). Unfortunately float is 32 bit, so pretty much the first thing I did after I found D was to implement a 2-bit type with exactly those 4 values.
Jul 08 2023
next sibling parent Ahmet Sait <nightmarex1337 hotmail.com> writes:
On Saturday, 8 July 2023 at 08:32:00 UTC, Dom DiSc wrote:
 You're all aware that opCmp can return a float, with the values 
 0,1,-1 and NaN?
 This can be used for intervals very good (returning NaN if the 
 intervals overlap, 0 if they are equal and 1 or -1 if they are 
 disjunct).
You're more than welcome to prove us wrong ;)
Jul 08 2023
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 7/8/23 10:32, Dom DiSc wrote:
 In any case, implementing abstract interpretation as operations on 
 independent abstract elements in the first place is not particularly 
 natural and only works in very special cases such as the interval domain.
You're all aware that opCmp can return a float, with the values 0,1,-1 and NaN? This can be used for intervals very good (returning NaN if the intervals overlap, 0 if they are equal and 1 or -1 if they are disjunct). Unfortunately float is 32 bit, so pretty much the first thing I did after I found D was to implement a 2-bit type with exactly those 4 values.
I am very aware of this. OTOH, you may not be aware that your solution does not satisfy the OP's constraints. It cannot do so because D's operator overloading while not enforcing `a <= b || b < a` still enforces `a <= b <==> a < b || b <= a`, but while `[x,y] <= [y,z]` holds, neither `[x,y] < [y,z]` nor `[y,z] <= [x,y]` hold for x<y<z`. This means the OP is indeed right that it is impossible.
Jul 08 2023
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 7/8/23 19:22, Timon Gehr wrote:
 `a <= b <==> a < b || b <= a`
Typo. This is of course only an implication: `a <= b ==> a < b || b <= a`. In any case, the argument stands.
Jul 08 2023
prev sibling next sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Friday, 7 July 2023 at 09:18:38 UTC, Ahmet Sait wrote:
 Let's define two intervals x and y:
 {x ∈ ℝ │ a ≤ x ≤ b}
 {y ∈ ℝ │ c ≤ y ≤ d}

 And the inequality operators:
 ```
 x < y = ⎧ 1  if b < c
         ⎨
         ⎩ 0  otherwise

 x ≤ y = ⎧ 1  if b ≤ c
         ⎨
         ⎩ 0  otherwise

 x > y = ⎧ 1  if a > d
         ⎨
         ⎩ 0  otherwise

 x ≥ y = ⎧ 1  if a ≥ d
         ⎨
         ⎩ 0  otherwise
 ```

 Exercise: Write an `opCmp()` method that implements the rules 
 above for `struct Interval { int lower, upper; }`. (Hint: It's 
 not possible.)
Thanks god. If you are doing something non standard, do not use the builtin operators, give you operation an explicit name.
Jul 07 2023
parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Friday, July 7, 2023 10:18:56 AM MDT deadalnix via Digitalmars-d wrote:
 On Friday, 7 July 2023 at 09:18:38 UTC, Ahmet Sait wrote:
 Let's define two intervals x and y:
 {x ∈ ℝ │ a ≤ x ≤ b}
 {y ∈ ℝ │ c ≤ y ≤ d}

 And the inequality operators:
 ```
 x < y = ⎧ 1  if b < c

         ⎨
         ⎩ 0  otherwise

 x ≤ y = ⎧ 1  if b ≤ c

         ⎨
         ⎩ 0  otherwise

 x > y = ⎧ 1  if a > d

         ⎨
         ⎩ 0  otherwise

 x ≥ y = ⎧ 1  if a ≥ d

         ⎨
         ⎩ 0  otherwise

 ```

 Exercise: Write an `opCmp()` method that implements the rules
 above for `struct Interval { int lower, upper; }`. (Hint: It's
 not possible.)
Thanks god. If you are doing something non standard, do not use the builtin operators, give you operation an explicit name.
Yeah. The limitations on opCmp are very much on purpose. We already arguably have issues due to how the comparison operations work with NaN (it's a useful feature, but it does potentially cause problems with generic code). The comparison operators are very much intended to work in the standard manner even on user-defined types, and types where that isn't going to work should be using a different set of functions. Operator overloading in general is provided so that user-defined types can be used like built-in types and work with generic code that uses those operators. Domain-specific language stuff should probably be left to either a domain-specific language or just use properly named functions. Trying to alter the built-in operators to operate in a way that differs like this from the built-in types is arguably an abuse of operator overloading. So, while obviously, if someone is looking to do this like they might have done in C++, that can be annoying for them, it's really not something that D is looking to support. - Jonathan M Davis
Jul 07 2023
prev sibling parent monkyyy <crazymonkyyy gmail.com> writes:
On Friday, 7 July 2023 at 09:18:38 UTC, Ahmet Sait wrote:
 
I dislike the current scheme because its not very meta programmable like, id like to write Nullable!T opcmp in 3 lines of code ```d opCmp(typeof(this) a){ if(a.isnull != isnull){ return isnull.opCmp(a.isnull);} return value.opCmp(a.value); } ``` That is very very much NOT how it works; maybe it should act like opBinary and `a < b` is rewritten as `a.opCmp!"<"(b)` with some list of prefered overloads
Jul 08 2023