digitalmars.D - VRP and signed <-> unsigned conversion


Steven Schveighoffer <schveiguy gmail.com> writes:

Here's an interesting situation that I hadn't considered before:

```d
ubyte foo(ubyte a, ubyte b)
{
    return (a & 0xf) - (b & 0xf);
}
```

This fails to compile because VRP (value range propagation) computes the 
range of both integer operands as 0 to 15. That gives the result a range 
of -15 to 15, which does not fit into a `ubyte`.

However, -15 to 15 *does* fit into a `byte`. And a `byte` implicitly 
converts to a `ubyte`, so you can rewrite the function:

```d
ubyte foo(ubyte a, ubyte b)
{
    byte result = (a & 0xf) - (b & 0xf);
    return result;
}
```
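
A minimal sketch of the two cases above, assuming a current D compiler; the name `viaByte` is hypothetical and only used for illustration. `__traits(compiles, ...)` is used to record which initializations are accepted:

```d
// Hypothetical helper: the rewrite that goes through a `byte` temporary.
ubyte viaByte(ubyte a, ubyte b)
{
    // Range of the right-hand side is -15 .. 15, too wide for ubyte...
    static assert(!__traits(compiles, { ubyte r = (a & 0xf) - (b & 0xf); }));
    // ...but it fits in byte, so this initialization is accepted.
    byte result = (a & 0xf) - (b & 0xf);
    // byte -> ubyte is an implicit conversion; negative values wrap modulo 256.
    return result;
}

void main()
{
    assert(viaByte(12, 3) == 9);   // 12 - 3
    assert(viaByte(3, 12) == 247); // -9 wraps to 256 - 9
}
```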

I'm wondering:

1. Does it make sense for this to be valid? Should we reexamine unsigned 
<-> signed implicit casting?
2. If the above rewrite is possible, shouldn't VRP just allow this 
conversion? i.e. a type that has an unsigned/signed counterpart should 
be assignable if the signed/unsigned can accept the range.

-Steve

Dec 15 2021

Commander Zot <no no.no> writes:

On Wednesday, 15 December 2021 at 14:39:09 UTC, Steven 
Schveighoffer wrote:

 1. Does it make sense for this to be valid? Should we reexamine 
 unsigned <-> signed implicit casting?
 2. If the above rewrite is possible, shouldn't VRP just allow 
 this conversion? i.e. a type that has an unsigned/signed 
 counterpart should be assignable if the signed/unsigned can 
 accept the range.

 -Steve

number 1.
if a conversion cannot be proven to not truncate it should 
require a cast.

Dec 15 2021

Quirin Schroll <qs.il.paperinik gmail.com> writes:

On Wednesday, 15 December 2021 at 14:39:09 UTC, Steven 
Schveighoffer wrote:
 […]

 I'm wondering:

 1. Does it make sense for this to be valid? Should we reexamine 
 unsigned <-> signed implicit casting?

I don't understand why `byte` should be implicitly convertible to 
`ubyte`. Seeing this as is, I think this is a bug.

 2. If the above rewrite is possible, shouldn't VRP just allow 
 this conversion? i.e. a type that has an unsigned/signed 
 counterpart should be assignable if the signed/unsigned can 
 accept the range.

To me, it seems that VRP is designed around mathematical 
intuition in which the integer types are seen as {−2ⁿ⁻¹, …, 
2ⁿ⁻¹−1} and {0, …, 2ⁿ−1} for *n* appropriate. (I hope the Unicode 
superscripts render properly.)
The problem starts with subtraction. If *x, y* ∈ {0, …, 15}, then 
*x* − *y* ∈ {−15, …, 15}. A lot of professional people know that 
(at least the unsigned types) implement arithmetic modulo 2ⁿ, so 
*x* − *y* is well-defined. However, you *can* see unsigned types 
as positive types (e.g. when taking the length of an array); in 
this case, subtraction *x* − *y* makes no sense when *y* > *x*.
I guess the whole problem comes from the double-role unsigned 
types play: positive numbers vs. mod-2ⁿ numbers; a triple-role 
with bit-operations.
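
A small sketch of the two readings of unsigned subtraction described above. The helper name `lastIndex` is hypothetical; the point is only that the mod-2ⁿ result is well-defined but is rarely what the "positive number" reading wants:

```d
// Hypothetical helper: naive "index of the last element".
size_t lastIndex(const int[] arr)
{
    // length is size_t (unsigned); for an empty array 0 - 1 wraps around.
    return arr.length - 1;
}

void main()
{
    uint x = 3, y = 5;
    // Mod-2^32 view: 3 - 5 is well-defined and equals 2^32 - 2.
    assert(x - y == 4294967294u);

    int[] three = [10, 20, 30];
    int[] empty;
    assert(lastIndex(three) == 2);
    // "Positive number" view: there is no last index, yet we get size_t.max.
    assert(lastIndex(empty) == size_t.max);
}
```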

This is a fundamental design problem, and VRP cannot fix it. I 
don't know of a good solution; I've yet to see one. What I've 
never seen is splitting integers into *three* types: signed ones 
for {−2ⁿ⁻¹, …, 2ⁿ⁻¹−1}, unsigned ones for {0, …, 2ⁿ−1}, and 
bit-vectors for {0, 1}ⁿ. Bit operators would only be available 
for the last of these, arithmetic for all of them, with the intuition 
that bit-vectors are mod-2ⁿ meaning that all operations are 
well-defined except division by zero, but for signed and unsigned 
types, all operations are partial (except unary plus). Even unary 
minus is partial for signed types because −(−2ⁿ⁻¹) ∉ {−2ⁿ⁻¹, …, 
2ⁿ⁻¹−1}. What (ideal) VRP can do is keep you from code that might 
leave the domain.
Another thing to note is that division in modular arithmetic is a 
bit weird: In mod-8 arithmetic, 7 ÷ 3 = 5 (as 3 × 5 = 15 ≡ 7), 
but virtually nobody wants that.
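
Both points can be checked with a short sketch. `modDiv` is a hypothetical brute-force helper (not a library function) that assumes non-negative inputs and a small modulus; the negation check relies on D's wrapping two's-complement arithmetic:

```d
// Hypothetical helper: division in arithmetic modulo m by brute force.
// Returns the smallest q in 0 .. m with (d * q) % m == n % m, or -1 if none exists.
int modDiv(int n, int d, int m)
{
    foreach (q; 0 .. m)
        if ((d * q) % m == n % m)
            return q;
    return -1;
}

void main()
{
    // In mod-8 arithmetic, 7 / 3 is 5 because 3 * 5 = 15 and 15 % 8 == 7.
    assert(modDiv(7, 3, 8) == 5);
    // Ordinary integer division gives the truncated quotient instead.
    assert(7 / 3 == 2);
    // Even divisors are not invertible mod 8, so a quotient may not exist.
    assert(modDiv(1, 2, 8) == -1);

    // Unary minus leaves the signed domain for the minimum value and wraps back.
    int lowest = int.min;
    assert(-lowest == int.min);
}
```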

Dec 15 2021

Steven Schveighoffer <schveiguy gmail.com> writes:

On 12/15/21 1:04 PM, Quirin Schroll wrote:
 On Wednesday, 15 December 2021 at 14:39:09 UTC, Steven Schveighoffer wrote:
 […]

 I'm wondering:

 1. Does it make sense for this to be valid? Should we reexamine 
 unsigned <-> signed implicit casting?

 
 I don't understand why `byte` should be implicitly convertible to 
 `ubyte`. Seeing this as is, I think this is a bug.

If it's a bug, it's a bug in the design, one that D inherited from C.

 
 2. If the above rewrite is possible, shouldn't VRP just allow this 
 conversion? i.e. a type that has an unsigned/signed counterpart should 
 be assignable if the signed/unsigned can accept the range.

 
 To me, it seems that VRP is designed around mathematical intuition in 
 which the integer types are seen as {−2ⁿ⁻¹, …, 2ⁿ⁻¹−1} and {0, …, 2ⁿ−1} 
 for *n* appropriate. (I hope the Unicode superscripts render properly.)

(they did)

 The problem starts with subtraction. If *x, y* ∈ {0, …, 15}, then *x* − 
 *y* ∈ {−15, …, 15}. A lot of professional people know that (at least the 
 unsigned types) implement arithmetic modulo 2ⁿ, so *x* − *y* is 
 well-defined. However, you *can* see unsigned types as positive types 
 (e.g. when taking the length of an array); in this case, subtraction *x* 
 − *y* makes no sense when *y* > *x*.
 I guess the whole problem comes from the double-role unsigned types 
 play: positive numbers vs. mod-2ⁿ numbers; a triple-role with 
 bit-operations.

The impetus of VRP was simple -- C allows implicit casting between *all* 
integer types. D did not want to do that due to the errors it causes. 
However, since integer promotion is mimicked from C, all math is done at 
integer level (this makes C code that's compiled with D code behave 
similarly). However, it's quite painful to have to cast everything with 
integers that are smaller than 32-bit ints, so VRP is used to assist in 
making sure you don't have to cast if the compiler can prove the value fits.
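
A minimal sketch of the promotion rule and of where VRP removes the need for a cast. The function names `sumTruncated` and `sumMasked` are hypothetical, chosen only for this illustration:

```d
// Hypothetical helper: without range information an explicit cast is required.
ubyte sumTruncated(ubyte a, ubyte b)
{
    // a + b is computed as int (range 0 .. 510), so it does not fit ubyte.
    static assert(is(typeof(a + b) == int));
    static assert(!__traits(compiles, { ubyte c = a + b; }));
    return cast(ubyte)(a + b); // keeps only the low 8 bits
}

// Hypothetical helper: masking narrows the range, so no cast is needed.
ubyte sumMasked(ubyte a, ubyte b)
{
    // Each operand is in 0 .. 127, so the sum is in 0 .. 254 and fits ubyte.
    return (a & 0x7f) + (b & 0x7f);
}

void main()
{
    assert(sumTruncated(200, 100) == 44); // 300 - 256
    assert(sumMasked(200, 100) == 172);   // 72 + 100
}
```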

But this quirk of allowing implicit casting between unsigned and signed 
violates that spirit. To me it feels like a bad code smell. I also have 
issues with the character types being considered as integers. D could do 
a lot better in this area I think.

-Steve

Dec 15 2021

Era Scarecrow <rtcvb32 yahoo.com> writes:

On Wednesday, 15 December 2021 at 18:43:38 UTC, Steven 
Schveighoffer wrote:
 However, it's quite painful to have to cast everything with 
 integers that are smaller than 32-bit ints, so VRP is used to 
 assist in making sure you don't have to cast if the compiler 
 can prove the value fits.

  Using GDC (64-bit) and working on my Reed-Solomon codes, I'm 
finding the default of long throws me off when I was using ints, 
so everything smaller than 64-bit (which is everything) has to be 
cast. I'm starting to default to size_t for everything to make it 
go away without a ton of ugly casting, except in a handful of 
places that need manual byte conversions.
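
One reading of the situation above, sketched under the assumption that the 64-bit values come from `size_t` (array lengths and indices) on a 64-bit target. The helper `sumBytes` is hypothetical and is not taken from the Reed-Solomon code being discussed:

```d
// Hypothetical helper: keeping indices and totals in size_t avoids casts.
size_t sumBytes(const ubyte[] data)
{
    size_t total = 0;
    foreach (size_t i; 0 .. data.length)
        total += data[i];
    return total;
}

void main()
{
    ubyte[] data = [1, 2, 3];

    static if (size_t.sizeof == 8)
    {
        // On 64-bit targets length is a 64-bit unsigned value; narrowing it
        // to int is not implicit and VRP has no range information to help.
        static assert(!__traits(compiles, { int n = data.length; }));
    }

    int n = cast(int) data.length; // explicit narrowing
    assert(n == 3);
    assert(sumBytes(data) == 6);
}
```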

Dec 15 2021

Kagamin <spam here.lot> writes:

On Wednesday, 15 December 2021 at 18:43:38 UTC, Steven 
Schveighoffer wrote:
 The impetus of VRP was simple -- C allows implicit casting 
 between *all* integer types. D did not want to do that due to 
 the errors it causes. However, since integer promotion is 
 mimicked from C, all math is done at integer level (this makes 
 C code that's compiled with D code behave similarly). However, 
 it's quite painful to have to cast everything with integers 
 that are smaller than 32-bit ints, so VRP is used to assist in 
 making sure you don't have to cast if the compiler can prove 
 the value fits.

If you want proofs, D has safe conversions:
```d
ubyte foo(ubyte a, ubyte b)
{
    return byte((a & 0xf) - (b & 0xf));
}
```
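
A short sketch of why the `byte(...)` form counts as a checked conversion: construction syntax for built-in types only accepts arguments that convert implicitly, so the compiler must prove the range fits, whereas `cast` accepts anything. The checks below are illustrative assumptions about a current D compiler:

```d
ubyte foo(ubyte a, ubyte b)
{
    // Accepted: the range -15 .. 15 is proven to fit in byte.
    return byte((a & 0xf) - (b & 0xf));
}

void main()
{
    ubyte a, b;
    // The masked difference is provably within byte.
    static assert( __traits(compiles, byte((a & 0xf) - (b & 0xf))));
    // The unmasked difference ranges over -255 .. 255, so construction is rejected.
    static assert(!__traits(compiles, byte(a - b)));
    // An explicit cast compiles regardless of the range.
    static assert( __traits(compiles, cast(byte)(a - b)));

    assert(foo(12, 3) == 9);
    assert(foo(3, 12) == 247); // -9 as byte, then wrapped to ubyte
}
```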

Dec 16 2021

Kagamin <spam here.lot> writes:

On Wednesday, 15 December 2021 at 14:39:09 UTC, Steven 
Schveighoffer wrote:
 2. If the above rewrite is possible, shouldn't VRP just allow 
 this conversion? i.e. a type that has an unsigned/signed 
 counterpart should be assignable if the signed/unsigned can 
 accept the range.

In your case the value is typed as int, and when a negative int value 
is converted to ubyte, you lose the upper 3 bytes. It's more like

```d
ubyte f()
{
    int n=-15;
    return n;
}
```

Well, even this doesn't compile:
```d
ubyte f()
{
    return -15;
}
```
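
The cases above can be put side by side in a small sketch. `lowByte` is a hypothetical helper name; the value 241 is what remains of -15 (0xFFFF_FFF1 as a 32-bit int) once the upper 3 bytes are dropped:

```d
// Hypothetical helper: explicit truncation of an int to its low 8 bits.
ubyte lowByte(int n)
{
    return cast(ubyte) n;
}

void main()
{
    int n = -15;
    // Neither the int variable nor the literal converts to ubyte implicitly.
    static assert(!__traits(compiles, { ubyte x = n; }));
    static assert(!__traits(compiles, { ubyte x = -15; }));

    // With an explicit cast only the low byte, 0xF1, survives.
    assert(lowByte(n) == 0xF1);

    // Going through byte is accepted implicitly and yields the same bits.
    byte b = -15;
    ubyte viaByte = b;
    assert(viaByte == 241);
}
```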

Dec 16 2021
