digitalmars.D - Range of enum type values

Johan Engelen <j j.nl> writes:

Hi all,
   I am wondering about the valid range of values of an enum type. 
I couldn't find anything explicit about this in the language 
specification.

Consider this code:
```
enum Flags
{
     A = 1,
     B = 2,
     C = 4
}

bool rangeCheck(Flags f)
{
     return (f >= Flags.min) && (f <= Flags.max);
}
bool preciseCheck(Flags f)
{
     return (f == Flags.A) || (f == Flags.B) || (f == Flags.C);
}
```

Is `rangeCheck` guaranteed to return true? Is `preciseCheck` 
guaranteed to return true?
A variable of type Flags is always initialized to Flags.A.
Integer assignment is not allowed.
So `Flags f` should always have a value of A, B, or C, right?

No. This code is accepted:
```
Flags getFlags()
{
     return Flags.A | Flags.B; // and so is `^`, `&`, `+`, `*`, ...
}
```
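
To make the consequence concrete, here is a minimal sketch that reuses the two checks from above. The variable names `ab` and `ac` are only for illustration, and the behavior shown is what current compilers accept, not something the spec promises:

```
enum Flags
{
    A = 1,
    B = 2,
    C = 4
}

bool rangeCheck(Flags f)
{
    return (f >= Flags.min) && (f <= Flags.max);
}

bool preciseCheck(Flags f)
{
    return (f == Flags.A) || (f == Flags.B) || (f == Flags.C);
}

void main()
{
    Flags ab = Flags.A | Flags.B; // value 3: inside [min, max], but not a member
    Flags ac = Flags.A | Flags.C; // value 5: outside [min, max]

    assert(cast(int) ab == 3);
    assert(rangeCheck(ab) && !preciseCheck(ab));
    assert(!rangeCheck(ac) && !preciseCheck(ac));
}
```

So neither check is guaranteed to return true for an arbitrary `Flags` value.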

I'd like to have the value range implications of the use of 
operators on enum values explicitly mentioned in the spec.

Current compiler behavior results in an infinite value range. 
(but it's implicit behavior, i.e. not explicitly mentioned in 
spec)

- Currently, are operations resulting in a value larger than the 
underlying integer storage type UB, like for normal signed 
integers?
- Should we limit the range of valid values of the Flags enum 
(C++ defines valid range to be [0..7])?
- Do we want to limit operations allowed on enum types? Or change 
the result type? (e.g. the type of `Flags + Flags` is `int` 
instead of `Flags`.)

cheers,
   Johan

Dec 27 2019

Timon Gehr <timon.gehr gmx.ch> writes:

On 27.12.19 13:14, Johan Engelen wrote:
 
 Current compiler behavior results in an infinite value range. (but it's 
 implicit behavior, i.e. not explicitly mentioned in spec)
 
 - Currently, are operations resulting in a value larger than the 
 underlying integer storage type UB,

They are @safe. You can't have UB in @safe code.

 like for normal signed integers?

Signed integers have wraparound semantics.
https://forum.dlang.org/thread/n23bo3$qe$1@digitalmars.com

The spec mentions this for AddExpressions (but the example only shows it 
for uint): https://dlang.org/spec/expression.html
"If both operands are of integral types and an overflow or underflow 
occurs in the computation, wrapping will happen."

There simply _can't_ be any UB in signed integer operations, as they are 
considered @safe.
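
A minimal sketch of the wraparound rule quoted above, for a signed type rather than `uint`. The function name `increment` is hypothetical:

```
// Adds one; on overflow the result wraps instead of being undefined.
int increment(int x) @safe pure nothrow @nogc
{
    return x + 1;
}

void main() @safe
{
    assert(increment(41) == 42);
    assert(increment(int.max) == int.min); // wraps around
}
```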

 - Should we limit the range of valid values of the Flags enum (C++ 
 defines valid range to be [0..7])?
 - Do we want to limit operations allowed on enum types? Or change the 
 result type? (e.g. the type of `Flags + Flags` is `int` instead of `Flags`.

I see those options:

1. The valid range is the full range of the underlying type (as DMD 
treats it now).

2. The range is [1..4]. In this case, the operations have to promote 
their operands to the enum base type, and most casts to enum types must 
be @system.

3. The range is [0..7]. In this case, only operations that preserve this 
range (such as bitwise operators) should yield the enum type, and other 
operations should promote their operands to the enum base type, and most 
casts to enum types must be @system.


Personally, I think 2 makes most sense (especially with `final switch`, 
as the current semantics forces compilers to insert default cases 
there), but this would be a breaking language change.
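
As a rough illustration of why option 1 interacts badly with `final switch`: under the current rules a `Flags` value need not be one of the named members, so code that wants to handle every value needs an ordinary `switch` with a `default`. The helper name `describe` is made up for this sketch:

```
enum Flags
{
    A = 1,
    B = 2,
    C = 4
}

// Names a value, handling non-member values explicitly
// rather than relying on `final switch` alone.
string describe(Flags f) @safe pure nothrow @nogc
{
    switch (f)
    {
        case Flags.A: return "A";
        case Flags.B: return "B";
        case Flags.C: return "C";
        default:      return "not a named member";
    }
}

void main() @safe
{
    assert(describe(Flags.B) == "B");
    assert(describe(Flags.A | Flags.B) == "not a named member");
}
```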

Dec 27 2019

Steven Schveighoffer <schveiguy gmail.com> writes:

On 12/27/19 9:12 AM, Timon Gehr wrote:
 On 27.12.19 13:14, Johan Engelen wrote:
 Current compiler behavior results in an infinite value range. (but 
 it's implicit behavior, i.e. not explicitly mentioned in spec)

 - Currently, are operations resulting in a value larger than the 
 underlying integer storage type UB,

 
 They are @safe. You can't have UB in @safe code.
 
 like for normal signed integers?

 
 Signed integers have wraparound semantics.
 https://forum.dlang.org/thread/n23bo3$qe$1@digitalmars.com
 
 The spec mentions this for AddExpressions (but the example only shows it 
 for uint): https://dlang.org/spec/expression.html
 "If both operands are of integral types and an overflow or underflow 
 occurs in the computation, wrapping will happen."
 
 There simply _can't_ be any UB in signed integer operations, as they are 
 considered @safe.
 
 - Should we limit the range of valid values of the Flags enum (C++ 
 defines valid range to be [0..7])?
 - Do we want to limit operations allowed on enum types? Or change the 
 result type? (e.g. the type of `Flags + Flags` is `int` instead of 
 `Flags`.

 
 I see those options:
 
 1. The valid range is the full range of the underlying type (as DMD 
 treats it now).
 
 2. The range is [1..4]. In this case, the operations have to promote 
 their operands to the enum base type, and most casts to enum types must 
 be @system.
 
 3. The range is [0..7]. In this case, only operations that preserve this 
 range (such as bitwise operators) should yield the enum type, and other 
 operations should promote their operands to the enum base type, and most 
 casts to enum types must be @system.
 
 
 Personally, I think 2 makes most sense (especially with `final switch`, 
 as the current semantics forces compilers to insert default cases 
 there), but this would be a breaking language change.

We have another option, which I like. That is, only allow bitwise 
operations on enums that are flagged as allowing bitwise operations 
(either with a uda, or via some other mechanism). Many languages 
actually treat enums just like structs, where you can add operators and 
functions. This is also a possibility.

This is also a breaking change, but also I don't want the compiler 
complaining about final switch on enums where the enum is intended not 
to be a bitwise flag. So I'd prefer 2 over 3.
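
One way to approximate the "treat it like a struct" idea in today's D is a small wrapper that only defines the bitwise operators. This is just a sketch: `FlagSet` and `has` are hypothetical names, not an existing language feature (Phobos has something in a similar spirit in `std.typecons.BitFlags`):

```
import std.traits : OriginalType;

enum Flags
{
    A = 1,
    B = 2,
    C = 4
}

// Wrapper that stores a combination of flags and permits only |, & and ^.
struct FlagSet(E) if (is(E == enum))
{
    private OriginalType!E bits;

    this(E flag) { bits = flag; }

    FlagSet opBinary(string op)(FlagSet rhs) const
        if (op == "|" || op == "&" || op == "^")
    {
        FlagSet result;
        result.bits = cast(OriginalType!E)(mixin("bits " ~ op ~ " rhs.bits"));
        return result;
    }

    // True if every bit of `flag` is set.
    bool has(E flag) const { return (bits & flag) == flag; }
}

void main()
{
    auto ab = FlagSet!Flags(Flags.A) | FlagSet!Flags(Flags.B);
    assert(ab.has(Flags.A) && ab.has(Flags.B));
    assert(!ab.has(Flags.C));
    // Arithmetic is rejected because no matching opBinary exists.
    static assert(!__traits(compiles, ab + ab));
}
```

With this shape, the plain enum could keep a closed set of members for `final switch`, while combinations live in the wrapper type.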

-Steve

Dec 27 2019

Johan Engelen <j j.nl> writes:

On Friday, 27 December 2019 at 14:58:59 UTC, Steven Schveighoffer 
wrote:
 On 12/27/19 9:12 AM, Timon Gehr wrote:
 On 27.12.19 13:14, Johan Engelen wrote:
 Current compiler behavior results in an infinite value range. 
 (but it's implicit behavior, i.e. not explicitly mentioned in 
 spec)

 - Currently, are operations resulting in a value larger than 
 the underlying integer storage type UB,

 
 They are @safe. You can't have UB in @safe code.
 
 like for normal signed integers?

 
 Signed integers have wraparound semantics.
 https://forum.dlang.org/thread/n23bo3$qe$1@digitalmars.com


Thanks for the correction!
I hope someone finds the time to make that more explicit in the 
spec.

 The spec mentions this for AddExpressions (but the example 
 only shows it for uint): https://dlang.org/spec/expression.html
 "If both operands are of integral types and an overflow or 
 underflow occurs in the computation, wrapping will happen."
 
 There simply _can't_ be any UB in signed integer operations, 
 as they are considered @safe.


I don't accept this argument [*], but no argument needed here. 
Just needs some clarification in spec text.

 - Should we limit the range of valid values of the Flags enum 
 (C++ defines valid range to be [0..7])?
 - Do we want to limit operations allowed on enum types? Or 
 change the result type? (e.g. the type of `Flags + Flags` is 
 `int` instead of `Flags`.

 
 I see those options:
 
 1. The valid range is the full range of the underlying type 
 (as DMD treats it now).
 
 2. The range is [1..4]. In this case, the operations have to 
 promote their operands to the enum base type, and most casts 
 to enum types must be @system.
 
 3. The range is [0..7]. In this case, only operations that 
 preserve this range (such as bitwise operators) should yield 
 the enum type, and other operations should promote their 
 operands to the enum base type, and most casts to enum types 
 must be @system.
 
 
 Personally, I think 2 makes most sense (especially with `final 
 switch`, as the current semantics forces compilers to insert 
 default cases there), but this would be a breaking language 
 change.

 We have another option, which I like. That is, only allow 
 bitwise operations on enums that are flagged as allowing 
 bitwise operations (either with a uda, or via some other 
 mechanism). Many languages actually treat enums just like 
 structs, where you can add operators and functions. This is 
 also a possibility.

 This is also a breaking change, but also I don't want the 
 compiler complaining about final switch on enums where the enum 
 is intended not to be a bitwise flag. So I'd prefer 2 over 3.

Let's separate the discussion into what it _currently is_ and 
what _it might be in future_.

Current language behavior:
enum value range = full range of base type; integer operations 
work as-if the type is the base type.

Future:
Several options + lots of discussion ;-) and DIP needed.

Can I summarize it like that?

cheers,
   Johan



[*]
Let's not go off on a tangent, but there is enough UB in D that I 
do not accept that @safe=="no UB" argument. One example that 
comes to mind is bitshifting by more than the operand bit width: 
"illegal" is what the spec says but that doesn't make sense for 
runtime shift values and, in practice, turns into UB at runtime. 
;-)
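
For illustration, code that must not depend on what an out-of-range runtime shift does can guard the shift count itself. The helper name `shiftLeftChecked` and the choice to return 0 are assumptions of this sketch, not something the spec prescribes:

```
// Left shift for 32-bit values that defines the out-of-range case itself.
uint shiftLeftChecked(uint value, uint count) @safe pure nothrow @nogc
{
    // Counts of 32 or more are the case the spec calls "illegal".
    return count < 32 ? value << count : 0;
}

void main() @safe
{
    assert(shiftLeftChecked(1, 4) == 16);
    assert(shiftLeftChecked(1, 31) == 0x8000_0000);
    assert(shiftLeftChecked(1, 32) == 0);
}
```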

Dec 27 2019

Timon Gehr <timon.gehr gmx.ch> writes:

On 27.12.19 19:12, Johan Engelen wrote:
 
 [*]
 Let's not go off on a tangent,

It's not a tangent, it's a powerful tool to decide whether something has 
any business being UB or not.

 but there is enough UB in D that I do not accept that @safe=="no UB" argument.

https://dlang.org/articles/safed.html

"In D, we expect the vast majority of programmers to operate within the 
safe subset of D, which we call SafeD. The safety and the ease of use of 
SafeD is comparable to Java—in fact Java programs can be 
machine-translated into this safe subset of D. SafeD is easy to learn 
and it keeps the programmers away from undefined behaviors. It is also 
very efficient."

"[...] you are guaranteed not to encounter any undefined behavior."

https://dlang.org/spec/memory-safe-d.html

"Therefore, the safe subset of D consists only of programming language 
features that are guaranteed to never result in memory corruption. See 
this article for a rationale."

("this article" links to
https://dlang.org/articles/safed.html)

 One example that comes to mind is 
 bitshifting by more than the operand bit width: "illegal" is what the 
 spec says but that doesn't make sense for runtime shift values and, in 
 practice, turns into UB at runtime. ;-)

What that means is not that UB is allowed in @safe code, but rather that 
the spec hasn't been properly updated after @safe was introduced to 
clarify what "illegal" means here. It should mean that the returned 
value is arbitrary, not that the behavior of the entire program will be 
arbitrary. I think Walter has said as much before, but I can't find the 
post.

@safe is meant to imply no memory corruption. @safe implies no UB, 
because UB can lead to any behavior, including memory corruption. UB 
allows compilers to insert arbitrary code execution exploits. How can 
you call that @safe?

Dec 28 2019

ag0aep6g <anonymous example.com> writes:

On 28.12.19 14:33, Timon Gehr wrote:
 https://dlang.org/articles/safed.html
 
 "In D, we expect the vast majority of programmers to operate within the 
 safe subset of D, which we call SafeD. The safety and the ease of use of 
 SafeD is comparable to Java—in fact Java programs can be 
 machine-translated into this safe subset of D. SafeD is easy to learn 
 and it keeps the programmers away from undefined behaviors. It is also 
 very efficient."
 
 "[...] you are guaranteed not to encounter any undefined behavior."
 
 https://dlang.org/spec/memory-safe-d.html
 
 "Therefore, the safe subset of D consists only of programming language 
 features that are guaranteed to never result in memory corruption. See 
 this article for a rationale."
 
 ("this article" links to
 https://dlang.org/articles/safed.html)

Also: https://dlang.org/spec/function.html#function-safety

"Safe functions are functions that are statically checked to exhibit no 
possibility of undefined behavior."

"Safe functions are marked with the @safe attribute."

Dec 28 2019

Johan Engelen <j j.nl> writes:

On Saturday, 28 December 2019 at 13:33:25 UTC, Timon Gehr wrote:
 
 @safe is meant to imply no memory corruption. @safe implies no 
 UB, because UB can lead to any behavior, including memory 
 corruption. UB allows compilers to insert arbitrary code 
 execution exploits. How can you call that @safe?

I think we are talking about different things here.

You are saying: the spec says @safe means no UB, and if the spec 
doesn't say it then it simply needs updating. There are a number 
of text pieces that say that.

I am saying: regardless of what the spec and any of those 
articles promise, current D behavior is that @safe _can_ have UB 
in it.

I know most people don't like to hear it nor acknowledge it. But 
I think it is better to be realistic about this. `@safe` 
currently does _not_ mean the code is super safe.

One could say that the compilers are just not standard-compliant, 
nor is the spec itself, but the problem is bigger than that. 
Adding null dereference checks everywhere is not what (I think) 
people want. So that means there will always be UB potential in 
@safe code with interface method calls or class member variable 
access. I know about the "we specify that reading from NULL must 
result in a segfault". It misses the point by not understanding 
that a "null dereference" doesn't mean "access address 0" (let 
alone that by that we disallow e.g. system programmers to 
actually use address 0). Member variable access often does not 
access address 0x0, nor does an interface method call.

(I've been in these discussions too many times now. I'll try to 
stop arguing it.)

-Johan

Dec 28 2019

Walter Bright <newshound2 digitalmars.com> writes:

On 12/28/2019 12:22 PM, Johan Engelen wrote:
 You are saying: the spec says @safe means no UB, and if the spec doesn't say it
 then it simply needs updating. There are a number of text pieces that say that.
 
 I am saying: regardless of what the spec and any of those articles promise, 
 current D behavior is that @safe _can_ have UB in it.
 
 I know most people don't like to hear it nor acknowledge it. But I think it is 
 better to be realistic about this. `@safe` currently does _not_ mean the code is
 super safe.

What it does mean is that all instances of #safe resulting in #undefinedBehavior 
should be logged either in bugzilla or noted as an exception in the #specification.

Note that there's a bugzilla keyword 'safe' which can tag them. Here's the 
current set of open tagged issues:

https://issues.dlang.org/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&keywords=safe&keywords_type=allwords&list_id=208819&query_format=advanced

Dec 28 2019

Timon Gehr <timon.gehr gmx.ch> writes:

On 28.12.19 22:08, Walter Bright wrote:
 On 12/28/2019 12:22 PM, Johan Engelen wrote:
 You are saying: the spec says @safe means no UB, and if the spec 
 doesn't say it then it simply needs updating. There are a number of 
 text pieces that say that.

 I am saying: regardless of what the spec and any of those articles 
 promise, current D behavior is that @safe _can_ have UB in it.

 I know most people don't like to hear it nor acknowledge it. But I 
 think it is better to be realistic about this. `@safe` currently does 
 _not_ mean the code is super safe.

 
 What it does mean is that all instances of #safe resulting in 
 #undefinedBehavior should be logged either in bugzilla

Yes.

 or noted as an exception in the #specification.
 ...

Yes, but the only reasonable kinds of exceptions are invalid 
@trusted/@system code or specific compiler flags to disable safety (e.g. 
-boundscheck=off). There can't be any other exceptions.

Dec 28 2019

Timon Gehr <timon.gehr gmx.ch> writes:

On 28.12.19 21:22, Johan Engelen wrote:
 On Saturday, 28 December 2019 at 13:33:25 UTC, Timon Gehr wrote:
 @safe is meant to imply no memory corruption. @safe implies no UB, 
 because UB can lead to any behavior, including memory corruption. UB 
 allows compilers to insert arbitrary code execution exploits. How can 
 you call that @safe?

 
 I think we are talking about different things here.
 
 You are saying: the spec says @safe means no UB, and if the spec doesn't 
 say it then it simply needs updating. There are a number of text pieces 
 that say that.
 
 I am saying: regardless of what the spec and any of those articles 
 promise, current D behavior is that @safe _can_ have UB in it.

Each of those instances is a bug.

 
 I know most people don't like to hear it nor acknowledge it. But I think it is
 better to be realistic about this. `@safe` currently does _not_ mean the code
 is super safe.

I acknowledge that the implementation has bugs. I reported a number of 
frontend bugs in the type checker myself. What you are saying is that 
the backends have bugs as well. It's not very surprising, but I don't 
think you can use existing bugs in the implementation as a justification 
to deliberately introduce more of those bugs, which is what I read when 
you write "I don't buy this argument".

Dec 28 2019

Timon Gehr <timon.gehr gmx.ch> writes:

On 27.12.19 19:12, Johan Engelen wrote:
 
 Current language behavior:
 enum value range = full range of base type; integer operations work 
 as-if the type is the base type.
 
 Future:
 Several options + lots of discussion ;-) and DIP needed.
 
 Can I summarize it like that?

I think so, but you might want to add that currently, e.g. `cast(E)x` is 
@safe for an `enum E:typeof(x){ ... }`
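
A small sketch of what that implies in practice: since the cast is accepted in @safe code, any check for "is this one of the named members" has to be done by hand. The helper `isNamedMember` is hypothetical; `EnumMembers` is from `std.traits`:

```
import std.traits : EnumMembers;

enum Flags
{
    A = 1,
    B = 2,
    C = 4
}

// True only if `value` equals one of the declared members of E.
bool isNamedMember(E)(E value) @safe pure nothrow @nogc
if (is(E == enum))
{
    foreach (member; EnumMembers!E) // unrolled at compile time
    {
        if (value == member)
            return true;
    }
    return false;
}

void main() @safe
{
    int raw = 3;
    Flags f = cast(Flags) raw; // accepted in @safe code
    assert(!isNamedMember(f));
    assert(isNamedMember(cast(Flags) 4));
}
```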

Dec 28 2019

Walter Bright <newshound2 digitalmars.com> writes:

I am skeptical about the value of major breaking changes with enums at this 
point, as it doesn't seem like there are a lot of undetected bugs emanating from 
the fairly loose definition of them.

Related to this is the ability to specify a range of values for a type, rather 
than enumerating them.

Dec 28 2019

Johan Engelen <j j.nl> writes:

On Saturday, 28 December 2019 at 20:20:46 UTC, Walter Bright 
wrote:
 I am skeptical about the value of major breaking changes with 
 enums at this point, as it doesn't seem like there are a lot of 
 undetected bugs emanating from the fairly loose definition of 
 them.

Yeah, I agree. It's good to clarify things in the spec though. To 
prevent someone (i.e. me) from trying to use enum range 
information for optimization.

https://github.com/dlang/dlang.org/pull/2728

Can we add a text like: "The enum type can be used in operator 
expressions (like AddExpression): the resulting type is the enum 
type, and the resulting value is computed by performing the 
operation as if the type is the enum base type. A variable of 
type enum does not have to have a value that corresponds with any 
of the enum members; the range of valid values for an enum typed 
variable is [basetype.min ... basetype.max]."
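
To illustrate what the proposed wording describes, here is a minimal sketch for an enum whose base type is `int`. It reflects current compiler behavior as I understand it; enums with smaller base types go through integer promotion and are not covered here:

```
enum Flags
{
    A = 1,
    B = 2,
    C = 4
}

// The result of `+` on two Flags operands keeps the enum type...
static assert(is(typeof(Flags.A + Flags.B) == Flags));

void main() @safe
{
    // ...while the value is computed as if on the base type: 4 + 4.
    Flags sum = Flags.C + Flags.C;
    assert(cast(int) sum == 8);
    assert(sum > Flags.max); // not a member, and beyond the largest member
}
```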

I'm not so satisfied with this text though.

-Johan

Dec 28 2019
