digitalmars.D - Range of enum type values

Johan Engelen <j j.nl> writes:
Hi all,
   I am wondering about the valid range of values of an enum type. 
I couldn't find anything explicit about this in the language 
specification.

Consider this code:
```
enum Flags
{
     A = 1,
     B = 2,
     C = 4
}

bool rangeCheck(Flags f)
{
     return (f >= Flags.min) && (f <= Flags.max);
}
bool preciseCheck(Flags f)
{
     return (f == Flags.A) || (f == Flags.B) || (f == Flags.C);
}
```

Is `rangeCheck` guaranteed to return true? Is `preciseCheck` 
guaranteed to return true?
A variable of type Flags is always initialized to Flags.A.
Integer assignment is not allowed.
So `Flags f` should always have a value of A, B, or C, right?

No. This code is accepted:
```
Flags getFlags()
{
     return Flags.A | Flags.B; // and so is `^`, `&`, `+`, `*`, ...
}
```
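
A minimal sketch of what this means for the two checks, assuming a current D compiler and the definitions from the first snippet (the `unittest` block and the variable names `ab` and `bc` are only for illustration):

```d
enum Flags
{
    A = 1,
    B = 2,
    C = 4
}

bool rangeCheck(Flags f)
{
    return (f >= Flags.min) && (f <= Flags.max);
}

bool preciseCheck(Flags f)
{
    return (f == Flags.A) || (f == Flags.B) || (f == Flags.C);
}

unittest
{
    // The result of `|` is still typed as Flags, so no cast is needed here.
    Flags ab = Flags.A | Flags.B; // value 3: inside [min, max], but not a member
    assert(rangeCheck(ab));
    assert(!preciseCheck(ab));

    Flags bc = Flags.B | Flags.C; // value 6: outside [Flags.min, Flags.max]
    assert(!rangeCheck(bc));
    assert(!preciseCheck(bc));
}
```

So neither check is guaranteed to return true for an arbitrary `Flags` value.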

I'd like to have the value range implications of the use of 
operators on enum values explicitly mentioned in the spec.

Current compiler behavior results in an infinite value range. 
(but it's implicit behavior, i.e. not explicitly mentioned in 
spec)

- Currently, are operations resulting in a value larger than the 
underlying integer storage type UB, like for normal signed 
integers?
- Should we limit the range of valid values of the Flags enum 
(C++ defines valid range to be [0..7])?
- Do we want to limit operations allowed on enum types? Or change 
the result type? (e.g. the type of `Flags + Flags` is `int` 
instead of `Flags`.)

cheers,
   Johan
Dec 27 2019
Timon Gehr <timon.gehr gmx.ch> writes:
On 27.12.19 13:14, Johan Engelen wrote:
 
 Current compiler behavior results in an infinite value range. (but it's 
 implicit behavior, i.e. not explicitly mentioned in spec)
 
 - Currently, are operations resulting in a value larger than the 
 underlying integer storage type UB,
They are @safe. You can't have UB in @safe code.
 like for normal signed integers?
Signed integers have wraparound semantics.
https://forum.dlang.org/thread/n23bo3$qe$1@digitalmars.com

The spec mentions this for AddExpressions (but the example only shows it for uint):
https://dlang.org/spec/expression.html

"If both operands are of integral types and an overflow or underflow occurs in the computation, wrapping will happen."

There simply _can't_ be any UB in signed integer operations, as they are considered @safe.
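
As a small illustration of the wrapping rule quoted from the spec, here is a sketch for both a signed and an unsigned type (the helper name `increment` is made up for this example):

```d
// Hypothetical helper: plain integer addition, allowed in @safe code.
int increment(int x) @safe
{
    return x + 1;
}

@safe unittest
{
    // Signed overflow wraps around to the most negative value.
    assert(increment(int.max) == int.min);

    // Unsigned underflow wraps around to the largest value.
    uint u = 0;
    u -= 1;
    assert(u == uint.max);
}
```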
 - Should we limit the range of valid values of the Flags enum (C++ 
 defines valid range to be [0..7])?
 - Do we want to limit operations allowed on enum types? Or change the 
 result type? (e.g. the type of `Flags + Flags` is `int` instead of `Flags`.
I see those options:

1. The valid range is the full range of the underlying type (as DMD treats it now).

2. The range is [1..4]. In this case, the operations have to promote their operands to the enum base type, and most casts to enum types must be @system.

3. The range is [0..7]. In this case, only operations that preserve this range (such as bitwise operators) should yield the enum type, and other operations should promote their operands to the enum base type, and most casts to enum types must be @system.

Personally, I think 2 makes most sense (especially with `final switch`, as the current semantics forces compilers to insert default cases there), but this would be a breaking language change.
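
To make the `final switch` remark concrete, here is a sketch with a hypothetical `Color` enum and `name` function. It only shows that a value matching no case can be formed under option 1. What happens when such a value reaches the switch is left out on purpose.

```d
enum Color { red, green, blue }

string name(Color c) @safe
{
    // `final switch` requires a case for every member and forbids `default`.
    final switch (c)
    {
        case Color.red:   return "red";
        case Color.green: return "green";
        case Color.blue:  return "blue";
    }
}

unittest
{
    assert(name(Color.green) == "green");

    // Under option 1 this is also a legal Color, although no case matches it,
    // so the switch cannot be treated as exhaustive over all possible values.
    Color odd = Color.green | Color.blue; // value 3
    assert(odd != Color.red && odd != Color.green && odd != Color.blue);
}
```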
Dec 27 2019
Steven Schveighoffer <schveiguy gmail.com> writes:
We have another option, which I like. That is, only allow bitwise operations on enums that are flagged as allowing bitwise operations (either with a UDA, or via some other mechanism).

Many languages actually treat enums just like structs, where you can add operators and functions. This is also a possibility.

This is also a breaking change, but also I don't want the compiler complaining about final switch on enums where the enum is intended not to be a bitwise flag. So I'd prefer 2 over 3.

-Steve
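
The struct-like approach can already be approximated in library code today. The following is only a sketch: `FlagSet` and its `has` method are hypothetical names, not an existing API and not the mechanism proposed above.

```d
enum Flags { A = 1, B = 2, C = 4 }

// Hypothetical wrapper: only bitwise operators are defined on it.
struct FlagSet
{
    private int bits;

    this(Flags f) { bits = f; }

    // Accept `|`, `&` and `^`; any other operator has no matching overload.
    FlagSet opBinary(string op)(FlagSet rhs) const
        if (op == "|" || op == "&" || op == "^")
    {
        FlagSet result;
        result.bits = mixin("bits " ~ op ~ " rhs.bits");
        return result;
    }

    bool has(Flags f) const { return (bits & f) != 0; }
}

unittest
{
    auto ab = FlagSet(Flags.A) | FlagSet(Flags.B);
    assert(ab.has(Flags.A) && ab.has(Flags.B));
    assert(!ab.has(Flags.C));

    // Arithmetic is rejected at compile time.
    static assert(!__traits(compiles, FlagSet(Flags.A) + FlagSet(Flags.B)));
}
```

With this split, the plain enum could stay restricted to its named members while the wrapper carries the combined values.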
Dec 27 2019
Johan Engelen <j j.nl> writes:
On Friday, 27 December 2019 at 14:58:59 UTC, Steven Schveighoffer 
wrote:
 On 12/27/19 9:12 AM, Timon Gehr wrote:
 On 27.12.19 13:14, Johan Engelen wrote:
 Current compiler behavior results in an infinite value range. 
 (but it's implicit behavior, i.e. not explicitly mentioned in 
 spec)

 - Currently, are operations resulting in a value larger than 
 the underlying integer storage type UB,
 They are @safe. You can't have UB in @safe code.
 like for normal signed integers?
 Signed integers have wraparound semantics. https://forum.dlang.org/thread/n23bo3$qe$1@digitalmars.com
Thanks for the correction! I hope someone finds the time to make that more explicit in the spec.
 The spec mentions this for AddExpressions (but the example 
 only shows it for uint): https://dlang.org/spec/expression.html
 "If both operands are of integral types and an overflow or 
 underflow occurs in the computation, wrapping will happen."
 
 There simply _can't_ be any UB in signed integer operations, 
 as they are considered @safe.
I don't accept this argument [*], but no argument needed here. Just needs some clarification in spec text.
Let's separate the discussion into what it _currently is_ and what _it might be in future_.

Current language behavior:
enum value range = full range of base type; integer operations work as-if the type is the base type.

Future:
Several options + lots of discussion ;-) and DIP needed.

Can I summarize it like that?

cheers,
  Johan

[*]
Let's not go off on a tangent, but there is enough UB in D that I do not accept that @safe=="no UB" argument. One example that comes to mind is bitshifting by more than the operand bit width: "illegal" is what the spec says but that doesn't make sense for runtime shift values and, in practice, turns into UB at runtime. ;-)
Dec 27 2019
Timon Gehr <timon.gehr gmx.ch> writes:
On 27.12.19 19:12, Johan Engelen wrote:
 
 [*]
 Let's not go off on a tangent,
It's not a tangent, it's a powerful tool to decide whether something has any business being UB or not.
 but there is enough UB in D that I do not accept that @safe=="no UB" argument.
https://dlang.org/articles/safed.html "In D, we expect the vast majority of programmers to operate within the safe subset of D, which we call SafeD. The safety and the ease of use of SafeD is comparable to Java—in fact Java programs can be machine-translated into this safe subset of D. SafeD is easy to learn and it keeps the programmers away from undefined behaviors. It is also very efficient." "[...] you are guaranteed not to encounter any undefined behavior." https://dlang.org/spec/memory-safe-d.html "Therefore, the safe subset of D consists only of programming language features that are guaranteed to never result in memory corruption. See this article for a rationale." ("this article" links to https://dlang.org/articles/safed.html)
 One example that comes to mind is 
 bitshifting by more than the operand bit width: "illegal" is what the 
 spec says but that doesn't make sense for runtime shift values and, in 
 practice, turns into UB at runtime. ;-)
What that means is not that UB is allowed in @safe code, but rather that the spec hasn't been properly updated after @safe was introduced to clarify what "illegal" means here. It should mean that the returned value is arbitrary, not that the behavior of the entire program will be arbitrary. I think Walter has said as much before, but I can't find the post.

@safe is meant to imply no memory corruption. @safe implies no UB, because UB can lead to any behavior, including memory corruption. UB allows compilers to insert arbitrary code execution exploits. How can you call that @safe?
Dec 28 2019
ag0aep6g <anonymous example.com> writes:
On 28.12.19 14:33, Timon Gehr wrote:
 https://dlang.org/articles/safed.html
 
 "In D, we expect the vast majority of programmers to operate within the 
 safe subset of D, which we call SafeD. The safety and the ease of use of 
 SafeD is comparable to Java—in fact Java programs can be 
 machine-translated into this safe subset of D. SafeD is easy to learn 
 and it keeps the programmers away from undefined behaviors. It is also 
 very efficient."
 
 "[...] you are guaranteed not to encounter any undefined behavior."
 
 https://dlang.org/spec/memory-safe-d.html
 
 "Therefore, the safe subset of D consists only of programming language 
 features that are guaranteed to never result in memory corruption. See 
 this article for a rationale."
 
 ("this article" links to
 https://dlang.org/articles/safed.html)
Also:

https://dlang.org/spec/function.html#function-safety

"Safe functions are functions that are statically checked to exhibit no possibility of undefined behavior."

"Safe functions are marked with the @safe attribute."
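
A rough sketch of what "statically checked" looks like in practice, assuming a current D compiler. Pointer arithmetic is used here as one example of an operation the @safe checks reject, and the function name `first` is made up:

```d
// Pointer arithmetic does not compile inside an @safe function literal...
static assert(!__traits(compiles, (int* p) @safe => p + 1));

// ...while the same literal is accepted when it is explicitly @system.
static assert( __traits(compiles, (int* p) @system => p + 1));

// Bounds-checked slice indexing is allowed in @safe code.
int first(const int[] values) @safe
{
    return values[0];
}

unittest
{
    assert(first([7, 8, 9]) == 7);
}
```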
Dec 28 2019
Johan Engelen <j j.nl> writes:
On Saturday, 28 December 2019 at 13:33:25 UTC, Timon Gehr wrote:
 
 @safe is meant to imply no memory corruption. @safe implies no 
 UB, because UB can lead to any behavior, including memory 
 corruption. UB allows compilers to insert arbitrary code 
 execution exploits. How can you call that @safe?
I think we are talking about different things here.

You are saying: the spec says @safe means no UB, and if the spec doesn't say it then it simply needs updating. There are a number of text pieces that say that.

I am saying: regardless of what the spec and any of those articles promise, current D behavior is that @safe _can_ have UB in it.

I know most people don't like to hear it nor acknowledge it. But I think it is better to be realistic about this. `@safe` currently does _not_ mean the code is super safe.

One could say that the compilers are just not standard-compliant, nor is the spec itself, but the problem is bigger than that. Adding null dereference checks everywhere is not what (I think) people want. So that means there will always be UB potential in @safe code with interface method calls or class member variable access.

I know about the "we specify that reading from NULL must result in a segfault". It misses the point by not understanding that a "null dereference" doesn't mean "access address 0" (let alone that by that we disallow e.g. system programmers to actually use address 0). Member variable access often does not access address 0x0, nor does an interface method call.

(I've been in these discussions too many times now. I'll try to stop arguing it.)

-Johan
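
To illustrate the point about member access not touching address 0, here is a sketch using a hypothetical struct `Big`. A struct is used instead of a class only because its field layout follows declaration order, which keeps the offset easy to state.

```d
// Hypothetical aggregate that is much larger than a typical memory page.
struct Big
{
    ubyte[1 << 20] padding; // 1 MiB
    int value;
}

// `value` lives 1 MiB past the start of the object, so reading `value`
// through a null `Big*` would touch an address around 1 MiB, not address 0.
static assert(Big.value.offsetof >= (1 << 20));
```

Whether an address that far from 0 is unmapped depends on the platform, which is why a guaranteed segfault on "reading from NULL" does not by itself cover this case.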
Dec 28 2019
Walter Bright <newshound2 digitalmars.com> writes:
On 12/28/2019 12:22 PM, Johan Engelen wrote:
 You are saying: the spec says @safe means no UB, and if the spec doesn't say it 
 then it simply needs updating. There are a number of text pieces that say that.
 
 I am saying: regardless of what the spec and any of those articles promise, 
 current D behavior is that @safe _can_ have UB in it.
 
 I know most people don't like to hear it nor acknowledge it. But I think it is 
 better to be realistic about this. `@safe` currently does _not_ mean the code is 
 super safe.
What it does mean is that all instances of #safe resulting in #undefinedBehavior should be logged either in bugzilla or noted as an exception in the #specification.

Note that there's a bugzilla keyword 'safe' which can tag them. Here's the current set of open tagged issues:

https://issues.dlang.org/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&keywords=safe&keywords_type=allwords&list_id=208819&query_format=advanced
Dec 28 2019
Timon Gehr <timon.gehr gmx.ch> writes:
On 28.12.19 22:08, Walter Bright wrote:
 On 12/28/2019 12:22 PM, Johan Engelen wrote:
 You are saying: the spec says @safe means no UB, and if the spec 
 doesn't say it then it simply needs updating. There are a number of 
 text pieces that say that.

 I am saying: regardless of what the spec and any of those articles 
 promise, current D behavior is that @safe _can_ have UB in it.

 I know most people don't like to hear it nor acknowledge it. But I 
 think it is better to be realistic about this. `@safe` currently does 
 _not_ mean the code is super safe.
 What it does mean is that all instances of #safe resulting in #undefinedBehavior should be logged either in bugzilla
Yes.
 or noted as an exception in the #specification.
 ...
Yes, but the only reasonable kinds of exceptions are invalid @trusted/@system code or specific compiler flags to disable safety (e.g. -boundscheck=off). There can't be any other exceptions.
Dec 28 2019
Timon Gehr <timon.gehr gmx.ch> writes:
On 28.12.19 21:22, Johan Engelen wrote:
 On Saturday, 28 December 2019 at 13:33:25 UTC, Timon Gehr wrote:
 @safe is meant to imply no memory corruption. @safe implies no UB, 
 because UB can lead to any behavior, including memory corruption. UB 
 allows compilers to insert arbitrary code execution exploits. How can 
 you call that @safe?
 I think we are talking about different things here. You are saying: the spec says @safe means no UB, and if the spec doesn't say it then it simply needs updating. There are a number of text pieces that say that. I am saying: regardless of what the spec and any of those articles promise, current D behavior is that @safe _can_ have UB in it.
Each of those instances is a bug.
 
 I know most people don't like to hear it nor acknowledge it. But I think it is
 better to be realistic about this. `@safe` currently does _not_ mean the code
 is super safe.
I acknowledge that the implementation has bugs. I reported a number of frontend bugs in the type checker myself. What you are saying is that the backends have bugs as well. It's not very surprising, but I don't think you can use existing bugs in the implementation as a justification to deliberately introduce more of those bugs, which is what I read when you write "I don't buy this argument".
Dec 28 2019
Timon Gehr <timon.gehr gmx.ch> writes:
On 27.12.19 19:12, Johan Engelen wrote:
 
 Current language behavior:
 enum value range = full range of base type; integer operations work 
 as-if the type is the base type.
 
 Future:
 Several options + lots of discussion ;-) and DIP needed.
 
 Can I summarize it like that?
I think so, but you might want to add that currently, e.g. `cast(E)x` is @safe for an `enum E:typeof(x){ ... }`
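
A short sketch of that cast, with a hypothetical enum `E` and helper `fromInt`, assuming current compiler behavior as described above:

```d
enum E : int { a = 1, b = 2 }

// Accepted in @safe code even though the result may match no member of E.
E fromInt(int x) @safe
{
    return cast(E) x;
}

unittest
{
    E e = fromInt(42);
    assert(e != E.a && e != E.b);
    assert(cast(int) e == 42); // the underlying value is preserved
}
```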
Dec 28 2019
Walter Bright <newshound2 digitalmars.com> writes:
I am skeptical about the value of major breaking changes with enums at this 
point, as it doesn't seem like there are a lot of undetected bugs emanating from 
the fairly loose definition of them.

Related to this is the ability to specify a range of values for a type, rather 
than enumerating them.
Dec 28 2019
Johan Engelen <j j.nl> writes:
On Saturday, 28 December 2019 at 20:20:46 UTC, Walter Bright 
wrote:
 I am skeptical about the value of major breaking changes with 
 enums at this point, as it doesn't seem like there are a lot of 
 undetected bugs emanating from the fairly loose definition of 
 them.
Yeah, I agree.

It's good to clarify things in the spec though. To prevent someone (i.e. me) from trying to use enum range information for optimization.
https://github.com/dlang/dlang.org/pull/2728

Can we add a text like:
"The enum type can be used in operator expressions (like AddExpression): the resulting type is the enum type, and the resulting value is computed by performing the operation as if the type is the enum base type. A variable of type enum does not have to have a value that corresponds with any of the enum members; the range of valid values for an enum typed variable is [basetype.min ... basetype.max]."
I'm not so satisfied with this text though.

-Johan
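
Given that reading of the rules, code that needs a named member has to check for one explicitly. A sketch of a generic check, where `isNamedMember` is a hypothetical helper built on `std.traits.EnumMembers`:

```d
import std.traits : EnumMembers;

enum Flags { A = 1, B = 2, C = 4 }

// Hypothetical helper: true only if `value` equals one of the declared members.
bool isNamedMember(E)(E value) if (is(E == enum))
{
    foreach (member; EnumMembers!E) // unrolled over the members at compile time
    {
        if (value == member)
            return true;
    }
    return false;
}

unittest
{
    assert(isNamedMember(Flags.B));
    // 3 is a valid value of type Flags, but it is not a declared member.
    assert(!isNamedMember(Flags.A | Flags.B));
}
```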
Dec 28 2019