digitalmars.D - Range of enum type values
- Johan Engelen (47/47) Dec 27 2019 Hi all,
- Timon Gehr (23/34) Dec 27 2019 Signed integers have wraparound semantics.
- Steven Schveighoffer (10/57) Dec 27 2019 We have another option, which I like. That is, only allow bitwise
- Johan Engelen (24/82) Dec 27 2019 Thanks for the correction!
- Timon Gehr (27/35) Dec 28 2019 It's not a tangent, it's a powerful tool to decide whether something has...
- ag0aep6g (5/24) Dec 28 2019 Also: https://dlang.org/spec/function.html#function-safety
- Johan Engelen (25/30) Dec 28 2019 I think we are talking about different things here.
- Walter Bright (6/15) Dec 28 2019 What it does mean is that all instances of #safe resulting in #undefined...
- Timon Gehr (5/21) Dec 28 2019 Yes, but the only reasonable kinds of exceptions are invalid
- Timon Gehr (8/25) Dec 28 2019 I acknowledge that the implementation has bugs. I reported a number of
- Timon Gehr (3/12) Dec 28 2019 I think so, but you might want to add that currently, e.g. `cast(E)x` is...
- Walter Bright (5/5) Dec 28 2019 I am skeptical about the value of major breaking changes with enums at t...
- Johan Engelen (15/19) Dec 28 2019 Yeah, I agree. It's good to clarify things in the spec though. To
Hi all,

I am wondering about the valid range of values of an enum type. I couldn't find anything explicit about this in the language specification.

Consider this code:

```
enum Flags {
    A = 1,
    B = 2,
    C = 4
}

bool rangeCheck(Flags f) {
    return (f >= Flags.min) && (f <= Flags.max);
}

bool preciseCheck(Flags f) {
    return (f == Flags.A) || (f == Flags.B) || (f == Flags.C);
}
```

Is `rangeCheck` guaranteed to return true? Is `preciseCheck` guaranteed to return true?

A variable of type Flags is always initialized to Flags.A. Integer assignment is not allowed. So `Flags f` should always have a value of A, B or C, right?

No. This code is accepted:

```
Flags getFlags() {
    return Flags.A | Flags.B; // and so is `^`, `&`, `+`, `*`, ...
}
```

I'd like to have the value range implications of the use of operators on enum values explicitly mentioned in the spec.

Current compiler behavior results in an infinite value range. (but it's implicit behavior, i.e. not explicitly mentioned in spec)

- Currently, are operations resulting in a value larger than the underlying integer storage type UB, like for normal signed integers?
- Should we limit the range of valid values of the Flags enum (C++ defines valid range to be [0..7])?
- Do we want to limit operations allowed on enum types? Or change the result type? (e.g. the type of `Flags + Flags` is `int` instead of `Flags`.)

cheers,
  Johan
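To make the two questions concrete, here is a small sketch in D that reuses the `Flags`, `rangeCheck`, and `preciseCheck` definitions from the post. The variable names `ab` and `ac` and the `main` wrapper are illustrative additions, not part of the original message, and the sketch assumes the behavior described in the post, where `|` on two `Flags` values is accepted as a `Flags`.

```d
enum Flags { A = 1, B = 2, C = 4 }

bool rangeCheck(Flags f)
{
    return (f >= Flags.min) && (f <= Flags.max);
}

bool preciseCheck(Flags f)
{
    return (f == Flags.A) || (f == Flags.B) || (f == Flags.C);
}

void main()
{
    // 1 | 2 == 3: inside [Flags.min, Flags.max] but not a named member.
    Flags ab = Flags.A | Flags.B;
    assert(rangeCheck(ab));
    assert(!preciseCheck(ab));

    // 1 | 4 == 5: larger than Flags.max, so even the range check fails.
    Flags ac = Flags.A | Flags.C;
    assert(!rangeCheck(ac));
    assert(!preciseCheck(ac));
}
```

So, under the current behavior, neither check can be assumed to hold for an arbitrary `Flags` value.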
Dec 27 2019
On 27.12.19 13:14, Johan Engelen wrote:
> Current compiler behavior results in an infinite value range. (but it's implicit behavior, i.e. not explicitly mentioned in spec)
> - Currently, are operations resulting in a value larger than the underlying integer storage type UB,

They are @safe. You can't have UB in @safe code.

> like for normal signed integers?

Signed integers have wraparound semantics.
https://forum.dlang.org/thread/n23bo3$qe$1@digitalmars.com

The spec mentions this for AddExpressions (but the example only shows it for uint): https://dlang.org/spec/expression.html
"If both operands are of integral types and an overflow or underflow occurs in the computation, wrapping will happen."

There simply _can't_ be any UB in signed integer operations, as they are considered @safe.

> - Should we limit the range of valid values of the Flags enum (C++ defines valid range to be [0..7])?
> - Do we want to limit operations allowed on enum types? Or change the result type? (e.g. the type of `Flags + Flags` is `int` instead of `Flags`.

I see those options:

1. The valid range is the full range of the underlying type (as DMD treats it now).
2. The range is [1..4]. In this case, the operations have to promote their operands to the enum base type, and most casts to enum types must be @system.
3. The range is [0..7]. In this case, only operations that preserve this range (such as bitwise operators) should yield the enum type, and other operations should promote their operands to the enum base type, and most casts to enum types must be @system.

Personally, I think 2 makes most sense (especially with `final switch`, as the current semantics forces compilers to insert default cases there), but this would be a breaking language change.
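The wraparound rule quoted from the spec can be illustrated for signed operands with a short D sketch. The helper name `addWrapping` is hypothetical and only exists to keep the addition out of compile-time constant folding; it is not something from the thread or from Phobos.

```d
// Hypothetical helper: plain int addition, which the spec defines as wrapping.
int addWrapping(int a, int b) @safe pure nothrow @nogc
{
    return a + b;
}

void main() @safe
{
    // Overflow wraps around to the most negative value.
    assert(addWrapping(int.max, 1) == int.min);
    // Underflow wraps around to the most positive value.
    assert(addWrapping(int.min, -1) == int.max);
}
```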
Dec 27 2019
On 12/27/19 9:12 AM, Timon Gehr wrote:
> I see those options:
>
> 1. The valid range is the full range of the underlying type (as DMD treats it now).
> 2. The range is [1..4]. In this case, the operations have to promote their operands to the enum base type, and most casts to enum types must be @system.
> 3. The range is [0..7]. In this case, only operations that preserve this range (such as bitwise operators) should yield the enum type, and other operations should promote their operands to the enum base type, and most casts to enum types must be @system.
>
> Personally, I think 2 makes most sense (especially with `final switch`, as the current semantics forces compilers to insert default cases there), but this would be a breaking language change.

We have another option, which I like. That is, only allow bitwise operations on enums that are flagged as allowing bitwise operations (either with a uda, or via some other mechanism).

Many languages actually treat enums just like structs, where you can add operators and functions. This is also a possibility.

This is also a breaking change, but also I don't want the compiler complaining about final switch on enums where the enum is intended not to be a bitwise flag. So I'd prefer 2 over 3.

-Steve
Dec 27 2019
On Friday, 27 December 2019 at 14:58:59 UTC, Steven Schveighoffer wrote:
> On 12/27/19 9:12 AM, Timon Gehr wrote:
> > On 27.12.19 13:14, Johan Engelen wrote:
> > > Current compiler behavior results in an infinite value range. (but it's implicit behavior, i.e. not explicitly mentioned in spec)
> > > - Currently, are operations resulting in a value larger than the underlying integer storage type UB,
> >
> > They are @safe. You can't have UB in @safe code.
> >
> > > like for normal signed integers?
> >
> > Signed integers have wraparound semantics.
> > https://forum.dlang.org/thread/n23bo3$qe$1@digitalmars.com

Thanks for the correction! I hope someone finds the time to make that more explicit in the spec.

> > The spec mentions this for AddExpressions (but the example only shows it for uint): https://dlang.org/spec/expression.html
> > "If both operands are of integral types and an overflow or underflow occurs in the computation, wrapping will happen."
> >
> > There simply _can't_ be any UB in signed integer operations, as they are considered @safe.

I don't accept this argument [*], but no argument needed here. Just needs some clarification in spec text.

> > I see those options: [...]
>
> We have another option, which I like. That is, only allow bitwise operations on enums that are flagged as allowing bitwise operations (either with a uda, or via some other mechanism).
>
> Many languages actually treat enums just like structs, where you can add operators and functions. This is also a possibility.
>
> This is also a breaking change, but also I don't want the compiler complaining about final switch on enums where the enum is intended not to be a bitwise flag. So I'd prefer 2 over 3.

Let's separate the discussion into what it _currently is_ and what _it might be in future_.

Current language behavior: enum value range = full range of base type; integer operations work as-if the type is the base type.
Future: Several options + lots of discussion ;-) and DIP needed.

Can I summarize it like that?

cheers,
  Johan

[*] Let's not go off on a tangent, but there is enough UB in D that I do not accept that @safe=="no UB" argument. One example that comes to mind is bitshifting by more than the operand bit width: "illegal" is what the spec says but that doesn't make sense for runtime shift values and, in practice, turns into UB at runtime. ;-)
Dec 27 2019
On 27.12.19 19:12, Johan Engelen wrote:
> [*] Let's not go off on a tangent,

It's not a tangent, it's a powerful tool to decide whether something has any business being UB or not.

> but there is enough UB in D that I do not accept that @safe=="no UB" argument.

https://dlang.org/articles/safed.html
"In D, we expect the vast majority of programmers to operate within the safe subset of D, which we call SafeD. The safety and the ease of use of SafeD is comparable to Java—in fact Java programs can be machine-translated into this safe subset of D. SafeD is easy to learn and it keeps the programmers away from undefined behaviors. It is also very efficient."
"[...] you are guaranteed not to encounter any undefined behavior."

https://dlang.org/spec/memory-safe-d.html
"Therefore, the safe subset of D consists only of programming language features that are guaranteed to never result in memory corruption. See this article for a rationale."
("this article" links to https://dlang.org/articles/safed.html)

> One example that comes to mind is bitshifting by more than the operand bit width: "illegal" is what the spec says but that doesn't make sense for runtime shift values and, in practice, turns into UB at runtime. ;-)

What that means is not that UB is allowed in @safe code, but rather that the spec hasn't been properly updated after @safe was introduced to clarify what "illegal" means here. It should mean that the returned value is arbitrary, not that the behavior of the entire program will be arbitrary. I think Walter has said as much before, but I can't find the post.

@safe is meant to imply no memory corruption. @safe implies no UB, because UB can lead to any behavior, including memory corruption. UB allows compilers to insert arbitrary code execution exploits. How can you call that @safe?
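Independent of how the spec ends up wording the out-of-range shift case, user code can keep a runtime shift count inside the operand's bit width by masking it. The following is only a sketch of that workaround: `shiftLeftMasked` is a hypothetical helper name, not an existing library function, and the mask of 31 assumes a 32-bit `uint` operand.

```d
// Hypothetical helper: reduces the count modulo 32 so the shift is always
// within the bit width of uint, whatever value arrives at runtime.
uint shiftLeftMasked(uint value, uint count) @safe pure nothrow @nogc
{
    return value << (count & 31);
}

void main() @safe
{
    assert(shiftLeftMasked(1, 4) == 16);
    // 33 & 31 == 1, so this behaves like a shift by one bit.
    assert(shiftLeftMasked(1, 33) == 2);
    // 32 & 31 == 0, so the value is returned unchanged.
    assert(shiftLeftMasked(5, 32) == 5);
}
```

Masking changes the meaning of oversized counts (they wrap instead of producing zero), so whether it is acceptable depends on the caller.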
Dec 28 2019
On 28.12.19 14:33, Timon Gehr wrote:
> https://dlang.org/articles/safed.html
> "In D, we expect the vast majority of programmers to operate within the safe subset of D, which we call SafeD. The safety and the ease of use of SafeD is comparable to Java—in fact Java programs can be machine-translated into this safe subset of D. SafeD is easy to learn and it keeps the programmers away from undefined behaviors. It is also very efficient."
> "[...] you are guaranteed not to encounter any undefined behavior."
>
> https://dlang.org/spec/memory-safe-d.html
> "Therefore, the safe subset of D consists only of programming language features that are guaranteed to never result in memory corruption. See this article for a rationale."
> ("this article" links to https://dlang.org/articles/safed.html)

Also: https://dlang.org/spec/function.html#function-safety
"Safe functions are functions that are statically checked to exhibit no possibility of undefined behavior."
"Safe functions are marked with the @safe attribute."
Dec 28 2019
On Saturday, 28 December 2019 at 13:33:25 UTC, Timon Gehr wrote:
> @safe is meant to imply no memory corruption. @safe implies no UB, because UB can lead to any behavior, including memory corruption. UB allows compilers to insert arbitrary code execution exploits. How can you call that @safe?

I think we are talking about different things here.

You are saying: the spec says @safe means no UB, and if the spec doesn't say it then it simply needs updating. There are a number of text pieces that say that.

I am saying: regardless of what the spec and any of those articles promise, current D behavior is that @safe _can_ have UB in it.

I know most people don't like to hear it nor acknowledge it. But I think it is better to be realistic about this. `@safe` currently does _not_ mean the code is super safe. One could say that the compilers are just not standard-compliant, nor is the spec itself, but the problem is bigger than that. Adding null dereference checks everywhere is not what (I think) people want. So that means there will always be UB potential in @safe code with interface method calls or class member variable access.

I know about the "we specify that reading from NULL must result in a segfault". It misses the point by not understanding that a "null dereference" doesn't mean "access address 0" (let alone that by that we disallow e.g. system programmers to actually use address 0). Member variable access often does not access address 0x0, nor does an interface method call.

(I've been in these discussions too many times now. I'll try to stop arguing it.)

-Johan
Dec 28 2019
On 12/28/2019 12:22 PM, Johan Engelen wrote:
> You are saying: the spec says @safe means no UB, and if the spec doesn't say it then it simply needs updating. There are a number of text pieces that say that.
>
> I am saying: regardless of what the spec and any of those articles promise, current D behavior is that @safe _can_ have UB in it.
>
> I know most people don't like to hear it nor acknowledge it. But I think it is better to be realistic about this. `@safe` currently does _not_ mean the code is super safe.

What it does mean is that all instances of #safe resulting in #undefinedBehavior should be logged either in bugzilla or noted as an exception in the #specification.

Note that there's a bugzilla keyword 'safe' which can tag them. Here's the current set of open tagged issues:

https://issues.dlang.org/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&keywords=safe&keywords_type=allwords&list_id=208819&query_format=advanced
Dec 28 2019
On 28.12.19 22:08, Walter Bright wrote:
> On 12/28/2019 12:22 PM, Johan Engelen wrote:
> > You are saying: the spec says @safe means no UB, and if the spec doesn't say it then it simply needs updating. There are a number of text pieces that say that.
> >
> > I am saying: regardless of what the spec and any of those articles promise, current D behavior is that @safe _can_ have UB in it.
> >
> > I know most people don't like to hear it nor acknowledge it. But I think it is better to be realistic about this. `@safe` currently does _not_ mean the code is super safe.
>
> What it does mean is that all instances of #safe resulting in #undefinedBehavior should be logged either in bugzilla

Yes.

> or noted as an exception in the #specification. ...

Yes, but the only reasonable kinds of exceptions are invalid @trusted/@system code or specific compiler flags to disable safety (e.g. -boundscheck=off). There can't be any other exceptions.
Dec 28 2019
On 28.12.19 21:22, Johan Engelen wrote:
> On Saturday, 28 December 2019 at 13:33:25 UTC, Timon Gehr wrote:
> > @safe is meant to imply no memory corruption. @safe implies no UB, because UB can lead to any behavior, including memory corruption. UB allows compilers to insert arbitrary code execution exploits. How can you call that @safe?
>
> I think we are talking about different things here.
>
> You are saying: the spec says @safe means no UB, and if the spec doesn't say it then it simply needs updating. There are a number of text pieces that say that.
>
> I am saying: regardless of what the spec and any of those articles promise, current D behavior is that @safe _can_ have UB in it.

Each of those instances is a bug.

> I know most people don't like to hear it nor acknowledge it. But I think it is better to be realistic about this. `@safe` currently does _not_ mean the code is super safe.

I acknowledge that the implementation has bugs. I reported a number of frontend bugs in the type checker myself. What you are saying is that the backends have bugs as well. It's not very surprising, but I don't think you can use existing bugs in the implementation as a justification to deliberately introduce more of those bugs, which is what I read when you write "I don't buy this argument".
Dec 28 2019
On 27.12.19 19:12, Johan Engelen wrote:
> Current language behavior: enum value range = full range of base type; integer operations work as-if the type is the base type.
> Future: Several options + lots of discussion ;-) and DIP needed.
>
> Can I summarize it like that?

I think so, but you might want to add that currently, e.g. `cast(E)x` is @safe for an `enum E:typeof(x){ ... }`
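The point about `cast(E)x` can be shown with a short sketch. The enum `E`, its members, and the helper `fromInt` are made-up names for illustration; the only assumption taken from the thread is that such a cast is currently accepted in @safe code.

```d
enum E : int { a = 1, b = 2 }

// Hypothetical helper: converts any int to E inside @safe code.
E fromInt(int x) @safe pure nothrow @nogc
{
    return cast(E) x;
}

void main() @safe
{
    E e = fromInt(42);
    // The value is kept as-is and matches no named member.
    assert(e != E.a && e != E.b);
    assert(cast(int) e == 42);
}
```

This is why, under the current rules, code receiving an `E` cannot assume it holds one of the named members.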
Dec 28 2019
I am skeptical about the value of major breaking changes with enums at this point, as it doesn't seem like there are a lot of undetected bugs emanating from the fairly loose definition of them. Related to this is the ability to specify a range of values for a type, rather than enumerating them.
Dec 28 2019
On Saturday, 28 December 2019 at 20:20:46 UTC, Walter Bright wrote:
> I am skeptical about the value of major breaking changes with enums at this point, as it doesn't seem like there are a lot of undetected bugs emanating from the fairly loose definition of them.

Yeah, I agree. It's good to clarify things in the spec though. To prevent someone (i.e. me) from trying to use enum range information for optimization.

https://github.com/dlang/dlang.org/pull/2728

Can we add a text like:

"The enum type can be used in operator expressions (like AddExpression): the resulting type is the enum type, and the resulting value is computed by performing the operation as if the type is the enum base type. A variable of type enum does not have to have a value that corresponds with any of the enum members; the range of valid values for an enum typed variable is [basetype.min ... basetype.max]."

I'm not so satisfied with this text though.

-Johan
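As a rough illustration of what the proposed wording describes, the sketch below checks the result type at compile time and the result value at run time. It reuses the `Flags` enum from the first post and `std.traits.OriginalType` from Phobos; the helper `combine` is a hypothetical name, and only the `|` operator is exercised here since that is the case shown compiling earlier in the thread.

```d
import std.traits : OriginalType;

enum Flags { A = 1, B = 2, C = 4 }

// The base type of Flags is int, since none was specified.
static assert(is(OriginalType!Flags == int));
// The result of combining two Flags values is still typed as Flags.
static assert(is(typeof(Flags.A | Flags.B) == Flags));

// Hypothetical helper: combines two values with the bitwise-or operator.
Flags combine(Flags x, Flags y) @safe pure nothrow @nogc
{
    return x | y;
}

void main() @safe
{
    Flags f = combine(Flags.B, Flags.C);
    // The value is computed as for the base type: 2 | 4 == 6.
    assert(cast(int) f == 6);
    // It lies outside [Flags.min, Flags.max] yet is a usable Flags value.
    assert(f > Flags.max);
}
```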
Dec 28 2019