
digitalmars.D - Treating the abusive unsigned syndrome

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
D pursues compatibility with C and C++ in the following manner: if a 
code snippet compiles in both C and D or C++ and D, then it should have 
the same semantics.

A classic problem with C and C++ integer arithmetic is that any 
operation involving at least one unsigned integral operand automatically 
receives an unsigned type, regardless of how silly that actually is, 
semantically. About the only advantage of this rule is that it's simple. 
IMHO it has only disadvantages from then on.

The following operations suffer from the "abusive unsigned syndrome" (u 
is an unsigned integral, i is a signed integral):

(1) u + i, i + u
(2) u - i, i - u
(3) u - u
(4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C 
requires that these all return unsigned, ouch)
(5) u < i, i < u, u <= i etc. (all ordering comparisons)
(6) -u

Logic operations &, |, and ^ also yield unsigned, but such cases are 
less abusive because at least the operation wasn't arithmetic in the 
first place. Comparing for equality is also quite a conundrum - should 
minus two billion compare equal to 2_294_967_296? I'll ignore these for 
now and focus on (1) - (6).
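
To make the surprise concrete, here is what (1), (5), and (6) do under 
the current rules (a small example; the particular values are mine):

import std.stdio;

void main()
{
    uint u = 2;
    int i = -3;
    writeln(u + i); // (1): i converts to uint; prints 4294967295, not -1
    writeln(i < u); // (5): -3 becomes 4294967293, so this prints false
    writeln(-u);    // (6): prints 4294967294, not -2
}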

So far we haven't found a solid solution to this problem that at the 
same time allows "good" code to pass through, weeds out "bad" code, and is 
compatible with C and C++. The closest I got was to have the compiler 
define the following internal types:

__intuint
__longulong

I've called them "dual-signed integers" in the past, but let's try the 
shorter "undecided sign". Each of these is a subtype of both the signed 
and the unsigned integral in its name, e.g. __intuint is a subtype of 
both int and uint. (Originally I thought of defining __byteubyte and 
__shortushort as well but dropped them in the interest of simplicity.)

The sign-ambiguous operations (1) - (6) yield __intuint if no operand 
size was larger than 32 bits, and __longulong otherwise. Undecided sign 
types define their own operations. Let x and y be values of undecided 
sign. Then x + y, x - y, and -x also return a sign-ambiguous integral 
(the size is that of the largest operand). However, the other operators 
do not work on sign-ambiguous integrals, e.g. x / y would not compile 
because you must decide what sign x and y should have prior to invoking 
the operation. (Rationale: multiplication/division work differently 
depending on the signedness of their operands).
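
A sketch of the intended semantics, with uint u and int i (this 
illustrates the proposal; it is not what today's compiler does):

int a = -(u - i);             // fine: unary minus keeps the sign
                              // undecided; the assignment then decides it
// auto b = (u - i) / 2;      // error: '/' requires a decided sign
int c = cast(int)(u - i) / 2; // fine: the cast decides the sign first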

User code cannot define a symbol of sign-ambiguous type, e.g.

auto a = u + i;

would not compile. However, given that __intuint is a subtype of both 
int and uint, it can be freely converted to either whenever there's no 
ambiguity:

int a = u + i; // fine
uint b = u + i; // fine

The advantage of this scheme is that it weeds out many (most? all?) 
surprises and oddities caused by the abusive unsigned rule of C and C++. 
The disadvantage is that it is more complex and may surprise the novice 
in its own way by refusing to compile code that looks legit.

At the moment, we're in limbo regarding the decision to go forward with 
this. Walter, like many good long-time C programmers, knows the abusive 
unsigned rule so well he's not hurt by it and consequently has little 
incentive to see it as a problem. I have had to teach C and C++ to young 
students coming from Java introductory courses and have a more 
up-to-date perspective on the dangers. My strong belief is that we need 
to address this mess somehow, and type inference will only make it more 
painful (in the hands of a beginner, auto can be quite a dangerous tool 
for propagating wrong beliefs). I also know seasoned programmers who had 
no idea that -u compiles, and that it also oddly returns an unsigned type.

Your opinions, comments, and suggestions for improvements would as 
always be welcome.


Andrei
Nov 25 2008
next sibling parent reply "Denis Koroskin" <2korden gmail.com> writes:
On Tue, 25 Nov 2008 18:59:01 +0300, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 D pursues compatibility with C and C++ in the following manner: if a 
 code snippet compiles in both C and D or C++ and D, then it should have 
 the same semantics.

 [snip]
I think it's fine. That's the way LLVM stores the integral values 
internally, IIRC.

But what is the type of -u? If it is undecided, then the following 
should compile:

uint u = 100;
uint s = -u; // undecided implicitly convertible to unsigned
Nov 25 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Denis Koroskin wrote:
 On Tue, 25 Nov 2008 18:59:01 +0300, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 [snip]
 I think it's fine. That's the way LLVM stores the integral values
 internally, IIRC.

 But what is the type of -u? If it is undecided, then the following
 should compile:

 uint u = 100;
 uint s = -u; // undecided implicitly convertible to unsigned
Yah, but at least you actively asked for an unsigned. Compare and 
contrast with surprises such as:

uint a = 5;
writeln(-a); // this won't print -5

Such code would be disallowed in the undecided-sign regime.

Andrei
Nov 25 2008
prev sibling next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
A few general comments.

Andrei Alexandrescu:

 D pursues compatibility with C and C++ in the following manner: if a 
 code snippet compiles in both C and D or C++ and D, then it should have 
 the same semantics.
I didn't know of such "support" for C++ syntax too; isn't such "support" 
for C syntax only? D shares very little with C++.

This rule is good because you can take a piece of C code and convert it 
to D with less work and fewer surprises. I have already translated large 
pieces of C code to D, so I appreciate this. But in several respects C 
syntax and semantics are too error-prone or "wrong", so sometimes this 
can also become a significant disadvantage for a language like D that 
tries to be much less error-prone than C.

One solution is to "disable" some of the more error-prone syntax allowed 
in C, turning it into a compilation error. For example, I have seen 
newbies write bugs caused by leaving & where a && was necessary. In such 
a case, just adopting "and" and making "&&" a syntax error solves the 
problem and doesn't lead to bugs when you convert C code to D (you just 
use a search&replace, replacing && with and in the code).

In other situations it may be less easy to find such solutions (that is, 
to invent an alternative syntax/semantics and make the C one a syntax 
error); in those cases I think it's better to discuss each such 
situation independently. In some situations we can even break the 
standard way D pursues compatibility, for the sake of avoiding bugs and 
making the semantics better.
 The disadvantage is that it is more complex
It's not really more complex; it just makes visible some hidden 
complexity that is already present, inherent in the signed/unsigned 
nature of the numbers. It also follows the Python Zen rule: "In the face 
of ambiguity, refuse the temptation to guess."
 and may surprise the novice 
 in its own way by refusing to compile code that looks legit.
A compile error is better than a potential runtime bug.
 Walter, as many good long-time C programmers, knows the abusive 
 unsigned rule so well he's not hurt by it and consequently has little 
 incentive to see it as a problem.
I'm not a newbie at programming, but in the last year I have put two 
bugs related to this into my code, so I suggest finding ways to avoid 
this silly situation. I think the first bug was something like:

if (arr.lenght > x) ...

where x was a signed int with value -5 (this specific bug can also be 
solved by making array length a signed value. What's the point of making 
it unsigned in the first place? I have seen that in D it's safer to use 
signed values everywhere you don't strictly need an unsigned value. And 
that length doesn't need to be unsigned).

Beside the unsigned/signed problems discussed here, it may be useful to 
list some of the other situations where the C syntax/semantics may lead 
to bugs. For example, does D fix the C semantics of the % (modulo) 
operation?

Another example: in both Pascal and Python3 there are two different 
operators for division, one for the FP one and one for the integer one 
(in Pascal they are / and div, in Python3 they are / and //). So could 
it be good for D too to define two different operators for this purpose?

Bye,
bearophile
Nov 25 2008
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
bearophile:
 if (arr.lenght > x) ...
Oh, yes :-) and writing "lenght" instead of "lenght" is a common mistake of mine, usually the code editor allows me to avoid this error because the right one becomes colored. That's why in the past I have suggested something simpler and shorter like "len" (others have suggested "size" instead, it too is acceptable to me). Bye, bearophile
Nov 25 2008
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"bearophile" wrote
 bearophile:
 if (arr.lenght > x) ...
Oh, yes :-) and writing "lenght" instead of "lenght" is a common mistake of mine
lol!!!
Nov 25 2008
parent reply bearophile <bearophileHUGS lycos.com> writes:
Steven Schveighoffer:
 lol!!! 
I know, I know... :-) But when people do errors so often, the error is 
elsewhere, in the original choice of that word to denote how many items 
an iterable has.

In my libs I have defined len() like this, that I use now and then 
(where running speed isn't essential):

long len(TyItems)(TyItems items)
{
    static if (HasLength!(TyItems))
        return items.length;
    else
    {
        long len;
        // this generates: foreach (p1, p2, p3; items) len++;
        // with a variable number of p1, p2...
        mixin("foreach (" ~ SeriesGen1!("p", ", ", OpApplyCount!(TyItems), 1) ~
              "; items) len++;");
        return len;
    }
} // End of len(items)

/// ditto
long len(TyItems, TyFun)(TyItems items, TyFun pred)
{
    static assert(IsCallable!(TyFun), "len(): predicate must be a callable");
    long len;
    static if (IsAA!(TyItems))
    {
        foreach (key, val; items)
            if (pred(key, val))
                len++;
    }
    else static if (is(typeof(TyItems.opApply)))
    {
        mixin("foreach (" ~ SeriesGen1!("p", ", ", OpApplyCount!(TyItems), 1) ~
              "; items) if (pred(" ~
              SeriesGen1!("p", ", ", OpApplyCount!(TyItems), 1) ~ ")) len++;");
    }
    else
    {
        foreach (el; items)
            if (pred(el))
                len++;
    }
    return len;
} // End of len(items, pred)

alias len!(string) strLen; /// ditto
alias len!(int[]) intLen; /// ditto
alias len!(float[]) floatLen; /// ditto

Having a global callable like len() instead of an attribute is 
(sometimes) better, because you can use it for example like this (this 
is working syntax of my dlibs):

children.sort(&len!(string));

That sorts the array of strings "children" according to the given 
callable key, that is the len of the strings.

Bye,
bearophile
Nov 25 2008
parent reply "Nick Sabalausky" <a a.a> writes:
"bearophile" <bearophileHUGS lycos.com> wrote in message 
news:gghc97$1mfo$1 digitalmars.com...
 Steven Schveighoffer:
 lol!!!
 [snip]

If we ever get extension methods, then maybe something along these lines 
would be nice:

extension typeof(T.length) len(T t)
{
    return t.length;
}
If we ever get extension methods, then maybe something along these lines would be nice: extension typeof(T.length) len(T t) { return T.length; }
Nov 25 2008
parent reply KennyTM~ <kennytm gmail.com> writes:
Nick Sabalausky wrote:
 "bearophile" <bearophileHUGS lycos.com> wrote in message 
 news:gghc97$1mfo$1 digitalmars.com...
 Steven Schveighoffer:
 lol!!!
I know, I know... :-) But when people do errors so often, the error is elsewhere, in the original choice of that word to denote how many items an iterable has. In my libs I have defined len() like this, that I use now and then (where running speed isn't essential): long len(TyItems)(TyItems items) { static if (HasLength!(TyItems)) return items.length; else { long len; // this generates: foreach (p1, p2, p3; items) len++; with a variable number of p1, p2... mixin("foreach (" ~ SeriesGen1!("p", ", ", OpApplyCount!(TyItems), 1) ~ "; items) len++;"); return len; } } // End of len(items) /// ditto long len(TyItems, TyFun)(TyItems items, TyFun pred) { static assert(IsCallable!(TyFun), "len(): predicate must be a callable"); long len; static if (IsAA!(TyItems)) { foreach (key, val; items) if (pred(key, val)) len++; } else static if (is(typeof(TyItems.opApply))) { mixin("foreach (" ~ SeriesGen1!("p", ", ", OpApplyCount!(TyItems), 1) ~ "; items) if (pred(" ~ SeriesGen1!("p", ", ", OpApplyCount!(TyItems), 1) ~ ")) len++;"); } else { foreach (el; items) if (pred(el)) len++; } return len; } // End of len(items, pred) alias len!(string) strLen; /// ditto alias len!(int[]) intLen; /// ditto alias len!(float[]) floatLen; /// ditto Having a global callable like len() instead of an attribute is (sometimes) better, because you can use it for example like this (this is working syntax of my dlibs): children.sort(&len!(string)); That sorts the array of strings "children" according to the given callable key, that is the len of the strings.
If we ever get extension methods, then maybe something along these lines would be nice: extension typeof(T.length) len(T t) { return T.length; }
Already works:

uint len(A) (in A x) { return x.length; }
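
Since the parameter is an array, the call can also use the array 
property syntax Nick mentions below (a hypothetical usage, assuming the 
template instantiates through that syntax):

int[] arr = [1, 2, 3];
assert(arr.len == 3); // same as len(arr)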
Nov 25 2008
parent reply "Nick Sabalausky" <a a.a> writes:
"KennyTM~" <kennytm gmail.com> wrote in message 
news:ggipu6$26mr$1 digitalmars.com...
 Nick Sabalausky wrote:
 "bearophile" <bearophileHUGS lycos.com> wrote in message 
 news:gghc97$1mfo$1 digitalmars.com...
 Steven Schveighoffer:
 lol!!!
I know, I know... :-) But when people do errors so often, the error is elsewhere, in the original choice of that word to denote how many items an iterable has. In my libs I have defined len() like this, that I use now and then (where running speed isn't essential): long len(TyItems)(TyItems items) { static if (HasLength!(TyItems)) return items.length; else { long len; // this generates: foreach (p1, p2, p3; items) len++; with a variable number of p1, p2... mixin("foreach (" ~ SeriesGen1!("p", ", ", OpApplyCount!(TyItems), 1) ~ "; items) len++;"); return len; } } // End of len(items) /// ditto long len(TyItems, TyFun)(TyItems items, TyFun pred) { static assert(IsCallable!(TyFun), "len(): predicate must be a callable"); long len; static if (IsAA!(TyItems)) { foreach (key, val; items) if (pred(key, val)) len++; } else static if (is(typeof(TyItems.opApply))) { mixin("foreach (" ~ SeriesGen1!("p", ", ", OpApplyCount!(TyItems), 1) ~ "; items) if (pred(" ~ SeriesGen1!("p", ", ", OpApplyCount!(TyItems), 1) ~ ")) len++;"); } else { foreach (el; items) if (pred(el)) len++; } return len; } // End of len(items, pred) alias len!(string) strLen; /// ditto alias len!(int[]) intLen; /// ditto alias len!(float[]) floatLen; /// ditto Having a global callable like len() instead of an attribute is (sometimes) better, because you can use it for example like this (this is working syntax of my dlibs): children.sort(&len!(string)); That sorts the array of strings "children" according to the given callable key, that is the len of the strings.
If we ever get extension methods, then maybe something along these lines would be nice: extension typeof(T.length) len(T t) { return T.length; }
 Already works:

 uint len(A) (in A x) { return x.length; }
Oh, right. For some stupid reason I was forgetting that the param would always be an array and therefore be eligible for the existing array property syntax (and that .length always returns a uint).
Nov 26 2008
parent reply bearophile <bearophileHUGS lycos.com> writes:
Nick Sabalausky:
 Oh, right. For some stupid reason I was forgetting that the param would 
 always be an array and therefore be eligible for the existing array property 
 syntax (and that .length always returns a uint).
From the len() code I have posted you can see there are other places 
where you want to use len(), in particular to count the number of items 
that a lazy generator (opApply for now) yields.

Bye,
bearophile
Nov 26 2008
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Nick Sabalausky:
 Oh, right. For some stupid reason I was forgetting that the param would 
 always be an array and therefore be eligible for the existing array property 
 syntax (and that .length always returns a uint).
 From the len() code I have posted you can see there are other places
 where you want to use len(), in particular to count the number of items
 that a lazy generator (opApply for now) yields.

 Bye,
 bearophile
I'm rather wary of a short and suggestive name that embodies a linear 
operation. I recall there was a discussion about that a while ago in 
this newsgroup. I'd rather call it linearLength or something that 
suggests it's a best-effort function that may take O(n).

Andrei
Nov 26 2008
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 I'm rather wary of a short and suggestive name that embodies a linear 
 operation. I recall there was a discussion about that a while ago in 
 this newsgroup. I'd rather call it linearLength or something that 
 suggests it's a best-effort function that may take O(n).
I remember parts of that discussion, and I like your general rule; I 
agree that generally it's better to give the programmer a hint of the 
complexity of a specific operation, for example a method of a 
user-defined class, etc.

But len() is supposed to be used very often, so it's better to keep it 
short, because if you don't have an IDE it's not nice to type 
linearLength() once every two lines of code. Being used so often also 
implies that you remember how it works, so you are supposed to be able 
to remember it can be O(n) on lazy iterators.

So in this specific case I think it's acceptable to break your general 
rule, for practical reasons.

Bye,
bearophile
Nov 26 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Andrei Alexandrescu:
 I'm rather wary of a short and suggestive name that embodies a linear 
 operation. I recall there was a discussion about that a while ago in 
 this newsgroup. I'd rather call it linearLength or something that 
 suggests it's a best-effort function that may take O(n).
 But len() is supposed to be used very often, so it's better to keep it
 short [...]
If it's used often it shouldn't have linear complexity :o).

Andrei
Nov 26 2008
prev sibling parent Christopher Wright <dhasenan gmail.com> writes:
Andrei Alexandrescu wrote:
 [snip]
 I'm rather wary of a short and suggestive name that embodies a linear
 operation. I recall there was a discussion about that a while ago in
 this newsgroup. I'd rather call it linearLength or something that
 suggests it's a best-effort function that may take O(n).
My personal rules of optimization:
- I don't know what's slow.
- I don't know what's called often enough to be worth speeding up.
- Most of the time, my data sets are small.

If getting the length of an array were a linear operation, that wouldn't 
much affect any of my code. Most of my arrays are probably no larger 
than twenty elements, and I don't often need to get their lengths.

If I need to change data structures for better performance, I'd like to 
be able to replace them (or switch to generators) without undue effort. 
Things like changing function names according to the algorithmic 
complexity of the implementation just hurt.
 Andrei
Nov 26 2008
prev sibling parent Kagamin <spam here.lot> writes:
bearophile Wrote:

 From the len() code I have posted you can see there are other places where you
want to use len(), in particular to count the number of items that a lazy
generator (opApply for now) yields.
hmm...

import std.stdio, std.algorithm;

void main()
{
    bool pred(int x) { return x > 2; }
    auto counter = (int count, int x){ return pred(x) ? count + 1 : count; };
    int[] a = [0, 1, 2, 3, 4];
    auto lazylen = reduce!(counter)(0, a);
    writeln(lazylen); // 2
}
Nov 26 2008
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Walter, as many good long-time C programmers, knows the abusive 
 unsigned rule so well he's not hurt by it and consequently has
 little incentive to see it as a problem.
 I'm not a newbie at programming, but in the last year I have put two
 bugs related to this into my code, so I suggest finding ways to avoid
 this silly situation. I think the first bug was something like:

 if (arr.lenght > x) ...
 where x was a signed int with value -5 (this specific bug can also be
 solved by making array length a signed value. What's the point of making
 it unsigned in the first place? I have seen that in D it's safer to
 use signed values everywhere you don't strictly need an unsigned
 value. And that length doesn't need to be unsigned).
It's worthwhile keeping length an unsigned type if we can convincingly 
sell unsigned types as models of natural numbers. With the current 
rules, we can't make a convincing argument. But if we do manage to 
improve the rules, then we'll all be better off.

Andrei
Nov 25 2008
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
I remembered a couple more details. The names bits8, bits16, bits32, and 
bits64 were a possible choice for undecided-sign integrals. Walter and I 
liked that quite a bit. Walter also suggested that we make those actually 
full types accessible to programmers. We both were concerned that they'd 
add to the already large panoply of integral types in D. Dropping bits8 
and bits16 would reduce bloating at the cost of consistency.

So we're contemplating:

(a) Add bits8, bits16, bits32, bits64 public types.
(b) Add bits32, bits64 public types.
(c) Add bits8, bits16, bits32, bits64 compiler-internal types.
(d) Add bits32, bits64 compiler-internal types.

Make your pick or add more choices!


Andrei
Nov 25 2008
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Andrei Alexandrescu" wrote
 I remembered a couple more details. [snip]

 (a) Add bits8, bits16, bits32, bits64 public types.
 (b) Add bits32, bits64 public types.
 (c) Add bits8, bits16, bits32, bits64 compiler-internal types.
 (d) Add bits32, bits64 compiler-internal types.

 Make your pick or add more choices!
One other thing to contemplate: what happens if you add a bits32 to a 
bits64, long, or ulong value? This needs to be illegal, since you don't 
know whether to sign-extend or not. Or you could reinterpret the 
expression to promote the original types to 64 bits first? This makes 
the version with 8- and 16-bit types less attractive.

Another alternative is to select the bits type based on the entire 
expression. Of course, you'd have to disallow them as public types. And 
you'd want to do some special optimizations. You could represent it 
conceptually as calculating for all the bits types until the one that is 
decided is used, and then the compiler can optimize out the unused ones, 
which would at least keep it context-free.

-Steve
Nov 25 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 [snip]
 One other thing to contemplate: what happens if you add a bits32 to a
 bits64, long, or ulong value? This needs to be illegal, since you don't
 know whether to sign-extend or not. Or you could reinterpret the
 expression to promote the original types to 64 bits first?
Good point. There's no (or not much) arithmetic mixing bits32 and some 64-bit integral because it's unclear whether extending the bits32 operand should extend the sign bit or not.
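
Concretely, plain D today already exhibits both possible widenings; 
which one a bits32 operand should receive is exactly the ambiguity 
(example values are mine):

int si = cast(int)0xFFFF_FFFF; // all bits set, read as signed
uint ui = 0xFFFF_FFFF;         // same bit pattern, read as unsigned
long a = si; // sign-extends: a == -1
long b = ui; // zero-extends: b == 4294967295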
 This makes the version with 8 and 16 bit types less attractive.
 
 Another alternative is to select the bits type based on the entire 
 expression.  Of course, you'd have to disallow them as public types.  And 
 you'd want to do some special optimizations.  You could represent it 
 conceptually as calculating for all the bits types until the one that is 
 decided is used, and then the compiler can optimize out the unused ones, 
 which would at least keep it context-free.
 
 -Steve 
That's the intent of defining arithmetic on sign-ambiguous values: the 
type information propagates through a complex expression. I haven't 
heard of typechecking on entire expression patterns, and I think it 
would be a rather unclean technique (it means either that there are 
values whose type you can't tell, or that a given value has a 
context-dependent type).

Andrei
Nov 25 2008
prev sibling parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Tue, 25 Nov 2008 11:06:32 -0600, Andrei Alexandrescu wrote:

 [snip]

 Make your pick or add more choices!
I'll add more. :)

The problem with signed/unsigned types is that neither int nor uint is a 
sub-type of the other. They're essentially incompatible. Therefore a 
possible solution is:

1. Disallow implicit signed <=> unsigned conversion.

2. For those willing to port large C/C++ codebases, introduce a compiler 
compatibility switch which would add global operators mimicking the C 
behavior:

uint opAdd(int, uint)
uint opAdd(uint, int)
ulong opAdd(long, ulong)
etc.

This way you can even implement compatibility levels: only C-style 
additions, or additions with multiplications, or complete compatibility 
including the original signed/unsigned comparison behavior.
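
To illustrate the intended effect of the switch (a sketch; these opAdd 
globals are the proposal above, not anything D currently declares):

uint u = 1;
int i = -1;
uint r = u + i; // with the switch: resolves to uint opAdd(uint, int)
                // and reproduces the C result (here, 0)
                // without the switch: a compile-time error, because
                // implicit signed <=> unsigned mixing is disallowed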
Nov 25 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sergey Gromov wrote:
 Tue, 25 Nov 2008 11:06:32 -0600, Andrei Alexandrescu wrote:
 
 [snip]
 I'll add more. :)

 The problem with signed/unsigned types is that neither int nor uint is
 a sub-type of the other. They're essentially incompatible. Therefore a
 possible solution is:

 1. Disallow implicit signed <=> unsigned conversion.
I forgot to mention that that's implied in the bitsNN approach too.
 2.  For those willing to port large C/C++ codebases introduce a compiler
 compatibility switch which would add global operators mimicking the C
 behavior:
 
 uint opAdd(int, uint)
 uint opAdd(uint, int)
 ulong opAdd(long, ulong)
 etc.
Having semantics depend so heavily and confusingly on a compiler switch is extremely dangerous. Note that actually quite a lot of code will compile, with different semantics, with or without the switch.
 This way you can even implement compatibility levels: only C-style
 additions, or additions with multiplications, or complete compatibility
 including the original signed/unsigned comparison behavior.
I don't think we can pursue such a path.

Andrei
Nov 25 2008
parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Tue, 25 Nov 2008 15:49:23 -0600, Andrei Alexandrescu wrote:

 Sergey Gromov wrote:
 2.  For those willing to port large C/C++ codebases introduce a compiler
 compatibility switch which would add global operators mimicking the C
 behavior:
 
 uint opAdd(int, uint)
 uint opAdd(uint, int)
 ulong opAdd(long, ulong)
 etc.
 Having semantics depend so heavily and confusingly on a compiler switch
 is extremely dangerous. Note that actually quite a lot of code will
 compile, with different semantics, with or without the switch.
One of us should be missing something. There was no 'different semantics' in my proposal. The code either compiles and behaves exactly like in C or does not compile at all. The amount of code which compiles or fails depends on a compiler switch, not semantics.
Nov 25 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sergey Gromov wrote:
 Tue, 25 Nov 2008 15:49:23 -0600, Andrei Alexandrescu wrote:
 
 Sergey Gromov wrote:
 [snip]
 Having semantics depend so heavily and confusingly on a compiler switch
 is extremely dangerous. Note that actually quite a lot of code will
 compile, with different semantics, with or without the switch.
 One of us should be missing something. There was no 'different
 semantics' in my proposal. The code either compiles and behaves exactly
 like in C or does not compile at all. The amount of code which compiles
 or fails depends on a compiler switch, not semantics.
Sorry, I misunderstood.

Andrei
Nov 25 2008
prev sibling next sibling parent reply Russell Lewis <webmaster villagersonline.com> writes:
I'm of the opinion that we should make mixed-sign operations a 
compile-time error.  I know that it would be annoying in some 
situations, but IMHO it gives you clearer, more reliable code.

IMHO, it's a mistake to have implicit casts that lose information.


Want to hear a funny/sad, but somewhat related story?  I was chasing 
down a segfault recently at work.  I hunted and hunted, and finally 
found out that the pointer returned from malloc() was bad.  I figured 
that I was overwriting the heap, right?  So I added tracing and 
debugging everywhere...no luck.

I finally, in desperation, included <stdlib.h> in the source file (there 
was a warning about malloc() not being prototyped)...and the segfaults 
vanished!!!

The problem was that the xlc compiler, when it doesn't have the 
prototype for a function, assumes that it returns int...but int is 32 
bits.  Moreover, the compiler was happily implicitly casting that int to 
a pointer...which was 64 bits.

The compiler was silently cropping the top 32 bits off my pointers.

And it all was a "feature" to make programming "easier."


Russ

Andrei Alexandrescu wrote:
 [snip]
Nov 25 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
(You may want to check your system's date, unless of course you traveled 
in time.)

Russell Lewis wrote:
 I'm of the opinion that we should make mixed-sign operations a 
 compile-time error.  I know that it would be annoying in some 
 situations, but IMHO it gives you clearer, more reliable code.
The problem is, it's much more annoying than one might imagine. Even array.length - 1 is up for scrutiny. Technically, even array.length + 1 is a problem because 1 is really a signed int. We could provide exceptions for constants, but exceptions are generally not solving the core issue.
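
The classic instance of that scrutiny, in today's D (current behavior, 
nothing proposed):

int[] a;               // empty array
auto n = a.length - 1; // compiles; wraps around instead of yielding -1
assert(n == typeof(n).max);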
 IMHO, it's a mistake to have implicit casts that lose information.
Hear, hear.
 Want to hear a funny/sad, but somewhat related story?  I was chasing 
 down a segfault recently at work.  I hunted and hunted, and finally 
 found out that the pointer returned from malloc() was bad.  I figured 
 that I was overwriting the heap, right?  So I added tracing and 
 debugging everywhere...no luck.
 
 I finally, in desperation, included <stdlib.h> to the source file (there 
 was a warning about malloc() not being prototyped)...and the segfaults 
 vanished!!!
 
 The problem was that the xlc compiler, when it doesn't have the 
 prototype for a function, assumes that it returns int...but int is 32 
 bits.  Moreover, the compiler was happily implicitly casting that int to 
 a pointer...which was 64 bits.
 
 The compiler was silently cropping the top 32 bits off my pointers.
 
 And it all was a "feature" to make programming "easier."
Good story for reminding ourselves of the advantages of type safety!

Andrei
Nov 25 2008
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 The problem is, it's much more annoying than one might imagine. Even 
 array.length - 1 is up for scrutiny. Technically, even array.length + 1 
 is a problem because 1 is really a signed int. We could provide 
 exceptions for constants, but exceptions are generally not solving the 
 core issue.
That can be solved by making array.length signed.

Can you list a few other annoying situations?

Bye,
bearophile
Nov 25 2008
next sibling parent reply "Nick Sabalausky" <a a.a> writes:
"bearophile" <bearophileHUGS lycos.com> wrote in message 
news:gghsa1$2u0c$1 digitalmars.com...
 Andrei Alexandrescu:
 The problem is, it's much more annoying than one might imagine. Even
 array.length - 1 is up for scrutiny. Technically, even array.length + 1
 is a problem because 1 is really a signed int. We could provide
 exceptions for constants, but exceptions are generally not solving the
 core issue.
 That can be solved by making array.length signed. Can you list a few
 other annoying situations?
I disagree. If you start using that as a solution, then you may as well 
eliminate unsigned values entirely.

I think the root problem with disallowing mixed-sign operations is that 
math just doesn't work that way. What I mean by that is, disallowing 
mixed-sign operations implies that we have these nice cleanly separated 
worlds of "signed math" and "unsigned math". But depending on the 
operator, the signs/ordering of the operands, and what the operands 
actually represent, math has a tendency to switch back and forth between 
the signed ("can be negative") and unsigned ("can't be negative") 
worlds. So if we have a type system that forces us to jump through hoops 
every time that world-switch happens, and we then decide that it's 
justifiable to say "well, let's fix it for array.length by tossing that 
over to the 'can be negative' world, even though it cuts our range of 
allowable values in half", then there's nothing stopping us from solving 
the rest of the cases by throwing them over the "can be negative" wall 
as well. All of a sudden, we have no unsigned.

Just a thought: maybe some sort of built-in "units" system could help 
here, as sketched below? Instead of just making array.length "signed" or 
"unsigned" and leaving it at that, add a "units system" and tag 
array.length as being a length, with length tags carrying the 
connotation that negative is disallowed. Adding/subtracting a pure 
constant to a length would cause the constant to be automatically tagged 
as a "length delta" (which can be negative). And the units system would, 
of course, contain the rule that a length delta added/subtracted from a 
length results in a length. The units system could then translate all of 
that into "signed vs unsigned".
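
A rough sketch of that tagging idea, as plain structs in today's D 
(Length and LengthDelta are made-up names; a real units system would be 
built into the type checker rather than spelled out like this):

struct LengthDelta { int value; }

struct Length
{
    uint value;

    Length opAdd(LengthDelta d)
    {
        // a delta may be negative; range checking is elided here
        return Length(cast(uint)(cast(int)value + d.value));
    }
}

// usage: Length(10) + LengthDelta(-3) yields Length(7)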
Nov 25 2008
parent Kagamin <spam here.lot> writes:
Nick Sabalausky Wrote:

 happens, and we then decide that it's justifiable to say "well, let's fix it 
 for array.length by tossing that over to the 'can be negative' world, even 
 though it cuts our range of allowable values in half", then there's nothing 
 stopping us from solving the rest of the cases by throwing them over the 
 "can be negative" wall as well. All of a sudden, we have no unsigned.
Well... cutting the range can be no problem; after all, a thought was 
floating around that structs shouldn't be larger than a couple of KB. 
Note that an array of shorts with a signed length spans the entire 
32-bit address space.
Nov 26 2008
prev sibling next sibling parent Daniel de Kok <daniel nowhere.nospam> writes:
On Tue, 25 Nov 2008 16:56:17 -0500, bearophile wrote:
 Andrei Alexandrescu:
 The problem is, it's much more annoying than one might imagine. Even
 array.length - 1 is up for scrutiny. Technically, even array.length + 1
 is a problem because 1 is really a signed int. We could provide
 exceptions for constants, but exceptions are generally not solving the
 core issue.
 That can be solved by making array.length signed.
Is that conceptually clean/clear? (If so, I'd like to request an array 
of length -1.)

I like Andrei's proposal because it keeps clarity in such cases: sizes 
are non-negative quantities. Once you start subtracting ints, it's 
possibly not a size anymore; in such cases you want the user to decide 
explicitly.

-- Daniel
Nov 25 2008
prev sibling parent Ary Borenszweig <ary esperanto.org.ar> writes:
bearophile wrote:
 Andrei Alexandrescu:
 The problem is, it's much more annoying than one might imagine. Even 
 array.length - 1 is up for scrutiny. Technically, even array.length + 1 
 is a problem because 1 is really a signed int. We could provide 
 exceptions for constants, but exceptions are generally not solving the 
 core issue.
 That can be solved by making array.length signed.
In C#, even though there are unsigned types, the length of a list, 
array, etc. is always int. In this way, they prevented the bugs and 
problems everyone mentions here.
Nov 26 2008
prev sibling next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 (You may want to check your system's date, unless of course you traveled
 in time.)
 Russell Lewis wrote:
 I'm of the opinion that we should make mixed-sign operations a
 compile-time error.  I know that it would be annoying in some
 situations, but IMHO it gives you clearer, more reliable code.
 The problem is, it's much more annoying than one might imagine. Even
 array.length - 1 is up for scrutiny. Technically, even array.length + 1
 is a problem because 1 is really a signed int. We could provide
 exceptions for constants, but exceptions are generally not solving the
 core issue.
Perhaps not, but the fact that constants are signed integers has been 
mentioned as a problem before. Would making these polysemous values help 
at all? That seems to be what your proposal is effectively trying to do 
anyway.

Sean
Nov 25 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sean Kelly wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 (You may want to check your system's date, unless of course you traveled
 in time.)
 Russell Lewis wrote:
 I'm of the opinion that we should make mixed-sign operations a
 compile-time error.  I know that it would be annoying in some
 situations, but IMHO it gives you clearer, more reliable code.
 The problem is, it's much more annoying than one might imagine. Even
 array.length - 1 is up for scrutiny. Technically, even array.length + 1
 is a problem because 1 is really a signed int. We could provide
 exceptions for constants, but exceptions are generally not solving the
 core issue.
 Perhaps not, but the fact that constants are signed integers has been
 mentioned as a problem before. Would making these polysemous values
 help at all? That seems to be what your proposal is effectively trying
 to do anyway.
Well, with constants we can do many tricks; I mentioned an extreme 
example. Polysemy does indeed help, but my latest design (described in 
the post starting this thread) gets away with simple subtyping. I like 
polysemy (the name is really cool :o)) but I don't want to be 
concept-heavy: if a classic technique works, I'd use that and save 
polysemy for a tougher task that cannot be comfortably tackled with 
existing means.

Andrei
Nov 25 2008
prev sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2008-11-25 16:39:05 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Russell Lewis wrote:
 I'm of the opinion that we should make mixed-sign operations a 
 compile-time error.  I know that it would be annoying in some 
 situations, but IMHO it gives you clearer, more reliable code.
The problem is, it's much more annoying than one might imagine. Even array.length - 1 is up for scrutiny. Technically, even array.length + 1 is a problem because 1 is really a signed int. We could provide exceptions for constants, but exceptions are generally not solving the core issue.
Then the problem is that integer literals are of a specific type. Just 
make them polysemous and the problem is solved.

I'm with Russell on this one. To me, a literal value (123, -8, 0) is not 
an int, not even a constant: it's just a number which doesn't imply any 
type at all until you place it into a variable (or a constant, or an 
enum, etc.).

And if you're afraid the word polysemous will scare people, don't say 
the word and call it an "integer literal". Polysemy in this case is just 
a mechanism used by the compiler to make the value work as expected with 
all integral types. All you really need is a type implicitly castable to 
everything capable of holding the numerical value (much like your 
__intuint).

I'd make "auto x = 1" create a signed integer variable for the sake of 
simplicity.

And all this would also make "uint x = -1" illegal... but then you can 
easily use "uint x = uint.max" if you want to enable all the bits. It's 
easier than in C: you don't have to include the right header and 
remember the name of a constant.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Nov 26 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 On 2008-11-25 16:39:05 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 Russell Lewis wrote:
 I'm of the opinion that we should make mixed-sign operations a 
 compile-time error.  I know that it would be annoying in some 
 situations, but IMHO it gives you clearer, more reliable code.
The problem is, it's much more annoying than one might imagine. Even array.length - 1 is up for scrutiny. Technically, even array.length + 1 is a problem because 1 is really a signed int. We could provide exceptions for constants, but exceptions generally don't solve the core issue.
Then the problem is that integer literals are of a specific type. Just make them polysemous and the problem is solved.
Well that at best takes care of _some_ operations involving constants, but for example does not quite take care of array.length - 1. I am now sorry I gave the silly example of array.length + 1. Many people latched onto it and thought that solving that solves the whole problem. That's not quite the case.

Also consider:

auto delta = a1.length - a2.length;

What should the type of delta be? Well, it depends. In my scheme that wouldn't even compile, which I think is a good thing; you must decide whether prior information makes it an unsigned or a signed integral.
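For illustration, the same silent wraparound applies to the delta example above; a minimal sketch (array sizes are arbitrary):

import std.stdio;

void main()
{
    auto a1 = new int[3];
    auto a2 = new int[5];
    auto delta = a1.length - a2.length; // unsigned today: wraps instead of -2
    writefln("%s", delta);              // prints a huge positive number
}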
 I'm with Russell on this one. To me, a literal value (123, -8, 0) is not 
 an int, not even a constant: it's just a number which doesn't imply any 
 type at all until you place it into a variable (or a constant, or an 
 enum, etc.).

 And if you're afraid the word polysemous will scare people, don't say 
 the word and call it an "integer literal". Polysemy in this case is just 
 a mechanism used by the compiler to make the value work as expected with 
 all integral types. All you really need is a type implicitly castable to 
 everything capable of holding the numerical value (much like your 
 __intuint).
 
 I'd make "auto x = 1" create a signed integer variable for the sake of 
 simplicity.
That can be formalized by having polysemous types have a "lemma", a default type.
 And all this would also make "uint x = -1" illegal... but then you can 
 easily use "uint x = uint.max" if you want to enable all the bits. It's 
 easier than in C: you don't have to include the right header and remember 
 the name of a constant.
Fine. With constants there is some mileage that can be squeezed. But let's keep in mind that that doesn't solve the larger issue. Andrei
Nov 26 2008
next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2008-11-26 10:24:17 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Well that at best takes care of _some_ operations involving constants, 
 but for example does not quite take care of array.length - 1.
How does it not solve the problem? array.length is of type uint, 1 is polysemous (byte, ubyte, short, ushort, int, uint, long, ulong). Only "uint - uint" is acceptable, and its result is "uint".
 Also consider:
 
 auto delta = a1.length - a2.length;
 
 What should the type of delta be? Well, it depends. In my scheme that 
 wouldn't even compile, which I think is a good thing; you must decide 
 whether prior information makes it an unsigned or a signed integral.
In my scheme it would give you a uint. You'd have to cast to get a signed integer... I see how it's not ideal, but I can't imagine how it could be coherent otherwise.

auto diff = cast(int)a1.length - cast(int)a2.length;

By casting explicitly, you indicate in the code that if a1.length or a2.length contain numbers which are too big to be represented as int, you'll get garbage. In this case, it'd be pretty surprising to get that problem. In other cases it may not be so clear-cut.

Perhaps we could add a "sign" property to uint and an "unsign" property to int that'd give you the signed or unsigned corresponding value and which could do range checking at runtime (enabled by a compiler flag).

auto diff = a1.length.sign - a2.length.sign;

And for the general problem of "uint - uint" giving a result below uint.min, as I said in my other post, that could be handled by a runtime check (enabled by a compiler flag) just like array bound checking.

One last thing. I think that in general it's a much better habit to change the type to signed prior to doing the subtraction. It may be harmless in the case of a subtraction, but as you said when starting the thread, it isn't for others (multiply, divide, modulo). I think the scheme above promotes this good habit by making it easier to change the type at the operands rather than at the result.
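A rough library-level sketch of the suggested checks, written as ordinary functions since the .sign/.unsign property syntax is hypothetical (the function names, the exception type, and the assumption of 32-bit lengths are all made up for illustration):

int sign(uint v)
{
    // the runtime range check that the proposed compiler flag would enable
    if (v > cast(uint) int.max)
        throw new Exception("value out of range for int");
    return cast(int) v;
}

uint unsign(int v)
{
    if (v < 0)
        throw new Exception("negative value used as uint");
    return cast(uint) v;
}

void main()
{
    uint len1 = 3, len2 = 5;
    auto diff = sign(len1) - sign(len2); // int: -2, as intended
}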
 I'd make "auto x = 1" create a signed integer variable for the sake of 
 simplicity.
That can be formalized by having polysemous types have a "lemma", a default type.
That's indeed what I'm suggesting.
 And all this would also make "uint x = -1" illegal... but then you can 
 easily use "uint x = uint.max" if you want to enable all the bits. It's 
 easier than in C: you don't have to include the right header and remember 
 the name of a constant.
Fine. With constants there is some mileage that can be squeezed. But let's keep in mind that that doesn't solve the larger issue.
Well, by making implicit conversions between uint and int illegal, we're solving the larger issue. Just not in a seamless manner.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Nov 26 2008
parent reply Don <nospam nospam.com> writes:
Michel Fortin wrote:
 On 2008-11-26 10:24:17 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 Also consider:

 auto delta = a1.length - a2.length;

 What should the type of delta be? Well, it depends. In my scheme that 
 wouldn't even compile, which I think is a good thing; you must decide 
 whether prior information makes it an unsigned or a signed integral.
In my scheme it would give you a uint. You'd have to cast to get a signed integer... I see how it's not ideal, but I can't imagine how it could be coherent otherwise.

auto diff = cast(int)a1.length - cast(int)a2.length;
Actually, there's no solution. Imagine a 32-bit system, where one object can be greater than 2GB in size (not possible in Windows AFAIK, but theoretically possible). Then if a1 is 3GB, delta cannot be stored in an int. If a2 is 3GB, it requires an int for storage, since the result is less than 0.

==> I think length has to be an int. It's less bad than uint.
 Perhaps we could add a "sign" property to uint and an "unsign" property 
 to int that'd give you the signed or unsigned corresponding value and 
 which could do range checking at runtime (enabled by a compiler flag).
 
     auto diff = a1.length.sign - a2.length.sign;
 
 And for the general problem of "uint - uint" giving a result below 
 uint.min, as I said in my other post, that could be handled by a runtime 
 check (enabled by a compiler flag) just like array bound checking.
That's not bad.
 Fine. With constants there is some mileage that can be squeezed. But 
 let's keep in mind that that doesn't solve the larger issue.
Well, by making implicit conversions between uint and int illegal, we're solving the larger issue. Just not in a seamless manner.
We are of one mind. I think that constants are the root cause of the problem.
Nov 26 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Don wrote:
 Michel Fortin wrote:
 On 2008-11-26 10:24:17 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:

 Also consider:

 auto delta = a1.length - a2.length;

 What should the type of delta be? Well, it depends. In my scheme that 
 wouldn't even compile, which I think is a good thing; you must decide 
 whether prior information makes it an unsigned or a signed integral.
In my scheme it would give you a uint. You'd have to cast to get a signed integer... I see how it's not ideal, but I can't imagine how it could be coherent otherwise.

auto diff = cast(int)a1.length - cast(int)a2.length;
Actually, there's no solution.
There is. We need to find the block of marble it's in and then chip the extra marble off it.
 Imagine a 32 bit system, where one object can be greater than 2GB in 
 size (not possible in Windows AFAIK, but theoretically possible).
It is possible in Windows if you change some I-forgot-which parameter in boot.ini.
 Then 
 if a1 is 3GB, delta cannot be stored in an int. If a2 is 3GB, it 
 requires an int for storage, since the result is less than 0.
 
 ==> I think length has to be an int. It's less bad than uint.
I'm not sure how the conclusion follows from the premises, but consider this. If someone deals with large arrays, they do have the possibility of doing things like:

if (a1.length >= a2.length) {
    size_t delta = a1.length - a2.length;
    ... use delta ...
} else {
    size_t rDelta = a2.length - a1.length;
    ... use rDelta ...
}

I'm not saying it's better than sliced bread, but it is a solution. And it is correct on all systems. And it cooperates with the typechecker by adding flow information to which typecheckers are usually oblivious. And the types are out in the clear. And it's the programmer, not the compiler, who decides the signedness.

In contrast, using ints for array lengths beyond 2GB is a nightmare. I'm not saying it's a frequent thing though, but since you woke up the sleeping dog, I'm just barking :o).
 Perhaps we could add a "sign" property to uint and an "unsign" 
 property to int that'd give you the signed or unsigned corresponding 
 value and which could do range checking at runtime (enabled by a 
 compiler flag).

     auto diff = a1.length.sign - a2.length.sign;

 And for the general problem of "uint - uint" giving a result below 
 uint.min, as I said in my other post, that could be handled by a 
 runtime check (enabled by a compiler flag) just like array bound 
 checking.
That's not bad.
Well let's look closer at this. Consider a system in which the current rules are in force, plus the overflow check for uint.

auto i = arr.length - offset1 + offset2;

Although the context makes it clear that offset1 < offset2 and therefore i is within range and won't overflow, the poor code generator has no choice but to insert checks throughout. Even though the entire expression is always correct, it will dynamically fail on the way to its correct form.

Contrast with the proposed system, in which the expression will not compile. It will indeed require the user to somewhat redundantly insert guides for operations, but during compilation, not through runtime failure.
 Fine. With constants there is some mileage that can be squeezed. But 
 let's keep in mind that that doesn't solve the larger issue.
Well, by making implicit conversions between uint and int illegal, we're solving the larger issue. Just not in a seamless manner.
We are of one mind. I think that constants are the root cause of the problem.
Well I strongly disagree. (I assume you mean "literals", not "constants".) I see constants as just a small part of the signedness mess. Moreover, I consider that in fact creating symbolic names with "auto" compounds the problem, and this belief runs straight against yours that it's about literals. No, IMHO it's about espousing and then propagating wrong beliefs through auto!

Maybe if you walked me through your reasoning on why literals bear significant importance I could get convinced. As far as my code is concerned, I tend to loosely go along the lines of the old adage "the only literals in a program should be 0, 1, and -1". True, the adage doesn't say how many of these three may reasonably occur, but at the end of the day I'm confused about this alleged importance of literals.


Andrei
Nov 26 2008
parent Michel Fortin <michel.fortin michelf.com> writes:
On 2008-11-26 13:30:30 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Well let's look closer at this. Consider a system in which the current 
 rules are in force, plus the overflow check for uint.
 
 auto i = arr.length - offset1 + offset2;
 
 Although the context makes it clear that offset1 < offset2 and 
 therefore i is within range and won't overflow, the poor code generator 
 has no choice but to insert checks throughout. Even though the entire 
 expression is always correct, it will dynamically fail on the way to 
 its correct form.
That's because you're relying on a specific behaviour for overflows, and that changes with range checking. True: in some cases values wrapping around is desirable. But in this specific case I'd say it'd be better to just add parentheses at the right place, or change the order of the arguments to avoid overflow.

Avoiding overflows is a good practice in general. The only reason it doesn't bite here is because you're limited to additions and subtractions.

If you dislike the compiler checking for overflows, just tell it not to check. That's why we need a compiler switch. Perhaps it'd be good to have a pragma to disable those checks for specific pieces of code too.
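For instance, since the surrounding context guarantees offset1 < offset2, grouping the subtraction of the two offsets first keeps every intermediate value in range; a sketch of the kind of regrouping being suggested:

// offset2 - offset1 cannot wrap because offset1 < offset2,
// so no intermediate result ever leaves [0, uint.max]:
auto i = arr.length + (offset2 - offset1);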
 Contrast with the proposed system, in which the expression will not 
 compile. It will indeed require the user to somewhat redundantly 
 insert guides for operations, but during compilation, not through 
 runtime failure.
If you're just adding a special rule to prevent the result of subtractions of unsigned values from being put into auto variables, I'm not terribly against that. I'm just unconvinced of its usefulness.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Nov 26 2008
prev sibling parent reply "Denis Koroskin" <2korden gmail.com> writes:
On Wed, 26 Nov 2008 18:24:17 +0300, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Also consider:

 auto delta = a1.length - a2.length;

 What should the type of delta be? Well, it depends. In my scheme that  
 wouldn't even compile, which I think is a good thing; you must decide  
 whether prior information makes it an unsigned or a signed integral.
Sure, it shouldn't compile. But explicit casting to either type won't help. Let's say you expect that a1.length > a2.length and thus expect a strictly positive result. Putting an explicit cast will not detect (but suppress) an error and give you an erroneous result silently. Putting an assert(a1.length > a2.length) might help, but the check will be unavailable unless code is compiled with asserts enabled.

A better solution would be to write code as follows:

auto delta = unsigned(a1.length - a2.length); // returns an unsigned value, throws on overflow (e.g., "2 - 4")
auto delta = signed(a1.length - a2.length); // returns result as a signed value. Throws on overflow (e.g., "int.min - 1")
auto delta = a1.length - a2.length; // won't compile

// this one is also handy:
auto newLength = checked(a1.length - 1); // preserves type of a1.length, be it int or uint, throws on overflow

I have previously shown an implementation of unsigned/signed:

import std.stdio;

int signed(lazy int dg)
{
    auto result = dg();
    asm {
        jo overflow;
    }
    return result;

    overflow:
    throw new Exception("Integer overflow occurred");
}

int main()
{
    int t = int.max;
    try
    {
        int s = signed(t + 1);
        writefln("Result is %d", s);
    }
    catch(Exception e)
    {
        writefln("Whoops! %s", e.toString());
    }
    return 0;
}

But Andrei has correctly pointed out that it has a problem - it may throw without a reason:

int i = int.max + 1; // sets an overflow flag
auto result = expectSigned(1); // raises an exception

The overflow flag may also be cleared in a complex expression:

auto result = expectUnsigned(1 + (uint.max + 1)); // first add will overflow and second one clears the flag -> no exception as a result

A possible solution is to make the compiler aware of this construct and disallow passing none (case 2) or more than one operation (case 1) to the method.
Nov 26 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Denis Koroskin wrote:
 On Wed, 26 Nov 2008 18:24:17 +0300, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Also consider:

 auto delta = a1.length - a2.length;

 What should the type of delta be? Well, it depends. In my scheme that 
 wouldn't even compile, which I think is a good thing; you must decide 
 whether prior information makes it an unsigned or a signed integral.
Sure, it shouldn't compile. But explicit casting to either type won't help. Let's say you expect that a1.length > a2.length and thus expect a strictly positive result. Putting an explicit cast will not detect (but suppress) an error and give you an erroneous result silently.
But "silently" and "putting a cast" don't go together. It's the cast that makes the erroneous result non-silent. Besides, you don't need to cast. You can always use a function that does the requisite checks. std.conv will have some of those, should any change in the rules make it necessary.

By this I'm essentially replying to Don's message in the bugs newsgroup: nobody puts a gun to your head to cast.
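For reference, a checked conversion along those lines would look as below (assuming std.conv's to performs a range check on integral narrowing, as later Phobos versions do; the exact exception message is not specified here):

import std.conv;
import std.stdio;

void main()
{
    uint u = 3_000_000_000;   // fits in uint, not in int
    try
    {
        int i = to!(int)(u);  // range-checked narrowing: throws here
    }
    catch (Exception e)
    {
        writefln("conversion refused: %s", e.msg);
    }
}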
 Putting an assert(a1.length > a2.length) might help, but the check will 
 be unavailable unless code is compiled with asserts enabled.
Put an enforce(a1.length > a2.length) then.
 A better solution would be to write code as follows:
 
 auto delta = unsigned(a1.length - a2.length); // returns an unsigned 
 value, throws on overflow (e.g., "2 - 4")
 auto delta = signed(a1.length - a2.length); // returns result as a 
 signed value. Throws on overflow (e.g., "int.min - 1")
 auto delta = a1.length - a2.length; // won't compile
Amazingly, this solution was discussed with these exact names! The signed and unsigned functions can be implemented as libraries, but unfortunately (or fortunately I guess) that means the bits32 and bits64 types are available to all code.

One fear of mine is the reaction of throwing of hands in the air "how many integral types are enough???". However, if we're to judge by the addition of long long and a slew of typedefs to C99 and C++0x, the answer is "plenty". I'd be interested in gauging how people feel about adding two (bits64, bits32) or even four (bits64, bits32, bits16, and bits8) types as basic types. They'd be bitbags with undecided sign ready to be converted to their counterparts of decided sign.
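As a rough mock-up of what such a type would provide (a library sketch only; the names are made up, and a real bits32 would need compiler support so that both conversions can be implicit):

// A bag of 32 bits with undecided sign.
struct Bits32
{
    uint bits;                                   // raw storage, no sign attached
    int  asSigned()   { return cast(int) bits; } // decide: signed view
    uint asUnsigned() { return bits; }           // decide: unsigned view
}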
 // this one is also handy:
 auto newLength = checked(a1.length - 1); // preserves type of a1.length, 
 be it int or uint, throws on overflow
This could be rather tricky. How can overflow be checked? By inspecting the status bits in the processor only; at the language/typesystem level there's little to do.
 I have previously shown an implementation of unsigned/signed:
 
 import std.stdio;
 
 int signed(lazy int dg)
 {
     auto result = dg();
     asm {
        jo overflow;
     }
     return result;
 
     overflow:
      throw new Exception("Integer overflow occurred");
 }
 
 int main()
 {
    int t = int.max;
    try
    {
        int s = signed(t + 1);
        writefln("Result is %d", s);
    }
    catch(Exception e)
    {
        writefln("Whoops! %s", e.toString());
    }
    return 0;
 }
Ah, there we go! Thanks for pasting this code.
 But Andrei has correctly pointed out that it has a problem - it may 
 throw without a reason:
 int i = int.max + 1; // sets an overflow flag
 auto result = expectSigned(1); // raises an exception
 
 Overflow flag may also be cleared in a complex expression:
 auto result = expectUnsigned(1 + (uint.max + 1)); // first add will 
 overflow and second one clears the flag -> no exception as a result
 
 A possible solution is to make the compiler aware of this construct and 
 disallow passing none (case 2) or more than one operation (case 1) to 
 the method.
Can't you clear the overflow flag prior to invoking the operation? I'll also mention that making it a delegate reduces appeal quite a bit; expressions under the check tend to be simple, which makes the relative overhead huge. Andrei
Nov 26 2008
next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Wed, 26 Nov 2008 21:45:30 +0300, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Denis Koroskin wrote:
 On Wed, 26 Nov 2008 18:24:17 +0300, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 Also consider:

 auto delta = a1.length - a2.length;

 What should the type of delta be? Well, it depends. In my scheme that  
 wouldn't even compile, which I think is a good thing; you must decide  
 whether prior information makes it an unsigned or a signed integral.
Sure, it shouldn't compile. But explicit casting to either type won't help. Let's say you expect that a1.length > a2.length and thus expect a strictly positive result. Putting an explicit cast will not detect (but suppress) an error and give you an erroneous result silently.
But "silently" and "putting a cast" don't go together. It's the cast that makes the erroneous result non-silent. Besides, you don't need to cast. You can always use a function that does the requisite checks. std.conv will have some of those, should any change in the rules make it necessary. By this I'm essentially replying to Don's message in the bugs newsgroup: nobody puts a gun to your head to cast.
 Putting an assert(a1.length > a2.length) might help, but the check will  
 be unavailable unless code is compiled with asserts enabled.
Put an enforce(a1.length > a2.length) then.
Right, it is better. Problem is, you don't want to put checks like "a1.length > a2.length" into your code (I don't, at least). All you want is to be sure that "auto result = a1.length - a2.length" is positive. You *then* decide and solve the "a1.length - a2.length >= 0" equation that leads to the check. Moreover, why evaluate both a1.length and a2.length twice? And you should update all your checks every time you change your code.
 A better solution would be to write code as follows:
  auto delta = unsigned(a1.length - a2.length); // returns an unsigned  
 value, throws on overflow (e.g., "2 - 4")
 auto delta = signed(a1.length - a2.length); // returns result as a  
 signed value. Throws on overflow (e.g., "int.min - 1")
 auto delta = a1.length - a2.length; // won't compile
Amazingly, this solution was discussed with these exact names! The signed and unsigned functions can be implemented as libraries, but unfortunately (or fortunately I guess) that means the bits32 and bits64 types are available to all code. One fear of mine is the reaction of throwing of hands in the air "how many integral types are enough???". However, if we're to judge by the addition of long long and a slew of typedefs to C99 and C++0x, the answer is "plenty". I'd be interested in gauging how people feel about adding two (bits64, bits32) or even four (bits64, bits32, bits16, and bits8) types as basic types. They'd be bitbags with undecided sign ready to be converted to their counterparts of decided sign.
 // this one is also handy:
 auto newLength = checked(a1.length - 1); // preserves type of  
 a1.length, be it int or uint, throws on overflow
This could be rather tricky. How can overflow be checked? By inspecting the status bits in the processor only; at the language/typesystem level there's little to do.
It is an implementation detail. The expression can be calculated with higher bit precision and the result compared to the needed range.
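A sketch of that implementation strategy for a single 32-bit subtraction (the function name is made up; a real checked() would have to cover the whole expression, presumably with compiler help):

// Compute in 64 bits, then verify the exact result fits the 32-bit type.
uint checkedSub(uint a, uint b)
{
    long r = cast(long) a - cast(long) b; // exact: no wraparound in 64 bits
    if (r < 0 || r > uint.max)
        throw new Exception("overflow in checked expression");
    return cast(uint) r;
}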
 I have previously shown an implementation of unsigned/signed:
  import std.stdio;
  int signed(lazy int dg)
 {
     auto result = dg();
     asm {
        jo overflow;
     }
     return result;
      overflow:
     throw new Exception("Integer overflow occurred");
 }
  int main()
 {
    int t = int.max;
    try
    {
        int s = signed(t + 1);
        writefln("Result is %d", s);
    }
    catch(Exception e)
    {
        writefln("Whoops! %s", e.toString());
    }
    return 0;
 }
Ah, there we go! Thanks for pasting this code.
 But Andrei has correctly pointed out that it has a problem - it may  
 throw without a reason:
 int i = int.max + 1; // sets an overflow flag
 auto result = expectSigned(1); // raises an exception
  Overflow flag may also be cleared in a complex expression:
 auto result = expectUnsigned(1 + (uint.max + 1)); // first add will  
 overflow and second one clears the flag -> no exception as a result
  A possible solution is to make the compiler aware of this construct  
 and disallow passing none (case 2) or more than one operation (case 1)  
 to the method.
Can't you clear the overflow flag prior to invoking the operation?
No need for this; it adds one more instruction for no gain, as the flag is automatically set/reset by any add/sub/mul operation. It can only save you from an "auto result = signed(1)" error, which is why I said it should be disallowed in the first place.
 I'll also mention that making it a delegate reduces appeal quite a bit;  
 expressions under the check tend to be simple, which makes the relative  
 overhead huge.
Such simple instructions are usually inlined, aren't they?
Nov 26 2008
prev sibling parent reply Don <nospam nospam.com> writes:
Andrei Alexandrescu wrote:
 Denis Koroskin wrote:
 On Wed, 26 Nov 2008 18:24:17 +0300, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 Also consider:

 auto delta = a1.length - a2.length;

 What should the type of delta be? Well, it depends. In my scheme that 
 wouldn't even compile, which I think is a good thing; you must decide 
 whether prior information makes it an unsigned or a signed integral.
Sure, it shouldn't compile. But explicit casting to either type won't help. Let's say you expect that a1.length > a2.length and thus expect a strictly positive result. Putting an explicit cast will not detect (but suppress) an error and give you an erroneous result silently.
But "silently" and "putting a cast" don't go together. It's the cast that makes the erroneous result non-silent. Besides, you don't need to cast. You can always use a function that does the requisite checks. std.conv will have some of those, should any change in the rules make it necessary.
I doubt that would be used in practice.
 By this I'm essentially replying to Don's message in the bugs newsgroup: 
 nobody puts a gun to your head to cast.
 
 Putting an assert(a1.length > a2.length) might help, but the check 
 will be unavailable unless code is compiled with asserts enabled.
Put an enforce(a1.length > a2.length) then.
 A better solution would be to write code as follows:

 auto delta = unsigned(a1.length - a2.length); // returns an unsigned 
 value, throws on overflow (e.g., "2 - 4")
 auto delta = signed(a1.length - a2.length); // returns result as a 
 signed value. Throws on overflow (e.g., "int.min - 1")
 auto delta = a1.length - a2.length; // won't compile
Amazingly, this solution was discussed with these exact names! The signed and unsigned functions can be implemented as libraries, but unfortunately (or fortunately I guess) that means the bits32 and bits64 types are available to all code. One fear of mine is the reaction of throwing of hands in the air "how many integral types are enough???". However, if we're to judge by the addition of long long and a slew of typedefs to C99 and C++0x, the answer is "plenty". I'd be interested in gauging how people feel about adding two (bits64, bits32) or even four (bits64, bits32, bits16, and bits8) types as basic types. They'd be bitbags with undecided sign ready to be converted to their counterparts of decided sign.
Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas:

(A) You think that it is an approximation to a natural number, ie, a 'positive int'.

(B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation.

Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.

If we were going to introduce a slew of new types, I'd want them to be for 'positive int'/'natural int', 'positive byte', etc.

Natural int can always be implicitly converted to either int or uint, with perfect safety. No other conversions are possible without a cast. Non-negative literals and manifest constants are naturals.

The rules are:
1. Anything involving unsigned is unsigned (same as C).
2. Else if it contains an integer, it is an integer.
3. (Now we know all quantities are natural:) If it contains a subtraction, it is an integer [probably allow subtraction of compile-time quantities to remain natural, if the values stay in range; flag an error if an overflow occurs].
4. Else it is a natural.

The reason I think literals and manifest constants are so important is that they are a significant fraction of the natural numbers in a program.

[Just before posting I've discovered that other people have posted some similar ideas].
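A sketch of how the four rules above would type a few expressions ('natural' is hypothetical, not actual D; the whole block is illustrative comments only):

// natural n;  int i;  uint u;
// u + n;   // uint     (rule 1: anything involving unsigned is unsigned)
// i + n;   // int      (rule 2: else, anything containing an int is an int)
// n - n;   // int      (rule 3: a subtraction of naturals may go negative)
// n + n;   // natural  (rule 4)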
Nov 27 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Don wrote:
 Andrei Alexandrescu wrote:
 One fear of mine is the reaction of throwing of hands in the air "how 
 many integral types are enough???". However, if we're to judge by the 
 addition of long long and a slew of typedefs to C99 and C++0x, the 
 answer is "plenty". I'd be interested in gauging how people feel about 
 adding two (bits64, bits32) or even four (bits64, bits32, bits16, and 
 bits8) types as basic types. They'd be bitbags with undecided sign 
 ready to be converted to their counterparts of decided sign.
Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas: (A) You think that it is an approximation to a natural number, ie, a 'positive int'. (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation. Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.
In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to the occasional puzzling behaviors. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.
 If we were going to introduce a slew of new types, I'd want them to be 
 for 'positive int'/'natural int', 'positive byte', etc.
 
 Natural int can always be implicitly converted to either int or uint, 
 with perfect safety. No other conversions are possible without a cast.
 Non-negative literals and manifest constants are naturals.
 
 The rules are:
 1. Anything involving unsigned is unsigned, (same as C).
 2. Else if it contains an integer, it is an integer.
 3. (Now we know all quantities are natural):
 If it contains a subtraction, it is an integer [Probably allow 
 subtraction of compile-time quantities to remain natural, if the values 
 stay in range; flag an error if an overflow occurs].
 4. Else it is a natural.
 
 
 The reason I think literals and manifest constants are so important is 
 that they are a significant fraction of the natural numbers in a program.
 
 [Just before posting I've discovered that other people have posted some 
 similar ideas].
That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table. Andrei
Nov 27 2008
parent reply Don <nospam nospam.com> writes:
Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 One fear of mine is the reaction of throwing of hands in the air "how 
 many integral types are enough???". However, if we're to judge by the 
 addition of long long and a slew of typedefs to C99 and C++0x, the 
 answer is "plenty". I'd be interested in gauging how people feel about 
 adding two (bits64, bits32) or even four (bits64, bits32, bits16, and 
 bits8) types as basic types. They'd be bitbags with undecided sign 
 ready to be converted to their counterparts of decided sign.
Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas: (A) You think that it is an approximation to a natural number, ie, a 'positive int'. (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation. Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.
In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to the occasional puzzling behaviors. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.
 If we were going to introduce a slew of new types, I'd want them to be 
 for 'positive int'/'natural int', 'positive byte', etc.

 Natural int can always be implicitly converted to either int or uint, 
 with perfect safety. No other conversions are possible without a cast.
 Non-negative literals and manifest constants are naturals.

 The rules are:
 1. Anything involving unsigned is unsigned, (same as C).
 2. Else if it contains an integer, it is an integer.
 3. (Now we know all quantities are natural):
 If it contains a subtraction, it is an integer [Probably allow 
 subtraction of compile-time quantities to remain natural, if the 
 values stay in range; flag an error if an overflow occurs].
 4. Else it is a natural.


 The reason I think literals and manifest constants are so important is 
 that they are a significant fraction of the natural numbers in a program.

 [Just before posting I've discovered that other people have posted 
 some similar ideas].
That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table. Andrei
Well, it does make unsigned numbers (case (B)) quite obscure and low-level. They could be renamed with uglier names to make this clearer. But since in this proposal there are no implicit conversions from uint to anything, it's hard to do any damage with the unsigned type which results. Basically, with any use of unsigned, the compiler says "I don't know if this thing even has a meaningful sign!".

Alternatively, we could add rule 0: mixing int and unsigned is illegal. But it's OK to mix natural with int, or natural with unsigned. I don't like this as much, since it would make most usage of unsigned ugly; but maybe that's justified.
Nov 27 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Don wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 One fear of mine is the reaction of throwing of hands in the air 
 "how many integral types are enough???". However, if we're to judge 
 by the addition of long long and a slew of typedefs to C99 and 
 C++0x, the answer is "plenty". I'd be interested in gauging how 
 people feel about adding two (bits64, bits32) or even four (bits64, 
 bits32, bits16, and bits8) types as basic types. They'd be bitbags 
 with undecided sign ready to be converted to their counterparts of 
 decided sign.
Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas: (A) You think that it is an approximation to a natural number, ie, a 'positive int'. (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation. Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.
In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to the occasional puzzling behaviors. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.
 If we were going to introduce a slew of new types, I'd want them to 
 be for 'positive int'/'natural int', 'positive byte', etc.

 Natural int can always be implicitly converted to either int or uint, 
 with perfect safety. No other conversions are possible without a cast.
 Non-negative literals and manifest constants are naturals.

 The rules are:
 1. Anything involving unsigned is unsigned, (same as C).
 2. Else if it contains an integer, it is an integer.
 3. (Now we know all quantities are natural):
 If it contains a subtraction, it is an integer [Probably allow 
 subtraction of compile-time quantities to remain natural, if the 
 values stay in range; flag an error if an overflow occurs].
 4. Else it is a natural.


 The reason I think literals and manifest constants are so important 
 is that they are a significant fraction of the natural numbers in a 
 program.

 [Just before posting I've discovered that other people have posted 
 some similar ideas].
That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table. Andrei
Well, it does make unsigned numbers (case (B)) quite obscure and low-level. They could be renamed with uglier names to make this clearer. But since in this proposal there are no implicit conversions from uint to anything, it's hard to do any damage with the unsigned type which results. Basically, with any use of unsigned, the compiler says "I don't know if this thing even has a meaningful sign!". Alternatively, we could add rule 0: mixing int and unsigned is illegal. But it's OK to mix natural with int, or natural with unsigned. I don't like this as much, since it would make most usage of unsigned ugly; but maybe that's justified.
I think we're heading towards an impasse. We wouldn't want to make things much harder for systems-level programs that mix arithmetic and bit-level operations.

I'm glad there is interest and that quite a few ideas were brought up. Unfortunately, it looks like all have significant disadvantages.

One compromise solution Walter and I discussed in the past is to only sever one of the dangerous implicit conversions: int -> uint. Other than that, it's much like C (everything involving one unsigned is unsigned, and unsigned -> signed is implicit). Let's see where that takes us.

(a) There are fewer situations when a small, reasonable number implicitly becomes a large, weird number.

(b) An exception to (a) is that u1 - u2 is also uint, and that's for the sake of C compatibility. I'd gladly drop it if I could and leave operations such as u1 - u2 return a signed number. That assumes the least and works with small, usual values.

(c) Unlike C, arithmetic and logical operations always return the tightest type possible, not a 32/64 bit value. For example, byte / int yields byte and so on.

What do you think?


Andrei
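A sketch of what the compromise would accept and reject (hypothetical rules, not current D behavior; the block is illustrative comments only):

// int i;  uint u;  byte b;
// u = i;          // error: the int -> uint implicit conversion is severed
// i = u;          // still fine, as in C (unsigned -> signed is implicit)
// auto x = u + i; // uint, as in C
// auto d = u - u; // uint per (b), kept only for C compatibility
// auto q = b / i; // byte under the "tightest type" rule (c), not int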
Nov 27 2008
parent reply KennyTM~ <kennytm gmail.com> writes:
Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 One fear of mine is the reaction of throwing of hands in the air 
 "how many integral types are enough???". However, if we're to judge 
 by the addition of long long and a slew of typedefs to C99 and 
 C++0x, the answer is "plenty". I'd be interested in gauging how 
 people feel about adding two (bits64, bits32) or even four (bits64, 
 bits32, bits16, and bits8) types as basic types. They'd be bitbags 
 with undecided sign ready to be converted to their counterparts of 
 decided sign.
Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas: (A) You think that it is an approximation to a natural number, ie, a 'positive int'. (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation. Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.
In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to the occasional puzzling behaviors. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.
 If we were going to introduce a slew of new types, I'd want them to 
 be for 'positive int'/'natural int', 'positive byte', etc.

 Natural int can always be implicitly converted to either int or 
 uint, with perfect safety. No other conversions are possible without 
 a cast.
 Non-negative literals and manifest constants are naturals.

 The rules are:
 1. Anything involving unsigned is unsigned, (same as C).
 2. Else if it contains an integer, it is an integer.
 3. (Now we know all quantities are natural):
 If it contains a subtraction, it is an integer [Probably allow 
 subtraction of compile-time quantities to remain natural, if the 
 values stay in range; flag an error if an overflow occurs].
 4. Else it is a natural.


 The reason I think literals and manifest constants are so important 
 is that they are a significant fraction of the natural numbers in a 
 program.

 [Just before posting I've discovered that other people have posted 
 some similar ideas].
That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table. Andrei
Well, it does make unsigned numbers (case (B)) quite obscure and low-level. They could be renamed with uglier names to make this clearer. But since in this proposal there are no implicit conversions from uint to anything, it's hard to do any damage with the unsigned type which results. Basically, with any use of unsigned, the compiler says "I don't know if this thing even has a meaningful sign!". Alternatively, we could add rule 0: mixing int and unsigned is illegal. But it's OK to mix natural with int, or natural with unsigned. I don't like this as much, since it would make most usage of unsigned ugly; but maybe that's justified.
I think we're heading towards an impasse. We wouldn't want to make things much harder for systems-level programs that mix arithmetic and bit-level operations. I'm glad there is interest and that quite a few ideas were brought up. Unfortunately, it looks like all have significant disadvantages. One compromise solution Walter and I discussed in the past is to only sever one of the dangerous implicit conversions: int -> uint. Other than that, it's much like C (everything involving one unsigned is unsigned, and unsigned -> signed is implicit). Let's see where that takes us. (a) There are fewer situations when a small, reasonable number implicitly becomes a large, weird number. (b) An exception to (a) is that u1 - u2 is also uint, and that's for the sake of C compatibility. I'd gladly drop it if I could and leave operations such as u1 - u2 return a signed number. That assumes the least and works with small, usual values. (c) Unlike C, arithmetic and logical operations always return the tightest type possible, not a 32/64 bit value. For example, byte / int yields byte and so on.
So you mean long * int (e.g. 1234567890123L * 2) will return an int instead of a long?! The opposite sounds more natural to me.
 What do you think?
 
 
 Andrei
Nov 27 2008
parent reply KennyTM~ <kennytm gmail.com> writes:
KennyTM~ wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 One fear of mine is the reaction of throwing of hands in the air 
 "how many integral types are enough???". However, if we're to 
 judge by the addition of long long and a slew of typedefs to C99 
 and C++0x, the answer is "plenty". I'd be interested in gauging how 
 people feel about adding two (bits64, bits32) or even four 
 (bits64, bits32, bits16, and bits8) types as basic types. They'd 
 be bitbags with undecided sign ready to be converted to their 
 counterparts of decided sign.
Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas: (A) You think that it is an approximation to a natural number, ie, a 'positive int'. (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation. Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.
In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to the occasional puzzling behaviors. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.
 If we were going to introduce a slew of new types, I'd want them to 
 be for 'positive int'/'natural int', 'positive byte', etc.

 Natural int can always be implicitly converted to either int or 
 uint, with perfect safety. No other conversions are possible 
 without a cast.
 Non-negative literals and manifest constants are naturals.

 The rules are:
 1. Anything involving unsigned is unsigned, (same as C).
 2. Else if it contains an integer, it is an integer.
 3. (Now we know all quantities are natural):
 If it contains a subtraction, it is an integer [Probably allow 
 subtraction of compile-time quantities to remain natural, if the 
 values stay in range; flag an error if an overflow occurs].
 4. Else it is a natural.


 The reason I think literals and manifest constants are so important 
 is that they are a significant fraction of the natural numbers in a 
 program.

 [Just before posting I've discovered that other people have posted 
 some similar ideas].
That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table. Andrei
Well, it does make unsigned numbers (case (B)) quite obscure and low-level. They could be renamed with uglier names to make this clearer. But since in this proposal there are no implicit conversions from uint to anything, it's hard to do any damage with the unsigned type which results. Basically, with any use of unsigned, the compiler says "I don't know if this thing even has a meaningful sign!". Alternatively, we could add rule 0: mixing int and unsigned is illegal. But it's OK to mix natural with int, or natural with unsigned. I don't like this as much, since it would make most usage of unsigned ugly; but maybe that's justified.
I think we're heading towards an impasse. We wouldn't want to make things much harder for systems-level programs that mix arithmetic and bit-level operations. I'm glad there is interest and that quite a few ideas were brought up. Unfortunately, it looks like all have significant disadvantages. One compromise solution Walter and I discussed in the past is to only sever one of the dangerous implicit conversions: int -> uint. Other than that, it's much like C (everything involving one unsigned is unsigned, and unsigned -> signed is implicit). Let's see where that takes us. (a) There are fewer situations when a small, reasonable number implicitly becomes a large, weird number. (b) An exception to (a) is that u1 - u2 is also uint, and that's for the sake of C compatibility. I'd gladly drop it if I could and leave operations such as u1 - u2 return a signed number. That assumes the least and works with small, usual values. (c) Unlike C, arithmetic and logical operations always return the tightest type possible, not a 32/64 bit value. For example, byte / int yields byte and so on.
So you mean long * int (e.g. 1234567890123L * 2) will return an int instead of a long?! The opposite sounds more natural to me.
Em, or do you mean the tightest type that can represent all possible results? (so long*int == cent?)
 What do you think?


 Andrei
Nov 27 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
KennyTM~ wrote:
 KennyTM~ wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 One fear of mine is the reaction of throwing of hands in the air 
 "how many integral types are enough???". However, if we're to 
 judge by the addition of long long and a slew of typedefs to C99 
 and C++0x, the answer is "plenty". I'd be interested in gauging 
 how people feel about adding two (bits64, bits32) or even four 
 (bits64, bits32, bits16, and bits8) types as basic types. They'd 
 be bitbags with undecided sign ready to be converted to their 
 counterparts of decided sign.
Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas: (A) You think that it is an approximation to a natural number, ie, a 'positive int'. (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation. Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.
In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to the occasional puzzling behaviors. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.
 If we were going to introduce a slew of new types, I'd want them 
 to be for 'positive int'/'natural int', 'positive byte', etc.

 Natural int can always be implicitly converted to either int or 
 uint, with perfect safety. No other conversions are possible 
 without a cast.
 Non-negative literals and manifest constants are naturals.

 The rules are:
 1. Anything involving unsigned is unsigned, (same as C).
 2. Else if it contains an integer, it is an integer.
 3. (Now we know all quantities are natural):
 If it contains a subtraction, it is an integer [Probably allow 
 subtraction of compile-time quantities to remain natural, if the 
 values stay in range; flag an error if an overflow occurs].
 4. Else it is a natural.


 The reason I think literals and manifest constants are so 
 important is that they are a significant fraction of the natural 
 numbers in a program.

 [Just before posting I've discovered that other people have posted 
 some similar ideas].
That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table. Andrei
Well, it does make unsigned numbers (case (B)) quite obscure and low-level. They could be renamed with uglier names to make this clearer. But since in this proposal there are no implicit conversions from uint to anything, it's hard to do any damage with the unsigned type which results. Basically, with any use of unsigned, the compiler says "I don't know if this thing even has a meaningful sign!". Alternatively, we could add rule 0: mixing int and unsigned is illegal. But it's OK to mix natural with int, or natural with unsigned. I don't like this as much, since it would make most usage of unsigned ugly; but maybe that's justified.
I think we're heading towards an impasse. We wouldn't want to make things much harder for systems-level programs that mix arithmetic and bit-level operations. I'm glad there is interest and that quite a few ideas were brought up. Unfortunately, it looks like all have significant disadvantages. One compromise solution Walter and I discussed in the past is to only sever one of the dangerous implicit conversions: int -> uint. Other than that, it's much like C (everything involving one unsigned is unsigned, and unsigned -> signed is implicit). Let's see where that takes us. (a) There are fewer situations when a small, reasonable number implicitly becomes a large, weird number. (b) An exception to (a) is that u1 - u2 is also uint, and that's for the sake of C compatibility. I'd gladly drop it if I could and leave operations such as u1 - u2 return a signed number. That assumes the least and works with small, usual values. (c) Unlike C, arithmetic and logical operations always return the tightest type possible, not a 32/64 bit value. For example, byte / int yields byte and so on.
So you mean long * int (e.g. 1234567890123L * 2) will return an int instead of a long?! The opposite sounds more natural to me.
Em, or do you mean the tightest type that can represent all possible results? (so long*int == cent?)
The tightest type possible depends on the operation. In that doctrine, long * int yields a long (given the demise of cent). Walter thinks such rules are too complicated, but I'm a big fan of operation-dependent typing. I see no good reason for requiring int * long to have the same type as int / long. They are different operations with different semantics and corner cases and whatnot, so the resulting static type may as well be different.

By the way, under the tightest type doctrine, uint & ubyte is typed as ubyte. Interesting that one, huh :o).


Andrei
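The value-range claim behind that last example is easy to check in today's D, where the & result is still typed uint even though its value always fits in 8 bits (a minimal demonstration):

import std.stdio;

void main()
{
    uint a = 0xDEADBEEF;
    ubyte b = 0x0F;
    auto c = a & b;  // value can never exceed ubyte.max...
    writefln("%s = %s", typeof(c).stringof, c); // ...yet the type is uint today
}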
Nov 27 2008
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Andrei Alexandrescu wrote:
 KennyTM~ wrote:
 KennyTM~ wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 One fear of mine is the reaction of throwing of hands in the air 
 "how many integral types are enough???". However, if we're to 
 judge by the addition of long long and a slew of typedefs to C99 
 and C++0x, the answer is "plenty". I'd be interested in gauging 
 how people feel about adding two (bits64, bits32) or even four 
 (bits64, bits32, bits16, and bits8) types as basic types. They'd 
 be bitbags with undecided sign ready to be converted to their 
 counterparts of decided sign.
Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas: (A) You think that it is an approximation to a natural number, ie, a 'positive int'. (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation. Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.
In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to the occasional puzzling behaviors. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.
 If we were going to introduce a slew of new types, I'd want them 
 to be for 'positive int'/'natural int', 'positive byte', etc.

 Natural int can always be implicitly converted to either int or 
 uint, with perfect safety. No other conversions are possible 
 without a cast.
 Non-negative literals and manifest constants are naturals.

 The rules are:
 1. Anything involving unsigned is unsigned, (same as C).
 2. Else if it contains an integer, it is an integer.
 3. (Now we know all quantities are natural):
 If it contains a subtraction, it is an integer [Probably allow 
 subtraction of compile-time quantities to remain natural, if the 
 values stay in range; flag an error if an overflow occurs].
 4. Else it is a natural.


 The reason I think literals and manifest constants are so 
 important is that they are a significant fraction of the natural 
 numbers in a program.

 [Just before posting I've discovered that other people have 
 posted some similar ideas].
That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table. Andrei
Well, it does make unsigned numbers (case (B)) quite obscure and low-level. They could be renamed with uglier names to make this clearer. But since in this proposal there are no implicit conversions from uint to anything, it's hard to do any damage with the unsigned type which results. Basically, with any use of unsigned, the compiler says "I don't know if this thing even has a meaningful sign!". Alternatively, we could add rule 0: mixing int and unsigned is illegal. But it's OK to mix natural with int, or natural with unsigned. I don't like this as much, since it would make most usage of unsigned ugly; but maybe that's justified.
I think we're heading towards an impasse. We wouldn't want to make things much harder for systems-level programs that mix arithmetic and bit-level operations. I'm glad there is interest and that quite a few ideas were brought up. Unfortunately, it looks like all have significant disadvantages. One compromise solution Walter and I discussed in the past is to only sever one of the dangerous implicit conversions: int -> uint. Other than that, it's much like C (everything involving one unsigned is unsigned, and unsigned -> signed is implicit). Let's see where that takes us.

(a) There are fewer situations when a small, reasonable number implicitly becomes a large, weird number.

(b) An exception to (a) is that u1 - u2 is also uint, and that's for the sake of C compatibility. I'd gladly drop it if I could and let operations such as u1 - u2 return a signed number. That assumes the least and works with small, usual values.

(c) Unlike C, arithmetic and logical operations always return the tightest type possible, not a 32/64 bit value. For example, byte / int yields byte and so on.
So you mean long * int (e.g. 1234567890123L * 2) will return an int instead of a long?! The opposite sounds more natural to me.
Em, or do you mean the tightest type that can represent all possible results? (so long*int == cent?)
The tightest type possible depends on the operation. In that doctrine, long * int yields a long (given the demise of cent). Walter thinks such rules are too complicated, but I'm a big fan of operation-dependent typing. I see no good reason for requiring that int * long have the same type as int / long. They are different operations with different semantics and corner cases and whatnot, so the resulting static type may as well be different. By the way, under the tightest type doctrine, uint & ubyte is typed as ubyte. Interesting that one, huh :o). Andrei
I just remembered a problem with simplemindedly going with the tightest type. Consider: uint a = ...; ubyte b = ...; auto c = a & b; c <<= 16; ... The programmer may reasonably expect that the bitwise operation yields an unsigned integer because it involved one. However, the zealous compiler cleverly notices the operation really never yields something larger than a ubyte, and therefore returns that "tightest" type, thus making c a ubyte. Subsequent uses of c will be surprising to the programmer who thought c has 32 bits. It looks like polysemy is the only solution here: return a polysemous value with principal type uint and possible type ubyte. That way, c will be typed as uint. But at the same time, continuing the example: ubyte d = a & b; will go through without a cast. That's pretty cool! One question I had is: say polysemy will be at work for integral arithmetic. Should we provide means in the language for user-defined polysemous functions? Or is it ok to leave it as compiler magic that saves redundant casts? Andrei
Nov 27 2008
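For contrast, a minimal D sketch of how the same expression is typed under today's rules; the tightest-type surprise described above does not yet arise, because the & expression stays uint:

----
void main()
{
    uint a = 0x00FF_0000;
    ubyte b = 0xFF;
    auto c = a & b;                        // today: typed uint, not ubyte
    static assert(is(typeof(c) == uint));
    c <<= 16;                              // all 32 bits remain available
    assert(c == 0);                        // a & b is 0 for these values
}
----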
parent Michel Fortin <michel.fortin michelf.com> writes:
On 2008-11-27 22:34:50 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 One question I had is: say polysemy will be at work for integral 
 arithmetic. Should we provide means in the language for user-defined 
 polysemous functions? Or is it ok to leave it as compiler magic that 
 saves redundant casts?
I think that'd be a must. Otherwise how would you define your own arithmetical types so they work like the built-in ones? struct ArbitraryPrecisionInt { ... } ArbitraryPrecisionInt a = ...; uint b = ...; auto c = a & b; c <<= 16; ... Shouldn't c be of type ArbitraryPrecisionInt? And shouldn't the following work too? uint d = a & b; That said, how can a function return a polysemous value at all? Should the function return a special kind of struct with a sample of every supported type? That'd be utterly inefficient. Should it return a custom-made struct with the ability to implicitly cast itself to other types? That would make the polysemous value propagatable through auto, and probably less efficient too. The only way I can see this working correctly is with function overloading on return type, with a way to specify the default function (for when the return type is not specified, such as with auto). In the case above, you'd need something like this:

struct ArbitraryPrecisionInt {
    default ArbitraryPrecisionInt opAnd(uint i);
    uint opAnd(uint i);
}

-- Michel Fortin michel.fortin michelf.com http://michelf.com/
Nov 28 2008
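Absent overloading on return type, the closest sketch in present-day D is a wrapper that must be cast explicitly; the names below are illustrative, and the point is precisely that the cast Michel wants eliminated remains:

----
struct PolyResult
{
    uint value;                         // principal type
    T opCast(T)() if (is(T : uint))     // narrowing still needs a cast
    {
        assert(value <= T.max, "narrowing would lose bits");
        return cast(T) value;
    }
}

PolyResult andOp(uint a, uint b) { return PolyResult(a & b); }

void main()
{
    auto c = andOp(0x1234, 0x00FF);
    ubyte d = cast(ubyte) c;            // explicit, not implicit
    assert(d == 0x34);
}
----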
prev sibling parent reply Don <nospam nospam.com> writes:
Andrei Alexandrescu wrote:
 KennyTM~ wrote:
 KennyTM~ wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 One fear of mine is the reaction of throwing one's hands in the air 
 "how many integral types are enough???". However, if we're to 
 judge by the addition of long long and a slew of typedefs to C99 
 and C++0x, the answer is "plenty". I'd be interested in gauging 
 how people feel about adding two (bits64, bits32) or even four 
 (bits64, bits32, bits16, and bits8) types as basic types. They'd 
 be bitbags with undecided sign ready to be converted to their 
 counterparts of decided sign.
Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas: (A) You think that it is an approximation to a natural number, ie, a 'positive int'. (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation. Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.
In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to the occasional puzzling behaviors. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.
 If we were going to introduce a slew of new types, I'd want them 
 to be for 'positive int'/'natural int', 'positive byte', etc.

 Natural int can always be implicitly converted to either int or 
 uint, with perfect safety. No other conversions are possible 
 without a cast.
 Non-negative literals and manifest constants are naturals.

 The rules are:
 1. Anything involving unsigned is unsigned, (same as C).
 2. Else if it contains an integer, it is an integer.
 3. (Now we know all quantities are natural):
 If it contains a subtraction, it is an integer [Probably allow 
 subtraction of compile-time quantities to remain natural, if the 
 values stay in range; flag an error if an overflow occurs].
 4. Else it is a natural.


 The reason I think literals and manifest constants are so 
 important is that they are a significant fraction of the natural 
 numbers in a program.

 [Just before posting I've discovered that other people have 
 posted some similar ideas].
That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table. Andrei
Well, it does make unsigned numbers (case (B)) quite obscure and low-level. They could be renamed with uglier names to make this clearer. But since in this proposal there are no implicit conversions from uint to anything, it's hard to do any damage with the unsigned type which results. Basically, with any use of unsigned, the compiler says "I don't know if this thing even has a meaningful sign!". Alternatively, we could add rule 0: mixing int and unsigned is illegal. But it's OK to mix natural with int, or natural with unsigned. I don't like this as much, since it would make most usage of unsigned ugly; but maybe that's justified.
I think we're heading towards an impasse. We wouldn't want to make things much harder for systems-level programs that mix arithmetic and bit-level operations. I'm glad there is interest and that quite a few ideas were brought up. Unfortunately, it looks like all have significant disadvantages. One compromise solution Walter and I discussed in the past is to only sever one of the dangerous implicit conversions: int -> uint. Other than that, it's much like C (everything involving one unsigned is unsigned, and unsigned -> signed is implicit). Let's see where that takes us.

(a) There are fewer situations when a small, reasonable number implicitly becomes a large, weird number.

(b) An exception to (a) is that u1 - u2 is also uint, and that's for the sake of C compatibility. I'd gladly drop it if I could and let operations such as u1 - u2 return a signed number. That assumes the least and works with small, usual values.
The problem with that, is that you're then forcing the 'unsigned is a natural' interpretation when it may be erroneous. uint.max - 10 is a uint. It's an interesting case, because int = u1 - u2 is definitely incorrect when u1 > int.max. uint = u1 - u2 may be incorrect when u1 < u2, _if you think of unsigned as a positive number_. But, if you think of it as a natural modulo 2^32, uint = u1-u2 is always correct, since that's what's happening mathematically. I'm strongly of the opinion that you shouldn't be able to generate an unsigned accidentally -- you should need to either declare a type as uint, or use the 'u' suffix on a literal. Right now, properties like 'length' being uint means you get too many surprising uints, especially when using 'auto'. I take your point about not wanting to give up the full 32 bits of address space. The problem is, that if you have an object x which is
>2GB, and a small object y, then x.length - y.length will erroneously 
be negative. If we want code (especially in libraries) to cope with such large objects, we need to ensure that any time there's a subtraction involving a length, the first is larger than the second. I think that would preclude the combination: length is uint byte[].length can exceed 2GB, and code is correct when it does uint - uint is an int (or even, can implicitly convert to int) As far as I can tell, at least one of these has to go.
Nov 28 2008
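A small sketch of the discipline Don is asking for: any subtraction of lengths must first establish which operand is larger (the helper name is illustrative):

----
// Magnitude of the difference of two lengths; never wraps.
size_t lengthDiff(size_t a, size_t b)
{
    return a >= b ? a - b : b - a;
}

void main()
{
    assert(lengthDiff(4, 2) == 2);
    assert(lengthDiff(2, 4) == 2);   // the naive 2 - 4 would wrap
}
----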
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
(I lost track of quotes, so I yanked them all beyond Don's message.)

Don wrote:
 The problem with that, is that you're then forcing the 'unsigned is a 
 natural' interpretation when it may be erroneous.
 
 uint.max - 10 is a uint.
 
 It's an interesting case, because int = u1 - u2 is definitely incorrect 
 when u1 > int.max.
 
 uint = u1 - u2 may be incorrect when u1 < u2, _if you think of unsigned 
 as a positive number_.
 But, if you think of it as a natural modulo 2^32, uint = u1-u2 is always 
 correct, since that's what's happening mathematically.
Sounds good. One important consideration is that modulo arithmetic is considerably easier to understand when two's complement and signs are not involved.
 I'm strongly of the opinion that you shouldn't be able to generate an 
 unsigned accidentally -- you should need to either declare a type as 
 uint, or use the 'u' suffix on a literal.
 Right now, properties like 'length' being uint means you get too many 
 surprising uints, especially when using 'auto'.
I am not surprised by length being unsigned. I'm also not surprised by hexadecimal constants being unsigned. (They are unsigned in C. Walter made them signed or not, depending on their value.)
 I take your point about not wanting to give up the full 32 bits of 
 address space. The problem is, that if you have an object x which is 
  >2GB, and a small object y, then  x.length - y.length will erroneously 
 be negative. If we want code (especially in libraries) to cope with such 
 large objects, we need to ensure that any time there's a subtraction 
 involving a length, the first is larger than the second. I think that 
 would preclude the combination:
 
 length is uint
 byte[].length can exceed 2GB, and code is correct when it does
 uint - uint is an int (or even, can implicitly convert to int)
 
 As far as I can tell, at least one of these has to go.
Well, none has to go in the latest design:

(a) One unsigned makes everything unsigned
(b) unsigned -> signed is allowed
(c) signed -> unsigned is disallowed

Of course the latest design has imperfections, but it precludes none of the three things you mention. Andrei
Nov 28 2008
parent reply Don <nospam nospam.com> writes:
Andrei Alexandrescu wrote:
 (I lost track of quotes, so I yanked them all beyond Don's message.)
 
 Don wrote:
 The problem with that, is that you're then forcing the 'unsigned is a 
 natural' interpretation when it may be erroneous.

 uint.max - 10 is a uint.

 It's an interesting case, because int = u1 - u2 is definitely 
 incorrect when u1 > int.max.

 uint = u1 - u2 may be incorrect when u1 < u2, _if you think of 
 unsigned as a positive number_.
 But, if you think of it as a natural modulo 2^32, uint = u1-u2 is 
 always correct, since that's what's happening mathematically.
Sounds good. One important consideration is that modulo arithmetic is considerably easier to understand when two's complement and signs are not involved.
 I'm strongly of the opinion that you shouldn't be able to generate an 
 unsigned accidentally -- you should need to either declare a type as 
 uint, or use the 'u' suffix on a literal.
 Right now, properties like 'length' being uint means you get too many 
 surprising uints, especially when using 'auto'.
I am not surprised by length being unsigned. I'm also not surprised by hexadecimal constants being unsigned. (They are unsigned in C. Walter made them signed or not, depending on their value.)
 I take your point about not wanting to give up the full 32 bits of 
 address space. The problem is, that if you have an object x which is 
  >2GB, and a small object y, then  x.length - y.length will 
 erroneously be negative. If we want code (especially in libraries) to 
 cope with such large objects, we need to ensure that any time there's 
 a subtraction involving a length, the first is larger than the second. 
 I think that would preclude the combination:

 length is uint
 byte[].length can exceed 2GB, and code is correct when it does
 uint - uint is an int (or even, can implicitly convert to int)

 As far as I can tell, at least one of these has to go.
Well, none has to go in the latest design:

(a) One unsigned makes everything unsigned
(b) unsigned -> signed is allowed
(c) signed -> unsigned is disallowed

Of course the latest design has imperfections, but it precludes none of the three things you mention.
It's close, but how can code such as: if (x.length - y.length < 100) ... be correct in the presence of length > 2GB? since (a) x.length = uint.max, y.length = 1 (b) x.length = 4, y.length = 2 both produce the same binary result (0xFFFF_FFFE = -2) Any subtraction of two lengths has a possible range of -int.max .. uint.max which is quite problematic (and the root cause of the problems, I guess). And unfortunately I think code is riddled with subtraction of lengths.
Nov 28 2008
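Don's two cases are easy to check in D: a huge positive difference and a small negative one leave the same 32 bits behind, so no later typing rule can recover the sign (the second pair is written as 2 and 4, per the correction that follows):

----
void main()
{
    uint bigLen = uint.max, tinyLen = 1;  // a >2GB object minus a tiny one
    uint xLen = 2, yLen = 4;              // a small negative difference
    assert(bigLen - tinyLen == 0xFFFF_FFFE);
    assert(xLen - yLen == 0xFFFF_FFFE);   // wraps modulo 2^32
}
----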
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Don wrote:
 Andrei Alexandrescu wrote:
 (I lost track of quotes, so I yanked them all beyond Don's message.)

 Don wrote:
 The problem with that, is that you're then forcing the 'unsigned is a 
 natural' interpretation when it may be erroneous.

 uint.max - 10 is a uint.

 It's an interesting case, because int = u1 - u2 is definitely 
 incorrect when u1 > int.max.

 uint = u1 - u2 may be incorrect when u1 < u2, _if you think of 
 unsigned as a positive number_.
 But, if you think of it as a natural modulo 2^32, uint = u1-u2 is 
 always correct, since that's what's happening mathematically.
Sounds good. One important consideration is that modulo arithmetic is considerably easier to understand when two's complement and signs are not involved.
 I'm strongly of the opinion that you shouldn't be able to generate an 
 unsigned accidentally -- you should need to either declare a type as 
 uint, or use the 'u' suffix on a literal.
 Right now, properties like 'length' being uint means you get too many 
 surprising uints, especially when using 'auto'.
I am not surprised by length being unsigned. I'm also not surprised by hexadecimal constants being unsigned. (They are unsigned in C. Walter made them signed or not, depending on their value.)
 I take your point about not wanting to give up the full 32 bits of 
 address space. The problem is, that if you have an object x which is 
  >2GB, and a small object y, then  x.length - y.length will 
 erroneously be negative. If we want code (especially in libraries) to 
 cope with such large objects, we need to ensure that any time there's 
 a subtraction involving a length, the first is larger than the 
 second. I think that would preclude the combination:

 length is uint
 byte[].length can exceed 2GB, and code is correct when it does
 uint - uint is an int (or even, can implicitly convert to int)

 As far as I can tell, at least one of these has to go.
Well, none has to go in the latest design:

(a) One unsigned makes everything unsigned
(b) unsigned -> signed is allowed
(c) signed -> unsigned is disallowed

Of course the latest design has imperfections, but it precludes none of the three things you mention.
It's close, but how can code such as: if (x.length - y.length < 100) ... be correct in the presence of length > 2GB? since (a) x.length = uint.max, y.length = 1 (b) x.length = 4, y.length = 2 both produce the same binary result (0xFFFF_FFFE = -2)
(You mean x.length = 2, y.length = 4 in the second case.)
 Any subtraction of two lengths has a possible range of
  -int.max .. uint.max
 which is quite problematic (and the root cause of the problems, I guess).
 And unfortunately I think code is riddled with subtraction of lengths.
Code may be riddled with subtraction of lengths, but seems to be working with today's rule that the result of that subtraction is unsigned. So definitely we're not introducing new problems. I agree the solution has problems. Following this thread that in turn follows my sleepless nights poring over the subject, I'm glad to reach a design that is better than what we currently have. I think that disallowing the signed -> unsigned conversions will be a net improvement. Andrei
Nov 28 2008
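A sketch of what the compromise rule would and would not reject; this is not current D semantics, where both initializations compile silently:

----
void main()
{
    uint u = 42;
    int i = -1;
    int s = u;    // unsigned -> signed: stays implicit under the proposal
    uint t = i;   // signed -> unsigned: the conversion to be disallowed
}
----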
parent reply Don <nospam nospam.com> writes:
Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 (I lost track of quotes, so I yanked them all beyond Don's message.)

 Don wrote:
 The problem with that, is that you're then forcing the 'unsigned is 
 a natural' interpretation when it may be erroneous.

 uint.max - 10 is a uint.

 It's an interesting case, because int = u1 - u2 is definitely 
 incorrect when u1 > int.max.

 uint = u1 - u2 may be incorrect when u1 < u2, _if you think of 
 unsigned as a positive number_.
 But, if you think of it as a natural modulo 2^32, uint = u1-u2 is 
 always correct, since that's what's happening mathematically.
Sounds good. One important consideration is that modulo arithmetic is considerably easier to understand when two's complement and signs are not involved.
 I'm strongly of the opinion that you shouldn't be able to generate 
 an unsigned accidentally -- you should need to either declare a type 
 as uint, or use the 'u' suffix on a literal.
 Right now, properties like 'length' being uint means you get too 
 many surprising uints, especially when using 'auto'.
I am not surprised by length being unsigned. I'm also not surprised by hexadecimal constants being unsigned. (They are unsigned in C. Walter made them signed or not, depending on their value.)
 I take your point about not wanting to give up the full 32 bits of 
 address space. The problem is, that if you have an object x which is 
  >2GB, and a small object y, then  x.length - y.length will 
 erroneously be negative. If we want code (especially in libraries) 
 to cope with such large objects, we need to ensure that any time 
 there's a subtraction involving a length, the first is larger than 
 the second. I think that would preclude the combination:

 length is uint
 byte[].length can exceed 2GB, and code is correct when it does
 uint - uint is an int (or even, can implicitly convert to int)

 As far as I can tell, at least one of these has to go.
Well, none has to go in the latest design:

(a) One unsigned makes everything unsigned
(b) unsigned -> signed is allowed
(c) signed -> unsigned is disallowed

Of course the latest design has imperfections, but it precludes none of the three things you mention.
It's close, but how can code such as: if (x.length - y.length < 100) ... be correct in the presence of length > 2GB? since (a) x.length = uint.max, y.length = 1 (b) x.length = 4, y.length = 2 both produce the same binary result (0xFFFF_FFFE = -2)
(You mean x.length = 2, y.length = 4 in the second case.)
Yes.
 
 Any subtraction of two lengths has a possible range of
  -int.max .. uint.max
 which is quite problematic (and the root cause of the problems, I guess).
 And unfortunately I think code is riddled with subtraction of lengths.
Code may be riddled with subtraction of lengths, but seems to be working with today's rule that the result of that subtraction is unsigned. So definitely we're not introducing new problems.
Yes. I think much existing code would fail with sizes over 2GB, though. But it's not any worse.
 
 I agree the solution has problems. Following this thread that in turn 
 follows my sleepless nights poring over the subject, I'm glad to reach a 
 design that is better than what we currently have. I think that 
 disallowing the signed -> unsigned conversions will be a net improvement.
I agree. And dealing with compile-time constants will improve things even more.
Nov 28 2008
parent Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-11-28 17:44:39 +0100, Don <nospam nospam.com> said:

 Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 (I lost track of quotes, so I yanked them all beyond Don's message.)
 
 Don wrote:
 The problem with that, is that you're then forcing the 'unsigned is a 
 natural' interpretation when it may be erroneous.
 
 uint.max - 10 is a uint.
 
 It's an interesting case, because int = u1 - u2 is definitely incorrect 
 when u1 > int.max.
 
 uint = u1 - u2 may be incorrect when u1 < u2, _if you think of unsigned 
 as a positive number_.
 But, if you think of it as a natural modulo 2^32, uint = u1-u2 is 
 always correct, since that's what's happening mathematically.
[...]
 
Any subtraction of two lengths has a possible range of -int.max .. uint.max which is quite problematic (and the root cause of the problems, I guess). And unfortunately I think code is riddled with subtraction of lengths.
Code may be riddled with subtraction of lengths, but seems to be working with today's rule that the result of that subtraction is unsigned. So definitely we're not introducing new problems.
Yes. I think much existing code would fail with sizes over 2GB, though. But it's not any worse.
I found a couple of instances where, to compare addresses, simply a-b was done instead of something like ((a<b)?-1:((a==b)?0:1)), so yes, this is a pitfall that happens. Note that normally the subtraction of lengths is ok (because normally one is interested in the result and a>b); it is when it is used as a quick way to introduce ordering (i.e. as a comparison) that it becomes problematic. By the way, the solution for going beyond 2GB is clearly using size_t, as I think is done (at least in tango). Fawzi
Dec 01 2008
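A minimal sketch of the pitfall Fawzi found: subtraction used as a comparator reports the wrong ordering once unsigned wraparound is involved (function names are illustrative):

----
int cmpBySub(uint a, uint b) { return cast(int)(a - b); }            // buggy
int cmpSafe(uint a, uint b)  { return a < b ? -1 : (a == b ? 0 : 1); }

void main()
{
    assert(cmpSafe(1, uint.max) < 0);    // 1 is smaller, as expected
    assert(cmpBySub(1, uint.max) > 0);   // 1 - uint.max wraps to 2
}
----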
prev sibling parent reply Derek Parnell <derek psych.ward> writes:
On Fri, 28 Nov 2008 17:09:25 +0100, Don wrote:

 
 It's close, but how can code such as:
 
 if (x.length - y.length < 100) ...
 
 be correct in the presence of length > 2GB?
It could be transformed by the compiler into something more like ... if ((x.length <= y.length) || ((x.length - y.length) < 100)) ... -- Derek Parnell Melbourne, Australia skype: derek.j.parnell
Nov 28 2008
parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Derek Parnell wrote:
 On Fri, 28 Nov 2008 17:09:25 +0100, Don wrote:
 
 It's close, but how can code such as:

 if (x.length - y.length < 100) ...

 be correct in the presence of length > 2GB?
It could be transformed by the compiler into something more like ... if ((x.length <= y.length) || ((x.length - y.length) < 100)) ...
Then it'd have different behavior from ---- auto diff = x.length - y.length; if (diff < 100) ... ---- This seems like a *bad* thing...
Nov 28 2008
parent Derek Parnell <derek psych.ward> writes:
On Sat, 29 Nov 2008 01:17:27 +0100, Frits van Bommel wrote:


 Then it'd have different behavior from
 ----
    auto diff = x.length - y.length;
    if (diff < 100) ...
 ----
 
 This seems like a *bad* thing...
I see the problem a little differently. To me, "x.length - y.length" is ambiguous and thus meaningless. The ambiguity is: are you after the difference between two values, or are you after the value required to add to x.length to get to y.length? These are not necessarily the same thing. The difference is always positive, as in "the difference between the length of X and the length of Y is 4". The answer tells us the difference between two lengths but not, of course, which is the smaller. So it all depends on what you are trying to find out. And note that the difference is not a length, because it is not associated with any specific array. So having looked at it like this, I'm now inclined to consider that the 'diff' being declared here should be a signed type and, if possible, have more bits than '.length'. -- Derek Parnell Melbourne, Australia skype: derek.j.parnell
Nov 28 2008
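Derek's suggestion sketched in D: compute the difference in a signed type wider than the 32-bit length, which is exact for every pair of uint operands:

----
long signedDiff(uint a, uint b)
{
    return cast(long) a - cast(long) b;   // no wraparound possible
}

void main()
{
    assert(signedDiff(2, 4) == -2);
    assert(signedDiff(uint.max, 1) == 4_294_967_294);
}
----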
prev sibling parent Sean Kelly <sean invisibleduck.org> writes:
Don wrote:
 
 length is uint
 byte[].length can exceed 2GB, and code is correct when it does
 uint - uint is an int (or even, can implicitly convert to int)
 
 As far as I can tell, at least one of these has to go.
This is why I never understood ptrdiff_t in C. Having to choose between a signed value and narrower range vs. unsigned and sufficient range just stinks. Sean
Nov 28 2008
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 At the moment, we're in limbo regarding the decision to go forward with
 this. Walter, as many good long-time C programmers, knows the abusive
 unsigned rule so well he's not hurt by it and consequently has little
 incentive to see it as a problem. I have had to teach C and C++ to young
 students coming from Java introductory courses and have a more
 up-to-date perspective on the dangers.
I'll address your actual suggestion separately, but personally, I always build C/C++ code at the max warning level, and treat warnings as errors. This typically catches all signed-unsigned interactions and requires me to add a cast for the build to succeed. The advantage of this is that if I see a cast in my code then I know that the statement is deliberate rather than accidental. I would wholeheartedly support such an approach in D as well, though I can see how this may not be terribly appealing to some experienced C/C++ programmers. Sean
Nov 25 2008
prev sibling next sibling parent reply Don <nospam nospam.com> writes:
Andrei Alexandrescu wrote:
 D pursues compatibility with C and C++ in the following manner: if a 
 code snippet compiles in both C and D or C++ and D, then it should have 
 the same semantics.
 
 A classic problem with C and C++ integer arithmetic is that any 
 operation involving at least an unsigned integral receives automatically 
 an unsigned type, regardless of how silly that actually is, 
 semantically. About the only advantage of this rule is that it's simple. 
 IMHO it only has disadvantages from then on.
 
 The following operations suffer from the "abusive unsigned syndrome" (u 
 is an unsigned integral, i is a signed integral):
 
 (1) u + i, i + u
 (2) u - i, i - u
 (3) u - u
 (4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C 
 requires that these all return unsigned, ouch)
 (5) u < i, i < u, u <= i etc. (all ordering comparisons)
 (6) -u
I think that most of these problems are caused by C enforcing a foolish consistency between literals and variables. The idea that literals like '0' and '1' are of type int is absurd, and has caused a torrent of problems. '0' is just '0'. uint a = 1; does NOT contain an 'implicit conversion from int to uint', any more than there are implicit conversions from naturals to integers in mathematics. So I really like the polysemous types idea. For example, when is it reasonable to use -u? It's useful with literals like uint a = -1u; which is equivalent to uint a = 0xFFFF_FFFF. Anywhere else, it's probably a bug. My suspicion is, that if you allowed all signed-unsigned operations when at least one was a literal, and made everything else illegal, you'd fix most of the problems. In particular, there'd be a big reduction in people abusing 'uint' as a primitive range-limited int. Although it would be nice to have a type which was range-limited, 'uint' doesn't do it. Instead, it guarantees the number is between 0 and int.max*2+1 inclusive. Allowing mixed operations encourages programmers to focus on the benefit of 'the lower bound is zero!' while forgetting that there is an enormous downside ('I'm saying that this could be larger than int.max!') Interestingly, none of these problems exist in assembly language programming, where every arithmetic instruction affects the overflow flag (for signed operations) as well as the carry flag (for unsigned).
Nov 26 2008
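The one use of -u that Don finds legitimate, checked in a couple of lines (both spellings produce the all-ones mask):

----
void main()
{
    uint a = -1u;              // wraps to uint.max by definition
    assert(a == 0xFFFF_FFFF);
    assert(-1u == ~0u);        // same bits as the complement form
}
----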
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Don wrote:
 Andrei Alexandrescu wrote:
 D pursues compatibility with C and C++ in the following manner: if a 
 code snippet compiles in both C and D or C++ and D, then it should 
 have the same semantics.

 A classic problem with C and C++ integer arithmetic is that any 
 operation involving at least an unsigned integral receives 
 automatically an unsigned type, regardless of how silly that actually 
 is, semantically. About the only advantage of this rule is that it's 
 simple. IMHO it only has disadvantages from then on.

 The following operations suffer from the "abusive unsigned syndrome" 
 (u is an unsigned integral, i is a signed integral):

 (1) u + i, i + u
 (2) u - i, i - u
 (3) u - u
 (4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C 
 requires that these all return unsigned, ouch)
 (5) u < i, i < u, u <= i etc. (all ordering comparisons)
 (6) -u
I think that most of these problems are caused by C enforcing a foolish consistency between literals and variables. The idea that literals like '0' and '1' are of type int is absurd, and has caused a torrent of problems. '0' is just '0'. uint a = 1; does NOT contain an 'implicit conversion from int to uint', any more than there are implicit conversions from naturals to integers in mathematics. So I really like the polysemous types idea.
Yah, polysemy will take care of the constants. It's also rather easy to implement for them.
 For example, when is it reasonable to use -u?
 It's useful with literals like
 uint a = -1u; which is equivalent to uint a = 0xFFFF_FFFF.
 Anywhere else, it's probably a bug.
Maybe not even for constants, as all uses of -u can be easily converted to ~u + 1. I'd gladly agree to disallow -u entirely.
 My suspicion is, that if you allowed all signed-unsigned operations when 
 at least one was a literal, and made everything else illegal, you'd fix 
 most of the problems. In particular, there'd be a big reduction in 
 people abusing 'uint' as a primitive range-limited int.
Well, part of my attempt is to transform that abuse into legit use. In other words, I do want to allow people to consider uint a reasonable model of natural numbers. It can't be perfect, but I believe we can make it reasonable. Notice that the fact that one operand is a literal does not solve all of the problems I mentioned. There is for example no progress in typing u1 - u2 appropriately.
 Although it would be nice to have a type which was range-limited, 'uint' 
 doesn't do it. Instead, it guarantees the number is between 0 and 
 int.max*2+1 inclusive. Allowing mixed operations encourages programmers 
 to focus on the benefit of 'the lower bound is zero!' while forgetting that 
 there is an enormous downside ('I'm saying that this could be larger 
 than int.max!')
I'm not sure I understand this part. To me, the larger problem is underflow, e.g. when subtracting two small uints results in a large uint.
 Interestingly, none of these problems exist in assembly language 
 programming, where every arithmetic instruction affects the overflow 
 flag (for signed operations) as well as the carry flag (for unsigned).
They do exist. You need to use imul/idiv vs. mul/div depending on what signedness your operands have. Andrei
Nov 26 2008
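Andrei's replacement for -u is easy to verify: for any uint u, ~u + 1 yields the same bit pattern (two's complement negation):

----
void main()
{
    foreach (u; [0u, 1u, 42u, uint.max])
        assert(-u == ~u + 1);
}
----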
next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Andrei Alexandrescu wrote:
 
 Notice that the fact that one operand is a literal does not solve all of 
 the problems I mentioned. There is for example no progress in typing u1 
 - u2 appropriately.
What /is/ the appropriate type here? For example:

   uint a = uint.max;
   uint b = 0;
   uint c = uint.max - 1;

   int x = a - b;   // wrong, should be uint
   uint y = c - a;  // wrong, should be int

I don't see any way to reliably produce a "safe" result at the language level. Sean
Nov 26 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 Notice that the fact that one operand is a literal does not solve all 
 of the problems I mentioned. There is for example no progress in 
 typing u1 - u2 appropriately.
What /is/ the appropriate type here? For example: uint a = uint.max; uint b = 0; uint c = uint.max - 1; int x = a - b; // wrong, should be uint uint y = c - a; // wrong, should be int I don't see any way to reliably produce a "safe" result at the language level.
There are several schools of thought (for lack of a better phrase):

1. The Purist Mathematician: We want unsigned to approximate natural numbers, natural numbers aren't closed for subtraction, therefore u1 - u2 should be disallowed.

2. The Practical Mathematician: We want unsigned to approximate natural numbers, and natural numbers aren't closed for subtraction but are closed for a subset satisfying u1 >= u2. We can rely on the programmer to check the condition before, and fall back on modulo difference when the condition isn't satisfied. They'll understand.

3. The C Veteran: Everything should be allowed. And when unsigned is within a mile, the type is unsigned. I'll take care of the rest.

4. The Assembly Programmer: Use whatever type you want. The assembly language operation for subtraction is the same.

5. The Dynamic Language Fan: Allow whatever and check it dynamically.

6. The Static Typing Nut: Use some scheme to magically weed out 73.56% of mistakes and disallow only 14.95% of valid uses.

Your example is in fact perfect. It shows how the result of a subtraction ultimately has its fate decided by case-by-case use, not picked properly by a rule. The example perfectly underlines the advantage of my scheme: the decision of how to type u1 - u2 is left to the only entity able to judge: the user of the operation. Of course there remains the question, should all that be implicit or should the user employ more syntax to specify what they want? I don't know. Andrei
Nov 26 2008
next sibling parent reply Lars Kyllingstad <public kyllingen.NOSPAMnet> writes:
Andrei Alexandrescu wrote:
 Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 Notice that the fact that one operand is a literal does not solve all 
 of the problems I mentioned. There is for example no progress in 
 typing u1 - u2 appropriately.
What /is/ the appropriate type here? For example: uint a = uint.max; uint b = 0; uint c = uint.max - 1; int x = a - b; // wrong, should be uint uint y = c - a; // wrong, should be int I don't see any way to reliably produce a "safe" result at the language level.
There are several schools of thought (for the lack of a better phrase): 1. The Purist Mathematician: We want unsigned to approximate natural numbers, natural numbers aren't closed for subtraction, therefore u1 - u2 should be disallowed. 2. The Practical Mathematician: we want unsigned to approximate natural numbers and natural numbers aren't closed for subtraction but closed for a subset satisfying u1 >= u2. We can rely on the programmer to check the condition before, and fall back on modulo difference when the condition isn't satisfied. They'll understand.
How about 1.5, the Somewhat Practical but Still Purist Mathematician? He (that would be me) would like integral types called nint and nlong (the "n" standing for "natural"), which can hold numbers in the range (0, int.max) and (0, long.max), respectively. Such types would have to be stored as int/long, but the sign bit should be ignored/zero in all calculations. Hence any nint/nlong would be implicitly castable to int/long. Is this a possibility? As you say, natural numbers aren't closed under subtraction, so subtractions involving nint/nlong would have to yield an int/long result. In fact, if n1 and n2 are nints, one would be certain that n1-n2 never goes out of the range of an int. Thing is, whenever I use one of the unsigned types, it is because I need to make sure I'm working with nonnegative numbers, not because I need to work outside the ranges of the signed integral types. Other people obviously have other needs, though, so I'm not saying "let's toss uint and ulong out the window". -Lars
Nov 26 2008
parent Lars Kyllingstad <public kyllingen.NOSPAMnet> writes:
Lars Kyllingstad wrote:
 Andrei Alexandrescu wrote:
 Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 Notice that the fact that one operand is a literal does not solve 
 all of the problems I mentioned. There is for example no progress in 
 typing u1 - u2 appropriately.
What /is/ the appropriate type here? For example: uint a = uint.max; uint b = 0; uint c = uint.max - 1; int x = a - b; // wrong, should be uint uint y = c - a; // wrong, should be int I don't see any way to reliably produce a "safe" result at the language level.
There are several schools of thought (for the lack of a better phrase): 1. The Purist Mathematician: We want unsigned to approximate natural numbers, natural numbers aren't closed for subtraction, therefore u1 - u2 should be disallowed. 2. The Practical Mathematician: we want unsigned to approximate natural numbers and natural numbers aren't closed for subtraction but closed for a subset satisfying u1 >= u2. We can rely on the programmer to check the condition before, and fall back on modulo difference when the condition isn't satisfied. They'll understand.
How about 1.5, the Somewhat Practical but Still Purist Mathematician? He (that would be me) would like integral types called nint and nlong (the "n" standing for "natural"), which can hold numbers in the range (0, int.max) and (0, long.max), respectively. Such types would have to be stored as int/long, but the sign bit should be ignored/zero in all calculations. Hence any nint/nlong would be implicitly castable to int/long. Is this a possibility? As you say, natural numbers aren't closed under subtraction, so subtractions involving nint/nlong would have to yield an int/long result. In fact, if n1 and n2 are nints, one would be certain that n1-n2 never goes out of the range of an int. Thing is, whenever I use one of the unsigned types, it is because I need to make sure I'm working with nonnegative numbers, not because I need to work outside the ranges of the signed integral types. Other people obviously have other needs, though, so I'm not saying "let's toss uint and ulong out the window". -Lars
Another point: nint would also be implicitly castable to uint and so on, so making these types the standard choice of unsigned integers in Phobos shouldn't cause too much breakage. -Lars
Nov 26 2008
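A hedged library sketch of Lars's hypothetical nint; the name, the invariant and the checks are illustrative, and a real design would need compiler support for the implicit conversion to uint as well:

----
struct nint
{
    private int payload;               // invariant: payload >= 0
    this(int v) { assert(v >= 0); payload = v; }
    int get() const { return payload; }
    alias get this;                    // implicit nint -> int
}

void main()
{
    auto n = nint(7);
    int i = n;               // always safe: n is non-negative
    long d = i - nint(9);    // subtraction lands in a signed type
    assert(d == -2);
}
----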
prev sibling parent reply Kagamin <spam here.lot> writes:
Andrei Alexandrescu Wrote:

 There are several schools of thought (for the lack of a better phrase):
 
 1. The Purist Mathematician: We want unsigned to approximate natural 
 numbers, natural numbers aren't closed for subtraction, therefore u1 - 
 u2 should be disallowed.
I thought mathematics doesn't distinguish between, say, natural 5, integral 5 and real 5. N, Z and R are sets, not types of numbers. There is even a notion of equivalence classes to deem numbers with different representations the same (not just equal).
Nov 27 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Kagamin wrote:
 Andrei Alexandrescu Wrote:
 
 There are several schools of thought (for the lack of a better
 phrase):
 
 1. The Purist Mathematician: We want unsigned to approximate
 natural numbers, natural numbers aren't closed for subtraction,
 therefore u1 - u2 should be disallowed.
I thought mathematics doesn't distinguish between, say, natural 5, integral 5 and real 5. N, Z and R are sets, not types of numbers. There is even a notion of equivalence classes to deem numbers with different representations the same (not just equal).
Right, but the notion of set closedness for an operation comes from math. Andrei
Nov 27 2008
prev sibling parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Wed, 26 Nov 2008 09:12:12 -0600, Andrei Alexandrescu wrote:

 Don wrote:
 My suspicion is, that if you allowed all signed-unsigned operations when 
 at least one was a literal, and made everything else illegal, you'd fix 
 most of the problems. In particular, there'd be a big reduction in 
 people abusing 'uint' as a primitive range-limited int.
Well, part of my attempt is to transform that abuse into legit use. In other words, I do want to allow people to consider uint a reasonable model of natural numbers. It can't be perfect, but I believe we can make it reasonable. Notice that the fact that one operand is a literal does not solve all of the problems I mentioned. There is for example no progress in typing u1 - u2 appropriately.
 Although it would be nice to have a type which was range-limited, 'uint' 
 doesn't do it. Instead, it guarantees the number is between 0 and 
 int.max*2+1 inclusive. Allowing mixed operations encourages programmers 
 to focus on the benefit of 'the lower bound is zero!' while forgetting that 
 there is an enormous downside ('I'm saying that this could be larger 
 than int.max!')
I'm not sure I understand this part. To me, the larger problem is underflow, e.g. when subtracting two small uints results in a large uint.
I'm totally with Don here. In math, natural numbers are a subset of integers. But uint is not a subset of int. If it were, most of the problems would vanish. So it's probably feasible to ban uint from SafeD, implement natural numbers by some other means, and leave uint for low-level wizardry.
Nov 26 2008
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sergey Gromov wrote:
 Wed, 26 Nov 2008 09:12:12 -0600, Andrei Alexandrescu wrote:
 
 Don wrote:
 My suspicion is, that if you allowed all signed-unsigned operations when 
 at least one was a literal, and made everything else illegal, you'd fix 
 most of the problems. In particular, there'd be a big reduction in 
 people abusing 'uint' as a primitive range-limited int.
Well, part of my attempt is to transform that abuse into legit use. In other words, I do want to allow people to consider uint a reasonable model of natural numbers. It can't be perfect, but I believe we can make it reasonable. Notice that the fact that one operand is a literal does not solve all of the problems I mentioned. There is for example no progress in typing u1 - u2 appropriately.
 Although it would be nice to have a type which was range-limited, 'uint' 
 doesn't do it. Instead, it guarantees the number is between 0 and 
 int.max*2+1 inclusive. Allowing mixed operations encourages programmers 
 to focus on the benefit of 'the lower bound is zero!' while forgetting that 
 there is an enormous downside ('I'm saying that this could be larger 
 than int.max!')
I'm not sure I understand this part. To me, the larger problem is underflow, e.g. when subtracting two small uints results in a large uint.
I'm totally with Don here. In math, natural numbers are a subset of integers. But uint is not a subset of int. If it were, most of the problems would vanish. So it's probably feasible to ban uint from SafeD, implement natural numbers by some other means, and leave uint for low-level wizardry.
That's also a possibility - consider unsigned types just "bags of bits" and disallow most arithmetic for them. They could actually be eliminated entirely from the core language because they can be implemented as a library. I'm not sure what that would feel like. I guess length would return an int in that case? Andrei
Nov 26 2008
next sibling parent Sergey Gromov <snake.scaly gmail.com> writes:
Wed, 26 Nov 2008 15:57:55 -0600, Andrei Alexandrescu wrote:

 Sergey Gromov wrote:
 Wed, 26 Nov 2008 09:12:12 -0600, Andrei Alexandrescu wrote:
 
 Don wrote:
 My suspicion is, that if you allowed all signed-unsigned operations when 
 at least one was a literal, and made everything else illegal, you'd fix 
 most of the problems. In particular, there'd be a big reduction in 
 people abusing 'uint' as a primitive range-limited int.
Well, part of my attempt is to transform that abuse into legit use. In other words, I do want to allow people to consider uint a reasonable model of natural numbers. It can't be perfect, but I believe we can make it reasonable. Notice that the fact that one operand is a literal does not solve all of the problems I mentioned. There is for example no progress in typing u1 - u2 appropriately.
 Although it would be nice to have a type which was range-limited, 'uint' 
 doesn't do it. Instead, it guarantees the number is between 0 and 
 int.max*2+1 inclusive. Allowing mixed operations encourages programmers 
 to focus on the benefit of 'the lower bound is zero!' while forgetting that 
 there is an enormous downside ('I'm saying that this could be larger 
 than int.max!')
I'm not sure I understand this part. To me, the larger problem is underflow, e.g. when subtracting two small uints results in a large uint.
I'm totally with Don here. In math, natural numbers are a subset of integers. But uint is not a subset of int. If it were, most of the problems would vanish. So it's probably feasible to ban uint from SafeD, implement natural numbers by some other means, and leave uint for low-level wizardry.
That's also a possibility - consider unsigned types just "bags of bits" and disallow most arithmetic for them. They could actually be eliminated entirely from the core language because they can be implemented as a library. I'm not sure what that would feel like. I guess length would return an int in that case?
I guess so. Actually, simply disallowing signed<=>unsigned casts and making length signed would force most people to abandon unsigned types. And moving the unsigned types' documentation into a separate chapter would warn newcomers about their special status. Not a lot of changes on the compiler side, mostly throwing stuff away.
Nov 26 2008
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 That's also a possibility - consider unsigned types just "bags of bits" 
 and disallow most arithmetic for them. They could actually be eliminated 
 entirely from the core language because they can be implemented as a 
 library. I'm not sure what that would feel like.
 
 I guess length would return an int in that case?
I don't know what the solution is, but I am very happy to see that in this newsgroup there are people willing to reconsider such basic things, to try to improve the language. Most ideas turn out to be wrong, but if you aren't bold enough to consider them, there will be no improvements :-)

In my programs I use unsigned integers and unsigned longs as:
- bitfields, a single size_t, for example to represent a small set of items.
- bitarrays, in an array of size_t, to represent a larger set, to have arrays of bit flags, etc.
- to pack small variables into a uint, size_t, etc, for example using the first 5 bits to represent a, the following 2 bits to represent b, etc. In such situations I have never packed such variables into a signed int.
- when I need very large integer values, but this has to be done with care, because they can't be converted back to ints.
- I'd also like to use unsigned ints to denote that for example a function takes a nonnegative argument. I used to do this in Delphi, but I have seen it's too unsafe in D, so now in D I prefer to use ints and then inside the function test for a negative argument and throw an exception (generally I don't use an assert for this except in the most speed-critical situations).
- I use unsigned bytes in some situations, now and then. I don't use signed bytes anymore; I used to use them for 8 bit digital audio, but not anymore. Now 16 bit signed audio is the norm (a short) or even 24 bit (I created a slow 24-bit value type some time ago).
- Probably there are a few other situations; for example I think I've used a ushort once, but not many of them.

Bye, bearophile
Nov 26 2008
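The packing use bearophile lists, sketched with illustrative field widths (5 bits for a, 2 bits for b):

----
void main()
{
    uint a = 19, b = 2;                        // a fits in 5 bits, b in 2
    uint packed = (a & 0x1F) | ((b & 0x3) << 5);
    assert((packed & 0x1F) == 19);             // unpack a
    assert(((packed >> 5) & 0x3) == 2);        // unpack b
}
----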
parent reply Kagamin <spam here.lot> writes:
bearophile Wrote:

 In my programs I use unsigned integers and unsigned longs as:
 - bitfields, a single size_t, for example to represent a small set of items.
 - bitarrays, in an array of size_t, to represent a larger set, to have array
of bit flags, etc.
 - to pack small variables into a uint, size_t, etc, for example use the first
5 bits to represent a, the following 2 bits to represent b, etc. In such
situation I have never pack such variables into a signed int.
I think, signed ints can hold bits as gracefully as unsigned ones.
 - when I need very large integer values, but this has to be done with care,
because they can't be converted back to ints.
I don't think that large integers know or respect computer-specific integer limits. They just get larger and larger.
 - Probably there are a few other situations, for example I think I've used a
ushort once, but not many of them.
Legacy technologies tend to use unsigneds intensively, and people got used to unsigned chars (for comparisons and character maps).
Nov 27 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Kagamin wrote:
 bearophile Wrote:
 
 In my programs I use use unsigned integers and unsigned longs as: -
 bitfields, a single size_t, for example to represent a small set of
 items. - bitarrays, in an array of size_t, to represent a larger
 set, to have array of bit flags, etc. - to pack small variables
 into a uint, size_t, etc, for example use the first 5 bits to
 represent a, the following 2 bits to represent b, etc. In such
 situation I have never pack such variables into a signed int.
I think, signed ints can hold bits as gracefully as unsigned ones.
Problem is there is an odd jump whenever the sign bit gets into play. An expert programmer can easily deal with that, but it's rather tricky.
 - when I need very large integer values, but this has to be done
 with care, because they can't be converted back to ints.
I don't think that large integers know or respect computers-specific integers limits. They just get larger and larger.
Often large integers hold counts or sizes of objects fitting in computer memory. There is a sense of completeness of a systems-level language in being able to use a native type to express any offset in memory. That's why it would be somewhat of a bummer if we defined size_t as int on 32-bit systems: I, at least, would feel like giving something up. Andrei
Nov 27 2008
prev sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Sergey Gromov wrote:
 So it's probably feasible to ban uint from
 SafeD, implement natural numbers by some other means, and leave uint for
 low-level wizardry.
SafeD is about memory safety, i.e. no corrupted memory. Dealing with integer overflows falls outside its agenda.
Nov 26 2008
prev sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Don wrote:
 
 Although it would be nice to have a type which was range-limited, 'uint' 
 doesn't do it. Instead, it guarantees the number is between 0 and 
 int.max*2+1 inclusive. Allowing mixed operations encourages programmers 
 to focus on the benefit of 'the lower bound is zero!' while forgetting that 
 there is an enormous downside ('I'm saying that this could be larger 
 than int.max!')
This inspired me to think about where I use uint and I realized that I don't. I use size_t for size/length representations (largely because sizes can theoretically be >2GB on a 32-bit system), and ubyte for bit-level stuff, but that's it. Sean
Nov 26 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sean Kelly wrote:
 Don wrote:
 Although it would be nice to have a type which was range-limited, 
 'uint' doesn't do it. Instead, it guarantees the number is between 0 
 and int.max*2+1 inclusive. Allowing mixed operations encourages 
 programmers to focus on the benefit of 'the lower bound is zero!' while 
 forgetting that there is an enormous downside ('I'm saying that this 
 could be larger than int.max!')
This inspired me to think about where I use uint and I realized that I don't. I use size_t for size/length representations (largely because sizes can theoretically be >2GB on a 32-bit system), and ubyte for bit-level stuff, but that's it. Sean
For the record, I use unsigned types wherever there's a non-negative number involved (e.g. a count). So I'd be helped by better unsigned operations. I wonder how often these super-large arrays do occur on 32-bit systems. I do have programs that try to allocate as large a contiguous matrix as possible, but never sat down and tested whether a >2GB chunk was allocated on the Linux cluster I work on. I'm quite annoyed by this >2GB issue because it's a very practical and very rare issue in a weird contrast with a very principled issue (modeling natural numbers). Andrei
Nov 26 2008
parent reply Sean Kelly <sean invisibleduck.org> writes:
Andrei Alexandrescu wrote:
 Sean Kelly wrote:
 Don wrote:
 Although it would be nice to have a type which was range-limited, 
 'uint' doesn't do it. Instead, it guarantees the number is between 0 
 and int.max*2+1 inclusive. Allowing mixed operations encourages 
 programmers to focus on the benefit of 'the lower bound is zero!' while 
 forgetting that there is an enormous downside ('I'm saying that this 
 could be larger than int.max!')
This inspired me to think about where I use uint and I realized that I don't. I use size_t for size/length representations (largely because sizes can theoretically be >2GB on a 32-bit system), and ubyte for bit-level stuff, but that's it.
For the record, I use unsigned types wherever there's a non-negative number involved (e.g. a count). So I'd be helped by better unsigned operations.
To be fair, I generally use unsigned numbers for values that are logically always positive. These just tend to be sizes and counts in my code.
 I wonder how often these super-large arrays do occur on 32-bit systems. 
 I do have programs that try to allocate as large a contiguous matrix as 
 possible, but never sat down and tested whether a >2GB chunk was 
 allocated on the Linux cluster I work on. I'm quite annoyed by this >2GB 
 issue because it's a very practical and very rare issue in a weird 
 contrast with a very principled issue (modeling natural numbers).
Yeah, I have no idea how common they are, though my guess would be that they are rather uncommon. As a library programmer, I simply must assume that they are in use, which is why I use size_t as a matter of course. Sean
Nov 26 2008
parent reply "Denis Koroskin" <2korden gmail.com> writes:
On 27.11.08 at 03:46, Sean Kelly wrote:

 Andrei Alexandrescu wrote:
 Sean Kelly wrote:
 Don wrote:
 Although it would be nice to have a type which was range-limited,  
 'uint' doesn't do it. Instead, it guarantees the number is between 0  
 and int.max*2+1 inclusive. Allowing mixed operations encourages  
 programmers to focus on the benefit of 'the lower bound is zero!' while 
 forgetting that there is an enormous downside ('I'm saying that this  
 could be larger than int.max!')
This inspired me to think about where I use uint and I realized that I don't. I use size_t for size/length representations (largely because sizes can theoretically be >2GB on a 32-bit system), and ubyte for bit-level stuff, but that's it.
For the record, I use unsigned types wherever there's a non-negative number involved (e.g. a count). So I'd be helped by better unsigned operations.
To be fair, I generally use unsigned numbers for values that are logically always positive. These just tend to be sizes and counts in my code.
 I wonder how often these super-large arrays do occur on 32-bit systems.  
 I do have programs that try to allocate as large a contiguous matrix as  
 possible, but never sat down and tested whether a >2GB chunk was  
 allocated on the Linux cluster I work on. I'm quite annoyed by this  
 >2GB issue because it's a very practical and very rare issue in a weird  
 contrast with a very principled issue (modeling natural numbers).
Yeah, I have no idea how common they are, though my guess would be that they are rather uncommon. As a library programmer, I simply must assume that they are in use, which is why I use size_t as a matter of course. Sean
If they can be more than 2GB, why can't they be more than 4GB? It is dangerous to assume that they won't; that's why uint is dangerous. You trade safety for one additional bit of information, and this is wrong. Soon enough we won't use uints the same way we don't use ushorts (I should have asked first whether anyone uses ushort these days, but there is so little gain in using ushort as opposed to short or int that I consider it impractical). The 64-bit era will give us 64-bit pointers and 64-bit counters. Do you think you will prefer ulong over long for an additional bit? You really shouldn't.

My proposal

Short summary:
- Disallow bitwise operations on both signed and unsigned types; allow arithmetic operations on them
- Discourage usage of unsigned types. Introduce bits8, bits16, bits32 and bits64 as a replacement
- Disallow arithmetic operations on bits* types; allow bitwise operations on them
- Disallow mixed-type operations (compare, add, sub, mul and div)
- Disallow implicit casts between all these types
- Use int and long (or ranged types) for lengths and indices, with runtime checks (a.length-- is always dangerous no matter what compile-time checks you make)
- Add type constructors for int/uint/etc.: "auto x = int(int.max + 1);" throws at run-time

The two most common uses of uints are:
0) Bitfields or masks, packed values and hexadecimal constants (more on bitfields below)
1) Numbers that can't be negative (counters, sizes/lengths etc.)

Bitfields

Bitfields are handy, and using an unsigned type over a signed one is surely preferable. The most common operations on bitfields are bitwise AND, OR, (R/L)SHIFT and XOR. You shouldn't subtract from or add to them; that is an error in most cases. This is what the new bits8, bits16, bits32 and bits64 types should be used for:

bits32 argbColor;
int alphaShift = 24; // any type here, actually

// shift
bits32 alphaMask = (0xFF << alphaShift); // 0xFF is of type bits8
auto value2 = value1 & mask; // all 3 are of type bits*

// you can only shift bits, and the result is in bits too, i.e. the following is incorrect:
int i = -42;
int x = (i << 8); // An error:
// 1) can't shift a value of type int
// 2) can't assign a value of type bits32 to a variable of type int

// ubyte is still handy sometimes (a color component should belong to the [0..255] range)
auto red = (argbColor & alphaMask) >> alphaShift;
// the result is in bits32; use an explicit cast to convert it to the target data type:
ubyte red = cast(ubyte)((argbColor & alphaMask) >> alphaShift);
// Alternatively:
ubyte alpha = ubyte((argbColor & alphaMask) >> alphaShift);

The type constructor throws an error if the source value (which is of type bits32 in this example) can't be stored in a ubyte. This might be a replacement for signed/unsigned methods.

int i = 0xFFFFFFFF; // an error, can't convert a value of type bits32 to a variable of type int
int i = int.max + 1; // ok
int i = int(int.max + 1); // an exception is raised at runtime
int i = 0xABCD - 0xDCBA; // not allowed; add explicit casts
auto u = cast(uint)0xABCD - cast(uint)0xDCBA; // result type is uint, no checks for overflow
auto i = cast(int)0xABCD - cast(int)0xDCBA; // result type is int, no checks for overflow
auto e = cast(uint)0xABCD - cast(int)0xDCBA; // an error, can't subtract int from uint

// type ctors in action:
auto i = int(cast(int)0xABCD - cast(int)0xDCBA); // result type is int, an exception on overflow
auto u = uint(cast(uint)0xABCD - cast(uint)0xDCBA); // same here for uint

Non-negative values

Just use int/long. Or some ranged type ([0..short.max], [0..int.max], [0..long.max]) could be used as well. 
A library type, perhaps. Let's call it nshort/nint/nlong. It should have the same set of operations as short/int/long, but it makes additional checks and throws on under- and overflow:

int x = 42;
nint nx = x; // ok
nx = -x; // throws
nx = int.max; // ok
++nx; // throws

nx = 0;
--nx; // throws

nx = 0;
nint ny = 42;
nx = ny; // no checking is done
int y = ny; // no checking is done, either

short s = ny; // error, cast needed
short s = cast(short)ny; // never throws
short s = short(ny); // might throw
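A minimal sketch of how such an nint might look as a D2 struct, assuming operator overloading; the layout and method names here are illustrative, not part of the proposal:

struct nint
{
    private int value;

    this(long v) { opAssign(v); }

    // Throws on anything outside [0 .. int.max].
    nint opAssign(long v)
    {
        if (v < 0 || v > int.max)
            throw new Exception("nint: value out of range");
        value = cast(int) v;
        return this;
    }

    // ++nx and --nx re-run the range check.
    nint opUnary(string op)() if (op == "++" || op == "--")
    {
        opAssign(cast(long) value + (op == "++" ? 1 : -1));
        return this;
    }

    int get() const { return value; }
}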
Nov 26 2008
parent Sean Kelly <sean invisibleduck.org> writes:
Denis Koroskin wrote:
 On 27.11.08 at 03:46, Sean Kelly wrote:
 
 Andrei Alexandrescu wrote:
 Sean Kelly wrote:
 Don wrote:
 Although it would be nice to have a type which was range-limited, 
 'uint' doesn't do it. Instead, it guarantees the number is between 
 0 and int.max*2+1 inclusive. Allowing mixed operations encourages 
 programmers to focus on the benefit of 'the lower bound is zero!' 
 while forgetting that there is an enormous downside ('I'm saying 
 that this could be larger than int.max!')
This inspired me to think about where I use uint and I realized that I don't. I use size_t for size/length representations (largely because sizes can theoretically be >2GB on a 32-bit system), and ubyte for bit-level stuff, but that's it.
For the record, I use unsigned types wherever there's a non-negative number involved (e.g. a count). So I'd be helped by better unsigned operations.
To be fair, I generally use unsigned numbers for values that are logically always positive. These just tend to be sizes and counts in my code.
 I wonder how often these super-large arrays do occur on 32-bit 
 systems. I do have programs that try to allocate as large a 
 contiguous matrix as possible, but never sat down and tested whether 
 a >2GB chunk was allocated on the Linux cluster I work on. I'm quite 
 annoyed by this >2GB issue because it's a very practical and very 
 rare issue in a weird contrast with a very principled issue (modeling 
 natural numbers).
Yeah, I have no idea how common they are, though my guess would be that they are rather uncommon. As a library programmer, I simply must assume that they are in use, which is why I use size_t as a matter of course.
If they can be more than 2GB, why can't they be more than 4GB? It is dangerous to assume that they won't; that's why uint is dangerous. You trade safety for one additional bit of information, and this is wrong.
Bigger than 4GB on a 32-bit system? Files perhaps, but I'm talking about memory ranges here.
 Soon enough we won't use uints the same way we don't use ushorts (I 
 should have asked first whether anyone uses ushort these days, but there is so 
 little gain in using ushort as opposed to short or int that I consider it 
 impractical). The 64-bit era will give us 64-bit pointers and 64-bit counters. 
 Do you think you will prefer ulong over long for an additional bit? You 
 really shouldn't.
long vs. ulong for sizes is less of an issue, because we're a long way away from running up against the limitations of a 63-bit size value. The point of size_t to me, however, is that it scales automatically: if I write array operations using size_t then I can be sure they will work on both a 32- and a 64-bit system.

I do like Don's point that unsigned really means "unsigned" rather than "positive," however. I clearly use unsigned numbers for both, even if I flag the "positive" uses via a type alias such as size_t. In C/C++ I rely on compiler warnings to trap the sort of mistakes we're talking about here, but I'd love a more logically sound solution if one could be found.

Sean
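A minimal sketch of the pattern Sean describes, with an illustrative function (not from the post): written against size_t, it compiles to 32-bit indexing on x86 and 64-bit indexing on x86-64 without source changes.

// Finds the index of the largest element; i and best scale with the platform.
size_t indexOfMax(const(int)[] a)
{
    size_t best = 0;
    foreach (i; 1 .. a.length) // i is inferred as size_t, like a.length
        if (a[i] > a[best])
            best = i;
    return best;
}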
Nov 27 2008
prev sibling next sibling parent reply Kagamin <spam here.lot> writes:
Andrei Alexandrescu Wrote:

 I also know seasoned programmers who had 
 no idea that -u compiles and that it also oddly returns an unsigned type.
1) I see no danger here. 2) I doubt this proposal solves the danger, whatever it is. 3) -u is funny and looks like wrong design to me.
Nov 26 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Kagamin wrote:
 Andrei Alexandrescu Wrote:
 
 I also know seasoned programmers who had 
 no idea that -u compiles and that it also oddly returns an unsigned type.
1) I see no danger here. 2) I doubt this proposal solves the danger, whatever it is. 3) -u is funny and looks like wrong design to me.
I didn't want runtime checks inserted, just to tighten the compilation rules. Andrei
Nov 26 2008
parent bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 I didn't want runtime checks inserted, just to tighten compilation rules.
The compiler may use both :-) Bye, bearophile
Nov 26 2008
prev sibling next sibling parent reply Kagamin <spam here.lot> writes:
bearophile Wrote:

 One solution is to "disable" some of the more error-prone syntax allowed in C,
turning it into a compilation error. For example I have seen newbies write bugs
caused by leaving & where a && was necessary. In such case just adopting "and"
and making "&&" a syntax error solves the problem and doesn't lead to bugs when
you convert C code to D (you just use a search&replace, replacing && with and
on the code).
Why do you want to turn D into Python? You already have one. Just write in Python, migrate others to it, and be done with the C family.
Nov 26 2008
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
bearophile:
 One solution is to "disable" some of the more error-prone syntax allowed in C,
turning it into a compilation error. For example I have seen newbies write bugs
caused by leaving & where a && was necessary. In such case just adopting "and"
and making "&&" a syntax error solves the problem and doesn't lead to bugs when
you convert C code to D (you just use a search&replace, replacing && with and
on the code).<<
Kagamin:
Why do you want to turn D into Python? You already has one. Just write in
python, migrate others to it and be done with C family.<
The mistake I have shown (using "&&" instead of "&", or "|" instead of "||", and vice versa) comes from code I have seen written by new programmers at the university. But not only newbies introduce such bugs; see for example this post: http://gcc.gnu.org/ml/gcc-patches/2004-10/msg00990.html It says:
People sometimes code "a && MASK" when they intended "a & MASK". gcc itself
does not seem to have examples of this, here are some in the linux-2.4.20
kernel:<
I want to copy the syntax that leads to fewer bugs and more readability, and Python often gives good examples because it's often well designed. Note that this change doesn't reduce the performance of D code.

Also note that G++ already allows you to write programs with and, or, not, xor, etc. The following code compiles and runs correctly, so instead of Python you may also say I want to copy G++:

#include "stdio.h"
#include "stdlib.h"

int main(int argc, char** argv) {
    int b1 = argc >= 2 ? atoi(argv[1]) : 0;
    int b2 = argc >= 3 ? atoi(argv[2]) : 0;
    printf("%d\n", b1 and b2);
    return 0;
}

That can be disabled with "-fno-operator-names", while "-foperator-names" is enabled by default. So maybe the G++ designers agree with me instead of you.

Bye,
bearophile
Nov 26 2008
next sibling parent Kagamin <spam here.lot> writes:
bearophile Wrote:

 Also note that G++ already allows you to write programs with and, or, not,
xor, etc. The following code compiles and run correctly, so instead of Python
you may also say I want to copy G++:
Copying G++ is not always a good idea :) As I remember, this alternative syntax is supported for compatibility with keyboards that don't have the somewhat exotic ~^&| characters. And I don't think there is a way to make && a syntax error, as you proposed.
Nov 26 2008
prev sibling parent reply Kagamin <spam here.lot> writes:
bearophile Wrote:

 http://gcc.gnu.org/ml/gcc-patches/2004-10/msg00990.html
 It says:
People sometimes code "a && MASK" when they intended "a & MASK". gcc itself
does not seem to have examples of this, here are some in the linux-2.4.20
kernel:<
that thread is about an extra compiler warning (which is always good), not about breaking C syntax.
Nov 26 2008
parent bearophile <bearophileHUGS lycos.com> writes:
Kagamin:
that thread is about an extra compiler warning (which is always good), not
about breaking C syntax.<
You seem unaware of Walter's current stance towards warnings. And please don't forget that D's purposes are different from C's (D is designed to be safer, especially where that has little or no cost), that D comes after a long experience of coding in C, and that D runs on machines thousands of times faster than the ones C was originally designed for (today, having fast kernels in your program is more and more important: less of the code uses most of the running time).

That thread was, more generally, an example that shows why that specific C syntax is error-prone, and it also explains why some languages (Python among them, but not only Python) have refused this specific C syntax. Note that there are several other C syntaxes/semantics that are error-prone; thanks to Walter, D already fixes some of them, and I hope to see more improvements in the future.
And I don't think that there is a method to make && a syntax error as you
proposed.<
Keeping two syntaxes to do the same thing is a bad form of complexity. Generally it's better to have only one obvious way to do something :-) Bye, bearophile
Nov 26 2008
prev sibling parent "Nick Sabalausky" <a a.a> writes:
"Kagamin" <spam here.lot> wrote in message 
news:ggjcfg$fqq$1 digitalmars.com...
 bearophile Wrote:

 One solution is to "disable" some of the more error-prone syntax allowed 
 in C, turning it into a compilation error. For example I have seen 
 newbies write bugs caused by leaving & where a && was necessary. In such 
 case just adopting "and" and making "&&" a syntax error solves the 
 problem and doesn't lead to bugs when you convert C code to D (you just 
 use a search&replace, replacing && with and on the code).
Why do you want to turn D into Python? You already has one. Just write in python, migrate others to it and be done with C family.
Python has other issues.
Nov 26 2008
prev sibling next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2008-11-25 10:59:01 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 (3) u - u
Just a note here, because it seems to me you're confusing two issues with that "u - u" thing.

The problem with "u - u" isn't one of unsigned vs. signed integers at all. It's a problem of possibly going out of range, a problem that can happen with any type but is more likely with unsigned integers since they're often near zero.

If you want to attack that problem, I think it should be done in a coherent manner with other out-of-range issues. Going below uint.min for a uint or below int.min for an int should be handled the same way. Personally, I'd just add a compiler switch for runtime range checking (just as for array bounds checking).

Treating the result of u - u as __intuint is dangerous: uint.max - 1U gives you a value which int cannot hold, but you'd allow it to convert implicitly and without warning to int? I don't like it.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
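To make Michel's example concrete, a small D sketch (illustrative; today's D already allows the implicit uint-to-int conversion he objects to extending):

void demo()
{
    uint a = uint.max;
    uint b = 1;
    // a - b is 4_294_967_294, a value int cannot represent.
    int oops = a - b; // compiles silently; oops ends up as -2
}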
Nov 26 2008
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 On 2008-11-25 10:59:01 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 (3) u - u
Just a note here, because it seems to me you're confusing two issues with that "u - u" thing. The problem with "u - u" isn't one of unsigned vs. signed integers at all. It's a problem of possibly going out of range, a problem that can happen with any type but is more likely with unsigned integers since they're often near zero.
It's also a problem of signedness, considering that int can hold the difference of two small unsigned integrals. So if the result is unsigned, there may be overflow (I abusively call it "underflow"), but if the result is an int, that overflow may be avoided, or a different overflow may occur.
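A small sketch of the two outcomes being contrasted (variable names are illustrative):

void demo()
{
    uint a = 2, b = 5;
    uint asUnsigned = a - b;            // wraps to 4_294_967_293 ("underflow")
    int  asSigned   = cast(int)(a - b); // the same bits read as -3, as intended
}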
 If you want to attack that problem, I think it should be done in a 
 coherent manner with other out-of-range issues. Going below uint.min for 
 an uint or below int.min for an int should be handled the same way. 
 Personally, I'd just add a compiler switch for runtime range checking 
 (just as for array bound checking).
 
 Treating the result u - u as __intuint is dangerous: uint.max - 1U gives 
 you a value which int cannot hold, but you'd allow it to convert 
 implicitly and without warning to int? I don't like it.
I understand. It's what I have so far, so I'm looking forward to better ideas. Resorting to runtime checks is always a possibility but I'd like to focus on the static checking aspect for now. Andrei
Nov 26 2008
prev sibling parent "Nick Sabalausky" <a a.a> writes:
"Michel Fortin" <michel.fortin michelf.com> wrote in message 
news:ggjpn4$1v0m$1 digitalmars.com...
 On 2008-11-25 10:59:01 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:

 (3) u - u
Just a note here, because it seems to me you're confusing two issues with that "u - u" thing. The problem with "u - u" isn't one of unsigned vs. signed integers at all. It's a problem of possibly going out of range, a problem that can happen with any type but is more likely with unsigned integers since they're often near zero. If you want to attack that problem, I think it should be done in a coherent manner with other out-of-range issues. Going below uint.min for an uint or below int.min for an int should be handled the same way. Personally, I'd just add a compiler switch for runtime range checking (just as for array bound checking).
I'd love to see D get the ability to turn runtime range checking on and off, but nothing more than a program-wide (or module-wide, if compiling one module at a time) compiler switch is way too coarse-grained and blunt. I would prefer expression- and block-level control, something like:

checked(expr)
unchecked(expr)
checked { code }
unchecked { code }
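Something close to checked(expr) can be sketched today as a library helper; this covers addition only, and the name checkedAdd is illustrative rather than part of the proposal:

int checkedAdd(int a, int b)
{
    immutable long wide = cast(long) a + b; // compute in a wider type
    if (wide < int.min || wide > int.max)
        throw new Exception("integer overflow in checkedAdd");
    return cast(int) wide;
}

void usage()
{
    int x = checkedAdd(int.max - 1, 1); // fine
    int y = checkedAdd(int.max, 1);     // throws
}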
 Treating the result u - u as __intuint is dangerous: uint.max - 1U gives 
 you a value which int cannot hold, but you'd allow it to convert 
 implicitly and without warning to int? I don't like it.

 -- 
 Michel Fortin
 michel.fortin michelf.com
 http://michelf.com/
 
Nov 26 2008
prev sibling next sibling parent reply Tomas Lindquist Olsen <tomas famolsen.dk> writes:
I'm not really sure what I think about all this. I try to always insert 
assertions before operations like this, which makes me think the nicest 
solution would be if the compiler errors out if it detects a problematic 
expression that is unchecked...

uint diff(uint begin, uint end)
{
	return end - begin; // error
}


uint diff(uint begin, uint end)
{
	assert(begin <= end);
	return end - begin; // ok because of the assert
}


I'm not going to get into how this would be implemented in the compiler, 
but it sure would be sweet :)
Nov 26 2008
parent Christopher Wright <dhasenan gmail.com> writes:
Tomas Lindquist Olsen wrote:
 I'm not really sure what I think about all this. I try to always insert 
 assertions before operations like this, which makes me think the nicest 
 solution would be if the compiler errors out if it detects a problematic 
 expression that is unchecked...
 
 uint diff(uint begin, uint end)
 {
     return end - begin; // error
 }
 
 
 uint diff(uint begin, uint end)
 {
     assert(begin <= end);
     return end - begin; // ok because of the assert
 }
 
 
 I'm not going to get into how this would be implemented in the compiler, 
  but it sure would be sweet :)
On the other hand, the CPU can report on integer overflow, so you could turn that into an exception if the expression doesn't include a cast.
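The overflow flag isn't visible from portable D, but the same test can be done in software; a sketch of one standard trick (not necessarily what Christopher has in mind):

bool addOverflows(int a, int b)
{
    immutable int r = a + b; // wraps on overflow (two's complement)
    // Signed overflow occurred iff a and b share a sign and r's sign differs.
    return ((a ^ r) & (b ^ r)) < 0;
}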
Nov 26 2008
prev sibling next sibling parent "Simen Kjaeraas" <simen.kjaras gmail.com> writes:
The more I read about this, the more I am convinced that removing the  
following:
- implicit int <-> uint conversion
- uint - uint (not 100% sure about this)
- mixed int / uint arithmetic
as well as changing array.length to int, would remove most problems. A sketch of what these rules would reject follows below.

If you desperately need a > 2^31 element array, having to roll your own is  
not the main problem.

The fact that the type of uint - uint could be int or uint depending on  
what the programmer wants tells me that the programmer should be tasked  
with informing the compiler what he really wants - i.e. with a cast.
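
A sketch of what those rules would reject, written as comments since no compiler enforces them (the example is illustrative):

void demo(int i, uint u)
{
    // int  a = u;     // error: no implicit uint -> int conversion
    // auto b = u - u; // error: the sign of the result is ambiguous
    // auto c = i + u; // error: mixed int / uint arithmetic
    auto d = cast(int) u + i; // fine: the cast states the intended sign
}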

-- 
Simen
Nov 26 2008
prev sibling next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Tue, 25 Nov 2008 09:59:01 -0600, Andrei Alexandrescu wrote:

 D pursues compatibility with C and C++ in the following manner: if a 
 code snippet compiles in both C and D or C++ and D, then it should have 
 the same semantics.
Interesting ... but I don't think that this should be the principle employed. If code is 'naughty' in C/C++, then D should not also produce the same results. I would propose a better principle: the compiler will not allow loss or distortion of information without the coder/reader being made aware of it.
 (1) u + i, i + u
 (2) u - i, i - u
 (3) u - u
 (4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C 
 requires that these all return unsigned, ouch)
 (5) u < i, i < u, u <= i etc. (all ordering comparisons)
 (6) -u
Note that "(3) u - u" and "(6) -u" seem to be really a use of (4), namely "(-1 * u)". I am assming that there is no difference between 'unsigned' and 'positive', in so much as I am not treating 'unsigned' as 'sign unknown/irrelevant'. It seems to me that the issue then is not so much one of sign but of size. It needs an extra bit to hold the sign information thus a 32-bit unsigned value needs a minimum of 33 bits to convert it to a signed equivalent. In the types (1) - (4) above, I would have the compiler compute a signed type for these. Then if the target of the result is a signed type AND larger than the 'unsigned' portion used, then the complier would not have to complain. In every other case the complier should complain because of the potential for information loss. To avoid the complaint, the coder would need to either change the result type, the input types or add a 'message' to the compliler that in effects says "I know what I'm doing, ok?" - I suggest a cast would suffice. In those cases where the target type is not explicitly coded, such as using 'auto' or as a temporary value in an expression, the compiler should assume a signed type that is 'one step' larger than the 'unsigned' element in the expression. e.g. auto x = int * uint; ==> 'x' is long. If this causes code to be incompatible to C/C++, then it implies that the C/C++ code was poor (i.e. potential information loss) in the first place and deserves to be fixed up. The scenario (5) above should also include equality comparisions, and should cause the compiler to issue a message AND generate code like ... if (u < i) ====> if ( i < 0 ? false : u < cast(typeof(u))i) if (u <= i) ====> if ( i < 0 ? false : u <= cast(typeof(u))i) if (u = i) ====> if ( i < 0 ? false : u = cast(typeof(u))i) if (u >= i) ====> if ( i < 0 ? true : u >= cast(typeof(u))i) if (u > i) ====> if ( i < 0 ? true : u > cast(typeof(u))i) The coder should be able to avoid the message and the suboptimal generated code my adding a cast ... if (u < cast(typeof u)i) I am also assuming that syntax 'cast(unsigned-type)signed-type' is telling the complier to assume that the bits in the signed-value already represent a valid unsigned-value and so therefore the compiler should not generate code to 'transform' the signed-value bits to form an unsigned-value. To summarize, (1) Perpetuating poor quality C/C++ code should not be encouraged. (2) The compiler should help the coder be aware of potential information loss. (3) The coder should have mechanisms to override the compiler's concerns. -- Derek Parnell Melbourne, Australia skype: derek.j.parnell
Nov 27 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Derek Parnell wrote:
 On Tue, 25 Nov 2008 09:59:01 -0600, Andrei Alexandrescu wrote:
 
 D pursues compatibility with C and C++ in the following manner: if a 
 code snippet compiles in both C and D or C++ and D, then it should have 
 the same semantics.
Interesting ... but I don't think that this should be the principle employed. If code is 'naughty' in C/C++ then D should not also produce the same results. I would propose that a better principle to be used would be that the compiler will not allow loss or distortion of information without the coder/reader being made aware of it.
These two principles are not necessarily at odds with each other. The idea of being compatible with C and C++ is simple: if I paste a C function from somewhere into a D module, the function should either not compile, or compile and run with the same result. I think that's quite reasonable. So if the C code is behaving naughtily, D doesn't need to also behave naughtily. It should just not compile.
 (1) u + i, i + u
 (2) u - i, i - u
 (3) u - u
 (4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C 
 requires that these all return unsigned, ouch)
 (5) u < i, i < u, u <= i etc. (all ordering comparisons)
 (6) -u
Note that "(3) u - u" and "(6) -u" seem to be really a use of (4), namely "(-1 * u)".
Correct.
 I am assming that there is no difference between 'unsigned' and 'positive',
 in so much as I am not treating 'unsigned' as 'sign unknown/irrelevant'. 
 
 It seems to me that the issue then is not so much one of sign but of size.
 It needs an extra bit to hold the sign information thus a 32-bit unsigned
 value needs a minimum of 33 bits to convert it to a signed equivalent.
  
 In the types (1) - (4) above, I would have the compiler compute a signed
 type for these. Then if the target of the result is a signed type AND
 larger than the 'unsigned' portion used, then the complier would not have
 to complain. In every other case the complier should complain because of
 the potential for information loss. To avoid the complaint, the coder would
 need to either change the result type, the input types or add a 'message'
 to the compliler that in effects says "I know what I'm doing, ok?" - I
 suggest a cast would suffice.
 
 In those cases where the target type is not explicitly coded, such as using
 'auto' or as a temporary value in an expression, the compiler should assume
 a signed type that is 'one step' larger than the 'unsigned' element in the
 expression.
 
 e.g.
    auto x = int * uint; ==> 'x' is long.
I don't think this will fly with Walter.
 If this causes code to be incompatible to C/C++, then it implies that the
 C/C++ code was poor (i.e. potential information loss) in the first place
 and deserves to be fixed up.
I don't quite think so. As long as the values are within range, the multiplication is legit and efficient.
 The scenario (5) above should also include equality comparisions, and
 should cause the compiler to issue a message AND generate code like ...
 
    if (u < i)  ====> if ( i < 0 ? false : u < cast(typeof(u))i)
    if (u <= i) ====> if ( i < 0 ? false : u <= cast(typeof(u))i)
    if (u = i)  ====> if ( i < 0 ? false : u = cast(typeof(u))i)
    if (u >= i) ====> if ( i < 0 ? true  : u >= cast(typeof(u))i)
    if (u > i)  ====> if ( i < 0 ? true  : u > cast(typeof(u))i)
 
 The coder should be able to avoid the message and the suboptimal generated
 code my adding a cast ...
 
   if (u < cast(typeof u)i) 
Yah, comparisons need to be looked at too. Andrei
Nov 27 2008
parent reply Derek Parnell <derek psych.ward> writes:
On Thu, 27 Nov 2008 16:23:12 -0600, Andrei Alexandrescu wrote:

 Derek Parnell wrote:
 On Tue, 25 Nov 2008 09:59:01 -0600, Andrei Alexandrescu wrote:
 
 D pursues compatibility with C and C++ in the following manner: if a 
 code snippet compiles in both C and D or C++ and D, then it should have 
 the same semantics.
Interesting ... but I don't think that this should be the principle employed. If code is 'naughty' in C/C++ then D should not also produce the same results. I would propose that a better principle to be used would be that the compiler will not allow loss or distortion of information without the coder/reader being made aware of it.
These two principles are not necessarily at odds with each other. The idea of being compatible with C and C++ is simple: if I paste a C function from somewhere into a D module, the function should either not compile, or compile and run with the same result. I think that's quite reasonable. So if the C code is behaving naughtily, D doesn't need to also behave naughtily. It should just not compile.
I think we are saying the same thing. If the C code compiles AND if it has the potential to lose information then the D compiler should not compile it *if* the coder has not given explicit permission to the compiler to do so.
 In those cases where the target type is not explicitly coded, such as using
 'auto' or as a temporary value in an expression, the compiler should assume
 a signed type that is 'one step' larger than the 'unsigned' element in the
 expression.
 
 e.g.
    auto x = int * uint; ==> 'x' is long.
I don't think this will fly with Walter.
And that there is our single point of failure.
 If this causes code to be incompatible to C/C++, then it implies that the
 C/C++ code was poor (i.e. potential information loss) in the first place
 and deserves to be fixed up.
I don't quite think so. As long as the values are within range, the multiplication is legit and efficient.
Of course. *If* the compiler can determine that the result will not lose information when being used, then it is fine. However, that is not always going to be the case.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
Nov 27 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Derek Parnell wrote:
 On Thu, 27 Nov 2008 16:23:12 -0600, Andrei Alexandrescu wrote:
 
 Derek Parnell wrote:
 On Tue, 25 Nov 2008 09:59:01 -0600, Andrei Alexandrescu wrote:

 D pursues compatibility with C and C++ in the following manner: if a 
 code snippet compiles in both C and D or C++ and D, then it should have 
 the same semantics.
Interesting ... but I don't think that this should be the principle employed. If code is 'naughty' in C/C++ then D should not also produce the same results. I would propose that a better principle to be used would be that the compiler will not allow loss or distortion of information without the coder/reader being made aware of it.
These two principles are not necessarily at odds with each other. The idea of being compatible with C and C++ is simple: if I paste a C function from somewhere into a D module, the function should either not compile, or compile and run with the same result. I think that's quite reasonable. So if the C code is behaving naughtily, D doesn't need to also behave naughtily. It should just not compile.
I think we are saying the same thing. If the C code compiles AND if it has the potential to lose information then the D compiler should not compile it *if* the coder has not given explicit permission to the compiler to do so.
Oh, sorry. Yes, absolutely!
 In those cases where the target type is not explicitly coded, such as using
 'auto' or as a temporary value in an expression, the compiler should assume
 a signed type that is 'one step' larger than the 'unsigned' element in the
 expression.

 e.g.
    auto x = int * uint; ==> 'x' is long.
I don't think this will fly with Walter.
And that there is our single point of failure.
 If this causes code to be incompatible to C/C++, then it implies that the
 C/C++ code was poor (i.e. potential information loss) in the first place
 and deserves to be fixed up.
I don't quite think so. As long as the values are within range, the multiplication is legit and efficient.
Of course. *If* the compiler can determine that the result will not lose information when being used, then it is fine. However, that is not always going to be the case.
Well, here are two objectives at odds with each other. One is the systems-y, level-y aspect: on 32-bit systems there is a 32-bit multiplication operation that ought to be mapped to naturally by the 32-bit D primitive. I think there is some good reason to expect that.

Then there's also the argument you're making - and with which I agree - that 32-bit multiplication really yields a 64-bit value, so the type of the result should be long. But if we really start down that path, infinite-precision integrals are the only solution, because when you multiply two longs, you'd need something even longer, and so on.

Anyhow, the ultimate reality is: we won't be able to satisfy every objective we have. We'll need to strike a good compromise.

Andrei
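The two positions in code form (a minimal sketch):

void demo(int a, int b)
{
    int  narrow = a * b;          // today's rule: one 32-bit multiply, may wrap
    long wide = cast(long) a * b; // the widened result argued for above; no overflow
}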
Nov 27 2008
parent bearophile <bearophileHUGS lycos.com> writes:
Some of the purposes of a good arithmetic are:
- To give the system programmer freedom, essentially to use all the speed and
flexibility of the CPU instructions.
- To allow fast-running code, which means having ways to specify 32- or 64-bit
operations in a short way.
- To allow programs that aren't bug-prone, both with compile-time safeguards and,
where those aren't enough, with run-time ones (array bounds, arithmetic overflow
among non-long types, etc.).
- To allow more flexibility, coming from certain usages of multi-precision
integers.
- Good CommonLisp implementations are supposed to allow both fast code
(fixnums) and safe/multiprecision integers (and even untagged fixnums).


Andrei Alexandrescu:
 But if we really start down that path, infinite-precision integrals are 
 the only solution. Because when you multiply two longs, you'd need 
 something even longer and so on.
Well, having built-in multi-precision integer values isn't bad. You then need ways to specify where you want the compiler to use fixed-length numbers, for more efficiency. Bye, bearophile
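A sketch of that mixed model, assuming a library BigInt type (Phobos later gained one in std.bigint):

import std.bigint;

void demo()
{
    BigInt big = BigInt("9223372036854775807"); // long.max, and it can keep growing
    big *= big; // never overflows, at the cost of speed

    long fast = 1_000_000_007; // fixed width where performance matters
    fast *= fast;              // one machine multiply; can wrap silently
}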
Nov 28 2008
prev sibling parent Kagamin <spam here.lot> writes:
Andrei Alexandrescu Wrote:

 Often large integers hold counts or sizes of objects fitting in computer 
 memory.
Yes, if that object is system-specific, like the size of an allocated heap chunk. Business objects don't seem to respect system constraints (they are nearly storage-agnostic). Files are a good example.
 There is a sense of completeness of a systems-level language in 
 being able to use a native type to express any offset in memory. That's 
 why it would be somewhat of a bummer if we defined size_t as int on 32-bit 
 systems: I, at least, would feel like giving something up.
Yes, giving something up always feels like giving something up. But can you rely on large numbers? I heard a story about a program that crashed on an attempt to allocate a memory chunk larger than half the address space. The allocation was expected to succeed since there was enough address space, but it turned out that one DLL happened to be relocated to the middle of the address space, so there was no contiguous memory chunk of the requested size.
Nov 28 2008