
digitalmars.D - Treating the abusive unsigned syndrome

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
D pursues compatibility with C and C++ in the following manner: if a 
code snippet compiles in both C and D or C++ and D, then it should have 
the same semantics.

A classic problem with C and C++ integer arithmetic is that any 
operation involving at least one unsigned integral operand automatically 
receives an unsigned type, regardless of how silly that actually is, 
semantically. About the only advantage of this rule is that it's simple. 
IMHO it has only disadvantages from then on.

The following operations suffer from the "abusive unsigned syndrome" (u 
is an unsigned integral, i is a signed integral):

(1) u + i, i + u
(2) u - i, i - u
(3) u - u
(4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C 
requires that these all return unsigned, ouch)
(5) u < i, i < u, u <= i etc. (all ordering comparisons)
(6) -u

Logic operations &, |, and ^ also yield unsigned, but such cases are 
less abusive because at least the operation wasn't arithmetic in the 
first place. Comparing for equality is also quite a conundrum - should 
minus two billion compare equal to 2_294_967_296? I'll ignore these for 
now and focus on (1) - (6).
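
To make the surprise concrete, here is what (1), (5), and (6) do under 
the current rules (a small example; the particular values are mine):

import std.stdio;

void main()
{
    uint u = 2;
    int i = -3;
    writeln(u + i); // (1): i converts to uint; prints 4294967295, not -1
    writeln(i < u); // (5): -3 becomes 4294967293, so this prints false
    writeln(-u);    // (6): prints 4294967294, not -2
}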

So far we haven't found a solid solution to this problem that at the 
same time allows "good" code to pass through, weeds out "bad" code, and is 
compatible with C and C++. The closest I got was to have the compiler 
define the following internal types:

__intuint
__longulong

I've called them "dual-signed integers" in the past, but let's try the 
shorter "undecided sign". Each of these is a subtype of both the signed 
and the unsigned integral in its name, e.g. __intuint is a subtype of 
both int and uint. (Originally I thought of defining __byteubyte and 
__shortushort as well but dropped them in the interest of simplicity.)

The sign-ambiguous operations (1) - (6) yield __intuint if no operand 
size was larger than 32 bits, and __longulong otherwise. Undecided sign 
types define their own operations. Let x and y be values of undecided 
sign. Then x + y, x - y, and -x also return a sign-ambiguous integral 
(the size is that of the largest operand). However, the other operators 
do not work on sign-ambiguous integrals, e.g. x / y would not compile 
because you must decide what sign x and y should have prior to invoking 
the operation. (Rationale: multiplication/division work differently 
depending on the signedness of their operands).
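
A sketch of the intended semantics, with uint u and int i (this 
illustrates the proposal; it is not what today's compiler does):

int a = -(u - i);             // fine: unary minus keeps the sign
                              // undecided; the assignment then decides it
// auto b = (u - i) / 2;      // error: '/' requires a decided sign
int c = cast(int)(u - i) / 2; // fine: the cast decides the sign first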

User code cannot define a symbol of sign-ambiguous type, e.g.

auto a = u + i;

would not compile. However, given that __intuint is a subtype of both 
int and uint, it can be freely converted to either whenever there's no 
ambiguity:

int a = u + i; // fine
uint b = u + i; // fine

The advantage of this scheme is that it weeds out many (most? all?) 
surprises and oddities caused by the abusive unsigned rule of C and C++. 
The disadvantage is that it is more complex and may surprise the novice 
in its own way by refusing to compile code that looks legit.

At the moment, we're in limbo regarding the decision to go forward with 
this. Walter, like many good long-time C programmers, knows the abusive 
unsigned rule so well he's not hurt by it and consequently has little 
incentive to see it as a problem. I have had to teach C and C++ to young 
students coming from Java introductory courses and have a more 
up-to-date perspective on the dangers. My strong belief is that we need 
to address this mess somehow, and type inference will only make it more 
painful (in the hands of a beginner, auto can be quite a dangerous tool 
for propagating wrong beliefs). I also know seasoned programmers who had 
no idea that -u compiles, and that it also oddly returns an unsigned type.

Your opinions, comments, and suggestions for improvements would as 
always be welcome.


Andrei
Nov 25 2008
next sibling parent reply "Denis Koroskin" <2korden gmail.com> writes:
On Tue, 25 Nov 2008 18:59:01 +0300, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 D pursues compatibility with C and C++ in the following manner: if a 
 code snippet compiles in both C and D or C++ and D, then it should have 
 the same semantics.

 [snip]
I think it's fine. That's the way LLVM stores the integral values 
internally, IIRC.

But what is the type of -u? If it is undecided, then the following 
should compile:

uint u = 100;
uint s = -u; // undecided implicitly convertible to unsigned
Nov 25 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Denis Koroskin wrote:
 On Tue, 25 Nov 2008 18:59:01 +0300, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 [snip]
 I think it's fine. That's the way LLVM stores the integral values
 internally, IIRC.

 But what is the type of -u? If it is undecided, then the following
 should compile:

 uint u = 100;
 uint s = -u; // undecided implicitly convertible to unsigned
Yah, but at least you actively asked for an unsigned. Compare and 
contrast with surprises such as:

uint a = 5;
writeln(-a); // this won't print -5

Such code would be disallowed in the undecided-sign regime.

Andrei
Nov 25 2008
prev sibling next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
A few general comments.

Andrei Alexandrescu:

 D pursues compatibility with C and C++ in the following manner: if a 
 code snippet compiles in both C and D or C++ and D, then it should have 
 the same semantics.
I didn't know of such "support" for C++ syntax too; isn't such "support" 
for C syntax only? D shares very little with C++.

This rule is good because you can take a piece of C code and convert it 
to D with less work and fewer surprises. I have already translated large 
pieces of C code to D, so I appreciate this. But in several respects C 
syntax and semantics are too error-prone or "wrong", so sometimes this 
can also become a significant disadvantage for a language like D that 
tries to be much less error-prone than C.

One solution is to "disable" some of the more error-prone syntax allowed 
in C, turning it into a compilation error. For example, I have seen 
newbies write bugs caused by leaving & where a && was necessary. In such 
a case, just adopting "and" and making "&&" a syntax error solves the 
problem and doesn't lead to bugs when you convert C code to D (you just 
use a search&replace, replacing && with and in the code).

In other situations it may be less easy to find such solutions (that is, 
to invent an alternative syntax/semantics and make the C one a syntax 
error); in those cases I think it's better to discuss each such 
situation independently. In some situations we can even break the 
standard way D pursues compatibility, for the sake of avoiding bugs and 
making the semantics better.
 The disadvantage is that it is more complex
It's not really more complex; it just makes visible some hidden 
complexity that is already present, inherent in the signed/unsigned 
nature of the numbers. It also follows the Python Zen rule: "In the face 
of ambiguity, refuse the temptation to guess."
 and may surprise the novice 
 in its own way by refusing to compile code that looks legit.
A compile error is better than a potential runtime bug.
 Walter, as many good long-time C programmers, knows the abusive 
 unsigned rule so well he's not hurt by it and consequently has little 
 incentive to see it as a problem.
I'm not a newbie at programming, but in the last year I have put two 
bugs related to this into my code, so I suggest finding ways to avoid 
this silly situation. I think the first bug was something like:

if (arr.lenght > x) ...

where x was a signed int with value -5 (this specific bug can also be 
solved by making array length a signed value. What's the point of making 
it unsigned in the first place? I have seen that in D it's safer to use 
signed values everywhere you don't strictly need an unsigned value. And 
that length doesn't need to be unsigned).

Beside the unsigned/signed problems discussed here, it may be useful to 
list some of the other situations where the C syntax/semantics may lead 
to bugs. For example, does D fix the C semantics of the % (modulo) 
operation?

Another example: in both Pascal and Python3 there are two different 
operators for division, one for the FP one and one for the integer one 
(in Pascal they are / and div, in Python3 they are / and //). So could 
it be good for D too to define two different operators for this purpose?

Bye,
bearophile
Nov 25 2008
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
bearophile:
 if (arr.lenght > x) ...
Oh, yes :-) and writing "lenght" instead of "lenght" is a common mistake of mine, usually the code editor allows me to avoid this error because the right one becomes colored. That's why in the past I have suggested something simpler and shorter like "len" (others have suggested "size" instead, it too is acceptable to me). Bye, bearophile
Nov 25 2008
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"bearophile" wrote
 bearophile:
 if (arr.lenght > x) ...
Oh, yes :-) and writing "lenght" instead of "lenght" is a common mistake of mine
lol!!!
Nov 25 2008
parent reply bearophile <bearophileHUGS lycos.com> writes:
Steven Schveighoffer:
 lol!!! 
I know, I know... :-) But when people do errors so often, the error is 
elsewhere, in the original choice of that word to denote how many items 
an iterable has.

In my libs I have defined len() like this, that I use now and then 
(where running speed isn't essential):

long len(TyItems)(TyItems items)
{
    static if (HasLength!(TyItems))
        return items.length;
    else
    {
        long len;
        // this generates: foreach (p1, p2, p3; items) len++;
        // with a variable number of p1, p2...
        mixin("foreach (" ~ SeriesGen1!("p", ", ", OpApplyCount!(TyItems), 1) ~
              "; items) len++;");
        return len;
    }
} // End of len(items)

/// ditto
long len(TyItems, TyFun)(TyItems items, TyFun pred)
{
    static assert(IsCallable!(TyFun), "len(): predicate must be a callable");
    long len;
    static if (IsAA!(TyItems))
    {
        foreach (key, val; items)
            if (pred(key, val))
                len++;
    }
    else static if (is(typeof(TyItems.opApply)))
    {
        mixin("foreach (" ~ SeriesGen1!("p", ", ", OpApplyCount!(TyItems), 1) ~
              "; items) if (pred(" ~
              SeriesGen1!("p", ", ", OpApplyCount!(TyItems), 1) ~ ")) len++;");
    }
    else
    {
        foreach (el; items)
            if (pred(el))
                len++;
    }
    return len;
} // End of len(items, pred)

alias len!(string) strLen; /// ditto
alias len!(int[]) intLen; /// ditto
alias len!(float[]) floatLen; /// ditto

Having a global callable like len() instead of an attribute is 
(sometimes) better, because you can use it for example like this (this 
is working syntax of my dlibs):

children.sort(&len!(string));

That sorts the array of strings "children" according to the given 
callable key, that is the len of the strings.

Bye,
bearophile
Nov 25 2008
parent reply "Nick Sabalausky" <a a.a> writes:
"bearophile" <bearophileHUGS lycos.com> wrote in message 
news:gghc97$1mfo$1 digitalmars.com...
 Steven Schveighoffer:
 lol!!!
 [snip]

If we ever get extension methods, then maybe something along these lines 
would be nice:

extension typeof(T.length) len(T t)
{
    return t.length;
}
If we ever get extension methods, then maybe something along these lines would be nice: extension typeof(T.length) len(T t) { return T.length; }
Nov 25 2008
parent reply KennyTM~ <kennytm gmail.com> writes:
Nick Sabalausky wrote:
 "bearophile" <bearophileHUGS lycos.com> wrote in message 
 news:gghc97$1mfo$1 digitalmars.com...
 Steven Schveighoffer:
 lol!!!
I know, I know... :-) But when people do errors so often, the error is elsewhere, in the original choice of that word to denote how many items an iterable has. In my libs I have defined len() like this, that I use now and then (where running speed isn't essential): long len(TyItems)(TyItems items) { static if (HasLength!(TyItems)) return items.length; else { long len; // this generates: foreach (p1, p2, p3; items) len++; with a variable number of p1, p2... mixin("foreach (" ~ SeriesGen1!("p", ", ", OpApplyCount!(TyItems), 1) ~ "; items) len++;"); return len; } } // End of len(items) /// ditto long len(TyItems, TyFun)(TyItems items, TyFun pred) { static assert(IsCallable!(TyFun), "len(): predicate must be a callable"); long len; static if (IsAA!(TyItems)) { foreach (key, val; items) if (pred(key, val)) len++; } else static if (is(typeof(TyItems.opApply))) { mixin("foreach (" ~ SeriesGen1!("p", ", ", OpApplyCount!(TyItems), 1) ~ "; items) if (pred(" ~ SeriesGen1!("p", ", ", OpApplyCount!(TyItems), 1) ~ ")) len++;"); } else { foreach (el; items) if (pred(el)) len++; } return len; } // End of len(items, pred) alias len!(string) strLen; /// ditto alias len!(int[]) intLen; /// ditto alias len!(float[]) floatLen; /// ditto Having a global callable like len() instead of an attribute is (sometimes) better, because you can use it for example like this (this is working syntax of my dlibs): children.sort(&len!(string)); That sorts the array of strings "children" according to the given callable key, that is the len of the strings.
If we ever get extension methods, then maybe something along these lines would be nice: extension typeof(T.length) len(T t) { return T.length; }
Already works:

uint len(A) (in A x) { return x.length; }
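
Since the parameter is an array, the call can also use the array 
property syntax Nick mentions below (a hypothetical usage, assuming the 
template instantiates through that syntax):

int[] arr = [1, 2, 3];
assert(arr.len == 3); // same as len(arr)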
Nov 25 2008
parent reply "Nick Sabalausky" <a a.a> writes:
"KennyTM~" <kennytm gmail.com> wrote in message 
news:ggipu6$26mr$1 digitalmars.com...
 Nick Sabalausky wrote:
 "bearophile" <bearophileHUGS lycos.com> wrote in message 
 news:gghc97$1mfo$1 digitalmars.com...
 Steven Schveighoffer:
 lol!!!
I know, I know... :-) But when people do errors so often, the error is elsewhere, in the original choice of that word to denote how many items an iterable has. In my libs I have defined len() like this, that I use now and then (where running speed isn't essential): long len(TyItems)(TyItems items) { static if (HasLength!(TyItems)) return items.length; else { long len; // this generates: foreach (p1, p2, p3; items) len++; with a variable number of p1, p2... mixin("foreach (" ~ SeriesGen1!("p", ", ", OpApplyCount!(TyItems), 1) ~ "; items) len++;"); return len; } } // End of len(items) /// ditto long len(TyItems, TyFun)(TyItems items, TyFun pred) { static assert(IsCallable!(TyFun), "len(): predicate must be a callable"); long len; static if (IsAA!(TyItems)) { foreach (key, val; items) if (pred(key, val)) len++; } else static if (is(typeof(TyItems.opApply))) { mixin("foreach (" ~ SeriesGen1!("p", ", ", OpApplyCount!(TyItems), 1) ~ "; items) if (pred(" ~ SeriesGen1!("p", ", ", OpApplyCount!(TyItems), 1) ~ ")) len++;"); } else { foreach (el; items) if (pred(el)) len++; } return len; } // End of len(items, pred) alias len!(string) strLen; /// ditto alias len!(int[]) intLen; /// ditto alias len!(float[]) floatLen; /// ditto Having a global callable like len() instead of an attribute is (sometimes) better, because you can use it for example like this (this is working syntax of my dlibs): children.sort(&len!(string)); That sorts the array of strings "children" according to the given callable key, that is the len of the strings.
If we ever get extension methods, then maybe something along these lines would be nice: extension typeof(T.length) len(T t) { return T.length; }
 Already works:

 uint len(A) (in A x) { return x.length; }
Oh, right. For some stupid reason I was forgetting that the param would always be an array and therefore be eligible for the existing array property syntax (and that .length always returns a uint).
Nov 26 2008
parent reply bearophile <bearophileHUGS lycos.com> writes:
Nick Sabalausky:
 Oh, right. For some stupid reason I was forgetting that the param would 
 always be an array and therefore be eligible for the existing array property 
 syntax (and that .length always returns a uint).
From the len() code I have posted you can see there are other places 
where you want to use len(), in particular to count the number of items 
that a lazy generator (opApply for now) yields.

Bye,
bearophile
Nov 26 2008
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Nick Sabalausky:
 Oh, right. For some stupid reason I was forgetting that the param would 
 always be an array and therefore be eligible for the existing array property 
 syntax (and that .length always returns a uint).
 From the len() code I have posted you can see there are other places
 where you want to use len(), in particular to count the number of items
 that a lazy generator (opApply for now) yields.

 Bye,
 bearophile
I'm rather wary of a short and suggestive name that embodies a linear 
operation. I recall there was a discussion about that a while ago in 
this newsgroup. I'd rather call it linearLength or something that 
suggests it's a best-effort function that may take O(n).

Andrei
Nov 26 2008
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 I'm rather wary of a short and suggestive name that embodies a linear 
 operation. I recall there was a discussion about that a while ago in 
 this newsgroup. I'd rather call it linearLength or something that 
 suggests it's a best-effort function that may take O(n).
I remember parts of that discussion, and I like your general rule; I 
agree that generally it's better to give the programmer a hint of the 
complexity of a specific operation, for example a method of a 
user-defined class, etc.

But len() is supposed to be used very often, so it's better to keep it 
short, because if you don't have an IDE it's not nice to type 
linearLength() once every two lines of code. Being used so often also 
implies that you remember how it works, so you are supposed to be able 
to remember it can be O(n) on lazy iterators.

So in this specific case I think it's acceptable to break your general 
rule, for practical reasons.

Bye,
bearophile
Nov 26 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Andrei Alexandrescu:
 I'm rather wary of a short and suggestive name that embodies a linear 
 operation. I recall there was a discussion about that a while ago in 
 this newsgroup. I'd rather call it linearLength or something that 
 suggests it's a best-effort function that may take O(n).
 But len() is supposed to be used very often, so it's better to keep it
 short [...]
If it's used often it shouldn't have linear complexity :o).

Andrei
Nov 26 2008
prev sibling parent Christopher Wright <dhasenan gmail.com> writes:
Andrei Alexandrescu wrote:
 [snip]
 I'm rather wary of a short and suggestive name that embodies a linear
 operation. I recall there was a discussion about that a while ago in
 this newsgroup. I'd rather call it linearLength or something that
 suggests it's a best-effort function that may take O(n).
My personal rules of optimization:
- I don't know what's slow.
- I don't know what's called often enough to be worth speeding up.
- Most of the time, my data sets are small.

If getting the length of an array were a linear operation, that wouldn't 
much affect any of my code. Most of my arrays are probably no larger 
than twenty elements, and I don't often need to get their lengths.

If I need to change data structures for better performance, I'd like to 
be able to replace them (or switch to generators) without undue effort. 
Things like changing function names according to the algorithmic 
complexity of the implementation just hurt.
 Andrei
Nov 26 2008
prev sibling parent Kagamin <spam here.lot> writes:
bearophile Wrote:

 From the len() code I have posted you can see there are other places where you
want to use len(), in particular to count the number of items that a lazy
generator (opApply for now) yields.
hmm...

import std.stdio, std.algorithm;

void main()
{
    bool pred(int x) { return x > 2; }
    auto counter = (int count, int x){ return pred(x) ? count + 1 : count; };
    int[] a = [0, 1, 2, 3, 4];
    auto lazylen = reduce!(counter)(0, a);
    writeln(lazylen); // 2
}
Nov 26 2008
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Walter, as many good long-time C programmers, knows the abusive 
 unsigned rule so well he's not hurt by it and consequently has
 little incentive to see it as a problem.
 I'm not a newbie at programming, but in the last year I have put two
 bugs related to this into my code, so I suggest finding ways to avoid
 this silly situation. I think the first bug was something like:

 if (arr.lenght > x) ...
 where x was a signed int with value -5 (this specific bug can also be
 solved by making array length a signed value. What's the point of making
 it unsigned in the first place? I have seen that in D it's safer to
 use signed values everywhere you don't strictly need an unsigned
 value. And that length doesn't need to be unsigned).
It's worthwhile keeping length an unsigned type if we can convincingly 
sell unsigned types as models of natural numbers. With the current 
rules, we can't make a convincing argument. But if we do manage to 
improve the rules, then we'll all be better off.

Andrei
Nov 25 2008
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
I remembered a couple more details. The names bits8, bits16, bits32, and 
bits64 were a possible choice for undecided-sign integrals. Walter and I 
liked that quite a bit. Walter also suggested that we make those actually 
full types accessible to programmers. We both were concerned that they'd 
add to the already large panoply of integral types in D. Dropping bits8 
and bits16 would reduce bloating at the cost of consistency.

So we're contemplating:

(a) Add bits8, bits16, bits32, bits64 public types.
(b) Add bits32, bits64 public types.
(c) Add bits8, bits16, bits32, bits64 compiler-internal types.
(d) Add bits32, bits64 compiler-internal types.

Make your pick or add more choices!


Andrei
Nov 25 2008
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Andrei Alexandrescu" wrote
 I remembered a couple more details. [snip]

 (a) Add bits8, bits16, bits32, bits64 public types.
 (b) Add bits32, bits64 public types.
 (c) Add bits8, bits16, bits32, bits64 compiler-internal types.
 (d) Add bits32, bits64 compiler-internal types.

 Make your pick or add more choices!
One other thing to contemplate: what happens if you add a bits32 to a 
bits64, long, or ulong value? This needs to be illegal, since you don't 
know whether to sign-extend or not. Or you could reinterpret the 
expression to promote the original types to 64 bits first? This makes 
the version with 8- and 16-bit types less attractive.

Another alternative is to select the bits type based on the entire 
expression. Of course, you'd have to disallow them as public types. And 
you'd want to do some special optimizations. You could represent it 
conceptually as calculating for all the bits types until the one that is 
decided is used, and then the compiler can optimize out the unused ones, 
which would at least keep it context-free.

-Steve
Nov 25 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 [snip]
 One other thing to contemplate: what happens if you add a bits32 to a
 bits64, long, or ulong value? This needs to be illegal, since you don't
 know whether to sign-extend or not. Or you could reinterpret the
 expression to promote the original types to 64 bits first?
Good point. There's no (or not much) arithmetic mixing bits32 and some 64-bit integral because it's unclear whether extending the bits32 operand should extend the sign bit or not.
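
Concretely, plain D today already exhibits both possible widenings; 
which one a bits32 operand should receive is exactly the ambiguity 
(example values are mine):

int si = cast(int)0xFFFF_FFFF; // all bits set, read as signed
uint ui = 0xFFFF_FFFF;         // same bit pattern, read as unsigned
long a = si; // sign-extends: a == -1
long b = ui; // zero-extends: b == 4294967295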
 This makes the version with 8 and 16 bit types less attractive.
 
 Another alternative is to select the bits type based on the entire 
 expression.  Of course, you'd have to disallow them as public types.  And 
 you'd want to do some special optimizations.  You could represent it 
 conceptually as calculating for all the bits types until the one that is 
 decided is used, and then the compiler can optimize out the unused ones, 
 which would at least keep it context-free.
 
 -Steve 
That's the intent of defining arithmetic on sign-ambiguous values: the 
type information propagates through a complex expression. I haven't 
heard of typechecking on entire expression patterns, and I think it 
would be a rather unclean technique (it means either that there are 
values whose type you can't tell, or that a given value has a 
context-dependent type).

Andrei
Nov 25 2008
prev sibling parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Tue, 25 Nov 2008 11:06:32 -0600, Andrei Alexandrescu wrote:

 [snip]

 Make your pick or add more choices!
I'll add more. :)

The problem with signed/unsigned types is that neither int nor uint is a 
sub-type of the other. They're essentially incompatible. Therefore a 
possible solution is:

1. Disallow implicit signed <=> unsigned conversion.

2. For those willing to port large C/C++ codebases, introduce a compiler 
compatibility switch which would add global operators mimicking the C 
behavior:

uint opAdd(int, uint)
uint opAdd(uint, int)
ulong opAdd(long, ulong)
etc.

This way you can even implement compatibility levels: only C-style 
additions, or additions with multiplications, or complete compatibility 
including the original signed/unsigned comparison behavior.
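
To illustrate the intended effect of the switch (a sketch; these opAdd 
globals are the proposal above, not anything D currently declares):

uint u = 1;
int i = -1;
uint r = u + i; // with the switch: resolves to uint opAdd(uint, int)
                // and reproduces the C result (here, 0)
                // without the switch: a compile-time error, because
                // implicit signed <=> unsigned mixing is disallowed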
Nov 25 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sergey Gromov wrote:
 Tue, 25 Nov 2008 11:06:32 -0600, Andrei Alexandrescu wrote:
 
 [snip]
 I'll add more. :)

 The problem with signed/unsigned types is that neither int nor uint is
 a sub-type of the other. They're essentially incompatible. Therefore a
 possible solution is:

 1. Disallow implicit signed <=> unsigned conversion.
I forgot to mention that that's implied in the bitsNN approach too.
 2.  For those willing to port large C/C++ codebases introduce a compiler
 compatibility switch which would add global operators mimicking the C
 behavior:
 
 uint opAdd(int, uint)
 uint opAdd(uint, int)
 ulong opAdd(long, ulong)
 etc.
Having semantics depend so heavily and confusingly on a compiler switch is extremely dangerous. Note that actually quite a lot of code will compile, with different semantics, with or without the switch.
 This way you can even implement compatibility levels: only C-style
 additions, or additions with multiplications, or complete compatibility
 including the original signed/unsigned comparison behavior.
I don't think we can pursue such a path.

Andrei
Nov 25 2008
parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Tue, 25 Nov 2008 15:49:23 -0600, Andrei Alexandrescu wrote:

 Sergey Gromov wrote:
 2.  For those willing to port large C/C++ codebases introduce a compiler
 compatibility switch which would add global operators mimicking the C
 behavior:
 
 uint opAdd(int, uint)
 uint opAdd(uint, int)
 ulong opAdd(long, ulong)
 etc.
 Having semantics depend so heavily and confusingly on a compiler switch
 is extremely dangerous. Note that actually quite a lot of code will
 compile, with different semantics, with or without the switch.
One of us should be missing something. There was no 'different semantics' in my proposal. The code either compiles and behaves exactly like in C or does not compile at all. The amount of code which compiles or fails depends on a compiler switch, not semantics.
Nov 25 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sergey Gromov wrote:
 Tue, 25 Nov 2008 15:49:23 -0600, Andrei Alexandrescu wrote:
 
 Sergey Gromov wrote:
 [snip]
 Having semantics depend so heavily and confusingly on a compiler switch
 is extremely dangerous. Note that actually quite a lot of code will
 compile, with different semantics, with or without the switch.
 One of us should be missing something. There was no 'different
 semantics' in my proposal. The code either compiles and behaves exactly
 like in C or does not compile at all. The amount of code which compiles
 or fails depends on a compiler switch, not semantics.
Sorry, I misunderstood.

Andrei
Nov 25 2008
prev sibling next sibling parent reply Russell Lewis <webmaster villagersonline.com> writes:
I'm of the opinion that we should make mixed-sign operations a 
compile-time error.  I know that it would be annoying in some 
situations, but IMHO it gives you clearer, more reliable code.

IMHO, it's a mistake to have implicit casts that lose information.


Want to hear a funny/sad, but somewhat related story?  I was chasing 
down a segfault recently at work.  I hunted and hunted, and finally 
found out that the pointer returned from malloc() was bad.  I figured 
that I was overwriting the heap, right?  So I added tracing and 
debugging everywhere...no luck.

I finally, in desperation, included <stdlib.h> in the source file (there 
was a warning about malloc() not being prototyped)...and the segfaults 
vanished!!!

The problem was that the xlc compiler, when it doesn't have the 
prototype for a function, assumes that it returns int...but int is 32 
bits.  Moreover, the compiler was happily implicitly casting that int to 
a pointer...which was 64 bits.

The compiler was silently cropping the top 32 bits off my pointers.

And it all was a "feature" to make programming "easier."


Russ

Andrei Alexandrescu wrote:
 [snip]
Nov 25 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
(You may want to check your system's date, unless of course you traveled 
in time.)

Russell Lewis wrote:
 I'm of the opinion that we should make mixed-sign operations a 
 compile-time error.  I know that it would be annoying in some 
 situations, but IMHO it gives you clearer, more reliable code.
The problem is, it's much more annoying than one might imagine. Even array.length - 1 is up for scrutiny. Technically, even array.length + 1 is a problem because 1 is really a signed int. We could provide exceptions for constants, but exceptions are generally not solving the core issue.
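
The classic instance of that scrutiny, in today's D (current behavior, 
nothing proposed):

int[] a;               // empty array
auto n = a.length - 1; // compiles; wraps around instead of yielding -1
assert(n == typeof(n).max);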
 IMHO, it's a mistake to have implicit casts that lose information.
Hear, hear.
 Want to hear a funny/sad, but somewhat related story?  I was chasing 
 down a segfault recently at work.  I hunted and hunted, and finally 
 found out that the pointer returned from malloc() was bad.  I figured 
 that I was overwriting the heap, right?  So I added tracing and 
 debugging everywhere...no luck.
 
 I finally, in desperation, included <stdlib.h> to the source file (there 
 was a warning about malloc() not being prototyped)...and the segfaults 
 vanished!!!
 
 The problem was that the xlc compiler, when it doesn't have the 
 prototype for a function, assumes that it returns int...but int is 32 
 bits.  Moreover, the compiler was happily implicitly casting that int to 
 a pointer...which was 64 bits.
 
 The compiler was silently cropping the top 32 bits off my pointers.
 
 And it all was a "feature" to make programming "easier."
Good story for reminding ourselves of the advantages of type safety!

Andrei
Nov 25 2008
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 The problem is, it's much more annoying than one might imagine. Even 
 array.length - 1 is up for scrutiny. Technically, even array.length + 1 
 is a problem because 1 is really a signed int. We could provide 
 exceptions for constants, but exceptions are generally not solving the 
 core issue.
That can be solved by making array.length signed.

Can you list a few other annoying situations?

Bye,
bearophile
Nov 25 2008
next sibling parent reply "Nick Sabalausky" <a a.a> writes:
"bearophile" <bearophileHUGS lycos.com> wrote in message 
news:gghsa1$2u0c$1 digitalmars.com...
 Andrei Alexandrescu:
 The problem is, it's much more annoying than one might imagine. Even
 array.length - 1 is up for scrutiny. Technically, even array.length + 1
 is a problem because 1 is really a signed int. We could provide
 exceptions for constants, but exceptions are generally not solving the
 core issue.
 That can be solved by making array.length signed. Can you list a few
 other annoying situations?
I disagree. If you start using that as a solution, then you may as well 
eliminate unsigned values entirely.

I think the root problem with disallowing mixed-sign operations is that 
math just doesn't work that way. What I mean by that is, disallowing 
mixed-sign operations implies that we have these nice cleanly separated 
worlds of "signed math" and "unsigned math". But depending on the 
operator, the signs/ordering of the operands, and what the operands 
actually represent, math has a tendency to switch back and forth between 
the signed ("can be negative") and unsigned ("can't be negative") 
worlds. So if we have a type system that forces us to jump through hoops 
every time that world-switch happens, and we then decide that it's 
justifiable to say "well, let's fix it for array.length by tossing that 
over to the 'can be negative' world, even though it cuts our range of 
allowable values in half", then there's nothing stopping us from solving 
the rest of the cases by throwing them over the "can be negative" wall 
as well. All of a sudden, we have no unsigned.

Just a thought: maybe some sort of built-in "units" system could help 
here, as sketched below? Instead of just making array.length "signed" or 
"unsigned" and leaving it at that, add a "units system" and tag 
array.length as being a length, with length tags carrying the 
connotation that negative is disallowed. Adding/subtracting a pure 
constant to a length would cause the constant to be automatically tagged 
as a "length delta" (which can be negative). And the units system would, 
of course, contain the rule that a length delta added/subtracted from a 
length results in a length. The units system could then translate all of 
that into "signed vs unsigned".
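
A rough sketch of that tagging idea, as plain structs in today's D 
(Length and LengthDelta are made-up names; a real units system would be 
built into the type checker rather than spelled out like this):

struct LengthDelta { int value; }

struct Length
{
    uint value;

    Length opAdd(LengthDelta d)
    {
        // a delta may be negative; range checking is elided here
        return Length(cast(uint)(cast(int)value + d.value));
    }
}

// usage: Length(10) + LengthDelta(-3) yields Length(7)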
Nov 25 2008
parent Kagamin <spam here.lot> writes:
Nick Sabalausky Wrote:

 happens, and we then decide that it's justifiable to say "well, let's fix it 
 for array.length by tossing that over to the 'can be negative' world, even 
 though it cuts our range of allowable values in half", then there's nothing 
 stopping us from solving the rest of the cases by throwing them over the 
 "can be negative" wall as well. All of a sudden, we have no unsigned.
Well... cutting the range can be no problem; after all, a thought was 
floating around that structs shouldn't be larger than a couple of KB. 
Note that an array of shorts with a signed length spans the entire 
32-bit address space.
Nov 26 2008
prev sibling next sibling parent Daniel de Kok <daniel nowhere.nospam> writes:
On Tue, 25 Nov 2008 16:56:17 -0500, bearophile wrote:
 Andrei Alexandrescu:
 The problem is, it's much more annoying than one might imagine. Even
 array.length - 1 is up for scrutiny. Technically, even array.length + 1
 is a problem because 1 is really a signed int. We could provide
 exceptions for constants, but exceptions are generally not solving the
 core issue.
 That can be solved by making array.length signed.
Is that conceptually clean/clear? (If so, I'd like to request an array 
of length -1.)

I like Andrei's proposal because it keeps clarity in such cases: sizes 
are non-negative quantities. Once you start subtracting ints, it's 
possibly not a size anymore; in such cases you want the user to decide 
explicitly.

-- Daniel
Nov 25 2008
prev sibling parent Ary Borenszweig <ary esperanto.org.ar> writes:
bearophile wrote:
 Andrei Alexandrescu:
 The problem is, it's much more annoying than one might imagine. Even 
 array.length - 1 is up for scrutiny. Technically, even array.length + 1 
 is a problem because 1 is really a signed int. We could provide 
 exceptions for constants, but exceptions are generally not solving the 
 core issue.
 That can be solved by making array.length signed.
In C#, even though there are unsigned types, the length of a list, 
array, etc. is always int. In this way, they prevented the bugs and 
problems everyone mentions here.
Nov 26 2008
prev sibling next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 (You may want to check your system's date, unless of course you traveled
 in time.)
 Russell Lewis wrote:
 I'm of the opinion that we should make mixed-sign operations a
 compile-time error.  I know that it would be annoying in some
 situations, but IMHO it gives you clearer, more reliable code.
 The problem is, it's much more annoying than one might imagine. Even
 array.length - 1 is up for scrutiny. Technically, even array.length + 1
 is a problem because 1 is really a signed int. We could provide
 exceptions for constants, but exceptions are generally not solving the
 core issue.
Perhaps not, but the fact that constants are signed integers has been 
mentioned as a problem before. Would making these polysemous values help 
at all? That seems to be what your proposal is effectively trying to do 
anyway.

Sean
Nov 25 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sean Kelly wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 (You may want to check your system's date, unless of course you traveled
 in time.)
 Russell Lewis wrote:
 I'm of the opinion that we should make mixed-sign operations a
 compile-time error.  I know that it would be annoying in some
 situations, but IMHO it gives you clearer, more reliable code.
 The problem is, it's much more annoying than one might imagine. Even
 array.length - 1 is up for scrutiny. Technically, even array.length + 1
 is a problem because 1 is really a signed int. We could provide
 exceptions for constants, but exceptions are generally not solving the
 core issue.
 Perhaps not, but the fact that constants are signed integers has been
 mentioned as a problem before. Would making these polysemous values
 help at all? That seems to be what your proposal is effectively trying
 to do anyway.
Well, with constants we can do many tricks; I mentioned an extreme 
example. Polysemy does indeed help, but my latest design (described in 
the post starting this thread) gets away with simple subtyping. I like 
polysemy (the name is really cool :o)) but I don't want to be 
concept-heavy: if a classic technique works, I'd use that and save 
polysemy for a tougher task that cannot be comfortably tackled with 
existing means.

Andrei
Nov 25 2008
prev sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2008-11-25 16:39:05 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Russell Lewis wrote:
 I'm of the opinion that we should make mixed-sign operations a 
 compile-time error.  I know that it would be annoying in some 
 situations, but IMHO it gives you clearer, more reliable code.
The problem is, it's much more annoying than one might imagine. Even array.length - 1 is up for scrutiny. Technically, even array.length + 1 is a problem because 1 is really a signed int. We could provide exceptions for constants, but exceptions are generally not solving the core issue.
Then the problem is that integer literals are of a specific type. Just 
make them polysemous and the problem is solved.

I'm with Russell on this one. To me, a literal value (123, -8, 0) is not 
an int, not even a constant: it's just a number which doesn't imply any 
type at all until you place it into a variable (or a constant, or an 
enum, etc.).

And if you're afraid the word polysemous will scare people, don't say 
the word and call it an "integer literal". Polysemy in this case is just 
a mechanism used by the compiler to make the value work as expected with 
all integral types. All you really need is a type implicitly castable to 
everything capable of holding the numerical value (much like your 
__intuint).

I'd make "auto x = 1" create a signed integer variable for the sake of 
simplicity.

And all this would also make "uint x = -1" illegal... but then you can 
easily use "uint x = uint.max" if you want to enable all the bits. It's 
easier than in C: you don't have to include the right header and 
remember the name of a constant.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Nov 26 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 On 2008-11-25 16:39:05 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 Russell Lewis wrote:
 I'm of the opinion that we should make mixed-sign operations a 
 compile-time error.  I know that it would be annoying in some 
 situations, but IMHO it gives you clearer, more reliable code.
The problem is, it's much more annoying than one might imagine. Even array.length - 1 is up for scrutiny. Technically, even array.length + 1 is a problem because 1 is really a signed int. We could provide exceptions for constants, but exceptions generally don't solve the core issue.
Then the problem is that integer literals are of a specific type. Just make them polysemous and the problem is solved.
Well that at best takes care of _some_ operations involving constants, but for example does not quite take care of array.length - 1. I am now sorry I gave the silly example of array.length + 1. Many people latched onto it and thought that solving that solves the whole problem. That's not quite the case.

Also consider:

auto delta = a1.length - a2.length;

What should the type of delta be? Well, it depends. In my scheme that wouldn't even compile, which I think is a good thing; you must decide whether prior information makes it an unsigned or a signed integral.
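For illustration, the same silent wraparound applies to the delta example above; a minimal sketch (array sizes are arbitrary):

import std.stdio;

void main()
{
    auto a1 = new int[3];
    auto a2 = new int[5];
    auto delta = a1.length - a2.length; // unsigned today: wraps instead of -2
    writefln("%s", delta);              // prints a huge positive number
}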
 I'm with Russell on this one. To me, a literal value (123, -8, 0) is not 
 an int, not even a constant: it's just a number which doesn't imply any 
 type at all until you place it into a variable (or a constant, or an 
 enum, etc.).

 And if you're afraid the word polysemous will scare people, don't say 
 the word and call it an "integer literal". Polysemy in this case is just 
 a mechanism used by the compiler to make the value work as expected with 
 all integral types. All you really need is a type implicitly castable to 
 everything capable of holding the numerical value (much like your 
 __intuint).
 
 I'd make "auto x = 1" create a signed integer variable for the sake of 
 simplicity.
That can be formalized by having polysemous types have a "lemma", a default type.
 And all this would also make "uint x = -1" illegal... but then you can 
 easily use "uint x = uint.max" if you want to enable all the bits. It's 
 easier than in C: you don't have to include the right header and remember 
 the name of a constant.
Fine. With constants there is some mileage that can be squeezed. But let's keep in mind that that doesn't solve the larger issue. Andrei
Nov 26 2008
next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2008-11-26 10:24:17 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Well that at best takes care of _some_ operations involving constants, 
 but for example does not quite take care of array.length - 1.
How does it not solve the problem? array.length is of type uint, 1 is polysemous (byte, ubyte, short, ushort, int, uint, long, ulong). Only "uint - uint" is acceptable, and its result is "uint".
 Also consider:
 
 auto delta = a1.length - a2.length;
 
 What should the type of delta be? Well, it depends. In my scheme that 
 wouldn't even compile, which I think is a good thing; you must decide 
 whether prior information makes it an unsigned or a signed integral.
In my scheme it would give you a uint. You'd have to cast to get a signed integer... I see how it's not ideal, but I can't imagine how it could be coherent otherwise.

auto diff = cast(int)a1.length - cast(int)a2.length;

By casting explicitly, you indicate in the code that if a1.length or a2.length contain numbers which are too big to be represented as int, you'll get garbage. In this case, it'd be pretty surprising to get that problem. In other cases it may not be so clear-cut.

Perhaps we could add a "sign" property to uint and an "unsign" property to int that'd give you the signed or unsigned corresponding value and which could do range checking at runtime (enabled by a compiler flag).

auto diff = a1.length.sign - a2.length.sign;

And for the general problem of "uint - uint" giving a result below uint.min, as I said in my other post, that could be handled by a runtime check (enabled by a compiler flag) just like array bound checking.

One last thing. I think that in general it's a much better habit to change the type to signed prior to doing the subtraction. It may be harmless in the case of a subtraction, but as you said when starting the thread, it isn't for others (multiply, divide, modulo). I think the scheme above promotes this good habit by making it easier to change the type at the operands rather than at the result.
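A rough library-level sketch of the suggested checks, written as ordinary functions since the .sign/.unsign property syntax is hypothetical (the function names, the exception type, and the assumption of 32-bit lengths are all made up for illustration):

int sign(uint v)
{
    // the runtime range check that the proposed compiler flag would enable
    if (v > cast(uint) int.max)
        throw new Exception("value out of range for int");
    return cast(int) v;
}

uint unsign(int v)
{
    if (v < 0)
        throw new Exception("negative value used as uint");
    return cast(uint) v;
}

void main()
{
    uint len1 = 3, len2 = 5;
    auto diff = sign(len1) - sign(len2); // int: -2, as intended
}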
 I'd make "auto x = 1" create a signed integer variable for the sake of 
 simplicity.
That can be formalized by having polysemous types have a "lemma", a default type.
That's indeed what I'm suggesting.
 And all this would also make "uint x = -1" illegal... but then you can 
 easily use "uint x = uint.max" if you want to enable all the bits. It's 
 easier than in C: you don't have to include the right header and remember 
 the name of a constant.
Fine. With constants there is some mileage that can be squeezed. But let's keep in mind that that doesn't solve the larger issue.
Well, by making implicit conversions between uint and int illegal, we're solving the larger issue. Just not in a seamless manner.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Nov 26 2008
parent reply Don <nospam nospam.com> writes:
Michel Fortin wrote:
 On 2008-11-26 10:24:17 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 Also consider:

 auto delta = a1.length - a2.length;

 What should the type of delta be? Well, it depends. In my scheme that 
 wouldn't even compile, which I think is a good thing; you must decide 
 whether prior information makes it an unsigned or a signed integral.
In my scheme it would give you a uint. You'd have to cast to get a signed integer... I see how it's not ideal, but I can't imagine how it could be coherent otherwise.

auto diff = cast(int)a1.length - cast(int)a2.length;
Actually, there's no solution. Imagine a 32-bit system, where one object can be greater than 2GB in size (not possible in Windows AFAIK, but theoretically possible). Then if a1 is 3GB, delta cannot be stored in an int. If a2 is 3GB, it requires an int for storage, since the result is less than 0.

==> I think length has to be an int. It's less bad than uint.
 Perhaps we could add a "sign" property to uint and an "unsign" property 
 to int that'd give you the signed or unsigned corresponding value and 
 which could do range checking at runtime (enabled by a compiler flag).
 
     auto diff = a1.length.sign - a2.length.sign;
 
 And for the general problem of "uint - uint" giving a result below 
 uint.min, as I said in my other post, that could be handled by a runtime 
 check (enabled by a compiler flag) just like array bound checking.
That's not bad.
 Fine. With constants there is some mileage that can be squeezed. But 
 let's keep in mind that that doesn't solve the larger issue.
Well, by making implicit conversions between uint and int illegal, we're solving the larger issue. Just not in a seamless manner.
We are of one mind. I think that constants are the root cause of the problem.
Nov 26 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Don wrote:
 Michel Fortin wrote:
 On 2008-11-26 10:24:17 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:

 Also consider:

 auto delta = a1.length - a2.length;

 What should the type of delta be? Well, it depends. In my scheme that 
 wouldn't even compile, which I think is a good thing; you must decide 
 whether prior information makes it an unsigned or a signed integral.
In my scheme it would give you a uint. You'd have to cast to get a signed integer... I see how it's not ideal, but I can't imagine how it could be coherent otherwise.

auto diff = cast(int)a1.length - cast(int)a2.length;
Actually, there's no solution.
There is. We need to find the block of marble it's in and then chip the extra marble off it.
 Imagine a 32 bit system, where one object can be greater than 2GB in 
 size (not possible in Windows AFAIK, but theoretically possible).
It is possible in Windows if you change some I-forgot-which parameter in boot.ini.
 Then 
 if a1 is 3GB, delta cannot be stored in an int. If a2 is 3GB, it 
 requires an int for storage, since the result is less than 0.
 
 ==> I think length has to be an int. It's less bad than uint.
I'm not sure how the conclusion follows from the premises, but consider this. If someone deals with large arrays, they do have the possibility of doing things like:

if (a1.length >= a2.length) {
    size_t delta = a1.length - a2.length;
    ... use delta ...
} else {
    size_t rDelta = a2.length - a1.length;
    ... use rDelta ...
}

I'm not saying it's better than sliced bread, but it is a solution. And it is correct on all systems. And it cooperates with the typechecker by adding flow information to which typecheckers are usually oblivious. And the types are out in the clear. And it's the programmer, not the compiler, who decides the signedness.

In contrast, using ints for array lengths beyond 2GB is a nightmare. I'm not saying it's a frequent thing though, but since you woke up the sleeping dog, I'm just barking :o).
 Perhaps we could add a "sign" property to uint and an "unsign" 
 property to int that'd give you the signed or unsigned corresponding 
 value and which could do range checking at runtime (enabled by a 
 compiler flag).

     auto diff = a1.length.sign - a2.length.sign;

 And for the general problem of "uint - uint" giving a result below 
 uint.min, as I said in my other post, that could be handled by a 
 runtime check (enabled by a compiler flag) just like array bound 
 checking.
That's not bad.
Well let's look closer at this. Consider a system in which the current rules are in force, plus the overflow check for uint.

auto i = arr.length - offset1 + offset2;

Although the context makes it clear that offset1 < offset2 and therefore i is within range and won't overflow, the poor code generator has no choice but to insert checks throughout. Even though the entire expression is always correct, it will dynamically fail on the way to its correct form.

Contrast with the proposed system, in which the expression will not compile. It will indeed require the user to somewhat redundantly insert guides for operations, but during compilation, not through runtime failure.
 Fine. With constants there is some mileage that can be squeezed. But 
 let's keep in mind that that doesn't solve the larger issue.
Well, by making implicit conversions between uint and int illegal, we're solving the larger issue. Just not in a seamless manner.
We are of one mind. I think that constants are the root cause of the problem.
Well I strongly disagree. (I assume you mean "literals", not "constants".) I see constants as just a small part of the signedness mess. Moreover, I consider that in fact creating symbolic names with "auto" compounds the problem, and this belief runs straight against yours that it's about literals. No, IMHO it's about espousing and then propagating wrong beliefs through auto!

Maybe if you walked me through your reasoning on why literals bear significant importance I could get convinced. As far as my code is concerned, I tend to loosely go along the lines of the old adage "the only literals in a program should be 0, 1, and -1". True, the adage doesn't say how many of these three may reasonably occur, but at the end of the day I'm confused about this alleged importance of literals.


Andrei
Nov 26 2008
parent Michel Fortin <michel.fortin michelf.com> writes:
On 2008-11-26 13:30:30 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Well let's look closer at this. Consider a system in which the current 
 rules are in force, plus the overflow check for uint.
 
 auto i = arr.length - offset1 + offset2;
 
 Although the context makes it clear that offset1 < offset2 and 
 therefore i is within range and won't overflow, the poor code generator 
 has no choice but to insert checks throughout. Even though the entire 
 expression is always correct, it will dynamically fail on the way to 
 its correct form.
That's because you're relying on a specific behaviour for overflows, and that changes with range checking. True: in some cases values wrapping around is desirable. But in this specific case I'd say it'd be better to just add parentheses at the right place, or change the order of the arguments to avoid overflow.

Avoiding overflows is a good practice in general. The only reason it doesn't bite here is because you're limited to additions and subtractions.

If you dislike the compiler checking for overflows, just tell it not to check. That's why we need a compiler switch. Perhaps it'd be good to have a pragma to disable those checks for specific pieces of code too.
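For instance, since the surrounding context guarantees offset1 < offset2, grouping the subtraction of the two offsets first keeps every intermediate value in range; a sketch of the kind of regrouping being suggested:

// offset2 - offset1 cannot wrap because offset1 < offset2,
// so no intermediate result ever leaves [0, uint.max]:
auto i = arr.length + (offset2 - offset1);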
 Contrast with the proposed system, in which the expression will not 
 compile. It will indeed require the user to somewhat redundantly 
 insert guides for operations, but during compilation, not through 
 runtime failure.
If you're just adding a special rule to prevent the result of subtractions of unsigned values from being put into auto variables, I'm not terribly against that. I'm just unconvinced of its usefulness.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Nov 26 2008
prev sibling parent reply "Denis Koroskin" <2korden gmail.com> writes:
On Wed, 26 Nov 2008 18:24:17 +0300, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Also consider:

 auto delta = a1.length - a2.length;

 What should the type of delta be? Well, it depends. In my scheme that  
 wouldn't even compile, which I think is a good thing; you must decide  
 whether prior information makes it an unsigned or a signed integral.
Sure, it shouldn't compile. But explicit casting to either type won't help. Let's say you expect that a1.length > a2.length and thus expect a strictly positive result. Putting an explicit cast will not detect (but suppress) an error and give you an erroneous result silently. Putting an assert(a1.length > a2.length) might help, but the check will be unavailable unless code is compiled with asserts enabled.

A better solution would be to write code as follows:

auto delta = unsigned(a1.length - a2.length); // returns an unsigned value, throws on overflow (e.g., "2 - 4")
auto delta = signed(a1.length - a2.length); // returns result as a signed value. Throws on overflow (e.g., "int.min - 1")
auto delta = a1.length - a2.length; // won't compile

// this one is also handy:
auto newLength = checked(a1.length - 1); // preserves type of a1.length, be it int or uint, throws on overflow

I have previously shown an implementation of unsigned/signed:

import std.stdio;

int signed(lazy int dg)
{
    auto result = dg();
    asm {
        jo overflow;
    }
    return result;

    overflow:
    throw new Exception("Integer overflow occurred");
}

int main()
{
    int t = int.max;
    try
    {
        int s = signed(t + 1);
        writefln("Result is %d", s);
    }
    catch(Exception e)
    {
        writefln("Whoops! %s", e.toString());
    }
    return 0;
}

But Andrei has correctly pointed out that it has a problem - it may throw without a reason:

int i = int.max + 1; // sets an overflow flag
auto result = expectSigned(1); // raises an exception

The overflow flag may also be cleared in a complex expression:

auto result = expectUnsigned(1 + (uint.max + 1)); // first add will overflow and second one clears the flag -> no exception as a result

A possible solution is to make the compiler aware of this construct and disallow passing none (case 2) or more than one operation (case 1) to the method.
Nov 26 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Denis Koroskin wrote:
 On Wed, 26 Nov 2008 18:24:17 +0300, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Also consider:

 auto delta = a1.length - a2.length;

 What should the type of delta be? Well, it depends. In my scheme that 
 wouldn't even compile, which I think is a good thing; you must decide 
 whether prior information makes it an unsigned or a signed integral.
Sure, it shouldn't compile. But explicit casting to either type won't help. Let's say you expect that a1.length > a2.length and thus expect a strictly positive result. Putting an explicit cast will not detect (but suppress) an error and give you an erroneous result silently.
But "silently" and "putting a cast" don't go together. It's the cast that makes the erroneous result non-silent. Besides, you don't need to cast. You can always use a function that does the requisite checks. std.conv will have some of those, should any change in the rules make it necessary.

By this I'm essentially replying to Don's message in the bugs newsgroup: nobody puts a gun to your head to cast.
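For reference, a checked conversion along those lines would look as below (assuming std.conv's to performs a range check on integral narrowing, as later Phobos versions do; the exact exception message is not specified here):

import std.conv;
import std.stdio;

void main()
{
    uint u = 3_000_000_000;   // fits in uint, not in int
    try
    {
        int i = to!(int)(u);  // range-checked narrowing: throws here
    }
    catch (Exception e)
    {
        writefln("conversion refused: %s", e.msg);
    }
}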
 Putting an assert(a1.length > a2.length) might help, but the check will 
 be unavailable unless code is compiled with asserts enabled.
Put an enforce(a1.length > a2.length) then.
 A better solution would be to write code as follows:
 
 auto delta = unsigned(a1.length - a2.length); // returns an unsigned 
 value, throws on overflow (e.g., "2 - 4")
 auto delta = signed(a1.length - a2.length); // returns result as a 
 signed value. Throws on overflow (e.g., "int.min - 1")
 auto delta = a1.length - a2.length; // won't compile
Amazingly, this solution was discussed with these exact names! The signed and unsigned functions can be implemented as libraries, but unfortunately (or fortunately I guess) that means the bits32 and bits64 types are available to all code.

One fear of mine is the reaction of throwing of hands in the air "how many integral types are enough???". However, if we're to judge by the addition of long long and a slew of typedefs to C99 and C++0x, the answer is "plenty". I'd be interested in gauging how people feel about adding two (bits64, bits32) or even four (bits64, bits32, bits16, and bits8) types as basic types. They'd be bitbags with undecided sign ready to be converted to their counterparts of decided sign.
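As a rough mock-up of what such a type would provide (a library sketch only; the names are made up, and a real bits32 would need compiler support so that both conversions can be implicit):

// A bag of 32 bits with undecided sign.
struct Bits32
{
    uint bits;                                   // raw storage, no sign attached
    int  asSigned()   { return cast(int) bits; } // decide: signed view
    uint asUnsigned() { return bits; }           // decide: unsigned view
}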
 // this one is also handy:
 auto newLength = checked(a1.length - 1); // preserves type of a1.length, 
 be it int or uint, throws on overflow
This could be rather tricky. How can overflow be checked? By inspecting the status bits in the processor only; at the language/typesystem level there's little to do.
 I have previously shown an implementation of unsigned/signed:
 
 import std.stdio;
 
 int signed(lazy int dg)
 {
     auto result = dg();
     asm {
        jo overflow;
     }
     return result;
 
     overflow:
      throw new Exception("Integer overflow occurred");
 }
 
 int main()
 {
    int t = int.max;
    try
    {
        int s = signed(t + 1);
        writefln("Result is %d", s);
    }
    catch(Exception e)
    {
        writefln("Whoops! %s", e.toString());
    }
    return 0;
 }
Ah, there we go! Thanks for pasting this code.
 But Andrei has correctly pointed out that it has a problem - it may 
 throw without a reason:
 int i = int.max + 1; // sets an overflow flag
 auto result = expectSigned(1); // raises an exception
 
 Overflow flag may also be cleared in a complex expression:
 auto result = expectUnsigned(1 + (uint.max + 1)); // first add will 
 overflow and second one clears the flag -> no exception as a result
 
 A possible solution is to make the compiler aware of this construct and 
 disallow passing none (case 2) or more than one operation (case 1) to 
 the method.
Can't you clear the overflow flag prior to invoking the operation? I'll also mention that making it a delegate reduces appeal quite a bit; expressions under the check tend to be simple, which makes the relative overhead huge. Andrei
Nov 26 2008
next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Wed, 26 Nov 2008 21:45:30 +0300, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Denis Koroskin wrote:
 On Wed, 26 Nov 2008 18:24:17 +0300, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 Also consider:

 auto delta = a1.length - a2.length;

 What should the type of delta be? Well, it depends. In my scheme that  
 wouldn't even compile, which I think is a good thing; you must decide  
 whether prior information makes it an unsigned or a signed integral.
Sure, it shouldn't compile. But explicit casting to either type won't help. Let's say you expect that a1.length > a2.length and thus expect a strictly positive result. Putting an explicit cast will not detect (but suppress) an error and give you an erroneous result silently.
But "silently" and "putting a cast" don't go together. It's the cast that makes the erroneous result non-silent. Besides, you don't need to cast. You can always use a function that does the requisite checks. std.conv will have some of those, should any change in the rules make it necessary. By this I'm essentially replying to Don's message in the bugs newsgroup: nobody puts a gun to your head to cast.
 Putting an assert(a1.length > a2.length) might help, but the check will  
 be unavailable unless code is compiled with asserts enabled.
Put an enforce(a1.length > a2.length) then.
Right, it is better. Problem is, you don't want to put checks like "a1.length > a2.length" into your code (I don't, at least). All you want is to be sure that "auto result = a1.length - a2.length" is positive. You *then* decide and solve the "a1.length - a2.length >= 0" equation that leads to the check. Moreover, why evaluate both a1.length and a2.length twice? And you should update all your checks every time you change your code.
 A better solution would be to write code as follows:
  auto delta = unsigned(a1.length - a2.length); // returns an unsigned  
 value, throws on overflow (e.g., "2 - 4")
 auto delta = signed(a1.length - a2.length); // returns result as a  
 signed value. Throws on overflow (e.g., "int.min - 1")
 auto delta = a1.length - a2.length; // won't compile
Amazingly, this solution was discussed with these exact names! The signed and unsigned functions can be implemented as libraries, but unfortunately (or fortunately I guess) that means the bits32 and bits64 types are available to all code. One fear of mine is the reaction of throwing of hands in the air "how many integral types are enough???". However, if we're to judge by the addition of long long and a slew of typedefs to C99 and C++0x, the answer is "plenty". I'd be interested in gauging how people feel about adding two (bits64, bits32) or even four (bits64, bits32, bits16, and bits8) types as basic types. They'd be bitbags with undecided sign ready to be converted to their counterparts of decided sign.
 // this one is also handy:
 auto newLength = checked(a1.length - 1); // preserves type of  
 a1.length, be it int or uint, throws on overflow
This could be rather tricky. How can overflow be checked? By inspecting the status bits in the processor only; at the language/typesystem level there's little to do.
It is an implementation detail. The expression can be calculated with higher bit precision and the result compared to the needed range.
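A sketch of that implementation strategy for a single 32-bit subtraction (the function name is made up; a real checked() would have to cover the whole expression, presumably with compiler help):

// Compute in 64 bits, then verify the exact result fits the 32-bit type.
uint checkedSub(uint a, uint b)
{
    long r = cast(long) a - cast(long) b; // exact: no wraparound in 64 bits
    if (r < 0 || r > uint.max)
        throw new Exception("overflow in checked expression");
    return cast(uint) r;
}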
 I have previously shown an implementation of unsigned/signed:
  import std.stdio;
  int signed(lazy int dg)
 {
     auto result = dg();
     asm {
        jo overflow;
     }
     return result;
      overflow:
     throw new Exception("Integer overflow occurred");
 }
  int main()
 {
    int t = int.max;
    try
    {
        int s = signed(t + 1);
        writefln("Result is %d", s);
    }
    catch(Exception e)
    {
        writefln("Whoops! %s", e.toString());
    }
    return 0;
 }
Ah, there we go! Thanks for pasting this code.
 But Andrei has correctly pointed out that it has a problem - it may  
 throw without a reason:
 int i = int.max + 1; // sets an overflow flag
 auto result = expectSigned(1); // raises an exception
  Overflow flag may also be cleared in a complex expression:
 auto result = expectUnsigned(1 + (uint.max + 1)); // first add will  
 overflow and second one clears the flag -> no exception as a result
  A possible solution is to make the compiler aware of this construct  
 and disallow passing none (case 2) or more than one operation (case 1)  
 to the method.
Can't you clear the overflow flag prior to invoking the operation?
No need for this; it adds one more instruction for no gain, as the flag is automatically set/reset by any add/sub/mul operation. It can only save you from an "auto result = signed(1)" error, which is why I said it should be disallowed in the first place.
 I'll also mention that making it a delegate reduces appeal quite a bit;  
 expressions under the check tend to be simple, which makes the relative  
 overhead huge.
Such simple instructions are usually inlined, aren't they?
Nov 26 2008
prev sibling parent reply Don <nospam nospam.com> writes:
Andrei Alexandrescu wrote:
 Denis Koroskin wrote:
 On Wed, 26 Nov 2008 18:24:17 +0300, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 Also consider:

 auto delta = a1.length - a2.length;

 What should the type of delta be? Well, it depends. In my scheme that 
 wouldn't even compile, which I think is a good thing; you must decide 
 whether prior information makes it an unsigned or a signed integral.
Sure, it shouldn't compile. But explicit casting to either type won't help. Let's say you expect that a1.length > a2.length and thus expect a strictly positive result. Putting an explicit cast will not detect (but suppress) an error and give you an erroneous result silently.
But "silently" and "putting a cast" don't go together. It's the cast that makes the erroneous result non-silent. Besides, you don't need to cast. You can always use a function that does the requisite checks. std.conv will have some of those, should any change in the rules make it necessary.
I doubt that would be used in practice.
 By this I'm essentially replying to Don's message in the bugs newsgroup: 
 nobody puts a gun to your head to cast.
 
 Putting an assert(a1.length > a2.length) might help, but the check 
 will be unavailable unless code is compiled with asserts enabled.
Put an enforce(a1.length > a2.length) then.
 A better solution would be to write code as follows:

 auto delta = unsigned(a1.length - a2.length); // returns an unsigned 
 value, throws on overflow (e.g., "2 - 4")
 auto delta = signed(a1.length - a2.length); // returns result as a 
 signed value. Throws on overflow (e.g., "int.min - 1")
 auto delta = a1.length - a2.length; // won't compile
Amazingly, this solution was discussed with these exact names! The signed and unsigned functions can be implemented as libraries, but unfortunately (or fortunately I guess) that means the bits32 and bits64 types are available to all code. One fear of mine is the reaction of throwing of hands in the air "how many integral types are enough???". However, if we're to judge by the addition of long long and a slew of typedefs to C99 and C++0x, the answer is "plenty". I'd be interested in gauging how people feel about adding two (bits64, bits32) or even four (bits64, bits32, bits16, and bits8) types as basic types. They'd be bitbags with undecided sign ready to be converted to their counterparts of decided sign.
Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas:

(A) You think that it is an approximation to a natural number, ie, a 'positive int'.

(B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation.

Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.

If we were going to introduce a slew of new types, I'd want them to be for 'positive int'/'natural int', 'positive byte', etc.

Natural int can always be implicitly converted to either int or uint, with perfect safety. No other conversions are possible without a cast. Non-negative literals and manifest constants are naturals.

The rules are:
1. Anything involving unsigned is unsigned (same as C).
2. Else if it contains an integer, it is an integer.
3. (Now we know all quantities are natural:) If it contains a subtraction, it is an integer [probably allow subtraction of compile-time quantities to remain natural, if the values stay in range; flag an error if an overflow occurs].
4. Else it is a natural.

The reason I think literals and manifest constants are so important is that they are a significant fraction of the natural numbers in a program.

[Just before posting I've discovered that other people have posted some similar ideas].
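A sketch of how the four rules above would type a few expressions ('natural' is hypothetical, not actual D; the whole block is illustrative comments only):

// natural n;  int i;  uint u;
// u + n;   // uint     (rule 1: anything involving unsigned is unsigned)
// i + n;   // int      (rule 2: else, anything containing an int is an int)
// n - n;   // int      (rule 3: a subtraction of naturals may go negative)
// n + n;   // natural  (rule 4)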
Nov 27 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Don wrote:
 Andrei Alexandrescu wrote:
 One fear of mine is the reaction of throwing of hands in the air "how 
 many integral types are enough???". However, if we're to judge by the 
 addition of long long and a slew of typedefs to C99 and C++0x, the 
 answer is "plenty". I'd be interested in gauging how people feel about 
 adding two (bits64, bits32) or even four (bits64, bits32, bits16, and 
 bits8) types as basic types. They'd be bitbags with undecided sign 
 ready to be converted to their counterparts of decided sign.
Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas: (A) You think that it is an approximation to a natural number, ie, a 'positive int'. (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation. Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.
In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to the occasional puzzling behaviors. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.
 If we were going to introduce a slew of new types, I'd want them to be 
 for 'positive int'/'natural int', 'positive byte', etc.
 
 Natural int can always be implicitly converted to either int or uint, 
 with perfect safety. No other conversions are possible without a cast.
 Non-negative literals and manifest constants are naturals.
 
 The rules are:
 1. Anything involving unsigned is unsigned, (same as C).
 2. Else if it contains an integer, it is an integer.
 3. (Now we know all quantities are natural):
 If it contains a subtraction, it is an integer [Probably allow 
 subtraction of compile-time quantities to remain natural, if the values 
 stay in range; flag an error if an overflow occurs].
 4. Else it is a natural.
 
 
 The reason I think literals and manifest constants are so important is 
 that they are a significant fraction of the natural numbers in a program.
 
 [Just before posting I've discovered that other people have posted some 
 similar ideas].
That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table. Andrei
Nov 27 2008
parent reply Don <nospam nospam.com> writes:
Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 One fear of mine is the reaction of throwing of hands in the air "how 
 many integral types are enough???". However, if we're to judge by the 
 addition of long long and a slew of typedefs to C99 and C++0x, the 
 answer is "plenty". I'd be interested in gauging how people feel about 
 adding two (bits64, bits32) or even four (bits64, bits32, bits16, and 
 bits8) types as basic types. They'd be bitbags with undecided sign 
 ready to be converted to their counterparts of decided sign.
Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas: (A) You think that it is an approximation to a natural number, ie, a 'positive int'. (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation. Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.
In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to the occasional puzzling behaviors. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.
 If we were going to introduce a slew of new types, I'd want them to be 
 for 'positive int'/'natural int', 'positive byte', etc.

 Natural int can always be implicitly converted to either int or uint, 
 with perfect safety. No other conversions are possible without a cast.
 Non-negative literals and manifest constants are naturals.

 The rules are:
 1. Anything involving unsigned is unsigned, (same as C).
 2. Else if it contains an integer, it is an integer.
 3. (Now we know all quantities are natural):
 If it contains a subtraction, it is an integer [Probably allow 
 subtraction of compile-time quantities to remain natural, if the 
 values stay in range; flag an error if an overflow occurs].
 4. Else it is a natural.


 The reason I think literals and manifest constants are so important is 
 that they are a significant fraction of the natural numbers in a program.

 [Just before posting I've discovered that other people have posted 
 some similar ideas].
That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table. Andrei
Well, it does make unsigned numbers (case (B)) quite obscure and low-level. They could be renamed with uglier names to make this clearer. But since in this proposal there are no implicit conversions from uint to anything, it's hard to do any damage with the unsigned type which results. Basically, with any use of unsigned, the compiler says "I don't know if this thing even has a meaningful sign!".

Alternatively, we could add rule 0: mixing int and unsigned is illegal. But it's OK to mix natural with int, or natural with unsigned. I don't like this as much, since it would make most usage of unsigned ugly; but maybe that's justified.
Nov 27 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Don wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 One fear of mine is the reaction of throwing of hands in the air 
 "how many integral types are enough???". However, if we're to judge 
 by the addition of long long and a slew of typedefs to C99 and 
 C++0x, the answer is "plenty". I'd be interested in gauging how 
 people feel about adding two (bits64, bits32) or even four (bits64, 
 bits32, bits16, and bits8) types as basic types. They'd be bitbags 
 with undecided sign ready to be converted to their counterparts of 
 decided sign.
Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas: (A) You think that it is an approximation to a natural number, ie, a 'positive int'. (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation. Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.
In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to the occasional puzzling behaviors. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.
 If we were going to introduce a slew of new types, I'd want them to 
 be for 'positive int'/'natural int', 'positive byte', etc.

 Natural int can always be implicitly converted to either int or uint, 
 with perfect safety. No other conversions are possible without a cast.
 Non-negative literals and manifest constants are naturals.

 The rules are:
 1. Anything involving unsigned is unsigned, (same as C).
 2. Else if it contains an integer, it is an integer.
 3. (Now we know all quantities are natural):
 If it contains a subtraction, it is an integer [Probably allow 
 subtraction of compile-time quantities to remain natural, if the 
 values stay in range; flag an error if an overflow occurs].
 4. Else it is a natural.


 The reason I think literals and manifest constants are so important 
 is that they are a significant fraction of the natural numbers in a 
 program.

 [Just before posting I've discovered that other people have posted 
 some similar ideas].
That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table. Andrei
Well, it does make unsigned numbers (case (B)) quite obscure and low-level. They could be renamed with uglier names to make this clearer. But since in this proposal there are no implicit conversions from uint to anything, it's hard to do any damage with the unsigned type which results. Basically, with any use of unsigned, the compiler says "I don't know if this thing even has a meaningful sign!". Alternatively, we could add rule 0: mixing int and unsigned is illegal. But it's OK to mix natural with int, or natural with unsigned. I don't like this as much, since it would make most usage of unsigned ugly; but maybe that's justified.
I think we're heading towards an impasse. We wouldn't want to make things much harder for systems-level programs that mix arithmetic and bit-level operations.

I'm glad there is interest and that quite a few ideas were brought up. Unfortunately, it looks like all have significant disadvantages.

One compromise solution Walter and I discussed in the past is to only sever one of the dangerous implicit conversions: int -> uint. Other than that, it's much like C (everything involving one unsigned is unsigned, and unsigned -> signed is implicit). Let's see where that takes us.

(a) There are fewer situations when a small, reasonable number implicitly becomes a large, weird number.

(b) An exception to (a) is that u1 - u2 is also uint, and that's for the sake of C compatibility. I'd gladly drop it if I could and leave operations such as u1 - u2 return a signed number. That assumes the least and works with small, usual values.

(c) Unlike C, arithmetic and logical operations always return the tightest type possible, not a 32/64 bit value. For example, byte / int yields byte and so on.

What do you think?


Andrei
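A sketch of what the compromise would accept and reject (hypothetical rules, not current D behavior; the block is illustrative comments only):

// int i;  uint u;  byte b;
// u = i;          // error: the int -> uint implicit conversion is severed
// i = u;          // still fine, as in C (unsigned -> signed is implicit)
// auto x = u + i; // uint, as in C
// auto d = u - u; // uint per (b), kept only for C compatibility
// auto q = b / i; // byte under the "tightest type" rule (c), not int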
Nov 27 2008
parent reply KennyTM~ <kennytm gmail.com> writes:
Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 One fear of mine is the reaction of throwing of hands in the air 
 "how many integral types are enough???". However, if we're to judge 
 by the addition of long long and a slew of typedefs to C99 and 
 C++0x, the answer is "plenty". I'd be interested in gauging how 
 people feel about adding two (bits64, bits32) or even four (bits64, 
 bits32, bits16, and bits8) types as basic types. They'd be bitbags 
 with undecided sign ready to be converted to their counterparts of 
 decided sign.
Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas: (A) You think that it is an approximation to a natural number, ie, a 'positive int'. (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation. Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.
In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to the occasional puzzling behaviors. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.
 If we were going to introduce a slew of new types, I'd want them to 
 be for 'positive int'/'natural int', 'positive byte', etc.

 Natural int can always be implicitly converted to either int or 
 uint, with perfect safety. No other conversions are possible without 
 a cast.
 Non-negative literals and manifest constants are naturals.

 The rules are:
 1. Anything involving unsigned is unsigned, (same as C).
 2. Else if it contains an integer, it is an integer.
 3. (Now we know all quantities are natural):
 If it contains a subtraction, it is an integer [Probably allow 
 subtraction of compile-time quantities to remain natural, if the 
 values stay in range; flag an error if an overflow occurs].
 4. Else it is a natural.


 The reason I think literals and manifest constants are so important 
 is that they are a significant fraction of the natural numbers in a 
 program.

 [Just before posting I've discovered that other people have posted 
 some similar ideas].
That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table. Andrei
Well, it does make unsigned numbers (case (B)) quite obscure and low-level. They could be renamed with uglier names to make this clearer. But since in this proposal there are no implicit conversions from uint to anything, it's hard to do any damage with the unsigned type which results. Basically, with any use of unsigned, the compiler says "I don't know if this thing even has a meaningful sign!". Alternatively, we could add rule 0: mixing int and unsigned is illegal. But it's OK to mix natural with int, or natural with unsigned. I don't like this as much, since it would make most usage of unsigned ugly; but maybe that's justified.
I think we're heading towards an impasse. We wouldn't want to make things much harder for systems-level programs that mix arithmetic and bit-level operations. I'm glad there is interest and that quite a few ideas were brought up. Unfortunately, it looks like all have significant disadvantages. One compromise solution Walter and I discussed in the past is to only sever one of the dangerous implicit conversions: int -> uint. Other than that, it's much like C (everything involving one unsigned is unsigned, and unsigned -> signed is implicit). Let's see where that takes us. (a) There are fewer situations when a small, reasonable number implicitly becomes a large, weird number. (b) An exception to (a) is that u1 - u2 is also uint, and that's for the sake of C compatibility. I'd gladly drop it if I could and leave operations such as u1 - u2 return a signed number. That assumes the least and works with small, usual values. (c) Unlike C, arithmetic and logical operations always return the tightest type possible, not a 32/64 bit value. For example, byte / int yields byte and so on.
So you mean long * int (e.g. 1234567890123L * 2) will return an int instead of a long?! The opposite sounds more natural to me.
 What do you think?
 
 
 Andrei
Nov 27 2008
parent reply KennyTM~ <kennytm gmail.com> writes:
KennyTM~ wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 One fear of mine is the reaction of throwing of hands in the air 
 "how many integral types are enough???". However, if we're to 
 judge by the addition of long long and a slew of typedefs to C99 
 and C++0x, the answer is "plenty". I'd be interested in gauging how 
 people feel about adding two (bits64, bits32) or even four 
 (bits64, bits32, bits16, and bits8) types as basic types. They'd 
 be bitbags with undecided sign ready to be converted to their 
 counterparts of decided sign.
Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas: (A) You think that it is an approximation to a natural number, ie, a 'positive int'. (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation. Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.
In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to the occasional puzzling behaviors. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.
 If we were going to introduce a slew of new types, I'd want them to 
 be for 'positive int'/'natural int', 'positive byte', etc.

 Natural int can always be implicitly converted to either int or 
 uint, with perfect safety. No other conversions are possible 
 without a cast.
 Non-negative literals and manifest constants are naturals.

 The rules are:
 1. Anything involving unsigned is unsigned, (same as C).
 2. Else if it contains an integer, it is an integer.
 3. (Now we know all quantities are natural):
 If it contains a subtraction, it is an integer [Probably allow 
 subtraction of compile-time quantities to remain natural, if the 
 values stay in range; flag an error if an overflow occurs].
 4. Else it is a natural.


 The reason I think literals and manifest constants are so important 
 is that they are a significant fraction of the natural numbers in a 
 program.

 [Just before posting I've discovered that other people have posted 
 some similar ideas].
That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table. Andrei
Well, it does make unsigned numbers (case (B)) quite obscure and low-level. They could be renamed with uglier names to make this clearer. But since in this proposal there are no implicit conversions from uint to anything, it's hard to do any damage with the unsigned type which results. Basically, with any use of unsigned, the compiler says "I don't know if this thing even has a meaningful sign!". Alternatively, we could add rule 0: mixing int and unsigned is illegal. But it's OK to mix natural with int, or natural with unsigned. I don't like this as much, since it would make most usage of unsigned ugly; but maybe that's justified.
I think we're heading towards an impasse. We wouldn't want to make things much harder for systems-level programs that mix arithmetic and bit-level operations. I'm glad there is interest and that quite a few ideas were brought up. Unfortunately, it looks like all have significant disadvantages. One compromise solution Walter and I discussed in the past is to only sever one of the dangerous implicit conversions: int -> uint. Other than that, it's much like C (everything involving one unsigned is unsigned, and unsigned -> signed is implicit). Let's see where that takes us. (a) There are fewer situations when a small, reasonable number implicitly becomes a large, weird number. (b) An exception to (a) is that u1 - u2 is also uint, and that's for the sake of C compatibility. I'd gladly drop it if I could and leave operations such as u1 - u2 return a signed number. That assumes the least and works with small, usual values. (c) Unlike C, arithmetic and logical operations always return the tightest type possible, not a 32/64 bit value. For example, byte / int yields byte and so on.
So you mean long * int (e.g. 1234567890123L * 2) will return an int instead of a long?! The opposite sounds more natural to me.
Em, or do you mean the tightest type that can represent all possible results? (so long*int == cent?)
 What do you think?


 Andrei
Nov 27 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
KennyTM~ wrote:
 KennyTM~ wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 One fear of mine is the reaction of throwing of hands in the air 
 "how many integral types are enough???". However, if we're to 
 judge by the addition of long long and a slew of typedefs to C99 
 and C++0x, the answer is "plenty". I'd be interested in gauging 
 how people feel about adding two (bits64, bits32) or even four 
 (bits64, bits32, bits16, and bits8) types as basic types. They'd 
 be bitbags with undecided sign ready to be converted to their 
 counterparts of decided sign.
Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas: (A) You think that it is an approximation to a natural number, ie, a 'positive int'. (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation. Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.
In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to the occasional puzzling behaviors. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.
 If we were going to introduce a slew of new types, I'd want them 
 to be for 'positive int'/'natural int', 'positive byte', etc.

 Natural int can always be implicitly converted to either int or 
 uint, with perfect safety. No other conversions are possible 
 without a cast.
 Non-negative literals and manifest constants are naturals.

 The rules are:
 1. Anything involving unsigned is unsigned, (same as C).
 2. Else if it contains an integer, it is an integer.
 3. (Now we know all quantities are natural):
 If it contains a subtraction, it is an integer [Probably allow 
 subtraction of compile-time quantities to remain natural, if the 
 values stay in range; flag an error if an overflow occurs].
 4. Else it is a natural.


 The reason I think literals and manifest constants are so 
 important is that they are a significant fraction of the natural 
 numbers in a program.

 [Just before posting I've discovered that other people have posted 
 some similar ideas].
That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table. Andrei
Well, it does make unsigned numbers (case (B)) quite obscure and low-level. They could be renamed with uglier names to make this clearer. But since in this proposal there are no implicit conversions from uint to anything, it's hard to do any damage with the unsigned type which results. Basically, with any use of unsigned, the compiler says "I don't know if this thing even has a meaningful sign!". Alternatively, we could add rule 0: mixing int and unsigned is illegal. But it's OK to mix natural with int, or natural with unsigned. I don't like this as much, since it would make most usage of unsigned ugly; but maybe that's justified.
I think we're heading towards an impasse. We wouldn't want to make things much harder for systems-level programs that mix arithmetic and bit-level operations. I'm glad there is interest and that quite a few ideas were brought up. Unfortunately, it looks like all have significant disadvantages. One compromise solution Walter and I discussed in the past is to only sever one of the dangerous implicit conversions: int -> uint. Other than that, it's much like C (everything involving one unsigned is unsigned, and unsigned -> signed is implicit). Let's see where that takes us. (a) There are fewer situations when a small, reasonable number implicitly becomes a large, weird number. (b) An exception to (a) is that u1 - u2 is also uint, and that's for the sake of C compatibility. I'd gladly drop it if I could and leave operations such as u1 - u2 return a signed number. That assumes the least and works with small, usual values. (c) Unlike C, arithmetic and logical operations always return the tightest type possible, not a 32/64 bit value. For example, byte / int yields byte and so on.
So you mean long * int (e.g. 1234567890123L * 2) will return an int instead of a long?! The opposite sounds more natural to me.
Em, or do you mean the tightest type that can represent all possible results? (so long*int == cent?)
The tightest type possible depends on the operation. In that doctrine, long * int yields a long (given the demise of cent). Walter thinks such rules are too complicated, but I'm a big fan of operation-dependent typing. I see no good reason for requiring int * long to have the same type as int / long. They are different operations with different semantics and corner cases and whatnot, so the resulting static type may as well be different.

By the way, under the tightest type doctrine, uint & ubyte is typed as ubyte. Interesting that one, huh :o).


Andrei
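The value-range claim behind that last example is easy to check in today's D, where the & result is still typed uint even though its value always fits in 8 bits (a minimal demonstration):

import std.stdio;

void main()
{
    uint a = 0xDEADBEEF;
    ubyte b = 0x0F;
    auto c = a & b;  // value can never exceed ubyte.max...
    writefln("%s = %s", typeof(c).stringof, c); // ...yet the type is uint today
}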
Nov 27 2008
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Andrei Alexandrescu wrote:
 KennyTM~ wrote:
 KennyTM~ wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 One fear of mine is the reaction of throwing of hands in the air 
 "how many integral types are enough???". However, if we're to 
 judge by the addition of long long and a slew of typedefs to C99 
 and C++0x, the answer is "plenty". I'd be interested in gauging 
 how people feel about adding two (bits64, bits32) or even four 
 (bits64, bits32, bits16, and bits8) types as basic types. They'd 
 be bitbags with undecided sign ready to be converted to their 
 counterparts of decided sign.
Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas: (A) You think that it is an approximation to a natural number, ie, a 'positive int'. (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation. Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.
In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to the occasional puzzling behaviors. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.
 If we were going to introduce a slew of new types, I'd want them 
 to be for 'positive int'/'natural int', 'positive byte', etc.

 Natural int can always be implicitly converted to either int or 
 uint, with perfect safety. No other conversions are possible 
 without a cast.
 Non-negative literals and manifest constants are naturals.

 The rules are:
 1. Anything involving unsigned is unsigned, (same as C).
 2. Else if it contains an integer, it is an integer.
 3. (Now we know all quantities are natural):
 If it contains a subtraction, it is an integer [Probably allow 
 subtraction of compile-time quantities to remain natural, if the 
 values stay in range; flag an error if an overflow occurs].
 4. Else it is a natural.


 The reason I think literals and manifest constants are so 
 important is that they are a significant fraction of the natural 
 numbers in a program.

 [Just before posting I've discovered that other people have 
 posted some similar ideas].
That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table. Andrei
Well, it does make unsigned numbers (case (B)) quite obscure and low-level. They could be renamed with uglier names to make this clearer. But since in this proposal there are no implicit conversions from uint to anything, it's hard to do any damage with the unsigned type which results. Basically, with any use of unsigned, the compiler says "I don't know if this thing even has a meaningful sign!". Alternatively, we could add rule 0: mixing int and unsigned is illegal. But it's OK to mix natural with int, or natural with unsigned. I don't like this as much, since it would make most usage of unsigned ugly; but maybe that's justified.
I think we're heading towards an impasse. We wouldn't want to make things much harder for systems-level programs that mix arithmetic and bit-level operations. I'm glad there is interest and that quite a few ideas were brought up. Unfortunately, it looks like all have significant disadvantages. One compromise solution Walter and I discussed in the past is to only sever one of the dangerous implicit conversions: int -> uint. Other than that, it's much like C (everything involving one unsigned is unsigned, and unsigned -> signed is implicit). Let's see where that takes us.

(a) There are fewer situations when a small, reasonable number implicitly becomes a large, weird number.

(b) An exception to (a) is that u1 - u2 is also uint, and that's for the sake of C compatibility. I'd gladly drop it if I could and let operations such as u1 - u2 return a signed number. That assumes the least and works with small, usual values.

(c) Unlike C, arithmetic and logical operations always return the tightest type possible, not a 32/64 bit value. For example, byte / int yields byte and so on.
So you mean long * int (e.g. 1234567890123L * 2) will return an int instead of a long?! The opposite sounds more natural to me.
Em, or do you mean the tightest type that can represent all possible results? (so long*int == cent?)
The tightest type possible depends on the operation. In that doctrine, long * int yields a long (given the demise of cent). Walter thinks such rules are too complicated, but I'm a big fan of operation-dependent typing. I see no good reason for requiring that int * long have the same type as int / long. They are different operations with different semantics and corner cases and whatnot, so the resulting static type may as well be different. By the way, under the tightest type doctrine, uint & ubyte is typed as ubyte. Interesting that one, huh :o). Andrei
I just remembered a problem with simplemindedly going with the tightest type. Consider: uint a = ...; ubyte b = ...; auto c = a & b; c <<= 16; ... The programmer may reasonably expect that the bitwise operation yields an unsigned integer because it involved one. However, the zealous compiler cleverly notices the operation really never yields something larger than a ubyte, and therefore returns that "tightest" type, thus making c a ubyte. Subsequent uses of c will be surprising to the programmer who thought c has 32 bits. It looks like polysemy is the only solution here: return a polysemous value with principal type uint and possible type ubyte. That way, c will be typed as uint. But at the same time, continuing the example: ubyte d = a & b; will go through without a cast. That's pretty cool! One question I had is: say polysemy will be at work for integral arithmetic. Should we provide means in the language for user-defined polysemous functions? Or is it ok to leave it as compiler magic that saves redundant casts? Andrei
Nov 27 2008
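For contrast, a minimal D sketch of how the same expression is typed under today's rules; the tightest-type surprise described above does not yet arise, because the & expression stays uint:

----
void main()
{
    uint a = 0x00FF_0000;
    ubyte b = 0xFF;
    auto c = a & b;                        // today: typed uint, not ubyte
    static assert(is(typeof(c) == uint));
    c <<= 16;                              // all 32 bits remain available
    assert(c == 0);                        // a & b is 0 for these values
}
----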
parent Michel Fortin <michel.fortin michelf.com> writes:
On 2008-11-27 22:34:50 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 One question I had is: say polysemy will be at work for integral 
 arithmetic. Should we provide means in the language for user-defined 
 polysemous functions? Or is it ok to leave it as compiler magic that 
 saves redundant casts?
I think that'd be a must. Otherwise how would you define your own arithmetical types so they work like the built-in ones? struct ArbitraryPrecisionInt { ... } ArbitraryPrecisionInt a = ...; uint b = ...; auto c = a & b; c <<= 16; ... Shouldn't c be of type ArbitraryPrecisionInt? And shouldn't the following work too? uint d = a & b; That said, how can a function return a polysemous value at all? Should the function return a special kind of struct with a sample of every supported type? That'd be utterly inefficient. Should it return a custom-made struct with the ability to implicitly cast itself to other types? That would make the polysemous value propagatable through auto, and probably less efficient too. The only way I can see this working correctly is with function overloading on return type, with a way to specify the default function (for when the return type is not specified, such as with auto). In the case above, you'd need something like this:

struct ArbitraryPrecisionInt {
    default ArbitraryPrecisionInt opAnd(uint i);
    uint opAnd(uint i);
}

-- Michel Fortin michel.fortin michelf.com http://michelf.com/
Nov 28 2008
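Absent overloading on return type, the closest sketch in present-day D is a wrapper that must be cast explicitly; the names below are illustrative, and the point is precisely that the cast Michel wants eliminated remains:

----
struct PolyResult
{
    uint value;                         // principal type
    T opCast(T)() if (is(T : uint))     // narrowing still needs a cast
    {
        assert(value <= T.max, "narrowing would lose bits");
        return cast(T) value;
    }
}

PolyResult andOp(uint a, uint b) { return PolyResult(a & b); }

void main()
{
    auto c = andOp(0x1234, 0x00FF);
    ubyte d = cast(ubyte) c;            // explicit, not implicit
    assert(d == 0x34);
}
----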
prev sibling parent reply Don <nospam nospam.com> writes:
Andrei Alexandrescu wrote:
 KennyTM~ wrote:
 KennyTM~ wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 One fear of mine is the reaction of throwing one's hands in the air 
 "how many integral types are enough???". However, if we're to 
 judge by the addition of long long and a slew of typedefs to C99 
 and C++0x, the answer is "plenty". I'd be interested in gauging 
 how people feel about adding two (bits64, bits32) or even four 
 (bits64, bits32, bits16, and bits8) types as basic types. They'd 
 be bitbags with undecided sign ready to be converted to their 
 counterparts of decided sign.
Here I think we have a fundamental disagreement: what is an 'unsigned int'? There are two disparate ideas: (A) You think that it is an approximation to a natural number, ie, a 'positive int'. (B) I think that it is a 'number with NO sign'; that is, the sign depends on context. It may, for example, be part of a larger number. Thus, I largely agree with the C behaviour -- once you have an unsigned in a calculation, it's up to the programmer to provide an interpretation. Unfortunately, the two concepts are mashed together in C-family languages. (B) is the concept supported by the language typing rules, but usage of (A) is widespread in practice.
In fact we are in agreement. C tries to make it usable as both, and partially succeeds by having very lax conversions in all directions. This leads to the occasional puzzling behaviors. I do *want* uint to be an approximation of a natural number, while acknowledging that today it isn't much of that.
 If we were going to introduce a slew of new types, I'd want them 
 to be for 'positive int'/'natural int', 'positive byte', etc.

 Natural int can always be implicitly converted to either int or 
 uint, with perfect safety. No other conversions are possible 
 without a cast.
 Non-negative literals and manifest constants are naturals.

 The rules are:
 1. Anything involving unsigned is unsigned, (same as C).
 2. Else if it contains an integer, it is an integer.
 3. (Now we know all quantities are natural):
 If it contains a subtraction, it is an integer [Probably allow 
 subtraction of compile-time quantities to remain natural, if the 
 values stay in range; flag an error if an overflow occurs].
 4. Else it is a natural.


 The reason I think literals and manifest constants are so 
 important is that they are a significant fraction of the natural 
 numbers in a program.

 [Just before posting I've discovered that other people have 
 posted some similar ideas].
That sounds encouraging. One problem is that your approach leaves the unsigned mess as it is, so although natural types are a nice addition, they don't bring a complete solution to the table. Andrei
Well, it does make unsigned numbers (case (B)) quite obscure and low-level. They could be renamed with uglier names to make this clearer. But since in this proposal there are no implicit conversions from uint to anything, it's hard to do any damage with the unsigned type which results. Basically, with any use of unsigned, the compiler says "I don't know if this thing even has a meaningful sign!". Alternatively, we could add rule 0: mixing int and unsigned is illegal. But it's OK to mix natural with int, or natural with unsigned. I don't like this as much, since it would make most usage of unsigned ugly; but maybe that's justified.
I think we're heading towards an impasse. We wouldn't want to make things much harder for systems-level programs that mix arithmetic and bit-level operations. I'm glad there is interest and that quite a few ideas were brought up. Unfortunately, it looks like all have significant disadvantages. One compromise solution Walter and I discussed in the past is to only sever one of the dangerous implicit conversions: int -> uint. Other than that, it's much like C (everything involving one unsigned is unsigned, and unsigned -> signed is implicit). Let's see where that takes us.

(a) There are fewer situations when a small, reasonable number implicitly becomes a large, weird number.

(b) An exception to (a) is that u1 - u2 is also uint, and that's for the sake of C compatibility. I'd gladly drop it if I could and let operations such as u1 - u2 return a signed number. That assumes the least and works with small, usual values.
The problem with that, is that you're then forcing the 'unsigned is a natural' interpretation when it may be erroneous. uint.max - 10 is a uint. It's an interesting case, because int = u1 - u2 is definitely incorrect when u1 > int.max. uint = u1 - u2 may be incorrect when u1 < u2, _if you think of unsigned as a positive number_. But, if you think of it as a natural modulo 2^32, uint = u1-u2 is always correct, since that's what's happening mathematically. I'm strongly of the opinion that you shouldn't be able to generate an unsigned accidentally -- you should need to either declare a type as uint, or use the 'u' suffix on a literal. Right now, properties like 'length' being uint means you get too many surprising uints, especially when using 'auto'. I take your point about not wanting to give up the full 32 bits of address space. The problem is, that if you have an object x which is
>2GB, and a small object y, then x.length - y.length will erroneously 
be negative. If we want code (especially in libraries) to cope with such large objects, we need to ensure that any time there's a subtraction involving a length, the first is larger than the second. I think that would preclude the combination: length is uint byte[].length can exceed 2GB, and code is correct when it does uint - uint is an int (or even, can implicitly convert to int) As far as I can tell, at least one of these has to go.
Nov 28 2008
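A small sketch of the discipline Don is asking for: any subtraction of lengths must first establish which operand is larger (the helper name is illustrative):

----
// Magnitude of the difference of two lengths; never wraps.
size_t lengthDiff(size_t a, size_t b)
{
    return a >= b ? a - b : b - a;
}

void main()
{
    assert(lengthDiff(4, 2) == 2);
    assert(lengthDiff(2, 4) == 2);   // the naive 2 - 4 would wrap
}
----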
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
(I lost track of quotes, so I yanked them all beyond Don's message.)

Don wrote:
 The problem with that, is that you're then forcing the 'unsigned is a 
 natural' interpretation when it may be erroneous.
 
 uint.max - 10 is a uint.
 
 It's an interesting case, because int = u1 - u2 is definitely incorrect 
 when u1 > int.max.
 
 uint = u1 - u2 may be incorrect when u1 < u2, _if you think of unsigned 
 as a positive number_.
 But, if you think of it as a natural modulo 2^32, uint = u1-u2 is always 
 correct, since that's what's happening mathematically.
Sounds good. One important consideration is that modulo arithmetic is considerably easier to understand when two's complement and signs are not involved.
 I'm strongly of the opinion that you shouldn't be able to generate an 
 unsigned accidentally -- you should need to either declare a type as 
 uint, or use the 'u' suffix on a literal.
 Right now, properties like 'length' being uint means you get too many 
 surprising uints, especially when using 'auto'.
I am not surprised by length being unsigned. I'm also not surprised by hexadecimal constants being unsigned. (They are unsigned in C. Walter made them signed or not, depending on their value.)
 I take your point about not wanting to give up the full 32 bits of 
 address space. The problem is, that if you have an object x which is 
  >2GB, and a small object y, then  x.length - y.length will erroneously 
 be negative. If we want code (especially in libraries) to cope with such 
 large objects, we need to ensure that any time there's a subtraction 
 involving a length, the first is larger than the second. I think that 
 would preclude the combination:
 
 length is uint
 byte[].length can exceed 2GB, and code is correct when it does
 uint - uint is an int (or even, can implicitly convert to int)
 
 As far as I can tell, at least one of these has to go.
Well, none has to go in the latest design:

(a) One unsigned makes everything unsigned
(b) unsigned -> signed is allowed
(c) signed -> unsigned is disallowed

Of course the latest design has imperfections, but it precludes none of the three things you mention. Andrei
Nov 28 2008
parent reply Don <nospam nospam.com> writes:
Andrei Alexandrescu wrote:
 (I lost track of quotes, so I yanked them all beyond Don's message.)
 
 Don wrote:
 The problem with that, is that you're then forcing the 'unsigned is a 
 natural' interpretation when it may be erroneous.

 uint.max - 10 is a uint.

 It's an interesting case, because int = u1 - u2 is definitely 
 incorrect when u1 > int.max.

 uint = u1 - u2 may be incorrect when u1 < u2, _if you think of 
 unsigned as a positive number_.
 But, if you think of it as a natural modulo 2^32, uint = u1-u2 is 
 always correct, since that's what's happening mathematically.
Sounds good. One important consideration is that modulo arithmetic is considerably easier to understand when two's complement and signs are not involved.
 I'm strongly of the opinion that you shouldn't be able to generate an 
 unsigned accidentally -- you should need to either declare a type as 
 uint, or use the 'u' suffix on a literal.
 Right now, properties like 'length' being uint means you get too many 
 surprising uints, especially when using 'auto'.
I am not surprised by length being unsigned. I'm also not surprised by hexadecimal constants being unsigned. (They are unsigned in C. Walter made them signed or not, depending on their value.)
 I take your point about not wanting to give up the full 32 bits of 
 address space. The problem is, that if you have an object x which is 
  >2GB, and a small object y, then  x.length - y.length will 
 erroneously be negative. If we want code (especially in libraries) to 
 cope with such large objects, we need to ensure that any time there's 
 a subtraction involving a length, the first is larger than the second. 
 I think that would preclude the combination:

 length is uint
 byte[].length can exceed 2GB, and code is correct when it does
 uint - uint is an int (or even, can implicitly convert to int)

 As far as I can tell, at least one of these has to go.
Well, none has to go in the latest design:

(a) One unsigned makes everything unsigned
(b) unsigned -> signed is allowed
(c) signed -> unsigned is disallowed

Of course the latest design has imperfections, but it precludes none of the three things you mention.
It's close, but how can code such as: if (x.length - y.length < 100) ... be correct in the presence of length > 2GB? since (a) x.length = uint.max, y.length = 1 (b) x.length = 4, y.length = 2 both produce the same binary result (0xFFFF_FFFE = -2) Any subtraction of two lengths has a possible range of -int.max .. uint.max which is quite problematic (and the root cause of the problems, I guess). And unfortunately I think code is riddled with subtraction of lengths.
Nov 28 2008
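Don's two cases are easy to check in D: a huge positive difference and a small negative one leave the same 32 bits behind, so no later typing rule can recover the sign (the second pair is written as 2 and 4, per the correction that follows):

----
void main()
{
    uint bigLen = uint.max, tinyLen = 1;  // a >2GB object minus a tiny one
    uint xLen = 2, yLen = 4;              // a small negative difference
    assert(bigLen - tinyLen == 0xFFFF_FFFE);
    assert(xLen - yLen == 0xFFFF_FFFE);   // wraps modulo 2^32
}
----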
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Don wrote:
 Andrei Alexandrescu wrote:
 (I lost track of quotes, so I yanked them all beyond Don's message.)

 Don wrote:
 The problem with that, is that you're then forcing the 'unsigned is a 
 natural' interpretation when it may be erroneous.

 uint.max - 10 is a uint.

 It's an interesting case, because int = u1 - u2 is definitely 
 incorrect when u1 > int.max.

 uint = u1 - u2 may be incorrect when u1 < u2, _if you think of 
 unsigned as a positive number_.
 But, if you think of it as a natural modulo 2^32, uint = u1-u2 is 
 always correct, since that's what's happening mathematically.
Sounds good. One important consideration is that modulo arithmetic is considerably easier to understand when two's complement and signs are not involved.
 I'm strongly of the opinion that you shouldn't be able to generate an 
 unsigned accidentally -- you should need to either declare a type as 
 uint, or use the 'u' suffix on a literal.
 Right now, properties like 'length' being uint means you get too many 
 surprising uints, especially when using 'auto'.
I am not surprised by length being unsigned. I'm also not surprised by hexadecimal constants being unsigned. (They are unsigned in C. Walter made them signed or not, depending on their value.)
 I take your point about not wanting to give up the full 32 bits of 
 address space. The problem is, that if you have an object x which is 
  >2GB, and a small object y, then  x.length - y.length will 
 erroneously be negative. If we want code (especially in libraries) to 
 cope with such large objects, we need to ensure that any time there's 
 a subtraction involving a length, the first is larger than the 
 second. I think that would preclude the combination:

 length is uint
 byte[].length can exceed 2GB, and code is correct when it does
 uint - uint is an int (or even, can implicitly convert to int)

 As far as I can tell, at least one of these has to go.
Well, none has to go in the latest design:

(a) One unsigned makes everything unsigned
(b) unsigned -> signed is allowed
(c) signed -> unsigned is disallowed

Of course the latest design has imperfections, but it precludes none of the three things you mention.
It's close, but how can code such as: if (x.length - y.length < 100) ... be correct in the presence of length > 2GB? since (a) x.length = uint.max, y.length = 1 (b) x.length = 4, y.length = 2 both produce the same binary result (0xFFFF_FFFE = -2)
(You mean x.length = 2, y.length = 4 in the second case.)
 Any subtraction of two lengths has a possible range of
  -int.max .. uint.max
 which is quite problematic (and the root cause of the problems, I guess).
 And unfortunately I think code is riddled with subtraction of lengths.
Code may be riddled with subtraction of lengths, but seems to be working with today's rule that the result of that subtraction is unsigned. So definitely we're not introducing new problems. I agree the solution has problems. Following this thread that in turn follows my sleepless nights poring over the subject, I'm glad to reach a design that is better than what we currently have. I think that disallowing the signed -> unsigned conversions will be a net improvement. Andrei
Nov 28 2008
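A sketch of what the compromise rule would and would not reject; this is not current D semantics, where both initializations compile silently:

----
void main()
{
    uint u = 42;
    int i = -1;
    int s = u;    // unsigned -> signed: stays implicit under the proposal
    uint t = i;   // signed -> unsigned: the conversion to be disallowed
}
----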
parent reply Don <nospam nospam.com> writes:
Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 (I lost track of quotes, so I yanked them all beyond Don's message.)

 Don wrote:
 The problem with that, is that you're then forcing the 'unsigned is 
 a natural' interpretation when it may be erroneous.

 uint.max - 10 is a uint.

 It's an interesting case, because int = u1 - u2 is definitely 
 incorrect when u1 > int.max.

 uint = u1 - u2 may be incorrect when u1 < u2, _if you think of 
 unsigned as a positive number_.
 But, if you think of it as a natural modulo 2^32, uint = u1-u2 is 
 always correct, since that's what's happening mathematically.
Sounds good. One important consideration is that modulo arithmetic is considerably easier to understand when two's complement and signs are not involved.
 I'm strongly of the opinion that you shouldn't be able to generate 
 an unsigned accidentally -- you should need to either declare a type 
 as uint, or use the 'u' suffix on a literal.
 Right now, properties like 'length' being uint means you get too 
 many surprising uints, especially when using 'auto'.
I am not surprised by length being unsigned. I'm also not surprised by hexadecimal constants being unsigned. (They are unsigned in C. Walter made them signed or not, depending on their value.)
 I take your point about not wanting to give up the full 32 bits of 
 address space. The problem is, that if you have an object x which is 
  >2GB, and a small object y, then  x.length - y.length will 
 erroneously be negative. If we want code (especially in libraries) 
 to cope with such large objects, we need to ensure that any time 
 there's a subtraction involving a length, the first is larger than 
 the second. I think that would preclude the combination:

 length is uint
 byte[].length can exceed 2GB, and code is correct when it does
 uint - uint is an int (or even, can implicitly convert to int)

 As far as I can tell, at least one of these has to go.
Well, none has to go in the latest design:

(a) One unsigned makes everything unsigned
(b) unsigned -> signed is allowed
(c) signed -> unsigned is disallowed

Of course the latest design has imperfections, but it precludes none of the three things you mention.
It's close, but how can code such as: if (x.length - y.length < 100) ... be correct in the presence of length > 2GB? since (a) x.length = uint.max, y.length = 1 (b) x.length = 4, y.length = 2 both produce the same binary result (0xFFFF_FFFE = -2)
(You mean x.length = 2, y.length = 4 in the second case.)
Yes.
 
 Any subtraction of two lengths has a possible range of
  -int.max .. uint.max
 which is quite problematic (and the root cause of the problems, I guess).
 And unfortunately I think code is riddled with subtraction of lengths.
Code may be riddled with subtraction of lengths, but seems to be working with today's rule that the result of that subtraction is unsigned. So definitely we're not introducing new problems.
Yes. I think much existing code would fail with sizes over 2GB, though. But it's not any worse.
 
 I agree the solution has problems. Following this thread that in turn 
 follows my sleepless nights poring over the subject, I'm glad to reach a 
 design that is better than what we currently have. I think that 
 disallowing the signed -> unsigned conversions will be a net improvement.
I agree. And dealing with compile-time constants will improve things even more.
Nov 28 2008
parent Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-11-28 17:44:39 +0100, Don <nospam nospam.com> said:

 Andrei Alexandrescu wrote:
 Don wrote:
 Andrei Alexandrescu wrote:
 (I lost track of quotes, so I yanked them all beyond Don's message.)
 
 Don wrote:
 The problem with that, is that you're then forcing the 'unsigned is a 
 natural' interpretation when it may be erroneous.
 
 uint.max - 10 is a uint.
 
 It's an interesting case, because int = u1 - u2 is definitely incorrect 
 when u1 > int.max.
 
 uint = u1 - u2 may be incorrect when u1 < u2, _if you think of unsigned 
 as a positive number_.
 But, if you think of it as a natural modulo 2^32, uint = u1-u2 is 
 always correct, since that's what's happening mathematically.
[...]
 
Any subtraction of two lengths has a possible range of -int.max .. uint.max which is quite problematic (and the root cause of the problems, I guess). And unfortunately I think code is riddled with subtraction of lengths.
Code may be riddled with subtraction of lengths, but seems to be working with today's rule that the result of that subtraction is unsigned. So definitely we're not introducing new problems.
Yes. I think much existing code would fail with sizes over 2GB, though. But it's not any worse.
I found a couple of instances where, to compare addresses, simply a-b was done instead of something like ((a<b)?-1:((a==b)?0:1)), so yes, this is a pitfall that happens. Note that normally the subtraction of lengths is ok (because normally one is interested in the result and a>b); it is when it is used as a quick way to introduce ordering (i.e. as a comparison) that it becomes problematic. By the way, the solution for going beyond 2GB is clearly using size_t, as I think is done (at least in tango). Fawzi
Dec 01 2008
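A minimal sketch of the pitfall Fawzi found: subtraction used as a comparator reports the wrong ordering once unsigned wraparound is involved (function names are illustrative):

----
int cmpBySub(uint a, uint b) { return cast(int)(a - b); }            // buggy
int cmpSafe(uint a, uint b)  { return a < b ? -1 : (a == b ? 0 : 1); }

void main()
{
    assert(cmpSafe(1, uint.max) < 0);    // 1 is smaller, as expected
    assert(cmpBySub(1, uint.max) > 0);   // 1 - uint.max wraps to 2
}
----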
prev sibling parent reply Derek Parnell <derek psych.ward> writes:
On Fri, 28 Nov 2008 17:09:25 +0100, Don wrote:

 
 It's close, but how can code such as:
 
 if (x.length - y.length < 100) ...
 
 be correct in the presence of length > 2GB?
It could be transformed by the compiler into something more like ... if ((x.length <= y.length) || ((x.length - y.length) < 100)) ... -- Derek Parnell Melbourne, Australia skype: derek.j.parnell
Nov 28 2008
parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Derek Parnell wrote:
 On Fri, 28 Nov 2008 17:09:25 +0100, Don wrote:
 
 It's close, but how can code such as:

 if (x.length - y.length < 100) ...

 be correct in the presence of length > 2GB?
It could be transformed by the compiler into something more like ... if ((x.length <= y.length) || ((x.length - y.length) < 100)) ...
Then it'd have different behavior from ---- auto diff = x.length - y.length; if (diff < 100) ... ---- This seems like a *bad* thing...
Nov 28 2008
parent Derek Parnell <derek psych.ward> writes:
On Sat, 29 Nov 2008 01:17:27 +0100, Frits van Bommel wrote:


 Then it'd have different behavior from
 ----
    auto diff = x.length - y.length;
    if (diff < 100) ...
 ----
 
 This seems like a *bad* thing...
I see the problem a little differently. To me, "x.length - y.length" is ambiguous and thus meaningless. The ambiguity is: are you after the difference between two values, or are you after the value required to add to x.length to get to y.length? These are not necessarily the same thing. The difference is always positive, as in "the difference between the length of X and the length of Y is 4". The answer tells us the difference between two lengths but not, of course, which is the smaller. So it all depends on what you are trying to find out. And note that the difference is not a length, because it is not associated with any specific array. So having looked at it like this, I'm now inclined to consider that the 'diff' being declared here should be a signed type and, if possible, have more bits than '.length'. -- Derek Parnell Melbourne, Australia skype: derek.j.parnell
Nov 28 2008
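Derek's suggestion sketched in D: compute the difference in a signed type wider than the 32-bit length, which is exact for every pair of uint operands:

----
long signedDiff(uint a, uint b)
{
    return cast(long) a - cast(long) b;   // no wraparound possible
}

void main()
{
    assert(signedDiff(2, 4) == -2);
    assert(signedDiff(uint.max, 1) == 4_294_967_294);
}
----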
prev sibling parent Sean Kelly <sean invisibleduck.org> writes:
Don wrote:
 
 length is uint
 byte[].length can exceed 2GB, and code is correct when it does
 uint - uint is an int (or even, can implicitly convert to int)
 
 As far as I can tell, at least one of these has to go.
This is why I never understood ptrdiff_t in C. Having to choose between a signed value and narrower range vs. unsigned and sufficient range just stinks. Sean
Nov 28 2008
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 At the moment, we're in limbo regarding the decision to go forward with
 this. Walter, as many good long-time C programmers, knows the abusive
 unsigned rule so well he's not hurt by it and consequently has little
 incentive to see it as a problem. I have had to teach C and C++ to young
 students coming from Java introductory courses and have a more
 up-to-date perspective on the dangers.
I'll address your actual suggestion separately, but personally, I always build C/C++ code at the max warning level, and treat warnings as errors. This typically catches all signed-unsigned interactions and requires me to add a cast for the build to succeed. The advantage of this is that if I see a cast in my code then I know that the statement is deliberate rather than accidental. I would wholeheartedly support such an approach in D as well, though I can see how this may not be terribly appealing to some experienced C/C++ programmers. Sean
Nov 25 2008
prev sibling next sibling parent reply Don <nospam nospam.com> writes:
Andrei Alexandrescu wrote:
 D pursues compatibility with C and C++ in the following manner: if a 
 code snippet compiles in both C and D or C++ and D, then it should have 
 the same semantics.
 
 A classic problem with C and C++ integer arithmetic is that any 
 operation involving at least an unsigned integral receives automatically 
 an unsigned type, regardless of how silly that actually is, 
 semantically. About the only advantage of this rule is that it's simple. 
 IMHO it only has disadvantages from then on.
 
 The following operations suffer from the "abusive unsigned syndrome" (u 
 is an unsigned integral, i is a signed integral):
 
 (1) u + i, i + u
 (2) u - i, i - u
 (3) u - u
 (4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C 
 requires that these all return unsigned, ouch)
 (5) u < i, i < u, u <= i etc. (all ordering comparisons)
 (6) -u
I think that most of these problems are caused by C enforcing a foolish consistency between literals and variables. The idea that literals like '0' and '1' are of type int is absurd, and has caused a torrent of problems. '0' is just '0'. uint a = 1; does NOT contain an 'implicit conversion from int to uint', any more than there are implicit conversions from naturals to integers in mathematics. So I really like the polysemous types idea. For example, when is it reasonable to use -u? It's useful with literals like uint a = -1u; which is equivalent to uint a = 0xFFFF_FFFF. Anywhere else, it's probably a bug. My suspicion is, that if you allowed all signed-unsigned operations when at least one was a literal, and made everything else illegal, you'd fix most of the problems. In particular, there'd be a big reduction in people abusing 'uint' as a primitive range-limited int. Although it would be nice to have a type which was range-limited, 'uint' doesn't do it. Instead, it guarantees the number is between 0 and int.max*2+1 inclusive. Allowing mixed operations encourages programmers to focus on the benefit of 'the lower bound is zero!' while forgetting that there is an enormous downside ('I'm saying that this could be larger than int.max!') Interestingly, none of these problems exist in assembly language programming, where every arithmetic instruction affects the overflow flag (for signed operations) as well as the carry flag (for unsigned).
Nov 26 2008
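The one use of -u that Don finds legitimate, checked in a couple of lines (both spellings produce the all-ones mask):

----
void main()
{
    uint a = -1u;              // wraps to uint.max by definition
    assert(a == 0xFFFF_FFFF);
    assert(-1u == ~0u);        // same bits as the complement form
}
----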
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Don wrote:
 Andrei Alexandrescu wrote:
 D pursues compatibility with C and C++ in the following manner: if a 
 code snippet compiles in both C and D or C++ and D, then it should 
 have the same semantics.

 A classic problem with C and C++ integer arithmetic is that any 
 operation involving at least an unsigned integral receives 
 automatically an unsigned type, regardless of how silly that actually 
 is, semantically. About the only advantage of this rule is that it's 
 simple. IMHO it only has disadvantages from then on.

 The following operations suffer from the "abusive unsigned syndrome" 
 (u is an unsigned integral, i is a signed integral):

 (1) u + i, i + u
 (2) u - i, i - u
 (3) u - u
 (4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C 
 requires that these all return unsigned, ouch)
 (5) u < i, i < u, u <= i etc. (all ordering comparisons)
 (6) -u
I think that most of these problems are caused by C enforcing a foolish consistency between literals and variables. The idea that literals like '0' and '1' are of type int is absurd, and has caused a torrent of problems. '0' is just '0'. uint a = 1; does NOT contain an 'implicit conversion from int to uint', any more than there are implicit conversions from naturals to integers in mathematics. So I really like the polysemous types idea.
Yah, polysemy will take care of the constants. It's also rather easy to implement for them.
 For example, when is it reasonable to use -u?
 It's useful with literals like
 uint a = -1u; which is equivalent to uint a = 0xFFFF_FFFF.
 Anywhere else, it's probably a bug.
Maybe not even for constants, as all uses of -u can be easily converted to ~u + 1. I'd gladly agree to disallow -u entirely.
 My suspicion is, that if you allowed all signed-unsigned operations when 
 at least one was a literal, and made everything else illegal, you'd fix 
 most of the problems. In particular, there'd be a big reduction in 
 people abusing 'uint' as a primitive range-limited int.
Well, part of my attempt is to transform that abuse into legit use. In other words, I do want to allow people to consider uint a reasonable model of natural numbers. It can't be perfect, but I believe we can make it reasonable. Notice that the fact that one operand is a literal does not solve all of the problems I mentioned. There is for example no progress in typing u1 - u2 appropriately.
 Although it would be nice to have a type which was range-limited, 'uint' 
 doesn't do it. Instead, it guarantees the number is between 0 and 
 int.max*2+1 inclusive. Allowing mixed operations encourages programmers 
 to focus on the benefit of 'the lower bound is zero!' while forgetting that 
 there is an enormous downside ('I'm saying that this could be larger 
 than int.max!')
I'm not sure I understand this part. To me, the larger problem is underflow, e.g. when subtracting two small uints results in a large uint.
 Interestingly, none of these problems exist in assembly language 
 programming, where every arithmetic instruction affects the overflow 
 flag (for signed operations) as well as the carry flag (for unsigned).
They do exist. You need to use imul/idiv vs. mul/div depending on what signedness your operands have. Andrei
Nov 26 2008
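Andrei's replacement for -u is easy to verify: for any uint u, ~u + 1 yields the same bit pattern (two's complement negation):

----
void main()
{
    foreach (u; [0u, 1u, 42u, uint.max])
        assert(-u == ~u + 1);
}
----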
next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Andrei Alexandrescu wrote:
 
 Notice that the fact that one operand is a literal does not solve all of 
 the problems I mentioned. There is for example no progress in typing u1 
 - u2 appropriately.
What /is/ the appropriate type here? For example:

   uint a = uint.max;
   uint b = 0;
   uint c = uint.max - 1;

   int x = a - b;   // wrong, should be uint
   uint y = c - a;  // wrong, should be int

I don't see any way to reliably produce a "safe" result at the language level. Sean
Nov 26 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 Notice that the fact that one operand is a literal does not solve all 
 of the problems I mentioned. There is for example no progress in 
 typing u1 - u2 appropriately.
What /is/ the appropriate type here? For example: uint a = uint.max; uint b = 0; uint c = uint.max - 1; int x = a - b; // wrong, should be uint uint y = c - a; // wrong, should be int I don't see any way to reliably produce a "safe" result at the language level.
There are several schools of thought (for lack of a better phrase):

1. The Purist Mathematician: We want unsigned to approximate natural numbers, natural numbers aren't closed for subtraction, therefore u1 - u2 should be disallowed.

2. The Practical Mathematician: We want unsigned to approximate natural numbers, and natural numbers aren't closed for subtraction but are closed for a subset satisfying u1 >= u2. We can rely on the programmer to check the condition before, and fall back on modulo difference when the condition isn't satisfied. They'll understand.

3. The C Veteran: Everything should be allowed. And when unsigned is within a mile, the type is unsigned. I'll take care of the rest.

4. The Assembly Programmer: Use whatever type you want. The assembly language operation for subtraction is the same.

5. The Dynamic Language Fan: Allow whatever and check it dynamically.

6. The Static Typing Nut: Use some scheme to magically weed out 73.56% of mistakes and disallow only 14.95% of valid uses.

Your example is in fact perfect. It shows how the result of a subtraction ultimately has its fate decided by case-by-case use, not picked properly by a rule. The example perfectly underlines the advantage of my scheme: the decision of how to type u1 - u2 is left to the only entity able to judge: the user of the operation. Of course there remains the question, should all that be implicit or should the user employ more syntax to specify what they want? I don't know. Andrei
Nov 26 2008
next sibling parent reply Lars Kyllingstad <public kyllingen.NOSPAMnet> writes:
Andrei Alexandrescu wrote:
 Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 Notice that the fact that one operand is a literal does not solve all 
 of the problems I mentioned. There is for example no progress in 
 typing u1 - u2 appropriately.
What /is/ the appropriate type here? For example: uint a = uint.max; uint b = 0; uint c = uint.max - 1; int x = a - b; // wrong, should be uint uint y = c - a; // wrong, should be int I don't see any way to reliably produce a "safe" result at the language level.
There are several schools of thought (for the lack of a better phrase): 1. The Purist Mathematician: We want unsigned to approximate natural numbers, natural numbers aren't closed for subtraction, therefore u1 - u2 should be disallowed. 2. The Practical Mathematician: we want unsigned to approximate natural numbers and natural numbers aren't closed for subtraction but closed for a subset satisfying u1 >= u2. We can rely on the programmer to check the condition before, and fall back on modulo difference when the condition isn't satisfied. They'll understand.
How about 1.5, the Somewhat Practical but Still Purist Mathematician? He (that would be me) would like integral types called nint and nlong (the "n" standing for "natural"), which can hold numbers in the range (0, int.max) and (0, long.max), respectively. Such types would have to be stored as int/long, but the sign bit should be ignored/zero in all calculations. Hence any nint/nlong would be implicitly castable to int/long. Is this a possibility? As you say, natural numbers aren't closed under subtraction, so subtractions involving nint/nlong would have to yield an int/long result. In fact, if n1 and n2 are nints, one would be certain that n1-n2 never goes out of the range of an int. Thing is, whenever I use one of the unsigned types, it is because I need to make sure I'm working with nonnegative numbers, not because I need to work outside the ranges of the signed integral types. Other people obviously have other needs, though, so I'm not saying "let's toss uint and ulong out the window". -Lars
Nov 26 2008
parent Lars Kyllingstad <public kyllingen.NOSPAMnet> writes:
Lars Kyllingstad wrote:
 Andrei Alexandrescu wrote:
 Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 Notice that the fact that one operand is a literal does not solve 
 all of the problems I mentioned. There is for example no progress in 
 typing u1 - u2 appropriately.
What /is/ the appropriate type here? For example: uint a = uint.max; uint b = 0; uint c = uint.max - 1; int x = a - b; // wrong, should be uint uint y = c - a; // wrong, should be int I don't see any way to reliably produce a "safe" result at the language level.
There are several schools of thought (for the lack of a better phrase): 1. The Purist Mathematician: We want unsigned to approximate natural numbers, natural numbers aren't closed for subtraction, therefore u1 - u2 should be disallowed. 2. The Practical Mathematician: we want unsigned to approximate natural numbers and natural numbers aren't closed for subtraction but closed for a subset satisfying u1 >= u2. We can rely on the programmer to check the condition before, and fall back on modulo difference when the condition isn't satisfied. They'll understand.
How about 1.5, the Somewhat Practical but Still Purist Mathematician? He (that would be me) would like integral types called nint and nlong (the "n" standing for "natural"), which can hold numbers in the range (0, int.max) and (0, long.max), respectively. Such types would have to be stored as int/long, but the sign bit should be ignored/zero in all calculations. Hence any nint/nlong would be implicitly castable to int/long. Is this a possibility? As you say, natural numbers aren't closed under subtraction, so subtractions involving nint/nlong would have to yield an int/long result. In fact, if n1 and n2 are nints, one would be certain that n1-n2 never goes out of the range of an int. Thing is, whenever I use one of the unsigned types, it is because I need to make sure I'm working with nonnegative numbers, not because I need to work outside the ranges of the signed integral types. Other people obviously have other needs, though, so I'm not saying "let's toss uint and ulong out the window". -Lars
Another point: nint would also be implicitly castable to uint and so on, so making these types the standard choice of unsigned integers in Phobos shouldn't cause too much breakage. -Lars
Nov 26 2008
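A hedged library sketch of Lars's hypothetical nint; the name, the invariant and the checks are illustrative, and a real design would need compiler support for the implicit conversion to uint as well:

----
struct nint
{
    private int payload;               // invariant: payload >= 0
    this(int v) { assert(v >= 0); payload = v; }
    int get() const { return payload; }
    alias get this;                    // implicit nint -> int
}

void main()
{
    auto n = nint(7);
    int i = n;               // always safe: n is non-negative
    long d = i - nint(9);    // subtraction lands in a signed type
    assert(d == -2);
}
----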
prev sibling parent reply Kagamin <spam here.lot> writes:
Andrei Alexandrescu Wrote:

 There are several schools of thought (for the lack of a better phrase):
 
 1. The Purist Mathematician: We want unsigned to approximate natural 
 numbers, natural numbers aren't closed for subtraction, therefore u1 - 
 u2 should be disallowed.
I thought mathematics doesn't distinguish between, say, natural 5, integral 5 and real 5. N, Z and R are sets, not types of numbers. There is even a notion of equivalence classes to deem numbers with different representations the same (not just equal).
Nov 27 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Kagamin wrote:
 Andrei Alexandrescu Wrote:
 
 There are several schools of thought (for the lack of a better
 phrase):
 
 1. The Purist Mathematician: We want unsigned to approximate
 natural numbers, natural numbers aren't closed for subtraction,
 therefore u1 - u2 should be disallowed.
I thought mathematics doesn't distinguish between, say, natural 5, integral 5 and real 5. N, Z and R are sets, not types of numbers. There is even a notion of equivalence classes to deem numbers with different representations the same (not just equal).
Right, but the notion of set closedness for an operation comes from math. Andrei
Nov 27 2008
prev sibling parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Wed, 26 Nov 2008 09:12:12 -0600, Andrei Alexandrescu wrote:

 Don wrote:
 My suspicion is, that if you allowed all signed-unsigned operations when 
 at least one was a literal, and made everything else illegal, you'd fix 
 most of the problems. In particular, there'd be a big reduction in 
 people abusing 'uint' as a primitive range-limited int.
Well, part of my attempt is to transform that abuse into legit use. In other words, I do want to allow people to consider uint a reasonable model of natural numbers. It can't be perfect, but I believe we can make it reasonable. Notice that the fact that one operand is a literal does not solve all of the problems I mentioned. There is for example no progress in typing u1 - u2 appropriately.
 Although it would be nice to have a type which was range-limited, 'uint' 
 doesn't do it. Instead, it guarantees the number is between 0 and 
 int.max*2+1 inclusive. Allowing mixed operations encourages programmers 
 to focus on the benefit of 'the lower bound is zero!' while forgetting that 
 there is an enormous downside ('I'm saying that this could be larger 
 than int.max!')
I'm not sure I understand this part. To me, the larger problem is underflow, e.g. when subtracting two small uints results in a large uint.
I'm totally with Don here. In math, natural numbers are a subset of integers. But uint is not a subset of int. If it were, most of the problems would vanish. So it's probably feasible to ban uint from SafeD, implement natural numbers by some other means, and leave uint for low-level wizardry.
Nov 26 2008
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sergey Gromov wrote:
 Wed, 26 Nov 2008 09:12:12 -0600, Andrei Alexandrescu wrote:
 
 Don wrote:
 My suspicion is, that if you allowed all signed-unsigned operations when 
 at least one was a literal, and made everything else illegal, you'd fix 
 most of the problems. In particular, there'd be a big reduction in 
 people abusing 'uint' as a primitive range-limited int.
Well, part of my attempt is to transform that abuse into legit use. In other words, I do want to allow people to consider uint a reasonable model of natural numbers. It can't be perfect, but I believe we can make it reasonable. Notice that the fact that one operand is a literal does not solve all of the problems I mentioned. There is for example no progress in typing u1 - u2 appropriately.
 Although it would be nice to have a type which was range-limited, 'uint' 
 doesn't do it. Instead, it guarantees the number is between 0 and 
 int.max*2+1 inclusive. Allowing mixed operations encourages programmers 
 to focus on the benefit of 'the lower bound is zero!' while forgetting that 
 there is an enormous downside ('I'm saying that this could be larger 
 than int.max!')
I'm not sure I understand this part. To me, the larger problem is underflow, e.g. when subtracting two small uints results in a large uint.
I'm totally with Don here. In math, natural numbers are a subset of integers. But uint is not a subset of int. If it were, most of the problems would vanish. So it's probably feasible to ban uint from SafeD, implement natural numbers by some other means, and leave uint for low-level wizardry.
That's also a possibility - consider unsigned types just "bags of bits" and disallow most arithmetic for them. They could actually be eliminated entirely from the core language because they can be implemented as a library. I'm not sure what that would feel like. I guess length would return an int in that case? Andrei
Nov 26 2008
next sibling parent Sergey Gromov <snake.scaly gmail.com> writes:
Wed, 26 Nov 2008 15:57:55 -0600, Andrei Alexandrescu wrote:

 Sergey Gromov wrote:
 Wed, 26 Nov 2008 09:12:12 -0600, Andrei Alexandrescu wrote:
 
 Don wrote:
 My suspicion is, that if you allowed all signed-unsigned operations when 
 at least one was a literal, and made everything else illegal, you'd fix 
 most of the problems. In particular, there'd be a big reduction in 
 people abusing 'uint' as a primitive range-limited int.
Well, part of my attempt is to transform that abuse into legit use. In other words, I do want to allow people to consider uint a reasonable model of natural numbers. It can't be perfect, but I believe we can make it reasonable. Notice that the fact that one operand is a literal does not solve all of the problems I mentioned. There is for example no progress in typing u1 - u2 appropriately.
 Although it would be nice to have a type which was range-limited, 'uint' 
 doesn't do it. Instead, it guarantees the number is between 0 and 
 int.max*2+1 inclusive. Allowing mixed operations encourages programmers 
 to focus on the benefit of 'the lower bound is zero!' while forgetting that 
 there is an enormous downside ('I'm saying that this could be larger 
 than int.max!')
I'm not sure I understand this part. To me, the larger problem is underflow, e.g. when subtracting two small uints results in a large uint.
I'm totally with Don here. In math, natural numbers are a subset of integers. But uint is not a subset of int. If it were, most of the problems would vanish. So it's probably feasible to ban uint from SafeD, implement natural numbers by some other means, and leave uint for low-level wizardry.
That's also a possibility - consider unsigned types just "bags of bits" and disallow most arithmetic for them. They could actually be eliminated entirely from the core language because they can be implemented as a library. I'm not sure what that would feel like. I guess length would return an int in that case?
I guess so. Actually, simply disallowing signed<=>unsigned casts and making length signed would force most people to abandon unsigned types. And moving the unsigned types' documentation into a separate chapter would warn newcomers about their special status. Not a lot of changes on the compiler side, mostly throwing stuff away.
Nov 26 2008
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 That's also a possibility - consider unsigned types just "bags of bits" 
 and disallow most arithmetic for them. They could actually be eliminated 
 entirely from the core language because they can be implemented as a 
 library. I'm not sure what that would feel like.
 
 I guess length would return an int in that case?
I don't know what the solution is, but I am very happy to see that in this newsgroup there are people willing to reconsider such basic things, to try to improve the language. Most ideas turn out to be wrong, but if you aren't bold enough to consider them, there will be no improvements :-)

In my programs I use unsigned integers and unsigned longs as:
- bitfields, a single size_t, for example to represent a small set of items.
- bitarrays, in an array of size_t, to represent a larger set, to have arrays of bit flags, etc.
- to pack small variables into a uint, size_t, etc, for example using the first 5 bits to represent a, the following 2 bits to represent b, etc. In such situations I have never packed such variables into a signed int.
- when I need very large integer values, but this has to be done with care, because they can't be converted back to ints.
- I'd also like to use unsigned ints to denote that for example a function takes a nonnegative argument. I used to do this in Delphi, but I have seen it's too unsafe in D, so now in D I prefer to use ints and then inside the function test for a negative argument and throw an exception (generally I don't use an assert for this except in the most speed-critical situations).
- I use unsigned bytes in some situations, now and then. I don't use signed bytes anymore; I used to use them for 8 bit digital audio, but not anymore. Now 16 bit signed audio is the norm (a short) or even 24 bit (I created a slow 24-bit value type some time ago).
- Probably there are a few other situations; for example I think I've used a ushort once, but not many of them.

Bye, bearophile
Nov 26 2008
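The packing use bearophile lists, sketched with illustrative field widths (5 bits for a, 2 bits for b):

----
void main()
{
    uint a = 19, b = 2;                        // a fits in 5 bits, b in 2
    uint packed = (a & 0x1F) | ((b & 0x3) << 5);
    assert((packed & 0x1F) == 19);             // unpack a
    assert(((packed >> 5) & 0x3) == 2);        // unpack b
}
----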
parent reply Kagamin <spam here.lot> writes:
bearophile Wrote:

 In my programs I use unsigned integers and unsigned longs as:
 - bitfields, a single size_t, for example to represent a small set of items.
 - bitarrays, in an array of size_t, to represent a larger set, to have array
of bit flags, etc.
 - to pack small variables into a uint, size_t, etc, for example use the first
5 bits to represent a, the following 2 bits to represent b, etc. In such
situation I have never pack such variables into a signed int.
I think, signed ints can hold bits as gracefully as unsigned ones.
 - when I need very large integer values, but this has to be done with care,
because they can't be converted back to ints.
I don't think that large integers know or respect computer-specific integer limits. They just get larger and larger.
 - Probably there are a few other situations, for example I think I've used a
ushort once, but not many of them.
Legacy technologies tend to use unsigneds intensively, and people got used to unsigned chars (for comparisons and character maps).
Nov 27 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Kagamin wrote:
 bearophile Wrote:
 
 In my programs I use use unsigned integers and unsigned longs as: -
 bitfields, a single size_t, for example to represent a small set of
 items. - bitarrays, in an array of size_t, to represent a larger
 set, to have array of bit flags, etc. - to pack small variables
 into a uint, size_t, etc, for example use the first 5 bits to
 represent a, the following 2 bits to represent b, etc. In such
 situation I have never pack such variables into a signed int.
I think, signed ints can hold bits as gracefully as unsigned ones.
Problem is there is an odd jump whenever the sign bit gets into play. An expert programmer can easily deal with that, but it's rather tricky.
 - when I need very large integer values, but this has to be done
 with care, because they can't be converted back to ints.
I don't think that large integers know or respect computers-specific integers limits. They just get larger and larger.
Often large integers hold counts or sizes of objects fitting in computer memory. There is a sense of completeness of a systems-level language in being able to use a native type to express any offset in memory. That's why it would be somewhat of a bummer if we defined size_t as int on 32-bit systems: I, at least, would feel like giving something up. Andrei
Nov 27 2008
prev sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Sergey Gromov wrote:
 So it's probably feasible to ban uint from
 SafeD, implement natural numbers by some other means, and leave uint for
 low-level wizardry.
SafeD is about memory safety, i.e. no corrupted memory. Dealing with integer overflows falls outside its agenda.
Nov 26 2008
prev sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Don wrote:
 
 Although it would be nice to have a type which was range-limited, 'uint' 
 doesn't do it. Instead, it guarantees the number is between 0 and 
 int.max*2+1 inclusive. Allowing mixed operations encourages programmers 
 to focus on the benefit of 'the lower bound is zero!' while forgetting that 
 there is an enormous downside ('I'm saying that this could be larger 
 than int.max!')
This inspired me to think about where I use uint and I realized that I don't. I use size_t for size/length representations (largely because sizes can theoretically be >2GB on a 32-bit system), and ubyte for bit-level stuff, but that's it. Sean
Nov 26 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sean Kelly wrote:
 Don wrote:
 Although it would be nice to have a type which was range-limited, 
 'uint' doesn't do it. Instead, it guarantees the number is between 0 
 and int.max*2+1 inclusive. Allowing mixed operations encourages 
 programmers to focus on the benefit of 'the lower bound is zero!' while 
 forgetting that there is an enormous downside ('I'm saying that this 
 could be larger than int.max!')
This inspired me to think about where I use uint and I realized that I don't. I use size_t for size/length representations (largely because sizes can theoretically be >2GB on a 32-bit system), and ubyte for bit-level stuff, but that's it. Sean
For the record, I use unsigned types wherever there's a non-negative number involved (e.g. a count). So I'd be helped by better unsigned operations. I wonder how often these super-large arrays do occur on 32-bit systems. I do have programs that try to allocate as large a contiguous matrix as possible, but never sat down and tested whether a >2GB chunk was allocated on the Linux cluster I work on. I'm quite annoyed by this >2GB issue because it's a very practical and very rare issue in a weird contrast with a very principled issue (modeling natural numbers). Andrei
Nov 26 2008
parent reply Sean Kelly <sean invisibleduck.org> writes:
Andrei Alexandrescu wrote:
 Sean Kelly wrote:
 Don wrote:
 Although it would be nice to have a type which was range-limited, 
 'uint' doesn't do it. Instead, it guarantees the number is between 0 
 and int.max*2+1 inclusive. Allowing mixed operations encourages 
 programmers to focus on the benefit of 'the lower bound is zero!' while 
 forgetting that there is an enormous downside ('I'm saying that this 
 could be larger than int.max!')
This inspired me to think about where I use uint and I realized that I don't. I use size_t for size/length representations (largely because sizes can theoretically be >2GB on a 32-bit system), and ubyte for bit-level stuff, but that's it.
For the record, I use unsigned types wherever there's a non-negative number involved (e.g. a count). So I'd be helped by better unsigned operations.
To be fair, I generally use unsigned numbers for values that are logically always positive. These just tend to be sizes and counts in my code.
 I wonder how often these super-large arrays do occur on 32-bit systems. 
 I do have programs that try to allocate as large a contiguous matrix as 
 possible, but never sat down and tested whether a >2GB chunk was 
 allocated on the Linux cluster I work on. I'm quite annoyed by this >2GB 
 issue because it's a very practical and very rare issue in a weird 
 contrast with a very principled issue (modeling natural numbers).
Yeah, I have no idea how common they are, though my guess would be that they are rather uncommon. As a library programmer, I simply must assume that they are in use, which is why I use size_t as a matter of course. Sean
Nov 26 2008
parent reply "Denis Koroskin" <2korden gmail.com> writes:
On 27.11.08 at 03:46, Sean Kelly wrote:

 Andrei Alexandrescu wrote:
 Sean Kelly wrote:
 Don wrote:
 Although it would be nice to have a type which was range-limited,  
 'uint' doesn't do it. Instead, it guarantees the number is between 0  
 and int.max*2+1 inclusive. Allowing mixed operations encourages  
 programmers to focus on the benefit of 'the lower bound is zero!' while 
 forgetting that there is an enormous downside ('I'm saying that this  
 could be larger than int.max!')
This inspired me to think about where I use uint and I realized that I don't. I use size_t for size/length representations (largely because sizes can theoretically be >2GB on a 32-bit system), and ubyte for bit-level stuff, but that's it.
For the record, I use unsigned types wherever there's a non-negative number involved (e.g. a count). So I'd be helped by better unsigned operations.
To be fair, I generally use unsigned numbers for values that are logically always positive. These just tend to be sizes and counts in my code.
 I wonder how often these super-large arrays do occur on 32-bit systems.  
 I do have programs that try to allocate as large a contiguous matrix as  
 possible, but never sat down and tested whether a >2GB chunk was  
 allocated on the Linux cluster I work on. I'm quite annoyed by this  
 >2GB issue because it's a very practical and very rare issue in a weird  
 contrast with a very principled issue (modeling natural numbers).
Yeah, I have no idea how common they are, though my guess would be that they are rather uncommon. As a library programmer, I simply must assume that they are in use, which is why I use size_t as a matter of course. Sean
If they can be more than 2GB, why can't they be more than 4GB? It is dangerous to assume that they won't; that's why uint is dangerous. You trade safety for one additional bit of information, and this is wrong. Soon enough we won't use uints the same way we don't use ushorts (I should have asked first whether anyone uses ushort these days, but there is so little gain in using ushort as opposed to short or int that I consider it impractical). The 64-bit era will give us 64-bit pointers and 64-bit counters. Do you think you will prefer ulong over long for an additional bit? You really shouldn't.

My proposal

Short summary:
- Disallow bitwise operations on both signed and unsigned types; allow arithmetic operations on them
- Discourage usage of unsigned types. Introduce bits8, bits16, bits32 and bits64 as a replacement
- Disallow arithmetic operations on bits* types; allow bitwise operations on them
- Disallow mixed-type operations (compare, add, sub, mul and div)
- Disallow implicit casts between all these types
- Use int and long (or ranged types) for lengths and indices, with runtime checks (a.length-- is always dangerous no matter what compile-time checks you make)
- Add type constructors for int/uint/etc.: "auto x = int(int.max + 1);" throws at run-time

The two most common uses of uints are:
0) Bitfields or masks, packed values and hexadecimal constants (more on bitfields below)
1) Numbers that can't be negative (counters, sizes/lengths etc.)

Bitfields

Bitfields are handy, and using an unsigned type over a signed one is surely preferable. The most common operations on bitfields are bitwise AND, OR, (R/L)SHIFT and XOR. You shouldn't subtract from or add to them; that is an error in most cases. This is what the new bits8, bits16, bits32 and bits64 types should be used for:

bits32 argbColor;
int alphaShift = 24; // any type here, actually

// shift
bits32 alphaMask = (0xFF << alphaShift); // 0xFF is of type bits8
auto value2 = value1 & mask; // all 3 are of type bits*

// you can only shift bits, and the result is in bits too, i.e. the following is incorrect:
int i = -42;
int x = (i << 8); // An error:
// 1) can't shift a value of type int
// 2) can't assign a value of type bits32 to a variable of type int

// ubyte is still handy sometimes (a color component should belong to the [0..255] range)
auto red = (argbColor & alphaMask) >> alphaShift;
// the result is in bits32; use an explicit cast to convert it to the target data type:
ubyte red = cast(ubyte)((argbColor & alphaMask) >> alphaShift);
// Alternatively:
ubyte alpha = ubyte((argbColor & alphaMask) >> alphaShift);

The type constructor throws an error if the source value (which is of type bits32 in this example) can't be stored in a ubyte. This might be a replacement for signed/unsigned methods.

int i = 0xFFFFFFFF; // an error, can't convert a value of type bits32 to a variable of type int
int i = int.max + 1; // ok
int i = int(int.max + 1); // an exception is raised at runtime
int i = 0xABCD - 0xDCBA; // not allowed; add explicit casts
auto u = cast(uint)0xABCD - cast(uint)0xDCBA; // result type is uint, no checks for overflow
auto i = cast(int)0xABCD - cast(int)0xDCBA; // result type is int, no checks for overflow
auto e = cast(uint)0xABCD - cast(int)0xDCBA; // an error, can't subtract int from uint

// type ctors in action:
auto i = int(cast(int)0xABCD - cast(int)0xDCBA); // result type is int, an exception on overflow
auto u = uint(cast(uint)0xABCD - cast(uint)0xDCBA); // same here for uint

Non-negative values

Just use int/long. Or some ranged type ([0..short.max], [0..int.max], [0..long.max]) could be used as well. 
A library type, perhaps. Let's call it nshort/nint/nlong. It should have the same set of operations as short/int/long, but it makes additional checks and throws on under- and overflow:

int x = 42;
nint nx = x; // ok
nx = -x; // throws
nx = int.max; // ok
++nx; // throws

nx = 0;
--nx; // throws

nx = 0;
nint ny = 42;
nx = ny; // no checking is done
int y = ny; // no checking is done, either

short s = ny; // error, cast needed
short s = cast(short)ny; // never throws
short s = short(ny); // might throw
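A minimal sketch of how such an nint might look as a D2 struct, assuming operator overloading; the layout and method names here are illustrative, not part of the proposal:

struct nint
{
    private int value;

    this(long v) { opAssign(v); }

    // Throws on anything outside [0 .. int.max].
    nint opAssign(long v)
    {
        if (v < 0 || v > int.max)
            throw new Exception("nint: value out of range");
        value = cast(int) v;
        return this;
    }

    // ++nx and --nx re-run the range check.
    nint opUnary(string op)() if (op == "++" || op == "--")
    {
        opAssign(cast(long) value + (op == "++" ? 1 : -1));
        return this;
    }

    int get() const { return value; }
}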
Nov 26 2008
parent Sean Kelly <sean invisibleduck.org> writes:
Denis Koroskin wrote:
 On 27.11.08 at 03:46, Sean Kelly wrote:
 
 Andrei Alexandrescu wrote:
 Sean Kelly wrote:
 Don wrote:
 Although it would be nice to have a type which was range-limited, 
 'uint' doesn't do it. Instead, it guarantees the number is between 
 0 and int.max*2+1 inclusive. Allowing mixed operations encourages 
 programmers to focus on the benefit of 'the lower bound is zero!' 
 while forgetting that there is an enormous downside ('I'm saying 
 that this could be larger than int.max!')
This inspired me to think about where I use uint and I realized that I don't. I use size_t for size/length representations (largely because sizes can theoretically be >2GB on a 32-bit system), and ubyte for bit-level stuff, but that's it.
For the record, I use unsigned types wherever there's a non-negative number involved (e.g. a count). So I'd be helped by better unsigned operations.
To be fair, I generally use unsigned numbers for values that are logically always positive. These just tend to be sizes and counts in my code.
 I wonder how often these super-large arrays do occur on 32-bit 
 systems. I do have programs that try to allocate as large a 
 contiguous matrix as possible, but never sat down and tested whether 
 a >2GB chunk was allocated on the Linux cluster I work on. I'm quite 
 annoyed by this >2GB issue because it's a very practical and very 
 rare issue in a weird contrast with a very principled issue (modeling 
 natural numbers).
Yeah, I have no idea how common they are, though my guess would be that they are rather uncommon. As a library programmer, I simply must assume that they are in use, which is why I use size_t as a matter of course.
If they can be more than 2GB, why can't they be more than 4GB? It is dangerous to assume that they won't; that's why uint is dangerous. You trade safety for one additional bit of information, and this is wrong.
Bigger than 4GB on a 32-bit system? Files perhaps, but I'm talking about memory ranges here.
 Soon enough we won't use uints the same way we don't use ushorts (I 
 should have asked first whether anyone uses ushort these days, but there is so 
 little gain in using ushort as opposed to short or int that I consider it 
 impractical). The 64-bit era will give us 64-bit pointers and 64-bit counters. 
 Do you think you will prefer ulong over long for an additional bit? You 
 really shouldn't.
long vs. ulong for sizes is less of an issue, because we're a long way away from running up against the limitations of a 63-bit size value. The point of size_t to me, however, is that it scales automatically: if I write array operations using size_t then I can be sure they will work on both a 32- and a 64-bit system.

I do like Don's point that unsigned really means "unsigned" rather than "positive," however. I clearly use unsigned numbers for both, even if I flag the "positive" uses via a type alias such as size_t. In C/C++ I rely on compiler warnings to trap the sort of mistakes we're talking about here, but I'd love a more logically sound solution if one could be found.

Sean
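A minimal sketch of the pattern Sean describes, with an illustrative function (not from the post): written against size_t, it compiles to 32-bit indexing on x86 and 64-bit indexing on x86-64 without source changes.

// Finds the index of the largest element; i and best scale with the platform.
size_t indexOfMax(const(int)[] a)
{
    size_t best = 0;
    foreach (i; 1 .. a.length) // i is inferred as size_t, like a.length
        if (a[i] > a[best])
            best = i;
    return best;
}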
Nov 27 2008
prev sibling next sibling parent reply Kagamin <spam here.lot> writes:
Andrei Alexandrescu Wrote:

 I also know seasoned programmers who had 
 no idea that -u compiles and that it also oddly returns an unsigned type.
1) I see no danger here. 2) I doubt this proposal solves the danger, whatever it is. 3) -u is funny and looks like wrong design to me.
Nov 26 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Kagamin wrote:
 Andrei Alexandrescu Wrote:
 
 I also know seasoned programmers who had 
 no idea that -u compiles and that it also oddly returns an unsigned type.
1) I see no danger here. 2) I doubt this proposal solves the danger, whatever it is. 3) -u is funny and looks like wrong design to me.
I didn't want runtime checks inserted, just to tighten the compilation rules. Andrei
Nov 26 2008
parent bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 I didn't want runtime checks inserted, just to tighten compilation rules.
The compiler may use both :-) Bye, bearophile
Nov 26 2008
prev sibling next sibling parent reply Kagamin <spam here.lot> writes:
bearophile Wrote:

 One solution is to "disable" some of the more error-prone syntax allowed in C,
turning it into a compilation error. For example I have seen newbies write bugs
caused by leaving & where a && was necessary. In such case just adopting "and"
and making "&&" a syntax error solves the problem and doesn't lead to bugs when
you convert C code to D (you just use a search&replace, replacing && with and
on the code).
Why do you want to turn D into Python? You already have one. Just write in Python, migrate others to it, and be done with the C family.
Nov 26 2008
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
bearophile:
 One solution is to "disable" some of the more error-prone syntax allowed in C,
turning it into a compilation error. For example I have seen newbies write bugs
caused by leaving & where a && was necessary. In such case just adopting "and"
and making "&&" a syntax error solves the problem and doesn't lead to bugs when
you convert C code to D (you just use a search&replace, replacing && with and
on the code).<<
Kagamin:
Why do you want to turn D into Python? You already has one. Just write in
python, migrate others to it and be done with C family.<
The mistake I have shown (using "&&" instead of "&", or "|" instead of "||", and vice versa) comes from code I have seen written by new programmers at the university. But not only newbies introduce such bugs; see for example this post: http://gcc.gnu.org/ml/gcc-patches/2004-10/msg00990.html It says:
People sometimes code "a && MASK" when they intended "a & MASK". gcc itself
does not seem to have examples of this, here are some in the linux-2.4.20
kernel:<
I want to copy the syntax that leads to fewer bugs and more readability, and Python often gives good examples because it's often well designed. Note that this change doesn't reduce the performance of D code.

Also note that G++ already allows you to write programs with and, or, not, xor, etc. The following code compiles and runs correctly, so instead of Python you may also say I want to copy G++:

#include "stdio.h"
#include "stdlib.h"

int main(int argc, char** argv) {
    int b1 = argc >= 2 ? atoi(argv[1]) : 0;
    int b2 = argc >= 3 ? atoi(argv[2]) : 0;
    printf("%d\n", b1 and b2);
    return 0;
}

That can be disabled with "-fno-operator-names", while "-foperator-names" is enabled by default. So maybe the G++ designers agree with me instead of you.

Bye,
bearophile
Nov 26 2008
next sibling parent Kagamin <spam here.lot> writes:
bearophile Wrote:

 Also note that G++ already allows you to write programs with and, or, not,
xor, etc. The following code compiles and run correctly, so instead of Python
you may also say I want to copy G++:
Copying G++ is not always a good idea :) As I remember, this alternative syntax is supported for compatibility with keyboards that don't have the somewhat exotic ~^&| characters. And I don't think there is a way to make && a syntax error, as you proposed.
Nov 26 2008
prev sibling parent reply Kagamin <spam here.lot> writes:
bearophile Wrote:

 http://gcc.gnu.org/ml/gcc-patches/2004-10/msg00990.html
 It says:
People sometimes code "a && MASK" when they intended "a & MASK". gcc itself
does not seem to have examples of this, here are some in the linux-2.4.20
kernel:<
that thread is about an extra compiler warning (which is always good), not about breaking C syntax.
Nov 26 2008
parent bearophile <bearophileHUGS lycos.com> writes:
Kagamin:
that thread is about an extra compiler warning (which is always good), not
about breaking C syntax.<
You seem unaware of Walter's current stance towards warnings. And please don't forget that D's purposes are different from C's (D is designed to be safer, especially where that has little or no cost), that D comes after a long experience of coding in C, and that D runs on machines thousands of times faster than the ones C was originally designed for (today, having fast kernels in your program is more and more important: less of the code uses most of the running time).

That thread was, more generally, an example that shows why that specific C syntax is error-prone, and it also explains why some languages (Python among them, but not only Python) have refused this specific C syntax. Note that there are several other C syntaxes/semantics that are error-prone; thanks to Walter, D already fixes some of them, and I hope to see more improvements in the future.
And I don't think that there is a method to make && a syntax error as you
proposed.<
Keeping two syntaxes to do the same thing is a bad form of complexity. Generally it's better to have only one obvious way to do something :-) Bye, bearophile
Nov 26 2008
prev sibling parent "Nick Sabalausky" <a a.a> writes:
"Kagamin" <spam here.lot> wrote in message 
news:ggjcfg$fqq$1 digitalmars.com...
 bearophile Wrote:

 One solution is to "disable" some of the more error-prone syntax allowed 
 in C, turning it into a compilation error. For example I have seen 
 newbies write bugs caused by leaving & where a && was necessary. In such 
 case just adopting "and" and making "&&" a syntax error solves the 
 problem and doesn't lead to bugs when you convert C code to D (you just 
 use a search&replace, replacing && with and on the code).
Why do you want to turn D into Python? You already has one. Just write in python, migrate others to it and be done with C family.
Python has other issues.
Nov 26 2008
prev sibling next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2008-11-25 10:59:01 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 (3) u - u
Just a note here, because it seems to me you're confusing two issues with that "u - u" thing.

The problem with "u - u" isn't one of unsigned vs. signed integers at all. It's a problem of possibly going out of range, a problem that can happen with any type but is more likely with unsigned integers since they're often near zero.

If you want to attack that problem, I think it should be done in a coherent manner with other out-of-range issues. Going below uint.min for a uint or below int.min for an int should be handled the same way. Personally, I'd just add a compiler switch for runtime range checking (just as for array bounds checking).

Treating the result of u - u as __intuint is dangerous: uint.max - 1U gives you a value which int cannot hold, but you'd allow it to convert implicitly and without warning to int? I don't like it.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
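To make Michel's example concrete, a small D sketch (illustrative; today's D already allows the implicit uint-to-int conversion he objects to extending):

void demo()
{
    uint a = uint.max;
    uint b = 1;
    // a - b is 4_294_967_294, a value int cannot represent.
    int oops = a - b; // compiles silently; oops ends up as -2
}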
Nov 26 2008
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 On 2008-11-25 10:59:01 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 (3) u - u
Just a note here, because it seems to me you're confusing two issues with that "u - u" thing. The problem with "u - u" isn't one of unsigned vs. signed integers at all. It's a problem of possibly going out of range, a problem that can happen with any type but is more likely with unsigned integers since they're often near zero.
It's also a problem of signedness, considering that int can hold the difference of two small unsigned integrals. So if the result is unsigned, there may be overflow (I abusively call it "underflow"), but if the result is an int, that overflow may be avoided, or a different overflow may occur.
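A small sketch of the two outcomes being contrasted (variable names are illustrative):

void demo()
{
    uint a = 2, b = 5;
    uint asUnsigned = a - b;            // wraps to 4_294_967_293 ("underflow")
    int  asSigned   = cast(int)(a - b); // the same bits read as -3, as intended
}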
 If you want to attack that problem, I think it should be done in a 
 coherent manner with other out-of-range issues. Going below uint.min for 
 an uint or below int.min for an int should be handled the same way. 
 Personally, I'd just add a compiler switch for runtime range checking 
 (just as for array bound checking).
 
 Treating the result u - u as __intuint is dangerous: uint.max - 1U gives 
 you a value which int cannot hold, but you'd allow it to convert 
 implicitly and without warning to int? I don't like it.
I understand. It's what I have so far, so I'm looking forward to better ideas. Resorting to runtime checks is always a possibility but I'd like to focus on the static checking aspect for now. Andrei
Nov 26 2008
prev sibling parent "Nick Sabalausky" <a a.a> writes:
"Michel Fortin" <michel.fortin michelf.com> wrote in message 
news:ggjpn4$1v0m$1 digitalmars.com...
 On 2008-11-25 10:59:01 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:

 (3) u - u
Just a note here, because it seems to me you're confusing two issues with that "u - u" thing. The problem with "u - u" isn't one of unsigned vs. signed integers at all. It's a problem of possibly going out of range, a problem that can happen with any type but is more likely with unsigned integers since they're often near zero. If you want to attack that problem, I think it should be done in a coherent manner with other out-of-range issues. Going below uint.min for an uint or below int.min for an int should be handled the same way. Personally, I'd just add a compiler switch for runtime range checking (just as for array bound checking).
I'd love to see D get the ability to turn runtime range checking on and off, but nothing more than a program-wide (or module-wide, if compiling one module at a time) compiler switch is way too coarse-grained and blunt. I would prefer expression- and block-level control, something like:

checked(expr)
unchecked(expr)
checked { code }
unchecked { code }
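Something close to checked(expr) can be sketched today as a library helper; this covers addition only, and the name checkedAdd is illustrative rather than part of the proposal:

int checkedAdd(int a, int b)
{
    immutable long wide = cast(long) a + b; // compute in a wider type
    if (wide < int.min || wide > int.max)
        throw new Exception("integer overflow in checkedAdd");
    return cast(int) wide;
}

void usage()
{
    int x = checkedAdd(int.max - 1, 1); // fine
    int y = checkedAdd(int.max, 1);     // throws
}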
 Treating the result u - u as __intuint is dangerous: uint.max - 1U gives 
 you a value which int cannot hold, but you'd allow it to convert 
 implicitly and without warning to int? I don't like it.

 -- 
 Michel Fortin
 michel.fortin michelf.com
 http://michelf.com/
 
Nov 26 2008
prev sibling next sibling parent reply Tomas Lindquist Olsen <tomas famolsen.dk> writes:
I'm not really sure what I think about all this. I try to always insert 
assertions before operations like this, which makes me think the nicest 
solution would be if the compiler errors out if it detects a problematic 
expression that is unchecked...

uint diff(uint begin, uint end)
{
	return end - begin; // error
}


uint diff(uint begin, uint end)
{
	assert(begin <= end);
	return end - begin; // ok because of the assert
}


I'm not going to get into how this would be implemented in the compiler, 
but it sure would be sweet :)
Nov 26 2008
parent Christopher Wright <dhasenan gmail.com> writes:
Tomas Lindquist Olsen wrote:
 I'm not really sure what I think about all this. I try to always insert 
 assertions before operations like this, which makes me think the nicest 
 solution would be if the compiler errors out if it detects a problematic 
 expression that is unchecked...
 
 uint diff(uint begin, uint end)
 {
     return end - begin; // error
 }
 
 
 uint diff(uint begin, uint end)
 {
     assert(begin <= end);
     return end - begin; // ok because of the assert
 }
 
 
 I'm not going to get into how this would be implemented in the compiler, 
  but it sure would be sweet :)
On the other hand, the CPU can report on integer overflow, so you could turn that into an exception if the expression doesn't include a cast.
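The overflow flag isn't visible from portable D, but the same test can be done in software; a sketch of one standard trick (not necessarily what Christopher has in mind):

bool addOverflows(int a, int b)
{
    immutable int r = a + b; // wraps on overflow (two's complement)
    // Signed overflow occurred iff a and b share a sign and r's sign differs.
    return ((a ^ r) & (b ^ r)) < 0;
}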
Nov 26 2008
prev sibling next sibling parent "Simen Kjaeraas" <simen.kjaras gmail.com> writes:
The more I read about this, the more I am convinced that removing the  
following:
- implicit int <-> uint conversion
- uint - uint (not 100% sure about this)
- mixed int / uint arithmetic
as well as changing array.length to int, would remove most problems. A sketch of what these rules would reject follows below.

If you desperately need a > 2^31 element array, having to roll your own is  
not the main problem.

The fact that the type of uint - uint could be int or uint depending on  
what the programmer wants tells me that the programmer should be tasked  
with informing the compiler what he really wants - i.e. with a cast.
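
A sketch of what those rules would reject, written as comments since no compiler enforces them (the example is illustrative):

void demo(int i, uint u)
{
    // int  a = u;     // error: no implicit uint -> int conversion
    // auto b = u - u; // error: the sign of the result is ambiguous
    // auto c = i + u; // error: mixed int / uint arithmetic
    auto d = cast(int) u + i; // fine: the cast states the intended sign
}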

-- 
Simen
Nov 26 2008
prev sibling next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Tue, 25 Nov 2008 09:59:01 -0600, Andrei Alexandrescu wrote:

 D pursues compatibility with C and C++ in the following manner: if a 
 code snippet compiles in both C and D or C++ and D, then it should have 
 the same semantics.
Interesting ... but I don't think that this should be the principle employed. If code is 'naughty' in C/C++, then D should not also produce the same results. I would propose a better principle: the compiler will not allow loss or distortion of information without the coder/reader being made aware of it.
 (1) u + i, i + u
 (2) u - i, i - u
 (3) u - u
 (4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C 
 requires that these all return unsigned, ouch)
 (5) u < i, i < u, u <= i etc. (all ordering comparisons)
 (6) -u
Note that "(3) u - u" and "(6) -u" seem to be really a use of (4), namely "(-1 * u)". I am assming that there is no difference between 'unsigned' and 'positive', in so much as I am not treating 'unsigned' as 'sign unknown/irrelevant'. It seems to me that the issue then is not so much one of sign but of size. It needs an extra bit to hold the sign information thus a 32-bit unsigned value needs a minimum of 33 bits to convert it to a signed equivalent. In the types (1) - (4) above, I would have the compiler compute a signed type for these. Then if the target of the result is a signed type AND larger than the 'unsigned' portion used, then the complier would not have to complain. In every other case the complier should complain because of the potential for information loss. To avoid the complaint, the coder would need to either change the result type, the input types or add a 'message' to the compliler that in effects says "I know what I'm doing, ok?" - I suggest a cast would suffice. In those cases where the target type is not explicitly coded, such as using 'auto' or as a temporary value in an expression, the compiler should assume a signed type that is 'one step' larger than the 'unsigned' element in the expression. e.g. auto x = int * uint; ==> 'x' is long. If this causes code to be incompatible to C/C++, then it implies that the C/C++ code was poor (i.e. potential information loss) in the first place and deserves to be fixed up. The scenario (5) above should also include equality comparisions, and should cause the compiler to issue a message AND generate code like ... if (u < i) ====> if ( i < 0 ? false : u < cast(typeof(u))i) if (u <= i) ====> if ( i < 0 ? false : u <= cast(typeof(u))i) if (u = i) ====> if ( i < 0 ? false : u = cast(typeof(u))i) if (u >= i) ====> if ( i < 0 ? true : u >= cast(typeof(u))i) if (u > i) ====> if ( i < 0 ? true : u > cast(typeof(u))i) The coder should be able to avoid the message and the suboptimal generated code my adding a cast ... if (u < cast(typeof u)i) I am also assuming that syntax 'cast(unsigned-type)signed-type' is telling the complier to assume that the bits in the signed-value already represent a valid unsigned-value and so therefore the compiler should not generate code to 'transform' the signed-value bits to form an unsigned-value. To summarize, (1) Perpetuating poor quality C/C++ code should not be encouraged. (2) The compiler should help the coder be aware of potential information loss. (3) The coder should have mechanisms to override the compiler's concerns. -- Derek Parnell Melbourne, Australia skype: derek.j.parnell
Nov 27 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Derek Parnell wrote:
 On Tue, 25 Nov 2008 09:59:01 -0600, Andrei Alexandrescu wrote:
 
 D pursues compatibility with C and C++ in the following manner: if a 
 code snippet compiles in both C and D or C++ and D, then it should have 
 the same semantics.
Interesting ... but I don't think that this should be the principle employed. If code is 'naughty' in C/C++ then D should not also produce the same results. I would propose that a better principle to be used would be that the compiler will not allow loss or distortion of information without the coder/reader being made aware of it.
These two principles are not necessarily at odds with each other. The idea of being compatible with C and C++ is simple: if I paste a C function from somewhere into a D module, the function should either not compile, or compile and run with the same result. I think that's quite reasonable. So if the C code is behaving naughtily, D doesn't need to also behave naughtily. It should just not compile.
 (1) u + i, i + u
 (2) u - i, i - u
 (3) u - u
 (4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C 
 requires that these all return unsigned, ouch)
 (5) u < i, i < u, u <= i etc. (all ordering comparisons)
 (6) -u
Note that "(3) u - u" and "(6) -u" seem to be really a use of (4), namely "(-1 * u)".
Correct.
 I am assming that there is no difference between 'unsigned' and 'positive',
 in so much as I am not treating 'unsigned' as 'sign unknown/irrelevant'. 
 
 It seems to me that the issue then is not so much one of sign but of size.
 It needs an extra bit to hold the sign information thus a 32-bit unsigned
 value needs a minimum of 33 bits to convert it to a signed equivalent.
  
 In the types (1) - (4) above, I would have the compiler compute a signed
 type for these. Then if the target of the result is a signed type AND
 larger than the 'unsigned' portion used, then the complier would not have
 to complain. In every other case the complier should complain because of
 the potential for information loss. To avoid the complaint, the coder would
 need to either change the result type, the input types or add a 'message'
 to the compliler that in effects says "I know what I'm doing, ok?" - I
 suggest a cast would suffice.
 
 In those cases where the target type is not explicitly coded, such as using
 'auto' or as a temporary value in an expression, the compiler should assume
 a signed type that is 'one step' larger than the 'unsigned' element in the
 expression.
 
 e.g.
    auto x = int * uint; ==> 'x' is long.
I don't think this will fly with Walter.
 If this causes code to be incompatible to C/C++, then it implies that the
 C/C++ code was poor (i.e. potential information loss) in the first place
 and deserves to be fixed up.
I don't quite think so. As long as the values are within range, the multiplication is legit and efficient.
 The scenario (5) above should also include equality comparisions, and
 should cause the compiler to issue a message AND generate code like ...
 
    if (u < i)  ====> if ( i < 0 ? false : u < cast(typeof(u))i)
    if (u <= i) ====> if ( i < 0 ? false : u <= cast(typeof(u))i)
    if (u = i)  ====> if ( i < 0 ? false : u = cast(typeof(u))i)
    if (u >= i) ====> if ( i < 0 ? true  : u >= cast(typeof(u))i)
    if (u > i)  ====> if ( i < 0 ? true  : u > cast(typeof(u))i)
 
 The coder should be able to avoid the message and the suboptimal generated
 code my adding a cast ...
 
   if (u < cast(typeof u)i) 
Yah, comparisons need to be looked at too. Andrei
Nov 27 2008
parent reply Derek Parnell <derek psych.ward> writes:
On Thu, 27 Nov 2008 16:23:12 -0600, Andrei Alexandrescu wrote:

 Derek Parnell wrote:
 On Tue, 25 Nov 2008 09:59:01 -0600, Andrei Alexandrescu wrote:
 
 D pursues compatibility with C and C++ in the following manner: if a 
 code snippet compiles in both C and D or C++ and D, then it should have 
 the same semantics.
Interesting ... but I don't think that this should be the principle employed. If code is 'naughty' in C/C++ then D should not also produce the same results. I would propose that a better principle to be used would be that the compiler will not allow loss or distortion of information without the coder/reader being made aware of it.
These two principles are not necessarily at odds with each other. The idea of being compatible with C and C++ is simple: if I paste a C function from somewhere into a D module, the function should either not compile, or compile and run with the same result. I think that's quite reasonable. So if the C code is behaving naughtily, D doesn't need to also behave naughtily. It should just not compile.
I think we are saying the same thing. If the C code compiles AND if it has the potential to lose information then the D compiler should not compile it *if* the coder has not given explicit permission to the compiler to do so.
 In those cases where the target type is not explicitly coded, such as using
 'auto' or as a temporary value in an expression, the compiler should assume
 a signed type that is 'one step' larger than the 'unsigned' element in the
 expression.
 
 e.g.
    auto x = int * uint; ==> 'x' is long.
I don't think this will fly with Walter.
And that there is our single point of failure.
 If this causes code to be incompatible to C/C++, then it implies that the
 C/C++ code was poor (i.e. potential information loss) in the first place
 and deserves to be fixed up.
I don't quite think so. As long as the values are within range, the multiplication is legit and efficient.
Of course. *If* the compiler can determine that the result will not lose information when being used, then it is fine. However, that is not always going to be the case.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
Nov 27 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Derek Parnell wrote:
 On Thu, 27 Nov 2008 16:23:12 -0600, Andrei Alexandrescu wrote:
 
 Derek Parnell wrote:
 On Tue, 25 Nov 2008 09:59:01 -0600, Andrei Alexandrescu wrote:

 D pursues compatibility with C and C++ in the following manner: if a 
 code snippet compiles in both C and D or C++ and D, then it should have 
 the same semantics.
Interesting ... but I don't think that this should be the principle employed. If code is 'naughty' in C/C++ then D should not also produce the same results. I would propose that a better principle to be used would be that the compiler will not allow loss or distortion of information without the coder/reader being made aware of it.
These two principles are not necessarily at odds with each other. The idea of being compatible with C and C++ is simple: if I paste a C function from somewhere into a D module, the function should either not compile, or compile and run with the same result. I think that's quite reasonable. So if the C code is behaving naughtily, D doesn't need to also behave naughtily. It should just not compile.
I think we are saying the same thing. If the C code compiles AND if it has the potential to lose information then the D compiler should not compile it *if* the coder has not given explicit permission to the compiler to do so.
Oh, sorry. Yes, absolutely!
 In those cases where the target type is not explicitly coded, such as using
 'auto' or as a temporary value in an expression, the compiler should assume
 a signed type that is 'one step' larger than the 'unsigned' element in the
 expression.

 e.g.
    auto x = int * uint; ==> 'x' is long.
I don't think this will fly with Walter.
And that there is our single point of failure.
 If this causes code to be incompatible to C/C++, then it implies that the
 C/C++ code was poor (i.e. potential information loss) in the first place
 and deserves to be fixed up.
I don't quite think so. As long as the values are within range, the multiplication is legit and efficient.
Of course. *If* the compiler can determine that the result will not lose information when being used, then it is fine. However, that is not always going to be the case.
Well, here are two objectives at odds with each other. One is the systems-y, level-y aspect: on 32-bit systems there is a 32-bit multiplication operation that ought to be mapped to naturally by the 32-bit D primitive. I think there is some good reason to expect that.

Then there's also the argument you're making - and with which I agree - that 32-bit multiplication really yields a 64-bit value, so the type of the result should be long. But if we really start down that path, infinite-precision integrals are the only solution, because when you multiply two longs, you'd need something even longer, and so on.

Anyhow, the ultimate reality is: we won't be able to satisfy every objective we have. We'll need to strike a good compromise.

Andrei
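The two positions in code form (a minimal sketch):

void demo(int a, int b)
{
    int  narrow = a * b;          // today's rule: one 32-bit multiply, may wrap
    long wide = cast(long) a * b; // the widened result argued for above; no overflow
}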
Nov 27 2008
parent bearophile <bearophileHUGS lycos.com> writes:
Some of the purposes of a good arithmetic are:
- To give the system programmer freedom, essentially to use all the speed and
flexibility of the CPU instructions.
- To allow fast-running code, which means having ways to specify 32- or 64-bit
operations in a short way.
- To allow programs that aren't bug-prone, both with compile-time safeguards and,
where those aren't enough, with run-time ones (array bounds, arithmetic overflow
among non-long types, etc.).
- To allow more flexibility, coming from certain usages of multi-precision
integers.
- Good CommonLisp implementations are supposed to allow both fast code
(fixnums) and safe/multiprecision integers (and even untagged fixnums).


Andrei Alexandrescu:
 But if we really start down that path, infinite-precision integrals are 
 the only solution. Because when you multiply two longs, you'd need 
 something even longer and so on.
Well, having built-in multi-precision integer values isn't bad. You then need ways to specify where you want the compiler to use fixed-length numbers, for more efficiency. Bye, bearophile
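A sketch of that mixed model, assuming a library BigInt type (Phobos later gained one in std.bigint):

import std.bigint;

void demo()
{
    BigInt big = BigInt("9223372036854775807"); // long.max, and it can keep growing
    big *= big; // never overflows, at the cost of speed

    long fast = 1_000_000_007; // fixed width where performance matters
    fast *= fast;              // one machine multiply; can wrap silently
}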
Nov 28 2008
prev sibling parent Kagamin <spam here.lot> writes:
Andrei Alexandrescu Wrote:

 Often large integers hold counts or sizes of objects fitting in computer 
 memory.
Yes, if that object is system-specific, like the size of an allocated heap chunk. Business objects don't seem to respect system constraints (they are nearly storage-agnostic). Files are a good example.
 There is a sense of completeness of a systems-level language in 
 being able to use a native type to express any offset in memory. That's 
 why it would be somewhat of a bummer if we defined size_t as int on 32-bit 
 systems: I, at least, would feel like giving something up.
Yes, giving something up always feels like giving something up. But can you rely on large numbers? I heard a story about a program that crashed on an attempt to allocate a memory chunk larger than half the address space. The allocation was expected to succeed since there was enough address space, but it turned out that one DLL happened to be relocated to the middle of the address space, so there was no contiguous memory chunk of the requested size.
Nov 28 2008