digitalmars.D - bearophile can say "i told you so" (re uint->int implicit conv)

Adam D. Ruppe (26/26) Mar 28 2013 I was working on a project earlier today that stores IP addresses

Alex Rønne Petersen (28/51) Mar 28 2013 This is exactly why many new languages only allow implicit integer
Nick Sabalausky (12/13) Mar 28 2013 While I won't necessarily disagree with the rest, that right there is

Adam D. Ruppe (2/3) Mar 28 2013 Yes, and no on strict mode, I didn't even know it had one!

bearophile (27/37) Mar 28 2013 If you remove the implicit uint==>int assignment from D you have

Adam D. Ruppe (3/7) Mar 28 2013 Oh maybe I got it mixed up, but I definitely remember talking
bearophile (5/8) Mar 29 2013 On the other hand I have not tried D with such change, so that's

Timon Gehr (8/20) Mar 28 2013 While I agree that implicit uint <-> int is a bad situation, I think the...

Adam D. Ruppe (13/14) Mar 28 2013 Part of why I did it this way was the annoyance that I can't do a

Benjamin Thaut (5/8) Mar 29 2013 Who says you can't ? In fact you can using the NVI idiom:

Adam D. Ruppe (12/13) Mar 29 2013 Is that fairly new in D? I'm almost certain I tried it and it

Jonathan M Davis (5/10) Mar 29 2013 It'll work with classes and protected. It's _supposed_ to work with inte...

Minas Mina (4/4) Mar 29 2013 Consider:

Jonathan M Davis (14/19) Mar 29 2013 No. -w makes it so that warnings are errors, so you generally can't make...

Benjamin Thaut (13/22) Mar 29 2013 Reading this tells me two things:

Jesse Phillips (4/7) Mar 29 2013 uint value = 3408924;

Kagamin (3/5) Mar 30 2013 There are so many implicit conversions between signed and

bearophile (6/11) Mar 30 2013 I think Jonathan doesn't have enough proof that forbidding

Jonathan M Davis (4/15) Mar 30 2013 Walter is the one that you have to convince, and I don't think that that...

bearophile (9/11) Mar 30 2013 I understand. But maybe Walter too don't have that proof... I

Kagamin (3/3) Mar 31 2013 I vaguely remember Walter said those diagnostics are mostly false

bearophile (4/7) Mar 31 2013 I agree several of them seem innocuous.

Kagamin (4/4) Mar 31 2013 I can say C compilers bug me with InterlockedIncrement function:

Franz (7/24) Apr 02 2013 This reason alone ain't good enough to justify the implicit cast

Benjamin Thaut (27/29) Mar 29 2013 Yes it's fairly new. I think dmd 2.060 or something along that line. I

H. S. Teoh (16/45) Mar 28 2013 IMO, the compiler should insert bounds checks in non-release mode when

Adam D. Ruppe (3/5) Mar 28 2013 Yeah, I usually don't either, but apparently I did here. Murphy's

Jonathan M Davis (6/8) Mar 28 2013 It's not terribly pretty, but you can always do this

Adam D. Ruppe (3/4) Mar 28 2013 We could also do more C++ looking:

Jonathan M Davis (5/11) Mar 28 2013 It would be pretty trivial to add a wrapper function to make it cleaner....

Kagamin (5/14) Mar 28 2013 short signed(ushort n){ return cast(short)n; }

Kagamin (4/24) Apr 01 2013 BTW phobos already has the function:

Walter Bright (3/7) Mar 28 2013 http://dlang.org/phobos/std_traits.html#.Unsigned

Don (11/37) Apr 02 2013 IMHO, array.length is *the* place where unsigned does *not* work.

renoX (9/19) Apr 02 2013 You forgot something: an explanation why you feel that way..

Don (22/42) Apr 02 2013 You can actually see it from the name. An unsigned number is

bearophile (18/20) Apr 02 2013 Sometimes you need the modular nature of unsigned values, and

Jonathan M Davis (25/75) Apr 02 2013 Naturally, the biggest reason to have size_t be unsigned is so that you ...

Don (15/101) Apr 02 2013 My feeling is, that since the 16 bit days, using more than half
Kagamin (12/30) Apr 04 2013 Length exists to limit access to memory. If you want unlimited

Jonathan M Davis (9/14) Apr 04 2013 It's a difference of a factor of 2. You can access twice as much memory ...

Kagamin (8/8) Apr 04 2013 I'm afraid, a factor of 2 is too small. If an application needs
Kagamin (7/7) Apr 04 2013 BTW don't we already have a hungry application *with* unsigned

Jonathan M Davis (21/29) Apr 04 2013 I wasn't arguing otherwise. Some applications need 64-bits to do what th...

Kagamin (11/28) Apr 04 2013 How is that if the problem is not in size_t? If dmd would need a

Andrei Alexandrescu (7/16) Apr 02 2013 I used to lean a lot more toward this opinion until I got to work on a

Walter Bright (3/8) Apr 02 2013 For example, with a signed array index, a bounds check is two comparison...

Steven Schveighoffer (10/21) Apr 02 2013 Why?

Andrei Alexandrescu (3/25) Apr 02 2013 As I said - either two tests or casts all over.

Don (11/45) Apr 03 2013 Yeah, but I think that what this is, is demonstrating what a

Steven Schveighoffer (7/15) Apr 03 2013 Hm.. would it be useful to have a "guaranteed non-negative" integer type...

Don (16/34) Apr 04 2013 I think it would be extremely useful. I think "always positive"

Steven Schveighoffer (20/49) Apr 03 2013 But this is not "all over", it's in one place, for bounds checking.

Walter Bright (3/14) Apr 04 2013 Being able to cast to unsigned implies that the unsigned types exist. So...

Steven Schveighoffer (8/27) Apr 04 2013 The issue is the type of length, not that uints exist. In fact, opIndex...

"Adam D. Ruppe" <destructionator gmail.com> writes:

I was working on a project earlier today that stores IP addresses 
in a database as a uint. For some reason though, some addresses 
were coming out as 0.0.0.0, despite the fact that if(ip == 0) 
return; in the only place it actually saves them (which was my 
first attempted quick fix for the bug).

Turns out the problem was this:

if (arg == typeid(uint)) {
	int e = va_arg!uint(_argptr);
	a = to!string(e);
}


See, I copy/pasted it from the int check, but didn't update the 
type on the left hand side. So it correctly pulled a uint out of 
the varargs, but then assigned it to an int, which the compiler 
accepted silently, so to!string() printed -blah instead of 
bigblah... which then got truncated by the database, resulting in 
zero being stored.

I've since changed it to be "auto e = ..." and it all works 
correctly now.
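Here is a minimal D sketch of the failure mode described above. It assumes the IP address is packed into a uint, and the names `formatBuggy` and `formatFixed` are hypothetical. Any address at or above 128.0.0.0 exceeds int.max, so the silent uint-to-int assignment turns it negative.

```d
import std.conv : to;

// Mirrors the copy/paste bug: the declared type says int, the value is a uint.
string formatBuggy(uint ip)
{
    int e = ip;          // accepted silently: same-size uint -> int conversion
    return to!string(e); // values above int.max come out negative
}

// The fix from the post: let the compiler infer the type from the right-hand side.
string formatFixed(uint ip)
{
    auto e = ip;         // e is uint
    return to!string(e);
}
```

For example, 192.168.1.1 packs to 3232235777, which `formatBuggy` renders as -1062731519. Small addresses such as 1.2.3.4 (16909060) print identically in both versions, which is why a bug like this can hide for a while.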



Anyway I thought I'd share this just because one of the many 
times bearophile has talked about this as a potentially buggy 
situation, I was like "bah humbug"... and now I've actually been 
there!

I still don't think I'm for changing the language though just 
because of potential annoyances in other places unsigned works 
(such as array.length) but at least I've actually felt the other 
side of the argument in real world code now.

Mar 28 2013

Alex Rønne Petersen <alex lycus.org> writes:

On 28-03-2013 21:03, Adam D. Ruppe wrote:
 I was working on a project earlier today that stores IP addresses in a
 database as a uint. For some reason though, some addresses were coming
 out as 0.0.0.0, despite the fact that if(ip == 0) return; in the only
 place it actually saves them (which was my first attempted quick fix for
 the bug).

 Turns out the problem was this:

 if (arg == typeid(uint)) {
      int e = va_arg!uint(_argptr);
      a = to!string(e);
 }


 See, I copy/pasted it from the int check, but didn't update the type on
 the left hand side. So it correctly pulled a uint out of the varargs,
 but then assigned it to an int, which the compiler accepted silently, so
 to!string() printed -blah instead of bigblah... which then got truncated
 by the database, resulting in zero being stored.

 I've since changed it to be "auto e = ..." and it all works correctly now.



 Anyway I thought I'd share this just because one of the many times
 bearophile has talked about this as a potentially buggy situation, I was
 like "bah humbug"... and now I've actually been there!

 I still don't think I'm for changing the language though just because of
 potential annoyances in other places unsigned works (such as
 array.length) but at least I've actually felt the other side of the
 argument in real world code now.

This is exactly why many new languages only allow implicit integer 
conversions where the target type is strictly a >= type with the same 
sign, i.e. uint -> ulong, short -> int, and so on.
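The stricter rule can be modelled in today's D as a library template constraint. This is a sketch only. `widen` is a hypothetical helper, not something D or Phobos provides. It compiles only when the target has the same signedness and is at least as large, so the conversion can never lose information.

```d
import std.traits : isIntegral, isSigned;

// Hypothetical helper modelling the stricter rule: the conversion compiles only
// when the target has the same signedness and at least the same size.
T widen(T, S)(S value)
    if (isIntegral!T && isIntegral!S
        && isSigned!T == isSigned!S
        && T.sizeof >= S.sizeof)
{
    return value; // lossless by construction, so no cast is needed
}
```

With this helper, `widen!ulong(someUint)` and `widen!long(someInt)` compile. `widen!int(someUint)` and `widen!short(someInt)` are rejected at compile time.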

It is indeed very unfortunate that we have these dangerous implicit 
conversions in D. I would welcome a change to remove them (because it 
would likely catch real bugs in many cases).



... And, you know, many other changes to the language/compiler over the 
last couple of releases have broken plenty of my code. I wonder when 
we'll finally say "this is the D programming language, period". The 
current situation where some breaking changes are perfectly OK while 
others are not is kind of ridiculous.

I'm personally in favor of fixing some of the serious issues we have in 
the language once and for all and then *finally* stabilizing the 
language. It's ridiculous that we claim the language to be stable (or 
stabilizing) while we're still actively breaking real code to fix 
language issues. Fixing language issues is good and we should do it more 
so we can actually get to a point where we can call D stable. The 
current situation where some changes get blocked because the reviewer 
happens to be in a "D is stable" mood is -- sorry, but really -- stupid.

I used to even tell people "we're stabilizing D" when they ask why we 
don't fix some particular language design issue. I don't anymore, 
because I realized just how ridiculous this situation has gotten.

Well... end of rant.

-- 
Alex Rønne Petersen
alex alexrp.com / alex lycus.org
http://alexrp.com / http://lycus.org

Mar 28 2013

Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:

On Thu, 28 Mar 2013 21:03:07 +0100
"Adam D. Ruppe" <destructionator gmail.com> wrote:
 which then got truncated by the database,

While I won't necessarily disagree with the rest, that right there is
"the real WTF". A database that silently alters data is unreliable,
and therefore fundamentally broken as a database. It should have raised
an error instead.

Is this MySQL, by any chance? And if so, are you making sure to use
strict-mode? That might help. From what I can tell, having strict mode
disabled is basically MySQL's "please fuck up half of my data" feature.
Not that I necessarily trust its strict mode to always be right
(which could very well be unfounded pessimism on my part), but it should
at least help.

Mar 28 2013

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Thursday, 28 March 2013 at 21:29:55 UTC, Nick Sabalausky wrote:
 Is this MySQL, by any chance?

Yes, and no on strict mode, I didn't even know it had one!

Mar 28 2013

"bearophile" <bearophileHUGS lycos.com> writes:

Adam D. Ruppe:

 if (arg == typeid(uint)) {
 	int e = va_arg!uint(_argptr);
 	a = to!string(e);
 }


 See, I copy/pasted it from the int check, but didn't update the 
 type on the left hand side. So it correctly pulled a uint out 
 of the varargs, but then assigned it to an int, which the 
 compiler accepted silently,

If you remove the implicit uint==>int assignment from D you have 
to add many cast() in the code. And casts are dangerous, maybe 
even more than implicit casts. That's why D is the way it is.

Maybe here a cast(signed) is a bit safer.

I didn't write a Bugzilla request to remove the implicit 
uint==>int assignment. (I think the signed-unsigned comparisons 
are more dangerous than those signed-unsigned assignments. But 
maybe that too is a problem with no solution).

------------------

Alex Rønne Petersen:

 I'm personally in favor of fixing some of the serious issues we
 have in the language once and for all

That's quite hard to do because the problems are not easy to 
fix/improve, it takes time and a _lot_ of thinking. You can't 
quickly fix "shared", memory ownership problems, redesign things 
to not preclude the future creation of a far more parallel GC, 
and so on. And even much simpler things like properties need time 
to be redesigned. Maybe in the D world there's some need for a 
theoretician besides Andrei.

But I agree most of the time should now be used facing the larger 
holes, design problems and missing parts of D, and less on 
everything else. Because the more time passes, the less easy it 
becomes to fix/improve those things. It's a shame to have to 
leave D after all this work just because similar problems get 
essentially frozen.

Bye,
bearophile

Mar 28 2013

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Thursday, 28 March 2013 at 21:58:05 UTC, bearophile wrote:
 I didn't write a Bugzilla request to remove the implicit 
 uint==>int assignment. (I think the signed-unsigned comparisons 
 are more dangerous than those signed-unsigned assignments. But 
 maybe that too is a problem with no solution).

Oh maybe I got it mixed up, but I definitely remember talking 
about signed/unsigned something with you before!

Mar 28 2013

"bearophile" <bearophileHUGS lycos.com> writes:

 If you remove the implicit uint==>int assignment from D you 
 have to add many cast() in the code. And casts are dangerous, 
 maybe even more than implicit casts.

On the other hand I have not tried D with such change, so that's
just a hypothesis. And maybe a library-defined
toSigned()/toUnsigned() are enough here.

Bye,
bearophile

Mar 29 2013

Timon Gehr <timon.gehr gmx.ch> writes:

On 03/28/2013 09:03 PM, Adam D. Ruppe wrote:
 I was working on a project earlier today that stores IP addresses in a
 database as a uint. For some reason though, some addresses were coming
 out as 0.0.0.0, despite the fact that if(ip == 0) return; in the only
 place it actually saves them (which was my first attempted quick fix for
 the bug).

 Turns out the problem was this:

 if (arg == typeid(uint)) {
      int e = va_arg!uint(_argptr);
      a = to!string(e);
 }


 See, I copy/pasted it from the int check, but didn't update the type on
 the left hand side. ...

While I agree that implicit uint <-> int is a bad situation, I think the 
following practises deserve the larger part of the blame:

- Having too much redundant information in the code.
- Copypasta & edit instead of string mixins / static foreach.

Of course, sometimes there is a significant amount of temptation.

(Also, that code snippet is nowhere near the most convenient line 
length. Eliminating the temporary completely is a valid option. :o))
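Here is a sketch of the "name each type once" approach using a compile-time foreach over a type list. In 2013 this meant `TypeTuple`; today it is `std.meta.AliasSeq`. The function name `joinArgs` and the list of supported types are assumptions for illustration. Because each type appears exactly once, the `typeid` check and the `va_arg` extraction cannot drift apart, which was the cause of the original bug.

```d
import core.vararg;
import std.conv : to;
import std.meta : AliasSeq;

// D-style variadic function: _arguments holds the TypeInfo of each argument
// and _argptr walks the values. Each supported type is named exactly once.
string joinArgs(...)
{
    string result;
    foreach (arg; _arguments)
    {
        bool handled = false;
        // Unrolled at compile time: one typeid check per listed type.
        foreach (T; AliasSeq!(int, uint, long, ulong))
        {
            if (!handled && arg == typeid(T))
            {
                auto e = va_arg!T(_argptr); // the type can no longer mismatch
                result ~= to!string(e) ~ " ";
                handled = true;
            }
        }
        // An unconsumed argument would desynchronize _argptr, so fail loudly.
        if (!handled)
            throw new Exception("unsupported argument type: " ~ arg.toString());
    }
    return result;
}
```

For instance, `joinArgs(1, 3_232_235_777u, -2L, 5UL)` yields `"1 3232235777 -2 5 "`. Adding a type means extending the `AliasSeq`, not copying a branch.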

Mar 28 2013

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Thursday, 28 March 2013 at 22:04:36 UTC, Timon Gehr wrote:
 - Copypasta & edit instead of string mixins / static foreach.

Part of why I did it this way was the annoyance that I can't do a 
variadic template in an interface. I'd REALLY prefer to do it 
that way so there wouldn't be a list of types at all - just plain 
to!string(foo).

The actual line in the program is a little longer too, more like
this:

if(arg == typeid(string) || arg == typeid(immutable(string)) || arg == typeid(const(string)))

It annoyed me that there's so many different typeids even though 
it really doesn't matter for me here. But oh well, I got this 
code to a point where it works (with a few practices I keep in 
mind) and now I generally don't think about it anymore.

Mar 28 2013

Benjamin Thaut <code benjamin-thaut.de> writes:

Am 29.03.2013 02:28, schrieb Adam D. Ruppe:
 Part of why I did it this way was the annoyance that I can't do a
 variadic template in an interface. I'd REALLY prefer to do it that way
 so there wouldn't be a list of types at all - just plain to!string(foo).

Who says you can't ? In fact you can using the NVI idiom:

http://dpaste.dzfl.pl/d3b6dc77
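The linked paste isn't reproduced here. What follows is a minimal sketch of the NVI idea under stated assumptions: the names `Sink`, `putAll`, `put`, and `Collector` are hypothetical. The interface exposes a final, non-virtual variadic template. That template converts each argument and forwards it to a single virtual hook, so implementers never see the list of types.

```d
import std.conv : to;

interface Sink
{
    // A final template in an interface is allowed because it is not virtual;
    // it converts every argument and forwards to the virtual hook below.
    final void putAll(T...)(T args)
    {
        foreach (arg; args)
            put(to!string(arg));
    }

    // The single virtual function that implementations provide.
    void put(string s);
}

// Example implementation that just records what it receives.
class Collector : Sink
{
    string[] items;
    void put(string s) { items ~= s; }
}
```

A caller holding a `Sink` can write `sink.putAll(1, someUint, "x")`. Each value goes through `to!string` with its real static type, so a uint is never reinterpreted as an int.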

Kind Regards
Benjamin Thaut

Mar 29 2013

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Friday, 29 March 2013 at 09:26:33 UTC, Benjamin Thaut wrote:
 Who says you can't ? In fact you can using the NVI idiom:

Is that fairly new in D? I'm almost certain I tried it and it 
didn't work when I originally wrote this code (which was a couple 
years ago).

But it'd be worth redoing it now. The other place I use runtime 
varargs is:

         // vararg hack so property assignment works right, even with null
         string opDispatch(string field, string file = __FILE__, size_t line = __LINE__)(...)


I think there's a better way to do that now too. I'll have to 
spend some weekend time on this.

Mar 29 2013

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Friday, March 29, 2013 13:10:02 Adam D. Ruppe wrote:
 On Friday, 29 March 2013 at 09:26:33 UTC, Benjamin Thaut wrote:
 Who says you can't ? In fact you can using the NVI idiom:

 Is that fairly new in D? I'm almost certain I tried it and it
 didn't work when I originally wrote this code (which was a couple
 years ago).

It'll work with classes and protected. It's _supposed_ to work with interfaces 
and private according to TDPL, but AFAIK, that hasn't been implemented yet 
(though it might be; I don't know).

- Jonathan M Davis

Mar 29 2013

"Minas Mina" <minas_mina1990 hotmail.co.uk> writes:

Consider:
uint u = ...;
int x = u;

Wouldn't a warning be enough?

Mar 29 2013

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Friday, March 29, 2013 17:27:10 Minas Mina wrote:
 Consider:
 uint u = ...;
 int x = u;
 
 Wouldn't a warning be enough?

No. -w makes it so that warnings are errors, so you generally can't make 
anything a warning unless you're willing for it to be treated as an error at 
least some of the time (and a lot of people compile with -w), and this sort of 
thing is _supposed_ to work without a warning - primarily because if it 
doesn't, you're forced to cast all over the place when you're dealing with 
both signed and unsigned types, and the casts actually make your code more 
error-prone, because you could end up casting something other than uint to int 
or int to uint by accident (e.g. long to uint) and end up with bugs due to 
that.

There are definitely cases where it would be nice to warn about conversions 
between signed and unsigned values, but there's a definite cost to it as well, 
so the situation is not at all clear cut.
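The cast hazard mentioned above is easy to demonstrate. In this small sketch, `castToUint` is a hypothetical name. A `cast(uint)` written with an int source in mind still compiles when the source is a long, and it silently keeps only the low 32 bits.

```d
// A blunt cast compiles for any integral source type, so a cast written for
// int -> uint still compiles (and truncates) if the source becomes a long.
uint castToUint(long value)
{
    return cast(uint) value; // keeps only the low 32 bits
}
```

For example, 5_000_000_000 becomes 705_032_704, and -1 becomes uint.max, with no diagnostic in either case.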

- Jonathan M Davis

Mar 29 2013

Benjamin Thaut <code benjamin-thaut.de> writes:

Am 29.03.2013 20:29, schrieb Jonathan M Davis:
 No. -w makes it so that warnings are errors, so you generally can't make
 anything a warning unless you're willing for it to be treated as an error at
 least some of the time (and a lot of people compile with -w), and this sort of
 thing is _supposed_ to work without a warning - primarily because if it
 doesn't, you're forced to cast all over the place when you're dealing with
 both signed and unsigned types, and the casts actually make your code more
 error-prone, because you could end up casting something other than uint to int
 or int to uint by accident (e.g. long to uint) and end up with bugs due to
 that.

Reading this tells me two things:

1) The D cast is seriously broken; the default behavior should not be
one that "breaks" stuff if you don't use it right. I personally really
like the idea of having different types of casts, some of which still
do checks and others that just do what you want because you know what you
are doing.
2) The library needs something like an int_cast which checks casts from 
one integer type to another and asserts / throws on error. (For an 
example see 
https://github.com/Ingrater/thBase/blob/master/src/thBase/casts.d#L28)

Kind Regards
Benjamin Thaut

Mar 29 2013

"Jesse Phillips" <Jessekphillips+D gmail.com> writes:

On Friday, 29 March 2013 at 19:38:32 UTC, Benjamin Thaut wrote:

 2) The library needs something like an int_cast which checks 
 casts from one integer type to another and asserts / throws on 
 error.

uint value = 3408924;
auto v = std.conv.to!int(value);

Exception is thrown if an overflow occurs.
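Concretely, `std.conv.to` throws `ConvOverflowException` when the value does not fit. The wrapper below is a hypothetical example showing how a caller might handle that. `toIntOrDefault` and its fallback policy are assumptions, not part of Phobos.

```d
import std.conv : to, ConvOverflowException;

// std.conv.to range-checks the conversion and throws instead of wrapping;
// this hypothetical wrapper turns the failure into a caller-chosen fallback.
int toIntOrDefault(uint value, int fallback)
{
    try
    {
        return to!int(value);
    }
    catch (ConvOverflowException e)
    {
        return fallback;
    }
}
```

Whether to throw, assert, or fall back is a design choice. Letting the exception propagate, as in Jesse's example, is often the safer default.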

Mar 29 2013

"Kagamin" <spam here.lot> writes:

On Friday, 29 March 2013 at 19:29:21 UTC, Jonathan M Davis wrote:
 because if it
 doesn't, you're forced to cast all over the place

There are so many implicit conversions between signed and 
unsigned? Are they all ok?

Mar 30 2013

"bearophile" <bearophileHUGS lycos.com> writes:

Kagamin:

 Jonathan M Davis wrote:
 because if it
 doesn't, you're forced to cast all over the place

 There are so many implicit conversions between signed and 
 unsigned? Are they all ok?

I think Jonathan doesn't have enough proof that forbidding 
signed<->unsigned implicit casts in D is worse than the current 
situation that allows them.

Bye,
bearophile

Mar 30 2013

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Saturday, March 30, 2013 22:12:30 bearophile wrote:
 Kagamin:
 Jonathan M Davis wrote:
 because if it
 doesn't, you're forced to cast all over the place

 
 There are so many implicit conversions between signed and
 unsigned? Are they all ok?

 
 I think Jonathan doesn't have enough proof that forbidding
 signed<->unsigned implicit casts in D is worse than the current
 situation that allows them.

Walter is the one that you have to convince, and I don't think that that's 
ever going to happen.

- Jonathan M Davis

Mar 30 2013

"bearophile" <bearophileHUGS lycos.com> writes:

Jonathan M Davis:

 Walter is the one that you have to convince, and I don't think 
 that that's ever going to happen.

I understand. But maybe Walter doesn't have that proof either... I
compile C code with all warnings, and the compiler tells me most 
cases of mixing signed with unsigned. I usually remove most of 
them.

I think the Go language doesn't have that implicit cast and Go 
programmers seem able to survive.

Bye,
bearophile

Mar 30 2013

"Kagamin" <spam here.lot> writes:

I vaguely remember Walter said those diagnostics are mostly false 
positives. Though I don't remember whether it was about implicit
conversions.

Mar 31 2013

"bearophile" <bearophileHUGS lycos.com> writes:

Kagamin:

 I vaguely remember Walter said those diagnostics are mostly 
 false positives. Though I don't remember whether it was about
 implicit conversions.

I agree several of them seem innocuous.

Bye,
bearophile

Mar 31 2013

"Kagamin" <spam here.lot> writes:

I can say C compilers bug me with InterlockedIncrement function: 
it can be called on both volatile and non-volatile variables so 
type qualification of argument can't always match that of 
parameter and the compiler complains. I found that silly.

Mar 31 2013

"Franz" <franziskaner a.com> writes:

On Friday, 29 March 2013 at 19:29:21 UTC, Jonathan M Davis wrote:
 No. -w makes it so that warnings are errors, so you generally 
 can't make
 anything a warning unless you're willing for it to be treated 
 as an error at
 least some of the time (and a lot of people compile with -w), 
 and this sort of
 thing is _supposed_ to work without a warning - primarily 
 because if it
 doesn't, you're forced to cast all over the place when you're 
 dealing with
 both signed and unsigned types, and the casts actually make 
 your code more
 error-prone, because you could end up casting something other 
 than uint to int
 or int to uint by accident (e.g. long to uint) and end up with 
 bugs due to
 that.

This reason alone ain't good enough to justify the implicit cast
from unsigned to signed and vice-versa. When I sum 2 short values
I am forced to manually cast the result to short if I want to
assign it to a short variable. Isn't that prone to errors, too?
Yet the compiler forces me to cast. I really think we should
eliminate this discrepancy.
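The discrepancy described above can be checked at compile time. In this sketch, `addShorts` and `reinterpret` are hypothetical names. `short + short` is computed as `int`, so narrowing it back requires a cast. A same-size change of signedness, which can change the value just as badly, needs none.

```d
// short + short is computed as int, so narrowing back needs a cast...
short addShorts(short a, short b)
{
    static assert(!__traits(compiles, { short c = a + b; }));
    return cast(short)(a + b);
}

// ...yet a same-size change of signedness is accepted without any cast.
int reinterpret(uint u)
{
    static assert(__traits(compiles, { int i = u; }));
    return u; // 4_000_000_000u comes back as -294_967_296
}
```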

Apr 02 2013

Benjamin Thaut <code benjamin-thaut.de> writes:

Am 29.03.2013 13:10, schrieb Adam D. Ruppe:
 Is that fairly new in D? I'm almost certain I tried it and it didn't
 work when I originally wrote this code (which was a couple years ago).

Yes, it's fairly new. I think dmd 2.060 or something along those lines. I
tend to use it a lot because it's so awesome ^^

For example my streaming interface for binary streams:


interface IInputStream
{
   public:
     final size_t read(T)(ref T data) if(!thBase.traits.isArray!T)
     {
       static assert(!is(T == const) && !is(T == immutable),
                     "can not read into const / immutable value");
       return readImpl((cast(void*)&data)[0..T.sizeof]);
     }

     final size_t read(T)(T data) if(thBase.traits.isArray!T)
     {
       static assert(!is(typeof(data[0]) == const) && !is(typeof(data[0]) == immutable),
                     "can not read into const / immutable array");
       return readImpl((cast(void*)data.ptr)[0..(arrayType!T.sizeof * data.length)]);
     }

     size_t skip(size_t bytes);

   protected:
     size_t readImpl(void[] buffer);
}


Kind Regards
Benjamin Thaut

Mar 29 2013

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Thu, Mar 28, 2013 at 09:03:07PM +0100, Adam D. Ruppe wrote:
 I was working on a project earlier today that stores IP addresses in
 a database as a uint. For some reason though, some addresses were
 coming out as 0.0.0.0, despite the fact that if(ip == 0) return; in
 the only place it actually saves them (which was my first attempted
 quick fix for the bug).
 
 Turns out the problem was this:
 
 if (arg == typeid(uint)) {
 	int e = va_arg!uint(_argptr);
 	a = to!string(e);
 }
 
 
 See, I copy/pasted it from the int check, but didn't update the type
 on the left hand side. So it correctly pulled a uint out of the
 varargs, but then assigned it to an int, which the compiler accepted
 silently, so to!string() printed -blah instead of bigblah... which
 then got truncated by the database, resulting in zero being stored.

IMO, the compiler should insert bounds checks in non-release mode when
implicitly converting between signed and unsigned.
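Until a compiler does this, the same policy can be approximated in library code. In this sketch, `fitsInInt` and `checkedToInt` are hypothetical helpers. The check is a D `assert`, which is compiled in by default and removed under `-release`, roughly matching the "non-release mode" idea above.

```d
// Hypothetical helper: true when n survives a uint -> int conversion unchanged.
bool fitsInInt(uint n)
{
    return n <= int.max; // int.max is positive, so this mixed comparison is safe
}

// Checked in debug builds; the assert disappears when compiling with -release.
int checkedToInt(uint n)
{
    assert(fitsInInt(n), "uint value does not fit in int");
    return cast(int) n;
}
```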

Also, I don't like repeating types, precisely for this reason; if that
second line had been written:

	auto e = va_arg!uint(_argptr);

then this bug wouldn't have happened. But once you repeat 'uint' twice,
there's the risk that you'll forget to update both instances when
changing/copying the code. DRY is a good principle to live by when it
comes to coding.


 I've since changed it to be "auto e = ..." and it all works
 correctly now.

Yep! :)


 Anyway I thought I'd share this just because one of the many times
 bearophile has talked about this as a potentially buggy situation, I
 was like "bah humbug"... and now I've actually been there!
 
 I still don't think I'm for changing the language though just
 because of potential annoyances in other places unsigned works (such
 as array.length) but at least I've actually felt the other side of
 the argument in real world code now.

Maybe it's time to introduce cast(signed) or cast(unsigned) to the
language, as bearophile suggests?


T

-- 
The state pretends to pay us a salary,
and we pretend to work.

Mar 28 2013

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Thursday, 28 March 2013 at 22:12:57 UTC, H. S. Teoh wrote:
 Also, I don't like repeating types, precisely for this reason; 
 if that second line had been written:

Yeah, I usually don't either, but apparently I did here. Murphy's 
law at work perhaps!

Mar 28 2013

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Thursday, March 28, 2013 15:11:02 H. S. Teoh wrote:
 Maybe it's time to introduce cast(signed) or cast(unsigned) to the
 language, as bearophile suggests?

It's not terribly pretty, but you can always do this

auto foo = cast(Unsigned!(typeof(var)))var;

or

auto bar = to!(Unsigned!(typeof(var)))(var);

- Jonathan M Davis

Mar 28 2013

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Friday, 29 March 2013 at 01:18:03 UTC, Jonathan M Davis wrote:
 It's not terribly pretty, but you can always do this

We could also do more C++ looking:

unsigned_cast!foo or IFTI or whatever;

Mar 28 2013

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Friday, March 29, 2013 02:19:49 Adam D. Ruppe wrote:
 On Friday, 29 March 2013 at 01:18:03 UTC, Jonathan M Davis wrote:
 It's not terribly pretty, but you can always do this

 
 We could also do more C++ looking:
 
 unsigned_cast!foo or IFTI or whatever;

It would be pretty trivial to add a wrapper function to make it cleaner. I was 
just pointing out that we already provided a way to cast to an unsigned type 
of the same size without needing to add anything to the language.
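Such a wrapper might look like the following sketch, built on `std.traits.Signed` and `std.traits.Unsigned`. The names `toSigned` and `toUnsigned` follow the suggestions in this thread and are hypothetical. Current Phobos also provides `std.conv.signed` and `std.conv.unsigned` for the same purpose. Because the target type is derived from the source type, the helpers change only signedness, never size. A hand-written cast to the wrong type can silently truncate; these cannot.

```d
import std.traits : isIntegral, Signed, Unsigned;

// Change signedness only; the result always has the same size as the input.
auto toUnsigned(T)(T value) if (isIntegral!T)
{
    return cast(Unsigned!T) value;
}

auto toSigned(T)(T value) if (isIntegral!T)
{
    return cast(Signed!T) value;
}
```

With UFCS the call site stays short, e.g. `int n = va_arg!uint(_argptr).toSigned;`. The explicit call also makes the reinterpretation visible in review.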

- Jonathan M Davis

Mar 28 2013

"Kagamin" <spam here.lot> writes:

On Friday, 29 March 2013 at 01:18:03 UTC, Jonathan M Davis wrote:
 On Thursday, March 28, 2013 15:11:02 H. S. Teoh wrote:
 Maybe it's time to introduce cast(signed) or cast(unsigned) to 
 the
 language, as bearophile suggests?

 It's not terribly pretty, but you can always do this

 auto foo = cast(Unsigned!(typeof(var)))var;

 or

 auto bar = to!(Unsigned!(typeof(var)))(var);

 - Jonathan M Davis

short signed(ushort n){ return cast(short)n; }
int signed(uint n){ return cast(int)n; }
long signed(ulong n){ return cast(long)n; }

int n = va_arg!uint(_argptr).signed;

Mar 28 2013

"Kagamin" <spam here.lot> writes:

On Friday, 29 March 2013 at 05:34:07 UTC, Kagamin wrote:
 On Friday, 29 March 2013 at 01:18:03 UTC, Jonathan M Davis 
 wrote:
 On Thursday, March 28, 2013 15:11:02 H. S. Teoh wrote:
 Maybe it's time to introduce cast(signed) or cast(unsigned) 
 to the
 language, as bearophile suggests?

 It's not terribly pretty, but you can always do this

 auto foo = cast(Unsigned!(typeof(var)))var;

 or

 auto bar = to!(Unsigned!(typeof(var)))(var);

 - Jonathan M Davis

 short signed(ushort n){ return cast(short)n; }
 int signed(uint n){ return cast(int)n; }
 long signed(ulong n){ return cast(long)n; }

 int n = va_arg!uint(_argptr).signed;

BTW phobos already has the function:


I'm not sure if it's enough without a `signed` counterpart.

Apr 01 2013

Walter Bright <newshound2 digitalmars.com> writes:

On 3/28/2013 6:17 PM, Jonathan M Davis wrote:
 On Thursday, March 28, 2013 15:11:02 H. S. Teoh wrote:
 Maybe it's time to introduce cast(signed) or cast(unsigned) to the
 language, as bearophile suggests?

 It's not terribly pretty, but you can always do this:

http://dlang.org/phobos/std_traits.html#.Unsigned

Mar 28 2013

"Don" <turnyourkidsintocash nospam.com> writes:

On Thursday, 28 March 2013 at 20:03:08 UTC, Adam D. Ruppe wrote:
 I was working on a project earlier today that stores IP 
 addresses in a database as a uint. For some reason though, some 
 addresses were coming out as 0.0.0.0, despite the fact that 
 if(ip == 0) return; in the only place it actually saves them 
 (which was my first attempted quick fix for the bug).

 Turns out the problem was this:

 if (arg == typeid(uint)) {
 	int e = va_arg!uint(_argptr);
 	a = to!string(e);
 }


 See, I copy/pasted it from the int check, but didn't update the 
 type on the left hand side. So it correctly pulled a uint out 
 of the varargs, but then assigned it to an int, which the 
 compiler accepted silently, so to!string() printed -blah 
 instead of bigblah... which then got truncated by the database, 
 resulting in zero being stored.

 I've since changed it to be "auto e = ..." and it all works 
 correctly now.



 Anyway I thought I'd share this just because one of the many 
 times bearophile has talked about this as a potentially buggy 
 situation, I was like "bah humbug"... and now I've actually 
 been there!

 I still don't think I'm for changing the language though just 
 because of potential annoyances in other places unsigned works 
 (such as array.length) but at least I've actually felt the 
 other side of the argument in real world code now.

IMHO, array.length is *the* place where unsigned does *not* work. 
size_t should be an integer. We're not supporting 16 bit systems, 
and the few cases where a size_t value can potentially exceed 
int.max could be disallowed.
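Here is a small illustration of where unsigned lengths bite. The function names are hypothetical. Subtracting two `size_t` lengths wraps around instead of going negative, so getting the signed difference requires converting first.

```d
// size_t arithmetic wraps instead of going negative.
size_t lengthDiff(const(int)[] a, const(int)[] b)
{
    return a.length - b.length; // wraps to a huge value when b is longer
}

// Converting first keeps the sign, at the cost of explicit casts.
long signedLengthDiff(const(int)[] a, const(int)[] b)
{
    return cast(long) a.length - cast(long) b.length;
}
```

For `[1, 2]` and `[1, 2, 3]`, the first function returns `size_t.max` and the second returns -1.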

The problem with unsigned is that it gets used as "positive 
integer", which it is not. I think it was a big mistake that D 
turned C's  "unsigned long" into "ulong", thereby making it look 
more attractive. Nobody should be using unsigned types unless 
they have a really good reason. Unfortunately, size_t forces you 
to use them.

Apr 02 2013

"renoX" <renozyx gmail.com> writes:

On Tuesday, 2 April 2013 at 07:49:04 UTC, Don wrote:
[cut]
 IMHO, array.length is *the* place where unsigned does *not* 
 work. size_t should be an integer. We're not supporting 16 bit 
 systems, and the few cases where a size_t value can potentially 
 exceed int.max could be disallowed.

 The problem with unsigned is that it gets used as "positive 
 integer", which it is not. I think it was a big mistake that D 
 turned C's  "unsigned long" into "ulong", thereby making it 
 look more attractive. Nobody should be using unsigned types 
 unless they have a really good reason. Unfortunately, size_t 
 forces you to use them.

You forgot something: an explanation of why you feel that way.
I do consider unsigned int a "positive integer"; why do you 
think that isn't the case?
IMHO the issues with unsigned are:
1) implicit conversion: a C mistake, and an even worse mistake to 
copy it from C knowing that this will lead to many errors!
2) lack of overflow checks by default.

Apr 02 2013

"Don" <turnyourkidsintocash nospam.com> writes:

On Tuesday, 2 April 2013 at 08:29:41 UTC, renoX wrote:
 You forgot something: an explanation why you feel that way..
 I do consider unsigned int as "positive integer", why do you 
 think that isn't the case?

You can actually see it from the name. An unsigned number is 
exactly that -- it's a value with *no sign*. That's quite 
different from a positive integer, which is a number where the 
sign is known to be positive.

If it has no sign, that means that the interpretation of the sign 
requires further information. For example, it may be the low 
digits of a multi-byte number. (In fact, in the Intel docs, 
multi-word operations are the primary reason for the existence of 
unsigned operations). It might also be a bag of bits.
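To make the multi-word case concrete, here is a hedged C sketch (the type and function names are hypothetical) that adds two 64-bit numbers stored as pairs of 32-bit unsigned "digits". The low word has no sign of its own. The carry can be recovered only because unsigned addition wraps.

```c
#include <stdint.h>

/* A 64-bit quantity held as two 32-bit unsigned "digits" (hypothetical type). */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u32_pair;

/* Add with carry: the low words wrap modulo 2^32, and the wrapped result
   is smaller than an operand exactly when a carry out occurred. */
u32_pair add_pair(u32_pair x, u32_pair y) {
    u32_pair r;
    r.lo = x.lo + y.lo;
    uint32_t carry = (r.lo < x.lo) ? 1u : 0u;
    r.hi = x.hi + y.hi + carry;
    return r;
}
```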

Mathematically, a positive integer is Z+, just with a limited 
range. If an operation exceeds the range, it's really an overflow 
error: the representation has broken down.

A uint, however, is a value mod 2^^32, and follows completely 
normal modular arithmetic rules. It's the responsibility of the 
surrounding code to add meaning to it.
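The following short C sketch (hypothetical helper names) shows what "a value mod 2^32" means in practice, and D's uint behaves the same way. Subtraction and addition on uint32_t always produce a defined, wrapped result. The identity (a - b) + b == a still holds even when a < b.

```c
#include <stdint.h>

/* uint32_t arithmetic is arithmetic modulo 2^32: results wrap by definition. */
uint32_t u32_sub(uint32_t a, uint32_t b) { return a - b; }
uint32_t u32_add(uint32_t a, uint32_t b) { return a + b; }
```

For example, u32_sub(0, 1) is 4294967295, and adding 5 to u32_sub(3, 5) gives back 3.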

But very often, people use 'uint' when they really want an int, 
whose sign bit is zero.

 IMHO the issue with unsigned are
 1) implicit conversion: a C mistake and an even worst mistake 
 to copy it from C knowing that this will lead to many errors!
 2) lack of overflow checks by default.

I'm not sure how (2) is relevant.
Note that overflow of unsigned operations is impossible. Only 
signed numbers can overflow. Unsigned numbers wrap instead, and 
this is not an error, it's the central feature of their semantics.

Apr 02 2013

"bearophile" <bearophileHUGS lycos.com> writes:

Don:

 But very often, people use 'uint' when they really want an int, 
 whose sign bit is zero.

Sometimes you need the modular nature of unsigned values, and 
other times you just need an integer that, according to the 
logic of the program, never goes negative: you want the full 
range of a word, without throwing away one bit, but you don't want it 
to wrap around. In programs I'd like to use:

1) integers of various sizes (with an error if you try to go outside 
their range);
2) subranges of 1 (with an error if you try to go outside their 
range);
3) unsigned integers of various sizes (with an error if you try to 
go outside their range);
4) subranges of 3 (with an error if you try to go outside their 
range);
5) unsigned integers with wrap-around;
6) multi-precision integers.
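Items 3 and 5 differ only in what happens at the edges of the range. Below is a minimal C sketch of the distinction, using hypothetical helper names. C has no built-in checked unsigned type, so the range checks are written out by hand.

```c
#include <stdbool.h>
#include <stdint.h>

/* Kind 5: unsigned integer with wrap-around (plain uint32_t). */
uint32_t add_wrap(uint32_t a, uint32_t b) { return a + b; }

/* Kind 3: unsigned integer that reports an error instead of leaving its
   range. Each returns false (leaving *out untouched) when the exact
   result is not representable. */
bool add_checked(uint32_t a, uint32_t b, uint32_t *out) {
    if (a > UINT32_MAX - b) return false;   /* a + b would exceed UINT32_MAX */
    *out = a + b;
    return true;
}

bool sub_checked(uint32_t a, uint32_t b, uint32_t *out) {
    if (b > a) return false;                /* a - b would go below zero */
    *out = a - b;
    return true;
}
```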

Bye,
bearophile

Apr 02 2013

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Tuesday, April 02, 2013 09:49:03 Don wrote:
 IMHO, array.length is *the* place where unsigned does *not* work.
 size_t should be an integer. We're not supporting 16 bit systems,
 and the few cases where a size_t value can potentially exceed
 int.max could be disallowed.
 
 The problem with unsigned is that it gets used as "positive
 integer", which it is not. I think it was a big mistake that D
 turned C's "unsigned long" into "ulong", thereby making it look
 more attractive. Nobody should be using unsigned types unless
 they have a really good reason. Unfortunately, size_t forces you
 to use them.

Naturally, the biggest reason to have size_t be unsigned is so that you can 
access the whole address space, though on 64-bit machines, that's not 
particularly relevant, since you're obviously not going to have a machine with 
that much RAM (you're extremely unlikely to even have a machine with that much 
hard drive space, though I think that I've heard of some machines existing 
which have run into that problem on 64-bit machines, as crazy as that would 
be). For some people though, it _is_ a big deal on 32-bit machines. For 
instance, IIRC, David Simcha needed 64-bit support for some of the stuff he was 
doing (biology stuff, I think), because he couldn't address enough memory on a 
32-bit machine to do what he was doing. And I know that one of the products 
where I work is going to have to move to a 64-bit OS, because they're failing at 
keeping its main process's memory footprint low enough to work on a 32-bit box. 
Having a signed size_t would make it even worse. Granted, they're using C++, 
not D, but the issue is the same.

So, it's arguably important on 32-bit machines that size_t be unsigned, but 
64-bit doesn't really have that excuse. However, making size_t unsigned on 
32-bit machines and signed on 64-bit machines would create its own set of 
problems, and I suspect that would be an even worse idea than making size_t 
signed on 64-bit machines.

I do agree though that in general, unsigned types should be used with 
discretion, and they tend to be overused IMHO. I'm not convinced that that's 
the case with size_t though, since 32-bit machines do make it a necessity 
sometimes.

- Jonathan M Davis

Apr 02 2013

"Don" <turnyourkidsintocash nospam.com> writes:

On Tuesday, 2 April 2013 at 09:43:37 UTC, Jonathan M Davis wrote:
 Naturally, the biggest reason to have size_t be unsigned is so 
 that you can access the whole address space, though on 64-bit 
 machines, that's not particularly relevant, since you're obviously 
 not going to have a machine with that much RAM (you're extremely 
 unlikely to even have a machine with that much hard drive space, 
 though I think that I've heard of some machines existing which 
 have run into that problem on 64-bit machines, as crazy as that 
 would be). For some people though, it _is_ a big deal on 32-bit 
 machines. For instance, IIRC, David Simcha needed 64-bit support 
 for some of the stuff he was doing (biology stuff, I think), 
 because he couldn't address enough memory on a 32-bit machine to 
 do what he was doing. And I know that one of the products where I 
 work is going to have to move to a 64-bit OS, because they're 
 failing at keeping its main process's memory footprint low enough 
 to work on a 32-bit box. Having a signed size_t would make it even 
 worse. Granted, they're using C++, not D, but the issue is the 
 same.

My feeling is that, since the 16-bit days, using more than half 
of the address space is such an unusual activity that it deserves 
special treatment in the code.
I don't think it's unreasonable to require a cast for every use of 
those super-sized sizes.
Even if you have an array which doesn't fit into an int, you can 
only have one such array in your program!
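Here is the arithmetic behind the last point, assuming a 32-bit address space of 2^32 bytes and elements of at least one byte. A length above int.max means at least 2^31 bytes. Two such arrays would therefore need the entire address space, with nothing else (code, stack, other data) mapped.

```latex
\texttt{A.length} > \texttt{int.max} = 2^{31}-1
\;\Rightarrow\; \text{size of } A \ge 2^{31} \text{ bytes},
\qquad
2 \cdot 2^{31} = 2^{32} = \text{the entire 32-bit address space}
```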

This really, really obscure corner case doesn't deserve to be 
polluting the language.
All those signed/unsigned issues basically come from it. It's a 
helluva price to pay.

It's looking like an even worse deal now, because anybody with 
large memory requirements will be on 64 bits. We've made this 
sacrifice for the sake of a situation that is no longer relevant.

Apr 02 2013

"Kagamin" <spam here.lot> writes:

On Tuesday, 2 April 2013 at 09:43:37 UTC, Jonathan M Davis wrote:
 Naturally, the biggest reason to have size_t be unsigned is so 
 that you can access the whole address space

Length exists to limit access to memory. If you want unlimited 
access, just use a pointer.

 For some people though, it _is_ a big deal on 32-bit machines. 
 For instance, IIRC, David Simcha needed 64-bit support for some of 
 the stuff he was doing (biology stuff, I think), because he 
 couldn't address enough memory on a 32-bit machine to do what he 
 was doing. And I know that one of the products where I work is 
 going to have to move to a 64-bit OS, because they're failing at 
 keeping its main process's memory footprint low enough to work on 
 a 32-bit box. Having a signed size_t would make it even worse. 
 Granted, they're using C++, not D, but the issue is the same.

I'm afraid those applications are not tied to 32-bit ints. They 
just want a lot of memory because they have a lot of data. It 
means they want more than 4 gigs, so uint won't help in the 
slightest: it can't address more than 4 gigs, and applications 
will keep failing. There's a technology for using more than 4 gigs 
on a 32-bit system:
http://en.wikipedia.org/wiki/Address_Windowing_Extensions
but uint still has no advantage over int, as it still can't 
address all the needed memory (which is more than 4 gigs).

Apr 04 2013

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Thursday, April 04, 2013 15:20:26 Kagamin wrote:
 I'm afraid, those applications are not tied to 32-bit ints. They
 just want a lot of memory because they have a lot of data. It
 means they want more than 4 gigs, so uint won't help in the
 slightest: it can't address more than 4 gigs, and applications
 will keep failing.

It's a difference of a factor of 2. You can access twice as much memory with a 
uint as with an int. It's quite possible to need enough memory that an int 
wouldn't be enough and a uint would be. Of course, going 64-bit pretty much 
solves the problem, because you're not going to have enough memory to need 
anywhere near 64 bits of address space any time soon (and probably not ever), 
but uint _can_ make a difference on 32-bit machines, because it gives you twice 
as much memory to play around with.
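The factor of two can be made precise by taking the largest byte count each 32-bit type can hold:

```latex
\frac{\texttt{uint.max}}{\texttt{int.max}}
= \frac{2^{32}-1}{2^{31}-1}
= 2 + \frac{1}{2^{31}-1} \approx 2,
\qquad
2^{31}-1 = 2147483647 \approx 2\,\text{GiB},\quad
2^{32}-1 = 4294967295 \approx 4\,\text{GiB}
```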

- Jonathan M Davis

Apr 04 2013

"Kagamin" <spam here.lot> writes:

I'm afraid a factor of 2 is too small. If an application needs 
gigabytes, you'll have a hard time trying to convince it not to use 
more than 4 gigs, or, more specifically, to stay between 2 and 4 gigs.

Your examples don't specify whether those applications needed large 
contiguous allocations (which is another problem in itself), only 
memory consumption. Actually, a program can consume more memory 
in small allocations, because this way it can use fragmented 
address space to the fullest.

Apr 04 2013

"Kagamin" <spam here.lot> writes:

BTW don't we already have a hungry application *with* unsigned 
integers?
http://d.puremagic.com/issues/show_bug.cgi?id=4236
http://d.puremagic.com/issues/show_bug.cgi?id=6498
http://d.puremagic.com/issues/show_bug.cgi?id=3719

http://d.puremagic.com/issues/show_bug.cgi?id=4984 - and who 
reported this?

Apr 04 2013

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Thursday, April 04, 2013 21:39:35 Kagamin wrote:
 BTW don't we already have a hungry application *with* unsigned
 integers?
 http://d.puremagic.com/issues/show_bug.cgi?id=4236
 http://d.puremagic.com/issues/show_bug.cgi?id=6498
 http://d.puremagic.com/issues/show_bug.cgi?id=3719
 
 http://d.puremagic.com/issues/show_bug.cgi?id=4984 - and who
 reported this?

I wasn't arguing otherwise. Some applications need 64-bits to do what they do. 
My point was that with 32-bit programs, using unsigned integers gives you 
lengths twice as long, so it's quite possible for a 32-bit program to work 
with size_t being unsigned but not work if it were signed. But regardless of 
whether size_t is signed or unsigned, there's a limit to how much memory you 
can deal with in a 32-bit program, and some programs will need to go to 
64-bit. It's just that if size_t is unsigned, the limit is higher.

But I would point out that the bugs that you listed are not really related 
to this discussion at all. They're about dmd running out of memory when compiling, 
and it's running out of memory not because it needs 64-bit to have enough 
memory or because size_t is signed (because it is) but because it doesn't 
reuse memory like it's supposed to. It generally just eats more without 
releasing it properly. It should be perfectly possible for a 32-bit dmd to 
compile those programs without running out of memory. And if that issue has 
anything to do with this discussion, it would be to point out that dmd's 
problems would be made worse by making size_t signed, which would just 
underline the fact that making size_t signed on 32-bit systems would make 
things worse (though dmd is currently written in C++, so whether size_t is 
signed or unsigned in D doesn't really matter for it at the moment).

- Jonathan M Davis

Apr 04 2013

"Kagamin" <spam here.lot> writes:

On Friday, 5 April 2013 at 01:26:27 UTC, Jonathan M Davis wrote:
 But I would point out that the bugs that you listed are not really 
 related to this discussion at all. They're about dmd running out of 
 memory when compiling, and it's running out of memory not because it 
 needs 64-bit to have enough memory or because size_t is signed 
 (because it is) but because it doesn't reuse memory like it's supposed 
 to. It generally just eats more without releasing it properly. It 
 should be perfectly possible for a 32-bit dmd to compile those 
 programs without running out of memory. And if that issue has 
 anything to do with this discussion, it would be to point out that 
 dmd's problems would be made worse by making size_t signed

How is that, if the problem is not in size_t? If dmd needed a 
large array, that couldn't be solved by properly releasing 
memory: if the array is needed, then no matter what you release, 
there's nothing you can do about that array. The issues show real 
memory consumption mechanics: when an application needs 
gigabytes, it won't stop at 4 gigs just *because* there is a 32-bit 
limit, so if uint buys you anything, it's too negligible to be 
considered. It's much easier to migrate to 64-bit than to play 
Russian roulette, pushing the limits of 32-bit and seeing whether you 
hit them.

Apr 04 2013

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/2/13 3:49 AM, Don wrote:
 IMHO, array.length is *the* place where unsigned does *not* work. size_t
 should be an integer. We're not supporting 16 bit systems, and the few
 cases where a size_t value can potentially exceed int.max could be
 disallowed.

 The problem with unsigned is that it gets used as "positive integer",
 which it is not. I think it was a big mistake that D turned C's
 "unsigned long" into "ulong", thereby making it look more attractive.
 Nobody should be using unsigned types unless they have a really good
 reason. Unfortunately, size_t forces you to use them.

I used to lean a lot more toward this opinion until I got to work on a 
C++ codebase using signed integers as array sizes and indices. It's a 
pain all over the code - two tests instead of one or casts all over, 
more cases to worry about... changing the code to use unsigned 
throughout ended up being an improvement.

Andrei

Apr 02 2013

Walter Bright <newshound2 digitalmars.com> writes:

On 4/2/2013 12:47 PM, Andrei Alexandrescu wrote:
 I used to lean a lot more toward this opinion until I got to work on a C++
 codebase using signed integers as array sizes and indices. It's a pain all
 over the code - two tests instead of one or casts all over, more cases to worry
 about... changing the code to use unsigned throughout ended up being an
 improvement.

For example, with a signed array index, a bounds check is two comparisons
rather than one.
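Here is a minimal C sketch of the difference, with hypothetical function names. With signed index and length, a negative index must be rejected by its own test. With unsigned types, no value lies below zero, so a single comparison suffices.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed index and length: a negative index has to be rejected separately. */
bool in_bounds_signed(int32_t idx, int32_t length) {
    return idx >= 0 && idx < length;   /* two comparisons */
}

/* Unsigned index and length: nothing is below zero, so one comparison suffices. */
bool in_bounds_unsigned(uint32_t idx, uint32_t length) {
    return idx < length;               /* one comparison */
}
```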

Apr 02 2013

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 02 Apr 2013 16:32:21 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 4/2/2013 12:47 PM, Andrei Alexandrescu wrote:
 I used to lean a lot more toward this opinion until I got to work on a
 C++ codebase using signed integers as array sizes and indices. It's a
 pain all over the code - two tests instead of one or casts all over,
 more cases to worry about... changing the code to use unsigned
 throughout ended up being an improvement.

 For example, with a signed array index, a bounds check is two
 comparisons rather than one.

Why?

struct myArr
{
    int length;
    int opIndex(int idx) { if(cast(uint)idx >= cast(uint)length) throw new RangeError(); ...}
}

-Steve
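Below is the cast trick transcribed into C as a minimal sketch. The function name is hypothetical, and it assumes, as the struct implicitly does, that length is never negative. A negative idx converts to a value of at least 2^31, which is never below a non-negative length, so one unsigned comparison covers both tests. The price is the pair of casts.

```c
#include <stdbool.h>
#include <stdint.h>

/* One-comparison bounds check for a signed index. Precondition: length >= 0.
   Negative idx values convert (modulo 2^32) to values >= 2^31, which always
   fail the comparison against a non-negative length. */
bool in_bounds_cast(int32_t idx, int32_t length) {
    return (uint32_t)idx < (uint32_t)length;
}
```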

Apr 02 2013

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/2/13 11:10 PM, Steven Schveighoffer wrote:
 On Tue, 02 Apr 2013 16:32:21 -0400, Walter Bright
 <newshound2 digitalmars.com> wrote:

 On 4/2/2013 12:47 PM, Andrei Alexandrescu wrote:
 I used to lean a lot more toward this opinion until I got to work on
 a C++ codebase using signed integers as array sizes and indices. It's a
 pain all over the code - two tests instead of one or casts all over,
 more cases to worry about... changing the code to use unsigned
 throughout ended up being an improvement.

 For example, with a signed array index, a bounds check is two
 comparisons rather than one.

 Why?

 struct myArr
 {
 int length;
 int opIndex(int idx) { if(cast(uint)idx >= cast(uint)length) throw new RangeError(); ...}
 }

 -Steve

As I said - either two tests or casts all over.

Andrei

Apr 02 2013

"Don" <turnyourkidsintocash nospam.com> writes:

On Wednesday, 3 April 2013 at 03:26:54 UTC, Andrei Alexandrescu 
wrote:
 As I said - either two tests or casts all over.

 Andrei

Yeah, but I think that what this is, is demonstrating what a 
useful concept a positive integer type is. There's huge value in 
statically knowing that the sign bit is never negative. 
Unfortunately, using uint for this purpose gives the wrong 
semantics, and introduces these signed/unsigned issues, which are 
basically silly.

Personally I suspect there aren't many uses for unsigned types of 
sizes other than the full machine word. In all the other sizes, a 
positive integer would be more useful.

Apr 03 2013

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Wed, 03 Apr 2013 07:33:05 -0400, Don <turnyourkidsintocash nospam.com>  
wrote:

 Yeah, but I think that what this is, is demonstrating what a useful  
 concept a positive integer type is. There's huge value in statically  
 knowing that the sign bit is never negative. Unfortunately, using uint  
 for this purpose gives the wrong semantics, and introduces these  
 signed/unsigned issues, which are basically silly.

 Personally I suspect there aren't many uses for unsigned types of sizes  
 other than the full machine word. In all the other sizes, a positive  
 integer would be more useful.

Hm... would it be useful to have a "guaranteed non-negative" integer type? 
Like array length. Then the compiler could make that assumption, and do 
something like what I did as an optimization?

Subtracting from that type would result in a plain-old int.

-Steve
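The following hedged C sketch shows how such a type could behave. The names nonneg32, nonneg_make, nonneg_sub and nonneg_in_bounds are hypothetical, and C cannot enforce the invariant the way a compiler-known type could. Construction is checked, subtraction yields a plain signed int, and the non-negative invariant is what makes the single-comparison bounds check valid.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "guaranteed non-negative" integer: invariant v >= 0. */
typedef struct {
    int32_t v;
} nonneg32;

/* Construction is checked, so the invariant holds everywhere else. */
bool nonneg_make(int32_t x, nonneg32 *out) {
    if (x < 0) return false;
    out->v = x;
    return true;
}

/* Subtraction leaves the type: both operands lie in [0, INT32_MAX], so the
   difference lies in [-INT32_MAX, INT32_MAX] and fits a plain int32_t. */
int32_t nonneg_sub(nonneg32 a, nonneg32 b) { return a.v - b.v; }

/* Because len.v >= 0 is guaranteed, one unsigned comparison is a complete
   bounds check for a signed index. */
bool nonneg_in_bounds(int32_t idx, nonneg32 len) {
    return (uint32_t)idx < (uint32_t)len.v;
}
```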

Apr 03 2013

"Don" <turnyourkidsintocash nospam.com> writes:

On Wednesday, 3 April 2013 at 14:54:03 UTC, Steven Schveighoffer 
wrote:
 On Wed, 03 Apr 2013 07:33:05 -0400, Don 
 <turnyourkidsintocash nospam.com> wrote:

 Yeah, but I think that what this is, is demonstrating what a 
 useful concept a positive integer type is. There's huge value 
 in statically knowing that the sign bit is never negative. 
 Unfortunately, using uint for this purpose gives the wrong 
 semantics, and introduces these signed/unsigned issues, which 
 are basically silly.

 Personally I suspect there aren't many uses for unsigned types 
 of sizes other than the full machine word. In all the other 
 sizes, a positive integer would be more useful.

 Hm.. would it be useful to have a "guaranteed non-negative" 
 integer type?  Like array length.  Then the compiler could make 
 that assumption, and do something like what I did as an 
 optimization?

 Subtracting from that type would result in a plain-old int.

 -Steve

I think it would be extremely useful. I think "always positive" 
is a fundamental mathematical property that isn't captured by the 
type system. But I fear the heritage from C just has too much 
momentum.

One thing we could do immediately, without changing anything in 
the language definition at all, is add range propagation for 
array length.

So that, for any array A, A.length is in the range 0 .. 
(size_t.max/A[0].sizeof)
which would mean that unless A is of type byte, ubyte, void, or 
char, the length is known to be a positive integer. And of course 
for a static array, the exact length is known.
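Here is the bound written out, using our own notation: n is the bit width of size_t and s = A[0].sizeof. Because 2^(n-1) - 1 is the largest value of the signed integer type of the same width, any array whose elements are wider than one byte has a length that fits in that signed type without loss.

```latex
0 \le \texttt{A.length} \le \left\lfloor \frac{2^{n}-1}{s} \right\rfloor,
\qquad
s \ge 2 \;\Rightarrow\;
\left\lfloor \frac{2^{n}-1}{s} \right\rfloor
\le \left\lfloor \frac{2^{n}-1}{2} \right\rfloor
= 2^{n-1}-1
```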

Although that has very limited applicability (only works within a 
single expression), I think it might help quite a lot.

Apr 04 2013

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 02 Apr 2013 23:26:54 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 4/2/13 11:10 PM, Steven Schveighoffer wrote:
 On Tue, 02 Apr 2013 16:32:21 -0400, Walter Bright
 <newshound2 digitalmars.com> wrote:

 On 4/2/2013 12:47 PM, Andrei Alexandrescu wrote:
 I used to lean a lot more toward this opinion until I got to work on
 a C++ codebase using signed integers as array sizes and indices. It's a
 pain all over the code - two tests instead of one or casts all over,
 more cases to worry about... changing the code to use unsigned
 throughout ended up being an improvement.

 For example, with a signed array index, a bounds check is two
 comparisons rather than one.

 Why?

 struct myArr
 {
 int length;
 int opIndex(int idx) { if(cast(uint)idx >= cast(uint)length) throw new RangeError(); ...}
 }

 -Steve

 As I said - either two tests or casts all over.

But this is not "all over", it's in one place, for bounds checking.

I find that using unsigned int doesn't really hurt much, but it can make  
things awkward.

For example, it's better to do addition than subtraction:

for(int i = 0; i < arr.length - 1; ++i)
{
    if(arr[i] >= arr[i+1])
       throw new Exception("Not sorted!");
}

This has a bug, and is better written as:

for(int i = 0; i + 1 < arr.length; ++i)

These are the kinds of things that can get you into trouble. With a 
signed length, both loops are equivalent, and we don't have that 
error.
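Here is a minimal C sketch of the pitfall, with hypothetical function names. size_t plays the role of D's unsigned arr.length. The buggy loop itself is not run here, because on an empty array it would read out of bounds. Instead, the sketch only computes the bound the loop would use.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The bound in "i < n - 1": with an unsigned n, n == 0 wraps to SIZE_MAX,
   so the loop would run (and index out of bounds) instead of not running. */
size_t loop_bound_subtracting(size_t n) { return n - 1; }

/* The safer form: move the arithmetic to the other side of the comparison. */
bool is_strictly_sorted(const int *arr, size_t n) {
    for (size_t i = 0; i + 1 < n; ++i) {
        if (arr[i] >= arr[i + 1]) return false;   /* mirrors the "Not sorted!" check */
    }
    return true;
}
```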

I'm not sure which is better.  It feels to me that if you CAN achieve the  
correct performance (even if this means casting), but the default errs on  
the side of safety, that might be a better option.

-Steve

Apr 03 2013

Walter Bright <newshound2 digitalmars.com> writes:

On 4/2/2013 8:10 PM, Steven Schveighoffer wrote:
 On Tue, 02 Apr 2013 16:32:21 -0400, Walter Bright <newshound2 digitalmars.com>
 wrote:
 For example, with a signed array index, a bounds check is two comparisons
 rather than one.

 Why?

 struct myArr
 {
     int length;
     int opIndex(int idx) { if(cast(uint)idx >= cast(uint)length) throw new RangeError(); ...}
 }

Being able to cast to unsigned implies that the unsigned types exist. So no 
improvement.

Apr 04 2013

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 04 Apr 2013 15:10:28 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 4/2/2013 8:10 PM, Steven Schveighoffer wrote:
 On Tue, 02 Apr 2013 16:32:21 -0400, Walter Bright <newshound2 digitalmars.com>
 wrote:
 For example, with a signed array index, a bounds check is two comparisons
 rather than one.

 Why?

 struct myArr
 {
     int length;
     int opIndex(int idx) { if(cast(uint)idx >= cast(uint)length) throw new RangeError(); ...}
 }

 Being able to cast to unsigned implies that the unsigned types exist. So  
 no improvement.

The issue is the type of length, not that uints exist.  In fact, opIndex  
can take a uint, and then you don't need any casts, as far as I know:

int opIndex(uint idx) { if(idx >= length) throw new RangeError(); ...}

I think length will be promoted to uint (and it is always positive), so  
it's fine, only requires one check.

-Steve
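The same reasoning holds in C, which D follows here. In the sketch below (hypothetical function names), an unsigned int and a signed int of the same width meet in a comparison, and the signed operand is converted to unsigned. That is harmless for a length that is never negative. The same rule, however, is behind many of the signed/unsigned surprises discussed in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned index against a length that is never negative: the length is
   converted to unsigned, so one comparison is enough. Precondition: length >= 0. */
bool index_ok(uint32_t idx, int32_t length) {
    return idx < (uint32_t)length;
}

/* The same implicit conversion bites when the signed side can be negative:
   in "a < b", a is converted to uint32_t, so -1 becomes 4294967295.
   Compilers typically diagnose this only under -Wsign-compare. */
bool mixed_less(int32_t a, uint32_t b) {
    return a < b;
}
```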

Apr 04 2013
