digitalmars.D - Casts and some suggestions to avoid them

"bearophile" <bearophileHUGS lycos.com> writes:
In D (and other languages) casts are dangerous because often they 
punch holes in the type system, and they shut up the compiler, so 
nothing catches your mistakes. And even if you write correct code 
the first time, later you can change some types in your code and 
introduce some incongruity that casts will not complain about.

Phobos and D help avoid casts in several ways, like value range 
analysis, the new double(x) syntax, functions and templates like 
std.conv.signed and std.traits.Signed, the powerful converter 
to!, using "cast()" to convert to mutable without writing the 
type, using strongly pure functions to convert mutable results to 
immutable implicitly, using CTFE to initialize immutable data, 
using Unqual!, or using std.string.representation, using 
std.exception.assumeUnique, etc. D and Phobos are doing a lot to 
avoid the need to cast, but perhaps more can be done.

I've gathered some statistics on about 208 casts in code I have 
written. The relative frequency of the various casts changes 
according to the kind of D code you write: whether you do a lot 
of OOP with dynamic casts, or a lot of low-level programming (or 
a lot of interfacing with C code), which often requires casts.

Besides the usage frequencies, here I also show some examples of 
each kind, and some ideas to reduce the need to cast, usually 
with Phobos code.

- - - - - - - -

Of those, about 73 casts are conversions from a floating point 
value to an integral value, like:
cast(uint)(x * 1.75)
cast(int)sqrt(real(ns))

In some cases you can use the to! template instead of a cast.
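A minimal sketch of that alternative for the first example above (the function name `scaled` is made up for illustration). std.conv.to truncates toward zero just like the cast does, but it throws a ConvOverflowException when the value doesn't fit in the target type, instead of producing a meaningless number:

```d
import std.conv : to;

// Hypothetical example function: to! truncates toward zero like a cast,
// but throws std.conv.ConvOverflowException when the result does not fit
// in a uint (for example a negative product).
uint scaled(double x)
{
    return (x * 1.75).to!uint; // instead of cast(uint)(x * 1.75)
}
```

The trade-off is that to! can throw, so it can't be used as-is inside nothrow code, and it does a little more work than a bare cast.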

- - - - - - - -

About 20 casts are conversions from a floating point value 
returned by floor/round/ceil to an integral type, like:
cast(ubyte)round(x)
cast(int)floor(y)

At first, looking at std.math, I was a bit puzzled by those 
functions returning a floating point value: 99% of the time I 
need to cast their result to an integral value. But to which 
integral type? So I'd like those functions (or similar 
functions) to accept a template type argument that specifies the 
type I want for the result:
round!ubyte(x)
floor!int(y)
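Until something like that exists, here is a small sketch of the idea with two hypothetical helpers, `floorTo` and `ceilTo` (these names are not part of Phobos). They round first and then do a range-checked to! conversion, so an out-of-range result throws instead of wrapping around:

```d
import std.conv : to;
import std.math : ceil, floor;
import std.traits : isFloatingPoint, isIntegral;

// Hypothetical helpers (not in Phobos) in the spirit of the proposed
// floor!int(y): round first, then convert with a range check.
T floorTo(T, F)(F x) if (isIntegral!T && isFloatingPoint!F)
{
    return floor(x).to!T; // throws ConvOverflowException if out of range
}

T ceilTo(T, F)(F x) if (isIntegral!T && isFloatingPoint!F)
{
    return ceil(x).to!T;
}
```

For rounding to the nearest value, std.conv already provides roundTo, e.g. `roundTo!ubyte(x)`.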

- - - - - - - -

About 20 casts are for the return type of 
malloc/calloc/realloc/alloca, like:
cast(ubyte*)alloca(ubyte.sizeof * x);
cast(T*)malloc(T.sizeof * 10);

A set of 3 little wrappers around those functions in Phobos can 
remove those casts (this can't be done with alloca), and they are 
safer than using the raw C functions:
cMalloc!T(n)
cCalloc!T(n)
cRealloc(ptr, n)
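A minimal sketch of what such wrappers could look like, assuming they keep the names proposed above (`cMalloc`, `cCalloc` and `cRealloc` are hypothetical, not Phobos functions). Besides hiding the cast, they reject element counts whose byte size would overflow size_t, a check that raw `malloc(T.sizeof * n)` call sites usually forget:

```d
import core.stdc.stdlib : calloc, free, malloc, realloc;

// Hypothetical wrappers (not in Phobos) using the names proposed above.
// They compute the byte count from T.sizeof, refuse counts whose byte size
// would overflow size_t, and return a typed pointer (null on failure).
T* cMalloc(T)(size_t n) nothrow @nogc
{
    if (n > size_t.max / T.sizeof)
        return null;
    return cast(T*) malloc(n * T.sizeof);
}

T* cCalloc(T)(size_t n) nothrow @nogc
{
    if (n > size_t.max / T.sizeof)
        return null;
    return cast(T*) calloc(n, T.sizeof);
}

T* cRealloc(T)(T* ptr, size_t n) nothrow @nogc
{
    if (n > size_t.max / T.sizeof)
        return null;
    return cast(T*) realloc(ptr, n * T.sizeof);
}
```

The memory still has to be released with core.stdc.stdlib.free; the wrappers only move the cast and the size arithmetic out of the call sites.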

- - - - - - - -

About 14 are reinterpret casts, sometimes to see a uint as a 
sequence of ubytes, array casts, etc:
cast(ubyte*)&x;
cast(ubyte[4]*)&data;
cast(uint[])text.to!(dchar[])
cast(ubyte[3])[x % 256, y % 256, x % 256]

- - - - - - - -

About 8 casts do the opposite of std.string.representation, so 
they stand in for a missing "unrepresentation" function.

See:
https://d.puremagic.com/issues/show_bug.cgi?id=10162

With such a function in Phobos, all or most of these casts would 
not be needed.
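To show the shape of such a function, here is a minimal sketch; the name `unrepresentation` just follows the wording above and is hypothetical, not the API discussed in the issue. The single cast is kept in one place, `inout` preserves the qualifier (immutable(ubyte)[] comes back as string, ubyte[] as char[]), and std.utf.validate rejects malformed UTF-8:

```d
import std.utf : validate;

// Hypothetical inverse of std.string.representation for UTF-8 data.
inout(char)[] unrepresentation(inout(ubyte)[] bytes)
{
    auto text = cast(inout(char)[]) bytes; // the one cast, kept in one place
    const(char)[] view = text;
    validate(view); // throws std.utf.UTFException on malformed UTF-8
    return text;
}
```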

- - - - - - - -

About 7 are caused by feqrel, that requires mutable arguments:
const double x, y;
feqrel(cast()x, cast()y)
I presume this is just a Phobos bug, so such casts can eventually 
be removed.

https://d.puremagic.com/issues/show_bug.cgi?id=6586

- - - - - - - -

About 6 casts are used to convert an array of enums to an array 
of the underlying type, like:

enum C : char { A='a', B='b' }
C[50] arr;
cast(char[])arr

Keeping 'arr' as an array of C is handy for safety or for other 
reasons, but perhaps you need to print arr compactly or you need 
the char[] for other reasons.

I think you can't use to! in this case.
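One cast-free workaround, sketched here with a hypothetical `toBaseArray` helper (not in Phobos), copies each element through the implicit enum-to-base-type conversion. Unlike `cast(char[])arr`, which reinterprets the same memory, this allocates a new array:

```d
import std.traits : OriginalType;

// Hypothetical helper (not in Phobos): copy an array of enum members into a
// new array of the enum's base type. Each element goes through the implicit
// enum -> base type conversion, so no cast is needed.
OriginalType!(E)[] toBaseArray(E)(const(E)[] items) if (is(E == enum))
{
    alias Base = OriginalType!E;
    auto result = new Base[](items.length);
    foreach (i, e; items)
        result[i] = e;
    return result;
}
```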

- - - - - - - -

About 5 casts are used to convert the result of std.file.read to 
a usable array type (because in some cases readText is not the 
right function to use), like:
cast(char[])"data1.txt".read
cast(ubyte[])"data2.txt".read

The cast can be avoided with a similar function that accepts a 
template type (there are perhaps ways to do this with already 
present Phobos functions; suggestions are welcome):
read!(char[])("data1.txt")

- - - - - - - -

About 4 casts are needed because the D compiler misses some 
"obvious" value range propagations, like:

void foo(immutable ulong x) {
     if (x <= uint.max)
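         uint y = x;   // error: no implicit conversion from ulong to uint

     char['z' - 'a' + 1] arr;
     foreach (immutable i, ref c; arr)
         c = 'a' + i;  // error: no implicit conversion from size_t to char
}


struct Foo {
     immutable char c;

     this(in int c_)
     in {
         assert(c_ >= '0' && c_ <= '9');
     } body {
         this.c = c_;  // error: int to char, despite the precondition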
     }
}


See:
https://d.puremagic.com/issues/show_bug.cgi?id=9570
https://d.puremagic.com/issues/show_bug.cgi?id=10594
https://d.puremagic.com/issues/show_bug.cgi?id=10685
https://d.puremagic.com/issues/show_bug.cgi?id=12514

- - - - - - - -

About 4 casts are used by hex strings, like:

ubyte[] data = cast(ubyte[])x"00 11 22 33 AB";

I think hex strings should be implicitly castable to ubyte[], 
avoiding the need for a cast; or, if you don't like implicit 
casts, I think they should be of type ubyte[], because in about 
100% of the cases I don't want a char[].

There are many cases of such useless casts in Phobos:
https://d.puremagic.com/issues/show_bug.cgi?id=10453

- - - - - - - -

In about 4 cases I have used a cast to take part of a number, 
like taking the lower 32 bits of a ulong, and so on.

In some cases you can remove such casts using a union (like a 
union of one ulong and a uint[2]).
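Note that a union gives different results on little- and big-endian machines. A sketch of an endianness-independent alternative using masking and shifting (the function names are made up): value range propagation lets the compiler prove each result fits in a uint, so no cast is needed:

```d
// Hypothetical helpers: split a ulong into its 32-bit halves by value,
// independent of byte order. Both expressions provably fit in a uint.
uint lowHalf(ulong a) pure nothrow @nogc @safe
{
    return a & uint.max;
}

uint highHalf(ulong a) pure nothrow @nogc @safe
{
    return a >> 32;
}
```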

- - - - - - - -

In 2 cases I have used a cast because, even though array 
concatenation generates a new array, if you concatenate two 
const/immutable arrays the result can't be mutable (and I needed 
a mutable result):

void main() {
     const char[] a, b;
     char[] c = a ~ b;
     char d;
     char[] e = a ~ d;
}


This is an old issue:
https://d.puremagic.com/issues/show_bug.cgi?id=1654
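Until that is fixed, one way to get a mutable result without a cast is to copy into a freshly allocated buffer. A minimal sketch with a hypothetical `catMutable` helper, limited to element types without indirections so that copying from const is allowed:

```d
import std.traits : hasIndirections;

// Hypothetical helper (not in Phobos): concatenate two const slices into a
// new mutable array. The buffer is freshly allocated, so handing out a
// mutable reference is safe; the constraint keeps it to plain-value elements.
T[] catMutable(T)(const(T)[] a, const(T)[] b) pure nothrow
    if (!hasIndirections!T)
{
    auto result = new T[](a.length + b.length);
    result[0 .. a.length] = a[];
    result[a.length .. $] = b[];
    return result;
}
```

`(a ~ b).dup` also works without a cast, at the cost of a second copy.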

- - - - - - - -

In 2 cases I have had to cast an array length to uint, to allow 
the code to compile on both 32 and 64 bit systems, because I 
needed to assign that length to some uint value.
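A cast-free alternative, sketched with a hypothetical `lengthAsUint` helper: to! performs the same narrowing but throws a ConvOverflowException instead of silently truncating if a length ever exceeds uint.max (on a 32-bit system size_t is uint, so it is a plain copy):

```d
import std.conv : to;

// Hypothetical helper: narrow an array length to uint with a range check
// instead of cast(uint)arr.length.
uint lengthAsUint(T)(const(T)[] arr)
{
    return arr.length.to!uint;
}
```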

- - - - - - - -

In 1 case I've had to use a dynamic cast on class instances. In 
theory Phobos could add specialized upcasts, downcasts, etc., 
that are more explicit and safer.
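For example, a hypothetical `downcast` helper (not a Phobos function) could still use cast internally but refuse, at compile time, any target that isn't actually derived from the source class, such as an accidental upcast or an unrelated class left over after a refactoring:

```d
// Hypothetical explicit downcast (not in Phobos): the constraint only
// accepts a To that is a proper subclass of From.
To downcast(To, From)(From obj)
    if (is(From == class) && is(To == class) && is(To : From) && !is(To == From))
{
    return cast(To) obj; // null when obj is not actually a To
}
```

As with a plain cast, the result is null when the object is not an instance of the target class.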

- - - - - - - -

I have also counted about 38 unsorted casts that don't easily fit 
in the preceding categories. They are so varied that it's not 
easy to find ways to avoid them.

Bye,
bearophile
Apr 08 2014
"bearophile" <bearophileHUGS lycos.com> writes:
 round!ubyte(x)
 floor!int(y)
https://d.puremagic.com/issues/show_bug.cgi?id=12547
 cMalloc!T(n)
 cCalloc!T(n)
 cRealloc(ptr, n)
https://d.puremagic.com/issues/show_bug.cgi?id=12548

Bye,
bearophile
Apr 08 2014
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Apr 08, 2014 at 06:38:46PM +0000, bearophile wrote:
[...]
 I've done a little statistics on about 208 casts in code I have written.
[...]
 Of those casts about 73 casts are conversions from a floating point
 value to integral value, like:
 cast(uint)(x * 1.75)
 cast(int)sqrt(real(ns))
 
 In some cases you can use the to! template instead of cast.
Which cases don't work? My impression is that to! should be preferred to casts in this case, because it will actually check runtime value ranges and throw an error if, say, the float exceeds the range of int. Using a cast will silently ignore overflowed values, leading to hard-to-find bugs. [...]
 About 20 casts are for the return type of malloc/calloc/realloc/alloca,
 like:
 cast(ubyte*)alloca(ubyte.sizeof * x);
 cast(T*)malloc(T.sizeof * 10);
 
 A set of 3 little wrappers around those functions in Phobos can remove
 those casts (this can't be done with alloca), they are safer than
 using the raw C functions:
 cMalloc!T(n)
 cCalloc!T(n)
 cRealloc(ptr, n)
This issue will (hopefully?) be addressed when Andrei finalizes his allocators, perhaps? [...]
 About 14 are reinterpret casts, sometimes to see an uint as a sequence
 of ubytes, array casts, etc:
 cast(ubyte*)&x;
 cast(ubyte[4]*)&data;
 cast(uint[])text.to!(dchar[])
 cast(ubyte[3])[x % 256, y % 256, x % 256]
Reinterpret casts are probably irreplaceable, because often they are used when you want to directly access the raw representation of some piece of data (e.g., to transmit a struct over the network, or serialize it to file, etc.). D does give some useful tools to do this with minimal risks (e.g., .sizeof), but still, this kind of cast is inherently dangerous and prone to breakage when you redefine your types. [...]
 About 6 casts are used to convert an array of enums to an array of the
 underlying type, like:
 
 enum C : char { A='a', B='b' }
 C[50] arr;
 cast(char[])arr
 
 Keeping 'arr' as an array of C is handy for safety or for other
 reasons, but perhaps you need to print arr compactly or you need the
 char[] for other reasons.
 
 I think you can't use to! in this case.
I think to! can probably be extended to perform this conversion.
 About 5 casts are used to convert the result of std.file.read to an
 usable array type (because in some cases readText is not the right
 function to use), like:
 cast(char[])"data1.txt".read
 cast(ubyte[])"data2.txt".read
 
 The cast can be avoided with  similar function that accepts a template
 type (there are perhaps ways to this with already present Phobos
 functions, suggestions are welcome):
 read!(char[])("data1.txt")
Agreed. [...]
 About 4 casts are used by hex strings, like:
 
 ubyte[] data = cast(ubyte[])x"00 11 22 33 AB";
 
 I think hex strings should be implicitly castable to ubyte[], avoiding
 the need to a cast, or if you don't like implicit casts then I think
 they should be of type ubyte[], because in about 100% of the cases I
 don't want a char[].
Agreed, I can't think of any common use case where you'd want a hex string to be char[] instead of ubyte[]. The only case I can think of (which is not common at all) is when you want to explicitly construct test cases for UTF strings with specific code point sequences (e.g., invalid sequences to test UTF error-catching code). [...]
 In about 4 cases I have used a cast to take part of a number, like
 taking the lower 32 bits of a ulong, and so on.
 
 In some cases you can remove such casts using a union (like a union of
 one ulong and a uint[2]).
Using a union here is not a good idea, because the results depend on the endianness of the machine! It's better to just use (a & 0xFFFF) or (a >> 16) instead. [...]
 In 2 cases I have had to cast to convert an array length to type uint
 to allow the code compile on both a 32 and 64 bit system, to assign
 such length to some uint value.
This is inherently unsafe, since it risks silent truncation of very large arrays. Admittedly, that's unlikely on a 32-bit machine, but still... I think a cast is justified here (as a warning sign that the code may have fragile behaviour -- e.g., while running on a 64-bit machine). [...]
 In 1 case I've had to use a dynamic cast on class instances. In theory
 in Phobos you can add specialized upcasts, downcasts, etc, that are
 more explicit and safer.
In OO, explicit downcasting is usually frowned upon as a sign of bad design (due to the Liskov Substitution Principle). Nevertheless, AFAIK, downcasting in D is actually safe:

     BaseClass b;
     auto d = cast(DerivedClass) b;
     if (d is null) {
         // b was not an instance of DerivedClass
     } else {
         // d is safe to use
     }

So I don't think this case counts. The cast operator was explicitly designed to handle this case (among other cases).


T

-- 
If creativity is stifled by rigid discipline, then it is not true creativity.
Apr 08 2014
"bearophile" <bearophileHUGS lycos.com> writes:
H. S. Teoh:

 Which cases don't work?
Example: in a nothrow function, unless you catch the exception locally. To solve this, I have proposed in Bugzilla a nothrow function maybeTo that returns a Nullable!T:
https://d.puremagic.com/issues/show_bug.cgi?id=6840

Also, a cast is faster and lighter than to!, so in some cases it's needed.
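A minimal sketch of such a function (an illustration only, not the code attached to the issue): it catches the conversion exception locally, so the wrapper itself can be marked nothrow, and reports failure through std.typecons.Nullable:

```d
import std.conv : to;
import std.typecons : Nullable;

// Sketch of a nothrow conversion: any exception thrown by to! is caught
// here, and failure is reported as a null Nullable!T instead.
Nullable!T maybeTo(T, S)(S value) nothrow
{
    try
    {
        return Nullable!T(value.to!T);
    }
    catch (Exception)
    {
        return Nullable!T.init;
    }
}
```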
 This issue will (hopefully?) be addressed when Andrei finalizes 
 his allocators, perhaps?
Andrei's allocators are very nice, and they help, but I think they can't replace the C allocation functions in every case.
 I think to! can probably be extended to perform this conversion.
It's not so simple, there are some constraints.
 Using a union here is not a good idea, because the results 
 depend on the endianness of the machine! It's better to
 just use (a & 0xFFFF) or (a >> 16) instead.
Right.
 This is inherently unsafe, since it risks silent truncation of 
 very large arrays.
In some cases you can assume you won't have huge arrays. And you can even test the length before the cast, or in the function precondition.

Bye,
bearophile
Apr 08 2014
"bearophile" <bearophileHUGS lycos.com> writes:
H. S. Teoh:

 In some cases you can remove such casts using a union (like a 
 union of one ulong and a uint[2]).
Using a union here is not a good idea, because the results depend on the endianness of the machine! It's better to just use (a & 0xFFFF) or (a >> 16) instead.
Better to avoid magic constants: you can forget one F or something. In this case you have to use 0xFFFF_FFFFu. This is safer and more readable:

a & uint.max

Bye,
bearophile
Apr 08 2014
"Colden Cullen" <ColdenCullen gmail.com> writes:
One issue I've had huge amounts of trouble with is casting to and 
from shared. The primary problem is that most of phobos doesn't 
handle shared values at all.

If there was some inout style thing but for shared/unshared 
instead of mutable/immutable/const that would be super helpful.
Apr 08 2014
Marco Leise <Marco.Leise gmx.de> writes:
On Tue, 08 Apr 2014 21:30:08 +0000,
"Colden Cullen" <ColdenCullen gmail.com> wrote:

 One issue I've had huge amounts of trouble with is casting to and 
 from shared. The primary problem is that most of phobos doesn't 
 handle shared values at all.
 
 If there was some inout style thing but for shared/unshared 
 instead of mutable/immutable/const that would be super helpful.
Can you explain what level of atomicity you expect?

1) what atomicity?
2) atomic operations on single instructions
3) the whole Phobos function should be atomic with respect to the shared values passed to it
4) some mutex in your "business logic" will make sure there are no race conditions

Shared currently does two things I know of (besides circumventing TLS):

- simply tag a variable as "multi-threaded" so you don't forget that fact
- the compiler will not reorder or cache access to it

So what would it add to Phobos if everything accepted shared? In particular, how would that improve thread-safety, which is the aim of marking things shared? It doesn't, because only the functions in core.atomic make sense to accept shared. The reason is simply that they run a single instruction on a single shared operand and not a complete algorithm. Anything longer needs to be implemented with thought put into race conditions.

Example:

     x = min(a, b);

Say a == 1 and b == 2. The function would load a from memory into a CPU register, then some other thread changes a to 3, then the function compares the register content with b and returns 1, which is no longer correct at this point in time. It is not that it can never be what you want, but that min() alone cannot decide what is right for YOUR code.

So instead of passing shared values to generic algorithms, we only really need UNSHARED!

-- 
Marco
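A minimal sketch of the "some mutex in your business logic" approach from point 4, assuming a single mutex guards two hypothetical shared ints by the program's own convention (all names here are made up). Inside the critical section `cast()` strips shared, and the ordinary std.algorithm.min can be used:

```d
import core.sync.mutex : Mutex;
import std.algorithm : min;

// Hypothetical shared state; by convention it is only touched under `guard`.
shared int sharedA = 1, sharedB = 2;
__gshared Mutex guard;

shared static this() { guard = new Mutex; }

int lockedMin()
{
    guard.lock();
    scope (exit) guard.unlock();
    // While the lock is held no other thread touches sharedA or sharedB,
    // so treating them as unshared for this call is sound.
    return min(cast() sharedA, cast() sharedB);
}

void lockedSetA(int value)
{
    guard.lock();
    scope (exit) guard.unlock();
    *cast(int*) &sharedA = value;
}
```

The lock makes the whole read-compare sequence atomic with respect to other users of the same lock, which is exactly what min() alone cannot provide.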
Apr 09 2014
"Colden Cullen" <ColdenCullen gmail.com> writes:
On Wednesday, 9 April 2014 at 11:27:24 UTC, Marco Leise wrote:
 On Tue, 08 Apr 2014 21:30:08 +0000,
 "Colden Cullen" <ColdenCullen gmail.com> wrote:

 One issue I've had huge amounts of trouble with is casting to 
 and from shared. The primary problem is that most of phobos 
 doesn't handle shared values at all.
 
 If there was some inout style thing but for shared/unshared 
 instead of mutable/immutable/const that would be super helpful.
 [...]
 So instead of passing shared values to generic algorithms, we only really need UNSHARED!
I was under the impression that casting away from shared was bad form. Is this not true? I don't expect any atomicity (at least from the standard library). All locking should be done by the user. I just want to not have to cast away from shared whenever using the standard library. I'm not asking for guaranteed atomicity, just something that says that this function may take a shared value. I would like to reiterate that I think that having to cast away from shared is a bad solution.
Apr 15 2014
"Rikki Cattermole" <alphaglosined gmail.com> writes:
On Tuesday, 8 April 2014 at 18:38:47 UTC, bearophile wrote:
...
 In 2 cases I have had to cast to convert an array length to 
 type uint to allow the code compile on both a 32 and 64 bit 
 system, to assign such length to some uint value.
...
 Bye,
 bearophile
Personally I design my code around size_t/ptrdiff_t to eliminate these issues as much as possible. Yeah, it's more memory, but it does mean fewer issues with 32/64-bit.
Apr 09 2014
"bearophile" <bearophileHUGS lycos.com> writes:
 I have also counted about 38 unsorted casts that don't easily 
 fit in the precedent categories. They are so varied that it's 
 not easy to find ways to avoid them.
In my post I have not shown examples of the casts in this "unsorted" category. They are sometimes needed to work around compiler bugs, like this one (the code doesn't compile if you remove the cast):

void main() {
     enum E { a, b }
     int[E][E] foo = cast()[E.a: [E.a: 1, E.b: 2],
                            E.b: [E.a: 3, E.b: 4]];
}

Bye,
bearophile
Apr 09 2014
"Meta" <jared771 gmail.com> writes:
On Wednesday, 9 April 2014 at 21:18:38 UTC, bearophile wrote:
 I have also counted about 38 unsorted casts that don't easily 
 fit in the precedent categories. They are so varied that it's 
 not easy to find ways to avoid them.
 In my post I have not shown examples of the casts in this "unsorted" category. They are sometimes needed to work around compiler bugs, like this one (the code doesn't compile if you remove the cast):
 
 void main() {
      enum E { a, b }
      int[E][E] foo = cast()[E.a: [E.a: 1, E.b: 2],
                             E.b: [E.a: 3, E.b: 4]];
 }
 
 Bye,
 bearophile
I forgot that nested AAs were even possible. I was thinking about this yesterday and was positive that they weren't.
Apr 09 2014