digitalmars.D - the cast mess

reply "Vladimir Panteleev" <thecybershadow gmail.com> writes:
Hello digitalmars.D,

I'm concerned about the cast() situation with D. Right now, we use the same
keyword and syntax to do three different things:

1) The obvious plain-old-data cast (casting one pointer type to another,
structures, etc.)
2) Type conversion casts (casting floats to ints)
3) Dynamic casts for classes (may result in null)
(There may be more magic going on behind the scenes when e.g. casting between
arrays with different element sizes - a proposal I saw some time ago to replace
start/length with start/end would get rid of that)

I attached a small program demonstrating a few uses of cast that may have
unexpected results for new users. 
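
The attached program is not reproduced in this archive. As a stand-in, here is a minimal sketch of the three behaviours listed above; the class names and helper functions are made up for illustration and are not taken from the attachment.

```d
import std.stdio : writeln;

class Base {}
class Derived : Base {}
class Other : Base {}

// 1) Plain-old-data cast: the pointer is reinterpreted, no data is converted.
float reinterpretAsFloat(uint bits) { return *cast(float*) &bits; }

// 2) Type conversion cast: the value is converted (truncated toward zero).
int convertToInt(float f) { return cast(int) f; }

// 3) Dynamic cast: checked at run time, yields null when the object is not a Derived.
Derived asDerived(Base b) { return cast(Derived) b; }

void main()
{
    writeln(reinterpretAsFloat(0x3F80_0000)); // bit pattern of 1.0f
    writeln(convertToInt(2.9f));              // 2
    writeln(asDerived(new Other) is null);    // true
    writeln(asDerived(new Derived) is null);  // false
}
```

All three helpers use the same cast(T) syntax, yet only the third can fail at run time and only the second changes the stored bits.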

I'd like to ask: am I the only one who thinks that this inconsistent behavior
of one of the language's base features sticks out like a sore thumb?
I can see where this mess is coming from - the first two are from C, and the
third is from the more dynamic languages (C++ has dynamic_cast but it's a
separate language construct). I'm not aware of a language that abuses the same
cast operator for such varying effects.
Someone doing a code review will have to put in extra effort to tell apart safe
casts from unsafe ones.
In a metaprogramming context, nothing can be said about "cast(T)x" - whether
it's type-safe, what overhead it will incur, etc.

I have the following suggestions to remedy the problem. They're probably too
drastic even for D2, so even if they'd get accepted - probably for D3:

1) Use the cast() keyword/syntax only to change the compiler's representation
of the type. The source and target types must be of the same size. Practically
it would be the same as *cast(T*)(&x) with the current system, except with the
additional size check. The compiler should issue a warning when previous
versions would have performed data conversion.

2) Move all "magic" value type conversions (float => int etc.) to intrinsic
functions declared in the standard library.
An additional argument for this move is that, IMO, taking the integer part of a
floating-point number should be as accessible as rounding the number to the
nearest integer, or any other float=>int conversions (trunc/ceil/etc.)

3) Switch to a new syntax for dynamic_cast-style casts. I propose the "c is T"
syntax, which should function identically to the current "cast(T)c" and is much
more readable.

Alternatively, we could bring back C-style casts for safe and "magic" type
conversions (points 2 and 3), and leave cast() (or rename it to something
scarier) for unsafe "reinterpret" casts.
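
A rough sketch of how suggestions 1 and 2 might look if written as library code today. The names reinterpret, truncToInt and roundToInt are hypothetical, and the bodies still rely on the existing cast() internally, so this only illustrates the intended surface, not a real implementation.

```d
import std.math : round, trunc;

/// Hypothetical stand-in for suggestion 1: view the bytes of x as a T.
/// The template constraint rejects types of different sizes at compile time.
T reinterpret(T, S)(S x) if (T.sizeof == S.sizeof)
{
    return *cast(T*) &x;
}

/// Hypothetical stand-ins for suggestion 2: named float => int conversions.
int truncToInt(double x) { return cast(int) trunc(x); }
int roundToInt(double x) { return cast(int) round(x); }

void main()
{
    assert(reinterpret!uint(1.0f) == 0x3F80_0000);
    // A size mismatch does not compile:
    static assert(!__traits(compiles, reinterpret!ubyte(1.0f)));

    assert(truncToInt(-5.7) == -5); // integer part
    assert(roundToInt(-5.7) == -6); // nearest integer
}
```

With named functions, the choice between truncating and rounding is visible at the call site instead of being implied by the cast.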

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com
May 17 2009
next sibling parent reply "Tim Matthews" <tim.matthews7 gmail.com> writes:
Having a float -5.7 magically turn into -5 is usually what is wanted for a
float to int conversion. I am not sure if anyone would want it to be
converted to -1061788058 instead, but dmd currently allows both by having
the normal way easiest to do and having the unusual way achievable by
placing it in an array or struct for the cast.

I am not sure whether I like the inconsistency, but is there a use case
for when -5.7 should ever be -1061788058?

When such invalid casts like "auto aa = cast(A[])fa;" are visible at  
compile time why doesn't all the error throwing happen at compile time?
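
For reference, a small sketch that reproduces both numbers. The helper names are invented for this example; the array cast reinterprets the float's IEEE 754 bit pattern (0xC0B66666) as a 32-bit int.

```d
import std.stdio : writeln;

/// Value conversion: truncates toward zero.
int convert(float f) { return cast(int) f; }

/// Array cast: the same bytes are viewed as int[], nothing is converted.
int reinterpretViaArray(float f)
{
    float[] fa = [f];
    int[] ia = cast(int[]) fa;
    return ia[0];
}

void main()
{
    writeln(convert(-5.7f));             // -5
    writeln(reinterpretViaArray(-5.7f)); // -1061788058
}
```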
May 18 2009
next sibling parent Tomas Lindquist Olsen <tomas.l.olsen gmail.com> writes:
On Mon, May 18, 2009 at 10:28 AM, Tim Matthews <tim.matthews7 gmail.com> wrote:
 Having a float -5.7 magically turn into -5 is usually what is wanted for a
 float to int conversion. I am not sure if anyone would want it to be
 converted to -1061788058 instead but dmd currently allows both by having the
 normal way easiest to do and having the unusual way achievable by placing
 it in an array or struct for the cast.

 I am not sure whether I like the inconsistency but is there a use case for
 when -5.7 should ever be -1061788058?

 When such invalid casts like "auto aa = cast(A[])fa;" are visible at compile
 time why doesn't all the error throwing happen at compile time?

cast(T) is also very poorly documented.
May 18 2009
prev sibling parent Kagamin <spam here.lot> writes:
Tim Matthews Wrote:

 When such invalid casts like "auto aa = cast(A[])fa;" are visible at  
 compile time why doesn't all the error throwing happen at compile time?

Currently cast behaves mostly like a C cast, the special cases being dynamic
casts and opCast. So it's very unsafe.
May 18 2009
prev sibling next sibling parent reply Don <nospam nospam.com> writes:
Vladimir Panteleev wrote:
 Hello digitalmars.D,
 
 I'm concerned about the cast() situation with D. Right now, we use the same
keyword and syntax to do three different things:
 
 1) The obvious plain-old-data cast (casting one pointer type to another,
structures, etc.)
 2) Type conversion casts (casting floats to ints)
 3) Dynamic casts for classes (may result in null)
 (There may be more magic going on behind the scenes when e.g. casting between
arrays with different element sizes - a proposal I saw some time ago to replace
start/length with start/end would get rid of that)
4) removing const/immutable.
 
 I attached a small program demonstrating a few uses of cast that may have
unexpected results for new users. 
 
 I'd like to ask: am I the only one who thinks that this inconsistent behavior
of one of the language's base features sticks out like a sore thumb?
 I can see where this mess is coming from - the first two are from C, and the
third is from the more dynamic languages (C++ has dynamic_cast but it's a
separate language construct). I'm not aware of a language that abuses the same
cast operator for such varying effects.
 Someone doing a code review will have to put in extra effort to tell apart
safe casts from unsafe ones.
 In a metaprogramming context, nothing can be said about "cast(T)x" - whether
it's type-safe, what overhead it will incur, etc.
 
 I have the following suggestions to remedy the problem. They're probably too
drastic even for D2, so even if they'd get accepted - probably for D3:
 
 1) Use the cast() keyword/syntax only to change the compiler's representation
of the type. The source and target types must be of the same size. Practically
it would be the same as *cast(T*)(&x) with the current system, except with the
additional size check. The compiler should issue a warning when previous
versions would have performed data conversion.
 
 2) Move all "magic" value type conversions (float => int etc.) to intrinsic
functions declared in the standard library.
 An additional argument for this move is that, IMO, taking the integer part of
a floating-point number should be as accessible as rounding the number to the
nearest integer, or any other float=>int conversions (trunc/ceil/etc.)

Personally, I _hate_ the fact that you can cast a float to an int. It's
pretty amazing to me that Intel has done three different hardware
modifications to get around a careless decision in the original C spec.

 3) Switch to a new syntax for dynamic_cast-style casts. I propose the "c is T"
syntax, which should function identically to the current "cast(T)c" and is much
more readable.
 
 Alternatively, we could bring back C-style casts for safe and "magic" type
conversions (points 2 and 3), and leave cast() (or rename it to something
scarier) for unsafe "reinterpret" casts.
 
May 18 2009
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 18 May 2009 05:09:10 -0400, Don <nospam nospam.com> wrote:

 Vladimir Panteleev wrote:
 Hello digitalmars.D,
  I'm concerned about the cast() situation with D. Right now, we use the  
 same keyword and syntax to do three different things:
  1) The obvious plain-old-data cast (casting one pointer type to  
 another, structures, etc.)
 2) Type conversion casts (casting floats to ints)
 3) Dynamic casts for classes (may result in null)
 (There may be more magic going on behind the scenes when e.g. casting  
 between arrays with different element sizes - a proposal I saw some  
 time ago to replace start/length with start/end would get rid of that)
4) removing const/immutable.

5) casting TO immutable. Note that this is far too easy, since the
implications are not provable by the compiler.

I'm all for overhauling the cast system, as long as we can make the syntax
better than C++ (and I suspect we can).

-Steve
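
A minimal sketch of why casting to immutable (point 5) is too easy. The function name assumeImmutable is made up here; the point is only that the compiler accepts the cast without any proof that no mutable reference remains.

```d
/// Hypothetical helper: returns an immutable view of the SAME memory.
/// The compiler accepts this although the caller may still hold a mutable alias.
immutable(int)[] assumeImmutable(int[] a)
{
    return cast(immutable(int)[]) a;
}

void main()
{
    int[] a = [1, 2, 3];
    immutable(int)[] b = assumeImmutable(a);

    // Both slices refer to the same storage; nothing was copied.
    const(int)* pa = a.ptr;
    const(int)* pb = b.ptr;
    assert(pa is pb);

    // Writing through the mutable alias breaks the promise made for b.
    // What b[0] yields afterwards is not something the language guarantees.
    a[0] = 42;
}
```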
May 18 2009
prev sibling parent reply grauzone <none example.net> writes:
I'd propose to remove cast(T) from the language, and replace it by 
several special template functions in object.d:
- a "B cast!(A, B)(A x);", which _always_ does the safest thing; it
would allow the following: casting between objects and interfaces, all
implicit casts allowed by the language, and possibly the use of opCast
declared by user data types
- a constCast() to fight with the const/immutable/shared mess
- a lossyCast() for numeric conversions
- possibly a checkedCast() for numeric conversions, which dynamically 
checks for overflows and throws an exception on failure
- reinterpretCast(), the most evil thing that can exist

Note that some casts need compiler support. For example, float<->int
conversions are done with special CPU instructions on the lowest level. To
allow implementation of the cast functions in pure D code, these
instructions can be provided as intrinsic functions. (Just like bit scan or
port I/O instructions.)

The worst thing about the current cast(T) operator is that it easily
allows reinterpret casts. This is SO NOT SAFE! Especially with arrays, it
doesn't do what the user expects. It's a hidden reinterpret
cast. This really should be changed.

(Oh btw., with my proposal, you could use cast!(T) for proper array
casts, which allocates a new array, casts each element with cast!(T), and
returns it; or you could use reinterpretCast() to get the current behavior.)
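
A rough sketch of two of the proposed functions, written as ordinary templates. Since cast is a keyword, "cast!(T)" cannot be declared as written, so the element-wise array conversion is called convertElements here; that name, the signatures, and the restriction of checkedCast to double inputs and integral targets of at most 32 bits are all assumptions made for the example.

```d
import std.traits : isIntegral;

/// Hypothetical checkedCast: double -> small integral type, throws when out of range.
T checkedCast(T)(double x) if (isIntegral!T && T.sizeof <= 4)
{
    if (!(x >= T.min && x <= T.max)) // the negated test also rejects NaN
        throw new Exception("value out of range for " ~ T.stringof);
    return cast(T) x;
}

/// Hypothetical "proper" array cast: allocates a new array and converts each element.
T[] convertElements(T, S)(S[] src)
{
    auto result = new T[src.length];
    foreach (i, e; src)
        result[i] = cast(T) e;
    return result;
}

void main()
{
    assert(checkedCast!int(-5.7) == -5);

    bool threw = false;
    try { auto unused = checkedCast!byte(1000.0); }
    catch (Exception e) { threw = true; }
    assert(threw);

    // Element-wise conversion, unlike cast(int[]) on a float[] variable.
    assert(convertElements!int([1.5f, -5.7f]) == [1, -5]);
}
```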
May 18 2009
parent Christopher Wright <dhasenan gmail.com> writes:
grauzone wrote:
 The worst thing about the current cast(T) operator is that it easily 
 allows reinterpret casts. This is SO NOT SAFE! Especially with arrays, it
 doesn't do what the user expects. It's a hidden reinterpret
 cast. This really should be changed.

I use this functionality in a few places. However, there are several other
ways of doing it.
May 18 2009