
digitalmars.D - A char is also not an int

Arcane Jill <Arcane_member pathlink.com> writes:
While we're on the subject of disunifying one type from another, may I point out
that a char is also not an int.

Back in the old days of C, there was no 8-bit wide type other than char, so if
you wanted an 8-bit wide numeric type, you used a char.

Similarly, in Java, there is no UNSIGNED 16-bit wide type other than char, so if
that's what you need, you use char.

D has no such problems, so maybe it's about time to make the distinction clear.
Logically, it makes no sense to try to do addition and subtraction with the
at-sign or the square-right-bracket symbol. We all KNOW that the zero glyph is
*NOT* the same thing as the number 48.

This was true even back in the days of ASCII, but it's even more true in
Unicode. A char in D stores, not a character, but a fragment of UTF-8, an
encoding of a Unicode character - and even a Unicode character is /itself/ an
encoding. There is no longer a one-to-one correspondence between character and
glyph. (There IS such a one-to-one correspondence in the old ASCII range of
\u0020 to \u007E, of course, since Unicode is a superset of ASCII.)

Perhaps it's time to change this one too?

       int a = 'X';            // wrong
       char a = 'X';           // right
       int a = cast(int) 'X';  // right

Arcane Jill
May 27 2004
"Matthew" <matthew.hat stlsoft.dot.org> writes:
 While we're on the subject of disunifying one type from another, may I
 point out that a char is also not an int.

Can one implicitly convert char to int? Man, that sucks! Pardon my indignation, and credit my claim never to have tried it to a long-standing aversion to such things from C/C++. If it's true, it needs to be made untrue ASAP. (Was that strong enough? I hope so ...)
May 27 2004
Benji Smith <dlanguage xxagg.com> writes:
On Thu, 27 May 2004 07:16:19 +0000 (UTC), Arcane Jill
<Arcane_member pathlink.com> wrote:

Perhaps it's time to change this one too?

       int a = 'X';            // wrong
       char a = 'X';           // right
       int a = cast(int) 'X';  // right


I don't even like the notion of being able to explicitly cast from a char to an int. Especially in the case of unicode characters, the semantics of a cast (even an explicit cast) are not very well defined. Getting the int value of a character should, in my opinion, be the province of a static method from a specific string class.

--Benji
May 27 2004
Kevin Bealer <Kevin_member pathlink.com> writes:
In article <fd8cb0dfge0cm85o781a2rjpp9ait6fskq 4ax.com>, Benji Smith says...
On Thu, 27 May 2004 07:16:19 +0000 (UTC), Arcane Jill
<Arcane_member pathlink.com> wrote:

Perhaps it's time to change this one too?

       int a = 'X';            // wrong
       char a = 'X';           // right
       int a = cast(int) 'X';  // right


I don't even like the notion of being able to explicitly cast from a char to an int. Especially in the case of unicode characters, the semantics of a cast (even an explicit cast) are not very well defined. Getting the int value of a character should, in my opinion, be the province of a static method from a specific string class.

--Benji

I think the opposite is true; with Unicode, the semantics CAN be solid. In a normal C program, this is not the case. Consider:

       int chA = 'A';
       int chZ = 'Z';

       if ((chZ - chA) == 25) {
           // Is this true for EBCDIC? I dunno.
       }

In C, the encoding is assumed to be the default system architecture encoding, which is not necessarily Unicode or ASCII. But, if the language DEFINES unicode as the operative representation, then the value 'A' should always be the same integer value. In any case, sometimes you need the integer value.

Kevin
May 27 2004
Stewart Gordon <smjg_1998 yahoo.com> writes:
Arcane Jill wrote:

<snip>
 D has no such problems, so maybe it's about time to make the
 distinction clear. Logically, it makes no sense to try to do addition
 and subtraction with the at-sign or the square-right-bracket symbol.

Not even in cryptography and the like?
 We all KNOW that the zero glyph is *NOT* the same thing as the number
 48.
 
 This was true even back in the days of ASCII, but it's even more true
 in Unicode. A char in D stores, not a character, but a fragment of
 UTF-8, an encoding of Unicode character - and even a Unicode
 character is /itself/ an encoding. There is no longer a one-to-one
 correspondence between character and glyph.

By 'character' do you mean 'character' or 'char value'?

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
May 27 2004
"Walter" <newshound digitalmars.com> writes:
"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:c944k3$1o53$1 digitaldaemon.com...
 While we're on the subject of disunifying one type from another, may I
 point out that a char is also not an int.

 Back in the old days of C, there was no 8-bit wide type other than char,
 so if you wanted an 8-bit wide numeric type, you used a char.

 Similarly, in Java, there is no UNSIGNED 16-bit wide type other than
 char, so if that's what you need, you use char.

 D has no such problems, so maybe it's about time to make the distinction
 clear. Logically, it makes no sense to try to do addition and subtraction
 with the at-sign or the square-right-bracket symbol. We all KNOW that the
 zero glyph is *NOT* the same thing as the number 48.

 This was true even back in the days of ASCII, but it's even more true in
 Unicode. A char in D stores, not a character, but a fragment of UTF-8, an
 encoding of a Unicode character - and even a Unicode character is /itself/
 an encoding. There is no longer a one-to-one correspondence between
 character and glyph. (There IS such a one-to-one correspondence in the old
 ASCII range of \u0020 to \u007E, of course, since Unicode is a superset of
 ASCII.)

 Perhaps it's time to change this one too?

       int a = 'X';            // wrong
       char a = 'X';           // right
       int a = cast(int) 'X';  // right


I understand where you're coming from, and this is a compelling idea, but this idea has been tried out before in Pascal. And I can say from personal experience it is one reason I hate Pascal <g>. Chars do want to be integral data types, and requiring a cast for it leads to execrably ugly expressions filled with casts. In moving to C, one of the breaths of fresh air was to not need all those %^&*^^% casts any more.

Let me enumerate a few ways that chars are used as integral types:

1) converting case
2) using char as index into a translation table
3) encoding/decoding UTF strings
4) encryption/decryption software
5) compression code
6) hashing
7) regex internal implementation
8) char value as input to a state machine like a lexer
9) encoding/decoding strings to/from integers

In other words, routine system programming tasks. The improvement D has, however, is to have chars be a separate type from byte, which makes for better self-documenting code, and one can have different overloads for them.
May 27 2004
James McComb <alan jamesmccomb.id.au> writes:
Walter wrote:

 I understand where you're coming from, and this is a compelling idea, but
 this idea has been tried out before in Pascal. And I can say from personal
 experience it is one reason I hate Pascal <g>. Chars do want to be integral
 data types, and requiring a cast for it leads to execrably ugly expressions
 filled with casts.

I agree with you about chars Walter, but this is because I think chars are different from bools.

The way I see it, bools can be either TRUE or FALSE, and these values are not numeric. TRUE + 32 is not defined. (Of course, bools will be *implemented* as numeric values, but I'm talking about syntax.)

But character standards, such as ASCII and Unicode, *define* characters as numeric quantities. ASCII *defines* 'A' to be 65. So characters really are numeric. 'A' + 32 equals 'a'. This behaviour is well-defined.

So I'd like to have a proper bool type, but I'd prefer D chars to remain as they are.

James McComb
May 27 2004
Roberto Mariottini <Roberto_member pathlink.com> writes:
In article <c95al5$19mr$1 digitaldaemon.com>, Walter says...
I understand where you're coming from, and this is a compelling idea, but
this idea has been tried out before in Pascal. And I can say from personal
experience it is one reason I hate Pascal <g>. 

That's strange, because this is one of the reasons that make me *like* Pascal :-)
Chars do want to be integral
data types, and requiring a cast for it leads to execrably ugly expressions
filled with casts. In moving to C, one of the breaths of fresh air was to
not need all those %^&*^^% casts any more.

In my experience, only poor programming practice leads to many int <-> char casts.
Let me enumerate a few ways that
chars are used as integral types:

1) converting case

This is true only for English. Real natural languages are more complex than this, needing collating tables. I don't know about non-latin alphabets.
2) using char as index into a translation table

       type
         a: array['a'..'z'] of 'A'..'Z';
         b: array[char] of char;
3) encoding/decoding UTF strings
4) encryption/decryption software
5) compression code
6) hashing
7) regex internal implementation

This is something you just won't do frequently, once they are in a library. Simply converting all input to integers and reconverting the final output to chars should work.
8) char value as input to a state machine like a lexer
9) encoding/decoding strings to/from integers

I don't see the point here.
in other words, routine system programming tasks. The improvement D has,
however, is to have chars be a separate type from byte, which makes for
better self-documenting code, and one can have different overloads for them.

This is better than nothing :-) Ciao
May 28 2004
"Phill" <phill pacific.net.au> writes:
Roberto:

Can you explain what you mean by "Real natural languages"?
May 28 2004
Roberto Mariottini <Roberto_member pathlink.com> writes:
In article <c99c0u$12gr$1 digitaldaemon.com>, Phill says...
Roberto:

Can you explain what you mean by "Real natural languages"?

"French", "Italian" ? ;-) Ciao
May 31 2004
"Matthew" <matthew.hat stlsoft.dot.org> writes:
 I understand where you're coming from, and this is a compelling idea, but
 this idea has been tried out before in Pascal. And I can say from personal
 experience it is one reason I hate Pascal <g>. Chars do want to be integral
 data types, and requiring a cast for it leads to execrably ugly expressions
 filled with casts. In moving to C, one of the breaths of fresh air was to
 not need all those %^&*^^% casts any more. Let me enumerate a few ways that
 chars are used as integral types:

 1) converting case
 2) using char as index into a translation table
 3) encoding/decoding UTF strings
 4) encryption/decryption software
 5) compression code
 6) hashing
 7) regex internal implementation
 8) char value as input to a state machine like a lexer
 9) encoding/decoding strings to/from integers

 in other words, routine system programming tasks. The improvement D has,
 however, is to have chars be a separate type from byte, which makes for
 better self-documenting code, and one can have different overloads for them.

<Horse state="dead" action="flog">But yet we cannot overload on single-bit integrals and boolean values!</Horse>
Jun 04 2004
Derek Parnell <derek psych.ward> writes:
On Thu, 27 May 2004 07:16:19 +0000 (UTC), Arcane Jill wrote:

 While we're on the subject of disunifying one type from another, may I
 point out that a char is also not an int.
 
 Back in the old days of C, there was no 8-bit wide type other than char,
 so if you wanted an 8-bit wide numeric type, you used a char.
 
 Similarly, in Java, there is no UNSIGNED 16-bit wide type other than
 char, so if that's what you need, you use char.
 
 D has no such problems, so maybe it's about time to make the distinction
 clear. Logically, it makes no sense to try to do addition and subtraction
 with the at-sign or the square-right-bracket symbol. We all KNOW that the
 zero glyph is *NOT* the same thing as the number 48.
 
 This was true even back in the days of ASCII, but it's even more true in
 Unicode. A char in D stores, not a character, but a fragment of UTF-8, an
 encoding of a Unicode character - and even a Unicode character is /itself/
 an encoding. There is no longer a one-to-one correspondence between
 character and glyph. (There IS such a one-to-one correspondence in the old
 ASCII range of \u0020 to \u007E, of course, since Unicode is a superset of
 ASCII.)
 
 Perhaps it's time to change this one too?
 
       int a = 'X';            // wrong
       char a = 'X';           // right
       int a = cast(int) 'X';  // right


Maybe...

Another way of looking at it is that a character has (at least) two properties: a Glyph and an Identifier. Within an encoding set (eg. Unicode, ASCII, EBCDIC, ...), no two characters have the same identifier even though they may have the same glyph (eg. Space and Non-Breaking Space).

One may then argue that an efficient datatype for the identifier is an unsigned integer value. This makes it simple to be used as an index into a glyph table. In fact, an encoding set is likely to have multiple glyph tables for various font representations, but that is another issue altogether.

So, an implicit cast from char to int would be just getting the character's identifier value, which is not such a bad thing. What is a bad thing is making assumptions about the relationships between character identifiers. There is no necessary correlation between a character set's collation sequence and the characters' identifiers.

I frequently work with encryption algorithms, and integer character identifiers are a *very* handy thing indeed.

-- 
Derek
28/May/04 10:50:16 AM
May 27 2004