D - UNICODE operators

"Mark Brudnak" <malibrud provide.net> writes:

When reading the D spec I noticed that it supports UNICODE UTF-8, UTF-16,
UTF-32 source code formats. I propose that D extend its available set of
operators (and maintain the current set) and draw from the unicode
extensions for additional operators.  For example:

LOGICAL OPERATORS
==================
≤ (unicode 2264)     may be used instead of     <=
≥ (unicode 2265)     may be used instead of     >=
≠ (unicode 2260)     may be used instead of     !=
≟ (unicode 225F)     may be used instead of     ==
∧ (unicode 2227)     may be used instead of     &&
∨ (unicode 2228)     may be used instead of     ||

INFIX OPERATORS (may only be overloaded)
================
∘ (unicode 2218)    may be introduced as the Schur product
⋅ (unicode 22C5)    may be introduced as the dot product
× (unicode 00D7)    may be introduced as the cross product
∪ (unicode 222A)    may be introduced as the union of two sets

etc...

UNARY OPERATORS (may only be overloaded)
============
√ (unicode 221A)    may be introduced as the square root

These were just chosen to provide some examples.  There are a slew of
symbols, most of which do not make sense in a programming environment.
However, some of these symbols may be useful to those who wish to overload
them for a particular class they are developing.

For example,

a = b × c ;

is cleaner than

a = cross(b, c) ;

or worse yet

a = b.cross(c) ;
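
For comparison, here is a minimal sketch of the two function-call spellings in present-day D. The `Vec3` type and both `cross` functions are hypothetical names made up for this illustration; they are not part of any library.

```d
import std.stdio : writeln;

// Hypothetical 3-component vector, used only for illustration.
struct Vec3
{
    double x, y, z;

    // Method spelling: a = b.cross(c)
    Vec3 cross(Vec3 rhs) const
    {
        return Vec3(y * rhs.z - z * rhs.y,
                    z * rhs.x - x * rhs.z,
                    x * rhs.y - y * rhs.x);
    }
}

// Free-function spelling: a = cross(b, c)
Vec3 cross(Vec3 lhs, Vec3 rhs)
{
    return lhs.cross(rhs);
}

void main()
{
    auto b = Vec3(1, 0, 0);
    auto c = Vec3(0, 1, 0);
    writeln(b.cross(c));
    writeln(cross(b, c));
}
```

Both spellings compute the same vector; the proposal would only add `b × c` as a third way to write the same call.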



The largest difficulty with such a scheme is that our keyboards are not
UNICODE friendly.  This I see as a problem for the editor and operating
system and not so much for the D language itself.

Anyway ... your thoughts??

Mark.

Dec 02 2003

Georg Wrede <Georg_member pathlink.com> writes:

In article <bqjndj$138p$1 digitaldaemon.com>, Mark Brudnak says...
When reading the D spec I noticed that it supports UNICODE UTF-8, UTF-16,
UTF-32 source code formats. I propose that D extend its available set of
operators (and maintain the current set) and draw from the unicode

..
The largest difficulty with such a scheme is that our keyboards are not
UNICODE friendly.  This I see as a problem for the editor and operating
system and not so much for the D language itself.

I see it as a problem for code maintainers and debugging people.
_They_ are not guaranteed to have the latest and most international
OS version at hand, or if they do they still might not be able to see
or even type such characters.

Dec 03 2003

"Walter" <walter digitalmars.com> writes:

These ideas have merit. Something useful ought to be done with unicode! The
lack of a decent unicode keyboard is a problem, though, as it will be hard
for anyone to type in the unicode operators.

Dec 03 2003

"Sean L. Palmer" <palmer.sean verizon.net> writes:

That really doesn't matter.  That's what Character Map or BabelMap are for!

Besides, you'd likely be able to cut and paste them either from the header or
the documentation.

If someone makes some code that uses weird unicode operators, you don't have
to use it (or you can wrap it in ugly function call syntax).

Sean


Dec 03 2003

"Mark Brudnak" <malibrud provide.net> writes:

"Sean L. Palmer" <palmer.sean verizon.net> wrote in message
news:bqlo7i$111q$1 digitaldaemon.com...
 That really doesn't matter.  That's what Character Map or BabelMap are for!

Yes, this will work; however, it is not optimal.  Switching between keyboard
and mouse too often is cumbersome.  The thing that will make this work is the
editor.  I use VIM but am not familiar with its macro or shortcut features.
EMACS must have similar features.  In either of these editors (or
others....LEDS....DIDE...) there will be a hassle/benefit tradeoff to the
macro approach.  The tipping point will be (I think) when the following
happens:

1) The symbols are rendered in the editor (I can see the typeface, unlike my
original post :^) ).
2) A symbol can be entered from a QWERTY keyboard using an escape/control
key plus 3-5 other key strokes.  This would be editor-specific.

Mark.


 Besides you'd likely be able to cut and paste them either from the header or
 the documentation.

Too much hassle.

 If someone makes some code that uses weird unicode operators, you don't have
 to use it (or you can wrap it in ugly function call syntax).

It makes sense to reserve all UNICODE "ARROWS" and "MATH OPERATORS" as
symbols that cannot be used in identifiers.  We should then choose a handful
to serve as valid operators to start out with.


Dec 03 2003

"Sean L. Palmer" <palmer.sean verizon.net> writes:

I want more operators.  I am with you.  I want to take advantage of unicode.

I really see no reason why we should not be able to take any combination of
characters that Unicode classifies as symbols, and make an operator out of
it.  The designers of D cannot possibly predict all the operators people are
going to need or want.

Sean

Dec 03 2003

"Walter" <walter digitalmars.com> writes:

"Sean L. Palmer" <palmer.sean verizon.net> wrote in message
news:bqlasv$d7e$1 digitaldaemon.com...
 I really see no reason why we should not be able to take any combination of
 characters that Unicode classifies as symbols, and make an operator out of
 it.  The designers of D cannot possibly predict all the operators people are
 going to need or want.

Some problems:
1) the precedence level of those operators.
2) what this implies is user-definable tokens, which is a big problem with a
language that has as a design goal the ability to tokenize it without
needing to do parsing or semantic analysis.

Dec 03 2003

"Sean L. Palmer" <palmer.sean verizon.net> writes:

"Walter" <walter digitalmars.com> wrote in message
news:bqlc2t$f0p$2 digitaldaemon.com...
 "Sean L. Palmer" <palmer.sean verizon.net> wrote in message
 news:bqlasv$d7e$1 digitaldaemon.com...
 I really see no reason why we should not be able to take any combination of
 characters that Unicode classifies as symbols, and make an operator out of
 it.  The designers of D cannot possibly predict all the operators people are
 going to need or want.

 Some problems:
 1) the precedence level of those operators.

Yah that's a biggie.  But I'd be ok with them defaulting to the lowest
precedence and my being forced to use parentheses.

 2) what this implies is user-definable tokens, which is a big problem with a
 language that has as a design goal the ability to tokenize it without
 needing to do parse or semantic analysis.

So require whitespace between operator tokens.  It's easy to distinguish the
boundary between brackets and symbols, or alphanumeric and symbols.
Maybe limit the user-defined operators to no more than two symbols.
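
The boundary rule can be sketched in a few lines of present-day D. This is only an illustration and not how the DMD lexer works: `Kind`, `classify` and `splitTokens` are invented names, and treating `;` and `,` like brackets is an assumption made for the example.

```d
import std.stdio : writeln;
import std.uni : isAlpha, isNumber, isWhite;

enum Kind { word, bracket, symbol }

// Classify one code point using character properties only.
Kind classify(dchar c)
{
    if (isAlpha(c) || isNumber(c) || c == '_')
        return Kind.word;
    switch (c)
    {
        case '(', ')', '[', ']', '{', '}', ';', ',':
            return Kind.bracket;
        default:
            return Kind.symbol;
    }
}

// Split source text into tokens without parsing or semantic analysis:
// whitespace ends a token, a change of character class ends a token,
// and a bracket is always a token of its own.
string[] splitTokens(string source)
{
    string[] tokens;
    string current;
    Kind currentKind = Kind.word;

    foreach (dchar c; source)
    {
        if (isWhite(c))
        {
            if (current.length) tokens ~= current;
            current = null;
            continue;
        }
        immutable kind = classify(c);
        if (current.length && (kind != currentKind || kind == Kind.bracket))
        {
            tokens ~= current;
            current = null;
        }
        current ~= c;
        currentKind = kind;
    }
    if (current.length) tokens ~= current;
    return tokens;
}

void main()
{
    // A made-up two-symbol operator between b and c.
    writeln(splitTokens("a = (b \u00D7\u00D7 c);"));
}
```

The sketch never needs to know which operators have been defined, which is the property the tokenizer design goal asks for. It does not enforce the two-symbol limit.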

Sean

Dec 03 2003

"Mark J. Brudnak" <mjbrudna oakland.edu> writes:

The UNICODE spec has a lot of mathematical symbols already defined (hundreds).
In my view combining ASCII symbols to form more operators is *not* the way
to go.  It would make the syntax even more difficult to parse and probably
lead to ambiguous syntax.  A UNICODE character is one text symbol which can
map to an operation (easy to parse).
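
To illustrate the one-symbol-to-one-operation idea, here is a minimal sketch in present-day D that maps each proposed logical symbol to its existing ASCII spelling. The function names are hypothetical and the table only covers the logical operators listed in the original proposal; it is a source-to-source illustration, not a compiler feature.

```d
import std.stdio : writeln;

// Hypothetical table: the ASCII spelling each proposed symbol stands for.
// Returns null when the code point is not one of the proposed operators.
string asciiSpelling(dchar c)
{
    switch (c)
    {
        case '\u2264': return "<=";
        case '\u2265': return ">=";
        case '\u2260': return "!=";
        case '\u225F': return "==";
        case '\u2227': return "&&";
        case '\u2228': return "||";
        default:       return null;
    }
}

// Replace each proposed symbol by its ASCII spelling; copy everything else.
string toAsciiSpelling(string source)
{
    string result;
    foreach (dchar c; source)   // decodes UTF-8 one code point at a time
    {
        auto spelling = asciiSpelling(c);
        if (spelling is null)
            result ~= c;
        else
            result ~= spelling;
    }
    return result;
}

void main()
{
    writeln(toAsciiSpelling("if (a \u2264 b \u2227 b \u2260 c) {}"));
}
```

Because every operator is a single code point, the lookup needs no lookahead, unlike multi-character ASCII operators such as `<=` versus `<`.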

In "ASCII-land" the best approach to arbitrary operators is to define them
with strings along with some yet-to-be-defined "bracket operator" to delimit
them.

For example, say I wanted to define some obscure binary operator like the
vector exterior product.  My operator would then be written as a string like
'extprod' inside some language-defined bracket, say <[ and ]>.  Code calling
this operator would look like this:

myBivector = oneVector <[extprod]> anotherVector ; /* traditional infix
notation w/ bulky operator */

The operator would be defined as:

class ga {
    float [] vector ;
    int    size ;

    ga = <[extprod]>( ga vectorB) {
        /* compute the exterior product of 'this' and vectorB */
    }
}

It is bulky, but it would allow the definition of arbitrary operators in
ASCII!  As was said earlier, UNICODE is the way to go, it has a defined
symbol for the exterior product :^).

mark.

Dec 03 2003

Ilya Minkov <minkov cs.tum.edu> writes:

Mark J. Brudnak wrote:

 It is bulky, but it would allow the definition of arbitrary operators in
 ASCII!  As was said earlier, UNICODE is the way to go, it has a defined
 symbol for the exterior product :^).

You would have a big, no, really HUGE parser handling these...

Because the parsing is not generic, you need to set operator
precedence by constructing a big... mess!

Or should all these operators have the same precedence?

Or even make it an error to rely on the precedence of these operators, like
lint does?

Another idea: 'blabla' should be enough for the ASCII infix notation.

-eye

Dec 03 2003

"Mark Brudnak" <malibrud provide.net> writes:

"Ilya Minkov" <minkov cs.tum.edu> wrote in message
news:bqljrj$q74$1 digitaldaemon.com...
 Mark J. Brudnak wrote:

 It is bulky, but it would allow the definition of arbitrary operators in
 ASCII!  As was said earlier, UNICODE is the way to go, it has a defined
 symbol for the exterior product :^).

 You shall have a big, no, really HUGE parser handling these...

No, the parser would have to detect three tokens <[,  identifier, ]>.   It
would have to take care of right/left matching.  'identifier' could be any
valid D identifier like 'foo', 'bar'.  For example:

moo  =  foo <[ goo ]> zoo ;

This statement parses to the following tokens:
moo
=
foo
<[
goo
]>
zoo
;

The compiler then knows that this is equivalent to:

moo  = foo.goo(zoo) ;
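
The rewrite described above can be sketched in present-day D. This is a toy illustration, not a proposal for the compiler: `rewriteBracketOps` is a made-up name, and the sketch assumes the tokens are already separated by whitespace and that both operands are single tokens.

```d
import std.array : join, split;
import std.stdio : writeln;

// Rewrite every token run "lhs <[ op ]> rhs" into the single token "lhs.op(rhs)".
string[] rewriteBracketOps(string[] tokens)
{
    string[] result;
    size_t i = 0;
    while (i < tokens.length)
    {
        // Pattern: operand, "<[", identifier, "]>", operand
        if (i + 4 < tokens.length && tokens[i + 1] == "<[" && tokens[i + 3] == "]>")
        {
            result ~= tokens[i] ~ "." ~ tokens[i + 2] ~ "(" ~ tokens[i + 4] ~ ")";
            i += 5;
        }
        else
        {
            result ~= tokens[i];
            i += 1;
        }
    }
    return result;
}

void main()
{
    auto tokens = "moo = foo <[ goo ]> zoo ;".split();
    writeln(rewriteBracketOps(tokens).join(" "));
}
```

Chained uses such as `a <[ f ]> b <[ g ]> c` are not handled by this sketch; that is exactly where the precedence question raised in this thread comes in.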


 Because the parsing manner is not generic and you need to set operator
 precedence by constructing a big... mess!

 or have all these operators have the same precedence?

They would bind more tightly than assignment. Any other grouping would have
to be made explicit with parentheses.


Dec 03 2003

Hauke Duden <H.NS.Duden gmx.net> writes:

Mark Brudnak wrote:

 When reading the D spec I noticed that it supports UNICODE UTF-8, UTF-16,
 UTF-32 source code formats. I propose that D extend its available set of
 operators (and maintain the current set) and draw from the unicode
 extensions for additional operators.  For example:
 
 LOGICAL OPERATORS
 ==================
 ? (unicode 2264)     may be used instead of     <=
 ? (unicode 2265)     may be used instead of     >=
 ? (unicode 2260)     may be used instead of     !=
 ? (unicode 225F)     may be used instead of     ==
 ? (unicode 2227)     may be used instead of     &&
 ? (unicode 2228)     may be used instead of     ||
 

My email client shows '?' for all your suggestions. I expect most 
current code editors will do the same, since most programming languages 
use ASCII encoding for their source code.

It would be quite some task to figure out what another programmer meant 
when he wrote:

x = ((a ? b) ? c ? d ) ? e;

Some operating systems (e.g. Win9x) don't even have support for printing
unicode text on the screen, unless the characters used happen to also be
available in the current code page. So it would be close to impossible
to write a proper Unicode code editor on those OSs.

And then, of course, there's the problem of entering such operators. My 
keyboard doesn't have any keys for (unicode 2264), (unicode 2265),... .

It's a great idea, but currently I fear it is not practical.

Hauke

Dec 03 2003

"Sean L. Palmer" <palmer.sean verizon.net> writes:

Win95 is dying, if not dead, for development purposes.

You should look forward; it won't be long before all operating systems and
all applications support unicode fully.

Unless you think we're all gonna give up on this unicode nonsense in the
near future, and go back to ascii.  ;)

It is a feature that doesn't have to be 100% implemented right away, and it
is a feature that you are not forced to use.

Sean

Dec 03 2003

Hauke Duden <H.NS.Duden gmx.net> writes:

 Win95 is dying, if not dead, for development purposes.

Win95 is close to dead: about 2% of our customers. But 30% of our
customers still use Win98 or WinME.

And I'm sure there are lots of Unix systems that would also have their 
problems with this - having been invented when ASCII ruled the world and 
Unicode didn't even exist.

Hauke

Dec 03 2003

"Roald Ribe" <rr.no spam.teikom.no> writes:

"Hauke Duden" <H.NS.Duden gmx.net> wrote in message
news:bqlunf$1ag4$2 digitaldaemon.com...
 Win95 is dying, if not dead, for development purposes.

 Win95 is close to dead: about 2% of our customers. But we still have 30%
 customers using Win98 or WinME.

 And I'm sure there are lots of Unix systems that would also have their
 problems with this - having been invented when ASCII ruled the world and
 Unicode didn't even exist.

UNICODE support files for Win95 -> Me

Microsoft Layer for Unicode on Windows 95/98/ME Systems (MSLU)
    version 1.0  (http://tinyurl.com/qynq)

The question at hand is: is D going to be a language of the future,
for all languages, all over the globe, or will it be a conservative,
backward-looking effort?

Roald

Dec 03 2003

Hauke Duden <H.NS.Duden gmx.net> writes:

Roald Ribe wrote:
Win95 is close to dead: about 2% of our customers. But we still have 30%
customers using Win98 or WinME.

And I'm sure there are lots of Unix systems that would also have their
problems with this - having been invented when ASCII ruled the world and
Unicode didn't even exist.

 
 
 UNICODE support files for Win95 -> Me
 
 Microsoft Layer for Unicode on Windows 95/98/ME Systems (MSLU)
     version 1.0  (http://tinyurl.com/qynq)
 
 The question at hand is: is D going to be a language of the future,
 for all languages, all over the globe, or will it be a conservative
 backward looking effort?

The MSLU is just a layer above the normal ANSI API. It converts all
Unicode strings to ANSI before passing them to functions and converts the
results back to Unicode afterwards.

That means that Unicode characters that cannot be represented in the 
current (ANSI) code page will just be replaced with '?', or whatever the 
conversion routines use in such a case.
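
The effect on source code can be sketched in present-day D. This is a deliberately simplified stand-in, not the MSLU conversion itself: it assumes a code page that contains nothing beyond 7-bit ASCII (real ANSI code pages cover more), and `toAsciiLossy` is a made-up name.

```d
import std.stdio : writeln;

// Simplified stand-in for a code-page conversion:
// keep 7-bit ASCII, replace every other code point by '?'.
string toAsciiLossy(string text)
{
    string result;
    foreach (dchar c; text)   // decodes UTF-8 one code point at a time
        result ~= (c < 0x80) ? cast(char) c : '?';
    return result;
}

void main()
{
    // Four different operators go in, four identical '?' come out.
    writeln(toAsciiLossy("x = ((a \u2264 b) \u2227 c \u2260 d ) \u2228 e;"));
}
```

After the conversion the four operators are indistinguishable, so the original expression cannot be recovered.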

Hauke

Dec 04 2003

"Roald Ribe" <rr.no spam.teikom.no> writes:

 UNICODE support files for Win95 -> Me

 Microsoft Layer for Unicode on Windows 95/98/ME Systems (MSLU)
     version 1.0  (http://tinyurl.com/qynq)

 The question at hand is: is D going to be a language of the future,
 for all languages, all over the globe, or will it be a conservative
 backward looking effort?

 The MSLU is just a layer above the normal ANSI API. It converts all
 Unicode strings to ANSI before passing it to functions and converts the
 results back to Unicode afterwards.

 That means that Unicode characters that cannot be represented in the
 current (ANSI) code page will just be replaced with '?', or whatever the
 conversion routines use in such a case.

Yes, that is true. But it also means that if the user/admin has set
up the correct codepage/fonts for the language they work in, the
application using the API will not need to know what codepage that
is, it will just work with UNICODE. (openoffice.org uses this
system on older Win9X platforms)

It is a stopgap measure to allow modern programs to run on older
platforms, not the greatest invention since sliced bread ;-)

It would allow a full UNICODE D app to run unmodified on any
of those systems, get full use of UNICODE on newer systems,
and still just use one API.

Roald

Dec 04 2003

Hauke Duden <H.NS.Duden gmx.net> writes:

Roald Ribe wrote:
That means that Unicode characters that cannot be represented in the
current (ANSI) code page will just be replaced with '?', or whatever the
conversion routines use in such a case.

 
 
 Yes, that is true. But it also means that if the user/admin has set
 up the correct codepage/fonts for the language they work in, the
 application using the API will not need to know what codepage that
 is, it will just work with UNICODE. (openoffice.org uses this
 system on older Win9X platforms)
 
 It is a stop gap measure to allow modern programs run on older
 platforms, not the greatest invention since sliced bread ;-)
 
 It would allow a full UNICODE D app to run unmodified on any
 of those systems, get full use of UNICODE on newer systems,
 and still just use one API.

That was not the topic of this discussion. My point was that we 
shouldn't use Unicode characters for something as essential to the 
language as operators, because then the code will only be readable if 
your editor/OS uses a code page that happens to contain these symbols.

Creating Unicode applications in D is a completely different thing (and 
it was/is already discussed in a different thread).

Hauke

Dec 04 2003

"Walter" <walter digitalmars.com> writes:

"Hauke Duden" <H.NS.Duden gmx.net> wrote in message
news:bqnr2q$1240$1 digitaldaemon.com...
 Creating Unicode applications in D is a completely different thing (and
 it was/is already discussed in a different thread).

I agree. D should fully support developing unicode apps. I should point out,
though, that right now D supports unicode source text (UTF-8, UTF-16, and
UTF-32), unicode characters in comments and strings, and unicode alpha
characters in identifiers.

I'm not sure, though, if the world is quite ready yet for unicode operators.
We'll see.

Dec 19 2003

Elias Martenson <no spam.spam> writes:

On Thu, 04 Dec 2003 01:44:25 +0100, Hauke Duden wrote:

 Win95 is dying, if not dead, for development purposes.

 
 Win95 is close to dead: about 2% of our customers. But we still have 30% 
 customers using Win98 or WinME.
 
 And I'm sure there are lots of Unix systems that would also have their 
 problems with this - having been invented when ASCII ruled the world and 
 Unicode didn't even exist.

Unix has pretty much settled on using UTF-8 for external representation
and before long all text files in Unix will be UTF-8 instead of some local
encoding.

Here's a quote from the excellent UTF-8 for Unix FAQ
(http://www.cl.cam.ac.uk/~mgk25/unicode.html):

"With the UTF-8 encoding, Unicode can be used in a convenient and
backwards compatible way in environments that, like Unix, were designed
entirely around ASCII. UTF-8 is the way in which Unicode is used under
Unix, Linux, and similar systems. It is now time to make sure that you are
well familiar with it and that your software supports UTF-8 smoothly."

Regards

Elias

Dec 04 2003

"Sean L. Palmer" <palmer.sean verizon.net> writes:

Right.  And the OS should provide at least one font that has every single
unicode character, for use as fallback for fonts that are missing such
characters.

Sean

Dec 04 2003

Elias Martenson <no spam.spam> writes:

On Thu, 04 Dec 2003 10:56:46 -0800, Sean L. Palmer wrote:

 Right.  And the OS should provide at least one font that has every single
 unicode character, for use as fallback for fonts that are missing such
 characters.

Yes, it certainly should. Now, my Linux installation lacks fonts for a large
set of the Unihan code points, but other than that I have most of them.

In fact, I think that almost all existing installed operating systems
today would be able to handle unicode operators. However, I think the
problem with them is more related to the fact that you more than likely
will need a special editor for the code (at least if you don't want to try
to remember all the \u-codes for the operators).

Unicode is very important, as I have pointed out several times in the
other unicode thread, but that is about strings in the language, not the
source code itself.

Do I think the designers of Java made a mistake when they supported unicode
in its symbols? A few years ago I would have said yes. Now, I say that it
really didn't matter. People don't use unicode symbols anyway. Therefore,
I believe that this discussion is a non-issue. Even if unicode operators
were supported, I doubt people would use them, for the sake of
interoperability.

Regards

Elias

Dec 04 2003

"Sean L. Palmer" <palmer.sean verizon.net> writes:

That's fine with me; so long as they are not expressly prohibited, I can use
them for my own personal projects.  Support for them would then grow
grassroots-style.  I have text editors that support Unicode, and I don't
mind cutting and pasting.  Ease of entry is a minor issue to me.

The problem is, if we can't define new operators in D, and it doesn't
provide enough overloadable builtin operators, I'm stuck.  I can do nothing
but invest in a Unicode-aware preprocessor.  I want the option of moving
forward.

What good is being able to compile D source encoded in UTF-8 if you aren't
allowed to use any symbols that aren't in ASCII?  (except embedded in string
literals)

Sean

Dec 04 2003

J C Calvarese <jcc7 cox.net> writes:

Sean L. Palmer wrote:
 That's fine with me, so long as they are not expressly prohibited, I can use
 them for my own personal projects.  Support for them would then grow
 grassroots-style.  I have text editors that support Unicode, and I don't
 mind cutting and pasting.  Ease of entry is a minor issue to me.
 
 The problem is, if we can't define new operators in D, and it doesn't
 provide enough overloadable builtin operators, I'm stuck.  I can do nothing
 but invest in a Unicode-aware preprocessor.  I want the option of moving
 forward.
 
 What good is being able to compile D source encoded in UTF-8 if you aren't
 allowed to use any symbols that aren't in ASCII?  (except embedded in string
 literals)

Actually, since DMD 0.74 non-ASCII characters (as long as they are "unicode
alpha") are allowed in identifier names.  (See the attached example.)
Also, comments can contain any non-ASCII character.

I do think Unicode operators are an interesting idea.


Justin

 
 Sean

Dec 04 2003

"Sean L. Palmer" <palmer.sean verizon.net> writes:

Yeah, just have to set this "free" browser to Encoding... Unicode UTF-8

That's pretty cool.  Pretty cool indeed.

I bet you if I cut and paste some D program made by someone in a far-away
land into some web-based translator engine, it would probably not do that
bad of a job of translating the identifiers back into English again ;)

Most likely, I'll rarely if ever see any source written in some other
language, and if I did, I'd just consider it obfuscation.  It's not a sin
punishable by death.

I think it's cool that finally people can more or less program in their own
language, once they learn the English keywords.  A preprocessor would allow
even those to be replaced.

In fact, whose idea was it to allow infix notation for regular identifiers?
We could use a preprocessor to translate our D + Unicode Symbols into D that
will actually compile.  ;)  Right now it would only work with prefix
(lisp-like) notation, however.
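
A minimal sketch of the preprocessor idea above, assuming a purely textual substitution pass.  The symbol-to-operator table and the name `lowerUnicodeOperators` are hypothetical, not part of D or any existing tool.

```d
import std.array : replace;
import std.stdio : writeln;

// Hypothetical helper: rewrites a few Unicode math symbols into the
// ASCII operators that the compiler already accepts.
string lowerUnicodeOperators(string source)
{
    return source
        .replace("≤", "<=")
        .replace("≥", ">=")
        .replace("≠", "!=")
        .replace("×", "*");
}

void main()
{
    // The input must be saved as UTF-8 for the symbols to survive.
    writeln(lowerUnicodeOperators("if (a ≤ b && b ≠ c) x = a × b;"));
}
```

Because this works on raw text, it would also rewrite symbols inside string literals and comments; a real tool would need to tokenize the source first.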

They have some really interesting brackets in Unicode, as well.  Surely
there's one just begging to be used for template syntax.

Sean

"J C Calvarese" <jcc7 cox.net> wrote in message
news:bqpbqo$8no$1 digitaldaemon.com...

--------------------------------------------------------------------------------

 const char[] Sí = "yes";
 const char[] Año = "year";

 /+

 These don't work (it might be because they are iconic symbols rather than
 part of any actual language)
 const char[] ╠╢╦╬ = "box drawing";
 const char[] ♠♥♣♦ = "cards";

 +/


 int main()
 {

   int AñoNúmero = 2003;
   int Cyrillic???? = 1;
   int Hebrew?????;

   printf("%d", AñoNúmero);

   return 0;
 }

Dec 05 2003

J C Calvarese <jcc7 cox.net> writes:

Sean L. Palmer wrote:

 Yeah, just have to set this "free" browser to Encoding... Unicode UTF-8

OK, so I didn't send it right.  (That's what a WASP like me gets for 
belittling ASCII.)  Unicode isn't very friendly to novices.  I think 
putting it in a .zip will help out.  Maybe it will work if I turn it 
into an .html file.

I'm sure there's a setting in Thunderbird that will take care of this
stuff automatically; I'm just not sure how much time I want to spend
looking for it.

(By the way, I used WinXP's notepad to create the original document 
because I was lazy and didn't want to hunt down another Unicode-capable 
editor.)

Justin


 

Dec 05 2003

Elias Martenson <no spam.spam> writes:

Den Fri, 05 Dec 2003 01:34:19 -0600 skrev J C Calvarese:

 Actually, since DMD 0.74 non-ASCII characters (as long they are "unicode 
 alpha") are allowed as identifier names.  (See the attached example.) 
 Also, comments can contain any non-ASCII character.

Neat. Although your newsreader didn't include a proper encoding header.
Not your fault, but rather the broken software. :-)

Regards

Elias

Dec 05 2003

"Mark J. Brudnak" <mjbrudna oakland.edu> writes:

"J C Calvarese" <jcc7 cox.net> wrote


<snip>

 Actually, since DMD 0.74 non-ASCII characters (as long they are "unicode
 alpha") are allowed as identifier names.  (See the attached example.)
 Also, comments can contain any non-ASCII character.

I think only "letter-like" unicode characters should be allowed in D
identifiers.  Having variables like

int   = 42 ;
float �ק =3.14159 ;

will really confuse things.  Punctuation, shapes, box drawing, dingbats, and
math symbols should be prohibited from being used in identifiers.


Dec 05 2003

"Sean L. Palmer" <palmer.sean verizon.net> writes:

Agreed, though I would like to use symbols as operators.

Sean

"Mark J. Brudnak" <mjbrudna oakland.edu> wrote in message
news:bqq0pe$183n$1 digitaldaemon.com...
 I think only "letter-like" unicode characters should be allowed in D
 identifiers.  Having variables like

 int   = 42 ;
 float �ק =3.14159 ;

 will really confuse things.  Punctuation, shapes, boxdrawing, dingbats, math
 symbols, should be prohibited from being used in identifiers.

Dec 05 2003

J C Calvarese <jcc7 cox.net> writes:

Mark J. Brudnak wrote:

 "J C Calvarese" <jcc7 cox.net> wrote
 
 
 <snip>
 
Actually, since DMD 0.74 non-ASCII characters (as long they are "unicode
alpha") are allowed as identifier names.  (See the attached example.)
Also, comments can contain any non-ASCII character.

 
 
 I think only "letter-like" unicode characters should be allowed in D
 identifiers.  Having variables like
 
 int  ï»¿ = 42 ;
 float ±×§ =3.14159 ;
 
 will really confuse things.  Punctuation, shapes, boxdrawing, dingbats, math
 symbols, should be prohibited from being used in identifiers.
 
 

My mail program garbled the UTF-8 file that I was trying to use as an 
example.

D only allows unicode alphas (A - Z, alpha - omega, aleph - taw, 
accented letters, etc.)

For example, the cards symbols (♠♥♣♦) and box elements (╠╢╦╬) can't be
used as identifiers (I'm sure because I tried them and it wouldn't compile).
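
A rough sketch of the letter-versus-symbol distinction, assuming that `std.uni.isAlpha` from present-day Phobos is a fair stand-in for "unicode alpha".  The compiler uses its own table, so this is only an approximation, and `looksLikeIdentifierStart` is a hypothetical name.

```d
import std.stdio : writefln;
import std.uni : isAlpha;

// Hypothetical helper: treats '_' and any Unicode alphabetic
// character as acceptable at the start of an identifier.
bool looksLikeIdentifierStart(dchar c)
{
    return c == '_' || isAlpha(c);
}

void main()
{
    // Three letters followed by a card suit and a box-drawing element.
    foreach (dchar c; "ñאω♠╠"d)
    {
        writefln("U+%04X %s", cast(uint) c,
                 looksLikeIdentifierStart(c) ? "letter-like" : "symbol");
    }
}
```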

Justin

Dec 05 2003

"Walter" <walter digitalmars.com> writes:

"Mark J. Brudnak" <mjbrudna oakland.edu> wrote in message
news:bqq0pe$183n$1 digitaldaemon.com...
 I think only "letter-like" unicode characters should be allowed in D
 identifiers.

You're right, and that's the way it works now. I'm going by the C99
"Appendix D" list of allowed alpha characters.

Dec 19 2003

Andy Friesen <andy ikagames.com> writes:

Mark Brudnak wrote:

 When reading the D spec I noticed that it supports UNICODE UTF-8, UTF-16,
 UTF-32 source code formats. I propose that D extend its available set of
 operators (and maintain the current set) and draw from the unicode
 extensions for additional operators.  For example:
 [...]
 The largest difficulty with such a scheme is that our keyboards are not
 UNICODE friendly.  This I see as a problem for the editor and operating
 system and not so much for the D language itself.
 
 Any way ... your thoughts??
 
 Mark.

Bjarne suggested something similar to this for C++ once: 
http://www.research.att.com/~bs/whitespace98.pdf

(yes, this is a joke)

  -- andy

Dec 03 2003

Antti Sykäri <jsykari gamma.hut.fi> writes:

In article <bqjndj$138p$1 digitaldaemon.com>, Mark Brudnak wrote:
 The largest difficulty with such a scheme is that our keyboards are not
 UNICODE friendly.  This I see as a problem for the editor and operating
 system and not so much for the D language itself.

This is also a problem that the language designer cannot fix by fixing
the language. It continues to be a problem as long as QWERTY is the only
universally available keyboard or as long as some of the current major
operating systems do not offer a universally available easy way to input
those unicode characters that are commonly used in mathematics but
rarely seen on the computer screen.

-Antti

Dec 03 2003

"Sean L. Palmer" <palmer.sean verizon.net> writes:

So someone can make a killing selling D Programmers' Keyboards!!  ;)

Sean

"Antti Sykäri" <jsykari gamma.hut.fi> wrote in message
news:slrnbssvc9.i3r.jsykari pulu.hut.fi...
 In article <bqjndj$138p$1 digitaldaemon.com>, Mark Brudnak wrote:
 The largest difficulty with such a scheme is that our keyboards are not
 UNICODE friendly.  This I see as a problem for the editor and operating
 system and not so much for the D language itself.

 This is also a problem that the language designer cannot fix by fixing
 the language. It continues to be a problem as long as QWERTY is the only
 universally available keyboard or as long as some of the current major
 operating systems do not offer a universally available easy way to input
 those unicode characters that are commonly used in mathematics but
 rarely seen on the computer screen.

 -Antti

Dec 03 2003

Elias Martenson <no spam.spam> writes:

Den Wed, 03 Dec 2003 23:34:55 -0800 skrev Sean L. Palmer:

 So someone can make a killing selling D Programmers' Keyboards!!  ;)

Remember APL? Let's not go there again. :-)

I agree that unicode operators could be useful in maths applications but
other than that the advantages are pretty limited.

Java has support for Unicode symbols, and that can be a mess unless you
encode all non-ascii symbols using the \u notation, which makes the code
pretty hard to read.
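
To illustrate the readability cost, here is a small sketch in D rather than Java, on the assumption that the comparison carries over.  It covers string literals only, and the function names are made up for the example.

```d
import std.stdio : writeln;

// The same word, once with a \u escape and once with the literal character.
string escapedYear() { return "A\u00F1o"; }
string literalYear() { return "Año"; }

void main()
{
    // Both spellings yield the same UTF-8 string; only the source differs.
    assert(escapedYear() == literalYear());
    writeln(escapedYear());
}
```

The escaped form survives an ASCII-only editor or newsreader, at the price of being much harder to read.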

I'm currently implementing a BASIC interpreter with full Unicode support.
It's not a serious project though. :-)

Regards

Elias

Dec 04 2003
