D - UNICODE operators

"Mark Brudnak" <malibrud provide.net> writes:
When reading the D spec I noticed that it supports UNICODE UTF-8, UTF-16,
UTF-32 source code formats. I propose that D extend its available set of
operators (and maintain the current set) and draw from the unicode
extensions for additional operators.  For example:

LOGICAL OPERATORS
==================
≤ (unicode 2264)     may be used instead of     <=
≥ (unicode 2265)     may be used instead of     >=
≠ (unicode 2260)     may be used instead of     !=
≟ (unicode 225F)     may be used instead of     ==
∧ (unicode 2227)     may be used instead of     &&
∨ (unicode 2228)     may be used instead of     ||

INFIX OPERATORS (may only be overloaded)
================
∘ (unicode 2218)    may be introduced as the Schur product
⋅ (unicode 22C5)    may be introduced as the dot product
× (unicode 00D7)    may be introduced as the cross product
∪ (unicode 222A)    may be introduced as the union of two sets

etc...

UNARY OPERATORS (may only be overloaded)
============
√ (unicode 221A)    may be introduced as the square root

These were just chosen to provide some examples.  There is a slew of
symbols, most of which do not make sense in a programming environment.
However, some of these symbols may be useful to those who wish to overload
them for a particular class they are developing.



For example,

a = b × c ;

is cleaner than

a = cross(b, c) ;

or worse yet

a = b.cross(c) ;
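
For comparison, here is a minimal sketch of how the three spellings can be written in D without a `×` token. It assumes the current operator-overloading syntax (`opBinary`), which is newer than this thread, and it borrows the ASCII `*` for the cross product. `Vec3` and `cross` are hypothetical names made up for the illustration.

```d
/// Hypothetical 3-component vector, used only for this illustration.
struct Vec3
{
    double x, y, z;

    /// Operator spelling: '*' stands in for '×', which is not a D token.
    Vec3 opBinary(string op)(Vec3 rhs) const
        if (op == "*")
    {
        return cross(this, rhs);
    }
}

/// Free-function spelling: cross(b, c).
/// With uniform function call syntax it can also be written b.cross(c).
Vec3 cross(Vec3 a, Vec3 b)
{
    return Vec3(a.y * b.z - a.z * b.y,
                a.z * b.x - a.x * b.z,
                a.x * b.y - a.y * b.x);
}

void main()
{
    auto b = Vec3(1, 0, 0);
    auto c = Vec3(0, 1, 0);

    auto a1 = b * c;        // closest available spelling of  a = b × c ;
    auto a2 = cross(b, c);  // a = cross(b, c) ;
    auto a3 = b.cross(c);   // a = b.cross(c) ;

    assert(a1 == Vec3(0, 0, 1));
    assert(a1 == a2 && a2 == a3);
}
```

The operator form reads well, but `*` already suggests a scalar or element-wise product, which is the shortage of distinct symbols the proposal is aimed at.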



The largest difficulty with such a scheme is that our keyboards are not
UNICODE friendly.  I see this as a problem for the editor and operating
system and not so much for the D language itself.

Anyway ... your thoughts?

Mark.
Dec 02 2003
Georg Wrede <Georg_member pathlink.com> writes:
In article <bqjndj$138p$1 digitaldaemon.com>, Mark Brudnak says...
 When reading the D spec I noticed that it supports UNICODE UTF-8, UTF-16,
 UTF-32 source code formats. I propose that D extend its available set of
 operators (and maintain the current set) and draw from the unicode
 ..
 The largest difficulty with such a scheme is that our keyboards are not
 UNICODE friendly.  This I see as a problem for the editor and operating
 system and not so much for the D language itself.

I see it as a problem for code maintainers and debugging people. _They_ are
not guaranteed to have the latest, most internationalized OS version at hand,
and even if they do, they still might not be able to see or even type such
characters.
Dec 03 2003
"Walter" <walter digitalmars.com> writes:
These ideas have merit. Something useful ought to be done with unicode! The
lack of a decent unicode keyboard is a problem, though, as it will be hard
for anyone to type in the unicode operators.
Dec 03 2003
"Sean L. Palmer" <palmer.sean verizon.net> writes:
That really doesn't matter.  That's what Character Map or BabelMap are for!

Besides you'd likely be able to cut and paste them either from the header or
the documentation.

If someone makes some code that uses weird unicode operators, you don't have
to use it (or you can wrap it in ugly function call syntax).

Sean
Dec 03 2003
"Mark Brudnak" <malibrud provide.net> writes:
"Sean L. Palmer" <palmer.sean verizon.net> wrote in message
news:bqlo7i$111q$1 digitaldaemon.com...
 That really doesn't matter.  That's what Character Map or BabelMap are
for!

Yes, this will work; however, it is not optimal.  Too much switching between
keyboard and mouse is a hassle.  The thing that will make this work is the
editor.  I use VIM but am not familiar with its macro or shortcut features.
EMACS must have similar features.  In either of these editors (or
others....LEDS....DIDE...) there will be a hassle/benefit tradeoff to the
macro approach.  The tipping point will be (I think) when the following
happens:

1) The symbols are rendered in the editor (I can see the typeface, unlike my
original post :^) ).
2) A symbol can be entered from a QWERTY keyboard using an escape/control
key plus 3-5 other keystrokes.  This would be editor-specific.

Mark.


 Besides you'd likely be able to cut and paste them either from the header
 or the documentation.

Too much hassle.

 If someone makes some code that uses weird unicode operators, you don't
 have to use it (or you can wrap it in ugly function call syntax).

It makes sense to reserve all UNICODE "ARROWS" and "MATH OPERATORS" as
symbols that cannot be used in identifiers. We should then choose a handful
to serve as valid operators to start out with.
Dec 03 2003
"Sean L. Palmer" <palmer.sean verizon.net> writes:
I want more operators.  I am with you.  I want to take advantage of unicode.

I really see no reason why we should not be able to take any combination of
characters that Unicode classifies as symbols, and make an operator out of
it.  The designers of D cannot possibly predict all the operators people are
going to need or want.

Sean
Dec 03 2003
"Walter" <walter digitalmars.com> writes:
"Sean L. Palmer" <palmer.sean verizon.net> wrote in message
news:bqlasv$d7e$1 digitaldaemon.com...
 I really see no reason why we should not be able to take any combination
 of characters that Unicode classifies as symbols, and make an operator out
 of it.  The designers of D cannot possibly predict all the operators people
 are going to need or want.

Some problems:

1) The precedence level of those operators.
2) What this implies is user-definable tokens, which is a big problem for a
language that has as a design goal the ability to be tokenized without
needing to do parsing or semantic analysis.
Dec 03 2003
"Sean L. Palmer" <palmer.sean verizon.net> writes:
"Walter" <walter digitalmars.com> wrote in message
news:bqlc2t$f0p$2 digitaldaemon.com...
 Some problems: 1) the precedence level of those operators.

Yah, that's a biggie.  But I'd be ok with them defaulting to the lowest
precedence and my being forced to use parentheses.

 2) what this implies is user-definable tokens, which is a big problem with
 a language that has as a design goal the ability to tokenize it without
 needing to do parse or semantic analysis.

So require whitespace between operator tokens.  It's easy to distinguish the
boundary between brackets and symbols, or alphanumeric and symbols.  Maybe
limit the user-defined operators to no more than two symbols.

Sean
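
The rule suggested above can be sketched as a purely lexical tokenizer that needs no parsing or semantic information. This is only an illustration under stated assumptions: `Kind`, `Token`, `tokenize` and the three character classes are hypothetical, and "symbol" here simply means any character that is not whitespace, not part of a word, and not one of a fixed set of brackets and separators.

```d
import std.exception : enforce;
import std.uni : isAlpha, isNumber, isWhite;

enum Kind { word, bracket, symbol }

struct Token
{
    Kind kind;
    dstring text;
}

/// Letters, digits and underscore form identifiers and numbers.
bool isWordChar(dchar c)
{
    return isAlpha(c) || isNumber(c) || c == '_';
}

/// Brackets and separators are always single-character tokens.
bool isBracketChar(dchar c)
{
    foreach (dchar b; "()[]{};,"d)
        if (b == c) return true;
    return false;
}

/// Everything else that is not whitespace counts as an operator symbol.
bool isSymbolChar(dchar c)
{
    return !isWhite(c) && !isWordChar(c) && !isBracketChar(c);
}

Token[] tokenize(dstring src)
{
    Token[] tokens;
    size_t i = 0;
    while (i < src.length)
    {
        if (isWhite(src[i])) { ++i; continue; }

        immutable start = i;
        if (isWordChar(src[i]))
        {
            while (i < src.length && isWordChar(src[i])) ++i;
            tokens ~= Token(Kind.word, src[start .. i]);
        }
        else if (isBracketChar(src[i]))
        {
            ++i;
            tokens ~= Token(Kind.bracket, src[start .. i]);
        }
        else
        {
            // A run of symbols is one operator; cap it at two characters.
            while (i < src.length && isSymbolChar(src[i])) ++i;
            enforce(i - start <= 2, "operator longer than two symbols");
            tokens ~= Token(Kind.symbol, src[start .. i]);
        }
    }
    return tokens;
}

void main()
{
    auto toks = tokenize("x = (a ≤ b) ∧ c;"d);
    assert(toks.length == 10);
    assert(toks[4].kind == Kind.symbol && toks[4].text == "≤"d);
}
```

Token boundaries fall out of the character classes alone, so `a≤b` splits the same way as `a ≤ b`. Only two adjacent operators need the separating whitespace. Precedence is a separate question that a tokenizer like this does not touch.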
Dec 03 2003
"Mark J. Brudnak" <mjbrudna oakland.edu> writes:
The UNICODE spec already defines a lot of mathematical symbols (hundreds).
In my view combining ASCII symbols to form more operators is *not* the way
to go.  It would make the syntax even more difficult to parse and probably
lead to ambiguous syntax.  A UNICODE character is one text symbol which can
map to an operation (easy to parse).

In "ASCII-land" the best approach to arbitrary operators is to define them
with strings along with some yet-to-be-defined "bracket operator" to delimit
them.

For example, say I wanted to define some obtuse binary operator like the
vector-exterior-product; then my operator would be defined as a string like
'extprod' inside some language-defined bracket, say <[ and ]>.  To call this
operator the code would look like this:

/* traditional infix notation w/ bulky operator */
myBivector = oneVector <[extprod]> anotherVector ;

The operator would be defined as:

class ga {
    float [] vector ;
    int    size ;

    ga = <[extprod]>( ga vectorB) {
        /* compute the exterior product of 'this' and vectorB */
    }
}

It is bulky, but it would allow the definition of arbitrary operators in
ASCII!  As was said earlier, UNICODE is the way to go, it has a defined
symbol for the exterior product :^).

mark.
Dec 03 2003
Ilya Minkov <minkov cs.tum.edu> writes:
Mark J. Brudnak wrote:

 It is bulky, but it would allow the definition of arbitrary operators in
 ASCII!  As was said earlier, UNICODE is the way to go, it has a defined
 symbol for the exterior product :^).

You shall have a big, no, really HUGE parser handling these...

Because the parsing manner is not generic and you need to set operator
precedence by constructing a big... mess!
or have all these operators have the same precedence?
or even make it an error to rely on precedence of these operators like
lint does?

Another idea: 'blabla' should be enough for the ascii infix notation.

-eye
Dec 03 2003
"Mark Brudnak" <malibrud provide.net> writes:
"Ilya Minkov" <minkov cs.tum.edu> wrote in message
news:bqljrj$q74$1 digitaldaemon.com...
 You shall have a big, no, really HUGE parser handling these...

No, the parser would have to detect three tokens: <[, identifier, and ]>.  It
would have to take care of right/left matching.  'identifier' could be any
valid D identifier like 'foo' or 'bar'.

For example:

moo = foo <[ goo ]> zoo ;

This statement parses to the following tokens:

moo   =   foo   <[   goo   ]>   zoo   ;

The compiler then knows that this is equivalent to:

moo = foo.goo(zoo) ;
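
The equivalence described here can be tried out today as a source-to-source rewrite. What follows is a rough sketch, not a proposal for how a compiler would do it. `lowerBracketOperators` is a hypothetical helper, and it assumes that both operands are plain identifiers, so nesting, parenthesized operands and precedence are all ignored.

```d
import std.regex : regex, replaceAll;

/// Rewrites  lhs <[ name ]> rhs  into  lhs.name(rhs).
/// Only handles plain identifiers on both sides of the bracket operator.
string lowerBracketOperators(string source)
{
    auto pattern = regex(`(\w+)\s*<\[\s*(\w+)\s*\]>\s*(\w+)`);
    return replaceAll(source, pattern, "$1.$2($3)");
}

void main()
{
    assert(lowerBracketOperators("moo = foo <[ goo ]> zoo ;")
           == "moo = foo.goo(zoo) ;");
}
```

A real implementation would do this in the parser, where the operands are expressions rather than single identifiers, and that is where the precedence question comes back.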
 Because the parsing manner is not generic and you need to set operator
 precedence by constructing a big... mess!
 or have all these operators have the same precedence?

They would be higher than assignment.  Anything else would have to be made
explicit with parentheses.

 or even make it an error to rely on precedence of these operators like
 lint does?

 Another idea: 'blabla' should be enough for the ascii infix notation.

 -eye
Dec 03 2003
Hauke Duden <H.NS.Duden gmx.net> writes:
Mark Brudnak wrote:

 When reading the D spec I noticed that it supports UNICODE UTF-8, UTF-16,
 UTF-32 source code formats. I propose that D extend its available set of
 operators (and maintain the current set) and draw from the unicode
 extensions for additional operators.  For example:
 
 LOGICAL OPERATORS
 ==================
 ? (unicode 2264)     may be used instead of     <=
 ? (unicode 2265)     may be used instead of     >=
 ? (unicode 2260)     may be used instead of     !=
 ? (unicode 225F)     may be used instead of     ==
 ? (unicode 2227)     may be used instead of     &&
 ? (unicode 2228)     may be used instead of     ||
 
My email client shows '?' for all your suggestions.  I expect most current
code editors will do the same, since most programming languages use ASCII
encoding for their source code.  It would be quite some task to figure out
what another programmer meant when he wrote:

x = ((a ? b) ? c ? d ) ? e;

Some operating systems (e.g. Win9x) don't even have support for printing
unicode text on the screen, unless the characters used happen to also be
available in the current code page.  So it would be close to impossible to
write a proper Unicode code editor on those OSs.

And then, of course, there's the problem of entering such operators.  My
keyboard doesn't have any keys for (unicode 2264), (unicode 2265),... .

It's a great idea, but currently I fear it is not practical.

Hauke
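
The loss of information described here is easy to reproduce. The sketch below assumes the simplest possible lossy conversion, in which every character outside ASCII becomes '?'. `toAsciiLossy` is a hypothetical helper, and real mail clients or code-page converters may behave differently.

```d
/// Replaces every non-ASCII character with '?', the way a lossy
/// conversion to a 7-bit encoding might.
string toAsciiLossy(dstring text)
{
    char[] result;
    foreach (dchar c; text)
        result ~= (c < 0x80) ? cast(char) c : '?';
    return result.idup;
}

void main()
{
    // Four different operators collapse into the same character.
    assert(toAsciiLossy("x = ((a ≤ b) ∧ c ∨ d) ≠ e;"d)
           == "x = ((a ? b) ? c ? d) ? e;");

    // Two expressions with opposite meanings become indistinguishable.
    assert(toAsciiLossy("a ≤ b"d) == toAsciiLossy("a ≥ b"d));
}
```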
Dec 03 2003
"Sean L. Palmer" <palmer.sean verizon.net> writes:
Win95 is dying, if not dead, for development purposes.

You should look forward, it won't be long before all operating systems and
all applications support unicode fully.

Unless you think we're all gonna give up on this unicode nonsense in the
near future, and go back to ascii.  ;)

It is a feature that doesn't have to be 100% implemented right away, and it
is a feature that you are not forced to use.

Sean
Dec 03 2003
Hauke Duden <H.NS.Duden gmx.net> writes:
 Win95 is dying, if not dead, for development purposes.

Win95 is close to dead: about 2% of our customers.  But we still have 30% of
our customers using Win98 or WinME.

And I'm sure there are lots of Unix systems that would also have their
problems with this - having been invented when ASCII ruled the world and
Unicode didn't even exist.

Hauke
Dec 03 2003
"Roald Ribe" <rr.no spam.teikom.no> writes:
"Hauke Duden" <H.NS.Duden gmx.net> wrote in message
news:bqlunf$1ag4$2 digitaldaemon.com...
 Win95 is close to dead: about 2% of our customers. But we still have 30%
 customers using Win98 or WinME.

 And I'm sure there are lots of Unix systems that would also have their
 problems with this - having been invented when ASCII ruled the world and
 Unicode didn't even exist.

UNICODE support files for Win95 -> Me

Microsoft Layer for Unicode on Windows 95/98/ME Systems (MSLU)
    version 1.0  (http://tinyurl.com/qynq)

The question at hand is: is D going to be a language of the future,
for all languages, all over the globe, or will it be a conservative
backward looking effort?

Roald
Dec 03 2003
Hauke Duden <H.NS.Duden gmx.net> writes:
Roald Ribe wrote:
Win95 is close to dead: about 2% of our customers. But we still have 30%
customers using Win98 or WinME.

And I'm sure there are lots of Unix systems that would also have their
problems with this - having been invented when ASCII ruled the world and
Unicode didn't even exist.

 UNICODE support files for Win95 -> Me

 Microsoft Layer for Unicode on Windows 95/98/ME Systems (MSLU)
     version 1.0  (http://tinyurl.com/qynq)

 The question at hand is: is D going to be a language of the future,
 for all languages, all over the globe, or will it be a conservative
 backward looking effort?

The MSLU is just a layer above the normal ANSI API.  It converts all Unicode
strings to ANSI before passing them to functions and converts the results
back to Unicode afterwards.

That means that Unicode characters that cannot be represented in the
current (ANSI) code page will just be replaced with '?', or whatever the
conversion routines use in such a case.

Hauke
Dec 04 2003
"Roald Ribe" <rr.no spam.teikom.no> writes:
 UNICODE support files for Win95 -> Me

 Microsoft Layer for Unicode on Windows 95/98/ME Systems (MSLU)
     version 1.0  (http://tinyurl.com/qynq)

 The question at hand is: is D going to be a language of the future,
 for all languages, all over the globe, or will it be a conservative
 backward looking effort?

 The MSLU is just a layer above the normal ANSI API. It converts all Unicode
 strings to ANSI before passing it to functions and converts the results
 back to Unicode afterwards.

 That means that Unicode characters that cannot be represented in the
 current (ANSI) code page will just be replaced with '?', or whatever the
 conversion routines use in such a case.

Yes, that is true.  But it also means that if the user/admin has set up the
correct codepage/fonts for the language they work in, the application using
the API will not need to know what codepage that is; it will just work with
UNICODE.  (openoffice.org uses this system on older Win9X platforms)

It is a stopgap measure to allow modern programs to run on older platforms,
not the greatest invention since sliced bread ;-)

It would allow a full UNICODE D app to run unmodified on any of those
systems, get full use of UNICODE on newer systems, and still just use one
API.

Roald
Dec 04 2003
Hauke Duden <H.NS.Duden gmx.net> writes:
Roald Ribe wrote:
That means that Unicode characters that cannot be represented in the
current (ANSI) code page will just be replaced with '?', or whatever the
conversion routines use in such a case.

 Yes, that is true. But it also means that if the user/admin has set up the
 correct codepage/fonts for the language they work in, the application using
 the API will not need to know what codepage that is, it will just work with
 UNICODE. (openoffice.org uses this system on older Win9X platforms)

 It is a stop gap measure to allow modern programs run on older platforms,
 not the greatest invention since sliced bread ;-)

 It would allow a full UNICODE D app to run unmodified on any of those
 systems, get full use of UNICODE on newer systems, and still just use one
 API.

That was not the topic of this discussion.  My point was that we shouldn't
use Unicode characters for something as essential to the language as
operators, because then the code will only be readable if your editor/OS
uses a code page that happens to contain these symbols.

Creating Unicode applications in D is a completely different thing (and
it was/is already discussed in a different thread).

Hauke
Dec 04 2003
"Walter" <walter digitalmars.com> writes:
"Hauke Duden" <H.NS.Duden gmx.net> wrote in message
news:bqnr2q$1240$1 digitaldaemon.com...
 Creating Unicode applications in D is a completely different thing (and
 it was/is already discussed in a different thread).
I agree. D should fully support developing unicode apps. I should point out, though, that right now D supports unicode source text (UTF-8, UTF-16, and UTF-32), unicode characters in comments and strings, and unicode alpha characters in identifiers. I'm not sure, though, if the world is quite ready yet for unicode operators. We'll see.
Dec 19 2003
Elias Martenson <no spam.spam> writes:
On Thu, 04 Dec 2003 01:44:25 +0100, Hauke Duden wrote:

 Win95 is close to dead: about 2% of our customers. But we still have 30%
 customers using Win98 or WinME.

 And I'm sure there are lots of Unix systems that would also have their
 problems with this - having been invented when ASCII ruled the world and
 Unicode didn't even exist.

Unix has pretty much settled on using UTF-8 for external representation, and
before long all text files in Unix will be UTF-8 instead of some local
encoding.

Here's a quote from the excellent UTF-8 for Unix FAQ
(http://www.cl.cam.ac.uk/~mgk25/unicode.html):

"With the UTF-8 encoding, Unicode can be used in a convenient and backwards
compatible way in environments that, like Unix, were designed entirely
around ASCII. UTF-8 is the way in which Unicode is used under Unix, Linux,
and similar systems. It is now time to make sure that you are well familiar
with it and that your software supports UTF-8 smoothly."

Regards

Elias
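
The backwards compatibility mentioned in the FAQ can be checked directly, since a D `string` is already a sequence of UTF-8 bytes. A small sketch follows; `utf8Bytes` is a hypothetical helper, and the operator is written as the escape `\u2264` so that the example does not depend on the encoding this message travels in.

```d
import std.algorithm.searching : all;
import std.range : walkLength;
import std.string : representation;

/// Returns the raw UTF-8 bytes of a D string.
immutable(ubyte)[] utf8Bytes(string s)
{
    return s.representation;
}

void main()
{
    string ascii = "a <= b";
    string mixed = "a \u2264 b";   // U+2264, LESS-THAN OR EQUAL TO

    // Pure ASCII text is unchanged: one byte per character, all below 0x80.
    assert(utf8Bytes(ascii).length == 6);
    assert(utf8Bytes(ascii).all!(b => b < 0x80));

    // The operator takes three bytes, every one of them 0x80 or above,
    // so no part of it can be mistaken for an ASCII character.
    immutable ubyte[] expected = [0xE2, 0x89, 0xA4];
    assert(utf8Bytes(mixed)[2 .. 5] == expected);
    assert(mixed.length == 7);       // UTF-8 code units (bytes)
    assert(mixed.walkLength == 5);   // code points
}
```

ASCII-only tools can therefore pass such source through untouched, but they cannot display or edit the operator itself, which is the concern raised earlier in the thread.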
Dec 04 2003
"Sean L. Palmer" <palmer.sean verizon.net> writes:
Right.  And the OS should provide at least one font that has every single
unicode character, for use as fallback for fonts that are missing such
characters.

Sean
Dec 04 2003
Elias Martenson <no spam.spam> writes:
On Thu, 04 Dec 2003 10:56:46 -0800, Sean L. Palmer wrote:

 Right.  And the OS should provide at least one font that has every single
 unicode character, for use as fallback for fonts that are missing such
 characters.

Yes it certainly should.  Now, my Linux installation lacks fonts for a large
set of the unihan code points, but other than that I have most of them.  In
fact, I think that almost all existing installed operating systems today
would be able to handle unicode operators.

However, I think the problem with them is more related to the fact that you
more than likely will need a special editor for the code (at least if you
don't want to try to remember all the \u-codes for the operators).

Unicode is very important, as I have pointed out several times in the other
unicode thread, but it deals with strings in the language.  Not the source
code itself.

Do I think the designers of Java made a mistake when they supported unicode
in its symbols?  A few years ago I would have said yes.  Now, I say that it
really didn't matter.  People don't use unicode symbols anyway.

Therefore, I believe that this discussion is a non-issue.  Even if unicode
operators were supported, I doubt people would use them, in the name of
interoperability.

Regards

Elias
Dec 04 2003
"Sean L. Palmer" <palmer.sean verizon.net> writes:
That's fine with me, so long as they are not expressly prohibited, I can use
them for my own personal projects.  Support for them would then grow
grassroots-style.  I have text editors that support Unicode, and I don't
mind cutting and pasting.  Ease of entry is a minor issue to me.

The problem is, if we can't define new operators in D, and it doesn't
provide enough overloadable builtin operators, I'm stuck.  I can do nothing
but invest in a Unicode-aware preprocessor.  I want the option of moving
forward.

What good is being able to compile D source encoded in UTF-8 if you aren't
allowed to use any symbols that aren't in ASCII?  (except embedded in string
literals)

Sean
Dec 04 2003
J C Calvarese <jcc7 cox.net> writes:
Sean L. Palmer wrote:
 That's fine with me, so long as they are not expressly prohibited, I can use
 them for my own personal projects.  Support for them would then grow
 grassroots-style.  I have text editors that support Unicode, and I don't
 mind cutting and pasting.  Ease of entry is a minor issue to me.
 
 The problem is, if we can't define new operators in D, and it doesn't
 provide enough overloadable builtin operators, I'm stuck.  I can do nothing
 but invest in a Unicode-aware preprocessor.  I want the option of moving
 forward.
 
 What good is being able to compile D source encoded in UTF-8 if you aren't
 allowed to use any symbols that aren't in ASCII?  (except embedded in string
 literals)

Actually, since DMD 0.74 non-ASCII characters (as long as they are "unicode
alpha") are allowed in identifier names.  (See the attached example.)  Also,
comments can contain any non-ASCII character.

I do think Unicode operators is an interesting idea.

Justin
Dec 04 2003
"Sean L. Palmer" <palmer.sean verizon.net> writes:
Yeah, just have to set this "free" browser to Encoding... Unicode UTF-8

That's pretty cool.  Pretty cool indeed.

I bet you if I cut and paste some D program made by someone in a far-away
land into some web-based translator engine, it would probably not do that
bad of a job of translating the identifiers back into English again ;)

Most likely, I'll rarely if ever see any source written in some other
language, and if I did, I'd just consider it obfuscation.  It's not a sin
punishable by death.

I think it's cool that finally people can more or less program in their own
language, once they learn the english keywords.  A preprocessor would allow
even those to be replaced.

In fact, whose idea was it to allow infix notation for regular identifiers?
We could use a preprocessor to translate our D + Unicode Symbols into D that
will actually compile.  ;)  Right now it would only work with prefix
(lisp-like) notation, however.

They have some really interesting brackets in Unicode, as well.  Surely
there's one just begging to be used for template syntax.

Sean

"J C Calvarese" <jcc7 cox.net> wrote in message
news:bqpbqo$8no$1 digitaldaemon.com...
Actually, since DMD 0.74, non-ASCII characters (as long as they are "unicode alpha") are allowed as identifier names. (See the attached example.) Also, comments can contain any non-ASCII character.

I do think Unicode operators is an interesting idea.

Justin
 Sean
---------------------------------------------------------------------------- ----
 const char[] Sí = "yes";
 const char[] Año = "year";

 /+

 These don't work (it might be because they are iconic symbols rather than
part of any actual language)
 const char[] ???? = "box drawing";
 const char[] ???? = "cards";

 +/


 int main()
 {

   int AñoNúmero = 2003;
   int Cyrillic???? = 1;
   int Hebrew?????;

   printf("%d", AñoNúmero);

   return 0;
 }
Dec 05 2003
parent J C Calvarese <jcc7 cox.net> writes:
Sean L. Palmer wrote:

 Yeah, just have to set this "free" browser to Encoding... Unicode UTF-8
OK, so I didn't send it right. (That's what a WASP like me gets for belittling ASCII.) Unicode isn't very friendly to novices.

I think putting it in a .zip will help out. Maybe it will work if I turn it into an .html file. I'm sure there's a setting in Thunderbird that will take care of this stuff automatically; I'm just not sure how much time I want to spend looking for it.

(By the way, I used WinXP's notepad to create the original document because I was lazy and didn't want to hunt down another Unicode-capable editor.)

Justin
 
 That's pretty cool.  Pretty cool indeed.
 
 I bet you if I cut and paste some D program made by someone in a far-away
 land, into some web-based translator engine it would probably not do that
 bad of a job of translating the identifiers back into English again ;)
 
 Most likely, I'll rarely if ever see any source written in some other
 language, and if I did, I'd just consider it obfuscation.  It's not a sin
 punishable by death.
 
 I think it's cool that finally people can more or less program in their own
 language, once they learn the English keywords.  A preprocessor would allow
 even those to be replaced.
 
 In fact, whose idea was it to allow infix notation for regular identifiers?
 We could use a preprocessor to translate our D + Unicode Symbols into D that
 will actually compile.  ;)  Right now it would only work with prefix
 (lisp-like) notation, however.
 
 They have some really interesting brackets in Unicode, as well.  Surely
 there's one just begging to be used for template syntax.
 
 Sean
 
 "J C Calvarese" <jcc7 cox.net> wrote in message
 news:bqpbqo$8no$1 digitaldaemon.com...
 
Sean L. Palmer wrote:

That's fine with me, so long as they are not expressly prohibited, I can
use
them for my own personal projects.  Support for them would then grow
grassroots-style.  I have text editors that support Unicode, and I don't
mind cutting and pasting.  Ease of entry is a minor issue to me.

The problem is, if we can't define new operators in D, and it doesn't
provide enough overloadable builtin operators, I'm stuck.  I can do
nothing
but invest in a Unicode-aware preprocessor.  I want the option of moving
forward.

What good is being able to compile D source encoded in UTF-8 if you
aren't
allowed to use any symbols that aren't in ASCII?  (except embedded in
string
literals)
Actually, since DMD 0.74, non-ASCII characters (as long as they are "unicode alpha") are allowed as identifier names. (See the attached example.) Also, comments can contain any non-ASCII character.

I do think Unicode operators is an interesting idea.

Justin
Sean
---------------------------------------------------------------------------- ----
const char[] Sí = "yes";
const char[] Año = "year";

/+

These don't work (it might be because they are iconic symbols rather than
part of any actual language)
const char[] ???? = "box drawing";
const char[] ???? = "cards";

+/


int main()
{

  int AñoNúmero = 2003;
  int Cyrillic???? = 1;
  int Hebrew?????;

  printf("%d", AñoNúmero);

  return 0;
}
Dec 05 2003
prev sibling next sibling parent Elias Martenson <no spam.spam> writes:
Den Fri, 05 Dec 2003 01:34:19 -0600 skrev J C Calvarese:

 Actually, since DMD 0.74 non-ASCII characters (as long as they are "unicode 
 alpha") are allowed as identifier names.  (See the attached example.) 
 Also, comments can contain any non-ASCII character.
Neat. Although your newsreader didn't include a proper encoding header. Not your fault, but rather the broken software. :-) Regards Elias
Dec 05 2003
prev sibling parent reply "Mark J. Brudnak" <mjbrudna oakland.edu> writes:
"J C Calvarese" <jcc7 cox.net> wrote


<snip>

 Actually, since DMD 0.74 non-ASCII characters (as long as they are "unicode
 alpha") are allowed as identifier names.  (See the attached example.)
 Also, comments can contain any non-ASCII character.
I think only "letter-like" unicode characters should be allowed in D identifiers. Having variables like

int   = 42 ;
float ±×§ =3.14159 ;

will really confuse things. Punctuation, shapes, boxdrawing, dingbats, math symbols, should be prohibited from being used in identifiers.
 I do think Unicode operators is an interesting idea.


 Justin

 Sean
---------------------------------------------------------------------------- ----
 

 const char[] Sí = "yes";
 const char[] Año = "year";

 /+

 These don't work (it might be because they are iconic symbols rather than
part of any actual language)
 const char[] ╠╢╦╬ = "box drawing";
 const char[] ♠♥♣♦ = "cards";

 +/


 int main()
 {

   int AñoNúmero = 2003;
   int CyrillicÒ-Ñ?Ò"Ò± = 1;
   int Hebrew××"×Yףק;

   printf("%d", AñoNúmero);

   return 0;
 }
Dec 05 2003
next sibling parent "Sean L. Palmer" <palmer.sean verizon.net> writes:
Agreed, though I would like to use symbols as operators.

Sean

"Mark J. Brudnak" <mjbrudna oakland.edu> wrote in message
news:bqq0pe$183n$1 digitaldaemon.com...
 I think only "letter-like" unicode characters should be allowed in D
 identifiers.  Having variables like

 int   = 42 ;
 float ±×§ =3.14159 ;

 will really confuse things.  Punctuation, shapes, boxdrawing, dingbats,
math
 symbols, should be prohibited from being used in identifiers.
Dec 05 2003
prev sibling next sibling parent J C Calvarese <jcc7 cox.net> writes:
Mark J. Brudnak wrote:

 "J C Calvarese" <jcc7 cox.net> wrote
 
 
 <snip>
 
Actually, since DMD 0.74 non-ASCII characters (as long as they are "unicode
alpha") are allowed as identifier names.  (See the attached example.)
Also, comments can contain any non-ASCII character.
I think only "letter-like" unicode characters should be allowed in D identifiers. Having variables like

int   = 42 ;
float ±×§ =3.14159 ;

will really confuse things. Punctuation, shapes, boxdrawing, dingbats, math symbols, should be prohibited from being used in identifiers.
My mail program garbled the UTF-8 file that I was trying to use as an example. D only allows unicode alphas (A - Z, alpha - omega, aleph - taw, accented letters, etc.) For example, the cards symbols (♠♥♣♦) and box elements (╠╢╦╬) can't be used as identifiers (I'm sure because I tried them and it wouldn't compile). Justin
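
The letters-only rule can be approximated in a few lines. This is only a rough sketch, assuming present-day D and using std.uni.isAlpha as a stand-in for the compiler's own table of allowed characters, so edge cases may differ from what DMD actually accepts. The function name looksLikeIdentifier is hypothetical.

```d
import std.ascii : isDigit;
import std.uni : isAlpha;

/// Hypothetical check: true if every code point in `name` is a letter or
/// underscore, with ASCII digits also allowed after the first position.
/// std.uni.isAlpha is an assumption standing in for the compiler's list.
bool looksLikeIdentifier(string name)
{
    if (name.length == 0)
        return false;

    bool first = true;
    foreach (dchar c; name)   // decodes the UTF-8 string into code points
    {
        const ok = c == '_' || isAlpha(c) || (!first && isDigit(c));
        if (!ok)
            return false;
        first = false;
    }
    return true;
}
```

Under this approximation names like AñoNúmero pass, while the card and box-drawing symbols are rejected because Unicode classifies them as symbols rather than letters.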
Dec 05 2003
prev sibling parent "Walter" <walter digitalmars.com> writes:
"Mark J. Brudnak" <mjbrudna oakland.edu> wrote in message
news:bqq0pe$183n$1 digitaldaemon.com...
 I think only "letter-like" unicode characters should be allowed in D
 identifiers.
You're right, and that's the way it works now. I'm going by the C99 "Appendix D" list of allowed alpha characters.
Dec 19 2003
prev sibling next sibling parent Andy Friesen <andy ikagames.com> writes:
Mark Brudnak wrote:

 When reading the D spec I noticed that it supports UNICODE UTF-8, UTF-16,
 UTF-32 source code formats. I propose that D extend its available set of
 operators (and maintain the current set) and draw from the unicode
 extensions for additional operators.  For example:
 [...]
 The largest difficulty with such a scheme is that our keyboards are not
 UNICODE friendly.  This I see as a problem for the editor and operating
 system and not so much for the D language itself.
 
 Any way ... your thoughts??
 
 Mark.
Bjarne suggested something similar to this for C++ once: http://www.research.att.com/~bs/whitespace98.pdf (yes, this is a joke) -- andy
Dec 03 2003
prev sibling parent reply Antti Sykäri <jsykari gamma.hut.fi> writes:
In article <bqjndj$138p$1 digitaldaemon.com>, Mark Brudnak wrote:
 The largest difficulty with such a scheme is that our keyboards are not
 UNICODE friendly.  This I see as a problem for the editor and operating
 system and not so much for the D language itself.
This is also a problem that the language designer cannot fix by fixing the language. It continues to be a problem as long as QWERTY is the only universally available keyboard or as long as some of the current major operating systems do not offer a universally available easy way to input those unicode characters that are commonly used in mathematics but rarely seen on the computer screen. -Antti
Dec 03 2003
parent reply "Sean L. Palmer" <palmer.sean verizon.net> writes:
So someone can make a killing selling D Programmers' Keyboards!!  ;)

Sean

"Antti Sykäri" <jsykari gamma.hut.fi> wrote in message
news:slrnbssvc9.i3r.jsykari pulu.hut.fi...
 In article <bqjndj$138p$1 digitaldaemon.com>, Mark Brudnak wrote:
 The largest difficulty with such a scheme is that our keyboards are not
 UNICODE friendly.  This I see as a problem for the editor and operating
 system and not so much for the D language itself.
This is also a problem that the language designer cannot fix by fixing the language. It continues to be a problem as long as QWERTY is the only universally available keyboard or as long as some of the current major operating systems do not offer a universally available easy way to input those unicode characters that are commonly used in mathematics but rarely seen on the computer screen. -Antti
Dec 03 2003
parent Elias Martenson <no spam.spam> writes:
Den Wed, 03 Dec 2003 23:34:55 -0800 skrev Sean L. Palmer:

 So someone can make a killing selling D Programmers' Keyboards!!  ;)
Remember APL? Let's not go there again. :-)

I agree that unicode operators could be useful in maths applications but other than that the advantages are pretty limited. Java has support for Unicode symbols, and that can be a mess unless you encode all non-ASCII symbols using the \u notation, which makes the code pretty hard to read.

I'm currently implementing a BASIC interpreter with full Unicode support. It's not a serious project though. :-)

Regards

Elias
Dec 04 2003