## digitalmars.D.announce - Adding Unicode operators to D

• Andrei Alexandrescu (3/3) Oct 22 2008 Please vote up before the haters take it down, and discuss:
• Andrei Alexandrescu (4/11) Oct 22 2008 Correx:
• Steven Schveighoffer (24/27) Oct 22 2008 No thanks. Please let's only use operators that are on the keys of my
• Jarrett Billingsley (3/4) Oct 22 2008 Beeeecause not everyone uses emacs?
• Bill Baxter (18/23) Oct 22 2008 Actually, the solutions aren't that far apart. Andrei's solution
• Bill Baxter (10/14) Oct 22 2008 In fact, I think there are only like three of us using emacs. :-) So
• Steven Schveighoffer (14/81) Oct 23 2008 All that is being proposed right now is syntax sugar. Cross product, do...
• Sergey Gromov (6/8) Oct 23 2008 I think an editor is not the only thing that displays your program's
• Paul D. Anderson (5/19) Oct 22 2008 Java allows unicode variable names. The Greek letter 'pi' is a valid var...
• Spacen Jasset (11/26) Oct 23 2008 I haven't really ever felt the need for such things. It would require
• Bill Baxter (19/29) Oct 23 2008 I think that's the conclusion I'm coming to as well. While the use
• Walter Bright (3/21) Oct 23 2008 Unfortunately, you might be right in that D is not currently in a
• Don (32/47) Oct 28 2008 Entering this debate late:
• Sergey Gromov (5/7) Oct 28 2008 I'd use dot "⋅" and cross "×" products for 3D, union "∪" and
• bearophile (17/20) Oct 28 2008 I just want to note that the whole thread is almost unreadable on the di...
• KennyTM~ (3/31) Oct 28 2008 If the two sets are incomparable, just return NaN... We need an opCmp
• KennyTM~ (4/40) Oct 28 2008 Actually I've made a working solution. Even the exotic operators like
• Andrei Alexandrescu (9/18) Oct 28 2008 In my opinion, a workable feature is this:
• Bill Baxter (20/20) Oct 28 2008 On Wed, Oct 29, 2008 at 4:12 AM, Andrei Alexandrescu ...
• Don (18/38) Oct 29 2008 Do we really need to do that? How many Unicode binary operators are ther...
• Walter Bright (3/5) Oct 29 2008 That throws out the ability to parse without semantic analysis. It's not...
• Andrei Alexandrescu (3/9) Oct 29 2008 It doesn't per a previous post of mine, but I agree it's still not worth...
• Benji Smith (5/14) Oct 28 2008 I have pretty much the same list.
• Moritz Warning (5/11) Oct 22 2008 It would be very nice to have unicode operators.
• Moritz Warning (2/16) Oct 22 2008 sorry posted in d.announce by .. accident. :/
• Nick Sabalausky (3/14) Oct 22 2008 I'd certainly like opIntersection and maybe opUnion.
• Bill Baxter (24/26) Oct 22 2008 (My comment cross posted here from reddit)
• Jesse Phillips (11/45) Oct 22 2008 I don't find this terribly appealing. Walter mentions having thrown out
• Don (16/37) Oct 23 2008 I agree.
• Sergey Gromov (3/4) Oct 23 2008 Lots of question marks here. This sucks.
• Spacen Jasset (36/71) Oct 25 2008 I am not entirely sure that 30 or (x amount) of new operators would be a...
• Andrei Alexandrescu (11/64) Oct 25 2008 I have noticed that in pretty much all scientific code, the f(a, b) and
• Spacen Jasset (13/83) Oct 25 2008 Yes, that is indeed a fair point and I agree. D is a "systems
• Bill Baxter (28/45) Oct 25 2008 Yes, heavy math code is hard to read in the current situation.
• bearophile (5/8) Oct 25 2008 I have seen many scientific programs that use numpy, so sometimes it's f...
• Bill Baxter (28/35) Oct 25 2008 Yep C/D/C++ is easier. The SciPy.org site has a growing section of
• bearophile (11/14) Oct 25 2008 Mixing languages isn't nice, I agree. That's why I too use D for several...
• Andrei Alexandrescu (9/67) Oct 25 2008 Surprisingly there's not a lot of choice, witnessed by the prevalence of
• Bruno Medeiros (9/59) Oct 26 2008 But what operators would be added? Some mathematician programmers might
• KennyTM~ (9/70) Oct 26 2008 Composition may be useful for functional programming (I've never used
• Andrei Alexandrescu (21/80) Oct 26 2008 I was thinking of allowing a general way of defining one Unicode
• KennyTM~ (15/104) Oct 26 2008 LaTeX in D? :p
• Charles Hixson (39/100) Oct 26 2008 Perhaps what needs to be added is a syntax for defining character to
• Simen Kjaeraas (23/41) Oct 26 2008 This made me think. What if we /could/ define arbitrary infix operators ...
• Bill Baxter (10/53) Oct 26 2008 uk>
• Simen Kjaeraas (5/67) Oct 26 2008 Yup, I realized this myself as well. Seemed like such a great idea when ...
• Bill Baxter (13/79) Oct 26 2008 :
• Simen Kjaeraas (12/19) Oct 26 2008 An interesting read, though I have looked at downs' code before. It
• Andrei Alexandrescu (4/73) Oct 26 2008 An operator could always be defined to have the same precedent as an
• Bill Baxter (6/14) Oct 26 2008 Walter said in a previous post a few days ago when I suggested it that
• Andrei Alexandrescu (7/21) Oct 26 2008 It can be done, but it's kinda involved. You define a grammar in which
• Bill Baxter (8/32) Oct 26 2008 I see. So the price you pay is that you defer more decisions till
• Andrei Alexandrescu (3/32) Oct 26 2008 Yah. Something tells me Walter won't embark on that soon.
• Walter Bright (3/4) Oct 26 2008 Not a chance . Producing an amorphous list of tokens isn't what I'd
• Max Samukha (5/8) Oct 22 2008 I'm already having problems with unicode: the news reader I'm using
• bearophile (12/12) Oct 23 2008 Andrei Alexandrescu:
• Robert Fraser (2/3) Oct 23 2008 So does D.
• Max Samukha (6/9) Oct 23 2008 I'd like to note that identifiers in a non-English language are
• Yigal Chripun (9/21) Oct 23 2008 isn't that something that should be decided upon on a per-project basis?
• bearophile (4/4) Oct 23 2008 I always use English for variable names, instead of my language, because...
• Max Samukha (3/7) Oct 23 2008 Keep children away from Python. Let them have happy lives :)
• Walter Bright (3/8) Oct 23 2008 D currently allows Unicode in identifiers, comments, and strings. In
• Andrei Alexandrescu (5/6) Oct 23 2008 [snip]
• Yigal Chripun (22/29) Oct 23 2008 A few thoughts on the subject:
• KennyTM~ (78/85) Oct 23 2008 I suggest not. There are problems if you adopt Unicode as operators:
• Bruno Medeiros (5/9) Oct 24 2008 Then I suggest a change in career... ^^'
• Simen Kjaeraas (30/33) Oct 23 2008 I really like the idea of having more unicode in the language, but I fee...
• Bill Baxter (9/9) Oct 23 2008 On Fri, Oct 24, 2008 at 5:48 AM, Simen Kjaeraas ...
• Bruno Medeiros (8/15) Oct 24 2008 Hum, interesting example, it actually made me realize that 'null' would
• Simen Kjaeraas (6/16) Oct 24 2008 Well, we Norwegians got the Ø (html entity &Oslash;, Latin-1 character ...
• KennyTM~ (3/19) Oct 24 2008 auto Ø = null; // \Ø
• Bruno Medeiros (8/29) Oct 26 2008 It's an interesting and effective way to save some typing, and it might
• Simen Kjaeraas (6/33) Oct 24 2008 I'd guess this oughtta do it:
• Bruno Medeiros (6/46) Oct 26 2008 Yes, exactly that! I had the impression there was such a program for
• Robert Fraser (5/12) Oct 26 2008 I remember this same question being asked on a Microsoft DL when I was
• bearophile (35/36) Oct 24 2008 Fortress uses pairs of symbols to denote various sequence literals. Som...
• ore-sama (2/4) Oct 24 2008 Console is a legacy technology (you even still call it "DOS"), why expec...
• Bill Baxter (5/9) Oct 24 2008 So tell me what the alternative is? I had trouble with running D
• Sergey Gromov (6/18) Oct 24 2008 A regular Windows console supports UTF-8 to some extent:
• Bill Baxter (3/21) Oct 24 2008 I did that but "type " still prints garbage.
• Yigal Chripun (10/33) Oct 24 2008 so don't use type. use notepad instead...
• Benji Smith (9/43) Oct 24 2008 Oh, and one of my favorite tricks in Windows is to install cygwin
• Bill Baxter (6/60) Oct 24 2008 But that has the same problem. Cygtools don't understand windows
• Benji Smith (20/75) Oct 24 2008 Wha???
• Bill Baxter (12/104) Oct 24 2008 Oh, I didn't realize that. There is one thing that doesn't work,
• Benji Smith (3/19) Oct 24 2008 Glad I could be of service!
• Steven Schveighoffer (14/26) Oct 24 2008 It's not the paths with wildcards that is the problem. In this case, it...
• Bill Baxter (8/24) Oct 24 2008 Read again. Particularly this part:
• Bill Baxter (8/19) Oct 24 2008 No, that's how it works with the Bash shell and most Unix shells, but
• Steven Schveighoffer (17/87) Oct 24 2008 No, grep accepts either input. The shell does not change paths to windo...
• Bill Baxter (6/95) Oct 24 2008 Yeh, I love the bash shell. Really the only thing keeping me from
• Benji Smith (4/6) Oct 25 2008 Definitely!
• Bill Baxter (10/43) Oct 24 2008 Ok what about grep and sort and uniq then? Can notepad do that?
• Benji Smith (5/14) Oct 24 2008 That's weird. My machine (WinXp Sp3) has no problem printing UTF-8 to
• Bill Baxter (4/19) Oct 24 2008 Ok. Thanks for the info. Knowing that it has actually worked for at
• Benji Smith (10/28) Oct 24 2008 Write a tiny little D program and see what you get on the console:
• Bill Baxter (10/46) Oct 24 2008 Ah, I see. I guess more what I want to know is if I had utf-8 source
• Yigal Chripun (9/54) Oct 24 2008 Msys does autocomplete. it's not perfect but it works. the path will
• Bill Baxter (10/64) Oct 24 2008 Right that's what Cygwin does too, and it's useless if I want to call
• Robert Fraser (9/79) Oct 25 2008 PowerShell is MS's concession that there are things better done in a
• Sergey Gromov (8/51) Oct 27 2008 They all work for me: type, cat, less. The file is UTF-8 with BOM.
• Steven Schveighoffer (17/28) Oct 24 2008 Any text-based program uses the same Windows console (unless it's a GUI
• Yigal Chripun (9/44) Oct 25 2008 windows console AKA DOS Box *is* in fact legacy technology. It is
• Bill Baxter (8/45) Oct 25 2008 After downloading it and giving it a try, I find this claim somewhat
• Steven Schveighoffer (11/65) Oct 25 2008 I've never used powershell, but most likely you are correct. I think th...
• ore-sama (2/5) Oct 26 2008 One important feature of legacy technology is it must not change for com...
• Robert Fraser (7/18) Oct 25 2008 It uses the same console application to do the displaying/execution.
• Bill Baxter (6/26) Oct 25 2008 I'm using "Console2" as my facade on the console window.
• KennyTM~ (2/22) Oct 25 2008 Hey, they do have fixed MSPaint and WordPad! :)
• torhu (3/5) Oct 26 2008 That works fine for me if I enable Quick edit mode in the options. Then...
• Bill Baxter (4/10) Oct 26 2008 Except it only does block-oriented rectangular selection, which is odd
• torhu (2/13) Oct 26 2008 Yeah, that's true. Pretty stupid.
• Robert Fraser (5/21) Oct 26 2008 My main problem is that you can't do it just with the keyboard, which is...
• Bill Baxter (5/28) Oct 26 2008 By the way I tried running powershell as a tab inside the Console2
• Yigal Chripun (5/51) Oct 27 2008 I've just checked (it's been a long time since I used it) and you're
• Andrei Alexandrescu (7/50) Oct 25 2008 Windows has gotten a lot better in the recent times - ever since it
• ore-sama (2/12) Oct 25 2008 gui of course. MSYS's console is gui in fact.
• ore-sama (2/10) Oct 25 2008 It's not windows, it's program's standard startup module gets command li...
• ore-sama (2/5) Oct 25 2008 if application prints garbage, this indicates that it's implemented inco...
• Kevin Bealer (10/16) Oct 25 2008 I think this is a bad idea -- there are a lot of places that don't use U...
• Alix Pexton (22/29) Oct 26 2008 I've been following this thread without really having an opinion to
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Please vote up before the haters take it down, and discuss:

Andrei

Oct 22 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Correx:

Andrei

Andrei Alexandrescu wrote:
Please vote up before the haters take it down, and discuss:

_in_d_similarly_to/

Andrei


Oct 22 2008
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Andrei Alexandrescu"  wrote
Correx:

Andrei

No thanks.  Please let's only use operators that are on the keys of my
keyboard. I don't fancy having to type key digraphs or trigraphs to try and
write code.

I understand that others already have this problem, but I don't.  This would
be a huge detractor from D for me.  I'd definitely support a language fork
at that point, or at least refuse to deal with any code that has unicode
operators.  I think you'd find others feel the same way.

Why can't the emacs module solution work that was used for the chevrons?
That is, when emacs sees:

x opCross(y);

display it as

x x y

(of course, assume the middle x is the cross symbol, I have no idea how to
type it).

And upon save, regenerate the correct code.

I see no issue with something like that.  This is all the compiler is doing
anyways...

Note that any operators for unicode would be user-defined anyways, the
standard operator symbols already cover what actually gets generated to
machine code.  That is, unicode operator X is invariably going to map to
opX, so there is no benefit to the compiler performing this step instead of
an editor.

-Steve

Oct 22 2008
"Jarrett Billingsley" <jarrett.billingsley gmail.com> writes:
On Wed, Oct 22, 2008 at 9:36 PM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
Why can't the emacs module solution work that was used for the chevrons?

Beeeecause not everyone uses emacs?

Oct 22 2008
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Jarrett Billingsley" wrote
On Wed, Oct 22, 2008 at 9:36 PM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
Why can't the emacs module solution work that was used for the chevrons?

Beeeecause not everyone uses emacs?

Including myself ;)

But I really meant the same *type* of solution.  If you use another editor,
especially if it is used for coding, it probably has a macro feature that
you can use for doing this.

-Steve

Oct 22 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Thu, Oct 23, 2008 at 10:36 AM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
No thanks.  Please let's only use operators that are on the keys of my
keyboard. I don't fancy having to type key digraphs or trigraphs to try and
write code.
[...]
Why can't the emacs module solution work that was used for the chevrons?

Actually, the solutions aren't that far apart.  Andrei's solution
displays XXX as YYY.  With actual Unicode you'd still type XXX, but
it would be replaced by YYY instead of just being displayed as YYY.

The nice thing about getting such AutoCorrect replacements working
well across a wide range of editors is that it has benefits beyond
just typing unicode characters.  You can have it insert code snippets
when you type [[main]] for example, or some people have said that some
of the existing characters are hard to type on their non-US keyboards.
You could define replacements for those.

I'm certainly not saying going Unicode is the right thing to do right
now.  More like trying to explore what has to change (if anything)
before it really becomes viable to introduce Unicode.  The topic seems
to keep coming up in a lot of places, so I think eventually it is
inevitable that we will see more and more languages start using it.

---bb

Oct 22 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Thu, Oct 23, 2008 at 10:45 AM, Jarrett Billingsley
<jarrett.billingsley gmail.com> wrote:
On Wed, Oct 22, 2008 at 9:36 PM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
Why can't the emacs module solution work that was used for the chevrons?

Beeeecause not everyone uses emacs?

In fact, I think there are only like three of us using emacs.  :-)  So
it's not a very general solution.

But I think the point is that you should be able to implement
something similar in many editors.
Although I think the trick of showing one thing but saving another is
more tricky for most editors than just replacing the strings outright
a la AutoCorrect.

--bb

Oct 22 2008
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"davidl" wrote
On Thu, 23 Oct 2008 09:36:29 +0800, Steven Schveighoffer
<schveiguy yahoo.com> wrote:

"Andrei Alexandrescu"  wrote
Correx:

Andrei

No thanks.  Please let's only use operators that are on the keys of my
keyboard. I don't fancy having to type key digraphs or trigraphs to try
and
write code.

I understand that others already have this problem, but I don't.  This
would
be a huge detractor from D for me.  I'd definitely support a language
fork
at that point, or at least refuse to deal with any code that has unicode
operators.  I think you'd find others feel the same way.

Why can't the emacs module solution work that was used for the chevrons?
That is, when emacs sees:

x opCross(y);

display it as

x x y

(of course, assume the middle x is the cross symbol, I have no idea how
to
type it).

And upon save, regenerate the correct code.

I see no issue with something like that.  This is all the compiler is
doing
anyways...

Everything you worry about comes down to a poor editor. Why do you think an
editor should affect the language?

All that is being proposed right now is syntax sugar.  Cross product, dot
product, union, etc.  All of these will map to a function, so there is no
reason to require compiler support  (that is, they don't translate directly
to assembly/machine code).  I'm proposing the editor be used to do the sugar

Right now Unicode is not universally accepted by all editors, ASCII is.
Right now, I don't have cross product symbol on my keyboard, all currently
supported symbols I do have.  Why should my experience with D be severely
affected by your desire for syntax sugar?

And it complicates the language if the code isn't converted beforehand by
the programmer. It may also set up
future restrictions on extending the language in the right direction!

Today, I can call opX functions instead of using the appropriate operator.
This is no different.

In your case, x opCross(y): why is identifier opCross(identifier)
treated as identifier x identifier?
Should a typical operator overload function declaration be
treated that way too?

x opCross(y)
{
}

x x y
{
}

or even

x opCross(y, m){}

--->

x x y, m  {}

also consider a template declaration

Matrix opCross(T)(T a)
{
}

should it be considered as Matrix x T (T a)?

If not, how do you distinguish between all those circumstances? (And not all
possible "shouldn't be" situations are listed here.)

The editor module would have to be (and can be) smarter than that.

-Steve

Oct 23 2008
Sergey Gromov <snake.scaly gmail.com> writes:
Thu, 23 Oct 2008 18:21:18 +0800,
davidl wrote:
Everything you worry about is just poor editor. Why do you think an
editor can affect the language?

I think an editor is not the only thing that displays your program's
source.  I think that compiler's error message should be readable over a
TTY terminal.  Otherwise you're limited to working with fancy graphical
shells.

Oct 23 2008
KennyTM~ <kennytm gmail.com> writes:
Sergey Gromov wrote:
Thu, 23 Oct 2008 18:21:18 +0800,
davidl wrote:
Everything you worry about is just poor editor. Why do you think an
editor can affect the language?

I think an editor is not the only thing that displays your program's
source.  I think that compiler's error message should be readable over a
TTY terminal.  Otherwise you're limited to working with fancy graphical
shells.

I agree.

My real world experience: Sometimes I need to code over ssh. The server
admin only installed vim (which I don't use) and nano, no emacs.

There could probably be a vim module too (is it possible?), but that's
just a palliative.

Oct 23 2008
Paul D. Anderson <paul.d.removethis.anderson comcast.andthis.net> writes:
Andrei Alexandrescu Wrote:

Correx:

Andrei

Andrei Alexandrescu wrote:
Please vote up before the haters take it down, and discuss:

_in_d_similarly_to/

Andrei

Java allows unicode variable names. The Greek letter 'pi' is a valid variable
name in Java (see www.jscience.org for an example). Having said that, I've had
Java IDEs choke on these.

An opportunity may exist here for someone to create/modify a D language IDE
that supports same. [Although Descent (being Eclipse-based and therefore
Java-based) should have a leg up already.]

I know projects exist that intend to be 'the' D IDE (written in D, for D,
etc.). Maybe this could be a discriminator that makes one stand out.

Paul

Oct 22 2008
Spacen Jasset <spacenjasset yahoo.co.uk> writes:
Andrei Alexandrescu wrote:
Correx:

_in_d_similarly_to/

Andrei

Andrei Alexandrescu wrote:
Please vote up before the haters take it down, and discuss:

_in_d_similarly_to/

Andrei

I haven't really ever felt the need for such things. It would require
editor support and I think that it could hinder readability as one would
have to know that symbol 'x' is say, crossproduct. -- It isn't always,
it depends on the mathematical domain.

There are, I believe, far more pressing matters. This feature would
make editor support a bit more difficult, and we are still in the
days where there isn't enough editor and/or IDE support for D. I would
personally prefer it not be added to the language in the near future.
This is of course only my preference, which in honesty may be biased, but
it isn't entirely for selfish reasons.

Oct 23 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Fri, Oct 24, 2008 at 3:42 AM, Spacen Jasset <spacenjasset yahoo.co.uk> wrote:
I haven't really ever felt the need for such things. It would require editor
support and I think that it could hinder readability as one would have to
know that symbol 'x' is say, crossproduct. -- It isn't always, it depends on
the mathematical domain.

There are, I believe, far more pressing matters, and this feature would make
editor support a bit more difficult, and we are currently in the days where
there isn't enough editor and/or ide support for D. I would personally
prefer it not be added to the language in the near future, this is of course
only my preference, which in honesty may be biased but isn't entirely for
self reasons.

I think that's the conclusion I'm coming to as well.  While the use
of Unicode would have some advantages, there are various technical
issues with it (like I haven't been able to figure out how to get the
DOS console in Windows to display UTF-8).  I think those issues can
all be solved, but it would be a large distraction for the D
community.  Better to let some big, well-funded, massively popular
language pioneer in this area.  If some language with a billion
programmers decided to use Unicode, then you can bet that most of
these infrastructure problems would start to disappear quickly as
annoyed programmers start scratching their own itches and as they
start complaining to the people who write the tools they use.

Realistically, if I complain to any software vendor now that their
editor doesn't work well with D because they don't have funky Unicode
functionality, the response is likely to be "Sounds like a problem
with D, whatever that is".  If the language were Java or C++, though,
they would have little choice but to take the complaint seriously,
regardless of the effort required.

--bb

Oct 23 2008
Walter Bright <newshound1 digitalmars.com> writes:
Bill Baxter wrote:
I think that's the conclusion I'm coming to as well.  While the use
of Unicode would have some advantages, there are various technical
issues with it (like I haven't been able to figure out how to get the
DOS console in Windows to display UTF-8).  I think those issues can
all be solved, but it would be a large distraction for the D
community.  Better to let some big, well-funded, massively popular
language pioneer in this area.  If some language with a billion
programmers decided to use Unicode, then you can bet that most of
these infrastructure problems would start to disappear quickly as
annoyed programmers start scratching their own itches and as they
start complaining to the people who write the tools they use.

Realistically, if I complain to any software vendor now that their
editor doesn't work well with D because they don't have funky Unicode
functionality, the response is likely to be "Sounds like a problem
with D, whatever that is".  If the language were Java or C++, though,
they would have little choice but to take the complaint seriously,
regardless of the effort required.

Unfortunately, you might be right in that D is not currently in a
position to force the issue.

Oct 23 2008
"Nick Sabalausky" <a a.a> writes:
"Walter Bright" <newshound1 digitalmars.com> wrote in message
news:gdr4pe$2uje$1 digitalmars.com...
Bill Baxter wrote:
I think that's the conclusion I'm coming to as well.  While the use
of Unicode would have some advantages, there are various technical
issues with it (like I haven't been able to figure out how to get the
DOS console in Windows to display UTF-8).  I think those issues can
all be solved, but it would be a large distraction for the D
community.  Better to let some big, well-funded, massively popular
language pioneer in this area.  If some language with a billion
programmers decided to use Unicode, then you can bet that most of
these infrastructure problems would start to disappear quickly as
annoyed programmers start scratching their own itches and as they
start complaining to the people who write the tools they use.

Realistically, if I complain to any software vendor now that their
editor doesn't work well with D because they don't have funky Unicode
functionality, the response is likely to be "Sounds like a problem
with D, whatever that is".  If the language were Java or C++, though,
they would have little choice but to take the complaint seriously,
regardless of the effort required.

Unfortunately, you might be right in that D is not currently in a position
to force the issue.

My various thoughts:

Whatever language does end up forcing the issue is going to come up against
(inertial) resistance, either successfully or unsuccessfully. If D, right
now, were to be the language to attempt to force the issue, then like you
two have said, it would probably be unsuccessful. So, in order for the
unicode transition to ever be successful, it would have to be some other
language (or a version of D later down the road) that forces the issue.

However, if D and/or other similarly less-than-mainstream (I hate referring
to D that way, BTW) languages already had useful unicode support in a way
that *wasn't* trying to force the issue (ie, purely optional, with perfectly
acceptable ASCII fallbacks) when that "force the issue" language does come
along, then that can help cut down on the resistance that the "force the
issue" language encounters. We might not be able to crack the
chicken-and-the-egg, but we could help weaken it by providing a little extra
incentive of our own (again, as long as it was in a way that wasn't
forceful).

I do agree, though, with the people who have said that D has more important
things to focus on right now than unicode. And I would add that I see most
of D's biggest strengths as things where it cleans up and fixes the mistakes
made by the more pioneering languages like C++ or Java. So I think it would
be in true D style (in a good way) to wait for something else, like maybe
Fortress, to go muck around in unicode, and then we can design our unicode
to clean up the mistakes those languages will inevitably end up making
(instead of leading our own language into a corner by making those "pioneer"
mistakes ourselves). Plus, hopefully by that time we'll have finally taken
care of the more pressing issues that we're currently facing. (Like

I hope that all made sense. I guess my summary is: Hold off on official
unicode stuff for now and learn from others' unicode mistakes. But, if we do
put official unicode stuff in right now, keep it in a way that doesn't force
the issue. And as for unofficial unicode stuff, I say go ahead, play around
with it, post it, do whatever.

Oct 23 2008
Don <nospam nospam.com.au> writes:
Andrei Alexandrescu wrote:
Correx:

_in_d_similarly_to/

Andrei

Andrei Alexandrescu wrote:
Please vote up before the haters take it down, and discuss:

_in_d_similarly_to/

Andrei

Entering this debate late:

primarily exists for numerical programmers. So it's not so unreasonable
non-mathematicians.
"Funny" operators should never be seen by anyone without a mathematical
background. However, I'm not so sure how common they'd actually be.

The strongest use case seems to me to be the situation where multiple
related operations exist, but only one operator is available.
The classic example is vector products, where we have:
- vector dot vector
- vector cross vector
- Elementwise product of two vectors.
But we only have one opMul. So it would be useful to have alternate
multiplication signs available.
Adding × (opCross) as a multiplication which is non-associative would, I
think, be quite generally useful.

But, I think there aren't actually very many other operators which are
easy to justify on mathematical grounds. Largely because most unary
operations look quite OK when implemented as functions, and
mathematicians don't have a huge number of binary operators.
Other than dot product, cross product, and convolution, there's the
exclusive or symbol (+ with a circle around it), and everything else is
pretty obscure.

Apart from the dot and cross product, the inability to have superscripts
and subscripts in variable names (and comments!) is a much bigger issue,
in my experience.
Oh. And the lack of an exponentiation operator. I miss the old Commodore
64 up-arrow for power <g>

If you could completely ignore keyboard and display issues, and use any
unicode character as an operator, which ones would you actually use?
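
A minimal sketch of the vector-product situation described above, assuming a hypothetical Vec3 type and the opBinary template of current D (D1 at the time of this thread used opMul instead). The single multiplication sign is given to the elementwise product, so the dot and cross products have to fall back on ordinary method names:

```d
import std.stdio;

// Hypothetical 3D vector type, for illustration only.
struct Vec3
{
    double x, y, z;

    // a * b: the one available multiplication operator, used here
    // for the elementwise product.
    Vec3 opBinary(string op : "*")(Vec3 r) const
    {
        return Vec3(x * r.x, y * r.y, z * r.z);
    }

    // Dot product: no operator left, so a named method.
    double dot(Vec3 r) const
    {
        return x * r.x + y * r.y + z * r.z;
    }

    // Cross product: likewise a named method.
    Vec3 cross(Vec3 r) const
    {
        return Vec3(y * r.z - z * r.y,
                    z * r.x - x * r.z,
                    x * r.y - y * r.x);
    }
}

void main()
{
    auto a = Vec3(1, 2, 3);
    auto b = Vec3(4, 5, 6);

    assert(a * b == Vec3(4, 10, 18));
    assert(a.dot(b) == 32);
    assert(a.cross(b) == Vec3(-3, 6, -3));

    writeln(a * b, " ", a.dot(b), " ", a.cross(b));
}
```

Which of the three products gets the `*` sign is an arbitrary choice in this sketch; that arbitrariness is the problem being described.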

Oct 28 2008
Sergey Gromov <snake.scaly gmail.com> writes:
Don wrote:
If you could completely ignore keyboard and display issues, and use any
unicode character as an operator, which ones would you actually use?

I'd use dot "⋅" and cross "×" products for 3D, union "∪" and
intersection "∩", subset "⊂" and superset "⊃" and their negative forms.
I don't think I'd use anything else.

Well, comparisons look better when converted into appropriate unicode.

Oct 28 2008
bearophile <bearophileHUGS lycos.com> writes:
Sergey Gromov:
I'd use dot "⋅" and cross "×" products for 3D, union "∪" and
intersection "∩", subset "⊂" and superset "⊃" and their negative forms.
I don't think I'd use anything else.

I just want to note that the whole thread is almost unreadable on the
digitalmars.com/webnews/, because it doesn't digest unicode chars at all. So
adding unicode to D will cause problems when displaying code.

Unrelated to the unicode, but related to those opSubset, opSuperset, etc:
while implementing a set() class with the same API of the Python sets, I have
seen there are the following operators/methods too:

issubset(other)
set <= other
Test whether every element in the set is in other.

set < other
Test whether the set is a true subset of other, that is, set <= other and set
!= other.

issuperset(other)
set >= other
Test whether every element in other is in the set.

set > other
Test whether the set is a true superset of other, that is, set >= other and set
!= other.

A full opCmp can't be defined on sets, so I think in D1 we can't overload <= >=
among sets... I think this is a problem that has to be solved in D2, because sets
are important enough.

Bye,
bearophile

Oct 28 2008
KennyTM~ <kennytm gmail.com> writes:
bearophile wrote:
Sergey Gromov:
I'd use dot "⋅" and cross "×" products for 3D, union "∪" and
intersection "∩", subset "⊂" and superset "⊃" and their
negative forms.
I don't think I'd use anything else.

I just want to note that the whole thread is almost unreadable on the
digitalmars.com/webnews/, because it doesn't digest unicode chars at all. So
adding unicode to D will cause problems when displaying code.

Unrelated to the unicode, but related to those opSubset, opSuperset, etc:
while implementing a set() class with the same API of the Python sets, I have
seen there are the following operators/methods too:

issubset(other)
set <= other
Test whether every element in the set is in other.

set < other
Test whether the set is a true subset of other, that is, set <= other and set
!= other.

issuperset(other)
set >= other
Test whether every element in other is in the set.

set > other
Test whether the set is a true superset of other, that is, set >= other and
set != other.

A full opCmp can't be defined on sets, so I think in D1 we can't overload <=
>= among sets... I think this is a problem that has to be solved in D2, because
sets are important enough.

Bye,
bearophile

If the two sets are incomparable, just return NaN... We need an opCmp
that returns a float :)
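
A rough sketch of that idea in Python rather than D: `set_cmp` is a hypothetical helper standing in for a float-returning opCmp, and it assumes the compiler rewrites `a <= b` into `cmp(a, b) <= 0`. Since every ordered comparison against NaN is false, incomparable sets answer "no" to all four relational operators.

```python
import math

def set_cmp(a, b):
    """Three-way comparison by inclusion; NaN means 'incomparable'.

    Hypothetical helper, standing in for an opCmp that returns a float.
    """
    if a == b:
        return 0.0
    if a < b:      # proper subset
        return -1.0
    if a > b:      # proper superset
        return 1.0
    return math.nan

r = set_cmp({1, 2}, {3, 4})   # incomparable sets

# Every ordered comparison against NaN is False, which is the wanted answer.
all_false = not (r < 0) and not (r <= 0) and not (r > 0) and not (r >= 0)

print(set_cmp({1}, {1, 2}) <= 0, math.isnan(r), all_false)
```

Under the same assumption, a negated form such as "not a subset of" falls out as `not (r <= 0)`, which is true for incomparable sets.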

Oct 28 2008
KennyTM~ <kennytm gmail.com> writes:
KennyTM~ wrote:
bearophile wrote:
Sergey Gromov:
I'd use dot "⋅" and cross "×" products for 3D, union "∪" and
intersection "∩", subset "⊂" and superset "⊃" and their
negative forms.
I don't think I'd use anything else.

I just want to note that the whole thread is almost unreadable on the
digitalmars.com/webnews/, because it doesn't digest unicode chars at
all. So adding unicode to D will give problems to show code.

Unrelated to the unicode, but related to those opSubset, opSuperset, etc:
while implementing a set() class with the same API of the Python sets,
I have seen there are the following operators/methods too:

issubset(other) set <= other Test whether every element in the set is
in other.

set < other Test whether the set is a true subset of other, that is,
set <= other and set != other.

issuperset(other) set >= other Test whether every element in other is
in the set.

set > other Test whether the set is a true superset of other, that is,
set >= other and set != other.

A full opCmp can't be defined on sets, so I think in D1 we can't
overload <= >= among sets... I think this is a problem that has to be
solved in D2, because sets are important enough.

Bye,
bearophile

If the two sets are incomparable, just return NaN... We need an opCmp
that returns a float :)

Actually I've made a working solution. Even the exotic operators like
!<= (not a subset of, ⊈) work too. It's designed for demonstration, not
performance, though.

Oct 28 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sergey Gromov wrote:
Don wrote:
If you could completely ignore keyboard and display issues, and use any
unicode character as an operator, which ones would you actually use?

I'd use dot "⋅" and cross "×" products for 3D, union "∪" and
intersection "∩", subset "⊂" and superset "⊃" and their negative forms.
I don't think I'd use anything else.

Well, comparisons look better when converted into appropriate unicode.

In my opinion, a workable feature is this:

* Functions can be defined with a leading backspace. They will be usable
with the infix notation.

* There is a way of specifying that precedence of a function defined as
above is the same as precedence of a built-in operator.

* Functions of which name is the same as an HTML entity name for a
symbol can be replaced with the actual symbol.
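
To illustrate the third point only, here is a sketch using the HTML 4 entity table from Python's standard library. The function names in `names` are hypothetical examples of what a library author might define; nothing here is part of D.

```python
from html.entities import name2codepoint

# Hypothetical function names a library author might define; each one is
# also the HTML 4 entity name of a mathematical symbol.
names = ["cup", "cap", "sub", "sup", "sdot", "times"]

# Map each name to the symbol that could stand in for it in source code.
symbol_for = {name: chr(name2codepoint[name]) for name in names}

for name, symbol in symbol_for.items():
    print(f"{name} -> {symbol}")
```

The entity table is fixed, so a lexer could do this substitution without knowing anything about the functions themselves.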

Andrei

Oct 28 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Wed, Oct 29, 2008 at 4:12 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
Sergey Gromov wrote:
Don wrote:
If you could completely ignore keyboard and display issues, and use any
unicode character as an operator, which ones would you actually use?

I'd use dot "⋅" and cross "×" products for 3D, union "∪" and
intersection "∩", subset "⊂" and superset "⊃" and their negative forms.
I don't think I'd use anything else.

Well, comparisons look better when converted into appropriate unicode.

In my opinion, a workable feature is this:

* Functions can be defined with a leading backspace. They will be usable
with the infix notation.

[...] you're not suggesting we write
^HinfixOperator. :-)

* There is a way of specifying that precedence of a function defined as
above is the same as precedence of a built-in operator.

Workable, but it ain't what Walter calls parsing.

* Functions of which name is the same as an HTML entity name for a symbol
can be replaced with the actual symbol.

--bb

Oct 28 2008
Don <nospam nospam.com.au> writes:
Andrei Alexandrescu wrote:
Sergey Gromov wrote:
Don wrote:
If you could completely ignore keyboard and display issues, and use any
unicode character as an operator, which ones would you actually use?

I'd use dot "⋅" and cross "×" products for 3D, union "∪" and
intersection "∩", subset "⊂" and superset "⊃" and their negative forms.
I don't think I'd use anything else.

Well, comparisons look better when converted into appropriate unicode.

In my opinion, a workable feature is this:

* Functions can be defined with a leading backspace. They will be usable
with the infix notation.

* There is a way of specifying that precedence of a function defined as
above is the same as precedence of a built-in operator.

Do we really need to do that? How many Unicode binary operators are there?

This list of symbols which work in web browsers is very short.
http://en.wikipedia.org/wiki/Wikipedia:Mathematical_symbols

and how many of the items in it are comparison operators.
Any of the unicode comparison operators could be given the same
precedence as <,> and 'in'.
Cross should be given the same precedence as opMul and opDiv.
That just leaves oplus, otimes, which probably have the same precedence as
plus and mul.

You can do the same thing with this list:
http://en.wikipedia.org/wiki/Unicode_Mathematical_Operators
And you find that the precedence of almost everything is easy to
determine. Seems like 90% of them are relational operators.

Specifying the precedence of each unicode operator (eg by a lookup
table) would be adequate for any use case I can imagine, and it wouldn't
make syntactic analysis any more ambiguous.
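
A minimal sketch of the lookup-table idea, written in Python rather than as a real D grammar. The precedence levels are illustrative assumptions taken from the suggestions above (relational symbols like `<`, cross and otimes like mul, oplus like plus), and relational operators are treated as left-associative for simplicity. The parser consults only the fixed table, never any user declaration.

```python
import re

# Fixed precedence table, known to the parser before any semantic analysis.
# Levels are illustrative assumptions following the post: relational
# symbols bind like '<', cross and otimes like '*', oplus like '+'.
PRECEDENCE = {
    "<": 1, ">": 1, "⊂": 1, "⊃": 1,
    "+": 2, "-": 2, "⊕": 2,
    "*": 3, "/": 3, "×": 3, "⊗": 3,
}

def tokenize(text):
    # Identifiers, or any single non-space character (the operators).
    return re.findall(r"\w+|\S", text)

def parse(tokens, min_level=1):
    """Precedence climbing; returns a fully parenthesized string."""
    left = tokens.pop(0)                          # operand
    while tokens and PRECEDENCE.get(tokens[0], 0) >= min_level:
        op = tokens.pop(0)
        right = parse(tokens, PRECEDENCE[op] + 1)  # left-associative
        left = f"({left} {op} {right})"
    return left

print(parse(tokenize("a ⊕ b × c ⊂ d")))
```

Because the table is part of the language rather than something a program declares, the grouping is decided before any symbol is looked up.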

* Functions of which name is the same as an HTML entity name for a
symbol can be replaced with the actual symbol.


Oct 29 2008
Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
* There is a way of specifying that precedence of a function defined as
above is the same as precedence of a built-in operator.

That throws out the ability to parse without semantic analysis. It's not
worth it.

Oct 29 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
Andrei Alexandrescu wrote:
* There is a way of specifying that precedence of a function defined
as above is the same as precedence of a built-in operator.

That throws out the ability to parse without semantic analysis. It's not
worth it.

It doesn't per a previous post of mine, but I agree it's still not worth it.

Andrei

Oct 29 2008
Benji Smith <dlanguage benjismith.net> writes:
Sergey Gromov wrote:
Don wrote:
If you could completely ignore keyboard and display issues, and use any
unicode character as an operator, which ones would you actually use?

I'd use dot "⋅" and cross "×" products for 3D, union "∪" and
intersection "∩", subset "⊂" and superset "⊃" and their negative forms.
I don't think I'd use anything else.

Well, comparisons look better when converted into appropriate unicode.

I have pretty much the same list.

For me the really compelling case for unicode characters isn't in
finding more operators. It's the brackets!!

--benji

Oct 28 2008
Moritz Warning <moritzwarning web.de> writes:
On Wed, 22 Oct 2008 17:27:58 -0500, Andrei Alexandrescu wrote:

Please vote up before the haters take it down, and discuss:

allowing_unicode_operators_in_d_similarly_to/

Andrei

It would be very nice to have unicode operators.
But what opFooBar functions do users need (most)?

opDotProduct and opCrossProduct would be definitely cool.

Oct 22 2008
Moritz Warning <moritzwarning web.de> writes:
On Wed, 22 Oct 2008 23:37:43 +0000, Moritz Warning wrote:

On Wed, 22 Oct 2008 17:27:58 -0500, Andrei Alexandrescu wrote:

Please vote up before the haters take it down, and discuss:

allowing_unicode_operators_in_d_similarly_to/

Andrei

It would be very nice to have unicode operators. But what opFooBar
functions do users need (most)?

opDotProduct and opCrossProduct would be definitely cool.

sorry posted in d.announce by .. accident. :/

Oct 22 2008
"Nick Sabalausky" <a a.a> writes:
"Moritz Warning" <moritzwarning web.de> wrote in message
news:gdodg7$1f5o$1 digitalmars.com...
On Wed, 22 Oct 2008 17:27:58 -0500, Andrei Alexandrescu wrote:

Please vote up before the haters take it down, and discuss:

allowing_unicode_operators_in_d_similarly_to/
Andrei

It would be very nice to have unicode operators.
But what opFooBar functions do users need (most)?

opDotProduct and opCrossProduct would be definitely cool.

I'd certainly like opIntersection and maybe opUnion.

Oct 22 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Thu, Oct 23, 2008 at 7:27 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
Please vote up before the haters take it down, and discuss:

(My comment cross posted here from reddit)

I think the right way to do it is not to make everything Unicode. All
the pressure on the existing symbols would be dramatically relieved by
the addition of just a handful of new symbols.

The truth is keyboards aren't very good for inputting Unicode. That
isn't likely to change. Yes they've dealt with the problem in Asian
languages by using IMEs but in my opinion IMEs are horrible to use.

Some people seem to argue it's a waste to go to Unicode only for a few
symbols. If you're going to go Unicode, you should go whole hog. I'd
argue the exact opposite. If you're going to go Unicode, it should be
done in moderation. Use as little Unicode as necessary and no more.

As for how to input unicode -- Microsoft Word solved that problem ages
ago, assuming we're talking about small numbers of special characters.
It's called AutoCorrect. You just register your unicode symbol as a
misspelling for "(X)" or something unique like that and then every
time you type "(X)" a funky unicode character instantly replaces those
chars.

Yeh, not many editors support such a feature. But it's very easy to
implement. And with that one generic mechanism, your editor is ready
to support input of Unicode chars in any language just by adding the
right definitions.

--bb

Oct 22 2008
Jesse Phillips <jessekphillips gmail.com> writes:
On Thu, 23 Oct 2008 09:52:34 +0900, Bill Baxter wrote:

On Thu, Oct 23, 2008 at 7:27 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
Please vote up before the haters take it down, and discuss:

allowing_unicode_operators_in_d_similarly_to/

(My comment cross posted here from reddit)

I think the right way to do it is not to make everything Unicode. All
the pressure on the existing symbols would be dramatically relieved by
the addition of just a handful of new symbols.

The truth is keyboards aren't very good for inputting Unicode. That
isn't likely to change. Yes they've dealt with the problem in Asian
languages by using IMEs but in my opinion IMEs are horrible to use.

Some people seem to argue it's a waste to go to Unicode only for a few
symbols. If you're going to go Unicode, you should go whole hog. I'd
argue the exact opposite. If you're going to go Unicode, it should be
done in moderation. Use as little Unicode as necessary and no more.

As for how to input unicode -- Microsoft Word solved that problem ages
ago, assuming we're talking about small numbers of special characters.
It's called AutoCorrect. You just register your unicode symbol as a
misspelling for "(X)" or something unique like that and then every time
you type "(X)" a funky unicode character instantly replaces those chars.

Yeh, not many editors support such a feature. But it's very easy to
implement. And with that one generic mechanism, your editor is ready to
support input of Unicode chars in any language just by adding the right
definitions.

--bb

I don't find this terribly appealing. Walter mentions having thrown out
support for 16bit processors and such. Why not throw out 32bit too?
Those are going out of style.

The point is, it's not the language's job to force change of hardware. And
support via a text editor is also not acceptable. Going the software
support route relies on the OS to support a universal easy method to
enter unicode.

As for D's case, I say support unicode for these new operators, but
provide the same function with keyboard provided symbols.

Oct 22 2008
Don <nospam nospam.com.au> writes:
Bill Baxter wrote:
On Thu, Oct 23, 2008 at 7:27 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
Please vote up before the haters take it down, and discuss:

(My comment cross posted here from reddit)

I think the right way to do it is not to make everything Unicode. All
the pressure on the existing symbols would be dramatically relieved by
the addition of just a handful of new symbols.

The truth is keyboards aren't very good for inputting Unicode. That
isn't likely to change. Yes they've dealt with the problem in Asian
languages by using IMEs but in my opinion IMEs are horrible to use.

Some people seem to argue it's a waste to go to Unicode only for a few
symbols. If you're going to go Unicode, you should go whole hog. I'd
argue the exact opposite. If you're going to go Unicode, it should be
done in moderation. Use as little Unicode as necessary and no more.

I agree.
There is in fact a fairly defensible subset of Unicode: those characters
which are easy to type on some keyboard. This would include chevrons,
currency symbols (especially pound, euro, yen); european accented
characters (not terribly useful) and a couple of other punctuation
marks. After all, if it's painful to type a Euro symbol on your

The list is pretty much equivalent to the US-International keyboard
layout in  Windows. There aren't many useful characters in there, but it
might be enough.

The chevrons and the inverted ? and ! are perhaps the most interesting,
since they are paired. The multiply sign isn't bad, though.
With the German keyboards I have to use, some of these are less painful
to type than {}.

Oct 23 2008
Sergey Gromov <snake.scaly gmail.com> writes:
Thu, 23 Oct 2008 09:36:39 +0200,
Don wrote:
« » ? ? ¶ § ¬ ? ? ? ? ? ¤ ? © ®

Lots of question marks here.  This sucks.

Oct 23 2008
Spacen Jasset <spacenjasset yahoo.co.uk> writes:
Bill Baxter wrote:
On Thu, Oct 23, 2008 at 7:27 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
Please vote up before the haters take it down, and discuss:

(My comment cross posted here from reddit)

I think the right way to do it is not to make everything Unicode. All
the pressure on the existing symbols would be dramatically relieved by
the addition of just a handful of new symbols.

The truth is keyboards aren't very good for inputting Unicode. That
isn't likely to change. Yes they've dealt with the problem in Asian
languages by using IMEs but in my opinion IMEs are horrible to use.

Some people seem to argue it's a waste to go to Unicode only for a few
symbols. If you're going to go Unicode, you should go whole hog. I'd
argue the exact opposite. If you're going to go Unicode, it should be
done in moderation. Use as little Unicode as necessary and no more.

As for how to input unicode -- Microsoft Word solved that problem ages
ago, assuming we're talking about small numbers of special characters.
It's called AutoCorrect. You just register your unicode symbol as a
misspelling for "(X)" or something unique like that and then every
time you type "(X)" a funky unicode character instantly replaces those
chars.

Yeh, not many editors support such a feature. But it's very easy to
implement. And with that one generic mechanism, your editor is ready
to support input of Unicode chars in any language just by adding the
right definitions.

--bb

I am not entirely sure that 30 or (x amount) of new operators would be a
good thing anyway. How hard is it to say m3 = m1.crossProduct(m2) ? vs
m3 = m1 X m2 ? and how often will that happen? It's also going to make
the language more difficult to learn and understand.

If a set membership test operator and a few others are introduced, then
really to be "complete" all the set operators must be added, and
implemented.

Furthermore, the introduction of set operators should really mean that
you can use them on something by default, that means implementing sets
that presumably are usable, quick, and are worth using, otherwise people
will roll their own (all the time) in many different ways.

Unicode symbol 'x' may look better, but is it really more readable? I
think it is -- a bit, and it may be cool, but I don't think it's one of
the things that is going to make developing software significantly easier.

Why unicode anyway? In the same way that editor support is required to
actually type them in, why not let the editor render them. So instead of
symbol 'x' in the source code, say:

m3 = m1 cross_product m2

as an infix notation in a similar way to the (unary) sizeof operator.

While cross_product is a bit long and unwieldy any editor capable can
replace the rendition of that keyword with a symbol for it. But in
editors that don't it means that it still can be typed in and/or
displayed easily.

Another option includes providing cross_product as an 'alias' and 'X'
as well.

Which then leads on to the introduction of a facility to add arbitrary
operators, which could be interesting because you can supply any
operator you see fit for the domains that you use that require it. --
This doesn't provide exactly the right solution though, as all the additions
would be 'non standard' and I can see books in the future recommending
people not use unicode operators, because editors don't have support for
them.

If D is to be used on a wide variety of platforms, which would be
desirable if it is to gain traction, then editor support barriers like
this could impede its progress.

Oct 25 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Spacen Jasset wrote:
Bill Baxter wrote:
On Thu, Oct 23, 2008 at 7:27 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
Please vote up before the haters take it down, and discuss:

_in_d_similarly_to/

(My comment cross posted here from reddit)

I think the right way to do it is not to make everything Unicode. All
the pressure on the existing symbols would be dramatically relieved by
the addition of just a handful of new symbols.

The truth is keyboards aren't very good for inputting Unicode. That
isn't likely to change. Yes they've dealt with the problem in Asian
languages by using IMEs but in my opinion IMEs are horrible to use.

Some people seem to argue it's a waste to go to Unicode only for a few
symbols. If you're going to go Unicode, you should go whole hog. I'd
argue the exact opposite. If you're going to go Unicode, it should be
done in moderation. Use as little Unicode as necessary and no more.

As for how to input unicode -- Microsoft Word solved that problem ages
ago, assuming we're talking about small numbers of special characters.
It's called AutoCorrect. You just register your unicode symbol as a
misspelling for "(X)" or something unique like that and then every
time you type "(X)" a funky unicode character instantly replaces those
chars.

Yeh, not many editors support such a feature. But it's very easy to
implement. And with that one generic mechanism, your editor is ready
to support input of Unicode chars in any language just by adding the
right definitions.

--bb

I am not entirely sure that 30 or (x amount) of new operators would be a
good thing anyway. How hard is it to say m3 = m1.crossProduct(m2) ? vs
m3 = m1 X m2 ? and how often will that happen? It's also going to make
the language more difficult to learn and understand.

I have noticed that in pretty much all scientific code, the f(a, b) and
a.f(b) notations fall off a readability cliff when the number of
operators grows only to a handful. Lured by simple examples like yours,
people don't see that as a problem until they actually have to read or
write such code. Adding temporaries and such is not that great because
it further takes the algorithm away from its mathematical form just for
serving a notation that was the problem in the first place.
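
A small sketch of the contrast being described, in Python since it has overloadable operators but no Unicode ones. The `Vec3` class and its operator choices (`@` for dot, `%` for cross) are arbitrary stand-ins invented for illustration. The same identity, a × (b × c) = b(a·c) - c(a·b), is written once with named methods and once with operators.

```python
class Vec3:
    """Minimal 3D vector; the operator choices are arbitrary stand-ins."""

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

    def __eq__(self, other):
        return (self.x, self.y, self.z) == (other.x, other.y, other.z)

    # Named-method interface
    def dot(self, o):
        return self.x * o.x + self.y * o.y + self.z * o.z

    def cross(self, o):
        return Vec3(self.y * o.z - self.z * o.y,
                    self.z * o.x - self.x * o.z,
                    self.x * o.y - self.y * o.x)

    def scale(self, k):
        return Vec3(k * self.x, k * self.y, k * self.z)

    def sub(self, o):
        return Vec3(self.x - o.x, self.y - o.y, self.z - o.z)

    # Operator interface: '@' for dot, '%' for cross, k * v for scaling
    __matmul__ = dot
    __mod__ = cross
    __sub__ = sub

    def __rmul__(self, k):
        return self.scale(k)

a, b, c = Vec3(1, 2, 3), Vec3(4, 5, 6), Vec3(7, 8, 10)

# The identity a x (b x c) = b(a.c) - c(a.b), written both ways.
with_methods = b.scale(a.dot(c)).sub(c.scale(a.dot(b)))
with_operators = (a @ c) * b - (a @ b) * c

print(with_methods == with_operators == a % (b % c))
```

Even with only three operations, the method-call form has to be read inside out, while the operator form follows the formula left to right.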

If a set membership test operator and a few others are introduced, then
really to be "complete" all the set operators must be added, and
implemented.

Furthermore, the introduction of set operators should really mean that
you can use them on something by default, that means implementing sets
that presumably are usable, quick, and are worth using, otherwise people
will roll their own (all the time) in many different ways.

Unicode symbol 'x' may look better, but is it really more readable? I
think it is -- a bit, and it may be cool, but I don't think it's one of
the things that is going to make developing software significantly easier.

I think "cool" has not a lot to do with it. For scientific code, it's
closer to a necessity.

Andrei

Oct 25 2008
Spacen Jasset <spacenjasset yahoo.co.uk> writes:
Andrei Alexandrescu wrote:
Spacen Jasset wrote:
Bill Baxter wrote:
On Thu, Oct 23, 2008 at 7:27 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
Please vote up before the haters take it down, and discuss:

_in_d_similarly_to/

(My comment cross posted here from reddit)

I think the right way to do it is not to make everything Unicode. All
the pressure on the existing symbols would be dramatically relieved by
the addition of just a handful of new symbols.

The truth is keyboards aren't very good for inputting Unicode. That
isn't likely to change. Yes they've dealt with the problem in Asian
languages by using IMEs but in my opinion IMEs are horrible to use.

Some people seem to argue it's a waste to go to Unicode only for a few
symbols. If you're going to go Unicode, you should go whole hog. I'd
argue the exact opposite. If you're going to go Unicode, it should be
done in moderation. Use as little Unicode as necessary and no more.

As for how to input unicode -- Microsoft Word solved that problem ages
ago, assuming we're talking about small numbers of special characters.
It's called AutoCorrect. You just register your unicode symbol as a
misspelling for "(X)" or something unique like that and then every
time you type "(X)" a funky unicode character instantly replaces those
chars.

Yeh, not many editors support such a feature. But it's very easy to
implement. And with that one generic mechanism, your editor is ready
to support input of Unicode chars in any language just by adding the
right definitions.

--bb

I am not entirely sure that 30 or (x amount) of new operators would be
a good thing anyway. How hard is it to say m3 = m1.crossProduct(m2) ?
vs m3 = m1 X m2 ? and how often will that happen? It's also going to
make the language more difficult to learn and understand.

I have noticed that in pretty much all scientific code, the f(a, b) and
a.f(b) notations fall off a readability cliff when the number of
operators grows only to a handful. Lured by simple examples like yours,
people don't see that as a problem until they actually have to read or
write such code. Adding temporaries and such is not that great because
it further takes the algorithm away from its mathematical form just for
serving a notation that was the problem in the first place.

Yes, that is indeed a fair point and I agree. D is a "systems
programming language." [sic] though; and so what will people use it for
in the main? I suggest that communities that require scientific code
have options now, and that they can and do choose languages for the
purpose which have better support for their needs than D might achieve.

If a set membership test operator and a few others are introduced, then
really to be "complete" all the set operators must be added, and
implemented.

Furthermore, the introduction of set operators should really mean that
you can use them on something by default, that means implementing sets
that presumably are usable, quick, and are worth using, otherwise
people will roll their own (all the time) in many different ways.

Unicode symbol 'x' may look better, but is it really more readable? I
think it is -- a bit, and it may be cool, but I don't think it's one
of the things that is going to make developing software significantly
easier.

I think "cool" has not a lot to do with it. For scientific code, it's
closer to a necessity.

On my use of "cool" I only brought it up as this thread has a few
mentions of the word and it's a bit nebulous. I, personally, am more
concerned with practicality than "cool".

Andrei

What I think of unicode symbols therefore depends on whether D should be
more scientific oriented or not. If it should be, then unicode symbols
would undoubtedly be a benefit. My responses were guided by the
assumption that D was more generic in nature, though.

Oct 25 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Sun, Oct 26, 2008 at 3:46 AM, Spacen Jasset <spacenjasset yahoo.co.uk> wrote:
I am not entirely sure that 30 or (x amount) of new operators would be a
good thing anyway. How hard is it to say m3 = m1.crossProduct(m2) ? vs m3 =
m1 X m2 ? and how often will that happen? It's also going to make the
language more difficult to learn and understand.

I have noticed that in pretty much all scientific code, the f(a, b) and
a.f(b) notations fall off a readability cliff when the number of operators
grows only to a handful. Lured by simple examples like yours, people don't
see that as a problem until they actually have to read or write such code.
Adding temporaries and such is not that great because it further takes the
algorithm away from its mathematical form just for serving a notation that
was the problem in the first place.

Yes, heavy math code is hard to read in the current situation.
I almost always prefix any significant math with a comment giving the
equations being implemented in a more compact notation.
Having to write the same thing in two different ways like that is a
waste of effort.
It would be very cool if I could just write it once and have it look
like it does in my notebook.

Yes, that is indeed a fair point and I agree. D is a "systems programming
language." [sic] though; and so what will people use it for in the main?

D is a compile-to-the-metal language that is of interest to anyone who
ranks performance high on their list of priorities.  Mathematicians
and scientists are among the few remaining groups where maximum speed
is still needed.  Games are another area, and games are becoming more
and more sophisticated mathematically under the hood.

I suggest that communities that require scientific code have options now, and
that they can and do choose languages for the purpose which have better
support for their needs than D might achieve.

The traditional math languages suck at doing anything besides math.
Want to do a bit of math then display the results interactively in an
OpenGL window?  With Fortran?!  Ha!

On the other end there are the Matlab and NumPy-type solutions.  They
are convenient for tinkering around and displaying some results, but
these are not good for performance.

D has both.  So I think D has potential to gain traction in the world
of math-heavy computing.

But anyway, I got convinced several posts back that the time is not
yet ripe for Unicode in D.  So I'm not gonna argue that D go Unicode
now.   I'm just saying that math code is hard to read, and that heavy
math users are a good target audience for D because they need
performance, but don't necessarily want to give up
general-purposeness.

--bb

Oct 25 2008
bearophile <bearophileHUGS lycos.com> writes:
Bill Baxter:
On the other end there are the Matlab and NumPy-type solutions.  They
are convenient for tinkering around and displaying some results, but
these are not good for performance.

I have seen many scientific programs that use numpy, so sometimes it's fast
enough. But it forces you to write everything in a vector programming style,
that a procedural programmer needs time to learn. Normal C/D/C++ code is more
flexible, you can work on single items too in a fast way, while in numpy you
can go fast only when you work in bulk, on vectors.

On the other hand numpy offers you some higher level operations on arrays that
are currently missing in D, like certain complex slicing operations (that may
look more like formulas); I can show you some examples if you want. Note that
in D there's no built-in rectangular dynamic arrays, that are basic stuff in
numpy/matlab.

Bye,
bearophile

Oct 25 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Sun, Oct 26, 2008 at 5:10 AM, bearophile <bearophileHUGS lycos.com> wrote:
Bill Baxter:
On the other end there are the Matlab and NumPy-type solutions.  They
are convenient for tinkering around and displaying some results, but
these are not good for performance.

I have seen many scientific programs that use numpy, so sometimes it's fast
enough. But it forces you to write everything in a vector programming style,
that a procedural programmer needs time to learn. Normal C/D/C++ code is more
flexible, you can work on single items too in a fast way, while in numpy you
can go fast only when you work in bulk, on vectors.

Yep  C/D/C++ is easier.  The SciPy.org site has a growing section of
their wiki devoted to how to make your code fast using various levels
of python/native hybrids.  I was using python heavily for numerical
stuff for a while and it got to the point where I realized that the
time I spent trying to figure out how to vectorize things and use
other tricks to make things fast, and to make python modules out of
external code I wanted to call,  etc.  was actually more work than it
would be to just use D for everything.   Sure Python does have some
nice features as a language that D lacks, but from 10,000 ft  D is a
lot closer to Python than C++ in terms of ease of use.  Also, while
Python is nice for arrays and number crunching, I found the lack of
typing to be a liability when it comes to complicated graph
structures.  Instead of nicely typed pointers that the compiler can
tell apart, you end up with 23 different integer index variables that
you have to keep straight.  And finally, also type related, there's
the annoyance that you have to actually run your app to detect typos.

I'm sure there are ways to work around all those issues, but to me D's
a lot easier.  I simply don't need the workarounds.

I still fire up NumPy and Matplotlib for analyzing the results
from my D programs.  And SymPy is great too.  I just don't use it as
my main development language any more.

On the other hand numpy offers you some higher level operations on arrays that
are currently missing in D, like certain complex slicing operations (that may
look more like formulas); I can show you some examples if you want.

No thanks!  Been there, done that!

Note that in D there's no built-in rectangular dynamic arrays, that are basic
stuff in numpy/matlab.

I've got my dflat and gobo
(http://www.dsource.org/projects/multiarray) that are working for me
pretty well.  They could use some full-time loving to make more
operations work intuitively, but the basics work ok.

--bb

Oct 25 2008
bearophile <bearophileHUGS lycos.com> writes:
Bill Baxter:

was actually more work than it would be to just use D for everything.<

Mixing languages isn't nice, I agree. That's why I too use D for several
purposes.

But if you have to change your code very often (and if your problems are of a
certain kind that allow a natural vectorization), then having vectorial (short)
code may have some advantages; think about how much C++ code you need to write
to implement the programs of this book:
http://wiki.deductivethinking.com/wiki/Python_Programs_for_Modelling_Infectious_Diseases_book
So it allows a more explorative way of coding.

Sure Python does have some nice features as a language that D lacks, but from
10,000 ft  D is a lot closer to Python than C++ in terms of ease of use.<

My experience with the ShedSkin compiler shows me that most of those features
that D lacks (complex slices, list comps, generators, short syntax, some
near-zero-cost safeties, etc) are absent because of cultural or inertial
reasons present in the brain of people used to C/C++, and not because they
can't be present/added in a language like D.
ShedSkin translates Python code to clean C++ code, showing that it can be done,
that it gives advantages, and that it's not too difficult to do. It shows once
and for all that you can have a C++-class language with a short and nice syntax,
etc.
Hopefully the Delight language has less of the cultural inertia coming from
C/C++, so it may become a better compromise than D itself.

I've got my dflat and gobo (http://www.dsource.org/projects/multiarray) that
are working for me pretty well.  They could use some full-time loving to make
more operations work intuitively, but the basics work ok.<

Nice stuff, lot of stuff. More comments require more study of that code. D
(Tango) may gain from having more batteries.

Bye,
bearophile

Oct 25 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Spacen Jasset wrote:
Andrei Alexandrescu wrote:
Spacen Jasset wrote:
Bill Baxter wrote:
On Thu, Oct 23, 2008 at 7:27 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
Please vote up before the haters take it down, and discuss:

_in_d_similarly_to/

(My comment cross posted here from reddit)

I think the right way to do it is not to make everything Unicode. All
the pressure on the existing symbols would be dramatically relieved by
the addition of just a handful of new symbols.

The truth is keyboards aren't very good for inputting Unicode. That
isn't likely to change. Yes they've dealt with the problem in Asian
languages by using IMEs but in my opinion IMEs are horrible to use.

Some people seem to argue it's a waste to go to Unicode only for a few
symbols. If you're going to go Unicode, you should go whole hog. I'd
argue the exact opposite. If you're going to go Unicode, it should be
done in moderation. Use as little Unicode as necessary and no more.

As for how to input unicode -- Microsoft Word solved that problem ages
ago, assuming we're talking about small numbers of special characters.
It's called AutoCorrect. You just register your unicode symbol as a
misspelling for "(X)" or something unique like that and then every
time you type "(X)" a funky unicode character instantly replaces those
chars.

Yeh, not many editors support such a feature. But it's very easy to
support input of Unicode chars in any language just by adding the
right definitions.

--bb

I am not entirely sure that 30 or (x amount) of new operators would
be a good thing anyway. How hard is it to say m3 =
m1.crossProduct(m2) ? vs m3 = m1 X m2 ? and how often will that
happen? It's also going to make the language more difficult to learn
and understand.

I have noticed that in pretty much all scientific code, the f(a, b)
and a.f(b) notations fall off a readability cliff when the number of
operators grows only to a handful. Lured by simple examples like
yours, people don't see that as a problem until they actually have to
read or write such code. Adding temporaries and such is not that great
because it further takes the algorithm away from its mathematical form
just for serving a notation that was the problem in the first place.

Yes, that is indeed a fair point and I agree. D is a "systems
programming language." [sic] though; and so what will people use it for
in the main? I suggest that communities that require scientific code
have options now, and that they can and do choose languages for the
purpose which have better support for their needs than D might achieve.

Surprisingly there's not a lot of choice, witnessed by the prevalence of
Fortran for scientific code. One interesting thing is that quite a few
scientific coders mess with D and hang out around here, such as Don
Clugston, Bill Baxter, bearophile, Benji Smith (he's doing machine
learning if I remember correctly) and, if I may aspire to the status,
yours truly.

(I remain with an unformed opinion regarding Unicode operators.)

Andrei

Oct 25 2008
Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Andrei Alexandrescu wrote:
I have noticed that in pretty much all scientific code, the f(a, b) and
a.f(b) notations fall off a readability cliff when the number of
operators grows only to a handful. Lured by simple examples like yours,
people don't see that as a problem until they actually have to read or
write such code. Adding temporaries and such is not that great because
it further takes the algorithm away from its mathematical form just for
serving a notation that was the problem in the first place.

But what operators would be added? Some mathematician programmers might
want vector and matrix operators, others set operators, others still
derivation/integration operators, and so on. Where would we stop?
I don't deny it might be useful for them, but it does seem like too
specific a need to integrate in the language.

--
Bruno Medeiros - Software Developer, MSc. in CS/E graduate
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Oct 26 2008
KennyTM~ <kennytm gmail.com> writes:
Bruno Medeiros wrote:
But what operators would be added? Some mathematician programmers might
want vector and matrix operators, others set operators, others still
derivation/integration operators, and so on. Where would we stop?
I don't deny it might be useful for them, but it does seem like too
specific a need to integrate in the language.

Composition may be useful for functional programming (I've never used
any functional programming paradigm except "reduce".)

Matrix operations: + - * .tr() .inv() .det() etc are already sufficient
for most jobs.

Vector operations: Maybe an operator for cross product.

Set operators: Just use + - * (| ~ &) instead like Pascal.

So I see only two Unicode operators that are really useful and whose
replacements are ugly: composition (∘) and cross product (×).
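
To illustrate the "just use + - * like Pascal" suggestion, here is a minimal Python sketch (Python rather than D, purely for illustration). The class name PascalSet is hypothetical; it only shows that union, difference and intersection fit on operators every keyboard already has.

```python
class PascalSet:
    """Set wrapper with Pascal-style operators: + union, - difference, * intersection."""

    def __init__(self, items=()):
        self._items = frozenset(items)

    def __add__(self, other):  # union
        return PascalSet(self._items | other._items)

    def __sub__(self, other):  # difference
        return PascalSet(self._items - other._items)

    def __mul__(self, other):  # intersection
        return PascalSet(self._items & other._items)

    def __eq__(self, other):
        return isinstance(other, PascalSet) and self._items == other._items

    def __hash__(self):
        return hash(self._items)

    def __repr__(self):
        return f"PascalSet({sorted(self._items)})"


a = PascalSet([1, 2, 3])
b = PascalSet([2, 3, 4])
union = a + b
difference = a - b
intersection = a * b
```

A side effect of reusing arithmetic symbols is that intersection (*) binds tighter than union (+), which happens to match the usual mathematical convention.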

Oct 26 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bruno Medeiros wrote:
But what operators would be added? Some mathematician programmers might
want vector and matrix operators, others set operators, others still
derivation/integration operators, and so on. Where would we stop?
I don't deny it might be useful for them, but it does seem like too
specific a need to integrate in the language.

I was thinking of allowing a general way of defining one Unicode
character to stand in as one operator, and then have libraries implement
the actual operators.

There's the remaining problem of different libraries defining the same
character to mean different operators. This may not be huge as math
subdomains tend to be rather consistent in their use of operators.

Also, ascii representation should be allowed for operators, and one nice
thing about Unicode characters is that many have HTML ascii names; see
http://www.fileformat.info/format/w3c/htmlentity.htm. So
\unicodecharname may be a good alternate way to enter these operators.
For example, the empty set could be \empty, and the cross-product could
be written as \times. So

c = a \times b;

doesn't quite look bad to me.

such, we just use stuff that others (creators and users alike) have
already pored over. Saves on documentation writing too :o).

Andrei
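
As an illustration of the \unicodecharname idea, the following Python sketch expands backslash-name tokens using the HTML 4 entity table that ships with Python (html.entities.name2codepoint). The function name expand_named_operators is hypothetical, and the sketch is deliberately naive: it ignores string literals and comments, which a real lexer would have to handle.

```python
import re
from html.entities import name2codepoint


def expand_named_operators(source: str) -> str:
    """Replace each backslash-name token that is a known HTML entity with its character."""

    def substitute(match):
        name = match.group(1)
        if name in name2codepoint:
            return chr(name2codepoint[name])
        return match.group(0)  # unknown names are left untouched

    return re.sub(r"\\([A-Za-z][A-Za-z0-9]*)", substitute, source)


expanded = expand_named_operators(r"c = a \times b;")
```

Under these assumptions, the example line from the post comes out with a literal × between a and b, so the ASCII and Unicode spellings denote the same token.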

Oct 26 2008
KennyTM~ <kennytm gmail.com> writes:
Andrei Alexandrescu wrote:
Also, ascii representation should be allowed for operators, and one nice
thing about Unicode characters is that many have HTML ascii names; see
http://www.fileformat.info/format/w3c/htmlentity.htm. So
\unicodecharname may be a good alternate way to enter these operators.
For example, the empty set could be \empty, and the cross-product could
be written as \times. So

c = a \times b;

doesn't quite look bad to me.

LaTeX in D? :p

Anyway we already have \&times; and \&empty; so we could reuse them at the
source-code level, as I've described elsewhere in this thread.

auto torque = position \&times; force;

This is uglier than

auto torque = position \times force;

but it gives a uniform syntax between escape sequences inside and
outside strings.

The problem is you may have to invent some names, i.e. the composition
operator ∘ (U+2218 ring operator) has no name in SGML entities. In LaTeX
it is represented as \circ but \&circ; is already taken by ˆ (U+02C6
modifier letter circumflex accent).

And you'll need to predefine the associativity and operation precedence
too. ;) See my other entry in this thread.
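
The naming clash described above can be checked against the HTML 4 entity table bundled with Python. This is only a lookup sketch; the variable names are illustrative, and newer entity lists (such as HTML5) may contain additional names that this table does not.

```python
from html.entities import codepoint2name, name2codepoint

# Names that already exist in the HTML 4 table.
times = chr(name2codepoint["times"])  # U+00D7 multiplication sign
empty = chr(name2codepoint["empty"])  # U+2205 empty set
circ = chr(name2codepoint["circ"])    # U+02C6 circumflex accent, not the ring operator

# The ring operator used for composition has no HTML 4 name.
RING_OPERATOR = 0x2218
ring_has_html4_name = RING_OPERATOR in codepoint2name
```

So any scheme built on this table would need an invented name for composition, since the LaTeX-style name "circ" already points at a different character.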

Oct 26 2008
Bruno Medeiros wrote:
But what operators would be added? Some mathematician programmers might
want vector and matrix operators, others set operators, others still
derivation/integration operators, and so on. Where would we stop?
I don't deny it might be useful for them, but it does seem like too
specific a need to integrate in the language.

Perhaps what needs to be added is a syntax for defining character to
function correspondence?  That way people could define the binary
functions that they need, and then define a corresponding character
string that represented it.  I once recommended that Eiffel include a
means of defining user operators (i.e., binary functions that sit
between the terms on which they operate) using the name syntax thusly:

Starts and ends with '|' and doesn't contain any whitespace.  Must be
surrounded by whitespace when used.  I.e. 1 |X|-3 would be forbidden, as
there is no whitespace following the |X| operator.

That still seems like a good rule to me.  If you want to include
unicode, that's no problem.  And the function could also be used as:
X(1, -3)
with identical meaning.  I.e., marking a function as an operator by
surrounding it with pipes would be purely syntax sugar.  Note that such
operators would have a precedence higher than assignment, but lower than
everything else, so in practice the choice would be between writing:
X (1, -3)
and writing:
(1 |X| -3)
unless all one were doing is making an assignment.  This is analogous to
the class member variable in object methods, or the class name in class
methods, except that that is often understood.

OTOH, I'm not certain how much such syntax buys you.
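
The |X| rule described above can be sketched as a small source rewrite. This Python sketch assumes the rules exactly as stated in the post (the token starts and ends with '|', contains no whitespace, and has whitespace on both sides) and treats everything between pipe operators as an operand, which models the "lower precedence than everything else" rule. The name desugar is hypothetical, and a token that breaks the whitespace rule is simply left alone here rather than rejected.

```python
import re

# A pipe operator is a token of the form |name| with whitespace on both sides.
PIPE_OPERATOR = re.compile(r"(?<=\s)\|([^\s|]+)\|(?=\s)")


def desugar(expression: str) -> str:
    """Rewrite `a |X| b` as `X(a, b)`, folding from left to right."""
    parts = PIPE_OPERATOR.split(expression)
    # parts alternates: operand, name, operand, name, operand, ...
    result = parts[0].strip()
    for name, operand in zip(parts[1::2], parts[2::2]):
        result = f"{name}({result}, {operand.strip()})"
    return result


call_form = desugar("1 |X| -3")
```

Because the operands are taken verbatim, "a + b |X| c * d" groups as X(a + b, c * d), which is the parenthesization the post says one would otherwise have to write by hand.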

P.S.:  another possibility, which is more in line with current D syntax
requires an assignment of the operator character to a function that
starts with op.  As in '+' is associated with opAdd.  However even
though this is more in line with current D syntax, it seems to buy you a
lot less.  And it seems to require that the operator be a single
character.  This appears to me to be more work than it's worth for the
return.  Even the approach that I suggested is probably marginal.

P.P.S:  Any system that requires that a specific IDE or editor be used
is not going to work.  Not unless the IDE were provided with the
language, and even then the most successful examples I can think of are
EMACS and Smalltalk.  (I'm excluding programs that don't run on Linux,
as I have no familiarity with either how they function or how popular
they are.  Probably, though, one could include Visual Basic and maybe
some others.  But one certainly couldn't include Basic, merely one
dialect of it.)

Oct 26 2008
"Simen Kjaeraas" <simen.kjaras gmail.com> writes:
On Sat, 25 Oct 2008 12:14:47 +0200, Spacen Jasset
<spacenjasset yahoo.co.uk> wrote:

Why unicode anyway? In the same way that editor support is required to
actually type them in, why not let the editor render them. So instead of
symbol 'x' in the source code, say:

m3 = m1 cross_product m2

as an infix notation in a similar way to the (unary) sizeof operator.

While cross_product is a bit long and unwieldy any editor capable can
replace the rendition of that keyword with a symbol for it. But in
editors that don't it means that it still can be typed in and/or
displayed easily.

Another option includes providing cross_product as an 'alias' and 'X'
as well.

Which then leads on to the introduction of a facility to add arbitrary
operators, which could be interesting because you can supply any
operator you see fit for the domains that you use that require it. --
This provide exactly the right solution though as all the additions
would be 'non standard' and I can see books in the future recommending
people not use unicode operators, because editors don't have support for
them.

This made me think. What if we /could/ define arbitrary infix operators in
D? I'm thinking something along the lines of:

operator cross_product(T, U)
{
    static if (T.opCross)
    {
        T.opCross(T);
    }
    else static if (U.opCross)
    {
        U.opCross_r(T);
    }
    else
    {
        static assert(false, "Operator not applicable to operands.");
    }
}

alias cross_product ×;

I'm not sure if this is possible, but it sure would please downs. :P

--
Simen
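
The dispatch order in the sketch above (try the left operand's opCross, then the right operand's opCross_r, otherwise fail) can be mimicked in Python as follows. This is an analogy only: the check happens at run time rather than through static if, and the names Vec3, op_cross and op_cross_r are hypothetical stand-ins for the D names in the post.

```python
class Vec3:
    """Minimal 3D vector that opts in to the cross operator via op_cross."""

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

    def op_cross(self, other):
        return Vec3(
            self.y * other.z - self.z * other.y,
            self.z * other.x - self.x * other.z,
            self.x * other.y - self.y * other.x,
        )

    def as_tuple(self):
        return (self.x, self.y, self.z)


def cross_product(lhs, rhs):
    """Try the left operand's op_cross, then the right operand's op_cross_r."""
    if hasattr(lhs, "op_cross"):
        return lhs.op_cross(rhs)
    if hasattr(rhs, "op_cross_r"):
        return rhs.op_cross_r(lhs)
    raise TypeError("Operator not applicable to operands.")


z_axis = cross_product(Vec3(1, 0, 0), Vec3(0, 1, 0))
```

The function form works today; what the proposal adds is only the ability to spell the call as an infix symbol.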

Oct 26 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Sun, Oct 26, 2008 at 11:02 PM, Simen Kjaeraas <simen.kjaras gmail.com> wrote:
This made me think. What if we /could/ define arbitrary infix operators in
D? I'm thinking something along the lines of:

operator cross_product(T, U)
{
static if (T.opCross)
{
T.opCross(T)
}
else static if (U.opCross)
{
U.opCross_r(T);
}
else
{
static assert(false, "Operator not applicable to operands.");
}
}

alias cross_product ×;

I'm not sure if this is possible, but it sure would please downs. :P

What's the precedence of your user-defined in-fix operator?

--bb

Oct 26 2008
"Simen Kjaeraas" <simen.kjaras gmail.com> writes:
On Sun, 26 Oct 2008 22:28:16 +0100, Bill Baxter <wbaxter gmail.com> wrote:

What's the precedence of your user-defined in-fix operator?

--bb

Yup, I realized this myself as well. Seemed like such a great idea when I
only thought of it for three seconds. :p

--
Simen

Oct 26 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Mon, Oct 27, 2008 at 8:23 AM, Simen Kjaeraas <simen.kjaras gmail.com> wrote:
Yup, I realized this myself as well. Seemed like such a great idea when I
only thought of it for three seconds. :p

Same thing goes for downs' in-fix operators.  I think his syntax is
/infix/ which means that his ops always have the same precedence as
division.
I'm guessing this Python Cookbook recipe is very similar to Downs'
technique.  It discusses pros and cons and such.
http://code.activestate.com/recipes/384122/

--bb
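
For readers who have not seen the trick, here is a minimal Python sketch of an /infix/ operator built from division overloading. It is written in the spirit of the linked recipe but is not copied from it, and it is not downs' D code; the names Infix, cross and maxop are illustrative.

```python
class Infix:
    """Wrap a two-argument function so it can be written as `a /op/ b`."""

    def __init__(self, function):
        self.function = function

    def __rtruediv__(self, left):
        # `left / op`: remember the left operand and wait for the right one.
        return Infix(lambda right: self.function(left, right))

    def __truediv__(self, right):
        # `(left / op) / right`: apply the partially bound function.
        return self.function(right)


@Infix
def cross(a, b):
    """Cross product of two 3-tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


maxop = Infix(max)

torque = (1, 0, 0) /cross/ (0, 1, 0)
# Binds like division, so this is 10 - (1 /maxop/ 5), not (10 - 1) /maxop/ 5.
tight = 10 - 1 /maxop/ 5
```

The last line shows the limitation under discussion: the operator inherits the precedence of the symbol it is built from, whatever the author would have preferred.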

Oct 26 2008
"Simen Kjaeraas" <simen.kjaras gmail.com> writes:
On Mon, 27 Oct 2008 00:41:26 +0100, Bill Baxter <wbaxter gmail.com> wrote:
Same thing goes for downs' in-fix operators.  I think his syntax is
/infix/ which means that his ops always have the same precedence as
division.
I'm guessing this Python Cookbook recipe is very similar to Downs'
technique.  It discusses pros and cons and such.
http://code.activestate.com/recipes/384122/

--bb

An interesting read, though I have looked at downs' code before. It occurred
to me now that this could sort of have been fixed with a preprocessor: just
define an operator to have the same precedence as an already existing
operator, then define an alias that gets replaced with /foo/, +foo+, or
whatever operator you chose.
I guess we're stuck waiting for macros in the meantime.

--
Simen

Oct 26 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Simen Kjaeraas wrote:
On Sun, 26 Oct 2008 22:28:16 +0100, Bill Baxter <wbaxter gmail.com> wrote:

On Sun, Oct 26, 2008 at 11:02 PM, Simen Kjaeraas
<simen.kjaras gmail.com> wrote:
On Sat, 25 Oct 2008 12:14:47 +0200, Spacen Jasset
<spacenjasset yahoo.co.uk>
wrote:

Why unicode anyway? In the same way that editor support is required to
actually type them in, why not let the editor render them. So
symbol 'x' in the source code, say:

m3 = m1 cross_product m2

as an infix notatation in a similar way to the (uniary) sizeof
operator.

While cross_product is a bit long and unwieldy any editor capable can
replace the rendition of that keyword with a symbol for it. But in
editors
that don't it means that it still can be typed in and/or displayed
easily.

Another option includes providing cross_product as an 'alias' and 'X'
as well.

Which then leads on to the introduction of a facility to add arbitrary
operators, which could be interesting because you can supply any
operator
you see fit for the domains that you use that require it. -- This
provide
exactly the right solution though as all the additions would be 'non
standard' and I can see books in the future recommending people not use
unicode operators, because editors don't have support for them.

This made me think. What if we /could/ define arbitrary infix
operators in
D? I'm thinking something along the lines of:

operator cross_product(T, U)
{
    static if (T.opCross)
    {
        T.opCross(T);
    }
    else static if (U.opCross)
    {
        U.opCross_r(T);
    }
    else
    {
        static assert(false, "Operator not applicable to operands.");
    }
}

alias cross_product ×;

I'm not sure if this is possible, but it sure would please downs. :P

What's the precedence of your user-defined in-fix operator?

--bb

Yup, I realized this myself as well. Seemed like such a great idea when
I only thought of it for three seconds. :p

An operator could always be defined to have the same precedence as an
existing operator, which it has to specify.

Andrei

Oct 26 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Mon, Oct 27, 2008 at 9:04 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

What's the precedence of your user-defined in-fix operator?

--bb

Yup, I realized this myself as well. Seemed like such a great idea when I
only thought of it for three seconds. :p

An operator could always be defined to have the same precedence as an
existing operator, which it has to specify.

Walter said in a previous post a few days ago when I suggested it that
that would kill D's easy parsability.
You say no?  I'm no parser expert, so hard for me to say.

--bb

Oct 26 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
On Mon, Oct 27, 2008 at 9:04 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

What's the precedence of your user-defined in-fix operator?

--bb

Yup, I realized this myself as well. Seemed like such a great idea when I
only thought of it for three seconds. :p

An operator could always be defined to have the same precedence as an
existing operator, which it has to specify.

Walter said in a previous post a few days ago when I suggested it that
that would kill D's easy parsability.
You say no?  I'm no parser expert, so hard for me to say.

It can be done, but it's kinda involved. You define a grammar in which
all operators have the same precedence. Consequently you compile any
expression into a list of operands and operators. That makes the
language parsable without semantic info. Then the semantic stage
transforms the list into a tree. Cecil does that.
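
The two-stage idea can be sketched roughly as follows. This is a toy illustration in Python, not how Cecil or any D compiler actually does it; `build_tree` and the precedence table are hypothetical names. The "parser" only produces a flat list, and a later step that knows the declared precedences turns that list into a tree.

```python
def build_tree(items, prec):
    """Turn a flat [operand, op, operand, ...] list into nested (op, lhs, rhs) tuples.

    `prec` maps each declared operator to a precedence; higher binds tighter.
    All operators are treated as left-associative in this sketch.
    """
    def climb(pos, min_prec):
        if pos >= len(items):
            raise SyntaxError("missing operand")
        left, pos = items[pos], pos + 1
        while pos < len(items):
            op = items[pos]
            if op not in prec:
                raise SyntaxError(f"{op!r} is not a declared operator")
            if prec[op] < min_prec:
                break
            right, pos = climb(pos + 1, prec[op] + 1)
            left = (op, left, right)
        return left, pos

    tree, _ = climb(0, 0)
    return tree


flat = "a + b × c".split()                  # the "parse" step: no precedence knowledge
tree = build_tree(flat, {"+": 1, "×": 2})   # the "semantic" step: table lookup
```

With that table, `tree` comes out as `('+', 'a', ('×', 'b', 'c'))`. Whether a token in the middle of the list is a legal operator at all is only discovered in the second step, which is exactly the cost being discussed.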

Andrei

Oct 26 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Mon, Oct 27, 2008 at 11:43 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
Bill Baxter wrote:
On Mon, Oct 27, 2008 at 9:04 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

What's the precedence of your user-defined in-fix operator?

--bb

Yup, I realized this myself as well. Seemed like such a great idea when
I
only thought of it for three seconds. :p

An operator could always be defined to have the same precedence as an
existing operator, which it has to specify.

Walter said in a previous post a few days ago when I suggested it that
that would kill D's easy parsability.
You say no?  I'm no parser expert, so hard for me to say.

It can be done, but it's kinda involved. You define a grammar in which all
operators have the same precedence. Consequently you compile any expression
into a list of operands and operators. That makes the language parsable
without semantic info. Then the semantic stage transforms the list into a
tree. Cecil does that.

I see.  So the price you pay is that you defer more decisions till
semantic stage.

I.e. "a b c d e" is allowed to parse into an amorphous list, then in
the semantic pass you decide if 'b' and 'd' are actually legal
operators or not.

--bb

Oct 26 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
On Mon, Oct 27, 2008 at 11:43 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
Bill Baxter wrote:
On Mon, Oct 27, 2008 at 9:04 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

What's the precedence of your user-defined in-fix operator?

--bb

Yup, I realized this myself as well. Seemed like such a great idea when
I
only thought of it for three seconds. :p

An operator could always be defined to have the same precedence as an
existing operator, which it has to specify.

Walter said in a previous post a few days ago when I suggested it that
that would kill D's easy parsability.
You say no?  I'm no parser expert, so hard for me to say.

It can be done, but it's kinda involved. You define a grammar in which all
operators have the same precedence. Consequently you compile any expression
into a list of operands and operators. That makes the language parsable
without semantic info. Then the semantic stage transforms the list into a
tree. Cecil does that.

I see.  So the price you pay is that you defer more decisions till
semantic stage.

I.e. "a b c d e" is allowed to parse into an amorphous list, then in
the semantic pass you decide if 'b' and 'd' are actually legal
operators or not.

Yah. Something tells me Walter won't embark on that soon.

Andrei

Oct 26 2008
Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
Yah. Something tells me Walter won't embark on that soon.

Not a chance <g>. Producing an amorphous list of tokens isn't what I'd
call "parsing".

Oct 26 2008
Max Samukha <samukha voliacable.com.removethis> writes:
On Wed, 22 Oct 2008 17:27:58 -0500, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

Please vote up before the haters take it down, and discuss:

Andrei

doesn't display the characters correctly (maybe it's time to update).
If unicode can be avoided, please avoid it.

Oct 22 2008
bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

Few random thoughts on the subject:
- Someday programming languages will probably use some Unicode symbols. I don't
know if Fortress will succeed, but I think someday some language will.
Probably Unicode symbols will be used as in Fortress, to improve the
readability of the code, and not as in APL, to transform the code into
hieroglyphics.
- Another good thing that Fortress does is that there are always *nice* looking
ways to write the same code in pure ASCII. So there are usually intuitive 2 or
3 char long translations of all the accepted Unicode symbols. This is very
positive, so you can write/read Fortress with a normal ASCII editor too.
- My editor, programming font, newsreader, IDEs, and probably more things,
currently have problems with Unicode texts.
- Novels in English and other languages show that you can express very complex
and refined thoughts with just very few characters. But you need some space to
write a novel/short story. Mathematics shows that a judicious usage of standard
and widely used symbols helps a lot in decreasing the space used to represent
formulas, etc.
- Fortress and the Mathematica language are designed for physics and
mathematics. D language can be used for that, but it's mostly a system
language. So symbols are more used and more important in Fortress than D. So
their purposes and targets are different.
- I like the idea of using *few* Unicode symbols in my programs, they can
reduce code size and they may even improve readability.
- Python3 allows Unicode identifiers, mostly to allow people in all parts of the
world to write variable names in their languages.
- But seeing the disadvantages in the end I think that in practice adopting
Unicode for D programs is currently bad.

Bye,
bearophile

Oct 23 2008
Robert Fraser <fraserofthenight gmail.com> writes:
bearophile wrote:
- Python3 allows Unicode identifiers, mostly to allow people in all parts of
the world to write variable names in their languages.

So does D.

Oct 23 2008
Max Samukha <samukha voliacable.com.removethis> writes:
On Thu, 23 Oct 2008 04:23:29 -0700, Robert Fraser
<fraserofthenight gmail.com> wrote:

bearophile wrote:
- Python3 allows Unicode identifiers, mostly to allow people in all parts of
the world to write variable names in their languages.

So does D.

I'd like to note that identifiers in a non-English language are
considered bad style by many programmers. Besides, a big part of
software projects nowadays are international. Imagine participants of
the Linux project each writing identifiers in their own language.

Oct 23 2008
Yigal Chripun <yigal100 gmail.com> writes:
Max Samukha wrote:
On Thu, 23 Oct 2008 04:23:29 -0700, Robert Fraser
<fraserofthenight gmail.com> wrote:

bearophile wrote:
- Python3 allows Unicode identifiers, mostly to allow people in
all parts of the world to write variable names in their languages.

So does D.

I'd like to note that identifiers in a non-English language are
considered bad style by many programmers. Besides, big part of
software projects nowadays are international. Imagine participants of
linux project writing identifiers in his language.

isn't that something that should be decided upon on a per-project basis?
I agree that it'll be bad for Linux, but each project has its own
objectives. for example, what if you're teaching a programming course
for kids? it'll be easier for them writing in their own native language.
I could easily imagine a small start-up writing in their own native
language (let's say Hebrew) as one way for obfuscating the source code,
so as to protect their IP.
there are, I'm sure, more use-cases.

Oct 23 2008
bearophile <bearophileHUGS lycos.com> writes:
I always use English for variable names, instead of my language, because I've
had my share of debugging code with variables in other languages and it's not a
nice thing to do.

Regarding Python code, its std libs keep identifiers in English only, but when
they created the One Laptop per Child project, which uses Python a lot, they
decided that 'kids' may enjoy using variable names in their language...

Bye,
bearophile

Oct 23 2008
Max Samukha <samukha voliacable.com.removethis> writes:
On Thu, 23 Oct 2008 08:33:16 -0400, bearophile
<bearophileHUGS lycos.com> wrote:

I always use English for variable names, instead of my language, because I've
had my share of debugging code with variables in other languages and it's not a
nice thing to do.

Regarding Python code, its std libs keep identifiers in English only, but when
they created the One Laptop per Child project, which uses Python a lot, they
decided that 'kids' may enjoy using variable names in their language...

Bye,
bearophile

Keep children away from Python. Let them have happy lives :)

Oct 23 2008
Walter Bright <newshound1 digitalmars.com> writes:
Robert Fraser wrote:
bearophile wrote:
- Python3 allows Unicode identifiers, mostly to allow people in all
parts of the world to write variable names in their languages.

So does D.

D currently allows Unicode in identifiers, comments, and strings. In
fact, D source text is defined to be Unicode.

Oct 23 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
Andrei Alexandrescu:

[snip]

(No need to single me out. It's Walter's post, and besides I don't have
a formed opinion on Unicode symbols.)

Andrei

Oct 23 2008
Yigal Chripun <yigal100 gmail.com> writes:
Andrei Alexandrescu wrote:
Please vote up before the haters take it down, and discuss:

Andrei

A few thoughts on the subject:

- others already mentioned, i think, smalltalk as an example. smalltalk
bundles as part of the language also the complete environment and IDE so
in D this is an issue as D doesn't provide an "official" D editor. The
support largely exists for Unicode - even plain notepad supports Unicode
fully but that doesn't mean people are using any of the many editors
that have this feature.

- smalltalk uses left-arrow as assignment op. the way you enter it is by
typing "<_" so this is similar to Bill's suggestion, i.e. define a short
sequence of chars to be replaced by a Unicode char in the file source.

- why not generalize the concept? a few ideas: syntax is not important
here, just the idea itself..
1) bool compare as == (A a, A b) {}
you can add an op alias to your function, maybe define anonymous
function with alias to be used only as op.
2) provide a way to specify which functions can be used as infix
functions (Scala does that IIRC) and maybe even specify precedence
somehow, so that downs' map function could be written as :
infix void map(...) {}
and used as: dg map array;

Oct 23 2008
KennyTM~ <kennytm gmail.com> writes:
Andrei Alexandrescu wrote:
Please vote up before the haters take it down, and discuss:

_in_d_similarly_to/

Andrei

I suggest not. There are problems if you adopt Unicode as operators:

======

1) My editor supports Unicode, but my keyboard doesn't. So how do I type ∩
and ∪ for a set«T»?

1.1) What if the library writer forgets to provide an alternative,
ASCII-only name? [This is also a problem with using Unicode in identifiers
in general.]

1.2) Some suggested auto-correction in the IDE. Again what if I used

I had suggested once before, but let me put it formally here. If you
really want to support Unicode operators in source code,

- Firstly, ditch the ability to replace \xxx with '\xxx' when it
appears without the quotes (so “char x = \n;” won't compile).
- Then, replace \xxx with the character represented in source level, so

Vector3D«real» τ = r × F;

can be written as

Vector3D!(real) \&tau; = r \&times; F;

- You don't need to introduce a separate trigraph.
- But this suggestion does trigger some people's trigraph-phobia. [Yell no!
Now! :) ]
- It may make the source code difficult to parse grammatically.
- It will make the source code difficult to read, just look at the
number of semicolons in the ASCII encoded version.
- But at least you can compile your code.
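
To make the replacement step concrete, here is a rough sketch of it as a Python preprocessing pass. It is only an illustration under loud assumptions: `expand_entities` is a made-up helper, it borrows the HTML entity names from Python's `html.entities` table as a stand-in for whatever table a D compiler would use, and it blindly rewrites the whole text without skipping string literals or comments.

```python
import re
from html.entities import name2codepoint


def expand_entities(src):
    """Replace every \\&name; with the character that HTML entity names."""
    def repl(match):
        name = match.group(1)
        if name in name2codepoint:
            return chr(name2codepoint[name])
        return match.group(0)   # unknown name: leave the text untouched

    return re.sub(r"\\&(\w+);", repl, src)


ascii_src = r"Vector3D!(real) \&tau; = r \&times; F;"
unicode_src = expand_entities(ascii_src)
```

The reverse direction (Unicode back to escapes for a plain ASCII editor) would be the same table lookup inverted.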

======

2) This is regarding the rejection of « & » to be supported even if the
emacs module goes official. Of course it turns out it is not, but let's
think of these scenarios:

2.1) OK it turns out ∪ and ∩ and «T» were just .opUnion(x) and
.opIntersect(x) and !(T) pretty-printed in emacs; the compiler won't
accept these characters anyway. But sometimes I forget and just copy a
portion of this code to nano/geany/whatever and then it stops compiling!

2.2) Well this copy&paste problem has been solved in the IDE level by
inverting the pretty printing while copying. But now I publish my
fantastic, pretty-printed D program in a web page/PDF/whatever, and
people just complain the compiler won't accept it!

I still believe if you're going to transform D code to Unicode visually,
the compiler must accept these visual replacements as well.

May I also take Mathematica as an example. The programming language
itself uses a heavy load of non-ASCII characters, and the IDE also
pretty-prints them as nice mathematical formulas, but at the “source
code” level they are just escape sequences. So on screen you see

E^(I π) + 1

but in the source code you'll see

E^(I \[Pi]) + 1

However, if you type in “E^(I π) + 1” in a plain .nb file and open with
the Get[] function (think of it as “import xx.d”) it can still correctly
display the result “0”.

======

3) There are over 800 unary or binary operators in Unicode[1]. How are
you going to opXXX all them? Assume your blog entry doesn't mean the
simple “!=” ↦ “≠” transformation.

Use the C++/C# approach? But I heard that's no good.

======

4) These are regarding if you are going to support overloading for all
these 800 operators, how to define:

4.1) [Big problem] Operator precedence? (One person may want ∧ to mean
the wedge product (so they have higher precedence than + and -) but
another want it to mean logical AND (so lower than + and -).)

4.2) Associativity? How to determine if an operator is left-associative,
right-associative or both? (∧ as wedge product is both, while ∧ as a
power function pow(a,b) is right-assoc.)

4.3) [Minor problem] Commutativity? Or we'll need to write opXXX and
opXXX_r all the time?
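
The associativity point in 4.2 is easy to check numerically. A small sketch in Python (the helpers `group_left` and `group_right` are made up for the illustration): the same operand list gives different answers for a power function depending on the grouping, while an associative operation such as min gives the same answer either way.

```python
from functools import reduce


def group_left(op, operands):
    # ((a op b) op c) op ...
    return reduce(op, operands)


def group_right(op, operands):
    # a op (b op (c op ...))
    return reduce(lambda acc, x: op(x, acc), reversed(operands))


power_left = group_left(pow, [2, 3, 2])     # (2 ** 3) ** 2
power_right = group_right(pow, [2, 3, 2])   # 2 ** (3 ** 2)
fuzzy_left = group_left(min, [0.2, 0.9, 0.5])
fuzzy_right = group_right(min, [0.2, 0.9, 0.5])
```

So the compiler cannot pick a grouping for ∧ without knowing which meaning the library author had in mind.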

I don't have solutions for D on these. For 4.2 & 4.3 in C# we can
introduce some attributes like

[Associative, Commutative]
FuzzyBool operator∧ (FuzzyBool x, FuzzyBool y) { return min(x,y); }

(Not actual C# code.)

but it's not D. :)

Or predefine the meaning, precedence and associativity for each
operator, so e.g. ∧ always means the wedge product and not logical AND,
just like now ^ always means XOR and not power function.

Or just require the programmer to always use parentheses.

Ref: [1] A rough word count in
http://www.unicode.org/Public/math/revision-11/MathClass-11.txt. The
actual number is higher than this.

Oct 23 2008
Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
KennyTM~ wrote:

1.2) Some suggested auto-correction in the IDE. Again what if I used

Then I suggest a change in career... ^^'

--
Bruno Medeiros - Software Developer, MSc. in CS/E graduate
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Oct 24 2008
"Simen Kjaeraas" <simen.kjaras gmail.com> writes:
On Thu, 23 Oct 2008 00:27:58 +0200, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

Please vote up before the haters take it down, and discuss:

Andrei

I really like the idea of having more unicode in the language, but I feel
these should be fairly limited.

There are times I feel that more operators (especially, as has been
mentioned, opCross and opDotProduct) would be nice to have, but it's just
sugar, really.

As an example, while I'd enjoy seeing code like this, I'm not sure I'd
enjoy writing it (Note that I am prone to exaggerations):

int a = ∅; //empty set, same as "= void"
int[] b = [1,2,3,4,5,6];

if (a ∈ b) // Element of - "in"
{
    float c = 2.00001;
    writefln(c ≈ ⌈d⌉ ); // Approximately equal, ceil

    myClass c = getInstance();
    if (∃c) // c exists, i.e. "!is null"
    {
        // I thought this should work in D today, using "alias sqrt √;",
        // but it seems the compiler chokes on it. :(
        writefln(√(c.foo));
    }

    ∀element∈b // New foreach syntax!
    {
        element *= ¼;
    }
}

--
Simen

Oct 23 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Fri, Oct 24, 2008 at 5:48 AM, Simen Kjaeraas <simen.kjaras gmail.com>
wrote:

writefln(√(c.foo)); // I thought this should work in D today, using
"alias sqrt √;", but it seems the compiler chokes on it. :(

According to the spec, you can only use "UniversalAlpha" Unicode
characters in your identifiers.  Supposedly those are defined in
ISO/IEC 9899:1999(E) Appendix D.  But I'm guessing the ISO did not
define square-root-symbol as an alpha character.

--bb

Oct 23 2008
"Simen Kjaeraas" <simen.kjaras gmail.com> writes:
On Thu, 23 Oct 2008 23:47:59 +0200, Bill Baxter <wbaxter gmail.com> wrote:

On Fri, Oct 24, 2008 at 5:48 AM, Simen Kjaeraas <simen.kjaras gmail.com>
wrote:

writefln(√(c.foo)); // I thought this should work in D today, using
"alias sqrt √;", but it seems the compiler chokes on it. :(

According to the spec, you can only use "UniversalAlpha" Unicode
characters in your identifiers.  Supposedly those are defined in
ISO/IEC 9899:1999(E) Appendix D.  But I'm guessing the ISO did not
define square-root-symbol as an alpha character.

--bb

That seems to make sense indeed.
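
Python 3 draws a similar line between letters and symbols, which makes the distinction easy to try out. Note this is only an analogy: Python's rule is based on Unicode identifier properties, not on the C99 table the D spec refers to, so the two sets of allowed characters need not match exactly. The candidate list below is just a handful of characters from this thread.

```python
candidates = ["π", "Ø", "τ", "√", "×", "∅"]

# str.isidentifier() applies Python's own Unicode identifier rule.
allowed = {ch: ch.isidentifier() for ch in candidates}

π = 3.14159            # a letter, so this is an ordinary variable name
# √ = abs              # would be a SyntaxError: '√' is a math symbol, not a letter
```

The letters (π, Ø, τ) pass and the math symbols (√, ×, ∅) do not, which mirrors what happened with "alias sqrt √;".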

--
Simen

Oct 23 2008
Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Simen Kjaeraas wrote:

As an example, while I'd enjoy seeing code like this, I'm not sure I'd
enjoy writing it (Note that I am prone to exaggerations):

int a = ∅; //empty set, same as "= void"
int[] b = [1,2,3,4,5,6];

Hum, interesting example, it actually made me realize that 'null' would
be an ideal candidate for having a Unicode symbol of its own. Does
anyone have suggestions for a possible one? Preferably somewhat
circle-shaped.

--
Bruno Medeiros - Software Developer, MSc. in CS/E graduate
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Oct 24 2008
"Simen Kjaeraas" <simen.kjaras gmail.com> writes:
On Fri, 24 Oct 2008 18:52:03 +0200, Bruno Medeiros
<brunodomedeiros+spam com.gmail> wrote:

Simen Kjaeraas wrote:
As an example, while I'd enjoy seeing code like this, I'm not sure I'd
enjoy writing it (Note that I am prone to exaggerations):
int a = ∅; //empty set, same as "= void"
int[] b = [1,2,3,4,5,6];

Hum, interesting example, it actually made me realize that 'null' would
be an ideal candidate for having a Unicode symbol of its own. Does
anyone have suggestions for a possible one? Preferably somewhat
circle-shaped.

Well, we Norwegians got the Ø (HTML entity &Oslash;, Latin-1 character
216) - looks a lot like the empty set symbol.

--
Simen

Oct 24 2008
KennyTM~ <kennytm gmail.com> writes:
Bruno Medeiros wrote:
Simen Kjaeraas wrote:
As an example, while I'd enjoy seeing code like this, I'm not sure I'd
enjoy writing it (Note that I am prone to exaggerations):

int a = ∅; //empty set, same as "= void"
int[] b = [1,2,3,4,5,6];

Hum, interesting example, it actually made me realize that 'null' would
be an ideal candidate for having a Unicode symbol of its own. Does
anyone have suggestions for a possible one? Preferably somewhat
circle-shaped.

auto Ø = null; // \&Oslash;

I assume you're not serious...

Oct 24 2008
Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
KennyTM~ wrote:
Bruno Medeiros wrote:
Simen Kjaeraas wrote:
As an example, while I'd enjoy seeing code like this, I'm not sure
I'd enjoy writing it (Note that I am prone to exaggerations):

int a = ∅; //empty set, same as "= void"
int[] b = [1,2,3,4,5,6];

Hum, interesting example, it actually made me realize that 'null'
would be an ideal candidate for having a Unicode symbol of its own.
Does anyone have suggestions for a possible one? Preferably somewhat
circle-shaped.

auto Ø = null; // \&Oslash;

I assume you're not serious...

It's an interesting and effective way to save some typing, and it might
be even more readable (but with a symbol other than Ø).
But I probably would not use it anyway, since I like to write very
standardized code, that other people can easily recognize and read.

--
Bruno Medeiros - Software Developer, MSc. in CS/E graduate
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Oct 26 2008
Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Andrei Alexandrescu wrote:
Please vote up before the haters take it down, and discuss:

_in_d_similarly_to/

Andrei

I don't know if it would be worthwhile, but I would say there are two
aspects that likely would need to be observed for this to work out
favorably:

* Having non-Unicode versions of the symbols/keywords available in
Unicode, such that non-Unicode editing and viewing is always possible
as a fallback. This has some important consequences though, such as
making Unicode-symbol-usage unable to solve the shortage of brackets
for, for example, the template instantiation syntax (because an
alternative ASCII notation would still be necessary).

* Having a way to directly input the Unicode symbols from the keyboard.
One reason is typing succinctness; another is that I find the
alternative (having the editor/IDE automatically change an ASCII
character sequence into a Unicode symbol) to have several disadvantages.
First, it doesn't work outside the editors/IDEs configured to do
so (which is a bummer, since plenty of code is written outside
them: newsgroups, articles, forums, bug reports, IRC, etc.). Second, I
personally like that the editor always requires exactly N backspaces to
erase N typed characters[*].

So, does anyone know if it is possible on Windows (I believe on Unix it is)
to configure your keyboard mapping with custom settings? For example, if
I press AltGr-O, it inputs some Unicode character of my choosing?

[*] As a sidenote, this is also why I don't like having my editor
configured to insert 4 spaces on TAB-press. Unless, the editor is also
smart enough to delete the 4 spaces on one backspace/delete and move 4
spaces on one move cursor operation (arrow key press).

--
Bruno Medeiros - Software Developer, MSc. in CS/E graduate
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Oct 24 2008
"Simen Kjaeraas" <simen.kjaras gmail.com> writes:
On Fri, 24 Oct 2008 18:28:51 +0200, Bruno Medeiros
<brunodomedeiros+spam com.gmail> wrote:

Andrei Alexandrescu wrote:
Please vote up before the haters take it down, and discuss:

_in_d_similarly_to/
Andrei

I don't know if it would be worthwhile, but I would say there are two
aspects that likely would need to be observed for this to work out
favorably:

* Having non-Unicode versions of the symbols/keywords available in
Unicode, such that non-Unicode editing and viewing is always possible
as a fallback. This has some important consequences though, such as
making Unicode-symbol-usage unable to solve the shortage of brackets
for, for example, the template instantiation syntax (because an
alternative ASCII notation would still be necessary).

* Having a way to directly input the Unicode symbols in the keyboard.
One reason is because of typing succinctness, and another, is because I
find the alternative (have the editor/IDE automatically change an ASCII
character sequence into a Unicode symbol) to have several disadvantages:
First is that it doesn't work outside the editors/IDEs configured to do
so, (which is a bummer, there is actually plenty of code written outside
that: newsgroups, articles, forums, bug reports, IRC, etc.). Second, I
personally like that the editor always require exactly N backspaces to
erase N typed characters[*].

So, anyone knows if it is possible on Windows (I believe in Unix it is)
to configure your keyboard mapping with custom settings? For example, if
I press AltGr-O, it inputs some Unicode character of my choosing?

I'd guess this oughtta do it:
http://www.microsoft.com/globaldev/tools/msklc.mspx

--
Simen

Oct 24 2008
Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
Simen Kjaeraas wrote:
On Fri, 24 Oct 2008 18:28:51 +0200, Bruno Medeiros
<brunodomedeiros+spam com.gmail> wrote:

Andrei Alexandrescu wrote:
Please vote up before the haters take it down, and discuss:
_in_d_similarly_to/
Andrei

I don't know if it would be worthwhile, but I would say there are two
aspects that likely would need to be observed for this to work out
favorably:

* Having non-Unicode versions of the symbols/keywords available in
Unicode, such that non-Unicode editing and viewing is always possible
as a fallback. This has some important consequences though, such as
making Unicode-symbol-usage unable to solve the shortage of brackets
for, for example, the template instantiation syntax (because an
alternative ASCII notation would still be necessary).

* Having a way to directly input the Unicode symbols in the keyboard.
One reason is because of typing succinctness, and another, is because
I find the alternative (have the editor/IDE automatically change an
ASCII character sequence into a Unicode symbol) to have several
disadvantages: First is that it doesn't work outside the editors/IDEs
configured to do so, (which is a bummer, there is actually plenty of
code written outside that: newsgroups, articles, forums, bug reports,
IRC, etc.). Second, I personally like that the editor always require
exactly N backspaces to erase N typed characters[*].

So, anyone knows if it is possible on Windows (I believe in Unix it
is) to configure your keyboard mapping with custom settings? For
example, if I press AltGr-O, it inputs some Unicode character of my
choosing?

I'd guess this oughtta do it:
http://www.microsoft.com/globaldev/tools/msklc.mspx

Yes, exactly that! I had the impression there was such a program for
Windows, but couldn't remember the name.

--
Bruno Medeiros - Software Developer, MSc. in CS/E graduate
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Oct 26 2008
Robert Fraser <fraserofthenight gmail.com> writes:
Simen Kjaeraas wrote:
So, anyone knows if it is possible on Windows (I believe in Unix it
is) to configure your keyboard mapping with custom settings? For
example, if I press AltGr-O, it inputs some Unicode character of my
choosing?

I'd guess this oughtta do it:
http://www.microsoft.com/globaldev/tools/msklc.mspx

I remember this same question being asked on a Microsoft DL when I was
working there, and all the answers given were for third-party tools like
KeyTweak ( http://webpages.charter.net/krumsick/ ) ;-P . Good to know
there's an MS one.

Oct 26 2008
bearophile <bearophileHUGS lycos.com> writes:
Bruno Medeiros:
* Having non-Unicode versions of the symbols/keywords available in Unicode,
such that non-Unicode editing and viewing is always possible as a fallback.
This has some important consequences though, such as making
Unicode-symbol-usage unable to solve the shortage of brackets for, for example,
the template instantiation syntax (because an alternative ASCII notation would
still be necessary).<

Fortress uses pairs of symbols to denote various sequence literals. Some of
them can be seen in F# too, you can see some here:
http://a6systems.com/fsharpsheet.pdf

Creates the list:
let lsgen2 = [0 .. 2 .. 8]
Gives:
[0;2;4;6;8]
Note:  0 .. 2 .. 8  is close to the Python slice-with-stride syntax 0:8:2,
except that the F# upper bound is inclusive (Python needs 0:9:2 to include 8)

Create the array:
let argen2 = [|0 .. 2 .. 8|]
Gives:
[|0;2;4;6;8|]

Creating a seq (that is lazy):
let s = seq { for i in 0 .. 10 do yield i }

F# also has algebraic types, which will become very useful in D2 as it becomes
more functional (they are useful in Scala too, which is partially functional).
F# and Scala are languages to copy from because they are
functional-procedural-OOP hybrids, almost like what D2 wants to become: D2 is so
far just a bit functional, Scala is more functional, F# even more, and
languages like Haskell are functional all the way. This is an Augmented
Discriminated Union:

type BinTree<'a> =
    | Node of BinTree<'a> * 'a * BinTree<'a>
    | Leaf
    with member self.Depth() =
            match self with
            | Leaf -> 0
            | Node(l, _, r) -> 1 + l.Depth() + r.Depth()

So D2 could use collection literals similar to the F# ones to implement
lazy/nonlazy collection generators too. This is the third iteration of my ideas
on this topic (if you think succinctness in (partially) functional languages is
useless, think again: it lets you use certain things instead of falling back
to more procedural idioms):

auto flat = (abs(el) for(row: mat) for(el: row) if (el % 2)); // lazy
auto multi = [c:mulIter(c, i) for(i,c: "abcdef")]; // AA
auto squares = void[x*x for(x: 0..100)]; // set
void[int] squares = [x*x for(x: 0..100)];// set, alternative syntax
auto squares = {x*x for x in xrange(100)}; // set, alternative syntax
auto squares = {| x*x for(x: 0..100) |}; // list?
auto squares = [| x*x for(x: 0..100) |]; // multiset? something else?
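
For comparison, the first three of these already have direct counterparts in Python 3, which is roughly where the proposed syntax comes from. The snippet below is just a sketch with made-up sample data; `mulIter` is not specified anywhere in the proposal, so plain string repetition (`c * i`) stands in for it here.

```python
mat = [[1, -2, 3], [-4, 5, -6]]

# Lazy: nothing is computed until the generator is consumed.
flat = (abs(el) for row in mat for el in row if el % 2)

# Associative array (dict); c * i stands in for the unspecified mulIter(c, i).
multi = {c: c * i for i, c in enumerate("abcdef")}

# Set of squares.
squares = {x * x for x in range(100)}

odd_abs = list(flat)   # consuming the generator forces evaluation
```

The lazy form only walks the matrix when `list(flat)` asks for the values; the dict and set forms are built eagerly.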

Bye,
bearophile

Oct 24 2008
ore-sama <spam here.lot> writes:
Bill Baxter Wrote:

(like I haven't been able to figure out how to get the
DOS console in Windows to display UTF-8)

Console is a legacy technology (you even still call it "DOS"), why expect
features from it?

Oct 24 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Sat, Oct 25, 2008 at 6:37 AM, ore-sama <spam here.lot> wrote:
Bill Baxter Wrote:

(like I haven't been able to figure out how to get the
DOS console in Windows to display UTF-8)

Console is a legacy technology (you even still call it "DOS"), why expect
features from it?

So tell me what the alternative is?  I had trouble with running D
tools from a Cygwin shell.  Can't remember if I tried MSYS or not.

Anyone using a shell for Windows that works and supports UTF-8 properly?

--bb

Oct 24 2008
Sergey Gromov <snake.scaly gmail.com> writes:
Sat, 25 Oct 2008 06:43:19 +0900,
Bill Baxter wrote:
On Sat, Oct 25, 2008 at 6:37 AM, ore-sama <spam here.lot> wrote:
Bill Baxter Wrote:

(like I haven't been able to figure out how to get the
DOS console in Windows to display UTF-8)

Console is a legacy technology (you even still call it "DOS"), why expect
features from it?

So tell me what the alternative is?  I had trouble with running D
tools from a Cygwin shell.  Can't remember if I tried MSYS or not.

Anyone using a shell for Windows that works and supports UTF-8 properly?

A regular Windows console supports UTF-8 to some extent:

* Change console font to Lucida Console
* issue "chcp 65001"

You can even get more fonts into there with a bit of hackery.

Oct 24 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Sat, Oct 25, 2008 at 9:15 AM, Sergey Gromov <snake.scaly gmail.com> wrote:
A regular Windows console supports UTF-8 to some extent:

* Change console font to Lucida Console
* issue "chcp 65001"

You can even get more fonts into there with a bit of hackery.

I did that but "type <filewith-utf8.txt>"  still prints garbage.

--bb

Oct 24 2008
Yigal Chripun <yigal100 gmail.com> writes:
Bill Baxter wrote:
I did that but "type <filewith-utf8.txt>"  still prints garbage.

--bb

also, MSYS gives you all the linux tools if you really need to be shell
only.
last resort: nothing stops you from implementing your own "cat"
application in D with full Unicode support.

most if not all linux shell tools are separate executables anyway and if
any still do not support unicode it'll be trivial to roll your own

Oct 24 2008
Benji Smith <dlanguage benjismith.net> writes:
Yigal Chripun wrote:
also, MSYS gives you all the linux tools if you really need to be shell
only.
last resort: nothing stops you from implementing your own "cat"
application in D with full Unicode support.

most if not all linux shell tools are separate executables anyway and if
any still do not support unicode it'll be trivial to roll your own

Oh, and one of my favorite tricks in Windows is to install cygwin
(usually at "C:\cygwin" or whatever their boneheaded installer insists
on using) and then add the bin path ("C:\cygwin\bin") to the windows PATH.

That way, I can continue using the ordinary windows shell (which I
prefer, since it doesn't force me to use the nutty directory names that
the cygwin shell uses), but I can still access all the linux commands.

Calling grep from a windows shell is the bestest!

--benji

Oct 24 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Sat, Oct 25, 2008 at 10:31 AM, Benji Smith <dlanguage benjismith.net> wrote:
Oh, and one of my favorite tricks in Windows is to install cygwin (usually
at "C:\cygwin" or whatever their boneheaded installer insists on using) and
then add the bin path ("C:\cygwin\bin") to the windows PATH.

That way, I can continue using the ordinary windows shell (which I prefer,
since it doesn't force me to use the nutty directory names that the cygwin
shell uses), but I can still access all the linux commands.

Calling grep from a windows shell is the bestest!

But that has the same problem.  Cygtools don't understand Windows
paths, so they barf when you say "grep c:\foo.txt".  But the Windows shell
will only autocomplete Windows-style paths.

I've found the gnuwin32 tools to work a little better on that front.

--bb

Oct 24 2008
Benji Smith <dlanguage benjismith.net> writes:
Bill Baxter wrote:
But that has the same problem.  Cygtools don't understand Windows
paths, so they barf when you say "grep c:\foo.txt".  But the Windows shell
will only autocomplete Windows-style paths.

I've found the gnuwin32 tools to work a little better on that front.

--bb

Wha???

The "grep" tool doesn't read the path. The *shell* interprets the path
and passes the text to the program. That's how all the gnu tools are
able to pipe their results from one tool to the other.

Or at least, that's how I assume it works.

Cuz I use grep like every single day. On the "cmd.exe" shell. With
windows paths.

In fact, just for you, I tested this:

grep -i "SHAZZAM" "C:\Documents and Settings\benji\Desktop\my filename with spaces.txt"

Worked like a charm.

If the path doesn't have spaces, I have no problem with this:

grep -i "SHAZZAM" C:\file.txt

I tried it in both "command.com" and in "cmd.exe" and didn't experience
any problem in either environment.

The key is to never never never use the cygwin shell. It's a piece of
garbage. But using the executables from the "cygwin\bin" directory
within the windows shell... Priceless!

--benji

Oct 24 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Sat, Oct 25, 2008 at 11:39 AM, Benji Smith <dlanguage benjismith.net> wrote:
Wha???

The "grep" tool doesn't read the path. The *shell* interprets the path and
passes the text to the program. That's how all the gnu tools are able to
pipe their results from one tool to the other.

Or at least, that's how I assume it works.

Cuz I use grep like every single day. On the "cmd.exe" shell. With windows
paths.

In fact, just for you, I tested this:

grep -i "SHAZZAM" "C:\Documents and Settings\benji\Desktop\my filename with spaces.txt"

Worked like a charm.

If the path doesn't have spaces, I have no problem with this:

grep -i "SHAZZAM" C:\file.txt

I tried it in both "command.com" and in "cmd.exe" and didn't experience any
problem in either environment.

The key is to never never never use the cygwin shell. It's a piece of
garbage. But using the executables from the "cygwin\bin" directory within
the windows shell... Priceless!

Oh, I didn't realize that.  There is one thing that doesn't work,
which is probably what gave me the impression it was broken -- Windows
paths with wildcards don't work.   Like "grep c:\Windows\*.txt".   But
you're right that it does seem to work for both windows paths, and
local wildcards, just not Windows paths with wildcards.

But that's great.  Thanks for the info.  Actually I used to put
cygwin\bin on my path years ago, but stopped doing it at some point
and switched to gnuwin32.  I was under the impression that it worked
better then, but actually I've had some trouble with gnuwin32
recently.

--bb

Oct 24 2008
Benji Smith <dlanguage benjismith.net> writes:
Bill Baxter wrote:
Benji Smith wrote:
The key is to never never never use the cygwin shell. It's a piece of
garbage. But using the executables from the "cygwin\bin" directory within
the windows shell... Priceless!

Oh, I didn't realize that.  There is one thing that doesn't work,
which is probably what gave me the impression it was broken -- Windows
paths with wildcards don't work.   Like "grep c:\Windows\*.txt".   But
you're right that it does seem to work for both windows paths, and
local wildcards, just not Windows paths with wildcards.

But that's great.  Thanks for the info.  Actually I used to put
cygwin\bin on my path years ago, but stopped doing it at some point
and switched to gnuwin32.  I was under the impression that it worked
better then, but actually I've had some trouble with gnuwin32
recently.

Glad I could be of service!

--benji

Oct 24 2008
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Benji Smith" wrote
Bill Baxter wrote:
Benji Smith wrote:
The key is to never never never use the cygwin shell. It's a piece of
garbage. But using the executables from the "cygwin\bin" directory within
the windows shell... Priceless!

Oh, I didn't realize that.  There is one thing that doesn't work,
which is probably what gave me the impression it was broken -- Windows
paths with wildcards don't work.   Like "grep c:\Windows\*.txt".   But
you're right that it does seem to work for both windows paths, and
local wildcards, just not Windows paths with wildcards.

It's not the paths with wildcards that is the problem.  In this case, it is
the shell.  Grep is expecting the shell to expand the wildcards, as it does
on unix.

For example, you can use this old trick if ls suddenly becomes unavailable
to list all files in the current directory:

echo *

Which is all shell builtin; no executables are run.

If you run this from a Windows shell you get the same error:

grep text /cygdrive/c/Windows/*.txt

The windows shell expects the application to handle wildcard expansion,
which is why windows command line programs don't always work the same way.
Every program has to build in wildcard expansion to support it.
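
A minimal sketch of that difference, assuming a POSIX shell such as bash. count_args and demo_dir are names made up for the illustration; the quoted call stands in for what a program launched from cmd.exe is handed.

```shell
# Hypothetical helper: reports how many arguments it was given.
count_args() { echo "$#"; }

# Scratch directory with two .txt files and one file that should not match.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt" "$demo_dir/c.log"

# Unquoted pattern: the shell expands it before count_args runs.
count_args "$demo_dir"/*.txt      # prints 2

# Quoted pattern: no expansion happens, so one literal argument arrives
# and the program itself would have to do the globbing.
count_args "$demo_dir/*.txt"      # prints 1
```

The second call is the situation a command line tool is in under the Windows shell: it gets the pattern as text and has to expand it on its own.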

-Steve

Oct 24 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Sat, Oct 25, 2008 at 1:40 PM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
"Benji Smith" wrote
Bill Baxter wrote:
Benji Smith wrote:
The key is to never never never use the cygwin shell. It's a piece of
garbage. But using the executables from the "cygwin\bin" directory within
the windows shell... Priceless!

Oh, I didn't realize that.  There is one thing that doesn't work,
which is probably what gave me the impression it was broken -- Windows
paths with wildcards don't work.   Like "grep c:\Windows\*.txt".   But
you're right that it does seem to work for both windows paths, and
local wildcards, just not Windows paths with wildcards.

It's not the paths with wildcards that is the problem.  In this case, it is
the shell.  Grep is expecting the shell to expand the wildcards, as it does
on unix.

"it does seem to work for both windows paths, **and local wildcards**,
just not Windows paths with wildcards".

"grep Foo *.txt"  works just fine.  "grep Foo c:\*.txt"  does not.

--bb

Oct 24 2008
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Bill Baxter" wrote
On Sat, Oct 25, 2008 at 1:40 PM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
"Benji Smith" wrote
Bill Baxter wrote:
Benji Smith wrote:
The key is to never never never use the cygwin shell. It's a piece of
garbage. But using the executables from the "cygwin\bin" directory within
the windows shell... Priceless!

Oh, I didn't realize that.  There is one thing that doesn't work,
which is probably what gave me the impression it was broken -- Windows
paths with wildcards don't work.   Like "grep c:\Windows\*.txt".   But
you're right that it does seem to work for both windows paths, and
local wildcards, just not Windows paths with wildcards.

It's not the paths with wildcards that is the problem.  In this case, it is
the shell.  Grep is expecting the shell to expand the wildcards, as it does
on unix.

"it does seem to work for both windows paths, **and local wildcards**,
just not Windows paths with wildcards".

"grep Foo *.txt"  works just fine.  "grep Foo c:\*.txt"  does not.

Then that must be something grep is doing extra.  Or perhaps the Windows
console selectively expands wildcards?  I have no idea.  It seems weird that
grep would expand only current-directory wildcards (try grep Foo *, and see
if it works.  Windows normally only expands *.* to mean 'all files').  But
in the case of using a cygwin shell, the shell expands all wildcards before
passing arguments to grep.  That much I do know.  I haven't really had a
need to use the windows shell in a long time ;)

-Steve

Oct 24 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Sat, Oct 25, 2008 at 2:09 PM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
"Bill Baxter" wrote
On Sat, Oct 25, 2008 at 1:40 PM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
"Benji Smith" wrote
Bill Baxter wrote:
Benji Smith wrote:
The key is to never never never use the cygwin shell. It's a piece of
garbage. But using the executables from the "cygwin\bin" directory within
the windows shell... Priceless!

Oh, I didn't realize that.  There is one thing that doesn't work,
which is probably what gave me the impression it was broken -- Windows
paths with wildcards don't work.   Like "grep c:\Windows\*.txt".   But
you're right that it does seem to work for both windows paths, and
local wildcards, just not Windows paths with wildcards.

It's not the paths with wildcards that is the problem.  In this case, it is
the shell.  Grep is expecting the shell to expand the wildcards, as it does
on unix.

"it does seem to work for both windows paths, **and local wildcards**,
just not Windows paths with wildcards".

"grep Foo *.txt"  works just fine.  "grep Foo c:\*.txt"  does not.

Then that must be something grep is doing extra.

Yep, that was what I said.

Or perhaps the Windows
console selectively expands wildcards?  I have no idea.

Don't think so.   "echo *" still dutifully prints a "*" to the
console.  Cygwin grep is doing it, probably in an attempt to be more
useful when used from the DOS prompt.

It seems weird that
grep would expand only current-directory wildcards (try grep Foo *, and see
if it works.

Yep that works.

Windows normally only expands *.* to mean 'all files').

If by that you mean Windows command line programs usually expand *.*, then yeh.

But in the case of using a cygwin shell, the shell expands all wildcards before
passing arguments to grep.  That much I do know.  I haven't really had a
need to use the windows shell in a long time ;)

Yep that's true for Bash.

An easy way to tell the Windows shell does nothing is by compiling and running:

import std.stdio;
void main(string[] args) {  writefln("Args: %s", args); }

And passing it some wildcards.  It never expands anything.  Only thing
it does do is mess with quotes some.  Here's an example:

C:\> args.exe * "C:\Program Files" *.* c:\*
Args: [args,*,C:\Program Files,*.*,c:\*]

--bb

Oct 24 2008
Benji Smith <dlanguage benjismith.net> writes:
Bill Baxter wrote:
"it does seem to work for both windows paths, **and local wildcards**,
just not Windows paths with wildcards".

"grep Foo *.txt"  works just fine.  "grep Foo c:\*.txt"  does not.

Then that must be something grep is doing extra.

Yep, that was what I said.

Or perhaps the Windows
console selectively expands wildcards?  I have no idea.

Don't think so.   "echo *" still dutifully prints a "*" to the
console.  Cygwin grep is doing it, probably in an attempt to be more
useful when used from the DOS prompt.

It seems weird that
grep would expand only current-directory wildcards (try grep Foo *, and see
if it works.

Interesting.

About 90% of the time, I run grep with the "recursion" flag, so I
haven't thought about wildcard expansion in ages.

grep -R "some text" .

I do know that "wc" does wildcard expansion, even with paths, but you
have to use forward slashes. So, to count lines in D programs from the
windows shell:

wc -l /dev/*.d

Unfortunately, there's no "recursion" flag for wc, so I end up doing
something dumb like this:

wc -l /dev/*.d
wc -l /dev/*/*.d
wc -l /dev/*/*/*.d

Etc.
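
One way around the missing recursion flag, sketched for a POSIX shell with Cygwin's find on the PATH (cmd.exe ships its own unrelated find.exe, so the PATH order matters). count_d_lines is a hypothetical helper name, not something from the thread.

```shell
# count_d_lines DIR: total number of lines in every *.d file under DIR,
# at any depth. find walks the tree; cat concatenates the matches so that
# wc reports one grand total; tr strips the padding some wc builds emit.
count_d_lines() {
    find "$1" -type f -name '*.d' -exec cat {} + | wc -l | tr -d ' '
}

# Example call (the path is only an illustration):
#   count_d_lines /dev
```

Swapping "cat {} +" for "wc -l {} +" would give per-file counts instead of a single total.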

Hmmmmmm. I really should just compile my own wc. After all, Walter's

--benji

Oct 25 2008
"Bill Baxter" <wbaxter gmail.com> writes:
But that has the same problem.  Cygtools don't understand Windows
paths, so they barf when you say "grep c:\foo.txt".  But the Windows shell
will only autocomplete Windows-style paths.

I've found the gnuwin32 tools to work a little better on that front.

Wha???

The "grep" tool doesn't read the path. The *shell* interprets the path and
passes the text to the program. That's how all the gnu tools are able to
pipe their results from one tool to the other.

Or at least, that's how I assume it works.

No, that's how it works with the Bash shell and most Unix shells, but
the Windows console doesn't do that stuff.  It's up to each app to
interpret and expand wildcards like *.txt.  So the cygwin progs must
be explicitly checking to see if they got a * from a stupid DOS
console and doing the glob themselves.  But the implementation is
apparently imperfect since it doesn't work on full DOS paths with
wildcards.

--bb

Oct 24 2008
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Benji Smith" wrote
Wha???

The "grep" tool doesn't read the path. The *shell* interprets the path and
passes the text to the program. That's how all the gnu tools are able to
pipe their results from one tool to the other.

Or at least, that's how I assume it works.

No, grep accepts either input.  The shell does not change paths to windows
style, that is what cygpath is for.  But it does interpret backslashes, so
you have to double all those.

So for instance, in a cygwin shell, this works also:

grep -i "SHAZZAM" C:\\Documents\ and\ Settings\\benji\\Desktop\\my\ filename\ with\ spaces.txt

The arguments are passed as they are, grep just is smart enough to use
either one.  Probably many tools are that way, I wouldn't know because I
usually do the /cygdrive/c/... form.

The key is to never never never use the cygwin shell. It's a piece of
garbage. But using the executables from the "cygwin\bin" directory within
the windows shell... Priceless!

Without the cygwin shell, you lose all bash features, like for, or backticks
to execute a command and use its output.  The paths are a minor annoyance
IMO.  Using the cmd.exe shell is ok for simple tasks, but it pales severely
in comparison to the power of bash.

So piece of garbage it is not.  Something you don't understand how to use
properly? definitely ;)

-Steve

Oct 24 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Sat, Oct 25, 2008 at 1:33 PM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
"Benji Smith" wrote
Wha???

The "grep" tool doesn't read the path. The *shell* interprets the path and
passes the text to the program. That's how all the gnu tools are able to
pipe their results from one tool to the other.

Or at least, that's how I assume it works.

No, grep accepts either input.  The shell does not change paths to windows
style, that is what cygpath is for.  But it does interpret backslashes, so
you have to double all those.

So for instance, in a cygwin shell, this works also:

grep -i "SHAZZAM" C:\\Documents\ and\ Settings\\benji\\Desktop\\my\ filename\ with\ spaces.txt

The arguments are passed as they are, grep just is smart enough to use
either one.  Probably many tools are that way, I wouldn't know because I
usually do the /cygdrive/c/... form.

The key is to never never never use the cygwin shell. It's a piece of
garbage. But using the executables from the "cygwin\bin" directory within
the windows shell... Priceless!

Without the cygwin shell, you lose all bash features, like for, or backticks
to execute a command and use its output.  The paths are a minor annoyance
IMO.  Using the cmd.exe shell is ok for simple tasks, but it pales severely
in comparison to the power of bash.

So piece of garbage it is not.  Something you don't understand how to use
properly? definitely ;)

Yeh, I love the bash shell.  Really the only thing keeping me from
using it for D work is the fact that it won't auto-complete Windows
filenames.

--bb

Oct 24 2008
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Bill Baxter" wrote
On Sat, Oct 25, 2008 at 1:33 PM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
"Benji Smith" wrote
Wha???

The "grep" tool doesn't read the path. The *shell* interprets the path and
passes the text to the program. That's how all the gnu tools are able to
pipe their results from one tool to the other.

Or at least, that's how I assume it works.

No, grep accepts either input.  The shell does not change paths to windows
style, that is what cygpath is for.  But it does interpret backslashes, so
you have to double all those.

So for instance, in a cygwin shell, this works also:

grep -i "SHAZZAM" C:\\Documents\ and\ Settings\\benji\\Desktop\\my\ filename\ with\ spaces.txt

The arguments are passed as they are, grep just is smart enough to use
either one.  Probably many tools are that way, I wouldn't know because I
usually do the /cygdrive/c/... form.
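
The doubling rule is easy to check in any POSIX-style shell, bash under Cygwin or MSYS included. A minimal sketch: `show_arg` is a hypothetical helper (not a standard tool) that prints its first argument exactly as a called program would receive it.

```shell
# show_arg is a hypothetical helper: it prints the first argument exactly
# as the called program would receive it.
show_arg() { printf '%s\n' "$1"; }

show_arg C:\foo.txt     # unquoted: the shell consumes the backslash
show_arg C:\\foo.txt    # doubled: one backslash survives
show_arg 'C:\foo.txt'   # single quotes: passed through untouched
```

Single quotes are usually the least error-prone choice when a Windows path also contains spaces.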

The key is to never never never use the cygwin shell. It's a piece of
garbage. But using the executables from the "cygwin\bin" directory within
the windows shell... Priceless!

Without the cygwin shell, you lose all bash features, like for loops, or
backticks to execute a command and use its output.  The paths are a minor
annoyance IMO.  Using the cmd.exe shell is ok for simple tasks, but it pales
severely in comparison to the power of bash.

So piece of garbage it is not.  Something you don't understand how to use
properly? definitely ;)

Yeh, I love the bash shell.  Really the only thing keeping me from
using it for D work is the fact that it won't auto-complete Windows
filenames.

It's ugly, but can be aliased or scripted, look into cygpath:

cygpath -w /cygdrive/c/filename.txt
outputs:

C:\filename.txt

so you can use dmd combined with cygpath:

dmd `cygpath -w /cygdrive/c/path/to/d/files/*.d`

It wouldn't take much to write a bash script to do this for you...
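
One possible shape for such a script, as a rough sketch rather than a tested tool. The function names `to_win_path` and `dmd_win` and the `DMD` override variable are made up for illustration. The conversion uses the real `cygpath -w` when it is installed; the fallback is an assumption that only understands the `/cygdrive/<letter>/` prefix.

```shell
# Hypothetical helpers (names are illustrative, not part of Cygwin or dmd).

# Convert one /cygdrive/<letter>/... path to <LETTER>:\... form.
# Prefers the real cygpath; the fallback only understands /cygdrive paths.
to_win_path() {
    if command -v cygpath >/dev/null 2>&1; then
        cygpath -w "$1"
        return
    fi
    case $1 in
        /cygdrive/[A-Za-z]/*)
            # "/cygdrive/" is 10 characters, so the drive letter is the 11th.
            drive=$(printf '%s' "$1" | cut -c11 | tr 'a-z' 'A-Z')
            rest=$(printf '%s' "$1" | cut -c12- | tr '/' '\\')
            printf '%s:%s\n' "$drive" "$rest"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

# Run the compiler ($DMD, default "dmd") with every /cygdrive/... argument
# converted; all other arguments (flags, relative paths) pass through.
dmd_win() {
    n=$#
    while [ "$n" -gt 0 ]; do
        arg=$1
        shift
        case $arg in
            /cygdrive/*) arg=$(to_win_path "$arg") ;;
        esac
        set -- "$@" "$arg"   # rotate the (possibly converted) argument to the end
        n=$((n - 1))
    done
    "${DMD:-dmd}" "$@"
}
```

With this sourced into the shell, `dmd_win -c /cygdrive/c/path/to/d/files/*.d` lets bash complete and glob the Cygwin-style paths while dmd only ever sees Windows-style ones.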

-Steve

Oct 24 2008
Benji Smith <dlanguage benjismith.net> writes:
Steven Schveighoffer wrote:
So piece of garbage it is not.  Something you don't understand how to use
properly? definitely ;)

Definitely!

I hope you'll agree that hyperbole is the best thing in the world :)

--benji

Oct 25 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Sat, Oct 25, 2008 at 10:23 AM, Yigal Chripun <yigal100 gmail.com> wrote:
Bill Baxter wrote:
On Sat, Oct 25, 2008 at 9:15 AM, Sergey Gromov <snake.scaly gmail.com> wrote:
Sat, 25 Oct 2008 06:43:19 +0900,
Bill Baxter wrote:
On Sat, Oct 25, 2008 at 6:37 AM, ore-sama <spam here.lot> wrote:
Bill Baxter Wrote:

(like I haven't been able to figure out how to get the
DOS console in Windows to display UTF-8)

Console is a legacy technology (you even still call it "DOS"), why expect
features from it?

So tell me what the alternative is?  I had trouble with running D
tools from a Cygwin shell.  Can't remember if I tried MSYS or not.

Anyone using a shell for Windows that works and supports UTF-8 properly?

A regular Windows console supports UTF-8 to some extent:

* Change console font to Lucida Console
* issue "chcp 65001"

You can even get more fonts into there with a bit of hackery.

I did that but "type <filewith-utf8.txt>"  still prints garbage.

--bb

Ok what about grep and sort and uniq then?  Can notepad do that?
I have all these tools that work fine in my DOS shell.  I never use
"type".  It was simply meant as the most basic possible tool -- as in
if "type" doesn't work nothing will.

also, MSYS gives you all the linux tools if you really need to be shell
only.

I think part of the problem I had with Cygwin shell was that it can't
auto-complete dos filenames, but D programs on Windows can't accept
Cygwin paths.  So it was a pain to work with command-line tools (like
DMD itself) that take filenames.   So I don't think MSYS helps there
either.

last resort: nothing stops you from implementing your own "cat"
application in D with full Unicode support.

most if not all linux shell tools are separate executables anyway and if
any still do not support unicode it'll be trivial to roll your own


Oct 24 2008
Benji Smith <dlanguage benjismith.net> writes:
Bill Baxter wrote:
Anyone using a shell for Windows that works and supports UTF-8 properly?

A regular Windows console supports UTF-8 to some extent:

* Change console font to Lucida Console
* issue "chcp 65001"

You can even get more fonts into there with a bit of hackery.

I did that but "type <filewith-utf8.txt>"  still prints garbage.

That's weird. My machine (WinXp Sp3) has no problem printing UTF-8 to
the console. The only special thing I did was change the font to Lucida
Console.

--benji

Oct 24 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Sat, Oct 25, 2008 at 10:23 AM, Benji Smith <dlanguage benjismith.net> wrote:
Bill Baxter wrote:
Anyone using a shell for Windows that works and supports UTF-8 properly?

A regular Windows console supports UTF-8 to some extent:

* Change console font to Lucida Console
* issue "chcp 65001"

You can even get more fonts into there with a bit of hackery.

I did that but "type <filewith-utf8.txt>"  still prints garbage.

That's weird. My machine (WinXp Sp3) has no problem printing UTF-8 to the
console. The only special thing I did was change the font to Lucida
Console.

Ok.  Thanks for the info.  Knowing that it has actually worked for at
least one person gives me motivation to try again.

--bb

Oct 24 2008
Benji Smith <dlanguage benjismith.net> writes:
Bill Baxter wrote:
On Sat, Oct 25, 2008 at 10:23 AM, Benji Smith <dlanguage benjismith.net> wrote:
Bill Baxter wrote:
Anyone using a shell for Windows that works and supports UTF-8 properly?

A regular Windows console supports UTF-8 to some extent:

* Change console font to Lucida Console
* issue "chcp 65001"

You can even get more fonts into there with a bit of hackery.

I did that but "type <filewith-utf8.txt>"  still prints garbage.

That's weird. My machine (WinXp Sp3) has no problem printing UTF-8 to the
console. The only special thing I did was change the font to Lucida
Console.

Ok.  Thanks for the info.  Knowing that it has actually worked for at
least one person gives me motivation to try again.

--bb

Write a tiny little D program and see what you get on the console:

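import tango.io.Stdout;
void main() {
    // Assumed body: print any non-ASCII text, e.g. operators from this thread.
    Stdout("× ∪ ⋅").newline;
}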

I don't know anything about the "type" command, and whether it supports
UTF-8. But the console itself ought to be able to handle it. Try
compiling the above code and see what happens.
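
A shell-only probe can separate the two questions (does the console render UTF-8, and does a given tool handle it) without compiling anything. The sketch below is a stand-in under stated assumptions, not the D program above: `utf8_probe` is a hypothetical name, and the bytes are the UTF-8 encodings of × (U+00D7) and ∪ (U+222A), two of the operators proposed in this thread, written as octal escapes so the script itself stays pure ASCII.

```shell
# utf8_probe (hypothetical name) writes the two characters as raw UTF-8
# bytes, separated by a space. Octal escapes keep this script ASCII-only,
# so the editor's own encoding cannot affect the result:
#   \303\227     = C3 97    = U+00D7 MULTIPLICATION SIGN
#   \342\210\252 = E2 88 AA = U+222A UNION
utf8_probe() {
    printf '\303\227 \342\210\252\n'
}

utf8_probe
```

If both glyphs show up, the console and font are fine and the garbage comes from the tool doing the printing; if they do not, the font or code page is the more likely culprit.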

--benji

Oct 24 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Sat, Oct 25, 2008 at 10:37 AM, Benji Smith <dlanguage benjismith.net> wrote:
Bill Baxter wrote:
On Sat, Oct 25, 2008 at 10:23 AM, Benji Smith <dlanguage benjismith.net>
wrote:
Bill Baxter wrote:
Anyone using a shell for Windows that works and supports UTF-8
properly?

A regular Windows console supports UTF-8 to some extent:

* Change console font to Lucida Console
* issue "chcp 65001"

You can even get more fonts into there with a bit of hackery.

I did that but "type <filewith-utf8.txt>"  still prints garbage.

That's weird. My machine (WinXp Sp3) has no problem printing UTF-8 to the
console. The only special thing I did was change the font to Lucida
Console.

Ok.  Thanks for the info.  Knowing that it has actually worked for at
least one person gives me motivation to try again.

--bb

Write a tiny little D program and see what you get on the console:

import tango.io.Stdout;
void main() {
}

I don't know anything about the "type" command, and whether it supports
UTF-8. But the console itself ought to be able to handle it. Try compiling
the above code and see what happens.

--benji

Ah, I see.  I guess more what I want to know is if I had utf-8 source
code and the D compiler spit out a message about one of the lines,
would that error message come out as garbage?  Same for ddbg -- if I'm
debugging and say "ps" for "print source" will the result be garbage.
I was thinking that "type" would be a simple test if that sort of
thing would work.

But maybe type is just borked.  I did try "cat" and "more" too I
think, with same result, though.

--bb

Oct 24 2008
Yigal Chripun <yigal100 gmail.com> writes:
Bill Baxter wrote:
On Sat, Oct 25, 2008 at 10:37 AM, Benji Smith <dlanguage benjismith.net> wrote:
Bill Baxter wrote:
On Sat, Oct 25, 2008 at 10:23 AM, Benji Smith <dlanguage benjismith.net>
wrote:
Bill Baxter wrote:
Anyone using a shell for Windows that works and supports UTF-8
properly?

A regular Windows console supports UTF-8 to some extent:

* Change console font to Lucida Console
* issue "chcp 65001"

You can even get more fonts into there with a bit of hackery.

I did that but "type <filewith-utf8.txt>"  still prints garbage.

That's weird. My machine (WinXp Sp3) has no problem printing UTF-8 to the
console. The only special thing I did was change the font to Lucida
Console.

Ok.  Thanks for the info.  Knowing that it has actually worked for at
least one person gives me motivation to try again.

--bb

Write a tiny little D program and see what you get on the console:

import tango.io.Stdout;
void main() {
}

I don't know anything about the "type" command, and whether it supports
UTF-8. But the console itself ought to be able to handle it. Try compiling
the above code and see what happens.

--benji

Ah, I see.  I guess more what I want to know is if I had utf-8 source
code and the D compiler spit out a message about one of the lines,
would that error message come out as garbage?  Same for ddbg -- if I'm
debugging and say "ps" for "print source" will the result be garbage.
I was thinking that "type" would be a simple test if that sort of
thing would work.

But maybe type is just borked.  I did try "cat" and "more" too I
think, with same result, though.

--bb

MSYS does autocomplete. It's not perfect, but it works. The path will
look Unix-like though, i.e.
/c/program files/...

From what I know (WinXP SP2), the console works for Unicode except for RTL
languages like Hebrew. As someone else already noted, this is legacy
tech which you shouldn't be using anyway. I don't know if it's fixed in
SP3 or not, but the new way from MS is their PowerShell tool, built on .NET.
There are other third-party tools as well.

Oct 24 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Sat, Oct 25, 2008 at 11:53 AM, Yigal Chripun <yigal100 gmail.com> wrote:
Bill Baxter wrote:
On Sat, Oct 25, 2008 at 10:37 AM, Benji Smith <dlanguage benjismith.net> wrote:
Bill Baxter wrote:
On Sat, Oct 25, 2008 at 10:23 AM, Benji Smith <dlanguage benjismith.net>
wrote:
Bill Baxter wrote:
Anyone using a shell for Windows that works and supports UTF-8
properly?

A regular Windows console supports UTF-8 to some extent:

* Change console font to Lucida Console
* issue "chcp 65001"

You can even get more fonts into there with a bit of hackery.

I did that but "type <filewith-utf8.txt>"  still prints garbage.

That's weird. My machine (WinXp Sp3) has no problem printing UTF-8 to the
console. The only special thing I did was change the font to Lucida
Console.

Ok.  Thanks for the info.  Knowing that it has actually worked for at
least one person gives me motivation to try again.

--bb

Write a tiny little D program and see what you get on the console:

import tango.io.Stdout;
void main() {
}

I don't know anything about the "type" command, and whether it supports
UTF-8. But the console itself ought to be able to handle it. Try compiling
the above code and see what happens.

--benji

Ah, I see.  I guess more what I want to know is if I had utf-8 source
code and the D compiler spit out a message about one of the lines,
would that error message come out as garbage?  Same for ddbg -- if I'm
debugging and say "ps" for "print source" will the result be garbage.
I was thinking that "type" would be a simple test if that sort of
thing would work.

But maybe type is just borked.  I did try "cat" and "more" too I
think, with same result, though.

--bb

Msys does autocomplete. it's not perfect but it works. the path will
look unix like though.. i.e.
/c/program files/...

Right that's what Cygwin does too, and it's useless if I want to call
the DMD compiler.

dmd foo.d /c/libs/mydlib.lib

"Error:  what do you think this is, Linux?"

from what I know (winXP sp 2) - console works for unicode Except for RTL
languages like Hebrew. as someone else already noted, this is legacy
tech which you shouldn't be using anyway. I don't know if it's fixed in
SP3 or not but the new way from MS is their powershell tool based on C#.
there are also other 3rd party stuff as well..

Yeh, I've heard of that.  Do you (or anyone) have any actual
experience with PowerShell?  It doesn't seem to be standard equipment
on my new Vista box even.  Does it require a separate download?
Strange if it really is supposed to be "the new way".

--bb

Oct 24 2008
Robert Fraser <fraserofthenight gmail.com> writes:
Bill Baxter wrote:
On Sat, Oct 25, 2008 at 11:53 AM, Yigal Chripun <yigal100 gmail.com> wrote:
Bill Baxter wrote:
On Sat, Oct 25, 2008 at 10:37 AM, Benji Smith <dlanguage benjismith.net> wrote:
Bill Baxter wrote:
On Sat, Oct 25, 2008 at 10:23 AM, Benji Smith <dlanguage benjismith.net>
wrote:
Bill Baxter wrote:
Anyone using a shell for Windows that works and supports UTF-8
properly?

A regular Windows console supports UTF-8 to some extent:

* Change console font to Lucida Console
* issue "chcp 65001"

You can even get more fonts into there with a bit of hackery.

I did that but "type <filewith-utf8.txt>"  still prints garbage.

That's weird. My machine (WinXp Sp3) has no problem printing UTF-8 to the
console. The only special thing I did was change the font to Lucida
Console.

Ok.  Thanks for the info.  Knowing that it has actually worked for at
least one person gives me motivation to try again.

--bb

Write a tiny little D program and see what you get on the console:

import tango.io.Stdout;
void main() {
}

I don't know anything about the "type" command, and whether it supports
UTF-8. But the console itself ought to be able to handle it. Try compiling
the above code and see what happens.

--benji

Ah, I see.  I guess more what I want to know is if I had utf-8 source
code and the D compiler spit out a message about one of the lines,
would that error message come out as garbage?  Same for ddbg -- if I'm
debugging and say "ps" for "print source" will the result be garbage.
I was thinking that "type" would be a simple test if that sort of
thing would work.

But maybe type is just borked.  I did try "cat" and "more" too I
think, with same result, though.

--bb

Msys does autocomplete. it's not perfect but it works. the path will
look unix like though.. i.e.
/c/program files/...

Right that's what Cygwin does too, and it's useless if I want to call
the DMD compiler.

dmd foo.d /c/libs/mydlib.lib

"Error:  what do you think this is, Linux?"

from what I know (winXP sp 2) - console works for unicode Except for RTL
languages like Hebrew. as someone else already noted, this is legacy
tech which you shouldn't be using anyway. I don't know if it's fixed in
SP3 or not but the new way from MS is their powershell tool based on C#.
there are also other 3rd party stuff as well..

Yeh, i've heard of that.  Do you (or anyone) have any actual
experience with PowerShell?  It doesn't seem to be standard equipment
on my new Vista box even.  Does it require a separate download?
Strange if it really is supposed to be "the new way".

--bb

PowerShell is MS's concession that there are things better done in a
console environment, especially for developers & power users. And, yes,
it works very well (I'm a fan...). It also contains aliases for many
common Unix commands (e.g. ls => Get-ChildItem, the same cmdlet as dir).

It doesn't come as the default on most OSes simply because Microsoft
doesn't expect the average home user to need it. It does ship with
Windows Server 2008, because Microsoft expects it to be a useful utility

Oct 25 2008
Sergey Gromov <snake.scaly gmail.com> writes:
Bill Baxter wrote:
On Sat, Oct 25, 2008 at 10:37 AM, Benji Smith <dlanguage benjismith.net> wrote:
Bill Baxter wrote:
On Sat, Oct 25, 2008 at 10:23 AM, Benji Smith <dlanguage benjismith.net>
wrote:
Bill Baxter wrote:
Anyone using a shell for Windows that works and supports UTF-8
properly?

A regular Windows console supports UTF-8 to some extent:

* Change console font to Lucida Console
* issue "chcp 65001"

You can even get more fonts into there with a bit of hackery.

I did that but "type <filewith-utf8.txt>"  still prints garbage.

That's weird. My machine (WinXp Sp3) has no problem printing UTF-8 to the
console. The only special thing I did was change the font to Lucida
Console.

Ok.  Thanks for the info.  Knowing that it has actually worked for at
least one person gives me motivation to try again.

--bb

Write a tiny little D program and see what you get on the console:

import tango.io.Stdout;
void main() {
}

I don't know anything about the "type" command, and whether it supports
UTF-8. But the console itself ought to be able to handle it. Try compiling
the above code and see what happens.

--benji

Ah, I see.  I guess more what I want to know is if I had utf-8 source
code and the D compiler spit out a message about one of the lines,
would that error message come out as garbage?  Same for ddbg -- if I'm
debugging and say "ps" for "print source" will the result be garbage.
I was thinking that "type" would be a simple test if that sort of
thing would work.

But maybe type is just borked.  I did try "cat" and "more" too I
think, with same result, though.

They all work for me: type, cat, less.  The file is UTF-8 with BOM.
Error messages are printed correctly, displaying all the characters of the
offending symbol.

But now I remember.  It fails to execute any batch files when it's in
the 65001 codepage.  More precisely, it executes exactly one line from a
batch file, as if there were no more lines.  So this pseudo-Unicode
mode is useless.

Oct 27 2008
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Bill Baxter" wrote
On Sat, Oct 25, 2008 at 6:37 AM, ore-sama <spam here.lot> wrote:
Bill Baxter Wrote:

(like I haven't been able to figure out how to get the
DOS console in Windows to display UTF-8)

Console is a legacy technology (you even still call it "DOS"), why expect
features from it?

So tell me what the alternative is?  I had trouble with running D
tools from a Cygwin shell.  Can't remember if I tried MSYS or not.

Any text-based program uses the same Windows console (unless it's a GUI
application, and it uses controls to create a text box, etc).  Including
cygwin shell.

To say it's a legacy technology is like saying Linux is a legacy technology
because it's command line based.  It's a false impression promoted by
Microsoft to try and spread FUD about OSes that mainly support command line
tools, like Linux.  But command line tools are extremely useful and
powerful, much easier to develop, and IMO easier to use.  For instance, if
you want to find all files that contain a certain text, grep -R text / and
you're done.  On windows it's 'click the start menu, select search, wait for
the search window to pop up, click on the dog, etc'.  Freaking annoying if

Anyone using a shell for Windows that works and supports UTF-8 properly?

I would guess it should work properly, most everything in windows supports
unicode.  Perhaps you have some configuration setting not set properly?  I'd
suggest searching msdn.

-Steve

Oct 24 2008
Yigal Chripun <yigal100 gmail.com> writes:
Steven Schveighoffer wrote:
"Bill Baxter" wrote
On Sat, Oct 25, 2008 at 6:37 AM, ore-sama <spam here.lot> wrote:
Bill Baxter Wrote:

(like I haven't been able to figure out how to get the
DOS console in Windows to display UTF-8)

Console is a legacy technology (you even still call it "DOS"), why expect
features from it?

So tell me what the alternative is?  I had trouble with running D
tools from a Cygwin shell.  Can't remember if I tried MSYS or not.

Any text-based program uses the same Windows console (unless it's a GUI
application, and it uses controls to create a text box, etc).  Including
cygwin shell.

To say it's a legacy technology is like saying Linux is a legacy technology
because it's command line based.  It's a false experience promoted by
Microsoft to try and spread FUD about OSes that mainly support command line
tools, like Linux.  But command line tools are extremely useful and
powerful, much easier to develop, and IMO easier to use.  For instance, if
you want to find all files that contain a certain text, grep -R text / and
you're done.  On windows it's 'click the start menu, select search, wait for
the search window to pop up, click on the dog, etc'.  Freaking annoying if

Anyone using a shell for Windows that works and supports UTF-8 properly?

I would guess it should work properly, most everything in windows supports
unicode.  Perhaps you have some configuration setting not set properly?  I'd
suggest searching msdn.

-Steve

The Windows console, AKA the DOS box, *is* in fact legacy technology. It is
being replaced by MS PowerShell, which is built on .NET. They actually took
many ideas from Linux and incorporated them into it.

Also, it doesn't have to be an either/or situation regarding CLI vs GUI.
There's Quicksilver on the Mac (IIRC the name), which is a GUI app with a
CLI-like interface. It has the best of both worlds. PowerShell is GUI
based as well. IMO, a CLI should be provided as just a widget in the GUI
world and not as a separate entity.

Oct 25 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Sat, Oct 25, 2008 at 8:57 PM, Yigal Chripun <yigal100 gmail.com> wrote:
Steven Schveighoffer wrote:
"Bill Baxter" wrote
On Sat, Oct 25, 2008 at 6:37 AM, ore-sama <spam here.lot> wrote:
Bill Baxter Wrote:

(like I haven't been able to figure out how to get the
DOS console in Windows to display UTF-8)

Console is a legacy technology (you even still call it "DOS"), why expect
features from it?

So tell me what the alternative is?  I had trouble with running D
tools from a Cygwin shell.  Can't remember if I tried MSYS or not.

Any text-based program uses the same Windows console (unless it's a GUI
application, and it uses controls to create a text box, etc).  Including
cygwin shell.

To say it's a legacy technology is like saying Linux is a legacy technology
because it's command line based.  It's a false experience promoted by
Microsoft to try and spread FUD about OSes that mainly support command line
tools, like Linux.  But command line tools are extremely useful and
powerful, much easier to develop, and IMO easier to use.  For instance, if
you want to find all files that contain a certain text, grep -R text / and
you're done.  On windows it's 'click the start menu, select search, wait for
the search window to pop up, click on the dog, etc'.  Freaking annoying if

Anyone using a shell for Windows that works and supports UTF-8 properly?

I would guess it should work properly, most everything in windows supports
unicode.  Perhaps you have some configuration setting not set properly?  I'd
suggest searching msdn.

-Steve

PowerShell is GUI based as well.

suspect.  What makes you say it's GUI based?  It has the exact same
decorations and goofy menu options as a regular non-GUI Windows
console.  If it were really a GUI, I doubt they would go through the
extra programming effort required to make it look *exactly* like a
console app.

--bb

Oct 25 2008
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Bill Baxter" wrote
On Sat, Oct 25, 2008 at 8:57 PM, Yigal Chripun <yigal100 gmail.com> wrote:
Steven Schveighoffer wrote:
"Bill Baxter" wrote
On Sat, Oct 25, 2008 at 6:37 AM, ore-sama <spam here.lot> wrote:
Bill Baxter Wrote:

(like I haven't been able to figure out how to get the
DOS console in Windows to display UTF-8)

Console is a legacy technology (you even still call it "DOS"), why
expect
features from it?

So tell me what the alternative is?  I had trouble with running D
tools from a Cygwin shell.  Can't remember if I tried MSYS or not.

Any text-based program uses the same Windows console (unless it's a GUI
application, and it uses controls to create a text box, etc).  Including
cygwin shell.

To say it's a legacy technology is like saying Linux is a legacy
technology
because it's command line based.  It's a false experience promoted by
Microsoft to try and spread FUD about OSes that mainly support command
line
tools, like Linux.  But command line tools are extremely useful and
powerful, much easier to develop, and IMO easier to use.  For instance,
if
you want to find all files that contain a certain text, grep -R text /
and
you're done.  On windows it's 'click the start menu, select search, wait
for
the search window to pop up, click on the dog, etc'.  Freaking annoying
if

Anyone using a shell for Windows that works and supports UTF-8
properly?

I would guess it should work properly, most everything in windows
supports
unicode.  Perhaps you have some configuration setting not set properly?
I'd
suggest searching msdn.

-Steve

PowerShell is GUI based as well.

suspect.  What makes you say it's GUI based?  It has the exact same
decorations and goofy menu options as a regular non-GUI Windows
console.  If it were really a GUI, I doubt they would go through the
extra programming effort required to make it look *exactly* like a
console app.

I've never used powershell, but most likely you are correct.  I think there
is a confusion of terms here.

Windows Console is the GUI that comes up with the black window, and displays
text.  It serves as a terminal, not a shell.  This is not 'old' technology,
it's just an integral piece of the OS.

cmd.exe is the command interpreter, which is definitely crappy technology
(and somewhat old).

The responsible party for displaying UTF properly is the console, not the
shell.

-Steve

Oct 25 2008
ore-sama <spam here.lot> writes:
Steven Schveighoffer Wrote:

The responsible party for displaying UTF properly is the console, not the
shell.

One important feature of legacy technology is that it must not change, for
compatibility with legacy code. stdout is just an opaque pipe with no means
to specify a text encoding, and legacy applications write text in the OEM
code page (OCP) to stdout. That's why the console expects OCP output, and
breaking this convention would break legacy applications, piping, etc.
BTW, cmd.exe can in fact produce UTF-16 output.
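
The "opaque pipe" point can be seen directly by dumping what actually travels through a pipe. This is an illustrative sketch only: `hex_bytes` is a hypothetical helper, and code page 437 is picked as one example of an OEM code page. The same letter, é, is sent once as UTF-8 and once as CP437, and nothing in the stream says which is which.

```shell
# hex_bytes (hypothetical helper) prints stdin as space-separated hex bytes.
# Word splitting of od's output normalizes its padding and line breaks.
hex_bytes() {
    set -- $(od -An -tx1)
    printf '%s\n' "$*"
}

# The same letter, e-acute, as two different byte sequences:
printf '\303\251' | hex_bytes   # UTF-8 encoding (octal for C3 A9)
printf '\202' | hex_bytes       # OEM code page 437 encoding (octal for 82)
```

A reader on the far end of the pipe sees only the bytes, so it has to assume an encoding, which is the convention the console is stuck with.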

Oct 26 2008
Robert Fraser <fraserofthenight gmail.com> writes:
Bill Baxter wrote:
Yigal Chripun wrote:
PowerShell is GUI based as well.

suspect.  What makes you say it's GUI based?  It has the exact same
decorations and goofy menu options as a regular non-GUI Windows
console.  If it were really a GUI, I doubt they would go through the
extra programming effort required to make it look *exactly* like a
console app.

--bb

It uses the same console application to do the displaying/execution.
And, yes, this application sucks (ever done any serious copy/paste in it?)

There's PoshConsole ( http://www.codeplex.com/PoshConsole ), but that
TODO list is a bit extensive ;-P. Hopefully by Win7 time, the Windows
group gets around to fixing the console, but that's like hoping they'll

Oct 25 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Sun, Oct 26, 2008 at 9:18 AM, Robert Fraser
<fraserofthenight gmail.com> wrote:
Bill Baxter wrote:
Yigal Chripun wrote:
PowerShell is GUI based as well.

suspect.  What makes you say it's GUI based?  It has the exact same
decorations and goofy menu options as a regular non-GUI Windows
console.  If it were really a GUI, I doubt they would go through the
extra programming effort required to make it look *exactly* like a
console app.

--bb

It uses the same console application to do the displaying/execution. And,
yes, this application sucks (ever done any serious copy/paste in it?)

There's PoshConsole ( http://www.codeplex.com/PoshConsole ), but that TODO
list is a bit extensive ;-P. Hopefully by Win7 time, the Windows group gets
around to fixing the console, but that's like hoping they'll fix Paint or

I'm using "Console2" as my facade on the console window.
Works pretty nicely.
http://sourceforge.net/projects/console/

--bb

Oct 25 2008
KennyTM~ <kennytm gmail.com> writes:
Robert Fraser wrote:
Bill Baxter wrote:
Yigal Chripun wrote:
PowerShell is GUI based as well.

suspect.  What makes you say it's GUI based?  It has the exact same
decorations and goofy menu options as a regular non-GUI Windows
console.  If it were really a GUI, I doubt they would go through the
extra programming effort required to make it look *exactly* like a
console app.

--bb

It uses the same console application to do the displaying/execution.
And, yes, this application sucks (ever done any serious copy/paste in it?)

There's PoshConsole ( http://www.codeplex.com/PoshConsole ), but that
TODO list is a bit extensive ;-P. Hopefully by Win7 time, the Windows
group gets around to fixing the console, but that's like hoping they'll

Hey, they did fix MSPaint and WordPad! :)

Oct 25 2008
torhu <no spam.invalid> writes:
Robert Fraser wrote:
It uses the same console application to do the displaying/execution.
And, yes, this application sucks (ever done any serious copy/paste in it?)

That works fine for me if I enable Quick edit mode in the options.  Then
the right mouse button will do both copy and paste.

Oct 26 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Mon, Oct 27, 2008 at 1:51 AM, torhu <no spam.invalid> wrote:
Robert Fraser wrote:
It uses the same console application to do the displaying/execution. And,
yes, this application sucks (ever done any serious copy/paste in it?)

That works fine for me if I enable Quick edit mode in the options.  Then the
right mouse button will do both copy and paste.

Except it only does block-oriented rectangular selection, which is odd
for something that is primarily line-oriented.

--bb

Oct 26 2008
torhu <no spam.invalid> writes:
Bill Baxter wrote:
On Mon, Oct 27, 2008 at 1:51 AM, torhu <no spam.invalid> wrote:
Robert Fraser wrote:
It uses the same console application to do the displaying/execution. And,
yes, this application sucks (ever done any serious copy/paste in it?)

That works fine for me if I enable Quick edit mode in the options.  Then the
right mouse button will do both copy and paste.

Except it only does block-oriented rectangular selection, which is odd
for something that is primarily line-oriented.

Yeah, that's true.  Pretty stupid.

Oct 26 2008
Robert Fraser <fraserofthenight gmail.com> writes:
torhu wrote:
Bill Baxter wrote:
On Mon, Oct 27, 2008 at 1:51 AM, torhu <no spam.invalid> wrote:
Robert Fraser wrote:
It uses the same console application to do the displaying/execution.
And,
yes, this application sucks (ever done any serious copy/paste in it?)

That works fine for me if I enable Quick edit mode in the options.
Then the
right mouse button will do both copy and paste.

Except it only does block-oriented rectangular selection, which is odd
for something that is primarily line-oriented.

Yeah, that's true.  Pretty stupid.

My main problem is that you can't do it just with the keyboard, which is
my standard method. I also take issue with the fact you can't copy more
than is visible on a single screen, which goes along with the block
selection mode.

Oct 26 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Mon, Oct 27, 2008 at 1:52 PM, Robert Fraser
<fraserofthenight gmail.com> wrote:
torhu wrote:
Bill Baxter wrote:
On Mon, Oct 27, 2008 at 1:51 AM, torhu <no spam.invalid> wrote:
Robert Fraser wrote:
It uses the same console application to do the displaying/execution.
And,
yes, this application sucks (ever done any serious copy/paste in it?)

That works fine for me if I enable Quick edit mode in the options.  Then
the
right mouse button will do both copy and paste.

Except it only does block-oriented rectangular selection, which is odd
for something that is primarily line-oriented.

Yeah, that's true.  Pretty stupid.

My main problem is that you can't do it just with the keyboard, which is my
standard method. I also take issue with the fact you can't copy more than is
visible on a single screen, which goes along with the block selection mode.

By the way I tried running powershell as a tab inside the Console2
prog I mentioned before and it does work fine.

--bb

Oct 26 2008
Yigal Chripun <yigal100 gmail.com> writes:
Bill Baxter wrote:
On Sat, Oct 25, 2008 at 8:57 PM, Yigal Chripun <yigal100 gmail.com> wrote:
Steven Schveighoffer wrote:
"Bill Baxter" wrote
On Sat, Oct 25, 2008 at 6:37 AM, ore-sama <spam here.lot> wrote:
Bill Baxter Wrote:

(like I haven't been able to figure out how to get the
DOS console in Windows to display UTF-8)

Console is a legacy technology (you even still call it "DOS"), why expect
features from it?

So tell me what the alternative is?  I had trouble with running D
tools from a Cygwin shell.  Can't remember if I tried MSYS or not.

Any text-based program uses the same Windows console (unless it's a GUI
application, and it uses controls to create a text box, etc).  Including
cygwin shell.

To say it's a legacy technology is like saying Linux is a legacy technology
because it's command line based.  It's a false experience promoted by
Microsoft to try and spread FUD about OSes that mainly support command line
tools, like Linux.  But command line tools are extremely useful and
powerful, much easier to develop, and IMO easier to use.  For instance, if
you want to find all files that contain a certain text, grep -R text / and
you're done.  On windows it's 'click the start menu, select search, wait for
the search window to pop up, click on the dog, etc'.  Freaking annoying if

Anyone using a shell for Windows that works and supports UTF-8 properly?

I would guess it should work properly, most everything in windows supports
unicode.  Perhaps you have some configuration setting not set properly?  I'd
suggest searching msdn.

-Steve

PowerShell is GUI based as well.

suspect.  What makes you say it's GUI based?  It has the exact same
decorations and goofy menu options as a regular non-GUI Windows
console.  If it were really a GUI, I doubt they would go through the
extra programming effort required to make it look *exactly* like a
console app.

--bb

I've just checked (it's been a long time since I used it) and you're
correct. I don't know why I remembered it as being GUI based; maybe the
blue color threw me off. Sorry for the confusion. But I'm sure that
there are 3rd party GUI based shells for Windows.

Oct 27 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Yigal Chripun wrote:
Steven Schveighoffer wrote:
"Bill Baxter" wrote
On Sat, Oct 25, 2008 at 6:37 AM, ore-sama <spam here.lot> wrote:
Bill Baxter Wrote:

(like I haven't been able to figure out how to get the
DOS console in Windows to display UTF-8)

Console is a legacy technology (you even still call it "DOS"), why expect
features from it?

So tell me what the alternative is?  I had trouble with running D
tools from a Cygwin shell.  Can't remember if I tried MSYS or not.

Any text-based program uses the same Windows console (unless it's a GUI
application, and it uses controls to create a text box, etc).  Including
cygwin shell.

To say it's a legacy technology is like saying Linux is a legacy technology
because it's command line based.  It's a false experience promoted by
Microsoft to try and spread FUD about OSes that mainly support command line
tools, like Linux.  But command line tools are extremely useful and
powerful, much easier to develop, and IMO easier to use.  For instance, if
you want to find all files that contain a certain text, grep -R text / and
you're done.  On windows it's 'click the start menu, select search, wait for
the search window to pop up, click on the dog, etc'.  Freaking annoying if

Anyone using a shell for Windows that works and supports UTF-8 properly?

I would guess it should work properly, most everything in windows supports
unicode.  Perhaps you have some configuration setting not set properly?  I'd
suggest searching msdn.

-Steve

The Windows console, AKA the DOS box, *is* in fact legacy technology. It has
been replaced by MS PowerShell, which is based on C#. They actually took many
ideas from Linux and incorporated them into it.

Windows has gotten a lot better in the recent times - ever since it
finally started to imitate Unix :o).

Also, it doesn't have to be an either/or situation regarding CLI vs GUI.
There's Apple's Quicksilver (IIRC the name), which is a GUI app with a
CLI-like interface. It has the best of both worlds. PowerShell is GUI
based as well. IMO, CLI should be provided as just a widget in the GUI
world and not a separate entity.

I'm not sure I understand. Widget in the GUI = a window with text in it
living side by side, or embedded with, graphical windows? That's been
the case for a long time.

Andrei

Oct 25 2008
ore-sama <spam here.lot> writes:
Bill Baxter Wrote:

On Sat, Oct 25, 2008 at 6:37 AM, ore-sama <spam here.lot> wrote:
Bill Baxter Wrote:

(like I haven't been able to figure out how to get the
DOS console in Windows to display UTF-8)

Console is a legacy technology (you even still call it "DOS"), why expect
features from it?

So tell me what the alternative is?  I had trouble with running D
tools from a Cygwin shell.  Can't remember if I tried MSYS or not.

A GUI, of course. MSYS's console is in fact a GUI.

Oct 25 2008
ore-sama <spam here.lot> writes:
Bill Baxter Wrote:

import std.stdio;
void main(string[] args) {  writefln("Args: %s", args); }

And passing it some wildcards.  It never expands anything.  Only thing
it does do is mess with quotes some.  Here's an example:

C:\> args.exe * "C:\Program Files" *.* c:\*
Args: [args,*,C:\Program Files,*.*,c:\*]

It's not Windows; it's the program's standard startup module that gets the
command line with GetCommandLine() and parses it into string[] args.
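
Since nothing expands the wildcards before main() sees them, a program that wants expansion has to do it itself. A minimal sketch of that, assuming Phobos's std.file.dirEntries and std.path are acceptable; expandArg is a hypothetical helper name, not something from the posts above:

```d
import std.algorithm.searching : canFind;
import std.file : dirEntries, SpanMode;
import std.path : baseName, dirName;
import std.stdio : writefln;

/// Hypothetical helper: expands one argument if it contains wildcard
/// characters; otherwise returns it unchanged.
string[] expandArg(string arg)
{
    if (!arg.canFind('*') && !arg.canFind('?'))
        return [arg];

    string[] matches;
    // Split "dir/pattern" into a directory and a glob pattern.
    // dirName yields "." when the argument has no directory part.
    foreach (entry; dirEntries(dirName(arg), baseName(arg), SpanMode.shallow))
        matches ~= entry.name;

    // Like most Unix shells, keep the literal argument when nothing matches.
    return matches.length ? matches : [arg];
}

void main(string[] args)
{
    string[] expanded;
    foreach (arg; args[1 .. $])
        expanded ~= expandArg(arg);
    writefln("Args: %s", expanded);
}
```

Only the last path component is treated as a pattern here, the matched names come back with their directory part attached, and dirEntries throws if the directory does not exist, so real code would probably want to catch that.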

Oct 25 2008
ore-sama <spam here.lot> writes:
Bill Baxter Wrote:

I did that but "type <filewith-utf8.txt>"  still prints garbage.

--bb

If an application prints garbage, this indicates that it's implemented
incorrectly or is not encoding-aware. A correctly implemented application
should transcode text to the OCP before printing to the console. This is what
std.stdio.writef is supposed to do.
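
A rough sketch of what that transcoding step could look like when done by hand, assuming "OCP" means the console's active output code page as reported by GetConsoleOutputCP, and assuming std.windows.charset.toMBSz is used for the conversion. consoleWrite is a hypothetical name, and this is not what writef itself does internally:

```d
import std.stdio : writeln;

version (Windows)
{
    import core.stdc.stdio : printf;
    import core.sys.windows.windows : GetConsoleOutputCP;
    import std.windows.charset : toMBSz;

    /// Hypothetical helper: converts UTF-8 text to the console's active
    /// output code page and prints it followed by a newline.
    void consoleWrite(string utf8Text)
    {
        // Characters the code page cannot represent may be substituted
        // by the operating system during conversion.
        printf("%s\n", toMBSz(utf8Text, GetConsoleOutputCP()));
    }
}
else
{
    /// On other systems the terminal is assumed to accept UTF-8 directly.
    void consoleWrite(string utf8Text)
    {
        writeln(utf8Text);
    }
}

void main()
{
    consoleWrite("a × b ⋅ c");
}
```

Whether a given symbol survives depends entirely on the code page and the console font, which is roughly the complaint being made about "type" above.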

Oct 25 2008
Kevin Bealer <kevinbealer gmail.com> writes:
Andrei Alexandrescu Wrote:

Please vote up before the haters take it down, and discuss:

Andrei

I think this is a bad idea -- there are a lot of places that don't use Unicode
or don't support 8 bit clean translation, and the operators in question would
be a pain to use every time they were needed, since there is no obvious way to
type them.  And I don't just mean organizations that drag their feet, but also
special cases within every new technology that have these blind spots.  Does
your cell phone web browser correctly display these symbols?  Does the program
"less" display these correctly?  If you think it's just a matter of time,
maybe, but consider that IBM still uses EBCDIC internally in mainframes.

A lot of languages using only punctuation-based syntax are already hard to read
because of it, e.g. Perl can be very hard to read in some cases.  Using the
word "and" would make a lot of languages easier to read than using "&&".  The
standardized meanings should be kept, but I would favor something like
$( stuff )$, $[ more stuff ]$ and so on rather than using special unicode
tokens.

We're not using "#" and "$" effectively.  I would favor something like "$" to
modify bracket usage and "#text" to indicate special symbols as an extension
of the #line and #function directives.  If ".operation" is good enough for
every method call, then why make \$ or # or   into extension points,
rather than importing thousands of individual extension operators that are only
Kevin

Oct 25 2008
Alix Pexton <alixD.TpextonNO SPAMgmailD.Tcom> writes:
Andrei Alexandrescu wrote:
Please vote up before the haters take it down, and discuss:

_in_d_similarly_to/

Andrei

I've been following this thread without really having an opinion to
offer, but I just had a thought...

We already know that D's CTFE and templates can be used together to
parse DSLs (matrix ops, regular expressions and IIRC Scheme too) and
turn them into optimal native code. That suggests to me that it is
already possible to write D code that can turn an expression written in
established mathematical/scientific notation (complete with unicode
symbols) into either conventional D code, or machine code.

What I am not sure of is whether it would be possible to make it general
enough to work with all mathematical dialects (I seem to remember some
overlapping in ways that might be problematic). A complete solution
would have to be able to define new operators (including their
associativity and precedence) in such a way that they can be looked up
by the templates that evaluate the expression.
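
To make the idea concrete, here is a toy sketch of a CTFE function that rewrites Unicode operators into ordinary function calls, with the result fed to a string mixin. Everything in it (the operators table, translate, lastIndex, Vec3, dot, cross) is a hypothetical name made up for illustration. Precedence is encoded by table order, all operators are treated as left-associative, and parentheses are not handled at all:

```d
import std.string : strip;

// Hypothetical operator table, loosest-binding operator first.
enum string[2][] operators = [["⋅", "dot"], ["×", "cross"]];

/// Byte index of the last occurrence of `needle` in `s`, or -1 if absent.
ptrdiff_t lastIndex(string s, string needle)
{
    for (ptrdiff_t i = cast(ptrdiff_t) s.length - cast(ptrdiff_t) needle.length;
         i >= 0; --i)
    {
        if (s[i .. i + needle.length] == needle)
            return i;
    }
    return -1;
}

/// Rewrites "lhs OP rhs" into "name(lhs, rhs)", recursively.
/// Splitting at the LAST occurrence makes each operator left-associative.
string translate(string expr)
{
    foreach (op; operators)
    {
        auto pos = lastIndex(expr, op[0]);
        if (pos >= 0)
            return op[1] ~ "(" ~ translate(expr[0 .. pos]) ~ ", "
                 ~ translate(expr[pos + op[0].length .. $]) ~ ")";
    }
    return expr.strip; // no operator left: a plain operand
}

struct Vec3 { double x, y, z; }

double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

Vec3 cross(Vec3 a, Vec3 b)
{
    return Vec3(a.y * b.z - a.z * b.y,
                a.z * b.x - a.x * b.z,
                a.x * b.y - a.y * b.x);
}

// The translation runs entirely at compile time.
static assert(translate("a ⋅ b × c") == "dot(a, cross(b, c))");

void main()
{
    auto a = Vec3(1, 0, 0), b = Vec3(0, 1, 0), c = Vec3(0, 0, 1);
    double volume = mixin(translate("a ⋅ b × c")); // scalar triple product
    assert(volume == 1);
}
```

The table is the part that would have to grow into the "look up associativity and precedence" mechanism described above; a real version would need a proper tokenizer and per-operator associativity rather than a blanket rule.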

Another related thought I had: Would it be possible to write a
compile-time parser that turned MathML into code? I'm not even sure if
MathML is structured enough to represent the underlying meaning of an
expression rather than just its graphical form. Perhaps it would be more
interesting to write the code that did the transformation in the opposite
direction, turning expressions written in D into MathML ^^

A...

Oct 26 2008