digitalmars.D - national language support

novice <novice_member pathlink.com> writes:
Hi.
Can I "switch off" UTF-8 support in the dmd compiler?

My localized Windows (it's the Russian version, but IMHO it's the same for many other
European languages) has no UTF-8 support. I use (and IMHO so do other European users)
an 8-bit code page. The lower 128 symbols are ASCII. The upper 128 symbols are national
symbols.
But dmd wants UTF-8 everywhere. So no comments in Russian, no string constants in
Russian - "invalid UTF-8 sequence" compiler error.
I have never seen a programming language on Windows with such restrictions before D :(
C, Delphi, Perl - none of them needs a UTF-8 or 16-bit Unicode editor.
And such editors are rare on Windows.

Maybe I don't understand something? Some D compiler options?
Sep 30 2004
Arcane Jill <Arcane_member pathlink.com> writes:
In article <cjgqiq$2ae3$1 digitaldaemon.com>, novice says...
Hi.
Can I "switch off" UTF-8 support in the dmd compiler?

No. And believe me - you don't want to.
My localized Windows (it's the Russian version, but IMHO it's the same for many other
European languages) has no UTF-8 support. I use (and IMHO so do other European users)
an 8-bit code page. The lower 128 symbols are ASCII. The upper 128 symbols are national
symbols.

Your local codepage is not relevant to D.
But dmd wants UTF-8 everywhere.

True. Or UTF-16, or UTF-32.
So no comments in Russian,

Not true. By its very nature, UTF-8 allows comments in Russian. It also allows comments in Greek, Arabic, Hebrew, Chinese, Japanese, and - well - /everything/.
no string constants in
Russian

Not true. Same answer as above.
- "invalid UTF-8 sequence" compiler error.

Your error report is genuine. You must save your D source files in UTF-8, UTF-16 or UTF-32 before compiling them. If you do this, you can insert all international characters directly into your source code.

The trick is this - when you save your source files, select "Save As" instead of "Save". Then find the pull-down menu for "Encoding". Select "UTF-8". Your compile-time errors will then go away.
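For an editor that cannot save UTF-8 itself, the same conversion can be done on the file afterwards. A minimal sketch in Python, assuming the file was saved in the Windows-1251 (Russian) code page; the function name `reencode_to_utf8` is hypothetical and not part of any D tooling:

```python
from pathlib import Path


def reencode_to_utf8(path, source_encoding="cp1251"):
    """Rewrite a text file in place as UTF-8 (without a BOM).

    source_encoding is an assumption: cp1251 is the usual Russian Windows
    code page. Pass whichever 8-bit code page the editor really used.
    """
    raw = Path(path).read_bytes()
    text = raw.decode(source_encoding)             # 8-bit bytes -> Unicode text
    Path(path).write_bytes(text.encode("utf-8"))   # Unicode text -> UTF-8 bytes
    return text
```

Calling `reencode_to_utf8("hello.d")` on a hypothetical `hello.d` would leave ASCII characters byte-for-byte unchanged and re-encode only the national characters.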
I have never seen a programming language on Windows with such restrictions before D :(

It's not a restriction, it's a liberation. The 8-bit code with which you are familiar will run correctly /only/ for users sharing your Windows code page. The equivalent D program will work for everyone, worldwide, regardless of their code page.
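The code-page dependency is easy to reproduce. A small sketch in Python, assuming the writer uses Windows-1251 and the reader uses Windows-1252 (both chosen only for illustration):

```python
# One byte sequence, two 8-bit code pages, two different texts.
word = "Привет"                       # "Hello" in Russian
raw = word.encode("cp1251")           # bytes an 8-bit Russian editor would save

as_russian = raw.decode("cp1251")     # reader shares the writer's code page
as_western = raw.decode("cp1252")     # reader is on a Western European code page

print(raw.hex())                      # the stored bytes are identical in both cases
print(as_russian == word)             # True: round-trips on the same code page
print(as_western == word)             # False: same bytes, different characters

# UTF-8 has no code-page dependency: the bytes decode to the same text anywhere.
utf8 = word.encode("utf-8")
print(utf8.decode("utf-8") == word)   # True
```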
C, Delphi, Perl - none of them needs a UTF-8 or 16-bit Unicode editor.
And such editors are rare on Windows.

Also not true. Virtually every Windows text editor that exists is capable of saving text in UTF-8. Even Microsoft Notepad can do this. Pretty much all programmers' text editors (e.g. TextPad, jEdit, UltraEdit, EmEditor, ...) can do this.
Maybe I don't understand something? Some D compiler options?

I think the thing you haven't understood is how wonderful Unicode is, and why D supports it in a way that C doesn't. With D, you just insert your international characters directly into the source code, and save it as UTF-8. That source file will then read (and compile) the same for everyone, worldwide. Dependency on locale is gone. Although the concepts may take a little getting used to, believe me - this is a good thing.

Arcane Jill
Sep 30 2004
novice <novice_member pathlink.com> writes:
Thanks, Arcane Jill

believe me - this is a good thing.

Hmm.. Yes, you are right. (But goodbye, my favorite editor.) Sorry for crossposting in two threads.
Sep 30 2004
Stephan Wienczny <Stephan Wienczny.de> writes:
novice wrote:
Thanks, Arcane Jill

believe me - this is a good thing.

Hmm.. Yes, you are right. (But goodbye, my favorite editor.) Sorry for crossposting in two threads.

You could ask the vendor of "my favorite editor" to support UTF!?!

Stephan
Sep 30 2004
David L. Davis <SpottedTiger yahoo.com> writes:
In article <cjgtf6$2c1o$1 digitaldaemon.com>, novice says...
Thanks, Arcane Jill

believe me - this is a good thing.

Hmm.. Yes, you are right. (But goodbye, my favorite editor.) Sorry for crossposting in two threads.

novice: I'm in the same boat...I've got to say farewell to my favorite editor
as well! :(

But the good news is...I found a pretty good replacement for it today, that I'd
like to share with you. ;)

Crimson Editor (a Free "Professional Source Editor")
http://www.crimsoneditor.com/english/

1) Encodings:
- ASCI
- Unicode Little Endian
- Unicode Big Endian
- UTF-8 with BOM
- UTF-8 without BOM
2) Code Syntax-Highlighting for D
3) a Tabbed Multi-Document Interface
4) Toggleable Side Line-Numbers
5) File Formats for:
- DOS/Windows
- Mac
- UNIX

---------------------
I've been checking it out, and it looks and operates rather cleanly.

David L.

-------------------------------------------------------------------
"Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"
Sep 30 2004
Arcane Jill <Arcane_member pathlink.com> writes:
In article <cjiiou$2sc1$1 digitaldaemon.com>, David L. Davis says...

novice: I'm in the same boat...I've got to say farewell to my favorite editor
as well! :(

But the good news is...I found a pretty good replacement for it today, that I'd
like to share with you. ;)

Crimson Editor (a Free "Professional Source Editor")
http://www.crimsoneditor.com/english/

1) Encodings:
- ASCI
- Unicode Little Endian
- Unicode Big Endian
- UTF-8 with BOM
- UTF-8 without BOM

Just for the sake of sheer pedantry, I'd like to point out that Windows misnames encodings.

I'm guessing that "ASCI" was probably a typo for "ANSI" - it means the default local encoding of your PC, and it is /misnamed/, because of course Microsoft's code pages are _not_ ANSI standards. (I believe Microsoft applied, and got rejected).

The encodings named "Unicode Little Endian" and "Unicode Big Endian" are also misnamed, and should in fact be "UTF-16LE" and "UTF-16BE". Again, that's Microsoft getting it wrong. (Windows was designed in the days when Unicode was only 16 bits wide).

Unfortunately, a lot of Windows applications use Microsoft's names.

Arcane Jill
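The difference between UTF-16LE and UTF-16BE, and the reason 16 bits are no longer enough for all of Unicode, can be seen in a few lines. A minimal sketch in Python; the two sample characters are arbitrary choices for illustration:

```python
# UTF-16 stores each 16-bit code unit as two bytes; LE and BE differ only in order.
ya = "\u042f"                          # CYRILLIC CAPITAL LETTER YA
le = ya.encode("utf-16-le")
be = ya.encode("utf-16-be")
print(le.hex())                        # 2f04
print(be.hex())                        # 042f

# A character above U+FFFF does not fit in 16 bits; UTF-16 uses a surrogate pair.
clef = "\U0001d11e"                    # MUSICAL SYMBOL G CLEF
print(clef.encode("utf-16-be").hex())  # d834dd1e
print(len(clef.encode("utf-16-le")))   # 4
```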
Oct 01 2004
Arcane Jill <Arcane_member pathlink.com> writes:
In article <cjivrk$2el$1 digitaldaemon.com>, Arcane Jill says...

Crimson Editor (a Free "Professional Source Editor")
http://www.crimsoneditor.com/english/

1) Encodings:
- ASCI
- Unicode Little Endian
- Unicode Big Endian
- UTF-8 with BOM
- UTF-8 without BOM


I just installed Crimson Editor to check it out. The first named encoding is actually "ASCII" (not "ANSI", which is what I'd suspected). It is still misnamed, however.

I just tried saving a text file containing a Euro currency sign as ASCII using Crimson Editor -- and it succeeded! Examination of the saved file with a binary editor revealed that the saved file contained the single byte 0x80 - in other words, the true encoding was WINDOWS-1252, not ASCII.

I assume that this misnamed encoding is /actually/ your PC's default encoding, whatever that happens to be - same as "ANSI" on other editors. "Default" would be a much more accurate name in both cases.

Don't let that put you off though - Crimson seems like a good editor.

Arcane Jill
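The Euro-sign check can be repeated without a binary editor. A short sketch in Python, assuming its standard `cp1252` and `ascii` codecs as stand-ins for what the editor does internally:

```python
euro = "\u20ac"                        # EURO SIGN

# Windows-1252 has a slot for the Euro sign: the single byte 0x80.
print(euro.encode("cp1252"))           # b'\x80'

# Real ASCII stops at 0x7F, so it cannot encode the Euro sign at all.
try:
    euro.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)                        # False

# For comparison, UTF-8 needs three bytes for the same character.
print(euro.encode("utf-8").hex())      # e282ac
```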
Oct 01 2004