digitalmars.D.bugs - [Issue 3193] New: Wrong processing by DMD.exe of Russian Windows-1251 character set: "invalid UTF-8 sequence"
- d-bugmail puremagic.com (30/30) Jul 20 2009 http://d.puremagic.com/issues/show_bug.cgi?id=3193
http://d.puremagic.com/issues/show_bug.cgi?id=3193

Summary: Wrong processing by DMD.exe of Russian Windows-1251 character set: "invalid UTF-8 sequence"
Product: D
Version: unspecified
Platform: x86
OS/Version: Windows
Status: NEW
Keywords: diagnostic, wrong-code
Severity: critical
Priority: P2
Component: DMD
AssignedTo: nobody puremagic.com
ReportedBy: ok96 mail.ru

If you compile the hello.d example with Russian Windows-1251 characters in this line:

    printf("Привет, D!\n");

dmd.exe reports an error:

    D:\Apps\Prog_D\dmd\samples\d>dmd hello.d
    hello.d(5): invalid UTF-8 sequence
    hello.d(5): invalid UTF-8 sequence
    hello.d(5): invalid UTF-8 sequence
    hello.d(5): invalid UTF-8 sequence
    hello.d(5): invalid UTF-8 sequence
    hello.d(5): invalid UTF-8 sequence

If you save hello.d in UTF-8, dmd.exe still compiles it wrong (see http link).

--
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jul 20 2009
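A note on the "invalid UTF-8 sequence" errors above. DMD reads source as UTF-8. In Windows-1251, the Cyrillic letters (apart from Ё/ё) occupy bytes 0xC0-0xFF. In UTF-8 those values are either lead bytes or bytes that are never valid, and a lead byte must be followed by continuation bytes in 0x80-0xBF. A run of Cyrillic letters therefore almost never decodes. The sketch below is a simplified, hypothetical checker written in Python for illustration only. The name `first_invalid_utf8` is made up, and this is not DMD's lexer code. It checks only the lead/continuation structure and does not reject overlong three- or four-byte forms or surrogates.

```python
def first_invalid_utf8(data: bytes) -> int:
    """Return the offset of the first byte that starts a malformed UTF-8
    sequence, or -1 if the whole buffer is well-formed.

    Simplified: checks lead/continuation byte structure only, so overlong
    three- and four-byte forms and surrogates are not rejected.
    """
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            length = 1            # ASCII
        elif 0xC2 <= b <= 0xDF:
            length = 2            # lead byte of a two-byte sequence
        elif 0xE0 <= b <= 0xEF:
            length = 3            # lead byte of a three-byte sequence
        elif 0xF0 <= b <= 0xF4:
            length = 4            # lead byte of a four-byte sequence
        else:
            return i              # stray continuation byte, or 0xC0/0xC1/0xF5-0xFF
        if i + length > n:
            return i              # sequence truncated at end of buffer
        if any(not 0x80 <= data[j] <= 0xBF for j in range(i + 1, i + length)):
            return i              # lead byte not followed by continuation bytes
        i += length
    return -1


line = 'printf("Привет, D!\\n");'
print(first_invalid_utf8(line.encode("cp1251")))  # 8: 0xCF ('П') is followed by 0xF0
print(first_invalid_utf8(line.encode("utf-8")))   # -1
```

The ASCII prefix `printf("` passes. The first Cyrillic byte is where decoding fails.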
Jarrett Billingsley <jarrett.billingsley gmail.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC   |        |jarrett.billingsley gmail.com

2009-07-20 06:09:18 PDT ---
The compiler does not understand Windows-1251, so this is according to spec. However, you say the compiler compiles it wrong if it's in UTF-8; where's the link?
Jul 20 2009
Created an attachment (id=428)
 --> (http://d.puremagic.com/issues/attachment.cgi?id=428)
This screenshot is from Chris Miller
Jul 20 2009
2009-07-20 07:52:38 PDT ---
Sorry, this is invalid. To solve this, you have to do the following:

1) Set cmd.exe's font to Lucida Console.
2) Execute 'chcp 65001'.

Then run your program.
Jul 20 2009
2009-07-20 07:53:22 PDT ---
Created an attachment (id=429)
 --> (http://d.puremagic.com/issues/attachment.cgi?id=429)
Correct Russian output

Here's an image that shows it working properly.
Jul 20 2009
Matti Niemenmaa <matti.niemenmaa+dbugzilla iki.fi> changed:

What       |Removed |Added
----------------------------------------------------------------------------
Status     |NEW     |RESOLVED
Resolution |        |INVALID
Jul 20 2009
Oleg Halzov <ok96 mail.ru> changed:

What       |Removed  |Added
----------------------------------------------------------------------------
Status     |RESOLVED |REOPENED
Resolution |INVALID  |

But Jarrett, almost everybody who codes in Russian needs the Windows-1251 code page by default. If we need to compile a small program and we don't have a robust IDE, we use notepad.exe (or something like it), which saves Russian text in Windows-1251. And nobody will change their default "Command Prompt" font to Lucida Console just for my small program, I assure you!

Other compilers (Pascal, C, C++) understand that Russian text on Windows is in Windows-1251!

Currently I don't have any good editor for D where I can properly edit Russian text in UTF-8. Entice Designer has a bug, confirmed by Chris Miller: you cannot type Russian text, only copy and paste it.

Therefore, if you build a D compiler for the Win32 platform, you have to make it work with the widely used regional code pages, because the whole world is not English-only, and not entirely UTF-8!
Jul 20 2009
Jarrett Billingsley <jarrett.billingsley gmail.com> changed:

What     |Removed                     |Added
----------------------------------------------------------------------------
Keywords |diagnostic, wrong-code      |
Summary  |Wrong processing by DMD.exe |Support Windows-1251 as a
         |of Russian Windows-1251     |source encoding
         |character set: "invalid     |
         |UTF-8 sequence"             |
Severity |critical                    |enhancement

2009-07-20 23:11:34 PDT ---
What you're basically asking for is an enhancement. I'm sorry, but that's the way it works.
Jul 20 2009
Stewart Gordon <smjg iname.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC   |        |smjg iname.com

Why not make this enhancement request "Write a decent, free, Unicode-compatible code editor that syntax-highlights D properly"?
Jul 21 2009
2009-07-21 17:07:58 PDT ---
> Why not make this enhancement request "Write a decent, free,
> Unicode-compatible code editor that syntax-highlights D properly"?

Why not be a sarcastic ass _all the time_?
Jul 21 2009
Dear friends,

D has really good ideas behind it, but Unicode support (UTF-16) in the compiler instead of the old UTF-8 is a "MUST HAVE" feature for Russian Windows programmers. Otherwise the D compiler will stay an experiment forever.
Jul 22 2009
---
> Dear friends, D has really good ideas behind its face, but Unicode support
> (UTF-16) in the compiler instead of old UTF-8 is "MUST HAVE" feature.

DMD already supports UTF-16. Even UTF-32. Why do you want UTF-8 support removed?

> for Russian Windows programmers. Otherwise D compiler will stay an
> experiment forever.

How would supporting codepages work anyway? Would they be converted to UTF-8 at compile time? In this case, D would need some form of character encoding declaration. Or would they be left as they are, and be rejected only in wchar, wchar[], dchar and dchar[] literals? What about all the D features and APIs that rely on char[] being UTF-8?

Seriously, if you're going to code in D and need to use non-ASCII characters, it goes without saying that you should have a Unicode-compatible editor. The lack of good D editors may be a real issue at the moment, but AISI it makes little sense to try to work around it. No programming language is born with high-quality development tools. People need to write them.

(That said, there have been a few dedicated D IDE projects. What's the highest stage of development any of them is at?)
Jul 22 2009
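Stewart's first option, converting a code-page file to UTF-8 before lexing, might look roughly like the Python sketch below. It is hypothetical. The function `cp1251_to_utf8` and the idea of running it as a pre-lexing step are assumptions for illustration. The mapping is deliberately partial. It covers ASCII, the 64 basic Cyrillic letters at 0xC0-0xFF (which map in order onto U+0410-U+044F), and Ё/ё at 0xA8/0xB8. Any other byte raises an error rather than being guessed.

```python
def cp1251_to_utf8(src: bytes) -> bytes:
    """Hypothetical pre-lexing step: re-encode Windows-1251 source as UTF-8.

    Partial mapping for illustration: ASCII, А-я (0xC0-0xFF) and Ё/ё.
    Any other byte raises ValueError instead of being guessed at.
    """
    chars = []
    for offset, b in enumerate(src):
        if b < 0x80:
            cp = b                        # ASCII maps to itself
        elif b >= 0xC0:
            cp = 0x0410 + (b - 0xC0)      # А..я occupy U+0410..U+044F in order
        elif b == 0xA8:
            cp = 0x0401                   # Ё
        elif b == 0xB8:
            cp = 0x0451                   # ё
        else:
            raise ValueError(f"byte 0x{b:02X} at offset {offset} is not handled")
        chars.append(chr(cp))
    return "".join(chars).encode("utf-8")


source = 'printf("Привет, D!\\n");'.encode("cp1251")
print(cp1251_to_utf8(source).decode("utf-8"))  # printf("Привет, D!\n");

# The same byte is a different letter in code page 866:
print(bytes([0xE0]).decode("cp1251"), bytes([0xE0]).decode("cp866"))  # а р
```

The last line shows why an encoding declaration or a command-line flag would be needed. Nothing in the bytes themselves says which table applies, so the compiler cannot pick one on its own.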
Stewart,

Windows compilers SHOULD understand and correctly convert regional characters for the console and for dialogs (from resource files). The simplest test for a compiler on Windows is to enter text in a regional language in notepad.exe and try to compile the file. The MS VCPP compiler, the BCC compiler and any other C++ compiler do it.

And if DMD supports UTF-16, then how do I make it work with UTF-16 Russian text entered in the simplest editor, Notepad?
Jul 22 2009
Walter Bright <bugzilla digitalmars.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC   |        |bugzilla digitalmars.com

02:32:23 PDT ---
> And if DMD supports UTF-16 then how to make it work with UTF-16 Russian text
> entered in the simplest Notepad editor?

DMD will automatically detect and work correctly with UTF-16 and UTF-32 encoded source files. The logic to do this is in module.c of the compiler source code. If it does not work with a particular UTF-16 encoded file, please attach that file to this bug report.

Note that UTF-16 encoded files are not encoded using a code page. If a source file is encoded with a particular code page, there is no way for the compiler to automatically detect it. C compilers often have a command line flag which is used to tell it what code page to use. Using code pages, therefore, makes your source code completely non-portable, which is one of the reasons why D uses Unicode instead.
Jul 22 2009
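Walter refers to the detection logic in module.c. The Python sketch below shows the general idea of byte-order-mark (BOM) based detection. It is not a transcription of module.c. The fallback for files without a BOM is an assumption of this sketch: it supposes that the first character of a D source file is ASCII and looks at where the zero bytes fall. The function name `detect_source_encoding` and its returned Python codec names are illustrative only.

```python
def detect_source_encoding(buf: bytes) -> str:
    """Guess a source file's Unicode encoding from its first bytes.

    Hypothetical sketch, not DMD's module.c. Returns a Python codec name;
    it only names the encoding, and stripping any BOM is left to the caller.
    """
    # Four-byte marks first: the UTF-32LE BOM (FF FE 00 00) starts with
    # the UTF-16LE BOM (FF FE).
    boms = [
        (b"\x00\x00\xfe\xff", "utf-32-be"),
        (b"\xff\xfe\x00\x00", "utf-32-le"),
        (b"\xef\xbb\xbf", "utf-8"),
        (b"\xfe\xff", "utf-16-be"),
        (b"\xff\xfe", "utf-16-le"),
    ]
    for bom, name in boms:
        if buf.startswith(bom):
            return name
    # No BOM: assume the first character is ASCII and see where the zero
    # bytes around it fall.
    if len(buf) >= 4 and buf[:3] == b"\x00\x00\x00" and buf[3] != 0:
        return "utf-32-be"
    if len(buf) >= 4 and buf[0] != 0 and buf[1:4] == b"\x00\x00\x00":
        return "utf-32-le"
    if len(buf) >= 2 and buf[0] == 0 and buf[1] != 0:
        return "utf-16-be"
    if len(buf) >= 2 and buf[0] != 0 and buf[1] == 0:
        return "utf-16-le"
    return "utf-8"  # no BOM and no zero bytes: treated as UTF-8


print(detect_source_encoding("module hello;".encode("utf-16-le")))  # utf-16-le
print(detect_source_encoding("// Привет".encode("cp1251")))         # utf-8
```

A Windows-1251 file has no BOM and no zero bytes, so it falls through to the UTF-8 default. Nothing in this scheme can tell it apart from UTF-8 or from any other 8-bit code page.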
Created an attachment (id=431)
 --> (http://d.puremagic.com/issues/attachment.cgi?id=431)
D Windows Unicode text - edit in Notapad and compile result in Console
Jul 22 2009
Created an attachment (id=432)
 --> (http://d.puremagic.com/issues/attachment.cgi?id=432)
D UTF-8 text - edit in Notapad and compile result in Console
Jul 22 2009
Created an attachment (id=433)
 --> (http://d.puremagic.com/issues/attachment.cgi?id=433)
D ANSI text with Russian letters - edit in Notapad and compile result in Console
Jul 22 2009
Oleg Halzov <ok96 mail.ru> changed:

What        |Removed                     |Added
----------------------------------------------------------------------------
description |edit in Notapad and compile |edit in Notepad and compile
            |result in Console           |result in Console
Jul 22 2009
Oleg Halzov <ok96 mail.ru> changed:

What        |Removed                     |Added
----------------------------------------------------------------------------
description |Notapad and compile result  |Notepad and compile result
            |in Console                  |in Console
Jul 22 2009
Oleg Halzov <ok96 mail.ru> changed:

What        |Removed                     |Added
----------------------------------------------------------------------------
description |letters - edit in Notapad   |letters - edit in Notepad
            |and compile result in       |and compile result in
            |Console                     |Console
Jul 22 2009
Dear Walter,

Please take a close look at my last 3 attachments, which have "edit in Notepad and compile result in Console" in their descriptions.

Note that all Russians have code page 866 by default in the Windows Command Prompt. Nobody will switch from 866 to any other code page for a console application.
Jul 22 2009
---
> Dear Walter,
> Please take a close look at my last 3 attachements having "edit in Notepad
> and compile result in Console" text in descriptions.

Going by your screenshots and their descriptions, DMD is behaving correctly.

> Note that all Russians have 866 codepage by default in Windows Command Prompt.

You mean it's hard-coded for each language's edition of Windows? That's something else that ought to change.

> Nobody will be switching 866 to any other codepage for console application.

Console output is an entirely separate issue from source encoding. I feel that D's stdio ought to support codepages, but it doesn't (aside from the fact that printf isn't part of D's stdio). Meanwhile, please check out my utility library:
http://pr.stewartsplace.org.uk/d/sutil/
Jul 22 2009
Andrei Alexandrescu <andrei metalanguage.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC   |        |andrei metalanguage.com

18:10:55 PDT ---
I think support for codepages and other character types could be implemented in a library. That was the ambitious purpose behind std.encoding. Yet another great project for someone interested.
Jul 24 2009
PDT ---
As to console output, it's a duplicate of (runtime) bug 2742 or bug 1448. Tango and the C API work correctly; Phobos doesn't.

As to cp1251, this ice-age technology is definitely not the way to go; Unicode is the future. No, it's the present. Windows works in Unicode and you should use it.

As to conversion of source from the ANSI to the OEM code page, it's a valid RFE, but it's unlikely anyone will implement it. You can try it yourself.
Jul 27 2009
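The ANSI-to-OEM conversion mentioned above maps Windows-1251, which Notepad saves on a Russian system, to code page 866, which Oleg says the Russian console uses by default. For the Cyrillic letters, the two tables differ by fixed offsets, so a partial converter is short. Without such a conversion, a byte such as 0xCF ('П' in 1251) appears on an 866 console as a box-drawing character. This Python sketch is hypothetical, and `cp1251_to_cp866` is a made-up name. It covers only ASCII, А-я and Ё/ё, and it replaces every other byte with '?'.

```python
def cp1251_to_cp866(data: bytes) -> bytes:
    """Convert Windows-1251 ("ANSI") bytes to code page 866 ("OEM").

    Partial sketch: handles ASCII, the letters А-Я/а-я and Ё/ё only;
    every other byte is replaced with '?'.
    """
    out = bytearray()
    for b in data:
        if b < 0x80:               # ASCII is identical in both code pages
            out.append(b)
        elif 0xC0 <= b <= 0xEF:    # А-Я and а-п: 1251 0xC0-0xEF -> 866 0x80-0xAF
            out.append(b - 0x40)
        elif b >= 0xF0:            # р-я: 1251 0xF0-0xFF -> 866 0xE0-0xEF
            out.append(b - 0x10)
        elif b == 0xA8:            # Ё
            out.append(0xF0)
        elif b == 0xB8:            # ё
            out.append(0xF1)
        else:
            out.append(ord("?"))
    return bytes(out)


print(cp1251_to_cp866("Привет".encode("cp1251")).hex())  # 8fe0a8a2a5e2
```

The letters split into two blocks in 866 (а-п at 0xA0, р-я at 0xE0) because the box-drawing characters sit between them. That split is why two different offsets are needed.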
---
> As to console output, it's a duplicate of (runtime) bug 2742 or bug 1448.

This is getting off-topic for this bug report, but what this conversation has drifted into is related to 2742. 1448 is a separate issue.

> As to convertion of source from ANSI to OEM codepage, it's valid RFE, but
> hardly one will implement it. You can try yourself.

I already have. See comment 17.
Jul 27 2009
PDT ---
Hmm... your library is just an API; it has nothing to do with source encoding, and as far as I can see it accepts UTF-8 text, not ANSI.
Jul 28 2009
---
> Hmm... your library is just an API, it has nothing to do with source encoding

As has a lot of the discussion here from comment 13 onwards. Maybe, to avoid confusion, we should continue this conversation at bug 2742. Or perhaps even better, on the newsgroup.

> and as far as I see it accepts utf8 text, not ANSI.

Not quite. It communicates with the console in the console codepage. Application code communicates with it in UTF-8.
Jul 28 2009
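Stewart describes a split where application code talks to his library in UTF-8 and the library talks to the console in the console's code page. That split can be illustrated with a small Python sketch. `ConsoleWriter` is a hypothetical class, not the API of his D library. It decodes UTF-8 and re-encodes the text in a configurable code page. The default is cp866, matching the Russian console default mentioned in the thread. Characters the code page cannot represent are replaced with '?'. An in-memory buffer stands in for the real console.

```python
import io


class ConsoleWriter:
    """Hypothetical output layer: accepts UTF-8 from the application and
    writes bytes in the console's code page."""

    def __init__(self, stream, codepage="cp866"):
        self.stream = stream      # any binary stream, e.g. sys.stdout.buffer
        self.codepage = codepage  # Python codec name for the console code page

    def write_utf8(self, data: bytes) -> None:
        text = data.decode("utf-8")  # raises UnicodeDecodeError on malformed UTF-8
        # Characters missing from the code page are written as '?'.
        self.stream.write(text.encode(self.codepage, errors="replace"))


# Usage: an in-memory buffer stands in for the console.
console = io.BytesIO()
writer = ConsoleWriter(console)
writer.write_utf8("Привет, D!\n".encode("utf-8"))
writer.write_utf8("€".encode("utf-8"))  # not in cp866, so it becomes '?'
```

Keeping the program's own strings in UTF-8 and converting only at the output boundary lets the source stay portable while the console keeps its code page.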