
digitalmars.D - Error: 4invalid UTF-8 sequence

jicman <jicman_member pathlink.com> writes:
Greetings!

admire this complex piece of code: :-)

import std.stdio;
int main(char[][] args)
{
printf("josé" ~ "\n");
writefln("josé");
return (0);
}

when I try to compile it, I get,

16:21:30.97>dmd name.d
name.d(4): invalid UTF-8 sequence
name.d(5): invalid UTF-8 sequence

The 50 cents question is, how can I get rid of it?  The real reason I ask is
that I am downloading a bunch of XML code and some of the names are accented in
different languages, and I am getting this error when I try to print (writefln)
a variable with an accented name.  However, an interesting outcome is that when
I use printf, the above problem is not encountered.  HUH!

Thanks much!

josé

:-)
Feb 21 2005
Lars Ivar Igesund <larsivar igesund.net> writes:
DMD doesn't "understand" non-ASCII chars unless the source file is stored
as UTF-8. Either there's a config setting in your editor that lets you do
it, or you should change editor ASAP :) Note that converting non-UTF-8
files to UTF-8 might produce artifacts.

Lars Ivar Igesund
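
If the editor has no encoding option, the conversion can also be scripted. Below is a minimal sketch in Python (a helper tool, not D code), assuming the file was saved as ISO-8859-1 (Latin-1). The names `latin1_to_utf8` and `convert_file` are made up for illustration. Guessing the wrong source encoding is one way the artifacts mentioned above can appear.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode bytes from ISO-8859-1 (the ASSUMED source encoding) to UTF-8."""
    # Every byte value is a valid Latin-1 character, so decode() never fails.
    # A wrong guess about the source encoding therefore shows up as wrong
    # characters in the output, not as an error.
    return data.decode("iso-8859-1").encode("utf-8")


def convert_file(src_path: str, dst_path: str) -> None:
    """Read src_path as raw bytes and write a UTF-8 copy to dst_path."""
    with open(src_path, "rb") as src:
        converted = latin1_to_utf8(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(converted)
```

Something like `convert_file("name.d", "name_utf8.d")` would then produce a copy to hand to dmd. Writing to a new file keeps the original around in case the encoding guess was wrong.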

Feb 21 2005
Anders F Björklund <afb algonet.se> writes:
jicman wrote:

 admire this complex piece of code: :-)
 
 import std.stdio;
 int main(char[][] args)
 {
 printf("josé" ~ "\n");
 writefln("josé");
 return (0);
 }
 
 when I try to compile it, I get,
 
 16:21:30.97>dmd name.d
 name.d(4): invalid UTF-8 sequence
 name.d(5): invalid UTF-8 sequence

Works For Me:

josé
josé

 The 50 cents question is, how can I get rid of it?

Save your file as UTF-8, and use a UTF-8 console... D *only* supports Unicode, not any legacy encodings.

--anders
Feb 21 2005
jicman <jicman_member pathlink.com> writes:
Anders F Björklund says...

 The 50 cents question is, how can I get rid of it?

Save your file as UTF-8, and use an UTF-8 console...

Ok, this is interesting Windows at its best! I have to completely retype that whole program! :-) Not a good thing. Ok, thanks.
Feb 21 2005
"Regan Heath" <regan netwin.co.nz> writes:
On Mon, 21 Feb 2005 22:27:04 +0000 (UTC), jicman  
<jicman_member pathlink.com> wrote:
 Anders F Björklund says...

 The 50 cents question is, how can I get rid of it?

Save your file as UTF-8, and use an UTF-8 console...

Ok, this is interesting Windows at its best! I have to completely retype that whole program! :-) Not a good thing.

What? Why? Can't you open it, then do a save-as, or copy/paste into another editor then do a save-as?

Regan
Feb 21 2005
jicman <jicman_member pathlink.com> writes:
In article <opsmkl5qje23k2f5 ally>, Regan Heath says...

 Ok, this is interesting Windows at its best!  I have to completely retype that
 whole program! :-)  Not a good thing.

What? Why? Can't you open it, then do a save-as, or copy/paste into another editor then do a save-as?

:-) I know exactly how you said that. :-)

Yes, I tried that. I even opened the same program with Notepad (that's as Windows as Windows can get) and tried to compile it and got the same error. Somehow, my dual keyboard system does not like those accented vowels. I am now searching for a new editor. I love vim, but this is going too far.

Which freeware editors have D syntax highlighting? I am downloading one called Zeus that a D lover had on his page.

thanks.
Feb 21 2005
"Regan Heath" <regan netwin.co.nz> writes:
On Mon, 21 Feb 2005 23:39:37 +0000 (UTC), jicman  
<jicman_member pathlink.com> wrote:
 In article <opsmkl5qje23k2f5 ally>, Regan Heath says...

 Ok, this is interesting Windows at its best!  I have to completely retype that
 whole program! :-)  Not a good thing.

What? Why? Can't you open it, then do a save-as, or copy/paste into another editor then do a save-as?

:-) I know exactly how you said that. :-)

:)
 Yes, I tried that.  I even opened the same program with notepad (that's as
 Windows as Windows can get) and tried to compile it and got the same error.
 Somehow, my dual keyboard system does not like those accented vowels.  I am now
 searching for a new editor.  I love vim, but this is going too far.

I have Windows XP SP2, and...

NotePad will save as:
- Unicode
- Unicode Big Endian
- UTF-8
(see "encoding" drop down in save-as dialog)

WordPad will save as a "unicode document". I'm guessing that means UTF-16, hopefully with a BOM.
(see "save as type" drop down in save-as dialog)

 Which freeware editors have D syntax highlighting?

 I am downloading one called Zeus that a d lover had on his page.

Try: http://www.prowiki.org/wiki4d/wiki.cgi?EditorSupport

Regan
Feb 21 2005
Lars Ivar Igesund <larsivar igesund.net> writes:
jicman wrote:
 In article <opsmkl5qje23k2f5 ally>, Regan Heath says...
 
 
Ok, this is interesting Windows at its best!  I have to completely retype that
whole program! :-)  Not a good thing.

What? Why? Can't you open it, then do a save-as, or copy/paste into another editor then do a save-as?

:-) I know exactly how you said that. :-) Yes, I tried that. I even opened the same program with notepad (that's as Windows as Windows can get) and tried to compile it and got the same error. Somehow, my dual keyboard system does not like those accented vowels. I am now searching for a new editor. I love vim, but this is going too far.

Here is my UTF-part of _vimrc:

set bomb
set ff=unix
set enc=utf-8 fileencodings=

Lars Ivar Igesund
Feb 22 2005
jicman <jicman_member pathlink.com> writes:
In article <cvft8k$s5q$1 digitaldaemon.com>, Lars Ivar Igesund says...
Here is my UTF-part of _vimrc:

set bomb
set ff=unix
set enc=utf-8 fileencodings=

thanks. I didn't have that.
Feb 22 2005
Charles Hixson <charleshixsn earthlink.net> writes:
jicman wrote:
 In article <opsmkl5qje23k2f5 ally>, Regan Heath says...
 
 
Ok, this is interesting Windows at its best!  I have to completely retype that
whole program! :-)  Not a good thing.

What? Why? Can't you open it, then do a save-as, or copy/paste into another editor then do a save-as?

:-) I know exactly how you said that. :-) Yes, I tried that. I even opened the same program with notepad (that's as Windows as Windows can get) and tried to compile it and got the same error. Somehow, my dual keyboard system does not like those accented vowels. I am now searching for a new editor. I love vim, but this is going too far. Which freeware editors have D syntax highlighting? I am downloading one called Zeus that a D lover had on his page. thanks.

And with NEdit you can make one, but that's X Window only. Then there's KEdit, but that's the same story as Kate.

You could look up MultiEdit. That's what I used for odd languages when I was on MSWind. Again, it's a "define your own language" kind of thing. I seem to remember hearing of others, but since they were MSWind only, I ignored them. And again, you would need to define your own language.

I hear that there's a version of KDE for MSWind now, but that seems like an awful lot of work to go to for an editor, and besides, I don't know how well it works. (It's still in the very early days.)

What I did when I started finding MSWind too much of a bother was to get a second disk, and run Linux from that. OTOH, if you don't need to boot frequently you could get a Mepis CD (or Knoppix) and boot from that. I'm pretty sure that Mepis will let you save your files into files on a MSWind partition. (Not certain, though, so a floppy might be needed for certainty.) Still, that's mainly a demo disk. Booting from a CD is SLOW, and again, every time you need to load something that isn't already in RAM everything turns into molasses.
Feb 23 2005
"Regan Heath" <regan netwin.co.nz> writes:
On Mon, 21 Feb 2005 21:23:32 +0000 (UTC), jicman  
<jicman_member pathlink.com> wrote:
 Greetings!

 admire this complex piece of code: :-)

 import std.stdio;
 int main(char[][] args)
 {
 printf("josé" ~ "\n");
 writefln("josé");
 return (0);
 }

 when I try to compile it, I get,

 16:21:30.97>dmd name.d
 name.d(4): invalid UTF-8 sequence
 name.d(5): invalid UTF-8 sequence

 The 50 cents question is, how can I get rid of it?

The 50 cents answer is, ensure your editor is saving the source file as UTF-8, UTF-16 (with a BOM) or UTF-32 (also with a BOM).
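
To see which of those a given file already is, one can look at its first bytes. Here is a rough Python sketch (a side tool, not D) that only reports a byte order mark if one is present. The name `sniff_bom` is illustrative, and a file without a BOM may still be valid UTF-8, so a `None` result says nothing either way.

```python
import codecs

# The 4-byte UTF-32 marks are tested before the 2-byte UTF-16 marks,
# because the UTF-32 LE mark (FF FE 00 00) begins with the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def sniff_bom(data: bytes):
    """Return the name of the BOM that `data` starts with, or None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

Reading the first four bytes of the file in binary mode and passing them to `sniff_bom` is enough for this check.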
 The real reason I ask is that I am downloading a bunch of XML code and some of
 the names are accented in different languages, and I am getting this error when
 I try to print (writefln) a variable with an accented name.  However, an
 interesting outcome is that when I use printf, the above problem is not
 encountered.  HUH!

This is a somewhat complex area, and I'm not sure I have it 100% sorted myself, but I'll give this a go. I _know_ someone will set us both straight if I have it wrong.

Things to consider/know:

- D source files must be saved in UTF encoding.
- on windows your console _might_ be in UTF, it might be in something else i.e. latin-1
- printf is a C function, it is oblivious to UTF etc.
- writef is a D function, it ensures you're writing in UTF.

So, what I suspect is happening to you is either:

1. You're reading these names from something which is not in UTF format.
2. Your source is not in UTF format.

You might see odd results once you get it working. This will be due to your console not being in UTF mode. I don't know how to change console modes; someone else will have to chip in here.

Regan
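
The difference between a function that just copies bytes and one that insists on UTF can be pictured with a small check. This is a hedged Python sketch, not a description of how printf or writef are actually implemented. The helper names are invented, and the sample bytes assume the name was stored as Latin-1.

```python
def is_valid_utf8(data: bytes) -> bool:
    """True when `data` is a well-formed UTF-8 byte sequence."""
    try:
        data.decode("utf-8")  # strict decoding: raises on any malformed byte
        return True
    except UnicodeDecodeError:
        return False


def first_invalid_offset(data: bytes):
    """Byte offset of the first malformed UTF-8 sequence, or None if clean."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


latin1_name = b"jos\xe9"     # "é" as one Latin-1 byte
utf8_name = b"jos\xc3\xa9"   # "é" as two UTF-8 bytes
```

Here `is_valid_utf8(latin1_name)` is false while `is_valid_utf8(utf8_name)` is true. A byte-oblivious function would pass both along untouched, which fits the observation that only the UTF-aware call complains.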
Feb 21 2005
Anders F Björklund <afb algonet.se> writes:
Regan Heath wrote:

 - D source files must be saved in UTF encoding.

One simple such UTF encoding is (escaped) ASCII:
 import std.stdio;
 int main(char[][] args)
 {
   printf("jos\u00e9\n");
   writefln("jos\u00e9");
   return (0);
 }

This source code will "work", even in ISO-8859-*...
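
For longer strings the escapes can be generated rather than typed by hand. A minimal Python sketch (a helper, not D), assuming the text of the string literal is already available as a decoded string. `escape_non_ascii` is an illustrative name. It emits the `\uXXXX` form used above and the eight-digit `\UXXXXXXXX` form for characters beyond U+FFFF.

```python
def escape_non_ascii(text: str) -> str:
    """Replace every non-ASCII character with a \\uXXXX or \\UXXXXXXXX escape."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                # plain ASCII passes through unchanged
        elif cp <= 0xFFFF:
            out.append("\\u%04x" % cp)    # four hex digits
        else:
            out.append("\\U%08x" % cp)    # eight hex digits
    return "".join(out)
```

This is meant for the contents of string literals only. Run over a whole source file it would also rewrite comments, where the escapes have no special meaning.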
 - on windows your console _might_ be in UTF, it might be in something else
 i.e. latin-1

On Linux and other platforms, the console might also be in e.g. Latin-1. If you see something like "josé", then D does not like your console...

Other languages, like C and Java for instance, support other encodings. But D only does Unicode, preferably in the form of the UTF-8 encoding.

On Linux and Mac OS X it is simple to set the console to UTF-8, and if someone could detail the steps needed on Windows that would be great?

I've heard some rumors that the "chcp 65001" command works on Win 2K... (although you might also have to change the default font being used?)

--anders
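
Where the "josé" comes from can be reproduced in a few lines. This is a Python sketch for illustration only. `misread_utf8_as_latin1` is a made-up name, and it assumes the console interprets bytes as ISO-8859-1.

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Show how UTF-8 output looks on a console that assumes Latin-1."""
    # "é" is encoded as the two bytes C3 A9 in UTF-8; a Latin-1 console
    # displays each byte as its own character, "Ã" and "©".
    return text.encode("utf-8").decode("iso-8859-1")
```

Calling `misread_utf8_as_latin1("josé")` gives the five-character string quoted above, so the program's output is fine and only the console's interpretation is off.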
Feb 21 2005
Lars Ivar Igesund <larsivar igesund.net> writes:
Anders F Björklund wrote:

 On Linux and Mac OS X it is simple to set the console to UTF-8, and if
 someone could detail the steps needed on Windows that would be great ?
 
 I've heard some rumors that the "chcp 65001" command works on Win 2K...
 (although you might also have to change the default font being used ?)
 
 --anders

Yep, the 65001 cp is the one for UTF-8. In addition, the console font must be able to display Unicode characters. AFAIK, none of the raster fonts work, which leaves the Lucida Console font as the only feasible alternative on my comp.

Lars Ivar Igesund
Feb 21 2005