
digitalmars.D - INVALID UTF-8 SEQUENCE!

reply Martin <Martin_member pathlink.com> writes:
I just downloaded the new dmd compiler and it tells me INVALID UTF-8 SEQUENCE
when I compile. This is when I use non-English characters in my strings (like
Ö, Ü, Ä). But I need to use them. The old version was just fine, why this change?

Most of the C compilers accept them, why not D?
Aug 18 2004
next sibling parent J C Calvarese <jcc7 cox.net> writes:
In article <cfvh55$2d5s$1 digitaldaemon.com>, Martin says...
I just downloaded the new dmd compiler and it tells me INVALID UTF-8 SEQUENCE
when I compile. This is when I use non-English characters in my strings (like
Ö, Ü, Ä). But I need to use them. The old version was just fine, why this change?

Most of the C compilers accept them, why not D?

What format is your file saved in? From http://www.digitalmars.com/d/lex.html:

  Source Text
  D source text can be in one of the following formats:
  * ASCII
  * UTF-8
  * UTF-16BE
  * UTF-16LE
  * UTF-32BE
  * UTF-32LE

jcc7
Aug 18 2004
prev sibling parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <cfvh55$2d5s$1 digitaldaemon.com>, Martin says...
I just downloaded the new dmd compiler and it tells me INVALID UTF-8 SEQUENCE
when I compile. This is when I use non-English characters in my strings (like
Ö, Ü, Ä). But I need to use them. The old version was just fine, why this change?

I have a sneaking suspicion you might find it will work just fine if you save your source file in UTF-8 before trying to compile it. (Save As...). So far as I know, the D compiler has not changed in this regard (except that it can now auto-detect UTF-16 and UTF-32).
Most of the C compilers accept them, why not D?

Actually, I think most C compilers simply allow a string to consist of an arbitrary sequence of bytes without any interpretation whatsoever - which just happens to appear to work whenever the source file encoding is the same as the run-time encoding. Arcane Jill
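Jill's point is easy to demonstrate outside of D. The following sketch is illustrative only (Python rather than the thread's D): a byte that is perfectly good Latin-1 text is an invalid UTF-8 sequence, so a compiler that passes bytes through uninterpreted only "works" while the writer's and reader's encodings happen to agree.

```python
# A single Latin-1 byte for 'Ö' (0xD6) round-trips fine as Latin-1...
raw = "Ö".encode("latin-1")
print(raw)                      # b'\xd6'
print(raw.decode("latin-1"))    # 'Ö' - writer and reader agree on the encoding

# ...but interpreted as UTF-8, that lone 0xD6 byte is an invalid sequence:
try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    print("invalid UTF-8 sequence")
```

This is exactly the failure mode the compiler now reports at compile time instead of letting it surface (or not) at run time.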
Aug 18 2004
next sibling parent reply Martin <Martin_member pathlink.com> writes:
Thank you for your answer!
I have a sneaking suspicion you might find it will work just fine if you save
your source file in UTF-8 before trying to compile it. (Save As...). 

I am using the gnu midnight commander text editor, it only saves ascii.
So far as I know, the D compiler has not changed in this regard (except that it
can now auto-detect UTF-16 and UTF-32).

I think I changed from version 0.93 to 0.98. In 0.93 my files compiled fine; in
0.98 I get an error. I changed back to 0.93, because I need to use these
characters. So how do I tell dmd that the source is an ASCII file? Thank you!

In article <cfvjdr$2dr2$1 digitaldaemon.com>, Arcane Jill says...
In article <cfvh55$2d5s$1 digitaldaemon.com>, Martin says...
I just downloaded the new dmd compiler and it tells me INVALID UTF-8 SEQUENCE
when I compile. This is when I use non-English characters in my strings (like
Ö, Ü, Ä). But I need to use them. The old version was just fine, why this change?

I have a sneaking suspicion you might find it will work just fine if you save your source file in UTF-8 before trying to compile it. (Save As...). So far as I know, the D compiler has not changed in this regard (except that it can now auto-detect UTF-16 and UTF-32).
Most of the C compilers accept them, why not D?

Actually, I think most C compilers simply allow a string to consist of an arbitrary sequence of bytes without any interpretation whatsoever - which just happens to appear to work whenever the source file encoding is the same as the run-time encoding. Arcane Jill

Aug 18 2004
next sibling parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <cfvlcp$2eob$1 digitaldaemon.com>, Martin says...
Thank you for your answer!

Err... Don't thank me yet. Save that until the problem's actually solved!
I have a sneaking suspicion you might find it will work just fine if you save
your source file in UTF-8 before trying to compile it. (Save As...). 

I am using the gnu midnight commander text editor, it only saves ascii.

That's not possible. In your original post you said "I use non-English
characters in my strings (like Ö, Ü, Ä)". If that statement is true, you
/cannot/ be using ASCII, since these characters do not even /exist/ in ASCII.
If your text contains any of the characters 'Ö', 'Ü' or 'Ä' then you are /not/
using ASCII. Period.

Unfortunately, I am not familiar with this text editor, so I don't know how to
determine the encoding it uses, or how to change it. I may have a fix for you
even so, however. (Read on...)
So far as I know, the D compiler has not changed in this regard (except that it
can now auto-detect UTF-16 and UTF-32).

I think I changed the 0.93 version to the 0.98. In 0.93 my files compiled fine, in 0.98 I get an error. I changed back to 0.93, because I need to use these characters.

Okay. Now, first off, the following compiles fine for me:

# void main()
# {
#     printf("hello Ö\n");
# }

using DMD v0.98, with the file saved as UTF-8. However, when I resaved the file
as ISO-8859-1 (which is an invalid thing to do) then the compiler (correctly)
gave me the compile-time error message: "Invalid UTF-8 sequence".

I believe that in the earlier version to which you refer (0.93) there was a
bug, which was fixed in 0.96 - according to the change log: "Invalid UTF
characters in string literals now diagnosed." In other words, DMD 0.93 failed
to diagnose the invalid UTF-8 characters in your source file, and so the file
compiled -- but it compiled incorrectly. The error would not have been detected
until runtime - and even then only IF you passed your string to a UTF
conversion routine. If you passed your invalid string straight to printf(), for
example, the v0.93 compiler wouldn't even have noticed. But you can bet your
life that even if your program had appeared to run correctly on your machine,
it would not necessarily have worked on anyone else's.

So tell me - what operating system are you using? The word "gnu" makes me
suspect Linux, in which case I believe you need to set the environment variable
CHARSET to the value UTF-8. (But I'm a Windows user, so I could be wrong - I'm
hoping someone will leap in here and correct me if so). Anyway, once you've set
your environment variable, everything should work with the latest DMD - and
this time, it will work for everyone, not just for you.
So how to I tell the dmd that the source is an ascii file?

There's a problem here - which is that you and I are not speaking the same language. An ASCII file is a file which DOES NOT CONTAIN any characters having codepoints outside the range 0x00 to 0x7F. DMD is perfectly happy with ASCII files, but your files are not ASCII. Sorry to be pedantic. Your file is /probably/ ISO-8859-1 (aka "Latin 1"). But it's not ASCII. Arcane Jill
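The distinction Jill is drawing is purely mechanical, and can be checked byte by byte. An illustrative sketch (Python, not part of the thread's D code) of the ASCII test she describes:

```python
def is_ascii(data: bytes) -> bool:
    """A file is ASCII only if every byte is in the range 0x00-0x7F."""
    return all(b <= 0x7F for b in data)

print(is_ascii(b"hello"))                     # True: plain ASCII
print(is_ascii("hello Ö".encode("latin-1")))  # False: the 0xD6 byte is outside ASCII
```

By this test, any file containing Ö, Ü or Ä, however it is encoded, is not an ASCII file.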
Aug 18 2004
next sibling parent reply Martin <Martin_member pathlink.com> writes:
Yes, you are probably right: it is some kind of extended ASCII - in this case I
think it is ISO-8859-1.
My problem is that the webserver that I am writing this software for uses the
same encoding.
With the old version everything worked fine. Everyone that used the server saw
the characters right.

So can I tell dmd to use ISO-8859-1, or just not to check the things it
shouldn't be checking?



In article <cfvqq8$2jhu$1 digitaldaemon.com>, Arcane Jill says...
In article <cfvlcp$2eob$1 digitaldaemon.com>, Martin says...
Thank you for your answer!

Err... Don't thank me yet. Save that until the problem's actually solved!
I have a sneaking suspicion you might find it will work just fine if you save
your source file in UTF-8 before trying to compile it. (Save As...). 

I am using the gnu midnight commander text editor, it only saves ascii.

That's not possible. In your original post you said "I use non-English
characters in my strings (like Ö, Ü, Ä)". If that statement is true, you
/cannot/ be using ASCII, since these characters do not even /exist/ in ASCII.
If your text contains any of the characters 'Ö', 'Ü' or 'Ä' then you are /not/
using ASCII. Period.

Unfortunately, I am not familiar with this text editor, so I don't know how to
determine the encoding it uses, or how to change it. I may have a fix for you
even so, however. (Read on...)
So far as I know, the D compiler has not changed in this regard (except that it
can now auto-detect UTF-16 and UTF-32).

I think I changed the 0.93 version to the 0.98. In 0.93 my files compiled fine, in 0.98 I get an error. I changed back to 0.93, because I need to use these characters.

Okay. Now, first off, the following compiles fine for me:

# void main()
# {
#     printf("hello Ö\n");
# }

using DMD v0.98, with the file saved as UTF-8. However, when I resaved the file
as ISO-8859-1 (which is an invalid thing to do) then the compiler (correctly)
gave me the compile-time error message: "Invalid UTF-8 sequence".

I believe that in the earlier version to which you refer (0.93) there was a
bug, which was fixed in 0.96 - according to the change log: "Invalid UTF
characters in string literals now diagnosed." In other words, DMD 0.93 failed
to diagnose the invalid UTF-8 characters in your source file, and so the file
compiled -- but it compiled incorrectly. The error would not have been detected
until runtime - and even then only IF you passed your string to a UTF
conversion routine. If you passed your invalid string straight to printf(), for
example, the v0.93 compiler wouldn't even have noticed. But you can bet your
life that even if your program had appeared to run correctly on your machine,
it would not necessarily have worked on anyone else's.

So tell me - what operating system are you using? The word "gnu" makes me
suspect Linux, in which case I believe you need to set the environment variable
CHARSET to the value UTF-8. (But I'm a Windows user, so I could be wrong - I'm
hoping someone will leap in here and correct me if so). Anyway, once you've set
your environment variable, everything should work with the latest DMD - and
this time, it will work for everyone, not just for you.
So how to I tell the dmd that the source is an ascii file?

There's a problem here - which is that you and I are not speaking the same language. An ASCII file is a file which DOES NOT CONTAIN any characters having codepoints outside the range 0x00 to 0x7F. DMD is perfectly happy with ASCII files, but your files are not ASCII. Sorry to be pedantic. Your file is /probably/ ISO-8859-1 (aka "Latin 1"). But it's not ASCII. Arcane Jill

Aug 18 2004
next sibling parent reply "Walter" <newshound digitalmars.com> writes:
"Martin" <Martin_member pathlink.com> wrote in message
news:cg0ggt$16f3$1 digitaldaemon.com...
 Yes you are probably right, it is some kind of extended ASCII - in this case I
 think it is ISO-8859-1.
 My problem is that the webserver that I am writing this software for uses the
 same encoding.
 With the old version everything worked fine. Everyone that used the server saw
 the characters right.

 So can I tell dmd to use ISO-8859-1, or just not to check the things it
 shouldn't be checking?

There's no way to do that right now. One of the problems with using such charsets in source code is the source code is then non-portable. Someone can just change a seemingly unrelated system setting, and poof, your builds fail. You can also use \xXX to specify the characters, though that is ugly enough to be unusable.
Aug 18 2004
next sibling parent Martin <Martin_member pathlink.com> writes:
I think I will use the \xXX. My workaround solution was much uglier, so I am
quite happy with this one.

Thanks!

In article <cg0n3l$1ln6$1 digitaldaemon.com>, Walter says...
"Martin" <Martin_member pathlink.com> wrote in message
news:cg0ggt$16f3$1 digitaldaemon.com...
 Yes you are probably right, it is some kind of extended ASCII - in this case I
 think it is ISO-8859-1.
 My problem is that the webserver that I am writing this software for uses the
 same encoding.
 With the old version everything worked fine. Everyone that used the server saw
 the characters right.

 So can I tell dmd to use ISO-8859-1, or just not to check the things it
 shouldn't be checking?

There's no way to do that right now. One of the problems with using such charsets in source code is the source code is then non-portable. Someone can just change a seemingly unrelated system setting, and poof, your builds fail. You can also use \xXX to specify the characters, though that is ugly enough to be unusable.

Aug 19 2004
prev sibling parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <cg0n3l$1ln6$1 digitaldaemon.com>, Walter says...

You can also use \xXX to specify the characters, though that is ugly
enough to be unusable.

Sorry, Walter - that's not right! You should not be encouraging the use of \xXX
in this context. This is wrong. Martin needs to be using \uXXXX, not \xXX.
Instead of \xD6, he needs to use \u00D6. (Martin, I hope you're listening).

Sticking \x's into a string literal is just another way to create an invalid
UTF-8 sequence. See this code:

# void main()
# {
#     char[] s1 = "\xD6";
#     char[] s2 = "\u00D6";
#
#     printf("s1.length = %d\n", s1.length);
#     printf("s2.length = %d\n", s2.length);
# }

This will output:

# s1.length = 1
# s2.length = 2

thereby proving that s1 contains an invalid UTF-8 sequence! (But s2 is
correct). Remember - \x is used to insert literal bytes. \u inserts characters.
All you've done is provided a way to get pre-DMD-0.96 behavior out of a
DMD-0.96+ compiler.

Arcane Jill
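For readers without a D compiler handy, the same byte-level fact can be checked illustratively in Python: \xD6 denotes one raw byte, while the character U+00D6 occupies two bytes once encoded as UTF-8 (which is what D stores in a char[]).

```python
s1 = b"\xd6"                    # analogue of D's "\xD6": a single raw byte
s2 = "\u00D6".encode("utf-8")   # analogue of D's "\u00D6": the character, UTF-8 encoded

print(len(s1))   # 1 - a lone 0xD6 byte, not valid UTF-8 on its own
print(len(s2))   # 2 - the bytes 0xC3 0x96, a valid UTF-8 sequence
```

The lengths match the s1.length/s2.length output Jill shows for the D program above.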
Aug 19 2004
next sibling parent reply Martin <Martin_member pathlink.com> writes:
I think I will move to UTF-8 with my next version of the program. I can't do it
right now, because then it needs some rewriting.
The UTF-8 output is not the problem, it's more like UTF-8 input. I need to read
the POST data from the user's browser to process it.
The problem with UTF-8 is that a character can be 1, 2, 3 or even 4 bytes long.
I do a lot of text processing and I need to rewrite, or at least look over, all
these functions. But I have a deadline coming...

I wrote my last web application with C++, didn't use UTF-8, and it works fine.
I am only writing an application for Estonian people.

But probably you are right, I need to move to UTF-8, but not before my next
version.

Martin



In article <cg1of6$18ss$1 digitaldaemon.com>, Arcane Jill says...
In article <cg0n3l$1ln6$1 digitaldaemon.com>, Walter says...

You can also use \xXX to specify the characters, though that is ugly
enough to be unusable.

Sorry, Walter - that's not right! You should not be encouraging the use of \xXX
in this context. This is wrong. Martin needs to be using \uXXXX, not \xXX.
Instead of \xD6, he needs to use \u00D6. (Martin, I hope you're listening).

Sticking \x's into a string literal is just another way to create an invalid
UTF-8 sequence. See this code:

# void main()
# {
#     char[] s1 = "\xD6";
#     char[] s2 = "\u00D6";
#
#     printf("s1.length = %d\n", s1.length);
#     printf("s2.length = %d\n", s2.length);
# }

This will output:

# s1.length = 1
# s2.length = 2

thereby proving that s1 contains an invalid UTF-8 sequence! (But s2 is
correct). Remember - \x is used to insert literal bytes. \u inserts characters.
All you've done is provided a way to get pre-DMD-0.96 behavior out of a
DMD-0.96+ compiler.

Arcane Jill

Aug 19 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <cg1q02$1c1h$1 digitaldaemon.com>, Martin says...

The UTF-8 output is not the problem, it's more like UTF-8 input. I need to read
the POST data from users browser, to proccess it.

But I don't think you can make demands on what encoding in which the POST data is going to be presented, can you? You simply have to recognize it, and decode it. If the data is in ISO-whatever, you must decode that; if the data is in MAC-ROMAN, you must decode that; if the data is in UTF-8, you must decode that. And so on.
The problem with UTF-8 is that a character can be 1,2,3 or even 4 bytes long.

Indeed, but D has lots of handy functions to convert them. And the problem with ISO-8859-1 (Latin-1) is that characters beyond \u00FF are completely unrepresentable. Like, AT ALL. If someone wants to use a lowercase c with an acute accent ('\u0107'), you're completely screwed. UTF-8 is the solution.
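Both points are easy to demonstrate. A hedged, illustrative Python sketch (not D's std.utf API): UTF-8 spends one to four bytes per character, while Latin-1 simply has no representation for anything above U+00FF.

```python
# UTF-8 is variable-width: one to four bytes per character.
for ch in ["A", "\u00D6", "\u20AC", "\U0001D11E"]:   # A, Ö, the euro sign, a musical clef
    print(hex(ord(ch)), "->", len(ch.encode("utf-8")), "byte(s)")

# Latin-1 cannot represent codepoints above U+00FF at all:
try:
    "\u0107".encode("latin-1")    # lowercase c with acute accent
except UnicodeEncodeError:
    print("unrepresentable in ISO-8859-1")
```

This is the trade-off in the thread: UTF-8 costs variable-width handling in text-processing code, but Latin-1 costs the ability to represent most of Unicode.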
I wrote my last web with C++, didn't use UTF-8, and it works fine.

But only if /you/ compile it. If someone else, with a different default encoding, were to compile the same source code, it may fail badly. But it's nice to see you're writing for a non-English audience. I'm sure this trend will continue. Arcane Jill
Aug 19 2004
parent "Walter" <newshound digitalmars.com> writes:
"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:cg1rba$1fnh$1 digitaldaemon.com...
 But it's nice to see you're writing for a non-English audience. I'm sure

 trend will continue.

And that's great, because it helps us identify and shake out the problems with the internationalization support.
Aug 19 2004
prev sibling parent "Walter" <newshound digitalmars.com> writes:
"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:cg1of6$18ss$1 digitaldaemon.com...
 In article <cg0n3l$1ln6$1 digitaldaemon.com>, Walter says...

You can also use \xXX to specify the characters, though that is ugly
enough to be unusable.

 Sorry, Walter - that's not right! You should not be encouraging the use of
 \xXX in this context. This is wrong. Martin needs to be using \uXXXX, not \xXX.
 Instead of \xD6, he needs to use \u00D6. (Martin, I hope you're listening).
 Sticking \x's into a string literal is just another way to create an invalid
 UTF-8 sequence. See this code:

True, but if they're used to create a ubyte[] sequence (not a char[] sequence) it should work.
Aug 19 2004
prev sibling parent Arcane Jill <Arcane_member pathlink.com> writes:
In article <cg0ggt$16f3$1 digitaldaemon.com>, Martin says...

My problem is, that the webserver that I am wrting this software for, uses the
same encoding.

I think that your statement might need some clarifying. Web servers by
definition need to do transcoding. Most programs need a concept of a "run-time
encoding" (so they can do printf(), etc.), but the run-time encoding of a web
server is no longer limited to that of one particular machine - a web server
has to deal with machines all over the internet, each possibly with its own
local encoding.

The "Accept" field in an HTTP request can act as a request from the browser to
the server that the web content be delivered in a particular encoding. For
example:

# Accept: text/plain, text/html; charset=UTF-8

When the page is delivered, a web server sends back:

# Content-type: text/html; charset=UTF-8

If the encoding is not specified then HTML is supposed to default to
ISO-8859-1, but XML (including XHTML) is supposed to default to UTF-8. A web
server which doesn't do UTF-8, or which doesn't do transcoding, is all but
useless.

That said, you may still be able to get away with it. If you send all your web
content in a particular encoding, then, as long as it is marked as such, the
user's browser /may/ be able to reinterpret the page (the Accept request header
is supposed to advise you of what the browser can or can't deal with).

So, when you say "the webserver ... uses the same encoding [ISO-8859-1]", I'm
still not clear what it uses that encoding /for/. It's the default for HTML,
but are you saying your server emits no other encoding? Not even UTF-8? That
would be weird. Any chance you could clarify?
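The transcoding being described amounts to keeping text as characters internally and encoding to the declared charset only at the output boundary. A minimal, illustrative Python sketch (not taken from any real server; the page text is invented for the example):

```python
# Internal representation: a Unicode string, independent of any wire encoding.
page = "Tere tulemast! \u00D6\u00DC\u00C4"

def respond(text: str, charset: str) -> bytes:
    # Declare the charset in the header, and encode the body to match it.
    header = f"Content-Type: text/html; charset={charset}\r\n\r\n"
    return header.encode("ascii") + text.encode(charset)

print(respond(page, "ISO-8859-1"))   # body bytes are Latin-1
print(respond(page, "UTF-8"))        # same text, different bytes on the wire
```

The point is that the declared charset and the body's actual encoding are produced from the same place, so they can never disagree - which is exactly the mismatch that bit the source files earlier in the thread.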
With the old version everything worked fine. Everyone that used the server saw
the characters right. 

Providing your server emitted "Content-type: text/html; charset=ISO-8859-1" in its response headers, (or just "Content-type: text/html" since ISO-8859-1 is the default for HTML - but that's dangerous, since not all browsers obey the W3C spec), that is likely to be true. But still, you're relying on a parochial character set, and it /is/ possible that some viewers of your server simply won't have that encoding in their browser.
So can I tell the dmd to use  ISO-8859-1, or just not to check the things it
shouldn't be checking?

No. You *MUST* save your DMD source files in either ASCII or UTF-8 before
attempting to compile them. If you wish to emit output in ISO-8859-1 then you
must ISO-8859-1-encode the output at runtime (which is easy - I can show you
how to do that).

But why is saving your source file as UTF-8 hard? I've never heard of a modern
text editor which can't do it, but if you've discovered one, why not just
change to a different text editor?

Nonetheless - if you really can't figure out how to save in UTF-8 (which would
be surprising for someone writing a web server, with all the transcoding
understanding required thereby), then your only remaining choice is to save as
ASCII. You can do this by replacing your non-ASCII characters either by Unicode
escape sequences (if you want DMD to interpret them) or HTML entities (if you
want the users' browsers to interpret them). So replace as follows:

# Character    Escape sequence    HTML entity
# ~~~~~~~~~    ~~~~~~~~~~~~~~~    ~~~~~~~~~~~
#     Ü            \u00DC          &#x00DC;
#     Ä            \u00C4          &#x00C4;
#     Ö            \u00D6          &#x00D6;

Hope that helps.
Arcane Jill
Aug 19 2004
prev sibling parent reply "Walter" <newshound digitalmars.com> writes:
"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:cfvqq8$2jhu$1 digitaldaemon.com...
 There's a problem here - which is that you and I are not speaking the same
 language. An ASCII file is a file which DOES NOT CONTAIN any characters

 codepoints outside the range 0x00 to 0x7F. DMD is perfectly happy with

 files, but your files are not ASCII.

 Sorry to be pedantic. Your file is /probably/ ISO-8859-1 (aka "Latin 1").

 it's not ASCII.

You write well and understand the issues involved. Can I suggest that you write an article about this for, say, CUJ or DDJ? Such an article exploring this topic is sorely needed.
Aug 18 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <cg0gsg$16u8$2 digitaldaemon.com>, Walter says...
"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:cfvqq8$2jhu$1 digitaldaemon.com...
 There's a problem here - which is that you and I are not speaking the same
 language. An ASCII file is a file which DOES NOT CONTAIN any characters

 codepoints outside the range 0x00 to 0x7F. DMD is perfectly happy with

 files, but your files are not ASCII.

 Sorry to be pedantic. Your file is /probably/ ISO-8859-1 (aka "Latin 1").

 it's not ASCII.

You write well and understand the issues involved. Can I suggest that you write an article about this for, say, CUJ or DDJ? Such an article exploring this topic is sorely needed.

Could be fun. So what are CUJ and DDJ? Could someone give me some URLs? Jill
Aug 19 2004
parent reply Jonathan Leffler <jleffler earthlink.net> writes:
Arcane Jill wrote:

 In article <cg0gsg$16u8$2 digitaldaemon.com>, Walter:
You write well and understand the issues involved. Can I suggest that you
write an article about this for, say, CUJ or DDJ? Such an article exploring
this topic is sorely needed.

Could be fun. So what are CUJ and DDJ? Could someone give me some URLs?

CUJ = C User's Journal (or possibly Users'?) http://www.cuj.com/ (where
there's no apostrophe in sight)
DDJ = Dr Dobb's Journal http://www.ddj.com/

-- 
Jonathan Leffler #include <disclaimer.h>
Email: jleffler earthlink.net, jleffler us.ibm.com
Guardian of DBD::Informix v2003.04 -- http://dbi.perl.org/
Aug 19 2004
parent "Walter" <newshound digitalmars.com> writes:
"Jonathan Leffler" <jleffler earthlink.net> wrote in message
news:cg1n1p$13qg$1 digitaldaemon.com...
 Arcane Jill wrote:

 In article <cg0gsg$16u8$2 digitaldaemon.com>, Walter:
You write well and understand the issues involved. Can I suggest that you
write an article about this for, say, CUJ or DDJ? Such an article exploring
this topic is sorely needed.

Could be fun. So what are CUJ and DDJ? Could someone give me some URLs?

CUJ = C User's Journal (or possibly Users'?) http://www.cuj.com/ (where there's no apostrophe in sight) DDJ = Dr Dobb's Journal http://www.ddj.com/

Yes, they're the two main print publications that C/C++ programmers read. The D
articles published by them have been well received, and the publisher (CMP
Media) has indicated they want more. And besides, they even pay for articles!

Getting published in CUJ or DDJ is fairly prestigious, and will look good on
any resume. Many of the top highly paid C++ professionals built their
reputation early on by writing articles. Many companies also have a policy of
giving a bonus to engineering employees who get published in a magazine;
that's worth checking out. So it's really an everybody-wins kind of situation.
Aug 19 2004
prev sibling parent Nick <Nick_member pathlink.com> writes:
In article <cfvlcp$2eob$1 digitaldaemon.com>, Martin says...
Thank you for your answer!
I have a sneaking suspicion you might find it will work just fine if you save
your source file in UTF-8 before trying to compile it. (Save As...). 

I am using the gnu midnight commander text editor, it only saves ascii.

If you are on linux you can convert from latin1 to utf8 with the command

# iconv -f latin1 -t utf8 file.d > newfile.d
# dmd newfile.d

You will probably be doing that a lot, so it's best if you can put it in a
script or something. Hope this helps :)

Nick
Aug 18 2004
prev sibling parent "Walter" <newshound digitalmars.com> writes:
"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:cfvjdr$2dr2$1 digitaldaemon.com...
Most of the C compilers accept them, why not D?

Actually, I think most C compilers simply allow a string to consist of an
arbitrary sequence of bytes without any interpretation whatsoever - which just
happens to appear to work whenever the source file encoding is the same as the
run-time encoding.

It doesn't always work, some of the code pages include multibyte sequences where " can be the second byte :-(. That's why DMC has special switches for such. This is just the sort of thing I want to move away from.
Aug 18 2004