digitalmars.D.bugs - utf and std.file
- Carlos Santander B. <Carlos_member pathlink.com> Apr 29 2004
- C <dont respond.com> Apr 29 2004
- "Matthew" <matthew.hat stlsoft.dot.org> Apr 30 2004
- "Walter" <newshound digitalmars.com> May 02 2004
- "Carlos Santander B." <carlos8294 msn.com> May 02 2004
- "Walter" <newshound digitalmars.com> May 02 2004
- J C Calvarese <jcc7 cox.net> May 02 2004
- "Walter" <newshound digitalmars.com> May 02 2004
- J C Calvarese <jcc7 cox.net> May 03 2004
- "Carlos Santander B." <carlos8294 msn.com> May 03 2004
- "Walter" <newshound digitalmars.com> May 03 2004
- "Walter" <newshound digitalmars.com> May 04 2004
- Carlos Santander B. <Carlos_member pathlink.com> May 04 2004
- Carlos Santander B. <Carlos_member pathlink.com> May 04 2004
- "Walter" <newshound digitalmars.com> May 07 2004
- "Carlos Santander B." <carlos8294 msn.com> May 08 2004
- "Carlos Santander B." <carlos8294 msn.com> May 10 2004
- "Walter" <newshound digitalmars.com> May 19 2004
- "Carlos Santander B." <carlos8294 msn.com> May 19 2004
- "Walter" <newshound digitalmars.com> May 19 2004
- "Carlos Santander B." <carlos8294 msn.com> May 20 2004
- J C Calvarese <jcc7 cox.net> May 20 2004
- "Walter" <newshound digitalmars.com> May 21 2004
- "Walter" <newshound digitalmars.com> May 24 2004
- "Carlos Santander B." <carlos8294 msn.com> May 25 2004
This simple program:
import std.file;
import std.c.stdio;
import std.path;
import std.utf;
void main() {
    char [][] archivos = listdir( curdir ) ;
    foreach ( char [] a ; archivos )
        try
            validate(a);
        catch (UtfError)
            printf("%.*s: invalid\n",a);
}
Outputs "invalid" for any file that contains in its name any of: áéíóúñÁÉÍÓÚÑ,
and maybe other characters. That means that for any file named, say,
"año2004.dat", I can't do anything with it because DMD thinks its name is not
valid. That's annoying, at least for me, because those are characters that are
used all the time in Spanish and other languages, so I tend to name my files
using those characters.
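Here is a minimal sketch of what is probably happening. It is written for modern D2/Phobos, not the D 0.86 Phobos used in this thread, so it uses UTFException instead of UtfError. It assumes the name comes back as Windows-1252 bytes, where "ñ" is the single byte 0xF1. In UTF-8 that byte starts a four-byte sequence, so std.utf.validate rejects it. The helper isValidUtf8 is hypothetical.

```d
// Sketch for modern D2 (not D 0.86): why an 8-bit code-page file name
// fails UTF-8 validation. Assumes the name arrives as Windows-1252 bytes.
import std.stdio : writeln;
import std.utf : UTFException, validate;

/// Hypothetical helper: true if `s` is well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s); // throws UTFException on a malformed sequence
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // "año" in Windows-1252: 'a' = 0x61, 'ñ' = 0xF1, 'o' = 0x6F.
    // 0xF1 looks like the lead byte of a 4-byte UTF-8 sequence,
    // but 0x6F is not a continuation byte, so validation fails.
    immutable ubyte[] cp1252Name = [0x61, 0xF1, 0x6F];

    // The same name in UTF-8: 'ñ' (U+00F1) is 0xC3 0xB1.
    immutable ubyte[] utf8Name = [0x61, 0xC3, 0xB1, 0x6F];

    writeln(isValidUtf8(cast(const(char)[]) cp1252Name)); // false
    writeln(isValidUtf8(cast(const(char)[]) utf8Name));   // true
}
```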
-------------------
Carlos Santander B.
Apr 29 2004
Sorry if this is a stupid question, but does that mean file names can be anything? Do the Russians use Russian characters for their files, etc.?
C

Carlos Santander B. wrote:
| Outputs "invalid" for any file that contains in its name any of: áéíóúñÁÉÍÓÚÑ,
| and maybe other characters.
Apr 29 2004
He, he. You're American, right? <G>

"C" <dont respond.com> wrote in message news:c6rcsk$1sei$1 digitaldaemon.com...
| Sorry if this is a stupid question, but does that mean file names can be
| anything? Do the Russians use Russian characters for their files, etc.?
Apr 30 2004
"Carlos Santander B." <Carlos_member pathlink.com> wrote in message news:c6rbd4$1q4a$1 digitaldaemon.com...
| Outputs "invalid" for any file that contains in its name any of:
| áéíóúñÁÉÍÓÚÑ, and maybe other characters. That means that for any file
| named, say, "año2004.dat", I can't do anything with it because DMD thinks
| its name is not valid. That's annoying, at least for me, because those are
| characters that are used all the time in Spanish and other languages, so I
| tend to name my files using those characters.

Is it possible to use a unicode text editor instead? D doesn't support code pages, relying instead on unicode.
May 02 2004
"Walter" <newshound digitalmars.com> wrote in message news:c747m1$5te$1 digitaldaemon.com
| Is it possible to use a unicode text editor instead? D doesn't support code
| pages, relying instead on unicode.

I don't understand. Let's say I have a file "año" and I want to read it from my D program. D won't let me because it says it's not a valid string. At the very least I'd like to read the file, but I can't.
-----------------------
Carlos Santander Bernal
May 02 2004
"Carlos Santander B." <carlos8294 msn.com> wrote in message news:c74do5$e3t$1 digitaldaemon.com...
| I don't understand. Let's say I have a file "año" and I want to read it from
| my D program. D won't let me because it says it's not a valid string. At the
| very least I'd like to read the file, but I can't.

There are two ways to do international character sets - one way is using unicode, which D supports (as does Win32 with 16 bit wchar's). The other way is a horrible kludge called "code pages". I presume you are using code pages. Can you switch to using unicode?
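To make the distinction concrete, here is a small modern-D sketch (not from the thread) that counts the code units of "ñ" (U+00F1) in D's three Unicode encodings. The closing comment explains why a code-page byte is ambiguous by itself: the same byte 0xF1 is "ñ" in Windows-1252 but Cyrillic "с" in Windows-1251.

```d
// Sketch for modern D2: the same character in D's three Unicode encodings.
import std.stdio : writefln;
import std.utf : toUTF16, toUTF32;

void main()
{
    string  s8  = "\u00F1";    // 'ñ' as UTF-8 (D's char[] / string)
    wstring s16 = toUTF16(s8); // UTF-16, the 16-bit wchar form Win32 uses
    dstring s32 = toUTF32(s8); // UTF-32, one code unit per code point

    writefln("UTF-8:  %s code units", s8.length);  // 2 (0xC3 0xB1)
    writefln("UTF-16: %s code units", s16.length); // 1 (0x00F1)
    writefln("UTF-32: %s code units", s32.length); // 1

    // A code page offers no such guarantee: the single byte 0xF1 is 'ñ' in
    // Windows-1252 but Cyrillic 'с' (U+0441) in Windows-1251, so the bytes
    // alone don't say which character was meant.
}
```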
May 02 2004
Walter wrote:
| There are two ways to do international character sets - one way is using
| unicode, which D supports (as does Win32 with 16 bit wchar's). The other way
| is a horrible kludge called "code pages". I presume you are using code
| pages. Can you switch to using unicode?

I think Carlos Santander is trying to use a non-English character in a filename. I attached an example of what I found won't work. The compiler won't even admit the file is there. I suspect this is a moot point if the linker (written in hand-tuned assembly) wouldn't handle such a filename either.
--
Justin
http://jcc_7.tripod.com/d/
May 02 2004
"J C Calvarese" <jcc7 cox.net> wrote in message news:c74m7l$qk3$1 digitaldaemon.com...
| I think Carlos Santander is trying to use a non-English character in a
| filename. I attached an example of what I found won't work.

I think he's talking about non-English characters in D strings, not in D source code filenames.
May 02 2004
In article <c7521v$1dnt$3 digitaldaemon.com>, Walter says...
| I think he's talking about non-English characters in D strings, not in D
| source code filenames.

Oops. What was I thinking? I must have been half-asleep.
Justin
May 03 2004
"Carlos Santander B." <Carlos_member pathlink.com> wrote in message
news:c6rbd4$1q4a$1 digitaldaemon.com
| This simple program:
|
| import std.file;
| import std.c.stdio;
| import std.path;
| import std.utf;
| void main() {
| char [][] archivos = listdir( curdir ) ;
| foreach ( char [] a ; archivos )
| try
| validate(a);
| catch (UtfError)
| printf("%.*s: invalid\n",a);
| }
|
| Outputs "invalid" for any file that contains in its name any of:
| áéíóúñÁÉÍÓÚÑ, and maybe other characters. That means that for any file
| named, say, "año2004.dat", I can't do anything with it because DMD thinks
| its name is not valid. That's annoying, at least for me, because those are
| characters that are used all the time in Spanish and other languages, so I
| tend to name my files using those characters.
|
Walter, did you do any changes to the compiler for 0.86 regarding UTF?
Because now I ran the same code but on WinXP Pro (previously it was in
Win98) and it works just well. I'm gonna test it again tomorrow on 98 to see
if it's the OS.
-----------------------
Carlos Santander Bernal
May 03 2004
"Carlos Santander B." <carlos8294 msn.com> wrote in message news:c766k4$3ad$1 digitaldaemon.com...
| Walter, did you do any changes to the compiler for 0.86 regarding UTF?
| Because now I ran the same code but on WinXP Pro (previously it was in
| Win98) and it works just well. I'm gonna test it again tomorrow on 98 to
| see if it's the OS.

I don't think so.
May 03 2004
"Walter" <newshound digitalmars.com> wrote in message news:c775i2$1jo3$1 digitaldaemon.com...
| "Carlos Santander B." <carlos8294 msn.com> wrote in message
| news:c766k4$3ad$1 digitaldaemon.com...
|| Walter, did you do any changes to the compiler for 0.86 regarding UTF?

I think Stewart Gordon put the finger on the problem. Filenames are used as part of the name mangling, and so need to contain valid identifier characters.
May 04 2004
In article <c78kcm$p7n$3 digitaldaemon.com>, Walter says...
| I think Stewart Gordon put the finger on the problem. Filenames are used as
| part of the name mangling, and so need to contain valid identifier
| characters.

That's not the problem. I can create "año.d" and compile and link it without a hitch (on both WinXP and 98).
-------------------
Carlos Santander B.
May 04 2004
In article <c775i2$1jo3$1 digitaldaemon.com>, Walter says...
| "Carlos Santander B." <carlos8294 msn.com> wrote in message
| news:c766k4$3ad$1 digitaldaemon.com...
|| Walter, did you do any changes to the compiler for 0.86 regarding UTF?
|| Because now I ran the same code but on WinXP Pro (previously it was in
|| Win98) and it works just well. I'm gonna test it again tomorrow on 98 to
|| see if it's the OS.
|
| I don't think so.

It must be. Like I said, it passed on XP, but it didn't pass on 98.

The real problem, like I said, is that std.utf.validate("áéíóúÚÓÍÉÁÑñ") throws a UtfError, but it doesn't happen on XP (haven't tried on Linux). Since it doesn't seem to be a Phobos bug, is there something that can be done to fix that?
-------------------
Carlos Santander B.
May 04 2004
"Carlos Santander B." <Carlos_member pathlink.com> wrote in message news:c78qhg$141i$1 digitaldaemon.com...
| The real problem, like I said, is that std.utf.validate("áéíóúÚÓÍÉÁÑñ")
| throws a UtfError, but it doesn't happen on XP (haven't tried on Linux).
| Since it doesn't seem to be a Phobos bug, is there something that can be
| done to fix that?

Is the string you're passing to validate in UTF-8 format?
May 07 2004
"Walter" <newshound digitalmars.com> wrote in message
news:c7fgtd$2g2b$1 digitaldaemon.com
| "Carlos Santander B." <Carlos_member pathlink.com> wrote in message
| news:c78qhg$141i$1 digitaldaemon.com...
|| It must be. Like I said, it passed on XP, but it didn't pass on 98.
||
|| The real problem, like I said, is that std.utf.validate("áéíóúÚÓÍÉÁÑñ")
|| throws a UtfError, but it doesn't happen on XP (haven't tried on Linux).
|| Since it doesn't seem to be a Phobos bug, is there something that can be
|| done to fix that?
|
| Is the string you're passing to validate in UTF-8 format?
My bad there. std.utf.validate("áéíóúÚÓÍÉÁÑñ") throws "invalid UTF-8
sequence" when the source file is not saved in UTF-8 format. However, like
I've said before, if a file is named "á" and I get its name with listdir and
pass it to validate, it fails on Win98.
I just thought of something else: could it be the file system? My XP is
running on NTFS, but at work, 98 is on FAT32. Could that be the cause
instead?
-----------------------
Carlos Santander Bernal
May 08 2004
I'm really lost about this thing. There are only 2 things I can think of
that could be causing the problem: the OS (XP vs 98) and the filesystem
(NTFS vs FAT32).
The following file compiles and runs perfectly fine on WinXP Pro, saved
either as 8-bit or any kind of Unicode.
//-------------------------
import std.file;
import std.c.stdio;
import std.path;
import std.utf;
void main() {
    char [][] archivos = listdir( curdir ) ;
    foreach ( char [] a ; archivos ) {
        try
            validate(a);
        catch (UtfError) {
            printf("%.*s: inválido\n",a);
            continue;
        }
        if (isfile(a))
            printf("%.*s: %d\n",a, read(a).length);
    }
}
//-------------------------
That is, it outputs the size of every file in the current directory without
any complaint.
On Win98, the same happens only if the file is saved as some Unicode flavor.
If saved as 8-bit, it prints "invalid" for any file containing an accented
character or anything like that. Now, I really don't understand how, in this
particular case, the format of the file affects the outcome.
I tried to test the same on Linux, but since listdir doesn't seem to work
there, I haven't been able to do it.
I don't know where the problem is, but I know it's annoying.
-----------------------
Carlos Santander Bernal
May 10 2004
"Carlos Santander B." <carlos8294 msn.com> wrote in message news:c7pd5l$1ohk$1 digitaldaemon.com...
| On Win98, the same happens only if the file is saved as some Unicode flavor.
| If saved as 8-bit, it prints "invalid" for any file containing an accented
| character or anything like that.

The problem is that Win98 does not support unicode, *unless* the unicode can be translated into the current code page. You can see this happening in std.file.isfile(), which calls std.file.toMBSz(). That relies on WideCharToMultiByte(), a Win32 API function with limited functionality under Win9x.
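A rough sketch of why such a conversion is lossy, in modern D. It assumes Latin-1 as a stand-in for the active Win9x code page and '?' as the replacement character (WideCharToMultiByte's default character is typically '?'). toLatin1 is a hypothetical helper, not the toMBSz from std.file. A name like "año.dat" survives, but a Russian name does not, so the resulting byte string no longer names any real file.

```d
// Sketch for modern D2: a lossy conversion to a single-byte code page.
// Assumptions: Latin-1 stands in for the active Win9x code page, and '?'
// stands in for the replacement ("default") character.
import std.stdio : writeln;

/// Hypothetical helper: encode `name` as Latin-1, replacing any code point
/// above 0xFF with '?'. Sets `lossy` if anything had to be replaced.
ubyte[] toLatin1(string name, out bool lossy)
{
    ubyte[] result;
    foreach (dchar c; name) // decodes UTF-8 into code points
    {
        if (c <= 0xFF)
        {
            result ~= cast(ubyte) c;
        }
        else
        {
            result ~= cast(ubyte) '?';
            lossy = true;
        }
    }
    return result;
}

void main()
{
    bool lossy;

    auto spanish = toLatin1("a\u00F1o.dat", lossy); // "año.dat"
    writeln(spanish, " lossy=", lossy); // [97, 241, 111, 46, 100, 97, 116] lossy=false

    auto russian = toLatin1("\u0444\u0430\u0439\u043B.dat", lossy); // "файл.dat"
    writeln(cast(char[]) russian, " lossy=", lossy); // ????.dat lossy=true
}
```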
May 19 2004
"Walter" <newshound digitalmars.com> wrote in message
news:c8f5n8$167q$1 digitaldaemon.com
| The problem is that Win98 does not support unicode, *unless* the unicode can
| be translated into the current code page. You can see this happening in
| std.file.isfile(), which calls std.file.toMBSz(). That relies on
| WideCharToMultiByte(), a Win32 API function with limited functionality under
| Win9x.
So what's the solution? Write this in every program that uses std.file
(pseudo-code, btw):
if (OS.type == "win9x")
printf("sorry, can't be run here. get nt,2k,xp,2k3,etc.\n");
?
That just doesn't make sense to me. What about modifying Phobos so things
like this don't happen?
-----------------------
Carlos Santander Bernal
May 19 2004
"Carlos Santander B." <carlos8294 msn.com> wrote in message news:c8h3f5$1coh$1 digitaldaemon.com...
| That just doesn't make sense to me. What about modifying Phobos so things
| like this don't happen?

It will work on win9x if the unicode characters you're using are representable in the system code page you've set on win9x.
May 19 2004
"Walter" <newshound digitalmars.com> wrote in message news:c8hkht$25vt$1 digitaldaemon.com
| It will work on win9x if the unicode characters you're using are
| representable in the system code page you've set on win9x.

But that'd be an imposition on the end user of the application rather than on the developer, wouldn't it? Please correct me if I'm wrong. But if I'm not, then there must be something better to be done.
-----------------------
Carlos Santander Bernal
May 20 2004
Carlos Santander B. wrote:
| Please correct me if I'm wrong. But if I'm not, then there must be
| something better to be done.

Microsoft would probably tell you to upgrade to Windows XP because they want your money. As a free alternative, have you looked at using MSLU (the Microsoft Layer for Unicode)?
http://msdn.microsoft.com/msdnmag/issues/01/10/MSLU/default.aspx
http://www.microsoft.com/globaldev/handson/dev/mslu_announce.mspx
There'd still be a burden for Win9X users (they'd have to install it), and I don't even know that it'd help with your specific problem, but it might be useful to you.
--
Justin (a/k/a jcc7)
http://jcc_7.tripod.com/d/
May 20 2004
"Carlos Santander B." <carlos8294 msn.com> wrote in message news:c8jd77$gk8$1 digitaldaemon.com...
| Please correct me if I'm wrong. But if I'm not, then there must be
| something better to be done.

Actually, I think I know what the problem is. listdir() is not converting the returned filenames into unicode as it should.
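Here is a sketch of the kind of conversion this implies. For illustration only, it assumes the code page is Latin-1, where every byte value equals its Unicode code point. It is not the patched std.file.d that follows in the thread. On Win32 the general route is MultiByteToWideChar with CP_ACP, followed by a UTF-16 to UTF-8 step. latin1ToUtf8 is a hypothetical helper.

```d
// Sketch for modern D2: turn a code-page file name into UTF-8 before
// handing it to D code. Assumption: the code page is Latin-1, where each
// byte value equals its Unicode code point. On Win32 the general route is
// MultiByteToWideChar(CP_ACP, ...) followed by a UTF-16 -> UTF-8 step.
import std.stdio : writeln;
import std.utf : validate;

/// Hypothetical helper: Latin-1 bytes -> UTF-8 string.
string latin1ToUtf8(const(ubyte)[] bytes)
{
    string result;
    foreach (b; bytes)
        result ~= cast(dchar) b; // appending a dchar encodes it as UTF-8
    return result;
}

void main()
{
    immutable ubyte[] raw = [0x61, 0xF1, 0x6F]; // "año" as Latin-1 bytes
    string name = latin1ToUtf8(raw);

    validate(name);       // no longer throws
    writeln(name.length); // 4: 'ñ' now takes two UTF-8 bytes
}
```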
May 21 2004
"Carlos Santander B." <carlos8294 msn.com> wrote in message news:c8jd77$gk8$1 digitaldaemon.com...
| Please correct me if I'm wrong. But if I'm not, then there must be
| something better to be done.

Try the following std.file.d and see if it works.
May 24 2004
"Walter" <newshound digitalmars.com> wrote in message news:c8trd4$q9q$1 digitaldaemon.com
| Try the following std.file.d and see if it works.

Yes, it worked. Thanks.
-----------------------
Carlos Santander Bernal
May 25 2004