digitalmars.D.learn - std.file and non-English filename in Windows

Domain <dont_email empty.com> writes:
On Windows, exists, rename, and copy report that the file does not exist
when given a non-English filename, such as the Chinese 中文.txt
Dec 31 2017
Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Sunday, 31 December 2017 at 18:21:29 UTC, Domain wrote:
 In Windows, exists, rename, copy will report file not exists 
 when you input non-English filename, such as Chinese 中文.txt
It's unclear what your problem is, but here is a wild guess. As far as I know, the Windows Unicode APIs use UTF-16, while strings in D are UTF-8. So before a Win32 API function is called, they have to be converted to wstring, i.e. UTF-16 strings.
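As a rough illustration of that conversion, here is a minimal sketch using std.utf.toUTF16. The helper name toWide is made up for this example; for an actual Win32 call one would more likely pass the zero-terminated result of std.utf.toUTF16z.

```d
import std.utf : toUTF16;

/// Hypothetical helper: converts a UTF-8 file name to UTF-16.
wstring toWide(string name)
{
    // toUTF16 re-encodes the UTF-8 code units as UTF-16 code units.
    return name.toUTF16;
}
```

For "中文.txt" the UTF-8 string is 10 code units long (3 + 3 for the two Chinese characters, plus 4 for ".txt"), while the UTF-16 result is 6 code units long.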
Jan 01
Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Monday, January 01, 2018 10:47:51 Patrick Schluter via Digitalmars-d-learn wrote:
It's unclear what your problem is but here a wild guess. Windows API's for Unicode use UTF-16 as far as I know. Strings in D are utf-8. So before calling win32 API function, they have to be transformed to wstring i.e. utf-16 strings.
std.file abstracts all of that away for you, and it does have at least some tests that use Unicode characters, though I think that most of the functions don't have tests that use Unicode characters. I would not have expected a Unicode bug like this to be in std.file, but it's certainly possible. It's also possible that the console needs to be set to UTF-8 or UTF-16 or something, since the default often seems to cause problems for folks - though unless the file names are coming from the command-line, I wouldn't have expected that to be an issue. I do almost nothing with Windows though, so I'm not very familiar with the ins and outs of that mess. - Jonathan M Davis
Jan 01
John Chapman <johnch_atms hotmail.com> writes:
On Sunday, 31 December 2017 at 18:21:29 UTC, Domain wrote:
 In Windows, exists, rename, copy will report file not exists 
 when you input non-English filename, such as Chinese 中文.txt
Works for me. I created a file with the name "中文.txt" and std.file.exists returned true. Is your D source file saved in ASCII by any chance? Try saving it with a different encoding, such as UTF8.
Jan 01
Domain <dont_email empty.com> writes:
On Monday, 1 January 2018 at 12:33:27 UTC, John Chapman wrote:
Works for me. I created a file with the name "中文.txt" and std.file.exists returned true. Is your D source file saved in ASCII by any chance? Try saving it with a different encoding, such as UTF8.
Yes, "中文.txt".exists returns true. But when the filename is read from stdin, it returns false:

stdin
    .byLineCopy(No.keepTerminator)
    .each!((a) { writefln("%s --> %s", a, a.exists); });

dir *.txt /b | test.exe
English.txt --> true
中文.txt --> false
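One way to narrow this down is to check whether the bytes arriving on stdin are valid UTF-8 at all. The sketch below is only a diagnostic, not a fix. The helper name isValidUtf8 is made up for this example, and the sample bytes in the usage note assume the pipe delivers the name in code page 936 (GBK), which has not been confirmed here.

```d
import std.utf : validate, UTFException;

/// Hypothetical helper: returns true if s is well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s); // throws UTFException on malformed input
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}
```

If the name arrived as the GBK bytes D6 D0 CE C4 followed by ".txt", this check would return false, which would explain why exists cannot find the file: the bytes do not decode to the name stored on disk.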
Jan 01
Domain <dont_email empty.com> writes:
On Monday, 1 January 2018 at 16:13:06 UTC, Domain wrote:
 Yes, "中文.txt".exists returns true. But when the filename is read from stdin, it returns false:

 stdin
     .byLineCopy(No.keepTerminator)
     .each!((a) { writefln("%s --> %s", a, a.exists); });

 dir *.txt /b | test.exe
 English.txt --> true
 中文.txt --> false
Problem solved! I changed the font in the cmd properties from "Raster Fonts" to "Consolas" and everything works well. But I don't know why.
Jan 01
tipdbmp <email example.com> writes:
I think you have to decode your input to UTF-8.

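import std.algorithm : each;
import std.encoding : Latin1String, transcode;
import std.file : exists;
import std.stdio : stdin, writefln;
import std.typecons : No;

void main()
{
    stdin
        .byLineCopy(No.keepTerminator)
        .each!((string file_name_raw) {
            // Change Latin1String to the encoding of your console's code page;
            // use the 'chcp' command to see the current code page of your console.
            // Reinterpret the raw bytes as Latin-1 (the cast target is the
            // string type itself, not an array of strings).
            auto raw = cast(Latin1String) file_name_raw;
            string file_name_utf8;
            transcode(raw, file_name_utf8);

            writefln("%s --> %s", file_name_utf8, file_name_utf8.exists);
        });
}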
Jan 01