digitalmars.D - Multibyte support on Windows, Phobos vs Tango, which is right?
- yidabu (57/57) Apr 09 2008 Multibyte support on Windows, Phobos vs Tango, which is right ?
- yidabu (26/119) Apr 10 2008 Kris,
- yidabu (55/62) Apr 10 2008 I've written a function to find the modules:
- Bill Baxter (14/74) Apr 10 2008 I don't think it's possible for a logical drive to have non-ASCII
Multibyte support on Windows, Phobos vs Tango: which is right?

1. Phobos has a toMBSz function that converts the UTF-8 string s into a null-terminated string in a Windows 8-bit character set, like this:

    char* toMBSz(char[] s, uint codePage = 0)
    {
        // Only need to do this if any chars have the high bit set
        foreach (char c; s)
        {
            if (c >= 0x80)
            {
                // do convert
            }
        }
        return std.string.toStringz(s);
    }

Tango does not have this function. Is it necessary?

2. Is toMBSz(char[]) the same as char[] ~ '\0'? For example, with CreateFileA:

Phobos way:

    char[] name;
    CreateFileA(toMBSz(name) ...)

Tango way:

    char[] name;
    CreateFileA(name ~ '\0' ...)

Is toMBSz(char[]) always the same as char[] ~ '\0'? Is toMBSz("Chinese汉语"c) always the same as "Chinese汉语"c ~ '\0'?

If Phobos is right, there are many bugs in Tango, because Tango uses char[] ~ '\0' everywhere when calling the A versions of the Windows API!

3. Phobos zip vs Tango Zip

I used the Phobos zip module and it works fine. The one trick is that zip.ArchiveMember.name should be encoded in the local code page in a multibyte environment.

Tango way:

    char[][] files = [r"D:\Chinese中文.txt"];
    createArchive(r"test.zip", Method.Deflate, files);

This causes an exception:

    object.Exception: cannot encode character "20013" in codepage 437.

Tango seems to lack multibyte support on Windows, and does not appear to have run unit tests for a multibyte environment on Windows before publishing a new version.

-- 
yidabu <yidabu.nospam gmail.com>
D语言 中文支持 (D Chinese Support)
http://www.d-programming-language-china.org/
http://bbs.d-programming-language-china.org/
http://dwin.d-programming-language-china.org/
http://scite4d.d-programming-language-china.org/
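To make question 2 concrete, here is a minimal Python sketch (not Phobos or Tango code) that compares the two byte sequences. The helper names to_mbsz, utf8_z and encodable are hypothetical, and the default code page cp936 is an assumption standing in for the ANSI code page of a Simplified Chinese Windows system. Phobos's codePage = 0 would use whatever the system's ANSI code page is.

```python
def to_mbsz(s: str, code_page: str = "cp936") -> bytes:
    """Rough analogue of toMBSz: encode in a Windows code page, then NUL-terminate.

    The default code page is an assumption for illustration only.
    """
    if s.isascii():
        # Same shortcut as the Phobos snippet: no byte has the high bit set,
        # so no conversion is needed.
        return s.encode("ascii") + b"\0"
    return s.encode(code_page) + b"\0"


def utf8_z(s: str) -> bytes:
    """Analogue of `name ~ '\\0'` on a UTF-8 char[]: raw UTF-8 bytes plus NUL."""
    return s.encode("utf-8") + b"\0"


def encodable(s: str, code_page: str) -> bool:
    """Report whether every character of s exists in the given code page."""
    try:
        s.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


name = "Chinese汉语"
print(to_mbsz(name) == utf8_z(name))              # False: the bytes differ
print(to_mbsz("test.zip") == utf8_z("test.zip"))  # True: pure ASCII is unaffected

# The character named in the zip exception is U+4E2D, which code page 437 lacks.
print(ord("中"))                                   # 20013
print(encodable("Chinese中文.txt", "cp437"))        # False
```

Under these assumptions the two approaches agree only for pure-ASCII strings. As soon as a name contains a character above 0x7F, an A-suffixed function would interpret the UTF-8 bytes in the ANSI code page and see a different name.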
Apr 09 2008
On Wed, 9 Apr 2008 23:51:59 -0800 "Kris" <foo bar.com> wrote:
> Yidabu:
>
> Tango has a multi-platform API based around Unicode, thus it is not biased for windows, linux, or darwin. All the items you mention appear to be reasonably specific to Win32, so keep that in mind when reading this reply:
>
> 1) You'll find something functionally similar in tango.sys.win32.CodePage
>
> 2) Like many O/S, Tango expects file names to be Unicode. This helps make the library portable. On Win32 the blahW() functions are used, with utf8 to utf16 conversion applied internally, except when you explicitly stipulate the version=Win32SansUnicode compiler option. If you do that, Tango currently does no internal conversion for file names. In short, if you explicitly disable Unicode support within the library then you currently [...] problem if you're running Tango on Win95 or an old Win32S hybrid
>
> 3) you have a recent ticket open for this specific issue, and it is somewhat [...] in a portable manner between O/S. Your ticket has identified a problem with the zip package, which does need to be fixed.
>
> Perhaps you'd like to try fixing the bug in the zip package yourself? Tango is open-source, and patches are always welcome. If you'd like to add some more multibyte testcases to the codebase, we'd certainly be happy to run them.
>
> Hope that helps

Kris,

Thanks for your reply.

1) I know the CodePage module. The issue is that Tango does not use it to convert file names.

2) Since passing (char[] ~ '\0') to the Ansi Win32 API is not the right way, why not use the Phobos way instead? Does passing toMBSz(char[]) to the Ansi Win32 API affect the library's portability? Does the Ansi Win32 API itself affect the library's portability? (My code is Unicode. Only the Ansi Win32 API needs the local code page encoding, not me :) Some Tango modules only have Ansi Win32 API implementation, so what can Tango users do? Copy the module somewhere and change (char[] ~ '\0') to toMBSz(char[]) before using it?

3) Since Tango passes (char[] ~ '\0') to the Ansi Win32 API everywhere, it is sometimes difficult to debug the code.

Thanks to the Tango team for the exciting library you offer to all of us.

-- 
yidabu <yidabu.nospam gmail.com>
DWin http://www.dsource.org/projects/dwin
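Kris's point 2 describes the W-function path as a UTF-8 to UTF-16 conversion applied before the call. A minimal Python sketch of just that conversion step follows. The function name to_wide_z is hypothetical, and the sketch does not call any Win32 function or reproduce Tango's internals.

```python
def to_wide_z(utf8_name: bytes) -> bytes:
    """Convert UTF-8 bytes to little-endian UTF-16 with a 16-bit NUL terminator.

    This is the form a W-suffixed Win32 function expects for a string argument.
    """
    text = utf8_name.decode("utf-8")
    return text.encode("utf-16-le") + b"\x00\x00"


utf8_path = "D:\\Chinese中文.txt".encode("utf-8")
wide_path = to_wide_z(utf8_path)

# The conversion is lossless: decoding the result gives back the original
# text followed by the terminator, regardless of the system code page.
print(wide_path.decode("utf-16-le") == "D:\\Chinese中文.txt\0")
```

Because UTF-16 can represent every character that UTF-8 can, this path does not depend on the local code page at all, which is why the W functions avoid the problem raised for the A functions.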
Apr 10 2008
On Thu, 10 Apr 2008 01:23:35 -0800 "Kris" <foo bar.com> wrote:
> "yidabu" wrote in message
>> Some Tango modules only have Ansi Win32 API implementation
> If this is true, then please write a ticket for it noting the module(s) in question

I've written a function to find the modules:

    import dwin.text.pcre.RegExp;
    import tango.text.Util;
    import tango.io.File;
    import tango.io.FileScan; // FileScan is used below
    import tango.util.log.Trace;

    // Scan `path` for .d files that call a Win32 ...A function
    // without the matching ...W name appearing in the same file.
    FileScan findAnsiWinAPI(char[] path)
    {
        // Captures the API name without its trailing 'A',
        // e.g. "CreateFile" from "CreateFileA("
        auto regex = RegExp(r"\b([A-Z][a-z][a-zA-Z]+?)A\b\s*\(");
        auto scan = new FileScan;
        scan
        (
            path,
            (FilePath fp, bool isDir)
            {
                if(isDir) return true;               // descend into directories
                if(fp.suffix != ".d") return false;  // only D sources
                auto content = cast(char[]) (new File(fp)).read;
                // Only the first ...A call found in each file is checked
                if(auto m = regex.execute(content))
                {
                    if(!content.containsPattern(m[1] ~ "W"))
                    {
                        Trace.formatln("{} contains {}, but not contains {}",
                                       fp.toString, m[1] ~ "A", m[1] ~ "W");
                        return true;
                    }
                }
                return false;
            }
        );
        return scan;
    }

    void main()
    {
        char[] path = r"path\to\tango\tango\";
        auto fs = findAnsiWinAPI(path);
    }

The result is:

    tango/tango/io/FileRoots.d contains GetLogicalDriveStringsA, but not contains GetLogicalDriveStringsW
    tango/tango/io/Console.d contains CreateFileA, but not contains CreateFileW
    tango/tango/io/MappedBuffer.d contains CreateFileMappingA, but not contains CreateFileMappingW
    tango/tango/core/sync/Semaphore.d contains CreateSemaphoreA, but not contains CreateSemaphoreW
    tango/tango/core/sync/Condition.d contains CreateSemaphoreA, but not contains CreateSemaphoreW
    tango/tango/sys/Process.d contains CreateProcessA, but not contains CreateProcessW
    tango/tango/sys/SharedLib.d contains LoadLibraryA, but not contains LoadLibraryW

-- 
yidabu <yidabu.nospam gmail.com>
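For readers without the dwin RegExp module, the same check can be sketched in Python on a single source string. The function name ansi_only_calls is hypothetical. Unlike the D version above, which tests only the first match per file, this sketch checks every A-suffixed call it finds. It keeps the same plain substring test for the W name, so it shares that heuristic's limits.

```python
import re

# Same pattern as in the D code: a capitalised identifier ending in 'A',
# followed by an opening parenthesis. Group 1 is the name without the 'A'.
ANSI_CALL = re.compile(r"\b([A-Z][a-z][a-zA-Z]+?)A\b\s*\(")


def ansi_only_calls(source: str) -> list:
    """Return the ...A call names whose ...W counterpart never appears in source."""
    found = []
    for match in ANSI_CALL.finditer(source):
        base = match.group(1)
        ansi_name = base + "A"
        if base + "W" not in source and ansi_name not in found:
            found.append(ansi_name)
    return found


sample = 'auto h = CreateFileA("CONIN$\\0"); LoadLibraryA(name); LoadLibraryW(wname);'
print(ansi_only_calls(sample))
```

A hit from this kind of scan only says that an A function is used alone. Whether that is a bug still depends on what strings are passed to it, so each reported module needs to be read by hand.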
Apr 10 2008
yidabu wrote:
> I've written a function to find the modules:

You, sir (or ma'am), are hard core. And I applaud that.

> tango/tango/io/FileRoots.d contains GetLogicalDriveStringsA, but not contains GetLogicalDriveStringsW

I don't think it's possible for a logical drive to have non-ASCII characters, is it? So that should be OK.

> tango/tango/io/Console.d contains CreateFileA, but not contains CreateFileW

It only creates a few specially named files, which always have ASCII names. ("CONIN$\0", "CONOUT$\0", "CONOUT$\0")

> tango/tango/io/MappedBuffer.d contains CreateFileMappingA, but not contains CreateFileMappingW

It passes null for all string parameters, so it shouldn't matter that it's just using the A version.

> tango/tango/core/sync/Semaphore.d contains CreateSemaphoreA, but not contains CreateSemaphoreW
> tango/tango/core/sync/Condition.d contains CreateSemaphoreA, but not contains CreateSemaphoreW

Ditto for these. They use null for the string params.

> tango/tango/sys/Process.d contains CreateProcessA, but not contains CreateProcessW

*THIS* looks like it could be a genuine problem. So someone more familiar with the code should take a closer look.

> tango/tango/sys/SharedLib.d contains LoadLibraryA, but not contains LoadLibraryW

This looks potentially problematic too.

--bb
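The reasoning in this review (a null or ASCII-only argument is harmless to an A function) can be checked with a short Python sketch. The function name safe_for_ansi_api and the particular list of code pages are assumptions chosen for illustration. The list is a sample, not every code page Windows supports.

```python
# A sample of encodings: an OEM code page, two ANSI code pages, and UTF-8.
CODE_PAGES = ("cp437", "cp936", "cp1252", "utf-8")


def safe_for_ansi_api(arg) -> bool:
    """True if arg is None (a null parameter) or contains only ASCII characters."""
    if arg is None:
        return True
    return arg.isascii()


# The console device names encode to the same bytes in every code page listed,
# so passing them as UTF-8 to an A function gives the intended name.
for name in ("CONIN$", "CONOUT$"):
    distinct_encodings = {name.encode(cp) for cp in CODE_PAGES}
    print(name, len(distinct_encodings) == 1)

print(safe_for_ansi_api(None))
print(safe_for_ansi_api("Chinese中文.dll"))
```

This matches the triage above: the Console, MappedBuffer, Semaphore and Condition cases pass fixed ASCII names or null, while Process and SharedLib can receive arbitrary user-supplied strings.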
Apr 10 2008
On Fri, 11 Apr 2008 08:44:55 +0900 Bill Baxter <dnewsgroup billbaxter.com> wrote:
>> tango/tango/sys/Process.d contains CreateProcessA, but not contains CreateProcessW
> *THIS* looks like it could be a genuine problem. So someone more familiar with the code should take a closer look.
>> tango/tango/sys/SharedLib.d contains LoadLibraryA, but not contains LoadLibraryW
> This looks potentially problematic too.

I'll copy your words to Tango ticket :)

-- 
yidabu <yidabu.nospam gmail.com>
Apr 11 2008
On Fri, 11 Apr 2008 19:57:48 +0800 yidabu <yidabu.nospam gmail.com> wrote:
> I'll copy your words to Tango ticket :)

Ticket for this: http://www.dsource.org/projects/tango/ticket/1035

-- 
yidabu <yidabu.nospam gmail.com>
Apr 11 2008