
digitalmars.D - Multibyte support on Windows, Phobos vs Tango, which is right?

yidabu <yidabu.nospam gmail.com> writes:
Multibyte support on Windows, Phobos vs Tango: which is right?

1  Phobos has a toMBSz function, which converts the UTF-8 string s into a
   null-terminated string in a Windows 8-bit character set,
   like this:
   
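    import std.c.windows.windows;   // WideCharToMultiByte (D1 Phobos)
    import std.string;              // toStringz
    import std.utf;                 // toUTF16z

    char* toMBSz(char[] s, uint codePage = 0)
    {
        // Only need to do this if any chars have the high bit set
        foreach (char c; s)
        {
            if (c >= 0x80)
            {
                // UTF-8 -> UTF-16 -> codePage (0 = ANSI code page); -1 counts the NUL too
                wchar* ws = std.utf.toUTF16z(s);
                char[] result;
                result.length = WideCharToMultiByte(codePage, 0, ws, -1, null, 0, null, null);
                if (result.length == 0 || WideCharToMultiByte(codePage, 0, ws, -1,
                        result.ptr, result.length, null, null) != result.length)
                    throw new Exception("toMBSz: cannot convert string");
                return result.ptr;
            }
        }
        return std.string.toStringz(s);
    }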
   
   Tango does not have this function. Is it necessary?
   
2  Is toMBSz(char[]) the same as char[] ~ '\0'?

    For example, CreateFileA:
    
    Phobos way:
    char[] name;
    CreateFileA(toMBSz(name) ...)
    
    Tango way:
    char[] name;
    CreateFileA(name ~ '\0' ...)
    
    Is toMBSz(char[]) always the same as char[] ~ '\0'?
    Is toMBSz("Chinese汉语"c) always the same as "Chinese汉语"c ~ '\0'?
    
    If Phobos is right, then Tango has many bugs: it uses char[] ~ '\0'
everywhere when calling the A versions of the Windows API!
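To make the difference concrete, here is a small sketch in present-day D (D2/Phobos, not the 2008 Tango or D1 Phobos APIs). It only inspects memory, so it runs on any OS. `name ~ '\0'` appends a terminator but leaves the bytes in UTF-8; an A function then decodes those bytes in the current ANSI code page (CP936 on Simplified Chinese Windows, for example), so the six bytes of 汉语 are misread, whereas toMBSz converts them to that code page first. For pure-ASCII names both produce identical bytes, which is why the shortcut usually appears to work.

```d
import std.stdio : writefln;

void main()
{
    // D string literals are stored as UTF-8.
    string name = "Chinese汉语";

    // Tango-style argument for an A function: same bytes plus a terminator.
    // No code-page conversion happens here.
    string z = name ~ '\0';

    // 7 ASCII bytes + 2 CJK characters * 3 UTF-8 bytes each + NUL = 14
    assert(z.length == 14);

    // Dump the bytes an ANSI (A) function would actually receive.
    writefln("%(%02X %)", cast(const(ubyte)[]) z);
    // 43 68 69 6E 65 73 65 E6 B1 89 E8 AF AD 00
}
```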
    
    
3  Phobos zip vs Tango zip

    I used the Phobos zip module and it works fine; the trick is that zip.ArchiveMember.name
must be encoded in the local code page in a multibyte environment.
    
    Tango way:
    char[][] files = [r"D:\Chinese中文.txt"];
    createArchive(r"test.zip", Method.Deflate, files);
    
    causes an exception:
    object.Exception: cannot encode character "20013" in codepage 437.

    Tango seems to lack multibyte support on Windows,
    and appears not to run dedicated unittests for multibyte environments on Windows
before publishing a new version.
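For context on the exception: 20013 is the decimal value of U+4E2D, the character 中. The ZIP format (PKWARE's APPNOTE.TXT) stores entry names in IBM code page 437 unless general-purpose flag bit 11 is set, in which case the name is UTF-8, so an encoder that always targets CP437 cannot represent 中. The sketch below, in present-day D, shows one way an archiver could choose that flag per entry. `zipNameFlags` and `zipUtf8Flag` are hypothetical names, not part of Tango or Phobos, and the approach assumes the tool reading the archive honours bit 11 (older tools may not).

```d
import std.stdio : writeln;

/// ZIP general-purpose flag bit 11 ("language encoding flag"):
/// when set, the entry name and comment are UTF-8 instead of CP437.
enum ushort zipUtf8Flag = 1 << 11;

/// Hypothetical helper: pick the flag bits for an entry name.
/// ASCII names are byte-identical in CP437 and UTF-8, so they need no flag;
/// anything else is stored as UTF-8 with bit 11 set.
ushort zipNameFlags(const(char)[] name)
{
    foreach (char c; name)
        if (c >= 0x80)          // high bit set: part of a multibyte UTF-8 sequence
            return zipUtf8Flag;
    return 0;
}

void main()
{
    // The character named in the exception message.
    dchar c = '中';
    writeln(cast(uint) c);                          // 20013
    writeln(zipNameFlags("test.zip"));              // 0
    writeln(zipNameFlags(r"D:\Chinese中文.txt"));   // 2048
}
```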

    
    

-- 
yidabu <yidabu.nospam gmail.com>
D语言 中文支持(D Chinese Support)
http://www.d-programming-language-china.org/
http://bbs.d-programming-language-china.org/
http://dwin.d-programming-language-china.org/
http://scite4d.d-programming-language-china.org/
Apr 09 2008
yidabu <yidabu.nospam gmail.com> writes:
On Wed, 9 Apr 2008 23:51:59 -0800
"Kris" <foo bar.com> wrote:

 Yidabu:
 
 Tango has a multi-platform API based around Unicode, so it is not biased 
 toward Windows, Linux, or Darwin. All the items you mention appear to be 
 reasonably specific to Win32, so keep that in mind when reading this reply:
 
 
 1) You'll find something functionally similar in tango.sys.win32.CodePage
 
 
 2) Like many O/S, Tango expects file names to be Unicode. This helps make 
 the library portable. On Win32 the blahW() functions are used, with UTF-8 to 
 UTF-16 conversion applied internally, except when you explicitly stipulate 
 the version=Win32SansUnicode compiler option. If you do that, Tango 
 currently does no internal conversion for file names. In short, if you 
 explicitly disable Unicode support within the library then you currently 
 need to handle Win32 code-page conversion yourself (see #1). This might be a 
 problem if you're running Tango on Win95 or an old Win32S hybrid.
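Point 2 describes the default path: keep names as UTF-8 inside the program, convert to UTF-16, and call the W function, so no ANSI code page is involved. Below is a minimal sketch of that idea in present-day D, assuming druntime's core.sys.windows bindings and Phobos's std.utf; `openForRead` is a hypothetical helper, not a Tango API. The conversion part runs anywhere; the CreateFileW call is compiled only on Windows.

```d
import std.stdio : writeln;
import std.utf : toUTF16;

version (Windows)
{
    import core.sys.windows.windows; // CreateFileW, CloseHandle, constants
    import std.utf : toUTF16z;

    /// Hypothetical helper: open an existing file whose UTF-8 name may
    /// contain any Unicode characters, via the W (UTF-16) function.
    HANDLE openForRead(const(char)[] utf8Name)
    {
        // toUTF16z converts UTF-8 to a NUL-terminated UTF-16 string.
        return CreateFileW(toUTF16z(utf8Name), GENERIC_READ, FILE_SHARE_READ,
                           null, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, null);
    }
}

void main()
{
    string name = "Chinese汉语.txt";

    // Every character here is in the BMP, so UTF-16 uses one code unit
    // per character (13), versus 17 UTF-8 bytes (11 ASCII + 2 * 3).
    wstring wide = toUTF16(name);
    writeln(wide.length);   // 13
    writeln(name.length);   // 17

    version (Windows)
    {
        HANDLE h = openForRead(name);
        if (h == INVALID_HANDLE_VALUE)
            writeln("open failed (file missing?)");
        else
            CloseHandle(h);
    }
}
```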
 
 
 3) You have a recent ticket open for this specific issue, and it is somewhat 
 related to #2 above. By default, Tango should happily handle Unicode names 
 in a portable manner between O/S. Your ticket has identified a problem with 
 the zip package, which does need to be fixed. Perhaps you'd like to try 
 fixing the bug in the zip package yourself? Tango is open-source, and 
 patches are always welcome. If you'd like to add some more multibyte 
 testcases to the codebase, we'd certainly be happy to run them.
 
 
 Hope that helps
 
 
 
 
Kris, thanks for your reply.

1) I know the CodePage module; the issue is that Tango does not use it to convert file names.

2) Since passing (char[] ~ '\0') to the ANSI Win32 API is not the right way, why not use the Phobos way instead? Does passing toMBSz(char[]) to the ANSI Win32 API affect the library's portability? Does using the ANSI Win32 API affect the library's portability at all? (My code is Unicode; it is only the ANSI Win32 API that needs the local code-page encoding, not me :)

Some Tango modules only have an ANSI Win32 API implementation. What can Tango users do? Copy the module somewhere and change (char[] ~ '\0') to toMBSz(char[]) before using it?

3) Since Tango passes (char[] ~ '\0') to the ANSI Win32 API everywhere, the code is sometimes difficult to debug.

Thanks to the Tango team for the exciting library you have offered to all of us.

-- 
yidabu <yidabu.nospam gmail.com>
DWin http://www.dsource.org/projects/dwin
D语言 中文支持(D Chinese Support)
http://www.d-programming-language-china.org/
http://bbs.d-programming-language-china.org/
http://dwin.d-programming-language-china.org/
http://scite4d.d-programming-language-china.org/
Apr 10 2008
Bill Baxter <dnewsgroup billbaxter.com> writes:
yidabu wrote:
 On Thu, 10 Apr 2008 01:23:35 -0800
 "Kris" <foo bar.com> wrote:
 
 "yidabu" wrote in message

 Some Tango modules only have Ansi Win32 API implementation

If this is true, then please write a ticket for it noting the module(s) in question

I've written a function to find the modules: ... the result is:

You, sir (or ma'am), are hard core. And I applaud that.
 tango/tango/io/FileRoots.d contains GetLogicalDriveStringsA, but not contains
GetLogicalDriveStringsW

I don't think it's possible for a logical drive name to contain non-ASCII characters, is it? So that should be OK.
 tango/tango/io/Console.d contains CreateFileA, but not contains CreateFileW

It only creates a few specially named files, and their names are always ASCII ("CONIN$\0", "CONOUT$\0", "CONOUT$\0").
 tango/tango/io/MappedBuffer.d contains CreateFileMappingA, but not contains
CreateFileMappingW

It passes null for all the string parameters, so it shouldn't matter that it only uses the A version.
 tango/tango/core/sync/Semaphore.d contains CreateSemaphoreA, but not contains
CreateSemaphoreW
 tango/tango/core/sync/Condition.d contains CreateSemaphoreA, but not contains
CreateSemaphoreW

Ditto for these. They use null for the string params.
 tango/tango/sys/Process.d contains CreateProcessA, but not contains
CreateProcessW

*THIS* looks like it could be a genuine problem. So someone more familiar with the code should take a closer look.
 tango/tango/sys/SharedLib.d contains LoadLibraryA, but not contains
LoadLibraryW

This looks potentially problematic too. --bb
Apr 10 2008
yidabu <yidabu.nospam gmail.com> writes:
On Thu, 10 Apr 2008 01:23:35 -0800
"Kris" <foo bar.com> wrote:

 
 "yidabu" wrote in message
 
 Some Tango modules only have Ansi Win32 API implementation

If this is true, then please write a ticket for it noting the module(s) in question

I've written a function to find the modules:

    import dwin.text.pcre.RegExp;   // DWin's PCRE wrapper
    import tango.text.Util;         // containsPattern
    import tango.io.File;
    import tango.io.FilePath;
    import tango.io.FileScan;
    import tango.util.log.Trace;

    // Report each .d file that calls some FooA( function
    // but never mentions the matching FooW.
    FileScan findAnsiWinAPI(char[] path)
    {
        // Group 1 captures the API name without its trailing "A", e.g. "CreateFile"
        auto regex = RegExp(r"\b([A-Z][a-z][a-zA-Z]+?)A\b\s*\(");
        auto scan = new FileScan;
        scan (path, (FilePath fp, bool isDir)
        {
            if (isDir)
                return true;            // let directories through
            if (fp.suffix != ".d")
                return false;           // only D source files
            auto content = cast(char[]) (new File(fp)).read;
            if (auto m = regex.execute(content))
            {
                if (!content.containsPattern(m[1] ~ "W"))
                {
                    Trace.formatln("{} contains {}, but not contains {}",
                                   fp.toString, m[1] ~ "A", m[1] ~ "W");
                    return true;
                }
            }
            return false;
        });
        return scan;
    }

    void main()
    {
        char[] path = r"path\to\tango\tango\";
        auto fs = findAnsiWinAPI(path);
    }

The result is:

    tango/tango/io/FileRoots.d contains GetLogicalDriveStringsA, but not contains GetLogicalDriveStringsW
    tango/tango/io/Console.d contains CreateFileA, but not contains CreateFileW
    tango/tango/io/MappedBuffer.d contains CreateFileMappingA, but not contains CreateFileMappingW
    tango/tango/core/sync/Semaphore.d contains CreateSemaphoreA, but not contains CreateSemaphoreW
    tango/tango/core/sync/Condition.d contains CreateSemaphoreA, but not contains CreateSemaphoreW
    tango/tango/sys/Process.d contains CreateProcessA, but not contains CreateProcessW
    tango/tango/sys/SharedLib.d contains LoadLibraryA, but not contains LoadLibraryW
Apr 10 2008
yidabu <yidabu.nospam gmail.com> writes:
On Fri, 11 Apr 2008 08:44:55 +0900
Bill Baxter <dnewsgroup billbaxter.com> wrote:

I'll copy your words to a Tango ticket :)
Apr 11 2008
yidabu <yidabu.nospam gmail.com> writes:
On Fri, 11 Apr 2008 19:57:48 +0800
yidabu <yidabu.nospam gmail.com> wrote:

Ticket for this: http://www.dsource.org/projects/tango/ticket/1035
Apr 11 2008