digitalmars.D.learn - recognizing asciiz, utf ...

newbee <newbee newbee.com> writes:
Hi all,

How does one check for asciiz, utf ...?
I get a buffer of characters as a parameter in a function, but I don't know
whether it is asciiz, UTF, or wchar. Is it possible to find out in dmd1 and dmd2?

Any help is appreciated.
Mar 13 2009
Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Fri, Mar 13, 2009 at 3:04 PM, newbee <newbee newbee.com> wrote:
 Hi all,

 How does one check for asciiz, utf ...?
 I do get a buffer with characters as parameter in a function, but i don’t
 know if it is asciiz or utf or wchar. Is it possible to find out in dmd1 and dmd2?

 Any help is appreciated.
How are you getting this buffer? What type is it, char[]? D strings are supposed to be Unicode, always. If you read the data in from a file, there's little to no guarantee as to what encoding it is (unless it started with a Unicode BOM). If you have a zero-terminated char* that a C function gives you, you can turn it into a D string with std.string.toString (Phobos) or tango.stdc.stringz.fromStringz (Tango).
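
A minimal D2 sketch of the two ideas above, under stated assumptions: the enum `Bom` and the function `detectBom` are hypothetical helpers written for this illustration (they are not Phobos APIs), and `std.string.fromStringz` is the current D2 Phobos function for viewing a zero-terminated C string as a D slice. The BOM check can only confirm an encoding when a mark is present; its absence proves nothing.

```d
import std.string : fromStringz;

/// Encodings that a leading byte-order mark can announce (hypothetical helper type).
enum Bom { none, utf8, utf16le, utf16be, utf32le, utf32be }

/// Returns the encoding announced by a BOM at the start of `data`,
/// or Bom.none when the buffer does not start with one.
Bom detectBom(const(ubyte)[] data)
{
    static immutable ubyte[] markUtf32le = [0xFF, 0xFE, 0x00, 0x00];
    static immutable ubyte[] markUtf32be = [0x00, 0x00, 0xFE, 0xFF];
    static immutable ubyte[] markUtf8    = [0xEF, 0xBB, 0xBF];
    static immutable ubyte[] markUtf16le = [0xFF, 0xFE];
    static immutable ubyte[] markUtf16be = [0xFE, 0xFF];

    static bool hasPrefix(const(ubyte)[] d, const(ubyte)[] prefix)
    {
        return d.length >= prefix.length && d[0 .. prefix.length] == prefix;
    }

    // UTF-32 LE is tested before UTF-16 LE because FF FE 00 00 also starts
    // with FF FE. (It could equally be a UTF-16 LE BOM followed by a NUL
    // character; a BOM alone cannot settle that.)
    if (hasPrefix(data, markUtf32le)) return Bom.utf32le;
    if (hasPrefix(data, markUtf32be)) return Bom.utf32be;
    if (hasPrefix(data, markUtf8))    return Bom.utf8;
    if (hasPrefix(data, markUtf16le)) return Bom.utf16le;
    if (hasPrefix(data, markUtf16be)) return Bom.utf16be;
    return Bom.none;
}

void main()
{
    const(ubyte)[] sample = [0xEF, 0xBB, 0xBF, 0x68, 0x69]; // BOM + "hi"
    assert(detectBom(sample) == Bom.utf8);

    // D string literals are NUL-terminated, so .ptr stands in here for a
    // char* handed back by a C function.
    const(char)* fromC = "hello".ptr;
    const(char)[] view = fromStringz(fromC); // slice over the C memory, no copy
    string owned = view.idup;                // copy if the C memory may go away
    assert(owned == "hello");
}
```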
Mar 13 2009
newbee <newbee newbee.com> writes:
I get it from a TCP buffer and do not know in advance whether it is char[], asciiz, or wchar. Is it possible to check for that?
Mar 13 2009
Daniel Keep <daniel.keep.lists gmail.com> writes:
If you're getting data from a network connection and you have no idea what it is, then the language certainly isn't going to help you with that. Perhaps reading the documentation for the network protocol is in order? :P -- Daniel
Mar 13 2009
Sergey Gromov <snake.scaly gmail.com> writes:
Fri, 13 Mar 2009 15:04:12 -0400, newbee wrote:

 How does one check for asciiz, utf ...? 
 I do get a buffer with characters as parameter in a function, but i
 don’t know if it is asciiz or utf or wchar. Is it possible to find
 out in dmd1 and dmd2? 
The UTF-8 format has some redundancy, so you can test whether your string is valid UTF-8. There is std.utf.validate() for that. Any ASCII string will also pass, since ASCII is a special case of UTF-8. Not all code points are defined in Unicode. This means you can cast your string to wchar[] and then test every char using the std.utf.isValidDchar() function. If that fails, then you are definitely not dealing with a valid wchar[] string, so test dchar[] similarly. Be prepared, though, that these tests will sometimes give you false positives.
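
The two checks described above might look roughly like this in D2. The function names `looksLikeUtf8` and `everyWcharIsValidCodePoint` are hypothetical, `UTFException` is the D2 Phobos exception name (D1 Phobos used a different one), and the cast reads the bytes in the machine's native byte order, which is an assumption about the sender. One caveat of the per-unit test: it also rejects legitimate UTF-16 text that uses surrogate pairs, because a surrogate code unit on its own is not a valid code point.

```d
import std.utf : validate, isValidDchar, UTFException;

/// True when `buf` decodes cleanly as UTF-8 (plain ASCII passes as well).
bool looksLikeUtf8(const(ubyte)[] buf)
{
    try
    {
        validate(cast(const(char)[]) buf); // throws UTFException on bad input
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

/// Reinterprets `buf` as 16-bit units (native byte order) and checks that
/// each unit on its own is a valid Unicode code point.
bool everyWcharIsValidCodePoint(const(ubyte)[] buf)
{
    if (buf.length % wchar.sizeof != 0)
        return false; // cannot be a whole number of wchar units
    foreach (wchar unit; cast(const(wchar)[]) buf)
    {
        if (!isValidDchar(unit))
            return false;
    }
    return true;
}

void main()
{
    const(ubyte)[] ascii   = [0x61, 0x62, 0x63, 0x64]; // "abcd"
    const(ubyte)[] badUtf8 = [0xC3, 0x28];             // lead byte, bad continuation
    const(ubyte)[] lone    = [0xD8, 0xD8];             // 0xD8D8 in either byte order

    assert(looksLikeUtf8(ascii));
    assert(!looksLikeUtf8(badUtf8));

    // A false positive of the kind the post warns about: plain ASCII also
    // passes the wchar test, because its byte pairs form valid code points.
    assert(everyWcharIsValidCodePoint(ascii));
    assert(!everyWcharIsValidCodePoint(lone));
}
```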
Mar 14 2009
newbee <newbee newbee.com> writes:
Thank you kindly. This explanation really helped me. I will try that.
Mar 15 2009
Sergey Gromov <snake.scaly gmail.com> writes:
Sun, 15 Mar 2009 05:20:08 -0400, newbee wrote:

 thank you kindly. this explanation really helped me. i will try that.
You're welcome. I just realized that there are wchar[] and dchar[] versions of std.utf.validate(). This should make your test really straightforward.
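
As a rough illustration of that simpler test, here is a D2 sketch with one hypothetical template, `isValidAs`, that runs std.utf.validate() over the same bytes viewed as char[], wchar[], or dchar[]. It assumes D2 Phobos (where validate() accepts all three string widths and throws `UTFException`) and native byte order for the 16-bit and 32-bit views. A passing result still only means "could be this encoding", not "is this encoding".

```d
import std.utf : validate, UTFException;

/// True when `buf`, viewed as an array of `Char` code units in native byte
/// order, passes std.utf.validate. `Char` is char, wchar, or dchar.
bool isValidAs(Char)(const(ubyte)[] buf)
{
    if (buf.length % Char.sizeof != 0)
        return false; // not a whole number of code units
    try
    {
        validate(cast(const(Char)[]) buf);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    const(ubyte)[] ascii = [0x61, 0x62, 0x63, 0x64]; // "abcd"

    assert(isValidAs!char(ascii));   // ASCII is valid UTF-8
    assert(isValidAs!wchar(ascii));  // false positive: also valid as UTF-16
    assert(!isValidAs!dchar(ascii)); // the 32-bit value exceeds 0x10FFFF
}
```

Because several views can pass at once, a caller would still need a tie-breaking rule (for example, preferring UTF-8 when it validates), and that choice depends on the protocol rather than on the language.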
Mar 15 2009