digitalmars.D.learn - recognizing asciiz, utf ...

newbee <newbee newbee.com> writes:

Hi all,

How does one check for asciiz, utf ...?
I do get a buffer with characters as parameter in a function, but I don't know
if it is asciiz or utf or wchar. Is it possible to find out in dmd1 and dmd2?

Any help is appreciated.

Mar 13 2009

Jarrett Billingsley <jarrett.billingsley gmail.com> writes:

On Fri, Mar 13, 2009 at 3:04 PM, newbee <newbee newbee.com> wrote:
 Hi all,

 How does one check for asciiz, utf ...?
 I do get a buffer with characters as parameter in a function, but I don't
 know if it is asciiz or utf or wchar. Is it possible to find out in dmd1
 and dmd2?

 Any help is appreciated.

How are you getting this buffer?  What type is it, char[]?  D strings
are supposed to be Unicode, always.  If you read the data in from a
file, there's little to no guarantee as to what encoding it is (unless
it started with a Unicode BOM).

If you have a zero-terminated char* that a C function gives you, you
can turn it into a D string with std.string.toString (Phobos) or
tango.stdc.stringz.fromStringz (Tango).
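
A minimal sketch of both points, assuming a current D2 Phobos, where the
zero-terminated conversion is available as std.string.fromStringz. The names
bomName and fromC are hypothetical helpers made up for illustration, not
library functions, and the list of BOMs is limited to UTF-8, UTF-16 and UTF-32.

```d
import std.string : fromStringz;

/// Hypothetical helper (not part of Phobos): names the encoding announced
/// by a leading Unicode BOM, or returns null when the buffer has none.
string bomName(const(ubyte)[] d)
{
    // Test the 4-byte UTF-32 marks first: FF FE 00 00 begins with the
    // UTF-16LE mark FF FE.
    if (d.length >= 4)
    {
        if (d[0] == 0xFF && d[1] == 0xFE && d[2] == 0x00 && d[3] == 0x00)
            return "UTF-32LE";
        if (d[0] == 0x00 && d[1] == 0x00 && d[2] == 0xFE && d[3] == 0xFF)
            return "UTF-32BE";
    }
    if (d.length >= 3 && d[0] == 0xEF && d[1] == 0xBB && d[2] == 0xBF)
        return "UTF-8";
    if (d.length >= 2)
    {
        if (d[0] == 0xFF && d[1] == 0xFE) return "UTF-16LE";
        if (d[0] == 0xFE && d[1] == 0xFF) return "UTF-16BE";
    }
    return null;
}

/// Hypothetical helper: copies a zero-terminated C string into a D string.
string fromC(const(char)* p)
{
    // fromStringz returns a slice up to (not including) the terminator;
    // idup makes an immutable copy that no longer depends on the C buffer.
    return fromStringz(p).idup;
}
```

A missing BOM proves nothing either way, so a null result from bomName only
means the encoding has to be established by other means.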

Mar 13 2009

newbee <newbee newbee.com> writes:

I get it from a TCP buffer and do not know in advance whether it is char[],
asciiz or wchar. Is it possible to check for that?

Mar 13 2009

Daniel Keep <daniel.keep.lists gmail.com> writes:

If you're getting data from a network connection and you have no idea
what it is, then the language certainly isn't going to help you with that.

Perhaps reading the documentation for the network protocol is in order? :P

  -- Daniel

Mar 13 2009

Sergey Gromov <snake.scaly gmail.com> writes:

Fri, 13 Mar 2009 15:04:12 -0400, newbee wrote:

 How does one check for asciiz, utf ...?
 I do get a buffer with characters as parameter in a function, but I
 don't know if it is asciiz or utf or wchar. Is it possible to find
 out in dmd1 and dmd2?

There is some redundancy in UTF-8 format so you can test if your string
is a valid UTF-8 string.  There is std.utf.validate() for you.  Any
ASCII string will also pass since ASCII is a special case of UTF-8.

Not all code points are defined in Unicode.  This means you can cast
your string to wchar[] and then test every element using the
std.utf.isValidDchar() function.  If that fails, you are definitely not
dealing with a valid wchar[] string, so test dchar[] similarly.

Be prepared though that these tests will sometimes give you false
positives.
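
The two tests described above might be sketched like this in D2.
looksLikeUtf8 and everyUnitValid are hypothetical helper names, and taking the
data as a ubyte buffer is an assumption about how the TCP bytes arrive. Note
that the per-element test is stricter than real UTF-16: isValidDchar rejects
surrogate code units, so legitimate text containing surrogate pairs would fail
it.

```d
import std.utf : isValidDchar, validate, UTFException;

/// True when the bytes are well-formed UTF-8; plain ASCII passes as well.
bool looksLikeUtf8(const(ubyte)[] buf)
{
    try
    {
        validate(cast(const(char)[]) buf);   // throws on malformed input
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

/// True when every 16-bit unit of the buffer is a valid code point by itself.
bool everyUnitValid(const(ubyte)[] buf)
{
    // An odd byte count cannot be reinterpreted as 16-bit units.
    if (buf.length % 2 != 0)
        return false;
    foreach (wchar u; cast(const(wchar)[]) buf)
        if (!isValidDchar(u))
            return false;
    return true;
}
```

The cast reads the 16-bit units in the host's byte order, which need not be
the order the sender used.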

Mar 14 2009

newbee <newbee newbee.com> writes:

thank you kindly. this explanation really helped me. i will try that.

Mar 15 2009

Sergey Gromov <snake.scaly gmail.com> writes:

Sun, 15 Mar 2009 05:20:08 -0400, newbee wrote:

 thank you kindly. this explanation really helped me. i will try that.

You're welcome.

I just realized that there are wchar[] and dchar[] versions of
std.utf.validate().  This should make your test really straightforward.
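
With those overloads, the check can be written once and instantiated per
character type. A rough sketch, assuming D2 Phobos; isWellFormed and
candidates are hypothetical names, and the cast interprets the bytes in the
host's byte order, which may not match the sender's.

```d
import std.utf : validate, UTFException;

/// Hypothetical helper: true when `buf`, reinterpreted as an array of
/// `Char` code units, passes std.utf.validate.
bool isWellFormed(Char)(const(ubyte)[] buf)
{
    // The array cast below needs a whole number of code units.
    if (buf.length % Char.sizeof != 0)
        return false;
    try
    {
        validate(cast(const(Char)[]) buf);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

/// Lists every encoding the buffer is consistent with.
string[] candidates(const(ubyte)[] buf)
{
    string[] result;
    if (isWellFormed!char(buf))  result ~= "UTF-8";
    if (isWellFormed!wchar(buf)) result ~= "UTF-16";
    if (isWellFormed!dchar(buf)) result ~= "UTF-32";
    return result;
}
```

For the UTF-16 bytes of "hi" (68 00 69 00) this reports both UTF-8 and UTF-16,
which is the kind of false positive mentioned earlier: the result narrows the
possibilities but does not identify the encoding.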

Mar 15 2009
