digitalmars.D - YASQ - Proper way to convert byte[] <--> string

reply Steve Teale <steve.teale britseyeview.com> writes:
I have a byte[] A that contains an AJP13 packet, presumably including UTF8
strings.  I need to extract such strings and to place strings in such a buffer.
 I'm using:

string s = A[n .. m].dup;  // n and m from prefixed string length/position
return s;

to get strings, and

byte[] ba = cast(byte[]) s;
A[n .. n+ba.length] = ba[0 .. $].dup;

to put them.  Are these a) sensible, b) optimal?
Jul 11 2007
parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Steve Teale wrote:
 I have a byte[] A that contains an AJP13 packet, presumably including UTF8
strings.  I need to extract such strings and to place strings in such a buffer.
 I'm using:
 
 string s = A[n .. m].dup;  // n and m from prefixed string length/position
 return s;
 
 to get strings, and
That should work, and be optimal unless you can be sure the A array doesn't change while you still need the string (in which case the .dup is unnecessary).
 byte[] ba = cast(byte[]) s;
 A[n .. n+ba.length] = ba[0 .. $].dup;
 
 to put them.  Are these a) sensible, b) optimal?
This one should work as well, but isn't optimal; the .dup is unnecessary. This should be equivalent but more efficient:
---
A[n .. n+s.length] = cast(byte[]) s;
---
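For readers on current D, the get and put steps discussed above might be sketched as follows. This is only an illustration under assumptions: the helper names `getString` and `putString` are hypothetical, the offsets are taken as already decoded from the packet's length prefix, and `string` is treated as immutable (as in D2), so `.idup` stands in for the `.dup` in the original post.

```d
import std.utf : validate;

// Hypothetical helper: copy the bytes packet[n .. m] out as a string.
string getString(const(byte)[] packet, size_t n, size_t m)
{
    // Reinterpret the bytes as UTF-8 code units. The .idup makes an
    // immutable copy, so later writes to the packet cannot alter the result.
    string s = (cast(const(char)[]) packet[n .. m]).idup;
    validate(s); // throws a UTFException if the bytes are not valid UTF-8
    return s;
}

// Hypothetical helper: write the bytes of s into the packet starting at n.
void putString(byte[] packet, size_t n, string s)
{
    // Slice assignment copies the elements itself; no .dup is needed.
    packet[n .. n + s.length] = cast(const(byte)[]) s;
}

void main()
{
    auto packet = new byte[16];
    putString(packet, 2, "caf\u00E9"); // 5 bytes: U+00E9 takes two in UTF-8
    assert(getString(packet, 2, 7) == "caf\u00E9");
}
```

The copy in `getString` is the safe default; if the buffer is known not to change while the string is in use, the slice could be used without copying, as noted above.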
Jul 12 2007
parent reply Steve Teale <steve.teale britseyeview.com> writes:
Can I use n+s.length? In my experimentation I noticed that a UTF8 string containing a character using a two-byte representation definitely had an s.length of the number of characters, which was one less than the number of bytes.
Jul 12 2007
parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Steve Teale wrote:
 Frits van Bommel Wrote:
 
 ---
 A[n .. n+s.length] = cast(byte[]) s;
 ---
Can I use n+s.length? In my experimentation I noticed that a UTF8 string containing a character using a two-byte representation definitely had an s.length of the number of characters, which was one less than the number of bytes.
You noticed wrong... char[]s in D aren't very special, they're just specific array types that happen to be handled specially by some functions (such as writef*)[1]. The .length is the number of elements, and each element is a fixed size. A char is just a type representing a byte from UTF-8 text.
---
import std.stdio;

void main()
{
    auto s = "\u0100";
    writefln(s);
    writefln(s.length);
    writefln((cast(byte[])s).length);
}
---
Outputs a weird character (an A with a - on top) and two times the number 2.

[1]: and by foreach statements as well; they can automagically extract char/wchar/dchar elements from char[]/wchar[]/dchar[], in any combination.
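A variant of the same check for current D compilers, offered as a sketch, not as the original poster's code. It assumes D2's Phobos, where `writefln` expects a format string as its first argument (so `writeln` is used here) and where `std.range.walkLength` decodes a string into code points while counting.

```d
import std.range : walkLength;
import std.stdio : writeln;

void main()
{
    string s = "\u0100"; // U+0100, stored as two UTF-8 code units

    assert(s.length == 2);                       // .length counts code units (bytes)
    assert((cast(const(byte)[]) s).length == 2); // same count after the cast
    assert(s.walkLength == 1);                   // one decoded code point

    size_t count;
    foreach (dchar c; s) // foreach with dchar decodes whole code points
        ++count;
    assert(count == 1);

    writeln(s.length, " ", s.walkLength);
}
```

Since `.length` is the byte count, `n + s.length` is the right upper bound when copying into the byte buffer.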
Jul 12 2007
next sibling parent Steve Teale <steve.teale britseyeview.com> writes:
You are correct, I had misinterpreted my own test program.
Jul 12 2007
prev sibling parent reply 0ffh <spam frankhirsch.net> writes:
Frits van Bommel wrote:
 Outputs a weird character (an A with a - on top) [...]
Hah, Null-A! Reading A.E. van Vogt?
Regards, Frank
Jul 13 2007
parent Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
0ffh wrote:
 Frits van Bommel wrote:
 Outputs a weird character (an A with a - on top) [...]
Hah, Null-A! Reading A.E. van Vogt?
No, never heard of him. I just picked \u0100 because it was a round character code and it happened to be that character...
Jul 13 2007