digitalmars.D - YASQ - Proper way to convert byte[] <--> string


Steve Teale <steve.teale britseyeview.com> writes:

I have a byte[] A that contains an AJP13 packet, presumably including UTF8
strings. I need to extract such strings and to place strings in such a buffer.
I'm using:

string s = A[n .. m].dup;  // n and m from prefixed string length/position
return s;

to get strings, and

byte[] ba = cast(byte[]) s;
A[n .. n+ba.length] = ba[0 .. $].dup;

to put them.  Are these a) sensible, b) optimal?

Jul 11 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Steve Teale wrote:
> I have a byte[] A that contains an AJP13 packet, presumably including UTF8
> strings.  I need to extract such strings and to place strings in such a buffer.
> I'm using:
>
> string s = A[n .. m].dup;  // n and m from prefixed string length/position
> return s;
>
> to get strings, and

That should work, and be optimal unless you can be sure the A array 
doesn't change while you still need the string (in which case the .dup 
is unnecessary).
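
To illustrate the point about .dup: a slice of A shares memory with A, while a .dup'ed slice is an independent copy. This is only a minimal sketch, assuming a current D2 compiler; the array contents and the names view and copy are made up for illustration.

```d
void main()
{
    byte[] A = [72, 105, 33];            // the bytes of "Hi!"

    auto view = cast(char[]) A[0 .. 2];  // no copy: shares memory with A
    auto copy = view.dup;                // independent copy of the same bytes

    A[0] = 74;                           // overwrite 'H' with 'J' in the buffer

    assert(view == "Ji");  // the plain slice sees the change
    assert(copy == "Hi");  // the .dup'ed copy does not
}
```

So skipping the .dup is only safe while nothing writes to that part of A.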

> byte[] ba = cast(byte[]) s;
> A[n .. n+ba.length] = ba[0 .. $].dup;
>
> to put them.  Are these a) sensible, b) optimal?

This one should work as well, but isn't optimal; the .dup is 
unnecessary. This should be equivalent but more efficient:
---
A[n .. n+s.length] = cast(byte[]) s;
---
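
Both directions could be wrapped up roughly as follows. This is a sketch rather than a definitive implementation, assuming a current D2 compiler (where string is immutable, so the copy uses .idup and the casts go through const). getString and putString are hypothetical names, not part of any library, and the UTF-8 validation step is an extra that the thread does not ask for.

```d
import std.utf : validate;

/// Hypothetical helper: copy bytes n .. m out of the packet as a string.
string getString(const(byte)[] packet, size_t n, size_t m)
{
    // .idup copies, so later writes to the packet cannot change the result.
    auto s = (cast(const(char)[]) packet[n .. m]).idup;
    validate(s); // throws UTFException if the bytes are not well-formed UTF-8
    return s;
}

/// Hypothetical helper: write s into the packet starting at offset n.
/// Returns the offset just past the bytes written.
size_t putString(byte[] packet, size_t n, string s)
{
    // s.length counts UTF-8 code units (bytes), so it is the right slice size.
    // The slice assignment copies the elements; no .dup is needed.
    packet[n .. n + s.length] = cast(const(byte)[]) s;
    return n + s.length;
}

void main()
{
    auto packet = new byte[16];
    auto end = putString(packet, 2, "\u0100xy"); // 2 bytes for U+0100, then 'x', 'y'
    assert(end == 6);
    assert(getString(packet, 2, end) == "\u0100xy");
}
```

The slice assignment checks at run time that both sides have the same length, so a packet that is too small fails with a range error instead of silently truncating.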

Jul 12 2007

Steve Teale <steve.teale britseyeview.com> writes:

Frits van Bommel Wrote:

> Steve Teale wrote:
>> I have a byte[] A that contains an AJP13 packet, presumably including UTF8
>> strings.  I need to extract such strings and to place strings in such a buffer.
>> I'm using:
>>
>> string s = A[n .. m].dup;  // n and m from prefixed string length/position
>> return s;
>>
>> to get strings, and
>
> That should work, and be optimal unless you can be sure the A array
> doesn't change while you still need the string (in which case the .dup
> is unnecessary).
>
>> byte[] ba = cast(byte[]) s;
>> A[n .. n+ba.length] = ba[0 .. $].dup;
>>
>> to put them.  Are these a) sensible, b) optimal?
>
> This one should work as well, but isn't optimal; the .dup is
> unnecessary. This should be equivalent but more efficient:
> ---
> A[n .. n+s.length] = cast(byte[]) s;
> ---

Can I use n+s.length?  In my experimentation I noticed that a UTF8 string
containing a character using a two-byte representation definitely had an
s.length of the number of characters, which was one less than the number of
bytes.

Jul 12 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Steve Teale wrote:
> Frits van Bommel Wrote:
>
>> ---
>> A[n .. n+s.length] = cast(byte[]) s;
>> ---
>
> Can I use n+s.length?  In my experimentation I noticed that a UTF8 string
> containing a character using a two-byte representation definitely had an
> s.length of the number of characters, which was one less than the number of
> bytes.

You noticed wrong...
char[]s in D aren't very special, they're just specific array types that 
happen to be handled specially by some functions (such as writef*)[1]. 
The .length is the number of elements, and each element is a fixed size. 
A char is just a type representing a byte from UTF-8 text.
---
import std.stdio;

void main() {
	auto s = "\u0100";
	writefln(s);
	writefln(s.length);
	writefln((cast(byte[])s).length);
}
---
Outputs a weird character (an A with a - on top) and two times the number 2.


[1]: and by foreach statements as well; they can automagically extract 
char/wchar/dchar elements from char[]/wchar[]/dchar[], in any combination.
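
For anyone trying this with a current compiler, the same check might look like the sketch below. It assumes D2 with Phobos, where writeln is the usual way to print a bare value and walkLength from std.range counts decoded code points; the variable names units and points are made up for illustration.

```d
import std.range : walkLength;
import std.stdio : writeln;

void main()
{
    string s = "\u0100";  // U+0100, encoded as two bytes in UTF-8

    writeln(s.length);                         // number of char elements (code units)
    writeln((cast(const(ubyte)[]) s).length);  // number of bytes, the same value
    writeln(s.walkLength);                     // number of decoded code points

    size_t units, points;
    foreach (char c; s) ++units;    // iterates the raw UTF-8 code units
    foreach (dchar c; s) ++points;  // foreach decodes to code points on the fly

    assert(s.length == 2);
    assert(units == 2 && points == 1);
    assert(s.walkLength == 1);
}
```

For sizing a slice of the byte buffer, .length is the value that matters; the code point count only matters when counting characters.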

Jul 12 2007

Steve Teale <steve.teale britseyeview.com> writes:

Frits van Bommel Wrote:

> Steve Teale wrote:
>> Frits van Bommel Wrote:
>>
>>> ---
>>> A[n .. n+s.length] = cast(byte[]) s;
>>> ---
>>
>> Can I use n+s.length?  In my experimentation I noticed that a UTF8 string
>> containing a character using a two-byte representation definitely had an
>> s.length of the number of characters, which was one less than the number of
>> bytes.
>
> You noticed wrong...
> char[]s in D aren't very special, they're just specific array types that
> happen to be handled specially by some functions (such as writef*)[1].
> The .length is the number of elements, and each element is a fixed size.
> A char is just a type representing a byte from UTF-8 text.
> ---
> import std.stdio;
>
> void main() {
> 	auto s = "\u0100";
> 	writefln(s);
> 	writefln(s.length);
> 	writefln((cast(byte[])s).length);
> }
> ---
> Outputs a weird character (an A with a - on top) and two times the number 2.
>
> [1]: and by foreach statements as well; they can automagically extract
> char/wchar/dchar elements from char[]/wchar[]/dchar[], in any combination.

You are correct, I had misinterpreted my own test program.

Jul 12 2007

0ffh <spam frankhirsch.net> writes:

Frits van Bommel wrote:
> Outputs a weird character (an A with a - on top) [...]

Hah, Null-A! Reading A.E. van Vogt?

Regards, Frank

Jul 13 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

0ffh wrote:
> Frits van Bommel wrote:
>> Outputs a weird character (an A with a - on top) [...]
>
> Hah, Null-A! Reading A.E. van Vogt?

No, never heard of him. I just picked \u0100 because it was a round 
character code and it happened to be that character...

Jul 13 2007
