digitalmars.D - "bstring"

reply Michel Fortin <michel.fortin michelf.com> writes:
Lately I've been using the type "immutable(ubyte)[]" a lot to pass 
around binary data of various kinds. In a couple of places now, to save 
some typing, I'm using this alias:

	alias immutable(ubyte)[] bstring;

Would that make a worthy addition to the other standard string formats 
defined in object.o? Or am I the only one who is using this type a lot?

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Apr 05 2010
parent reply Justin Spahr-Summers <Justin.SpahrSummers gmail.com> writes:
On Mon, 5 Apr 2010 18:48:58 -0400, Michel Fortin 
<michel.fortin michelf.com> wrote:
 
 Lately I've been using the type "immutable(ubyte)[]" a lot to pass 
 around binary data of various kinds. In a couple of places now, to save 
 some typing, I'm using this alias:
 
 	alias immutable(ubyte)[] bstring;
 
 Would that make a worthy addition to the other standard string formats 
 defined in object.o? Or am I the only one who is using this type a lot?

I use it quite a lot too, but I'm not sure if making it (effectively) a language keyword is the right approach. I mean, I use mutable byte strings probably just as often. The fact that ubytes are really just arbitrary data I think somewhat diminishes the usefulness of a keyword; to compare, 'string' to me represents a contiguous run of valid *characters* (i.e., the data has meaning and representation in and of itself)... not strictly enforced by D, of course, but that's how the type is used. Apologies if this came out rather disjointed.
Apr 05 2010
parent reply Ali Çehreli <acehreli yahoo.com> writes:
Justin Spahr-Summers wrote:

 'string' to me represents a contiguous run of valid
 *characters* (i.e., the data has meaning and representation in and of
 itself)... not strictly enforced by D, of course, but that's how the
 type is used.

If by "character" you mean "code unit", yes.

string characters are UTF-8 code units in D and have meanings by themselves only if they are one-byte UTF-8 sequences.

Ali
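
To illustrate the code unit versus code point distinction made above, here is a minimal sketch. The helper names countCodeUnits and countCodePoints are hypothetical and chosen only for this example; the only language feature relied on is that foreach over a string yields raw code units with a char loop variable and decoded code points with a dchar loop variable.

```d
// Hypothetical helper: counts UTF-8 code units.
// foreach with a char variable visits each code unit without decoding.
size_t countCodeUnits(string s)
{
    size_t n;
    foreach (char c; s)
        ++n;
    return n;
}

// Hypothetical helper: counts code points.
// foreach with a dchar variable decodes the UTF-8 sequence on the fly.
size_t countCodePoints(string s)
{
    size_t n;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    string s = "aÇ";                  // 'a' is one code unit, 'Ç' is two
    assert(countCodeUnits(s) == 3);
    assert(countCodePoints(s) == 2);
    assert(s[0] == 'a');              // a one-byte sequence is meaningful alone
    assert(s[1] == 0xC3);             // lead byte of 'Ç', not a character by itself
}
```

Indexing (s[1]) always addresses code units, which is why a single element of a string is only a complete character when the sequence is one byte long.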
Apr 06 2010
next sibling parent reply BCS <none anon.com> writes:
Hello Ali,

 Justin Spahr-Summers wrote:
 
 'string' to me represents a contiguous run of valid
 *characters* (i.e., the data has meaning and representation in and of
 itself)... not strictly enforced by D, of course, but that's how the
 type is used.

string characters are UTF-8 code units in D and have meanings by themselves only if they are one-byte UTF-8 sequences.

I think that's what the "not strictly enforced by D" part was about. True or not, people often assume that a string is valid UTF-8 of some kind.

-- 
... <IXOYE><
Apr 06 2010
parent Michel Fortin <michel.fortin michelf.com> writes:
On 2010-04-06 17:10:25 -0400, BCS <none anon.com> said:

 Hello Ali,
 
 Justin Spahr-Summers wrote:
 
 'string' to me represents a contiguous run of valid
 *characters* (i.e., the data has meaning and representation in and of
 itself)... not strictly enforced by D, of course, but that's how the
 type is used.

string characters are UTF-8 code units in D and have meanings by themselves only if they are one-byte UTF-8 sequences.

I think that's what the "not strictly enforced by D" part was about. True or not, people often assume that a string is valid UTF-8 of some kind.

It may not be strictly enforced, but std.range now iterates on code points instead of code units, making 'string' not very practical to use as a range when you need to iterate over UTF-8 code units (bytes), or with other text encodings. "bstring" is more appropriate for those cases.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
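
A small sketch of the range behaviour described above, assuming a present-day Phobos in which narrow strings are still auto-decoded by range functions. std.string.representation postdates this thread, and the helper name toBytes is hypothetical. The alias is the one proposed at the top of the thread, written in the newer alias syntax.

```d
import std.range : walkLength;
import std.string : representation;

alias bstring = immutable(ubyte)[];   // the alias proposed in this thread

// Hypothetical helper: view a string's bytes as a bstring without copying.
bstring toBytes(string s)
{
    return s.representation;
}

void main()
{
    string s = "Çehreli";             // 'Ç' occupies two UTF-8 code units
    bstring b = toBytes(s);

    assert(s.length == 8);            // .length counts code units
    assert(s.walkLength == 7);        // as a range, a string yields code points
    assert(b.walkLength == 8);        // a ubyte range is walked byte by byte
    assert(b[0] == 0xC3 && b[1] == 0x87);
}
```

The same bytes give two different element counts depending on whether they are typed as string or as immutable(ubyte)[], which is the practical argument for having a separate byte-string type.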
Apr 06 2010
prev sibling parent Justin Spahr-Summers <Justin.SpahrSummers gmail.com> writes:
On Tue, 06 Apr 2010 11:50:36 -0700, Ali Çehreli <acehreli yahoo.com> 
wrote:
 
 Justin Spahr-Summers wrote:
 
  > 'string' to me represents a contiguous run of valid
  > *characters* (i.e., the data has meaning and representation in and of
  > itself)... not strictly enforced by D, of course, but that's how the
  > type is used.
 
 If by "character" you mean "code unit", yes.
 
 string characters are UTF-8 code units in D and have meanings by 
 themselves only if they are one-byte UTF-8 sequences.
 
 Ali

Sorry, yes. I'm not very familiar with Unicode terminology, but I do know that strings don't always contain valid Unicode sequences, and that's what I meant.
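
As a sketch of the point that a string is not guaranteed to hold valid Unicode: the cast below compiles and runs, and the problem only surfaces when something decodes or validates the data. This assumes std.utf.validate, which throws UTFException on malformed input; the helper name isValidUtf8 is hypothetical.

```d
import std.utf : validate, UTFException;

// Hypothetical helper: true if s holds well-formed UTF-8.
bool isValidUtf8(string s)
{
    try
    {
        validate(s);                  // throws UTFException on malformed input
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    immutable(ubyte)[] raw = [0xFF, 0xFE];   // 0xFF can never start a UTF-8 sequence
    string bad = cast(string) raw;           // accepted: the type system does not check contents
    assert(!isValidUtf8(bad));
    assert(isValidUtf8("plain ASCII"));
}
```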
Apr 06 2010