digitalmars.D.learn - Operating with substrings in strings


Heinz <malagana15 yahoo.es> writes:

Hi, I haven't found a function in the Phobos library that reads a block of chars of a
given length from a string, taking the start index as a parameter. For example,
given the word "hello", I want to read starting from index 1 and I want the
substring to have a length of 2, so the result should be "el". I've seen this
in other languages, where the function looks like: GetSubString(string mystring, int
startindex, int length).

Is there a way to accomplish this?

Thx

Aug 18 2006

Kirk McDonald <kirklin.mcdonald gmail.com> writes:

Heinz wrote:
 Hi, I haven't found a function in the Phobos library that reads a block of chars of a
 given length from a string, taking the start index as a parameter. For example,
 given the word "hello", I want to read starting from index 1 and I want the
 substring to have a length of 2, so the result should be "el". I've seen this
 in other languages, where the function looks like: GetSubString(string mystring, int
 startindex, int length).
 
 Is there a way to accomplish this?
 
 Thx

Slicing:

char[] h = "hello";
char[] sub = h[1..3]; // Slice "hello": index 1 up to, but not including, 3
writefln(sub); // Prints "el"

http://digitalmars.com/d/arrays.html#slicing
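
For readers coming from languages with a GetSubString(string, start, length) call, here is a minimal sketch of how that signature could be mapped onto a slice. The name getSubString is hypothetical and not part of Phobos, and the sketch assumes a current D compiler, where string literals are typed string rather than char[] as in the snippet above.

```d
import std.stdio : writeln;

// Hypothetical helper, not part of Phobos: returns `length` code units
// of `s`, starting at code-unit index `start`.
string getSubString(string s, size_t start, size_t length)
{
    // D slices are half-open: s[a .. b] covers a up to, but not including, b.
    return s[start .. start + length];
}

void main()
{
    string h = "hello";
    writeln(getSubString(h, 1, 2)); // same as h[1 .. 3], prints "el"
}
```

The indices here are byte (code unit) positions, so this is only a character-level substring for ASCII text. An out-of-range start or length fails the array bounds check in builds that keep bounds checking enabled.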

-- 
Kirk McDonald
Pyd: Wrapping Python with D
http://pyd.dsource.org

Aug 18 2006

Frank Benoit <keinfarbton nospam.xyz> writes:

 Slicing:
 
 char[] h = "hello";
 char[] sub = h[1..3]; // Slice the string "hello"
 writefln(sub); // Prints "el"
 
 http://digitalmars.com/d/arrays.html#slicing
 

I do not know much about UTF-8, and I am often not sure whether I do string
processing right. Can someone enlighten me?

If I have
char[] str = ... some multibyte utf8 chars;

What does str.length give me? The number of bytes, or the number of
characters, found by looking at every character to see which ones are multi-byte?

If I do some slicing (str[3..4]), do the indices slice at these byte
positions, so that I risk destroying the string, or does it look
at the characters to find the start of the third UTF-8 character?

Or did I miss something completely?

Aug 18 2006

Oskar Linde <olREM OVEnada.kth.se> writes:

Frank Benoit wrote:

 
 Slicing:
 
 char[] h = "hello";
 char[] sub = h[1..3]; // Slice the string "hello"
 writefln(sub); // Prints "el"
 
 http://digitalmars.com/d/arrays.html#slicing
 

 
 I do not know much about UTF8. And I am often not sure if I do string
 processing right. Can someone enlighten me?
 
 If I have
 char[] str = ... some multibyte utf8 chars;
 
 What does str.length give me. The number of bytes or the number of
 characters by looking at every character, which one are multi-bytes?

The number of bytes.

 
 If I do some slicing (str[3..4]), does the indices slice at these byte
 positions and I have the risk of destroying the string or does it look
 at the characters to find the start of the third utf8 character?

It counts byte positions, and you are correct: you risk splitting in the
middle of a UTF-8 code sequence, making the string invalid.

 
 Or did I miss something completely?

Not as far as I can tell. :)
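
To make the byte-versus-character distinction concrete, here is a small sketch assuming a current D compiler and Phobos. The helper isValidUtf8 is a hypothetical name introduced for illustration; it only wraps std.utf.validate, which throws a UTFException on malformed input.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.utf : UTFException, validate;

// Hypothetical helper: true if `s` is well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

void main()
{
    string s = "h\u00E9llo"; // U+00E9 takes two bytes in UTF-8

    writeln(s.length);     // 6: code units (bytes)
    writeln(s.walkLength); // 5: code points, counted by decoding

    // The slice itself is not checked; it simply cuts at byte offsets.
    writeln(isValidUtf8(s[0 .. 2])); // false: ends inside the two-byte sequence
    writeln(isValidUtf8(s[0 .. 3])); // true: ends on a sequence boundary
}
```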

/Oskar

Aug 18 2006

Frank Benoit <keinfarbton nospam.xyz> writes:

Oskar Linde schrieb:
 Frank Benoit wrote:
 What does str.length give me. The number of bytes or the number of
 characters by looking at every character, which one are multi-bytes?

 
 The number of bytes.
 
 If I do some slicing (str[3..4]), does the indices slice at these byte
 positions and I have the risk of destroying the string or does it look
 at the characters to find the start of the third utf8 character?

 
 It counts the byte positions. And you are correct. You risk splitting in the
 middle of a utf-8 code sequence making the string invalid. 
 
 /Oskar


char is a UTF-8 character. What is the difference from ubyte or an
'ascii/latin1/...' char if there is no native support?

If the functionality lives in a library like Phobos std.utf, ubyte/ushort/uint
would work as well. (OK, the init values are different, but I hope that is
not all.)

Is dchar (UTF-32) the only safe option to easily work with strings in a
correct way?
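
As a rough illustration of what the language does with char beyond the init value, the sketch below assumes a current D compiler. It shows the differing initializers, foreach decoding UTF-8 when asked for dchar, and slicing a dstring by code point. Note that a code point is still not always one user-perceived character (combining marks, for example), so this is not a complete answer to "correct" string handling.

```d
import std.conv : to;
import std.stdio : writeln;

void main()
{
    // Default initializers differ: char.init is 0xFF, which is never valid in UTF-8.
    writeln(cast(ubyte) char.init); // 255
    writeln(ubyte.init);            // 0

    string s = "h\u00E9llo"; // U+00E9 takes two bytes in UTF-8

    size_t units, points;
    foreach (char c; s) units++;   // visits code units (bytes)
    foreach (dchar c; s) points++; // decodes UTF-8 into code points
    writeln(units, " ", points);   // 6 5

    // With dchar, every index is a whole code point.
    dstring d = s.to!dstring;
    writeln(d[1 .. 3]); // the two code points after 'h'
}
```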

Aug 18 2006

Sean Kelly <sean f4.ca> writes:

Frank Benoit wrote:
 
 Is dchar (UTF-32) the only safe option to easily work with strings in a
 correct way?

Yes. Though I think in practice, slicing through the middle of a UTF-8
character is unlikely, as most string operations begin with
search operations and the like, which return indices on character boundaries.
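
A minimal sketch of that search-then-slice pattern, assuming a current D compiler and std.string.indexOf, which returns a code-unit index or -1. The helper name sliceBetween is hypothetical. Because a match on a valid UTF-8 needle always starts on a sequence boundary, the resulting slice stays valid even when the text contains multi-byte characters.

```d
import std.stdio : writeln;
import std.string : indexOf;

// Hypothetical helper: the part of `s` from the first occurrence of `first`
// up to (not including) the first occurrence of `last`; null if not found.
string sliceBetween(string s, string first, string last)
{
    auto start = s.indexOf(first);
    auto end = s.indexOf(last);
    if (start < 0 || end < start)
        return null;
    return s[start .. end]; // both bounds are byte offsets on boundaries
}

void main()
{
    string s = "na\u00EFve caf\u00E9 menu"; // two multi-byte characters
    writeln(sliceBetween(s, "caf", " menu")); // the five-byte word starting at "caf"
}
```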


Sean

Aug 18 2006

Derek Parnell <derek psyc.ward> writes:

On Fri, 18 Aug 2006 22:03:49 +0200, Frank Benoit wrote:

 Slicing:
 
 char[] h = "hello";
 char[] sub = h[1..3]; // Slice the string "hello"
 writefln(sub); // Prints "el"
 
 http://digitalmars.com/d/arrays.html#slicing
 

 
 I do not know much about UTF8. And I am often not sure if I do string
 processing right. Can someone enlighten me?
 
 If I have
 char[] str = ... some multibyte utf8 chars;
 
 What does str.length give me. The number of bytes or the number of
 characters by looking at every character, which one are multi-bytes?

The number of bytes, not characters.
 
 If I do some slicing (str[3..4]), does the indices slice at these byte
 positions and I have the risk of destroying the string or does it look
 at the characters to find the start of the third utf8 character?
 
 Or did I miss something completely?

No, you didn't. The above slicing is only guaranteed to be correct if the variable
contains ASCII text. If it doesn't, then you will have to use more
sophisticated methods.

For example:

  import std.utf;

  char[] subtext;
  char[] text;

  // Convert to UTF-32 so indices count characters, slice, then convert back.
  subtext = toUTF8(toUTF32(text)[1..3]);
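
An alternative sketch that avoids converting the whole string, assuming a current Phobos: std.utf.toUTFindex maps a code-point index to the corresponding byte offset, and the result can then be used as an ordinary slice bound. This is a different technique from the round trip through UTF-32 shown above, and subCodePoints is a hypothetical helper name.

```d
import std.stdio : writeln;
import std.utf : toUTFindex;

// Hypothetical helper: substring measured in code points rather than bytes.
string subCodePoints(string s, size_t start, size_t count)
{
    immutable from = toUTFindex(s, start);         // byte offset of code point `start`
    immutable to   = toUTFindex(s, start + count); // byte offset just past the last one
    return s[from .. to];
}

void main()
{
    // Code points 1 and 2 of "h\u00E9llo" occupy bytes 1 up to 4.
    writeln(subCodePoints("h\u00E9llo", 1, 2));
}
```

The sketch assumes the input is valid UTF-8 and that start + count does not exceed the number of code points in the string.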


-- 
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"

Aug 18 2006
