digitalmars.D.learn - How to get a substring?

"Gautam Goel" <gautamcgoel gmail.com> writes:

Dumb Newbie Question: I've searched through the library 
reference, but I haven't figured out how to extract a substring 
from a string. I'd like something like string.substring("Hello", 
0, 2) to return "Hel", for example. What method am I looking for? 
Thanks!

Oct 26 2013

"Namespace" <rswhite4 googlemail.com> writes:

On Saturday, 26 October 2013 at 21:23:13 UTC, Gautam Goel wrote:
 Dumb Newbie Question: I've searched through the library 
 reference, but I haven't figured out how to extract a substring 
 from a string. I'd like something like 
 string.substring("Hello", 0, 2) to return "Hel", for example. 
 What method am I looking for? Thanks!

Use slices:

string msg = "Hello";
string sub = msg[0 .. 2];
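
Note that slice bounds are half-open, so the end index is excluded. A minimal sketch using only built-in array slicing (the variable names are just for illustration); getting "Hel" takes [0 .. 3], not [0 .. 2]:

```d
void main()
{
    string msg = "Hello";

    // Slice bounds are half-open: the element at the end index is excluded.
    assert(msg[0 .. 2] == "He");
    assert(msg[0 .. 3] == "Hel");

    // Inside the brackets, `$` stands for the length of the array.
    assert(msg[1 .. $] == "ello");

    // A slice is a view into the same memory; nothing is copied.
    assert(msg[0 .. 3].ptr is msg.ptr);
}
```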

Oct 26 2013

Ali Çehreli <acehreli yahoo.com> writes:

On 10/26/2013 02:25 PM, Namespace wrote:
 On Saturday, 26 October 2013 at 21:23:13 UTC, Gautam Goel wrote:
 Dumb Newbie Question: I've searched through the library reference, but
 I haven't figured out how to extract a substring from a string. I'd
 like something like string.substring("Hello", 0, 2) to return "Hel",
 for example. What method am I looking for? Thanks!

 Use slices:

 string msg = "Hello";
 string sub = msg[0 .. 2];

Yes but that works only if the string is known to contain only ASCII 
codes. (Otherwise, a string is a collection of UTF-8 code units.)
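
To make the code unit point concrete, here is a small sketch. It assumes the source file is saved as UTF-8 and uses std.utf.validate only to show that a slice cut through a multi-byte character is no longer valid UTF-8:

```d
import std.exception : assertThrown;
import std.range : walkLength;
import std.utf : UTFException, validate;

void main()
{
    string s = "abcçdef";

    // .length counts UTF-8 code units (bytes); 'ç' takes two of them.
    assert(s.length == 8);
    // walkLength decodes the string and counts code points instead.
    assert(s.walkLength == 7);

    // Code unit indices 3 .. 5 cover exactly the two bytes of 'ç'.
    assert(s[3 .. 5] == "ç");

    // Slicing through the middle of 'ç' is not checked by the language,
    // but the result is not valid UTF-8.
    string broken = s[0 .. 4];
    assertThrown!UTFException(validate(broken));
}
```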

I could not find a subString() function either but it turns out to be 
trivial to implement with Phobos:

import std.range;
import std.algorithm;

auto subRange(R)(R s, size_t beg, size_t end)
{
     return s.dropExactly(beg).take(end - beg);
}

unittest
{
     assert("abcçdef".subRange(2, 4).equal("cç"));
}

void main()
{}

That function produces a lazy range. To convert it eagerly to a string:

import std.conv;

string subString(string s, size_t beg, size_t end)
{
     return s.subRange(beg, end).text;
}

unittest
{
     assert("Hello".subString(0, 2) == "He");
}

Ali

Oct 26 2013

"Namespace" <rswhite4 googlemail.com> writes:

On Saturday, 26 October 2013 at 22:17:33 UTC, Ali Çehreli wrote:
 On 10/26/2013 02:25 PM, Namespace wrote:
 On Saturday, 26 October 2013 at 21:23:13 UTC, Gautam Goel 
 wrote:
 Dumb Newbie Question: I've searched through the library 
 reference, but
 I haven't figured out how to extract a substring from a 
 string. I'd
 like something like string.substring("Hello", 0, 2) to return 
 "Hel",
 for example. What method am I looking for? Thanks!

 Use slices:

 string msg = "Hello";
 string sub = msg[0 .. 2];

 Yes but that works only if the string is known to contain only 
 ASCII codes. (Otherwise, a string is a collection of UTF-8 code 
 units.)

 I could not find a subString() function either but it turns out 
 to be trivial to implement with Phobos:

 import std.range;
 import std.algorithm;

 auto subRange(R)(R s, size_t beg, size_t end)
 {
     return s.dropExactly(beg).take(end - beg);
 }

 unittest
 {
     assert("abcçdef".subRange(2, 4).equal("cç"));
 }

 void main()
 {}

 That function produces a lazy range. To convert it eagerly to a 
 string:

 import std.conv;

 string subString(string s, size_t beg, size_t end)
 {
     return s.subRange(beg, end).text;
 }

 unittest
 {
     assert("Hello".subString(0, 2) == "He");
 }

 Ali

Yeah that is of course easier and nicer than C++... :D Just 
kidding. I think the slice should be enough. This example would 
have deterred me from further use if I had seen it when I was 
starting out.

Oct 26 2013

"Damian" <damianday hotmail.co.uk> writes:

On Saturday, 26 October 2013 at 23:19:56 UTC, Namespace wrote:
 On Saturday, 26 October 2013 at 22:17:33 UTC, Ali Çehreli wrote:
 On 10/26/2013 02:25 PM, Namespace wrote:
 On Saturday, 26 October 2013 at 21:23:13 UTC, Gautam Goel 
 wrote:
 Dumb Newbie Question: I've searched through the library 
 reference, but
 I haven't figured out how to extract a substring from a 
 string. I'd
 like something like string.substring("Hello", 0, 2) to 
 return "Hel",
 for example. What method am I looking for? Thanks!

 Use slices:

 string msg = "Hello";
 string sub = msg[0 .. 2];

 Yes but that works only if the string is known to contain only 
 ASCII codes. (Otherwise, a string is a collection of UTF-8 
 code units.)

 I could not find a subString() function either but it turns 
 out to be trivial to implement with Phobos:

 import std.range;
 import std.algorithm;

 auto subRange(R)(R s, size_t beg, size_t end)
 {
    return s.dropExactly(beg).take(end - beg);
 }

 unittest
 {
    assert("abcçdef".subRange(2, 4).equal("cç"));
 }

 void main()
 {}

 That function produces a lazy range. To convert it eagerly to 
 a string:

 import std.conv;

 string subString(string s, size_t beg, size_t end)
 {
    return s.subRange(beg, end).text;
 }

 unittest
 {
    assert("Hello".subString(0, 2) == "He");
 }

 Ali

 Yeah that is of course easier and nicer than C++... :D Just 
 kidding. I think the slice should be enough. This example would 
 have deterred me from further use if I had seen it when I was 
 starting out.

This functionality should really be provided in Phobos' std.string!
It is a very common function and I have also made the mistake of
slicing a range in the past :/

Oct 26 2013

"Jesse Phillips" <Jesse.K.Phillips+D gmail.com> writes:

On Saturday, 26 October 2013 at 22:17:33 UTC, Ali Çehreli wrote:
 Use slices:

 string msg = "Hello";
 string sub = msg[0 .. 2];

 Yes but that works only if the string is known to contain only 
 ASCII codes. (Otherwise, a string is a collection of UTF-8 code 
 units.)

But that isn't how substring works. At least it seems neither 


Though D generally has much better functions for some situations, 
find/until/countUntil/startsWith.
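
A short sketch of what those functions look like in use. The "key=value" input is made up for illustration, and the imports go through the std.algorithm package so that the example does not depend on the submodule layout:

```d
import std.algorithm : countUntil, equal, find, startsWith, until;

void main()
{
    string s = "key=value";

    // find returns the rest of the input starting at the match; for a
    // string that is a slice of the original.
    assert(s.find('=') == "=value");

    // until lazily yields the elements before the match.
    assert(s.until('=').equal("key"));

    // countUntil returns how many elements precede the match.
    assert(s.countUntil('=') == 3);

    // startsWith tests a prefix without any index arithmetic.
    assert(s.startsWith("key"));
}
```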

Oct 26 2013

Timothee Cour <thelastmammoth gmail.com> writes:

I've posted a while back a string => string substring function that doesn't
allocate: google
"nonallocating unicode string manipulations"

code:

auto slice(T)(T a, size_t u, size_t v)
    if (is(T == string)) // TODO: generalize to isSomeString
{
    import std.exception;
    import std.utf;
    auto m = a.length;
    size_t i;
    enforce(u <= v);
    // Skip the first u code points; v is decremented alongside so that it
    // ends up as the number of code points left to take.
    while (u-- && i < m)
    {
        auto si = stride(a, i);
        i += si;
        v--;
    }
    // assert(u == -1);
    // enforce(u == -1);
    size_t i2 = i;
    while (v-- && i2 < m)
    {
        auto si = stride(a, i2);
        i2 += si;
    }
    // assert(v == -1);
    enforce(v == -1); // fails if the string ended before v was reached
    return a[i .. i2];
}

unittest
{
    import std.range;
    import std.exception;
    auto a = "≈açç√ef";
    auto b = a.slice(2, 6);
    assert(a.slice(2, 6) == "çç√e");
    assert(a.slice(2, 6).ptr == a.slice(2, 3).ptr);
    assert(a.slice(0, a.walkLength) is a);
    assertThrown(a.slice(2, 8));
    assertThrown(a.slice(2, 1));
}



Oct 26 2013

"Nicolas Sicard" <dransic gmail.com> writes:

On Sunday, 27 October 2013 at 00:18:41 UTC, Timothee Cour wrote:
 I've posted a while back a string=>string substring function 
 that doesn't
 allocating: google
 "nonallocating unicode string manipulations"

 code:

 auto slice(T)(T a,size_t u, size_t 
 v)if(is(T==string)){//TODO:generalize to
 isSomeString
 import std.exception;
 auto m=a.length;
 size_t i;
 enforce(u<=v);
 import std.utf;
 while(u-- && i<m){
 auto si=stride(a,i);
 i+=si;
 v--;
 }
 // assert(u==-1);
 // enforce(u==-1);
 size_t i2=i;
 while(v-- && i2<m){
 auto si=stride(a,i2);
 i2+=si;
 }
 // assert(v==-1);
 enforce(v==-1);
 return a[i..i2];
 }
 unittest{
 import std.range;
 auto a="≈açç√ef";
 auto b=a.slice(2,6);
 assert(a.slice(2,6)=="çç√e");
 assert(a.slice(2,6).ptr==a.slice(2,3).ptr);
 assert(a.slice(0,a.walkLength) is a);
 import std.exception;
 assertThrown(a.slice(2,8));
 assertThrown(a.slice(2,1));
 }

Another one, with negative index like Javascript's String.slice():
http://dpaste.dzfl.pl/608435c5

Oct 26 2013

Timothee Cour <thelastmammoth gmail.com> writes:

On Sat, Oct 26, 2013 at 6:24 PM, Nicolas Sicard <dransic gmail.com> wrote:

 Another one, with negative index like Javascript's String.slice():
 http://dpaste.dzfl.pl/608435c5

Not as efficient as what I proposed, since it iterates over the string
twice (the 2nd index redoes the work done by the 1st index). It could be
adapted, though.

Oct 26 2013

"Nicolas Sicard" <dransic gmail.com> writes:

On Sunday, 27 October 2013 at 03:45:50 UTC, Timothee Cour wrote:
 On Sat, Oct 26, 2013 at 6:24 PM, Nicolas Sicard 
 <dransic gmail.com> wrote:

 On Sunday, 27 October 2013 at 00:18:41 UTC, Timothee Cour 
 wrote:

 I've posted a while back a string=>string substring function 
 that doesn't
 allocating: google
 "nonallocating unicode string manipulations"

 code:

 auto slice(T)(T a,size_t u, size_t 
 v)if(is(T==string)){//TODO:generalize
 to
 isSomeString
 import std.exception;
 auto m=a.length;
 size_t i;
 enforce(u<=v);
 import std.utf;
 while(u-- && i<m){
 auto si=stride(a,i);
 i+=si;
 v--;
 }
 // assert(u==-1);
 // enforce(u==-1);
 size_t i2=i;
 while(v-- && i2<m){
 auto si=stride(a,i2);
 i2+=si;
 }
 // assert(v==-1);
 enforce(v==-1);
 return a[i..i2];
 }
 unittest{
 import std.range;
 auto a="≈açç√ef";
 auto b=a.slice(2,6);
 assert(a.slice(2,6)=="çç√e");
 assert(a.slice(2,6).ptr==a.slice(2,3).ptr);
 assert(a.slice(0,a.walkLength) is a);
 import std.exception;
 assertThrown(a.slice(2,8));
 assertThrown(a.slice(2,1));
 }

 Another one, with negative index like Javascript's 
 String.slice():
 http://dpaste.dzfl.pl/608435c5

 not as efficient as what I proposed since it's iterating over 
 the string
 twice (the 2nd index redoes the work done by 1st index). Could 
 be adapted
 though.

Yes. slice was a quick addition to before/after.

I also wonder if std.uni.graphemeStride would be more appropriate.
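
For reference, a minimal sketch of how the three units differ. It assumes a std.uni recent enough to provide both graphemeStride and byGrapheme, and the sample string is made up:

```d
import std.range : walkLength;
import std.uni : byGrapheme, graphemeStride;

void main()
{
    // 'e' followed by U+0301 (combining acute accent), then 'x'.
    // The first two code points are displayed as one character.
    string s = "e\u0301x";

    assert(s.length == 4);                 // code units: 1 + 2 + 1
    assert(s.walkLength == 3);             // code points
    assert(s.byGrapheme.walkLength == 2);  // grapheme clusters

    // graphemeStride gives the length, in code units, of the cluster
    // starting at the given code unit index.
    immutable n = graphemeStride(s, 0);
    assert(n == 3);
    assert(s[0 .. n] == "e\u0301");
}
```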

Oct 27 2013

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Saturday, October 26, 2013 15:17:33 Ali Çehreli wrote:
 On 10/26/2013 02:25 PM, Namespace wrote:
 On Saturday, 26 October 2013 at 21:23:13 UTC, Gautam Goel wrote:
 Dumb Newbie Question: I've searched through the library reference, but
 I haven't figured out how to extract a substring from a string. I'd
 like something like string.substring("Hello", 0, 2) to return "Hel",
 for example. What method am I looking for? Thanks!

 Use slices:

 string msg = "Hello";
 string sub = msg[0 .. 2];

 Yes but that works only if the string is known to contain only ASCII
 codes. (Otherwise, a string is a collection of UTF-8 code units.)

 I could not find a subString() function either but it turns out to be
 trivial to implement with Phobos:

 import std.range;
 import std.algorithm;

 auto subRange(R)(R s, size_t beg, size_t end)
 {
      return s.dropExactly(beg).take(end - beg);
 }

 unittest
 {
      assert("abcçdef".subRange(2, 4).equal("cç"));
 }

 void main()
 {}

 That function produces a lazy range. To convert it eagerly to a string:

 import std.conv;

 string subString(string s, size_t beg, size_t end)
 {
      return s.subRange(beg, end).text;
 }

 unittest
 {
      assert("Hello".subString(0, 2) == "He");
 }
There's also std.utf.toUTFindex, which allows you to do

    auto str = "Hello";
    assert(str[0 .. str.toUTFindex(2)] == "He");

but you have to be careful with it when using anything other than 0 for the
first index, because you don't want it to have to traverse the range multiple
times. With your unicode example you're forced to do something like

    auto str = "abcçdef";
    immutable first = str.toUTFindex(2);
    immutable second = str[first .. $].toUTFindex(2) + first;
    assert(str[first .. second] == "cç");

It also has the advantage of the final result being a string without having to
do any conversions. So, subString should probably be defined as

    import std.traits; // for isSomeChar

    inout(C)[] subString(C)(inout(C)[] str, size_t i, size_t j)
        if(isSomeChar!C)
    {
        import std.utf;
        immutable first = str.toUTFindex(i);
        // Advance by the remaining j - i code points, counted from first.
        immutable second = str[first .. $].toUTFindex(j - i) + first;
        return str[first .. second];
    }

Using drop/dropExactly with take/takeExactly makes more sense when you want to
iterate over the characters but don't need a string (especially if you're not
necessarily going to iterate over them all), but if you really want a string,
then finding the right index for the slice and then slicing is arguably better.
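
A quick check of the index-then-slice idea, as a sketch only. This simplified subString is string-only rather than generic, it takes i and j as code point positions with j exclusive, and it advances the second index by j - i code points from the first:

```d
import std.utf : toUTFindex;

// Illustrative, string-only helper: i and j are code point positions and
// j is exclusive. The result is a slice of str, so nothing is allocated.
string subString(string str, size_t i, size_t j)
{
    assert(i <= j, "start must not exceed end");
    immutable first = str.toUTFindex(i);
    // Continue from first so the prefix is not traversed a second time.
    immutable second = str[first .. $].toUTFindex(j - i) + first;
    return str[first .. second];
}

void main()
{
    string s = "abcçdef";
    assert(s.subString(2, 4) == "cç");
    assert(s.subString(0, 3) == "abc");
    assert("Hello".subString(0, 3) == "Hel");

    // The result points into the original string.
    assert(s.subString(0, 3).ptr is s.ptr);
}
```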

- Jonathan M Davis

Oct 26 2013

"Jakob Ovrum" <jakobovrum gmail.com> writes:

On Saturday, 26 October 2013 at 21:23:13 UTC, Gautam Goel wrote:
 Dumb Newbie Question: I've searched through the library 
 reference, but I haven't figured out how to extract a substring 
 from a string. I'd like something like 
 string.substring("Hello", 0, 2) to return "Hel", for example. 
 What method am I looking for? Thanks!

There are a lot of good answers in this thread but I also think 
they miss the real issue here.

When working with Unicode, you'll want to stop thinking in terms 
of indices, in order to produce correct code. Getting a 
sub-string by passing indices is a means to an end; you'll want 
to replace the index paradigm with an approach that does not rely 
on indices.

Working with indices where the smallest unit is a code point 
(dchar), which has been suggested in this thread, is still not 
good enough because you'll either a) potentially break up 
grapheme clusters, which can have disastrous results, or b) end 
up redundantly searching through the string to find the correct 
code point positions.

With Phobos, you can use algorithms such as `find` and the 
`findSplit` family of algorithms to do string manipulation 
without using indices, and the cool thing about this approach is 
that as long as the input strings are properly formed UTF and 
have intact grapheme clusters, it's impossible to get 
Unicode-incorrect results!

When working with ASCII, just slice.
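
A minimal sketch of the index-free style, assuming a made-up "name=value" input. findSplit hands back the pieces around a separator, and for a string those pieces are slices of the input:

```d
import std.algorithm : find, findSplit;

void main()
{
    string s = "naïve=café";

    // findSplit returns three pieces: before the match, the match itself,
    // and everything after it.
    auto parts = s.findSplit("=");
    assert(parts[0] == "naïve");
    assert(parts[1] == "=");
    assert(parts[2] == "café");

    // The first piece starts at the same address as the input, so no new
    // string was allocated for it.
    assert(parts[0].ptr is s.ptr);

    // find returns the remainder of the string starting at the match.
    assert(s.find("café") == "café");
}
```

No code unit or code point index appears anywhere in the calling code, so there is no index that could land inside a multi-byte sequence.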

Oct 27 2013

"Nicolas Sicard" <dransic gmail.com> writes:

On Sunday, 27 October 2013 at 07:44:06 UTC, Jakob Ovrum wrote:
 On Saturday, 26 October 2013 at 21:23:13 UTC, Gautam Goel wrote:
 Dumb Newbie Question: I've searched through the library 
 reference, but I haven't figured out how to extract a 
 substring from a string. I'd like something like 
 string.substring("Hello", 0, 2) to return "Hel", for example. 
 What method am I looking for? Thanks!

 There are a lot of good answers in this thread but I also think 
 they miss the real issue here.

I don't think so. It's indeed worth noticing that Phobos' 
algorithms work with Unicode nicely, but:
a) working on indices is sometimes the actual functionality you 
need
b) you need to allocate a new string from the range they return 
(the slice functions in this thread don't)
c) do they really handle grapheme clusters? (I don't know)

Oct 27 2013

"Jakob Ovrum" <jakobovrum gmail.com> writes:

On Sunday, 27 October 2013 at 08:14:30 UTC, Nicolas Sicard wrote:
 I don't think so. It's indeed worth noticing that Phobos' 
 algorithms work with Unicode nicely, but:
 a) working on indices is sometimes the actual functionality you 
 need

It is a means to an end. I'm saying it can be replaced with a 
much superior approach.

 b) you need to allocate a new string from the range they return 
 (the slice functions in this thread don't)

That's only the case with strictly lazy algorithms. The 
algorithms I pointed out are eager.

 c) do they really handle grapheme clusters? (I don't know)

Rendering code and such is just about the only domain that needs 
to parse grapheme clusters. However, that doesn't mean that naive 
string manipulation code that uses indices can't break the string 
horribly before sending it off to rendering.

Oct 27 2013

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Sunday, October 27, 2013 09:14:28 Nicolas Sicard wrote:
 On Sunday, 27 October 2013 at 07:44:06 UTC, Jakob Ovrum wrote:
 On Saturday, 26 October 2013 at 21:23:13 UTC, Gautam Goel wrote:
 Dumb Newbie Question: I've searched through the library
 reference, but I haven't figured out how to extract a
 substring from a string. I'd like something like
 string.substring("Hello", 0, 2) to return "Hel", for example.
 What method am I looking for? Thanks!

 
 There are a lot of good answers in this thread but I also think
 they miss the real issue here.

 
 I don't think so. It's indeed worth noticing that Phobos'
 algorithms work with Unicode nicely, but:
 a) working on indices is sometimes the actual functionality you
 need

Sometimes, but it usually isn't. If you find that you frequently need to use 
indices for a string, then you should probably rethink how you're using 
strings. Phobos aims at operating on ranges, which rarely means using indices, 
and _very_ rarely means using indices on strings. In general, indices only get 
used on strings when you're trying to optimize a particular algorithm for 
strings and make sure that you slice the string so that the result is a string 
rather than a wrapper range.

Sure, indexing strings can be very useful, but the way that Phobos is 
designed does not lend itself to using string indices (quite the opposite in 
fact), and in my experience, using string indices is rarely needed even when 
doing heavy string manipulation.

 b) you need to allocate a new string from the range they return
 (the slice functions in this thread don't)

You _rarely_ want to do that. Allocating a new string is just plain wasteful 
in most cases. The fact that the elements in a string are immutable makes it 
so that you can slice without worrying about allocating new strings. You 
should pretty much only be allocating new strings when slicing when the 
original was something like char[] rather than string. The main place where 
that's usually forced is when reading from a file (since buffers are frequently 
reused when reading files and therefore not immutable).
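
A small sketch of that distinction. The char[] buffer here only stands in for a reused read buffer; no actual file I/O is involved:

```d
void main()
{
    // Stand-in for a mutable buffer that a reader would reuse.
    char[] buffer = "first line".dup;

    const(char)[] view = buffer[0 .. 5]; // a slice: aliases the buffer
    string kept = buffer[0 .. 5].idup;   // an immutable copy: allocates

    // The buffer is overwritten, as it would be by the next read.
    buffer[0 .. 5] = "FIRST";

    assert(view == "FIRST"); // the slice sees the new contents
    assert(kept == "first"); // the copy is unaffected

    // Slicing a string needs no copy, because its elements are immutable.
    string s = "first line";
    string t = s[0 .. 5];
    assert(t.ptr is s.ptr);
}
```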

 c) do they really handle grapheme clusters? (I don't know)

I believe that that sort of thing is properly supported by the updated std.uni 
in 2.064, but it is the sort of thing that you have to code for. Phobos as a 
whole operates on ranges of dchar - which is correct most of the time but not 
enough when you need full-on grapheme support. I haven't yet looked in detail 
at what std.uni now provides though. I just know that it's added some grapheme 
support.

- Jonathan M Davis

Oct 27 2013

"Jakob Ovrum" <jakobovrum gmail.com> writes:

On Sunday, 27 October 2013 at 08:35:11 UTC, Jonathan M Davis 
wrote:
 On Sunday, October 27, 2013 09:14:28 Nicolas Sicard wrote:
 On Sunday, 27 October 2013 at 07:44:06 UTC, Jakob Ovrum wrote:
 On Saturday, 26 October 2013 at 21:23:13 UTC, Gautam Goel 
 wrote:
 Dumb Newbie Question: I've searched through the library
 reference, but I haven't figured out how to extract a
 substring from a string. I'd like something like
 string.substring("Hello", 0, 2) to return "Hel", for 
 example.
 What method am I looking for? Thanks!

 
 There are a lot of good answers in this thread but I also 
 think
 they miss the real issue here.

 
 I don't think so. It's indeed worth noticing that Phobos'
 algorithms work with Unicode nicely, but:
 a) working on indices is sometimes the actual functionality you
 need

 Sometimes, but it usually isn't. If you find that you 
 frequently need to use
 indices for a string, then you should probably rethink how 
 you're using
 strings. Phobos aims at operating on ranges, which rarely means 
 using indices,
 and _very_ rarely means using indices on strings. In general, 
 indices only get
 used on strings when you're trying to optimize a particular 
 algorithm for
 strings and make sure that you slice the string so that the 
 result is a string
 rather than a wrapper range.

+1

Also, I think if users need to get UTF indices in their own code, 
it's indicative of either a) Phobos lacking an (optimized) 
algorithm, or b) the user is doing something extremely niche that 
Phobos can't aim to cover generically.

 Sure, indexing strings can be very useful, but the way that 
 Phobos is
 designed does not lend itself to using string indices (quite 
 the opposite in
 fact), and in my experience, using string indices is rarely 
 needed even when
 doing heavy string manipulation.

And Phobos is better off for it!

I don't know if we do a good enough job of educating users about 
Unicode and its implications though, assuming this is a 
responsibility of the D community towards new D users.

 c) do they really handle grapheme clusters? (I don't know)

 I believe that that sort of thing is properly supported by the 
 updated std.uni
 in 2.064, but it is the sort of thing that you have to code 
 for. Phobos as a
 whole operates on ranges of dchar - which is correct most of 
 the time but not
 enough when you need full-on grapheme support. I haven't yet 
 looked in detail
 at what std.uni now provides though. I just know that it's 
 added some grapheme
 support.

 - Jonathan M Davis

The new std.uni supports all the grapheme-related functionality 
you would ever need (as far as I can tell), but the nice thing is 
that most code doesn't need to use it to be Unicode-correct. i.e. 
you don't need to be "aware" of grapheme clusters to not break 
them in the vast majority of code domains.

Oct 27 2013

"Jakob Ovrum" <jakobovrum gmail.com> writes:

On Sunday, 27 October 2013 at 08:53:48 UTC, Jakob Ovrum wrote:
 The new std.uni supports all the grapheme-related functionality 
 you would ever need (as far as I can tell), but the nice thing 
 is that most code doesn't need to use it to be Unicode-correct. 
 i.e. you don't need to be "aware" of grapheme clusters to not 
 break them in the vast majority of code domains.

Actually, I think that normalization of the input strings might 
be a common requirement for Unicode-correct string manipulation. 
I guess that falls under grapheme-related functionality.
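
A minimal sketch of why normalization matters, using std.uni.normalize. The two spellings of "é" are just an illustrative pair:

```d
import std.uni;

void main()
{
    string composed = "\u00E9";    // 'é' as a single code point
    string decomposed = "e\u0301"; // 'e' plus a combining acute accent

    // The two look the same when displayed but compare unequal.
    assert(composed != decomposed);

    // After normalizing both to the same form they compare equal.
    assert(normalize!NFC(decomposed) == composed);
    assert(normalize!NFD(composed) == decomposed);
}
```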

Oct 27 2013
