digitalmars.D.learn - String Prefix Predicate


"Nordlöw" <per.nordlow gmail.com> writes:

What's the preferred way to check whether a string starts with
another string if the string is a

1. string (utf-8) BiDir
2. wstring (utf-16) BiDir
3. dstring (utf-32) Random

Aug 14 2014

"Nordlöw" <per.nordlow gmail.com> writes:

On Thursday, 14 August 2014 at 17:17:13 UTC, Nordlöw wrote:
 What's the preferred way to check whether a string starts with
 another string if the string is a

Should I use std.algorithm.startsWith() in all cases?

Aug 14 2014

Justin Whear <justin economicmodeling.com> writes:

On Thu, 14 Aug 2014 17:17:11 +0000, Nordlöw wrote:

 What's the preferred way to check whether a string starts with another
 string if the string is a
 
 1. string (utf-8) BiDir
 2. wstring (utf-16) BiDir
 3. dstring (utf-32) Random

std.algorithm.startsWith?  Should auto-decode, so it'll do a utf-32 
comparison behind the scenes.

Aug 14 2014

"Nordlöw" <per.nordlow gmail.com> writes:

On Thursday, 14 August 2014 at 17:33:41 UTC, Justin Whear wrote:
 std.algorithm.startsWith?  Should auto-decode, so it'll do a

What about 
https://github.com/D-Programming-Language/phobos/pull/2043

Auto-decoding should be avoided when possible.

I guess something like

whole.byDchar().startsWith(part.byDchar())

is preferred, right?

If so, is this what we will have to live with until Phobos has been
upgraded to use pull 2043 in a few years?

Aug 14 2014

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Thursday, 14 August 2014 at 17:41:08 UTC, Nordlöw wrote:
 On Thursday, 14 August 2014 at 17:33:41 UTC, Justin Whear wrote:
 std.algorithm.startsWith?  Should auto-decode, so it'll do a

 What about 
 https://github.com/D-Programming-Language/phobos/pull/2043

 Auto-decoding should be avoided when possible.

 I guess something like

 whole.byDchar().startsWith(part.byDchar())

 is preferred, right?

 If so, is this what we will have to live with until Phobos has been
 upgraded to use pull 2043 in a few years?

Except that you _have_ to decode in this case. Unless the string 
types match, there's no way around it. And startsWith won't 
decode if the string types match. So, I really see no issue in 
just straight-up using startsWith.
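
As a minimal sketch of the point above, assuming a reasonably recent Phobos: the same startsWith call accepts every combination of string widths, and the library decides how to compare them. The name hasPrefix is a hypothetical wrapper introduced only for this illustration.

```d
import std.algorithm.searching : startsWith;

// Hypothetical wrapper: forwards any pair of string types to startsWith.
bool hasPrefix(S1, S2)(S1 whole, S2 part)
{
    return whole.startsWith(part);
}

void main()
{
    string  s = "häst och vagn";   // UTF-8
    wstring w = "häst och vagn"w;  // UTF-16
    dstring d = "häst och vagn"d;  // UTF-32

    // Matching widths on both sides.
    assert(hasPrefix(s, "häst"));
    assert(hasPrefix(w, "häst"w));
    assert(hasPrefix(d, "häst"d));

    // Mismatched widths: the two sides are compared code point by code point.
    assert(hasPrefix(s, "häst"w));
    assert(hasPrefix(w, "häst"d));
    assert(hasPrefix(d, "häst"));

    assert(!hasPrefix(s, "vagn"));
}
```

The call site looks the same in all six cases, which is why there is little to gain from wrapping the arguments by hand.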

Where you run into problems with auto-decoding in Phobos 
functions is when a function results in a new range type. That 
forces you into a range of dchar, whether you wanted it or not. 
But beyond that, Phobos is actually pretty good about avoiding 
unnecessary decoding (though there probably are places where it 
could be improved). The big problem is that that requires 
special-casing a lot of functions, whereas that wouldn't be 
required with a range of char or wchar.

So, the biggest problems with automatic decoding are when a 
function returns a range of dchar when you wanted to operate on 
code units or when you write a function and then have to special 
case it for strings if you want to avoid the auto-decoding, 
whereas that's already been done for you with most Phobos 
functions.

- Jonathan M Davis

Aug 14 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Thursday, 14 August 2014 at 17:41:08 UTC, Nordlöw wrote:
 On Thursday, 14 August 2014 at 17:33:41 UTC, Justin Whear wrote:
 std.algorithm.startsWith?  Should auto-decode, so it'll do a

 What about 
 https://github.com/D-Programming-Language/phobos/pull/2043

 Auto-decoding should be avoided when possible.

 I guess something like

 whole.byDchar().startsWith(part.byDchar())

 is preferred, right?

I don't get it? If you use "byDchar", you are *explicitly* 
decoding. How is that any better? If anything, you are 
*preventing* the (many) opportunities phobos has to *avoid* 
decoding when it can...

If you really want to avoid decoding, use either "representation",
which will do a char[] => ubyte[] conversion, or "byCodeUnit",
which will create a range that returns single code units (IMO,
"byCodeUnit" should be preferred over "byChar", as it infers the
correct width).
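
A short sketch of both options, assuming both operands use the same encoding (UTF-8 here). The helper names startsWithBytes and startsWithUnits are hypothetical and exist only for this example.

```d
import std.algorithm.searching : startsWith;
import std.string : representation;
import std.utf : byCodeUnit;

// Hypothetical helper: compares the raw UTF-8 bytes of both strings.
bool startsWithBytes(string whole, string part)
{
    // representation turns string into immutable(ubyte)[] without copying.
    return whole.representation.startsWith(part.representation);
}

// Hypothetical helper: compares through ranges whose elements are code units.
bool startsWithUnits(string whole, string part)
{
    return whole.byCodeUnit.startsWith(part.byCodeUnit);
}

void main()
{
    assert(startsWithBytes("prefix and the rest", "prefix"));
    assert(startsWithUnits("prefix and the rest", "prefix"));
    assert(!startsWithBytes("prefix and the rest", "rest"));
    assert(!startsWithUnits("prefix and the rest", "rest"));
}
```

Neither form can compare a string against a wstring or dstring, since the code units then have different widths.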

Aug 16 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Saturday, 16 August 2014 at 20:59:47 UTC, monarch_dodra wrote:
 If anything, you are *preventing* the (many) opportunities 
 phobos has to *avoid* decoding when it can...

By that I want to stress what Jonathan M Davis said
"Unless the string types match, there's no way around it."

You should absolutely realize that that means that when the 
string types (widths) *do* match, then "search" (which includes 
all flavors in phobos) will NOT decode.

Heck, if you do a "string, element" search, e.g. find("my phrase",
someDchar), then phobos will *encode* someDchar into a correctly
sized string, and then do a full non-decoding string-string
search, which is actually much faster than the naive decoding
search.
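
For illustration, a minimal sketch of such a "string, element" search. The helper name tailFrom is hypothetical; what find does internally is as described in the post above and is not something this example can demonstrate.

```d
import std.algorithm.searching : find;

// Hypothetical helper: returns the part of haystack starting at the first
// occurrence of needle, or an empty string if needle does not occur.
string tailFrom(string haystack, dchar needle)
{
    // find on a string returns a string, not a range of dchar.
    return haystack.find(needle);
}

void main()
{
    assert(tailFrom("my phräse", 'ä') == "äse");
    assert(tailFrom("my phrase", 'z').length == 0);
}
```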

Aug 16 2014

"Nordlöw" <per.nordlow gmail.com> writes:

On Saturday, 16 August 2014 at 20:59:47 UTC, monarch_dodra wrote:
 I don't get it? If you use "byDchar", you are *explicitly* 
 decoding. How is that any better? If anything, you are 
 *preventing* the (many) opportunities phobos has to *avoid* 
 decoding when it can...

byDchar and the like are lazy ranges, i.e. they don't allocate.

They also don't throw exceptions, which is preferable in some
cases.

Read the details at
https://github.com/D-Programming-Language/phobos/pull/2043

Aug 18 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Monday, 18 August 2014 at 11:28:25 UTC, Nordlöw wrote:
 On Saturday, 16 August 2014 at 20:59:47 UTC, monarch_dodra 
 wrote:
 I don't get it? If you use "byDchar", you are *explicitly* 
 decoding. How is that any better? If anything, you are 
 *preventing* the (many) opportunities phobos has to *avoid* 
 decoding when it can...

 byDchar and the like are lazy ranges, i.e. they don't allocate.

Lazy does NOT mean does not allocate. You are making a terrible 
mistake if you assume that.

Furthermore decoding does NOT allocate either. At worst, it can 
throw an exception, but that's exceptional.
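
To make the exception point concrete, here is a small sketch assuming a recent Phobos in which byDchar substitutes the replacement character for malformed input by default. The helper name firstCodePointLossy is hypothetical.

```d
import std.exception : assertThrown;
import std.utf : UTFException, byDchar, decode, replacementDchar;

// Hypothetical helper: first code point via byDchar, which yields U+FFFD
// for malformed input instead of throwing. Expects a non-empty string.
dchar firstCodePointLossy(string s)
{
    return s.byDchar.front;
}

void main()
{
    // The byte 0xFF never occurs in well-formed UTF-8.
    immutable(ubyte)[] raw = [0xFF];
    string bad = cast(string) raw;

    // Explicit decoding of malformed input throws.
    size_t index = 0;
    assertThrown!UTFException(decode(bad, index));

    // byDchar reports the same input as the replacement character.
    assert(firstCodePointLossy(bad) == replacementDchar);
    assert(firstCodePointLossy("häst") == 'h');
}
```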

 They also don't throw exceptions, which is preferable in some
 cases.

Even then, "startsWith(string1, string2)" will *NOT* decode. It 
will do a binary comparison of the codeunits. A fast one at that, 
since you'll use SIMD vector comparison. Because of this, it 
won't throw any exceptions either. This compiles just fine:
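import std.algorithm : startsWith;

void main() nothrow
{
     // Same string type on both sides, so no decoding and nothing can throw.
     bool b = "foobar".startsWith("foo");
}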


In contrast, with:
whole.byDchar().startsWith(part.byDchar())
You *will* decode. *THAT* will be painfully slow.

 Read the details at
 https://github.com/D-Programming-Language/phobos/pull/2043

If you are using a string, the only thing helpful in there is
`byCodeUnit`. The rest is only useful if you have actual ranges.

If you are using phobos, you should really trust the
implementation that decoding will only happen on an "as needed"
basis.

Aug 18 2014

"Nordlöw" <per.nordlow gmail.com> writes:

On Monday, 18 August 2014 at 12:42:25 UTC, monarch_dodra wrote:
 On Monday, 18 August 2014 at 11:28:25 UTC, Nordlöw wrote:
 On Saturday, 16 August 2014 at 20:59:47 UTC, monarch_dodra 
 wrote:
 I don't get it? If you use "byDchar", you are *explicitly* 
 decoding. How is that any better? If anything, you are 
 *preventing* the (many) opportunities phobos has to *avoid* 
 decoding when it can...

 byDchar and the like are lazy ranges, i.e. they don't allocate.

 Lazy does NOT mean does not allocate. You are making a terrible 
 mistake if you assume that.

Ok, sorry about that. My mistake. And thanks for correcting me on 
this matter.

 Furthermore decoding does NOT allocate either. At worst, it can 
 throw an exception, but that's exceptional.

 They also don't throw exceptions, which is preferable in some
 cases.

 Even then, "startsWith(string1, string2)" will *NOT* decode. It 
 will do a binary comparison of the codeunits. A fast one at 
 that, since you'll use SIMD vector comparison. Because of this, 
 it won't throw any exceptions either. This compiles just fine:
 void main() nothrow
 {
     bool b = "foobar".startsWith("foo");
 }

Ok, so decoding is needed only when whole and part have different
encodings.

 In contrast, with:
 whole.byDchar().startsWith(part.byDchar())
 You *will* decode. *THAT* will be painfully slow.

Ok.

 Read the details at
 https://github.com/D-Programming-Language/phobos/pull/2043

 If you are using a string, the only thing helpful in there is
 `byCodeUnit`. The rest is only useful if you have actual ranges.

Actual ranges of...characters and strings? Could you give some
examples? I'm curious.

 If you are using phobos, you should really trust the
 implementation that decoding will only happen on an "as needed"
 basis.

Ok, got it.

Aug 18 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Monday, 18 August 2014 at 20:50:55 UTC, Nordlöw wrote:
 On Monday, 18 August 2014 at 12:42:25 UTC, monarch_dodra wrote:
 If you are using a string, the only thing helpful in there is
 `byCodeUnit`. The rest is only useful if you have actual
 ranges.

 Actual ranges of...characters and strings? Could you give some
 examples? I'm curious.

You could define your own range of chars, for example, a "rope".
Or, you want to store your string in a deterministic container
("Array!char"). These would produce individual code units, but
you'd still need your range to be interpreted as a sequence
of code points. This is where `byDchar` would come in handy.

There is a fair bit of discrepancy between a "char[]", and a 
range where `ElementType!R` is `char`, which is quite 
unfortunate. There have been talks of killing auto-decode, in 
which case, a range of chars would have the same behavior as a 
char[].
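
A minimal sketch of the Array!char case, assuming a recent Phobos where byDchar accepts any input range of char. The helper names toUnits and arrayStartsWith are hypothetical and only serve this example.

```d
import std.algorithm.searching : startsWith;
import std.container.array : Array;
import std.utf : byDchar;

// Hypothetical helper: copies the UTF-8 code units of s into an Array!char.
Array!char toUnits(string s)
{
    Array!char units;
    foreach (char c; s)  // iterates code units, not code points
        units.insertBack(c);
    return units;
}

// Hypothetical helper: prefix test on a container of UTF-8 code units.
bool arrayStartsWith(Array!char units, string prefix)
{
    // units[] yields single chars; byDchar lazily decodes them to code
    // points, which are compared against the code points of prefix.
    return units[].byDchar.startsWith(prefix.byDchar);
}

void main()
{
    auto units = toUnits("häst och vagn");
    assert(arrayStartsWith(units, "häst"));
    assert(!arrayStartsWith(units, "vagn"));
}
```

Unlike a char[], the range returned by units[] is not auto-decoded, so without byDchar the comparison would run on individual code units.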

Aug 19 2014

"Nordlöw" <per.nordlow gmail.com> writes:

On Tuesday, 19 August 2014 at 08:23:53 UTC, monarch_dodra wrote:
 You could define your own range of chars, for example, a
 "rope". Or, you want to store your string in a deterministic
 container ("Array!char"). These would produce individual code
 units, but you'd still need your range to be interpreted
 as a sequence of code points. This is where `byDchar` would
 come in handy.

 There is a fair bit of discrepancy between a "char[]", and a 
 range where `ElementType!R` is `char`, which is quite 
 unfortunate. There have been talks of killing auto-decode, in 
 which case, a range of chars would have the same behavior as a 
 char[].

Ok, thanks again.

Aug 19 2014
