
digitalmars.D - Bug in countUntil?

"monarch_dodra" <monarchdodra gmail.com> writes:
I was looking in countUntil to fix another issue, and I think the 
string support is broken.

This program:
//----
import std.algorithm;
import std.stdio;

void main()
{
     "日本語".countUntil('本').writeln();
}
//----

Will produce "3".

...

I'd have straight up said it was a bug, but the implementation 
goes out of its way to special case narrow strings, when the 
default implementation would have produced the right result 
anyway. So I was thinking it is somehow by design...?

Am I missing something, or is it just implementation silliness?
Oct 12 2012
Jonathan M Davis <jmdavisProg gmx.com> writes:
Many algorithms special case narrow strings for efficiency. However, in this case, it looks just plain wrong. countUntil is supposed to return the number of elements (i.e. code points in this case), but it looks like it's returning the number of code units. So, I'd say that it's definitely wrong. If you want code units, then use std.string.indexOf. countUntil is supposed to return the number of code points.

- Jonathan M Davis
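
To make the code point versus code unit distinction concrete, here is a minimal sketch. It assumes a UTF-8 `string`, and `countCodePointsUntil` is a hypothetical helper written for illustration, not the Phobos implementation of countUntil. It relies on `foreach` over a string with a `dchar` loop variable, which decodes one code point per iteration, and compares the result with `std.string.indexOf`, which reports a code unit offset.

```d
import std.stdio : writeln;
import std.string : indexOf;

/// Hypothetical helper (not part of Phobos): counts the code points that
/// precede `needle` in a UTF-8 string, or returns -1 if it is absent.
ptrdiff_t countCodePointsUntil(string haystack, dchar needle)
{
    ptrdiff_t n = 0;
    foreach (dchar c; haystack) // a dchar loop variable decodes UTF-8 as it goes
    {
        if (c == needle)
            return n;
        ++n;
    }
    return -1;
}

void main()
{
    string s = "日本語";
    writeln(s.length);                      // 9: each character takes 3 UTF-8 code units
    writeln(countCodePointsUntil(s, '本'));  // 1: one code point precedes it
    writeln(s.indexOf('本'));                // 3: offset in code units
}
```

The "3" reported in the original post matches the code unit offset, since 日 occupies three UTF-8 code units; a count in code points would be 1.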
Oct 12 2012
Sönke Ludwig <sludwig outerproduct.org> writes:
On 10/12/2012 9:27 PM, monarch_dodra wrote:
 
 yeah, that's what I thought, but wanted it double checked. I'll take
 care of it then.

Just wanted to mention that this kind of subtle change in behavior can break a lot of code in non-obvious ways. In any case, the documentation for countUntil, but more importantly for (last)IndexOf, needs to state clearly what it does for narrow strings (the countUntil docs at least imply this by using the term "elements", but an explicit statement can do no harm).
Oct 13 2012
"monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 12 October 2012 at 19:17:13 UTC, Jonathan M Davis 
wrote:
Many algorithms special case narrow strings for efficiency. However, in this case, it looks just plain wrong. countUntil is supposed to return the number of elements (i.e. code points in this case), but it looks like it's returning the number of code units. So, I'd say that it's definitely wrong. If you want code units, then use std.string.indexOf. countUntil is supposed to return the number of code points. - Jonathan M Davis

yeah, that's what I thought, but wanted it double checked. I'll take care of it then.
Oct 12 2012