digitalmars.D.learn - evenChunks on a string - hasLength constraint fails?

amarillion <mpvaniersel gmx.com> writes:
Hey

I'm trying to split a string down the middle. I thought the 
function std.range.evenChunks would be perfect for this:

```


import std.range;

void main() {
	string line = "abcdef";
	auto parts = evenChunks(line, 2);
	assert(parts == ["abc", "def"]);
}
```

But I'm getting a compiler error:

```
/usr/include/dmd/phobos/std/range/package.d(8569):        Candidate is: `evenChunks(Source)(Source source, size_t chunkCount)`
   with `Source = string`
   whose parameters have the following constraints:
   `~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`
`    isForwardRange!Source
   > hasLength!Source
`  `~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`
./test.d(7):        All possible candidates are marked as `deprecated` or `@disable`
```

I'm trying to understand why this doesn't work. I don't really 
understand the error. If I interpret this correctly, it's missing 
a length attribute on a string, but shouldn't length be there?
Mar 14 2023
Paul Backus <snarwin gmail.com> writes:
On Tuesday, 14 March 2023 at 08:21:00 UTC, amarillion wrote:
> I'm trying to understand why this doesn't work. I don't really
> understand the error. If I interpret this correctly, it's
> missing a length attribute on a string, but shouldn't length be
> there?

By default, D's standard library treats a `string` as a range of Unicode code points (i.e., a range of `dchar`s), encoded in UTF-8. Because UTF-8 is a variable-length encoding, it's impossible to know how many code points there are in a `string` without iterating it, which means that, as far as the standard library is concerned, `string` does not have a valid `.length` property.

This behavior is known as "auto decoding", and is described in more detail in this article by Jack Stouffer:

https://jackstouffer.com/blog/d_auto_decoding_and_you.html

If you do not want the standard library to treat your `string` as an array of code points, you must use a wrapper like [`std.utf.byCodeUnit`][1] (to get a range of `char`s) or [`std.string.representation`][2] (to get a range of `ubyte`s). For example:

```d
import std.utf : byCodeUnit;
auto parts = evenChunks(line.byCodeUnit, 2);
```

Of course, if you do this, there is a risk that you will split a code point in half and end up with invalid Unicode. If your program needs to handle Unicode input, you would be better off finding a different solution. For example, you could use [`std.range.primitives.walkLength`][3] to compute the midpoint of the range by hand, and split it using [`std.range.chunks`][4]:

```d
size_t length = line.walkLength;
auto parts = chunks(line, length / 2);
```

[1]: https://phobos.dpldocs.info/std.utf.byCodeUnit.html
[2]: https://phobos.dpldocs.info/std.string.representation.html
[3]: https://phobos.dpldocs.info/std.range.primitives.walkLength.1.html
[4]: https://phobos.dpldocs.info/std.range.chunks.html
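
For anyone who wants to try both approaches side by side, here is a minimal sketch. The sample strings `"abcdef"` and `"héllo!"` are only illustrative, and the chunks are compared with `std.algorithm.comparison.equal` because they are ranges, not arrays, so a plain `==` against `["abc", "def"]` is not expected to work.

```d
import std.algorithm.comparison : equal;
import std.range : chunks, evenChunks, hasLength, walkLength;
import std.utf : byCodeUnit;

// The range API sees no length on string (auto decoding), although the
// array property .length (a count of code units) still exists.
static assert(!hasLength!string);
static assert(hasLength!(typeof("".byCodeUnit())));

void main()
{
    // ASCII-only input: splitting by code unit cannot cut a code point.
    string line = "abcdef";
    auto parts = evenChunks(line.byCodeUnit, 2);
    assert(parts.length == 2);
    assert(parts.front.equal("abc"));
    parts.popFront();
    assert(parts.front.equal("def"));

    // Non-ASCII input: 'é' occupies two code units in UTF-8.
    string word = "héllo!";
    assert(word.length == 7);      // code units
    assert(word.walkLength == 6);  // code points

    // Split by code point, using walkLength and chunks.
    auto halves = chunks(word, word.walkLength / 2);
    assert(halves.front.equal("hél"));
    halves.popFront();
    assert(halves.front.equal("lo!"));
}
```

Note that with an odd number of code points, `length / 2` rounds down, so `chunks` yields a third, shorter chunk rather than two halves.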
Mar 14 2023
amarillion <mpvaniersel gmx.com> writes:
On Tuesday, 14 March 2023 at 18:41:50 UTC, Paul Backus wrote:
> On Tuesday, 14 March 2023 at 08:21:00 UTC, amarillion wrote:
>> I'm trying to understand why this doesn't work. I don't really
>> understand the error. If I interpret this correctly, it's
>> missing a length attribute on a string, but shouldn't length
>> be there?
>
> By default, D's standard library treats a `string` as a range of Unicode code points (i.e., a range of `dchar`s), encoded in UTF-8. Because UTF-8 is a variable-length encoding, it's impossible to know how many code points there are in a `string` without iterating it--which means that, as far as the standard library is concerned, `string` does not have a valid `.length` property.

Thanks for the clear explanation! I was already aware that you could iterate by code point with foreach(dchar c; s), but it just didn't cross my mind that the same concept was playing a role here. I guess it's just one of those things that you just have to know.

regards,
Amarillion
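
The link between `foreach (dchar c; s)` and the missing range length can be seen in a small sketch. The helper names `countCodeUnits` and `countCodePoints` are hypothetical and exist only for this illustration, as does the sample string.

```d
import std.range.primitives : ElementType;

// Hypothetical helper: foreach over char visits raw UTF-8 code units.
size_t countCodeUnits(string s)
{
    size_t n;
    foreach (char c; s)
        ++n;
    return n;
}

// Hypothetical helper: foreach over dchar decodes to code points.
size_t countCodePoints(string s)
{
    size_t n;
    foreach (dchar c; s)
        ++n;
    return n;
}

// Range primitives decode narrow strings, so the element type is dchar.
static assert(is(ElementType!string == dchar));

void main()
{
    string s = "héllo!"; // 'é' occupies two code units in UTF-8
    assert(countCodeUnits(s) == 7);
    assert(s.length == 7);           // array length counts code units
    assert(countCodePoints(s) == 6); // what range-based algorithms see
}
```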
Mar 16 2023