digitalmars.D.learn - retro() on a `string` creates a range of `dchar`, causing array()
- Jakob Ovrum (12/12) Apr 17 2012 Consider this simple function:
- bearophile (4/10) Apr 17 2012 Try "text" instead of "array".
- Jakob Ovrum (4/17) Apr 17 2012 Thanks, that did it :)
- Ali Çehreli (4/13) Apr 17 2012 The reason is, a sequence of UTF-8 code units are not a valid UTF-8 when...
- bearophile (4/6) Apr 17 2012 But reversed(char[]) now works :-)
- Ali Çehreli (8/14) Apr 17 2012 That's pretty cool. :) (You meant reverse()).
- Timon Gehr (3/19) Apr 17 2012 It does not have to build a local string, see
- Ali Çehreli (31/35) Apr 17 2012 I never said otherwise. :p
- bearophile (6/10) Apr 17 2012 The basic idea for that algorithm was mine, and Andrei was very gentle t...
- Ali Çehreli (10/17) Apr 17 2012 Agreed. But I am not that sure about this particular function anymore...
- bearophile (24/30) Apr 17 2012 I see. This is a matter of design. I see some possible solutions:
- Jakob Ovrum (5/9) Apr 17 2012 It is absolutely possible to walk a UTF-8 string backwards.
- Ali Çehreli (6/14) Apr 18 2012 Indeed. I didn't mean otherwise. I was trying to explain why "The type
Consider this simple function:

    private string findParameterList(string typestr)
    {
        auto strippedHead = typestr.find("(")[1 .. $];
        auto strippedTail = retro(strippedHead).find(")");
        strippedTail.popFront(); // slice off closing parenthesis
        return array(strippedTail);
    }

The type of the return expression is dstring, not string. What is the most elegant way or correct way to solve this friction? (Note: the function is used in CTFE)
Apr 17 2012
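This is a minimal sketch of where the dchar comes from. It assumes a current DMD/Phobos rather than the 2.059 release discussed here. A string is iterated as a range of decoded code points, retro() keeps that element type, and array() therefore collects UTF-32 code points instead of UTF-8 code units.

```d
import std.array : array;
import std.range : ElementType, retro;

// A string is iterated as a range of decoded code points, not char code units.
static assert(is(ElementType!string == dchar));

// retro() keeps that dchar element type...
static assert(is(ElementType!(typeof(retro("abc"))) == dchar));

// ...so array() collects UTF-32 code points, which do not convert to string.
static assert(!is(typeof(array(retro("abc"))) : string));

unittest
{
    auto reversed = array(retro("aé"));
    assert(reversed.length == 2); // two code points, although "aé" is 3 UTF-8 code units
    assert(reversed[0] == 'é');
}
```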
Jakob Ovrum:

> return array(strippedTail);
> }
>
> The type of the return expression is dstring, not string. What is the most elegant way or correct way to solve this friction? (Note: the function is used in CTFE)

Try "text" instead of "array".

Bye,
bearophile
Apr 17 2012
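Here is a short check of bearophile's suggestion, assuming current Phobos. std.conv.text() encodes its argument back into UTF-8, so the dchar range produced by retro() comes back as a string. The same call also works during CTFE.

```d
import std.conv : text;
import std.range : retro;

unittest
{
    // text() re-encodes the dchar range as an immutable UTF-8 string.
    string s = text(retro("héllo"));
    assert(s == "olléh");

    // Forcing compile-time evaluation through an enum.
    enum ctfe = text(retro("abc"));
    static assert(ctfe == "cba");
}
```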
On Tuesday, 17 April 2012 at 15:18:49 UTC, bearophile wrote:
> Jakob Ovrum:
>
>> return array(strippedTail);
>> }
>>
>> The type of the return expression is dstring, not string. What is the most elegant way or correct way to solve this friction? (Note: the function is used in CTFE)
>
> Try "text" instead of "array".
>
> Bye,
> bearophile

Thanks, that did it :)

(I also forgot to retro() a second time to make it build the array in the original direction, before anyone points it out.)
Apr 17 2012
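The following sketch applies both fixes from this exchange: text() instead of array(), and a second retro() to restore the original order. It is a reconstruction, not Jakob's final code, and like the original it assumes the input contains a "(" and a matching ")". In current Phobos, retro() of a retro() range hands back the underlying string slice, so text() has nothing left to re-encode.

```d
import std.algorithm.searching : find;
import std.conv : text;
import std.range : retro;

string findParameterList(string typestr)
{
    auto strippedHead = typestr.find("(")[1 .. $];     // everything after "("
    auto strippedTail = retro(strippedHead).find(')'); // walk back to the last ")"
    strippedTail.popFront();                           // slice off closing parenthesis
    return text(retro(strippedTail));                  // original order, as a string
}

unittest
{
    assert(findParameterList("void function(int a, string b)") == "int a, string b");
    // Also usable in CTFE, as the original post requires.
    static assert(findParameterList("int delegate(double x)") == "double x");
}
```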
On 04/17/2012 08:12 AM, Jakob Ovrum wrote:
> Consider this simple function:
>
> private string findParameterList(string typestr)
> {
>     auto strippedHead = typestr.find("(")[1 .. $];
>     auto strippedTail = retro(strippedHead).find(")");
>     strippedTail.popFront(); // slice off closing parenthesis
>     return array(strippedTail);
> }
>
> The type of the return expression is dstring, not string.

The reason is that a sequence of UTF-8 code units is not valid UTF-8 when reversed (or retro'ed :p). But a dchar array can be reversed.

Ali
Apr 17 2012
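Ali's point can be demonstrated directly, assuming current Phobos module names. Reversing the raw UTF-8 code units of "aé" puts a continuation byte first, which std.utf.validate() rejects. Reversing a dchar array (one element per code point) stays well-formed.

```d
import std.algorithm.mutation : reverse;
import std.exception : assertThrown;
import std.string : representation;
import std.utf : UTFException, validate;

unittest
{
    // "aé" is three UTF-8 code units: 0x61, 0xC3, 0xA9.
    ubyte[] units = "aé".representation.dup;
    reverse(units); // now 0xA9, 0xC3, 0x61
    // 0xA9 is a continuation byte with no lead byte before it.
    assertThrown!UTFException(validate(cast(const(char)[]) units));

    // A dchar array has one element per code point, so reversing it is safe.
    dchar[] points = "aé"d.dup;
    reverse(points);
    assert(points == "éa"d);
}
```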
Ali Çehreli:

> The reason is that a sequence of UTF-8 code units is not valid UTF-8 when reversed (or retro'ed :p).

But reversed(char[]) now works :-)

Bye,
bearophile
Apr 17 2012
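This is a quick check of the behavior bearophile describes. It assumes a current Phobos where std.algorithm.mutation.reverse() still has its overload for mutable narrow strings. With that overload, multi-byte code points such as "é" and "€" survive the in-place reversal intact.

```d
import std.algorithm.mutation : reverse;
import std.utf : validate;

unittest
{
    char[] s = "aé€".dup;  // 1 + 2 + 3 UTF-8 code units
    reverse(s);            // reverses code points, in place
    assert(s == "€éa");
    validate(s);           // still well-formed; would throw UTFException otherwise
}
```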
On 04/17/2012 08:58 AM, bearophile wrote:
> Ali Çehreli:
>
>> The reason is that a sequence of UTF-8 code units is not valid UTF-8
>> when reversed (or retro'ed :p).
>
> But reversed(char[]) now works :-)

That's pretty cool. :) (You meant reverse().)

Interesting, because there could be no other way anyway: reverse() is in-place. Iterating by dchar without damaging the other end must have been challenging, because the first half of the string may have been all multi-byte UTF-8 code units and the rest all single bytes. The algorithm must be building a local string.

> Bye,
> bearophile

Ali
Apr 17 2012
On 04/17/2012 06:09 PM, Ali Çehreli wrote:
> On 04/17/2012 08:58 AM, bearophile wrote:
> > Ali Çehreli:
> >
> >> The reason is that a sequence of UTF-8 code units is not valid UTF-8
> >> when reversed (or retro'ed :p).
> >
> > But reversed(char[]) now works :-)
>
> That's pretty cool. :) (You meant reverse().)
>
> Interesting, because there could be no other way anyway: reverse() is
> in-place. Iterating by dchar without damaging the other end must have
> been challenging, because the first half of the string may have been all
> multi-byte UTF-8 code units and the rest all single bytes. The algorithm
> must be building a local string.
>
> > Bye,
> > bearophile
>
> Ali

It does not have to build a local string, see
http://dlang.org/phobos/std_utf.html#strideBack
Apr 17 2012
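Here is a sketch of Timon's point, assuming current std.utf. strideBack(s, i) reports how many code units the code point ending at index i occupies. That means a UTF-8 string can be walked from the end with no dchar buffer. The helper name codePointsBackwards is hypothetical.

```d
import std.utf : strideBack;

// Hypothetical helper: the code points of s from last to first,
// as slices of the original UTF-8 data (nothing is decoded or copied).
string[] codePointsBackwards(string s)
{
    string[] result;
    size_t i = s.length;
    while (i > 0)
    {
        immutable n = strideBack(s, i); // code units in the code point ending at i
        result ~= s[i - n .. i];
        i -= n;
    }
    return result;
}

unittest
{
    assert(codePointsBackwards("aé€") == ["€", "é", "a"]);
}
```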
On 04/17/2012 09:12 AM, Timon Gehr wrote:
> On 04/17/2012 06:09 PM, Ali Çehreli wrote:
>> The algorithm must be building a local string.
>
> It does not have to build a local string, see
> http://dlang.org/phobos/std_utf.html#strideBack

I never said otherwise. :p

I was too lazy to find where 2.059's algorithm.d was installed. Apparently it is here:

  /usr/include/x86_64-linux-gnu/dmd/phobos/std/algorithm.d

The algorithm is smart. It first reverses the code units of each multi-byte Unicode character in place, and then reverses the whole string one last time:

    void reverse(Char)(Char[] s)
        if (isNarrowString!(Char[]) && !is(Char == const) && !is(Char == immutable))
    {
        auto r = representation(s);
        // first pass: flip the code units of every multi-byte character
        for (size_t i = 0; i < s.length; )
        {
            immutable step = std.utf.stride(s, i);
            if (step > 1)
            {
                .reverse(r[i .. i + step]);
                i += step;
            }
            else
            {
                ++i;
            }
        }
        // second pass: flip all code units, restoring each character's byte order
        reverse(r);
    }

Ali

P.S. As a C++ programmer, I always have exception safety in mind. Unfortunately the topic does not come up much in the D forums. The algorithm above is not exception-safe because stride() may throw. But this is way off topic for this thread. :)
Apr 17 2012
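To make the two passes visible, here is a hypothetical helper (reverseEachCodePoint, not a Phobos function) that runs only the first loop of that code on a copy. It assumes current module paths. The trace for "aé" shows why the trick works. Flipping each multi-byte sequence first means the final whole-array flip puts every sequence back into the right byte order.

```d
import std.algorithm.mutation : reverse;
import std.string : representation;
import std.utf : stride;

// Hypothetical helper: the first pass only, applied to a copy of s.
ubyte[] reverseEachCodePoint(string s)
{
    ubyte[] r = s.representation.dup;
    for (size_t i = 0; i < s.length; )
    {
        immutable step = stride(s, i); // code units in the sequence starting at i
        reverse(r[i .. i + step]);     // flip that sequence in place
        i += step;
    }
    return r;
}

unittest
{
    // "aé" is 0x61 0xC3 0xA9
    ubyte[] pass1 = reverseEachCodePoint("aé");
    ubyte[] expected = [0x61, 0xA9, 0xC3];
    assert(pass1 == expected);

    reverse(pass1);                     // second pass: 0xC3 0xA9 0x61
    assert(cast(string) pass1 == "éa"); // valid UTF-8 again
}
```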
Ali:

> The algorithm is smart.

The basic idea for that algorithm was mine, and Andrei was kind enough to implement it, calling it a "Very fun exercise" :-)
http://d.puremagic.com/issues/show_bug.cgi?id=7086

> The algorithm above is not exception-safe because stride() may throw. But this is way off topic for this thread. :)

You can't expect Phobos to be perfect; it needs to be improved iteratively. If you think that's not exception safe and there are simple means to fix it, then please add this to Bugzilla. Being formally aware of a problem is the second step toward improving the situation.

Bye,
bearophile
Apr 17 2012
On 04/17/2012 12:57 PM, bearophile wrote:
>> The algorithm above is not exception-safe because stride() may throw.
>> But this is way off topic for this thread. :)
>
> You can't expect Phobos to be perfect; it needs to be improved
> iteratively. If you think that's not exception safe and there are simple
> means to fix it, then please add this to Bugzilla. Being formally aware of a
> problem is the second step toward improving the situation.

Agreed. But I am not that sure about this particular function anymore, because for the function not to be 'strongly exception safe', the input string must be invalid UTF-8 to begin with. I am not sure how bad it is to not preserve the actual invalidness of the string in that case. :)

Ali
Apr 17 2012
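One way to get the strong guarantee Ali mentions is sketched below with a hypothetical wrapper, reverseStrong: validate first, then reverse. Since validate() only reads, a malformed input makes it throw before a single byte is modified. The cost is an extra pass over the string.

```d
import std.algorithm.mutation : reverse;
import std.exception : assertThrown;
import std.utf : UTFException, validate;

// Hypothetical wrapper: if validate() throws, s has not been touched yet.
void reverseStrong(char[] s)
{
    validate(s); // throws UTFException on malformed UTF-8
    reverse(s);  // only reached with well-formed input
}

unittest
{
    char[] ok = "aé".dup;
    reverseStrong(ok);
    assert(ok == "éa");

    char[] bad = "é".dup;
    bad ~= cast(char) 0xFF; // 0xFF never appears in valid UTF-8
    immutable before = bad.idup;
    assertThrown!UTFException(reverseStrong(bad));
    assert(bad == before);  // unchanged after the exception
}
```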
Ali Çehreli:

> Agreed. But I am not that sure about this particular function anymore, because for the function not to be 'strongly exception safe', the input string must be invalid UTF-8 to begin with. I am not sure how bad it is to not preserve the actual invalidness of the string in that case. :)

I see. This is a matter of design. I see some possible solutions:

1) Do nothing: assume the input is well-formed UTF-8; otherwise the output will be wrong (or it will throw an exception unsafely). This is what Phobos may be doing in this case.

2) Put a UTF validate inside the function precondition if the input is a narrow string. This will slow down code in non-release mode, maybe too much.

3) Use a stronger type system that enforces preconditions and postconditions in a smarter way. This means that if the return value of a function that has 'validate' inside its postcondition is given as input to a function that has 'validate' inside its precondition, the validate is run only once, even in non-release mode. Generally, if you use many string functions, this saves a lot of 'validate' calls. This solution is appreciated in Eiffel-like languages.

4) Use two different types, one for validated UTF-8 and one for unvalidated UTF-8. Unless you have bad bugs in your code, this will avoid most calls to 'validate'. This solution is very simple because it doesn't require a smart compiler, and it's appreciated in languages like Haskell (for example, see: http://www.yesodweb.com/ ).

Bye,
bearophile
Apr 17 2012
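Option 4 can be sketched in D with a small wrapper type. ValidUtf8, fromRaw and reversed are hypothetical names; this only illustrates the idea and is not an existing Phobos type. Validation happens once when the value is constructed, and functions that accept a ValidUtf8 can skip it. Since D's `private` is module-scoped, real code would put the type in its own module.

```d
import std.algorithm.mutation : reverse;
import std.exception : assertThrown;
import std.utf : UTFException, validate;

// Hypothetical type: holding one means the text was validated once.
struct ValidUtf8
{
    private string data;

    // Intended entry point: validate at the boundary.
    static ValidUtf8 fromRaw(string raw)
    {
        validate(raw); // throws UTFException on malformed input
        return ValidUtf8(raw);
    }

    string toString() const { return data; }
}

// Consumers rely on the invariant instead of calling validate() again;
// reversing by code point keeps the text well-formed.
ValidUtf8 reversed(ValidUtf8 v)
{
    char[] buf = v.data.dup;
    reverse(buf);
    return ValidUtf8(buf.idup);
}

unittest
{
    auto v = ValidUtf8.fromRaw("aé€");
    assert(reversed(v).toString() == "€éa");

    char[] bad = "a".dup;
    bad ~= cast(char) 0xFF;
    assertThrown!UTFException(ValidUtf8.fromRaw(bad.idup));
}
```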
On Tuesday, 17 April 2012 at 15:36:39 UTC, Ali Çehreli wrote:
> The reason is that a sequence of UTF-8 code units is not valid UTF-8 when reversed (or retro'ed :p). But a dchar array can be reversed.
>
> Ali

It is absolutely possible to walk a UTF-8 string backwards. The problem here is that arrays of char are ranges of dchar; hence you can't go the regular generic path and have to use text() instead.
Apr 17 2012
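Jakob's explanation can be read off the range traits, assuming current std.range. A string is stored as char but iterated as dchar, and it is bidirectional, so walking backwards works. Generic algorithms such as retro() and array() therefore see dchar elements. Generic code also gets neither random access nor a length in elements, since neither is O(1) in code points.

```d
import std.range : ElementEncodingType, ElementType, hasLength,
    isBidirectionalRange, isRandomAccessRange;

// Stored as UTF-8 code units, iterated as decoded code points.
static assert(is(ElementEncodingType!string == immutable(char)));
static assert(is(ElementType!string == dchar));

// front and back each decode one code point, so both ends are reachable...
static assert(isBidirectionalRange!string);

// ...but indexing and length by element are not offered to generic code.
static assert(!isRandomAccessRange!string);
static assert(!hasLength!string);
```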
On Wednesday, 18 April 2012 at 05:45:06 UTC, Jakob Ovrum wrote:
> On Tuesday, 17 April 2012 at 15:36:39 UTC, Ali Çehreli wrote:
>> The reason is that a sequence of UTF-8 code units is not valid UTF-8
>> when reversed (or retro'ed :p). But a dchar array can be reversed.
>>
>> Ali
>
> It is absolutely possible to walk a UTF-8 string backwards.

Indeed. I didn't mean otherwise. I was trying to explain why "The type of the return expression is dstring, not string."

And I just checked, again, that my use of "UTF-8 code units" above was correct. :) I didn't say "Unicode code points".

Ali
Apr 18 2012