digitalmars.D.learn - retro() on a `string` creates a range of `dchar`, causing array()

reply "Jakob Ovrum" <jakobovrum gmail.com> writes:
Consider this simple function:

	private string findParameterList(string typestr)
	{
		auto strippedHead = typestr.find("(")[1 .. $];
		auto strippedTail = retro(strippedHead).find(")");

		strippedTail.popFront(); // slice off closing parenthesis

		return array(strippedTail);
	}

The type of the return expression is dstring, not string.

What is the most elegant way or correct way to solve this 
friction?

(Note: the function is used in CTFE)
Apr 17 2012
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Jakob Ovrum:

 		return array(strippedTail);
 	}

 The type of the return expression is dstring, not string.

 What is the most elegant way or correct way to solve this 
 friction?

 (Note: the function is used in CTFE)
Try "text" instead of "array". Bye, bearophile
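
A minimal sketch of the difference, assuming a reasonably recent Phobos. The helper name `reversedText` is hypothetical and only serves the illustration: `array()` collects the `dchar` elements that `retro()` yields, while `text()` re-encodes them as UTF-8.

```d
import std.array : array;
import std.conv : text;
import std.range : retro;

// Hypothetical helper: reverse a string by code point and return UTF-8.
string reversedText(string s)
{
    // retro(s) is a range of dchar, so array() would build UTF-32 here;
    // text() re-encodes the code points into a string instead.
    return text(retro(s));
}

void main()
{
    string s = "h\u00E9llo";

    // array() over the dchar range yields an array of dchar, not a string.
    auto viaArray = array(retro(s));
    static assert(is(typeof(viaArray) : const(dchar)[]));
    static assert(!is(typeof(viaArray) : string));

    assert(reversedText(s) == "oll\u00E9h");
}
```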
Apr 17 2012
parent "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Tuesday, 17 April 2012 at 15:18:49 UTC, bearophile wrote:
 Jakob Ovrum:

 		return array(strippedTail);
 	}

 The type of the return expression is dstring, not string.

 What is the most elegant way or correct way to solve this 
 friction?

 (Note: the function is used in CTFE)
 Try "text" instead of "array". Bye, bearophile
Thanks, that did it :) (I also forgot to retro() a second time to make it build the array in the original direction, before anyone points it out)
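
Put together, the fixed function might look like the sketch below. It assumes a recent Phobos and uses a character needle (`')'`) for the second search, which is a small deviation from the original code; the sample type strings in `main` are made up for the example.

```d
import std.algorithm.searching : find;
import std.conv : text;
import std.range : retro;

string findParameterList(string typestr)
{
    // Drop everything up to and including the first '('.
    auto strippedHead = typestr.find("(")[1 .. $];

    // Search from the back for the last ')'.
    auto strippedTail = retro(strippedHead).find(')');

    strippedTail.popFront(); // slice off closing parenthesis

    // retro() a second time restores the original direction;
    // text() makes sure the result is a string.
    return text(retro(strippedTail));
}

void main()
{
    assert(findParameterList("void function(int, string)") == "int, string");
    assert(findParameterList("int delegate(void function(int))") == "void function(int)");
}
```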
Apr 17 2012
prev sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 04/17/2012 08:12 AM, Jakob Ovrum wrote:
 Consider this simple function:

 private string findParameterList(string typestr)
 {
 auto strippedHead = typestr.find("(")[1 .. $];
 auto strippedTail = retro(strippedHead).find(")");

 strippedTail.popFront(); // slice off closing parenthesis

 return array(strippedTail);
 }

 The type of the return expression is dstring, not string.
The reason is that a sequence of UTF-8 code units is not valid UTF-8 when reversed (or retro'ed :p). But a dchar array can be reversed.

Ali
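
To illustrate the point about code units, here is a small sketch that reverses the raw bytes of a two-byte character and then asks `std.utf.validate` about the result. The helper `isValidUtf8` is a made-up name for this example.

```d
import std.algorithm.mutation : reverse;
import std.utf : UTFException, validate;

// Hypothetical helper: true if s is well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    char[] s = "\u00E9".dup;        // two code units: 0xC3 0xA9
    assert(isValidUtf8(s));

    // Reverse the code units themselves, ignoring character boundaries.
    auto units = cast(ubyte[]) s;   // same memory, viewed as bytes
    reverse(units);                 // now 0xA9 0xC3

    // A continuation byte now comes first, so this is no longer UTF-8.
    assert(!isValidUtf8(s));
}
```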
Apr 17 2012
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Ali Çehreli:

 The reason is, a sequence of UTF-8 code units are not a valid 
 UTF-8 when reversed (or retro'ed :p).
But reversed(char[]) now works :-) Bye, bearophile
Apr 17 2012
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 04/17/2012 08:58 AM, bearophile wrote:
 Ali Çehreli:

 The reason is, a sequence of UTF-8 code units are not a valid UTF-8
 when reversed (or retro'ed :p).
 But reversed(char[]) now works :-)
That's pretty cool. :) (You meant reverse().)

Interesting, because there could be no other way anyway, since reverse() is in-place. Iterating by dchar without damaging the other end must have been challenging, because the first half of the string may consist entirely of multi-byte UTF-8 sequences and the rest entirely of single bytes.

The algorithm must be building a local string.
 Bye,
 bearophile
Ali
Apr 17 2012
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 04/17/2012 06:09 PM, Ali Çehreli wrote:
 On 04/17/2012 08:58 AM, bearophile wrote:
  > Ali Çehreli:
  >
  >> The reason is, a sequence of UTF-8 code units are not a valid UTF-8
  >> when reversed (or retro'ed :p).
  >
  > But reversed(char[]) now works :-)

 That's pretty cool. :) (You meant reverse()).

 Interesting, because there could be no other way anyway because
 reverse() is in-place. Iterating by dchar without damaging the other end
 must have been challenging because the first half of the string may have
 been all multi-byte UTF-8 code units and all of the rest of single-bytes.

 The algorithm must be building a local string.

  > Bye,
  > bearophile

 Ali
It does not have to build a local string, see http://dlang.org/phobos/std_utf.html#strideBack
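
As a rough sketch of how `strideBack` lets code walk a UTF-8 string from the end without decoding into a separate buffer (the function name `widthsFromBack` is invented for the example):

```d
import std.utf : strideBack;

// Hypothetical helper: byte width of each character, last character first.
size_t[] widthsFromBack(string s)
{
    size_t[] widths;
    size_t i = s.length;
    while (i > 0)
    {
        // Width in code units of the character that ends just before index i.
        immutable step = strideBack(s, i);
        widths ~= step;
        i -= step;
    }
    return widths;
}

void main()
{
    // 'a' is 1 byte, U+00E9 is 2 bytes, U+20AC is 3 bytes in UTF-8.
    size_t[] expected = [3, 2, 1];
    assert(widthsFromBack("a\u00E9\u20AC") == expected);
}
```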
Apr 17 2012
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 04/17/2012 09:12 AM, Timon Gehr wrote:
 On 04/17/2012 06:09 PM, Ali Çehreli wrote:
 The algorithm must be building a local string.
 It does not have to build a local string, see
 http://dlang.org/phobos/std_utf.html#strideBack
I never said otherwise. :p I was too lazy to locate where 2.059's algorithm.d was placed. Apparently it is here:

	/usr/include/x86_64-linux-gnu/dmd/phobos/std/algorithm.d

The algorithm is smart. It reverses the code units of each multi-byte Unicode character in place first, and then reverses the whole string one last time:

	void reverse(Char)(Char[] s)
	if (isNarrowString!(Char[]) && !is(Char == const) && !is(Char == immutable))
	{
		auto r = representation(s);
		for (size_t i = 0; i < s.length; )
		{
			immutable step = std.utf.stride(s, i);
			if (step > 1)
			{
				.reverse(r[i .. i + step]);
				i += step;
			}
			else
			{
				++i;
			}
		}
		reverse(r);
	}

Ali

P.S. As a C++ programmer, I always have exception safety in mind. Unfortunately the topic does not come up much in D forums. The algorithm above is not exception-safe because stride() may throw. But this is way off topic for this thread. :)
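
For reference, calling the narrow-string `reverse` on a mutable buffer could look like the sketch below. It assumes the overload lives in `std.algorithm.mutation`, as in current Phobos; `reverseCopy` is a made-up wrapper that keeps the input untouched.

```d
import std.algorithm.mutation : reverse;

// Hypothetical wrapper: returns a reversed copy, leaving the input intact.
string reverseCopy(string s)
{
    char[] buf = s.dup;   // reverse() needs mutable code units
    reverse(buf);         // in-place, keeps each character's bytes in order
    return buf.idup;
}

void main()
{
    // One 1-byte, one 2-byte and one 3-byte character.
    assert(reverseCopy("a\u00E9\u20AC") == "\u20AC\u00E9a");
}
```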
Apr 17 2012
parent reply bearophile <bearophileHUGS lycos.com> writes:
Ali:

 The algorithm is smart.
The basic idea for that algorithm was mine, and Andrei was kind enough to implement it, calling it a "Very fun exercise" :-) http://d.puremagic.com/issues/show_bug.cgi?id=7086
 The algorithm
 above is not exception-safe because stride() may throw. But this way off 
 topic on this thread. :)
You can't expect Phobos to be perfect, it needs to be improved iteratively. If you think that's not exception safe and there are simple means to do it, then please file this in Bugzilla. Being formally aware of a problem is the second step toward improving the situation. Bye, bearophile
Apr 17 2012
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 04/17/2012 12:57 PM, bearophile wrote:

 The algorithm
 above is not exception-safe because stride() may throw. But this way off
 topic on this thread. :)
 You can't expect Phobos to be perfect, it needs to be improved
 iteratively. If you think that's not exception safe and there are
 simple means to do it, then please add this in Bugzilla. Being formally
 aware of a problem is the second step toward improving the situation.
Agreed.

But I am not that sure about this particular function anymore because for the function to be not 'strongly exception safe', the input string must be invalid UTF-8 to begin with.

I am not sure how bad it is to not preserve the actual invalidness of the string in that case. :)

Ali
Apr 17 2012
parent "bearophile" <bearophileHUGS lycos.com> writes:
Ali Çehreli:

 Agreed.

 But I am not that sure about this particular function anymore 
 because for the function to be not 'strongly exception safe', 
 the input string must be invalid UTF-8 to begin with.

 I am not sure how bad it is to not preserve the actual 
 invalidness of the string in that case. :)
I see. This is a matter of design. I see some possible solutions:

1) Do nothing and assume the input is well-formed UTF-8; otherwise the output will be wrong (or the function will throw an exception unsafely). This is what Phobos may be doing in this case.

2) Put a UTF 'validate' call inside the function pre-condition if the input is a narrow string. This will slow down code in non-release mode, maybe too much.

3) Use a stronger type system that enforces pre-conditions and post-conditions in a smarter way. This means that if the return value of a function that has 'validate' inside its post-condition is given as input to a function that has 'validate' inside its pre-condition, the validate is run only once, even in non-release mode. Generally, if you use many string functions, this saves a lot of 'validate' calls. This solution is appreciated in Eiffel-like languages.

4) Use two different types, one for validated UTF-8 and one for unvalidated UTF-8. Unless you have bad bugs in your code, this will avoid most calls to 'validate'. This solution is very simple because it doesn't require a smart compiler, and it's appreciated in languages like Haskell (for example, see: http://www.yesodweb.com/ ).

Bye,
bearophile
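
Option 4 could be sketched in D roughly as follows. `ValidatedString` and `byteLength` are hypothetical names, not anything Phobos provides; the only library call assumed is `std.utf.validate`, which throws `UTFException` on malformed input.

```d
import std.utf : UTFException, validate;

/// Hypothetical wrapper type: can only be built from well-formed UTF-8.
struct ValidatedString
{
    private string payload;

    /// Validates once, at construction; throws UTFException on bad input.
    this(string s)
    {
        validate(s);
        payload = s;
    }

    /// Read-only access to the checked text.
    string get()
    {
        return payload;
    }
}

/// Functions that take ValidatedString need no validation of their own.
size_t byteLength(ValidatedString s)
{
    return s.get.length;
}

void main()
{
    auto ok = ValidatedString("a\u00E9");
    assert(byteLength(ok) == 3);

    bool threw = false;
    try
    {
        auto bad = ValidatedString("\xA9\xC3"); // continuation byte first
    }
    catch (UTFException)
    {
        threw = true;
    }
    assert(threw);
}
```

The cost of validation is paid once at the boundary where untrusted text enters, which is the trade-off described in point 4.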
Apr 17 2012
prev sibling parent reply "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Tuesday, 17 April 2012 at 15:36:39 UTC, Ali Çehreli wrote:
 The reason is, a sequence of UTF-8 code units are not a valid 
 UTF-8 when reversed (or retro'ed :p). But a dchar array can be 
 reversed.

 Ali
It is absolutely possible to walk a UTF-8 string backwards. The problem here is that arrays of char are ranges of dchar; hence you can't go the regular generic path and have to use text() instead.
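
A short sketch of what "arrays of char are ranges of dchar" means in practice, assuming Phobos with its usual string auto-decoding; `codePoints` is a made-up helper name.

```d
import std.range.primitives : ElementType, back, walkLength;

// As a range, string has dchar elements even though it stores char.
static assert(is(ElementType!string == dchar));

// Hypothetical helper: number of code points, found by walking the range.
size_t codePoints(string s)
{
    return s.walkLength;
}

void main()
{
    string s = "a\u00E9";

    assert(s.length == 3);        // array view: UTF-8 code units
    assert(codePoints(s) == 2);   // range view: decoded code points

    // Walking from the back decodes a whole character, not a single byte.
    assert(s.back == '\u00E9');
}
```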
Apr 17 2012
parent Ali Çehreli <acehreli yahoo.com> writes:
On Wednesday, 18 April 2012 at 05:45:06 UTC, Jakob Ovrum wrote:
 On Tuesday, 17 April 2012 at 15:36:39 UTC, Ali Çehreli wrote:
 The reason is, a sequence of UTF-8 code units are not a valid
 UTF-8 when reversed (or retro'ed :p). But a dchar array can be
 reversed.

 Ali
 It is absolutely possible to walk a UTF-8 string backwards.
Indeed. I didn't mean otherwise. I was trying to explain why "The type of the return expression is dstring, not string."

And I just checked, again, that my use of "UTF-8 code units" above was correct. :) I didn't say "Unicode code points".

Ali
Apr 18 2012