www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Improving std.algorithm.find

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
I was thinking of improving std.find. We have this bug report:

http://d.puremagic.com/issues/show_bug.cgi?id=3923

which is pretty vague but does have a point.

For starters, it should be told that std.algorithm.find _does_ a lot, 
which at least partially justifies its complexity. One thing that I 
haven't seen other find()s doing is to be able to search in one pass a 
given range for multiple other ranges, like this:

int[] a = [ 1, 4, 2, 3 ];
assert(find(a, 5, 4, 3) == tuple([ 4, 2, 3 ], 2));

When passed more than two arguments, find returns a tuple continuing the 
searched ranged positioned on the element found and the 1-based index of 
the parameter that was found. The trick is that find() makes exactly one 
pass through the searched range, which is often more efficient than 
searching the same range for each element in turn. Also the one-pass 
approach works with input ranges so it doesn't put pressure on the range 
capabilities.

However the simplified find() looks like, I'd like to keep this feature 
unless it brings serious aggravation. Right now it's the #1 factor that 
complicates find's signature and implementation.

Another aspect I'd like to discuss is use of Boyer-Moore and related 
fast finding techniques. Currently the use of B-M is explicit and a bit 
awkward. I was thinking instead to automatically detect its 
appropriatedness and use it transparently. Bearophile posted at a link 
to such a blend string search routine 
(http://effbot.org/zone/stringlib.htm) that I think can be generalized 
quite easily.

Any ideas - please chime in.


Andrei
Jul 17 2010
next sibling parent reply Jonathan M Davis <jmdavisprog gmail.com> writes:
On Saturday 17 July 2010 15:55:26 Andrei Alexandrescu wrote:
 I was thinking of improving std.find. We have this bug report:
 
 http://d.puremagic.com/issues/show_bug.cgi?id=3923
 
 which is pretty vague but does have a point.
 
 For starters, it should be told that std.algorithm.find _does_ a lot,
 which at least partially justifies its complexity. One thing that I
 haven't seen other find()s doing is to be able to search in one pass a
 given range for multiple other ranges, like this:
 
 int[] a = [ 1, 4, 2, 3 ];
 assert(find(a, 5, 4, 3) == tuple([ 4, 2, 3 ], 2));
 
 When passed more than two arguments, find returns a tuple continuing the
 searched ranged positioned on the element found and the 1-based index of
 the parameter that was found. The trick is that find() makes exactly one
 pass through the searched range, which is often more efficient than
 searching the same range for each element in turn. Also the one-pass
 approach works with input ranges so it doesn't put pressure on the range
 capabilities.
 
 However the simplified find() looks like, I'd like to keep this feature
 unless it brings serious aggravation. Right now it's the #1 factor that
 complicates find's signature and implementation.
 
 Another aspect I'd like to discuss is use of Boyer-Moore and related
 fast finding techniques. Currently the use of B-M is explicit and a bit
 awkward. I was thinking instead to automatically detect its
 appropriatedness and use it transparently. Bearophile posted at a link
 to such a blend string search routine
 (http://effbot.org/zone/stringlib.htm) that I think can be generalized
 quite easily.
 
 Any ideas - please chime in.
 
 
 Andrei

I think that the problem with find() is not so much find() but it's documentation. In all honesty, anything with a return type like FindResult!(Range, Ranges) is going to scare people off. The find() that takes only a predicate and the range to search in is much better: Range find(alias pred, Range)(Range haystack); Now, unfortunately, I'm not sure how to improve the main version of find other than the return type. FindResult!(Range,Ranges) find(alias pred = "a == b", Range, Ranges...)(Range haystack, Ranges needles); becomes Range find(alias pred = "a == b", Range, Ranges...)(Range haystack, Ranges needles); However, that (alias pred = "a == b", Range, Ranges...) is still pretty ugly. Unfortunately, I'm not sure how you would eliminate that. The only part that the caller would normally care about is alias pred = "a == b", but the rest is part of the template definition as well, and I don't think that we have a way to clean that up at the moment. The other problem is that you're getting two totally different return types depending on whether you give it one needle or many (which may make Range not work as a return type anyway since what you get in the case of multiple needs isn't exactly a Range). So, I'd suggest either making it so that they return the same thing and you have to do a comparison to figure out which needle was found (in many cases, all you care about is that one of them was found anyway, but it would be less efficient in cases where you do care), or you should split find() into two functions - one which takes a single needle and one which takes many. That should make things a bit clearer. It doesn't really change the functionality, but the docs would be less cluttered. Another option would be to make find only return a range for multiple needles but have a separate function which returns a tuple if you really care, thereby simplifying find() a bit. In such a case, it would probably still be best to separate find() itself into two functions, but it wouldn't be as critical. As for integrating BoyerMoore into find(), if you can do it cleanly and seemlessly, I'm all for it. It would be good if there were a note about it in the documentation so that those using it wouldn't feel the need to reimplement it themselves, but I'm all for having versions of find() be specialized for the type that they're searching on if it can be done seemlessly. I'd even suggest doing the same for the various container types which could benefit from a specialized find - though perhaps those would be better served by having a special version of find() as part of their definition which std.algorithm.find() then calls. If anything, I've been more interested in canFind() and until() being made to match up with find(). I'd like to be able to give them both pretty much the same arguments and then get the bool from canFind() and the range that find() would have walked over in the case of until(). A function which gave you both the range that until() would have given you as well as the one which find() would have given you would be nice as well. I previously opened a bug with thoughts along those lines: http://d.puremagic.com/issues/show_bug.cgi?id=3888 In any case, I think that find() itself is more or less fine from a usage standpoint. It's the docs that need help. And splitting up the function and trying to simplify the signature in the docs (whether the signature itself is really simplified or not) is what really needs to happen. The other thing would be to add a lot more examples. That would be a big boon for a lot of std.algorithm functions. They're not necessarily hard to use, and the examples show it much more clearly than the signatures often do. The problem is that the signatures are so ugly. And the fact that they have to be (in terms of the actual function if not the documentation) is pretty much required given what they do, so I'm not sure how much that can really be fixed. Maybe some extensions to DDoc need to be done for it happen, but it would be good if we could find a way to give simplified function signatures for a lot of these functions (possibly with a way in the docs to see the full function signature) so that users of the docs can see what they need to to use the functions without having to wade through the nasty (albeit necessarily so) signatures. - Jonathan M Davis
Jul 17 2010
parent reply sybrandy <sybrandy gmail.com> writes:
 I think that the problem with find() is not so much find() but it's
documentation.
 In all honesty, anything with a return type like FindResult!(Range, Ranges) is
 going to scare people off.

I fully agree. I had no idea from the documentation what something like this was: [1,2,3,4][]. The array documentation doesn't have any details, so for the most part I've stayed away from std.algorithm because I don't understand the return types very well. Better documentation would help a lot. Casey
Jul 19 2010
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 07/19/2010 06:36 PM, sybrandy wrote:
 I think that the problem with find() is not so much find() but it's
 documentation.
 In all honesty, anything with a return type like FindResult!(Range,
 Ranges) is
 going to scare people off.

I fully agree. I had no idea from the documentation what something like this was: [1,2,3,4][]. The array documentation doesn't have any details, so for the most part I've stayed away from std.algorithm because I don't understand the return types very well. Better documentation would help a lot.

I agree as well. FWIW [1, 2, 3, 4][] is a workaround for a (now gone) bug in the compiler. That bug made [1, 2, 3, 4] a fixed-size array by default. I'll work on improving the documentation. Andrei
Jul 19 2010
parent sybrandy <sybrandy gmail.com> writes:
On 07/19/2010 10:18 PM, Andrei Alexandrescu wrote:
 On 07/19/2010 06:36 PM, sybrandy wrote:
 I think that the problem with find() is not so much find() but it's
 documentation.
 In all honesty, anything with a return type like FindResult!(Range,
 Ranges) is
 going to scare people off.

I fully agree. I had no idea from the documentation what something like this was: [1,2,3,4][]. The array documentation doesn't have any details, so for the most part I've stayed away from std.algorithm because I don't understand the return types very well. Better documentation would help a lot.

I agree as well. FWIW [1, 2, 3, 4][] is a workaround for a (now gone) bug in the compiler. That bug made [1, 2, 3, 4] a fixed-size array by default. I'll work on improving the documentation. Andrei

Casey
Jul 20 2010
prev sibling parent reply sybrandy <sybrandy gmail.com> writes:
 In most cases, the return types are just ranges, but because they're
constructed
 by the algorithm, the algorithm creates its own range type. So, generally, all
 you need to know is that a range is being returned. But even then, you run into
 some issues where you need to do stuff like run the returned range through
 array() to get something useable - depending on what you're doing with it.
 Regardless of the return type though, std.algorithm is definitely a prime
example
 of how auto is your friend. That way you often don't have to care all that much
 about what the return type really looks like.

That paragraph would have been immensely useful to me a few months ago.
 However, the documentation on it is obviously ugly. It really should give the
 programmer exactly what they need to use the function and nothing more. As it
 stands, you have to really want to use it to figure it out. Personally, I think
 that it's well worth it - std.algorithm is fantastic - but the documentation is
 nowhere near as straightforward as it should be (even if using the functions is
 actually pretty straightforward in most cases).

Agreed. I'm also a big fan of examples as it's very easy to look at an example and get something working. Then, once you see it work and get somewhat of an understanding, you can dig deeper into the details to learn more. Casey
Jul 20 2010
parent bearophile <bearophileHUGS lycos.com> writes:
sybrandy:
 Agreed.  I'm also a big fan of examples as it's very easy to look at an 
 example and get something working.  Then, once you see it work and get 
 somewhat of an understanding, you can dig deeper into the details to 
 learn more.

I am not sure, but I think find() is not what I need, even if you add lot of usage examples to it. (It seems my list of find-like micropatterns got ignored, I don't know why, maybe they don't look enough like things done by STL C++.) Bye, bearophile
Jul 20 2010
prev sibling next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Andrei:

Any ideas - please chime in.<

As with most software it's better to first ask yourself what your software will mostly be used for, and then to write it. Otherwise you are wasting your time and the user time. The most common usages (I call them micropatterns) for find-like functions (every one of the following seven operations takes an optional transform function like schwartzSort): 1) boolean predicate, is the item/iterable x inside the iterable foo? ("in" operator, or if not possible a "isIn" function). 2) int function, tell me how many items/arrays x are inside the iterable foo ("count" function). 3) int function, tell me the index of the first item/iterable x in the iterable foo ("index" function). 4 extra) find lazily all the indexes of an item/iterable x inside the iterable foo ("indexes" struct or class). 5) find the first max/min item in a iterable ("max", "min" functions, they have overloads for 2 and 3 items too). 6) find the index of the max/min item in a iterable ("maxPos", "minPos" functions). 7) find lazily all the max/min items in a iterable ("maxs", "mins" structs or classes). An iterable is a Range (like a string, array, AA, etc) or a struct/class that defines just opApply. In my dlibs1 in all those functions if the sequence is an associative arrays, the search is done on the keys. D2 AAs have .byKey() that reduces this need a bit. In the first 4 micropatterns it's good to have a specialization for strings, to increase performance with some string-specific tricks . Bye, bearophile
Jul 18 2010
prev sibling next sibling parent reply Philippe Sigaud <philippe.sigaud gmail.com> writes:
--0016368e2bcb22c957048bb02855
Content-Type: text/plain; charset=ISO-8859-1

On Sun, Jul 18, 2010 at 00:55, Andrei Alexandrescu  wrote:
*****
    I was thinking of improving std.find.

    For starters, it should be told that std.algorithm.find _does_ a lot,
which at least partially justifies its complexity. One thing that I haven't
seen other find()s doing is to be able to search in one pass a given range
for multiple other ranges, like this:

    int[] a = [ 1, 4, 2, 3 ];
    assert(find(a, 5, 4, 3) == tuple([ 4, 2, 3 ], 2));

    When passed more than two arguments, find returns a tuple continuing the
searched ranged positioned on the element found and the 1-based index of the
parameter that was found. The trick is that find() makes exactly one pass
through the searched range, which is often more efficient than searching the
same range for each element in turn. Also the one-pass approach works with
input ranges so it doesn't put pressure on the range capabilities.
*****

You could consider than find() is a multiple-returns function: it may return
0, 1 or many values (the successive finds). Which can, obviously, be easily
represented by a range. I see find() as bit like filter(): its return type
is a range determined by the predicate and whose elements are the rest of
the input range.
so:
find([ 1, 4, 2, 3, 4, 0 ], 4) returns a range whose elements are [4,2,3,4,0]
and [4,0].

If no needle is present in haystack, then the result range is empty. It's
just it's an empty range of ranges, not an empty version of the original
range.

A complement information can be how many steps were taken (that is, the
index of the found elements):  ([4,2,3,4,0], 1), ([4,0], 4). A policy or
another function could take care of that: findIndexed(haystack, needle) or
find!(withIndex)(haystack, needle).

The advantage of returning a range is the possibility to iterate on the
results. I, for one, like the idea of having find() in the same family that
workhorses like map() or filter()... The drawback is the necessity to test
the result with empty to know if you had results, but that's not different
from the current version.

So, you could have a findFirst() that does not return a range, but one
element, like the current find() does. Or call what I suggest findAll().

Now, for many needles... Currently, you pass the predicate "a==b" to find()
and use it on all needles. A possible generalization is to have a isOneOf()
predicate, called with isOneOf(a, b0, b1, ... bn) where a is the current
element being tested and the b's are the needles passed to find(). This
generic calling of pred with (a, needles) could be done for any predicate,
like for example:

find!(isBetween)(haystack, 0, 10)

where isBetween(a, left, right) returns true if (left <= a && a <= right).

The default would be "a==b" for one needle and isOneOf(a, b...) if
needles.length > 1. So you still get the current behavior, with a bit of
possible customization thrown in, like the isBetween example. I don't think
that's possible today?

The limit of this is you cannot easily get the index of the found needle.
btw, could you remind us why you used a 1-based index? My opinion on this is
that if you want to find any needle, then you mostly care for them being
present, and much less for wich one is the first. Also, the found element is
accessible by taking the front of the retuned range. Why provide it to the
caller?

Another problem I see is to get the nice current behavior of having a needle
being a range like haystack:

find([0,1,2,3,4,3,2,1], [4,3]) => [4,3,2,1]

And that's I particularly like.


*****
    Another aspect I'd like to discuss is use of Boyer-Moore and related
fast finding techniques. Currently the use of B-M is explicit and a bit
awkward. I was thinking instead to automatically detect its appropriatedness
and use it transparently. Bearophile posted at a link to such a blend string
search routine (http://effbot.org/zone/stringlib.htm) that I think can be
generalized quite easily.
*****

Indeed, reading this, it seems like it could be generalized easily. You need
a random-access range with a defined length. So the usage of such an algo
could indeed be done transparently. Hmm...


Philippe

--0016368e2bcb22c957048bb02855
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

On Sun, Jul 18, 2010 at 00:55, Andrei Alexandrescu=A0 wrote:<br>*****<br>=
=A0=A0=A0 I was thinking of improving std.find.<br><br>=A0=A0=A0 For starte=
rs, it should be told that std.algorithm.find _does_ a lot, which at least =
partially justifies its complexity. One thing that I haven&#39;t seen other=
 find()s doing is to be able to search in one pass a given range for multip=
le other ranges, like this:<br>
<br>=A0=A0=A0 int[] a =3D [ 1, 4, 2, 3 ];<br>=A0=A0=A0 assert(find(a, 5, 4,=
 3) =3D=3D tuple([ 4, 2, 3 ], 2));<br><br>=A0=A0=A0 When passed more than t=
wo arguments, find returns a tuple continuing the searched ranged positione=
d on the element found and the 1-based index of the parameter that was foun=
d. The trick is that find() makes exactly one pass through the searched ran=
ge, which is often more efficient than searching the same range for each el=
ement in turn. Also the one-pass approach works with input ranges so it doe=
sn&#39;t put pressure on the range capabilities.<br>
*****<br>=A0=A0=A0 <br>You could consider than find() is a multiple-returns=
 function: it may return 0, 1 or many values (the successive finds). Which =
can, obviously, be easily represented by a range. I see find() as bit like =
filter(): its return type is a range determined by the predicate and whose =
elements are the rest of the input range.<br>
so:<br>find([ 1, 4, 2, 3, 4, 0 ], 4) returns a range whose elements are [4,=
2,3,4,0] and [4,0].<br><br>If no needle is present in haystack, then the re=
sult range is empty. It&#39;s just it&#39;s an empty range of ranges, not a=
n empty version of the original range.<br>
<br>A complement information can be how many steps were taken (that is, the=
 index of the found elements):=A0 ([4,2,3,4,0], 1), ([4,0], 4). A policy or=
 another function could take care of that: findIndexed(haystack, needle) or=
 find!(withIndex)(haystack, needle).<br>
<br>The advantage of returning a range is the possibility to iterate on the=
 results. I, for one, like the idea of having find() in the same family tha=
t workhorses like map() or filter()... The drawback is the necessity to tes=
t the result with empty to know if you had results, but that&#39;s not diff=
erent from the current version.<br>
<br>So, you could have a findFirst() that does not return a range, but one =
element, like the current find() does. Or call what I suggest findAll().<br=
<br>Now, for many needles... Currently, you pass the predicate &quot;a=3D=

s to have a isOneOf() predicate, called with isOneOf(a, b0, b1, ... bn) whe= re a is the current element being tested and the b&#39;s are the needles pa= ssed to find(). This generic calling of pred with (a, needles) could be don= e for any predicate, like for example:<br> <br>find!(isBetween)(haystack, 0, 10)<br><br>where isBetween(a, left, right= ) returns true if (left &lt;=3D a &amp;&amp; a &lt;=3D right).<br><br>The d= efault would be &quot;a=3D=3Db&quot; for one needle and isOneOf(a, b...) if= needles.length &gt; 1. So you still get the current behavior, with a bit o= f possible customization thrown in, like the isBetween example. I don&#39;t= think that&#39;s possible today?<br> <br>The limit of this is you cannot easily get the index of the found needl= e. btw, could you remind us why you used a 1-based index? My opinion on thi= s is that if you want to find any needle, then you mostly care for them bei= ng present, and much less for wich one is the first. Also, the found elemen= t is accessible by taking the front of the retuned range. Why provide it to= the caller?<br> <br>Another problem I see is to get the nice current behavior of having a n= eedle being a range like haystack:<br><br>find([0,1,2,3,4,3,2,1], [4,3]) = =3D&gt; [4,3,2,1]<br><br>And that&#39;s I particularly like.<br><br><br>***= **<br> =A0=A0=A0 Another aspect I&#39;d like to discuss is use of Boyer-Moore and = related fast finding techniques. Currently the use of B-M is explicit and a= bit awkward. I was thinking instead to automatically detect its appropriat= edness and use it transparently. Bearophile posted at a link to such a blen= d string search routine (<a href=3D"http://effbot.org/zone/stringlib.htm">h= ttp://effbot.org/zone/stringlib.htm</a>) that I think can be generalized qu= ite easily.<br> *****<br>=A0=A0=A0 <br>Indeed, reading this, it seems like it could be gene= ralized easily. You need a random-access range with a defined length. So th= e usage of such an algo could indeed be done transparently. Hmm...<br><br><= br> Philippe<br><br> --0016368e2bcb22c957048bb02855--
Jul 18 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 07/18/2010 04:30 PM, Philippe Sigaud wrote:
 On Sun, Jul 18, 2010 at 00:55, Andrei Alexandrescu  wrote:
 You could consider than find() is a multiple-returns function: it may
 return 0, 1 or many values (the successive finds). Which can, obviously,
 be easily represented by a range. I see find() as bit like filter(): its
 return type is a range determined by the predicate and whose elements
 are the rest of the input range.
 so:
 find([ 1, 4, 2, 3, 4, 0 ], 4) returns a range whose elements are
 [4,2,3,4,0] and [4,0].

 If no needle is present in haystack, then the result range is empty.
 It's just it's an empty range of ranges, not an empty version of the
 original range.

 A complement information can be how many steps were taken (that is, the
 index of the found elements):  ([4,2,3,4,0], 1), ([4,0], 4). A policy or
 another function could take care of that: findIndexed(haystack, needle)
 or find!(withIndex)(haystack, needle).

I agree that a findAll function yielding a range is nice, however it's not the simplest interface. I think there should be a function that just finds something somewhere.
 Now, for many needles... Currently, you pass the predicate "a==b" to
 find() and use it on all needles. A possible generalization is to have a
 isOneOf() predicate, called with isOneOf(a, b0, b1, ... bn) where a is
 the current element being tested and the b's are the needles passed to
 find(). This generic calling of pred with (a, needles) could be done for
 any predicate, like for example:

 find!(isBetween)(haystack, 0, 10)

Aren't such scenarios better served by putting the limits inside the predicate? Otherwise the list of what can be done could go on forever.
 The limit of this is you cannot easily get the index of the found
 needle. btw, could you remind us why you used a 1-based index? My
 opinion on this is that if you want to find any needle, then you mostly
 care for them being present, and much less for wich one is the first.

I used one-based index to make tests easier - as you said, most often you care for the presence so zero/nonzero is easiest to tell. Then I thought it wouldn't hurt to provide the index since it's not any extra work.
 Also, the found element is accessible by taking the front of the retuned
 range. Why provide it to the caller?

Because that doesn't generalize well to searching ranges (as oposed to individual elements).
 Another problem I see is to get the nice current behavior of having a
 needle being a range like haystack:

 find([0,1,2,3,4,3,2,1], [4,3]) => [4,3,2,1]

 And that's I particularly like.

Not sure I understand whether that's good or bad, but I also want to provide the likes of findSkip() (it's been discussed here under a different name, findConsume) that positions the searched range after the found range: findSkip([0,1,2,3,4,3,2,1], [4,3]) => [2,1] It's very useful in practice.
 *****
      Another aspect I'd like to discuss is use of Boyer-Moore and
 related fast finding techniques. Currently the use of B-M is explicit
 and a bit awkward. I was thinking instead to automatically detect its
 appropriatedness and use it transparently. Bearophile posted at a link
 to such a blend string search routine
 (http://effbot.org/zone/stringlib.htm) that I think can be generalized
 quite easily.
 *****

 Indeed, reading this, it seems like it could be generalized easily. You
 need a random-access range with a defined length. So the usage of such
 an algo could indeed be done transparently. Hmm...

The current B-M implementation already works with any random-access range with length, but it's not integrated transparently with the regular find. I wonder what the threshold conditions might look like. Andrei
Jul 18 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 07/19/2010 02:12 PM, Philippe Sigaud wrote:
 On Mon, Jul 19, 2010 at 03:19, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org <mailto:SeeWebsiteForEmail erdani.org>>
 wrote:


     I agree that a findAll function yielding a range is nice, however
     it's not the simplest interface. I think there should be a function
     that just finds something somewhere.


 Yes, I agree that's the common case.
 What I personally would like is a way to find an element in an ordered
 container (binary tree...)

Wouldn't that be a member of the binary tree?
     Aren't such scenarios better served by putting the limits inside the
     predicate? Otherwise the list of what can be done could go on forever.


 The problem is that, as the predicate is passed at compile-time, the
 limits must be CT-defined too. I'd like the limits to be defined at
 runtime. I run into this regularly, while using predicates in
 std.algorithms.

I think this is a misunderstanding. All of std.algorithm works with delegates: int[] a = [ 1, 4, 2, 3 ]; int b = 2; assert(find!((x) { return x > b; })(a) == [4, 2, 3]);
 A not-so-uncommon scenario for me is having one range delivering limits
 (or zones to explore, or any other runtime arguments), and finding a
 value inside those limits in another range. Ideally, I'd like to use
 find() on a spatial structure like an R-tree. (
 http://en.wikipedia.org/wiki/R-tree ).

 But I admit it's not that common.

Unless you manage to define appropriate and generally applicable iterators, I think that would overgeneralize find().
 Anything that can mimic regex functions on generic ranges is good, IMO.
 I regularly find myself wanting to do some regex-like matching. We do
 not need real pattern matching but even with simplified predicates, we
 can do many things.

I agree we're in desperate need for a good range-based, character-divorced, statically-computed regex engine.
 I'm sorry I didn't look at your implementation, as I don't know
 Boyer-Moore that much. Does it need to know the alphabet size or do you
 use another variant?

My explanation of B-M would do little more than pasting information from various places on the Net. Andrei
Jul 19 2010
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 20.07.2010 0:05, Philippe Sigaud wrote:
 On Mon, Jul 19, 2010 at 21:23, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org <mailto:SeeWebsiteForEmail erdani.org>> 
 wrote:


         What I personally would like is a way to find an element in an
         ordered
         container (binary tree...)


     Wouldn't that be a member of the binary tree?


 Yes, indeed. Unless I define some cousin to range, a recursive range, 
 to iterate on any recursive container, like trees or graphs and then 
 devise algorithms on them. I played with this idea a few weeks ago but 
 went to do other things. Anyway, carry on.
 Btw, I still plan to update some simple generic graphs and trees 
 structs to obey TotalContainer interface when it makes sense, and 
 update my graph algorithms accordingly. If I ever get somewhere, I'll 
 post something.


            Aren't such scenarios better served by putting the limits
         inside the
            predicate? Otherwise the list of what can be done could go
         on forever.


         The problem is that, as the predicate is passed at
         compile-time, the
         limits must be CT-defined too. I'd like the limits to be
         defined at
         runtime. I run into this regularly, while using predicates in
         std.algorithms.


     I think this is a misunderstanding. All of std.algorithm works
     with delegates:


        int[] a = [ 1, 4, 2, 3 ];
        int b = 2;
        assert(find!((x) { return x > b; })(a) == [4, 2, 3]);

 Ah yes, but I regularly use algorithms structs as return values, like 
 this:

 auto nTimes(E, R)(E multiplier, R range)
 {
     return map!((ElementType!R e) { return e*multiplier;})(range);
 }

auto nTimes(E, R)(E multiplier, R range){ ElementType!R fun(ElementType!R e) { return e*multiplier; } return map!(fun)(range); } For some reason, compiler never plays nice with map, and I believe there are some issues with current implementation (2.047 release ) that Andrei (hopefully) fixed in recent commit.
 Philippe

-- Dmitry Olshansky
Jul 19 2010
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 07/19/2010 05:36 PM, Dmitry Olshansky wrote:
 On 20.07.2010 0:05, Philippe Sigaud wrote:
 On Mon, Jul 19, 2010 at 21:23, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org <mailto:SeeWebsiteForEmail erdani.org>>
 wrote:


 What I personally would like is a way to find an element in an
 ordered
 container (binary tree...)


 Wouldn't that be a member of the binary tree?


 Yes, indeed. Unless I define some cousin to range, a recursive range,
 to iterate on any recursive container, like trees or graphs and then
 devise algorithms on them. I played with this idea a few weeks ago but
 went to do other things. Anyway, carry on.
 Btw, I still plan to update some simple generic graphs and trees
 structs to obey TotalContainer interface when it makes sense, and
 update my graph algorithms accordingly. If I ever get somewhere, I'll
 post something.


 Aren't such scenarios better served by putting the limits
 inside the
 predicate? Otherwise the list of what can be done could go
 on forever.


 The problem is that, as the predicate is passed at
 compile-time, the
 limits must be CT-defined too. I'd like the limits to be
 defined at
 runtime. I run into this regularly, while using predicates in
 std.algorithms.


 I think this is a misunderstanding. All of std.algorithm works
 with delegates:


 int[] a = [ 1, 4, 2, 3 ];
 int b = 2;
 assert(find!((x) { return x > b; })(a) == [4, 2, 3]);

 Ah yes, but I regularly use algorithms structs as return values, like
 this:

 auto nTimes(E, R)(E multiplier, R range)
 {
 return map!((ElementType!R e) { return e*multiplier;})(range);
 }

auto nTimes(E, R)(E multiplier, R range){ ElementType!R fun(ElementType!R e) { return e*multiplier; } return map!(fun)(range); } For some reason, compiler never plays nice with map, and I believe there are some issues with current implementation (2.047 release ) that Andrei (hopefully) fixed in recent commit.

Yes, I'd discussed this with Walter on Friday and he said the semantics should be identical in the two cases. So we're dealing with a compiler bug. Andrei
Jul 19 2010
prev sibling next sibling parent Philippe Sigaud <philippe.sigaud gmail.com> writes:
--0016e6de17dab76aa2048bb032c3
Content-Type: text/plain; charset=ISO-8859-1

On Sun, Jul 18, 2010 at 06:19, Jonathan M Davis wrote:
*****
I think that the problem with find() is not so much find() but it's
documentation.
In all honesty, anything with a return type like FindResult!(Range, Ranges)
is
going to scare people off.
*****

What should be done (in a perfect world) is to have a simplified signature
that can become the complete signature on a click

auto find!()(haystack, needles)  =>  FindResult!(etc.  ...
the empty !() being a cue that it's a template function and that there is
more to it that meets the eye.

(edit: crap, you said it afterwards. I should not reply hours after first
reading a message)

*****
If anything, I've been more interested in canFind() and until() being made
to
match up with find(). I'd like to be able to give them both pretty much the
same
arguments and then get the bool from canFind() and the range that find()
would
have walked over in the case of until(). A function which gave you both the
range that until() would have given you as well as the one which find()
would
have given you would be nice as well. I previously opened a bug with
thoughts
along those lines: http://d.puremagic.com/issues/show_bug.cgi?id=3888
*****

Ah, these are called takeWhile, dropWhile and span, in Haskell and other
sequence-heavy languages.
Or splitBy, cutAt, etc. Quite handy to have, I agree std.algorithm should
have them. But then, it's huge module already.


*****
In any case, I think that find() itself is more or less fine from a usage
standpoint. It's the docs that need help. The other thing would
be to add a lot more examples. That would be a big boon for a lot of
std.algorithm functions. They're not necessarily hard to use, and the
examples
show it much more clearly than the signatures often do.
*****

I sometimes dream of a complement to Phobos that would demonstrate its use,
either by providing lots of examples for each function or by taking a module
and tackling a task with it. Obviously, as both a complement to Phobos and a
way to demo-nstrate D, it should be called Deimos, the second moon of Mars,
Phobos' sibling.

Now, the Deimos Project, that's some cool name, if I may say so.

But I guess the wiki already does this.

Philippe

--0016e6de17dab76aa2048bb032c3
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

<div class=3D"gmail_quote">On Sun, Jul 18, 2010 at 06:19, Jonathan M Davis<=
span dir=3D"ltr"></span> wrote:<br>*****<br>I think that the problem with f=
ind() is not so much find() but it&#39;s documentation.<br>In all honesty, =
anything with a return type like FindResult!(Range, Ranges) is<br>
going to scare people off. <br>*****<br><br>What should be done (in a perfe=
ct world) is to have a simplified signature that can become the complete si=
gnature on a click<br><br>auto find!()(haystack, needles)=A0 =3D&gt;=A0 Fin=
dResult!(etc.=A0 ...<br>
the empty !() being a cue that it&#39;s a template function and that there =
is more to it that meets the eye.<br><br>(edit: crap, you said it afterward=
s. I should not reply hours after first reading a message)<br><br>*****<br>
If anything, I&#39;ve been more interested in canFind() and until() being m=
ade to<br>match up with find(). I&#39;d like to be able to give them both p=
retty much the same<br>arguments and then get the bool from canFind() and t=
he range that find() would<br>
have walked over in the case of until(). A function which gave you both the=
<br>range that until() would have given you as well as the one which find()=
 would<br>have given you would be nice as well. I previously opened a bug w=
ith thoughts<br>
along those lines: <a href=3D"http://d.puremagic.com/issues/show_bug.cgi?id=
=3D3888">http://d.puremagic.com/issues/show_bug.cgi?id=3D3888</a><br>*****<=
br><br>Ah, these are called takeWhile, dropWhile and span, in Haskell and o=
ther sequence-heavy languages.<br>
Or splitBy, cutAt, etc. Quite handy to have, I agree std.algorithm should h=
ave them. But then, it&#39;s huge module already.<br><br><br>*****<br>In an=
y case, I think that find() itself is more or less fine from a usage<br>
standpoint. It&#39;s the docs that need help. The other thing would<br>be t=
o add a lot more examples. That would be a big boon for a lot of<br>std.alg=
orithm functions. They&#39;re not necessarily hard to use, and the examples=
<br>
show it much more clearly than the signatures often do. <br>*****<br><br>I =
sometimes dream of a complement to Phobos that would demonstrate its use, e=
ither by providing lots of examples for each function or by taking a module=
 and tackling a task with it. Obviously, as both a complement to Phobos and=
 a way to demo-nstrate D, it should be called Deimos, the second moon of Ma=
rs, Phobos&#39; sibling.<br>
<br>Now, the Deimos Project, that&#39;s some cool name, if I may say so.<br=
<br>But I guess the wiki already does this.<br><br>Philippe<br><br><br></d=

--0016e6de17dab76aa2048bb032c3--
Jul 18 2010
prev sibling next sibling parent Philippe Sigaud <philippe.sigaud gmail.com> writes:
--000325557a7e5923cf048bc25a58
Content-Type: text/plain; charset=ISO-8859-1

On Mon, Jul 19, 2010 at 03:19, Andrei Alexandrescu <
SeeWebsiteForEmail erdani.org> wrote:

 I agree that a findAll function yielding a range is nice, however it's not
 the simplest interface. I think there should be a function that just finds
 something somewhere.

What I personally would like is a way to find an element in an ordered container (binary tree...)
  Now, for many needles... Currently, you pass the predicate "a==b" to
 find() and use it on all needles. A possible generalization is to have a
 isOneOf() predicate, called with isOneOf(a, b0, b1, ... bn) where a is
 the current element being tested and the b's are the needles passed to
 find(). This generic calling of pred with (a, needles) could be done for
 any predicate, like for example:

 find!(isBetween)(haystack, 0, 10)

Aren't such scenarios better served by putting the limits inside the predicate? Otherwise the list of what can be done could go on forever.

The problem is that, as the predicate is passed at compile-time, the limits must be CT-defined too. I'd like the limits to be defined at runtime. I run into this regularly, while using predicates in std.algorithms. A not-so-uncommon scenario for me is having one range delivering limits (or zones to explore, or any other runtime arguments), and finding a value inside those limits in another range. Ideally, I'd like to use find() on a spatial structure like an R-tree. ( http://en.wikipedia.org/wiki/R-tree ). But I admit it's not that common.
  The limit of this is you cannot easily get the index of the found
 needle. btw, could you remind us why you used a 1-based index? My
 opinion on this is that if you want to find any needle, then you mostly
 care for them being present, and much less for wich one is the first.

I used one-based index to make tests easier - as you said, most often you care for the presence so zero/nonzero is easiest to tell. Then I thought it wouldn't hurt to provide the index since it's not any extra work.

Ah, I thought the scenario was: 1) test for the returned range (as the tuple._0 element) being empty or not. 2) if not empty, something was found, get the index. I forgot getting 0 as index would mean 'no element found'. If I understood you correctly, then I'd prefer a -1 index indicating failure, and a 0-based indexing on the needles if on is found. But that's a cosmetic issue.
  Also, the found element is accessible by taking the front of the retuned
 range. Why provide it to the caller?

Because that doesn't generalize well to searching ranges (as oposed to individual elements).

Good point.
  Another problem I see is to get the nice current behavior of having a
 needle being a range like haystack:

 find([0,1,2,3,4,3,2,1], [4,3]) => [4,3,2,1]

 And that's I particularly like.

Not sure I understand whether that's good or bad,

That's good. I mean, the current situation is good and that's a functionality I like.
 but I also want to provide the likes of findSkip() (it's been discussed
 here under a different name, findConsume) that positions the searched range
 after the found range:

 findSkip([0,1,2,3,4,3,2,1], [4,3]) => [2,1]

 It's very useful in practice.

I see that. And and empty range if it found nothing? Another useful one is split() to cut the range in two or three parts : before the match, the match itself (which can be one of the needles, so we do not know it in advance) and the part after that. Return that triple as a tuple. Hey, that way you can even implement replace: return chain(firstpart, replacedmatch, tailpart). Anything that can mimic regex functions on generic ranges is good, IMO. I regularly find myself wanting to do some regex-like matching. We do not need real pattern matching but even with simplified predicates, we can do many things.
  *****
     Another aspect I'd like to discuss is use of Boyer-Moore and
 related fast finding techniques. Currently the use of B-M is explicit
 and a bit awkward. I was thinking instead to automatically detect its
 appropriatedness and use it transparently. Bearophile posted at a link
 to such a blend string search routine
 (http://effbot.org/zone/stringlib.htm) that I think can be generalized
 quite easily.
 *****

 Indeed, reading this, it seems like it could be generalized easily. You
 need a random-access range with a defined length. So the usage of such
 an algo could indeed be done transparently. Hmm...

The current B-M implementation already works with any random-access range with length, but it's not integrated transparently with the regular find. I wonder what the threshold conditions might look like.

I'm sorry I didn't look at your implementation, as I don't know Boyer-Moore that much. Does it need to know the alphabet size or do you use another variant? Philippe --000325557a7e5923cf048bc25a58 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable <div class=3D"gmail_quote">On Mon, Jul 19, 2010 at 03:19, Andrei Alexandres= cu <span dir=3D"ltr">&lt;<a href=3D"mailto:SeeWebsiteForEmail erdani.org">S= eeWebsiteForEmail erdani.org</a>&gt;</span> wrote:<br><blockquote class=3D"= gmail_quote" style=3D"margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb= (204, 204, 204); padding-left: 1ex;"> <br> I agree that a findAll function yielding a range is nice, however it&#39;s = not the simplest interface. I think there should be a function that just fi= nds something somewhere.<div class=3D"im"><br></div></blockquote><div><br> Yes, I agree that&#39;s the common case. <br>What I personally would like i= s a way to find an element in an ordered container (binary tree...)<br>=A0<= /div><blockquote class=3D"gmail_quote" style=3D"margin: 0pt 0pt 0pt 0.8ex; = border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"> <div class=3D"im"> <br> <blockquote class=3D"gmail_quote" style=3D"margin: 0pt 0pt 0pt 0.8ex; borde= r-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"> Now, for many needles... Currently, you pass the predicate &quot;a=3D=3Db&q= uot; to<br> find() and use it on all needles. A possible generalization is to have a<br=

the current element being tested and the b&#39;s are the needles passed to<= br> find(). This generic calling of pred with (a, needles) could be done for<br=

<br> find!(isBetween)(haystack, 0, 10)<br> </blockquote> <br></div> Aren&#39;t such scenarios better served by putting the limits inside the pr= edicate? Otherwise the list of what can be done could go on forever.</block= quote><div><br>The problem is that, as the predicate is passed at compile-t= ime, the limits must be CT-defined too. I&#39;d like the limits to be defin= ed at runtime. I run into this regularly, while using predicates in std.alg= orithms.<br> <br>A not-so-uncommon scenario for me is having one range delivering limits= (or zones to explore, or any other runtime arguments), and finding a value= inside those limits in another range. Ideally, I&#39;d like to use find() = on a spatial structure like an R-tree. ( <a href=3D"http://en.wikipedia.org= /wiki/R-tree">http://en.wikipedia.org/wiki/R-tree</a> ).<br> <br>But I admit it&#39;s not that common.<br><br>=A0</div><blockquote class= =3D"gmail_quote" style=3D"margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid= rgb(204, 204, 204); padding-left: 1ex;"><div class=3D"im"> <br> <blockquote class=3D"gmail_quote" style=3D"margin: 0pt 0pt 0pt 0.8ex; borde= r-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"> The limit of this is you cannot easily get the index of the found<br> needle. btw, could you remind us why you used a 1-based index? My<br> opinion on this is that if you want to find any needle, then you mostly<br> care for them being present, and much less for wich one is the first.<br> </blockquote> <br></div> I used one-based index to make tests easier - as you said, most often you c= are for the presence so zero/nonzero is easiest to tell. Then I thought it = wouldn&#39;t hurt to provide the index since it&#39;s not any extra work.</= blockquote> <div><br>Ah, I thought the scenario was: 1) test for the returned range (as= the tuple._0 element) being empty or not. 2) if not empty, something was f= ound, get the index. I forgot getting 0 as index would mean &#39;no element= found&#39;.<br> If I understood you correctly, then I&#39;d prefer a -1 index indicating fa= ilure, and a 0-based indexing on the needles if on is found. But that&#39;s= a cosmetic issue.<br><br><br>=A0</div><blockquote class=3D"gmail_quote" st= yle=3D"margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204)= ; padding-left: 1ex;"> <div class=3D"im"> <br> <blockquote class=3D"gmail_quote" style=3D"margin: 0pt 0pt 0pt 0.8ex; borde= r-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"> Also, the found element is accessible by taking the front of the retuned<br=

</blockquote> <br></div> Because that doesn&#39;t generalize well to searching ranges (as oposed to = individual elements).</blockquote><div><br>Good point.<br><br>=A0</div><blo= ckquote class=3D"gmail_quote" style=3D"margin: 0pt 0pt 0pt 0.8ex; border-le= ft: 1px solid rgb(204, 204, 204); padding-left: 1ex;"> <div class=3D"im"> <br> <blockquote class=3D"gmail_quote" style=3D"margin: 0pt 0pt 0pt 0.8ex; borde= r-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"> Another problem I see is to get the nice current behavior of having a<br> needle being a range like haystack:<br> <br> find([0,1,2,3,4,3,2,1], [4,3]) =3D&gt; [4,3,2,1]<br> <br> And that&#39;s I particularly like.<br> </blockquote> <br></div> Not sure I understand whether that&#39;s good or bad, </blockquote><div><br=
That&#39;s good. I mean, the current situation is good and that&#39;s a fu=

=3D"margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); p= adding-left: 1ex;"> but I also want to provide the likes of findSkip() (it&#39;s been discussed= here under a different name, findConsume) that positions the searched rang= e after the found range:<br> <br> findSkip([0,1,2,3,4,3,2,1], [4,3]) =3D&gt; [2,1]<br> <br> It&#39;s very useful in practice.<br></blockquote><div><br>I see that. And = and empty range if it found nothing? <br>Another useful one is split() to c= ut the range in two or three parts : before the match, the match itself (wh= ich can be one of the needles, so we do not know it in advance) and the par= t after that. Return that triple as a tuple. Hey, that way you can even imp= lement replace: return chain(firstpart, replacedmatch, tailpart). <br> <br>Anything that can mimic regex functions on generic ranges is good, IMO.= I regularly find myself wanting to do some regex-like matching. We do not = need real pattern matching but even with simplified predicates, we can do m= any things.<br> <br>=A0<br></div><blockquote class=3D"gmail_quote" style=3D"margin: 0pt 0pt= 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"> <br> <blockquote class=3D"gmail_quote" style=3D"margin: 0pt 0pt 0pt 0.8ex; borde= r-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"> *****<div class=3D"im"><br> =A0 =A0 Another aspect I&#39;d like to discuss is use of Boyer-Moore and<b= r> related fast finding techniques. Currently the use of B-M is explicit<br> and a bit awkward. I was thinking instead to automatically detect its<br> appropriatedness and use it transparently. Bearophile posted at a link<br> to such a blend string search routine<br> (<a href=3D"http://effbot.org/zone/stringlib.htm" target=3D"_blank">http://= effbot.org/zone/stringlib.htm</a>) that I think can be generalized<br> quite easily.<br></div><div class=3D"im"> *****<br> <br> Indeed, reading this, it seems like it could be generalized easily. You<br> need a random-access range with a defined length. So the usage of such<br> an algo could indeed be done transparently. Hmm...<br> </div></blockquote> <br> The current B-M implementation already works with any random-access range w= ith length, but it&#39;s not integrated transparently with the regular find= . I wonder what the threshold conditions might look like.<br></blockquote> <div><br>I&#39;m sorry I didn&#39;t look at your implementation, as I don&#= 39;t know Boyer-Moore that much. Does it need to know the alphabet size or = do you use another variant?<br><br>Philippe<br></div></div> --000325557a7e5923cf048bc25a58--
Jul 19 2010
prev sibling next sibling parent Philippe Sigaud <philippe.sigaud gmail.com> writes:
--0016e6d644caf7802c048bc3161a
Content-Type: text/plain; charset=ISO-8859-1

On Mon, Jul 19, 2010 at 21:23, Andrei Alexandrescu <
SeeWebsiteForEmail erdani.org> wrote:

 What I personally would like is a way to find an element in an ordered
 container (binary tree...)

Wouldn't that be a member of the binary tree?

Yes, indeed. Unless I define some cousin to range, a recursive range, to iterate on any recursive container, like trees or graphs and then devise algorithms on them. I played with this idea a few weeks ago but went to do other things. Anyway, carry on. Btw, I still plan to update some simple generic graphs and trees structs to obey TotalContainer interface when it makes sense, and update my graph algorithms accordingly. If I ever get somewhere, I'll post something.
     Aren't such scenarios better served by putting the limits inside the
    predicate? Otherwise the list of what can be done could go on forever.


 The problem is that, as the predicate is passed at compile-time, the
 limits must be CT-defined too. I'd like the limits to be defined at
 runtime. I run into this regularly, while using predicates in
 std.algorithms.

I think this is a misunderstanding. All of std.algorithm works with delegates: int[] a = [ 1, 4, 2, 3 ]; int b = 2; assert(find!((x) { return x > b; })(a) == [4, 2, 3]); Ah yes, but I regularly use algorithms structs as return values, like this:

auto nTimes(E, R)(E multiplier, R range) { return map!((ElementType!R e) { return e*multiplier;})(range); } Error: delegate std.algorithm.__dgliteral2 cannot access frame of function __dgliteral2| I ran into this yesterday while trying to encode power series as (possibly infinite) ranges, and getting problems for the operators overloading. Too bad, I got to the point where I could calculate sin/cos/exp values quite precisely and then could easily define derivation and such. There are workaround, but they are not as handy. I had the same problems some time ago, trying to define predicates on the fly, though I cannot right now remember what it was. I know it's not a problem in std.algorithm per se, but it's a limit on its usefulness, as far as I'm concerned.
  Anything that can mimic regex functions on generic ranges is good, IMO.
 I regularly find myself wanting to do some regex-like matching. We do
 not need real pattern matching but even with simplified predicates, we
 can do many things.

I agree we're in desperate need for a good range-based, character-divorced, statically-computed regex engine.

Oh, a full statically computed engine would be the Graal. But I did not go that far. Being able to simply match, split and replace at runtime on any range would be good also, without going all regex. Heck, I may even have already coded something like this, I'll go and check. What I'm trying to do right now as a back-burner task is a function transforming a range in a set-like range (each element is present only once), but testing for "did I already see that?" with a user-provided predicate.
  I'm sorry I didn't look at your implementation, as I don't know
 Boyer-Moore that much. Does it need to know the alphabet size or do you
 use another variant?

My explanation of B-M would do little more than pasting information from various places on the Net.

OK. Philippe --0016e6d644caf7802c048bc3161a Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable <div class=3D"gmail_quote">On Mon, Jul 19, 2010 at 21:23, Andrei Alexandres= cu <span dir=3D"ltr">&lt;<a href=3D"mailto:SeeWebsiteForEmail erdani.org">S= eeWebsiteForEmail erdani.org</a>&gt;</span> wrote:<br><blockquote class=3D"= gmail_quote" style=3D"margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb= (204, 204, 204); padding-left: 1ex;"> <br><blockquote class=3D"gmail_quote" style=3D"margin: 0pt 0pt 0pt 0.8ex; b= order-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"><div class=3D= "im"> What I personally would like is a way to find an element in an ordered<br> container (binary tree...)<br> </div></blockquote> <br> Wouldn&#39;t that be a member of the binary tree?<br></blockquote><div><br>= Yes, indeed. Unless I define some cousin to range, a recursive range, to it= erate on any recursive container, like trees or graphs and then devise algo= rithms on them. I played with this idea a few weeks ago but went to do othe= r things. Anyway, carry on. <br> Btw, I still plan to update some simple generic graphs and trees structs to= obey TotalContainer interface when it makes sense, and update my graph alg= orithms accordingly. If I ever get somewhere, I&#39;ll post something.<br> <br>=A0</div><blockquote class=3D"gmail_quote" style=3D"margin: 0pt 0pt 0pt= 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"> <br> <blockquote class=3D"gmail_quote" style=3D"margin: 0pt 0pt 0pt 0.8ex; borde= r-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"><div class=3D"im"=

e the<br> =A0 =A0predicate? Otherwise the list of what can be done could go on forev= er.<br> <br> <br></div><div class=3D"im"> The problem is that, as the predicate is passed at compile-time, the<br> limits must be CT-defined too. I&#39;d like the limits to be defined at<br> runtime. I run into this regularly, while using predicates in<br> std.algorithms.<br> </div></blockquote> <br> I think this is a misunderstanding. All of std.algorithm works with delegat= es:<div class=3D"im"><br> <br> =A0 =A0int[] a =3D [ 1, 4, 2, 3 ];<br></div> =A0 =A0int b =3D 2;<br> =A0 =A0assert(find!((x) { return x &gt; b; })(a) =3D=3D [4, 2, 3]);<div cl= ass=3D"im"><br></div></blockquote><div>Ah yes, but I regularly use algorith= ms structs as return values, like this:<br><br>auto nTimes(E, R)(E multipli= er, R range)<br> {<br>=A0=A0=A0 return map!((ElementType!R e) { return e*multiplier;})(range= );<br>}<br><br>Error: delegate std.algorithm.__dgliteral2 cannot access fra= me of function __dgliteral2|<br><br>I ran into this yesterday while trying = to encode power series as (possibly infinite) ranges, and getting problems = for the operators overloading. Too bad, I got to the point where I could ca= lculate sin/cos/exp values quite precisely and then could easily define der= ivation and such. There are workaround, but they are not as handy.<br> I had the same problems some time ago, trying to define predicates on the f= ly, though I cannot right now remember what it was.<br><br>I know it&#39;s = not a problem in std.algorithm per se, but it&#39;s a limit on its usefulne= ss, as far as I&#39;m concerned.<br> <br><br></div><blockquote class=3D"gmail_quote" style=3D"margin: 0pt 0pt 0p= t 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"><di= v class=3D"im"> <br> <blockquote class=3D"gmail_quote" style=3D"margin: 0pt 0pt 0pt 0.8ex; borde= r-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"> Anything that can mimic regex functions on generic ranges is good, IMO.<br> I regularly find myself wanting to do some regex-like matching. We do<br> not need real pattern matching but even with simplified predicates, we<br> can do many things.<br> </blockquote> <br></div> I agree we&#39;re in desperate need for a good range-based, character-divor= ced, statically-computed regex engine.</blockquote><div><br>Oh, a full stat= ically computed engine would be the Graal. But I did not go that far. Being= able to simply match, split and replace at runtime on any range would be g= ood also, without going all regex. Heck, I may even have already coded some= thing like this, I&#39;ll go and check.<br> <br>What I&#39;m trying to do right now as a back-burner task is a function= transforming a range in a set-like range (each element is present only onc= e), but testing for &quot;did I already see that?&quot; with a user-provide= d predicate.<br> <br></div><blockquote class=3D"gmail_quote" style=3D"margin: 0pt 0pt 0pt 0.= 8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"><div cl= ass=3D"im"> <br> <blockquote class=3D"gmail_quote" style=3D"margin: 0pt 0pt 0pt 0.8ex; borde= r-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"> I&#39;m sorry I didn&#39;t look at your implementation, as I don&#39;t know= <br> Boyer-Moore that much. Does it need to know the alphabet size or do you<br> use another variant?<br> </blockquote> <br></div> My explanation of B-M would do little more than pasting information from va= rious places on the Net.<br></blockquote><div><br>OK.<br><br><br>Philippe<b= r></div></div> --0016e6d644caf7802c048bc3161a--
Jul 19 2010
prev sibling next sibling parent Jonathan M Davis <jmdavisprog gmail.com> writes:
On Monday, July 19, 2010 16:36:57 sybrandy wrote:
 I think that the problem with find() is not so much find() but it's
 documentation. In all honesty, anything with a return type like
 FindResult!(Range, Ranges) is going to scare people off.

I fully agree. I had no idea from the documentation what something like this was: [1,2,3,4][]. The array documentation doesn't have any details, so for the most part I've stayed away from std.algorithm because I don't understand the return types very well. Better documentation would help a lot. Casey

In most cases, the return types are just ranges, but because they're constructed by the algorithm, the algorithm creates its own range type. So, generally, all you need to know is that a range is being returned. But even then, you run into some issues where you need to do stuff like run the returned range through array() to get something useable - depending on what you're doing with it. Regardless of the return type though, std.algorithm is definitely a prime example of how auto is your friend. That way you often don't have to care all that much about what the return type really looks like. However, the documentation on it is obviously ugly. It really should give the programmer exactly what they need to use the function and nothing more. As it stands, you have to really want to use it to figure it out. Personally, I think that it's well worth it - std.algorithm is fantastic - but the documentation is nowhere near as straightforward as it should be (even if using the functions is actually pretty straightforward in most cases). - Jonathan M Davis
Jul 19 2010
prev sibling parent Philippe Sigaud <philippe.sigaud gmail.com> writes:
--001517447eeadd8367048bcbfb6c
Content-Type: text/plain; charset=ISO-8859-1

On Tue, Jul 20, 2010 at 00:36, Dmitry Olshansky <dmitry.olsh gmail.com>wrote:


 Ah yes, but I regularly use algorithms structs as return values, like this:
 auto nTimes(E, R)(E multiplier, R range)
 {
    return map!((ElementType!R e) { return e*multiplier;})(range);
 }


 Try this workaround, replacing delegate with nested function, works for me:

 auto nTimes(E, R)(E multiplier, R range){
    ElementType!R fun(ElementType!R e) { return e*multiplier; }
    return map!(fun)(range);
 }
 For some reason, compiler never plays nice with map, and I believe there
 are some issues with current implementation (2.047 release ) that Andrei
 (hopefully) fixed in recent commit.

at the time. Good to know it's a bug. right now, it does not work, though: void main() { auto arr = [0,1,2,3,4]; writeln(array(nTimes(3, arr))); // prints 0, then four big numbers: it's stomping on another part of memory? } Strangely, even using 0 as a multiplier does not work. Using .save() in nTimes neither. You see, these kinds of issues all have solutions (in that particular case, the workaround is a bit complicated but works in practice), but they are cumbersome and limit my usage of std.algorithms. FWIW, the workaround I used there is to take the multiplier, call repeat on it to make it a range (3,3,3,3 ...), zip the two ranges together and call map with a function like this: times(A,B)(Tuple!(A,B) t) { return t._0*t._1;} so, you get: (0,3), (1,3), (2,3), ... and then 0,3,6, etc. Extension of this idea saved me many times. Philippe --001517447eeadd8367048bcbfb6c Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable <div class=3D"gmail_quote">On Tue, Jul 20, 2010 at 00:36, Dmitry Olshansky = <span dir=3D"ltr">&lt;<a href=3D"mailto:dmitry.olsh gmail.com">dmitry.olsh = gmail.com</a>&gt;</span> wrote:<br><div>=A0</div><blockquote class=3D"gmail= _quote" style=3D"margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204,= 204, 204); padding-left: 1ex;"> <blockquote class=3D"gmail_quote" style=3D"margin: 0pt 0pt 0pt 0.8ex; borde= r-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">Ah yes, but I reg= ularly use algorithms structs as return values, like this:<br><div class=3D= "im"> <br> auto nTimes(E, R)(E multiplier, R range)<br> {<br> =A0 =A0return map!((ElementType!R e) { return e*multiplier;})(range);<br> }<br> </div></blockquote></blockquote><div>=A0</div><blockquote class=3D"gmail_qu= ote" style=3D"margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 20= 4, 204); padding-left: 1ex;">Try this workaround, replacing delegate with n= ested function, works for me:<div class=3D"im"> <br> auto nTimes(E, R)(E multiplier, R range){<br></div> =A0 =A0ElementType!R fun(ElementType!R e) { return e*multiplier; }<br> =A0 =A0return map!(fun)(range);<br> }<br> For some reason, compiler never plays nice with map, and I believe there ar= e some issues with current implementation (2.047 release ) that Andrei (hop= efully) fixed in recent commit.<br><br></blockquote><div><br>Hmm, using a n= ested function was the first thing I tried, and it didn&#39;t work at the t= ime. Good to know it&#39;s a bug. right now, it does not work, though:<br> <br><br>void main()<br>{<br>=A0=A0=A0 auto arr =3D [0,1,2,3,4];<br>=A0=A0= =A0 writeln(array(nTimes(3, arr))); // prints 0, then four big numbers: it&= #39;s stomping on another part of memory?<br>} <br><br>Strangely, even usin= g 0 as a multiplier does not work. Using .save() in nTimes neither.<br> <br>You see, these kinds of issues all have solutions (in that particular c= ase, the workaround is a bit complicated but works in practice), but they a= re cumbersome and limit my usage of std.algorithms.<br><br>FWIW, the workar= ound I used there is to take the multiplier, call repeat on it to make it a= range (3,3,3,3 ...), zip the two ranges together and call map with a funct= ion like this:=A0 times(A,B)(Tuple!(A,B) t) { return t._0*t._1;}<br> so, you get:=A0 (0,3), (1,3), (2,3), ... and then 0,3,6, etc.<br>Extension = of this idea saved me many times.<br><br><br>Philippe<br><br></div></div> --001517447eeadd8367048bcbfb6c--
Jul 19 2010