www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Splitting Ranges using Lambda Predicates

reply =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:
I'm missing a version of splitter that can be used to split 
ranges based on arbitrary predicates. I need to for conversion 
between different symbol casings, typically:

1. someName => SomeName

In this case the lambda should take two arguments (a,b)

where in

1. a should be lowercase and b should be uppercase

Have i missed a reusable component that can be applied here?
Jun 10 2014
parent reply =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:
 1. someName => SomeName
My example is dumb and incorrect. I actually want this to do the following 2. "_someGreatVariableName" => "Some Great Varible Name"
Jun 10 2014
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Tuesday, 10 June 2014 at 21:11:17 UTC, Nordlöw wrote:
 1. someName => SomeName
My example is dumb and incorrect. I actually want this to do the following 2. "_someGreatVariableName" => "Some Great Varible Name"
The current splitter works on the notion of splitter "tokens", eg, it splits when it find an element or range that corresponds to the passed value/pred. What exactly are you requesting though? - Split on the "edge" lowercase to uppercase? - Split on uppercase but keep the uppercase element? Either way, your example should look like this: "SomeGreatVariableName" => ["Some", "Great", "Variable", "Name"] Since you are splitting up your range into subranges. And also either way, AFAIK, yeah, we don't have any splitter that does that. We are also missing the version that takes both a range an predicate, which would allow things like: "This Cool thing is COOL!!!".split((a, b)=>a == b.toLower())("cool") => ["This ", " thing is ", "!!!"] Looks like you'll have to roll your own :/
Jun 10 2014
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Tuesday, 10 June 2014 at 21:26:50 UTC, monarch_dodr
 What exactly are you requesting though?
 - Split on the "edge" lowercase to uppercase?
 - Split on uppercase but keep the uppercase element?
Thinking about this more: Do you *actually* have two different predicates, or are they mutually exclusive? EG: predicate "isUpper", and split on a false->true? Either way, it shouldn't be too hard to implement. Base it off "splitter!pred", which is actually quite trivial. AFAIK, your requirements could even make it simpler, since you don't need to "skip" the splitter element (which, for strings, can actually be tricky business...). You'd just have to use 2 calls to find though, for "both your predicates" or for "not predicate then predicate".
Jun 10 2014
parent reply =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:
 Either way, it shouldn't be too hard to implement. Base it off 
 "splitter!pred", which is actually quite trivial. AFAIK, your
What do you mean by basing it off splitter!pred - should I start with some existing splitter algorithm in Phobos or start from scratch? Thx.
Jun 10 2014
next sibling parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Tuesday, 10 June 2014 at 22:31:37 UTC, Nordlöw wrote:
 Either way, it shouldn't be too hard to implement. Base it off 
 "splitter!pred", which is actually quite trivial. AFAIK, your
What do you mean by basing it off splitter!pred - should I start with some existing splitter algorithm in Phobos or start from scratch? Thx.
I meant mostly copy pasting it, and modifying it to your needs. For example, I adapted it into this. For simplicity, I stripped infinite and forward only range support. The only functions I actually modified were "findTerminator", to actually find according to what I want, and popFront. //---- auto slicer(alias isTerminator, Range)(Range input) if (((isRandomAccessRange!Range && hasSlicing!Range) || isSomeString!Range) && is(typeof(unaryFun!isTerminator(input.front)))) { return SlicerResult!(unaryFun!isTerminator, Range)(input); } private struct SlicerResult(alias isTerminator, Range) { alias notTerminator = not!isTerminator; private Range _input; private size_t _end = 0; private void findTerminator() { auto r = _input.save.find!(not!isTerminator).find!isTerminator(); _end = _input.length - r.length; } this(Range input) { _input = input; if (!_input.empty) findTerminator(); else _end = size_t.max; } static if (isInfinite!Range) enum bool empty = false; // Propagate infiniteness. else property bool empty() { return _end == size_t.max; } property auto front() { return _input[0 .. _end]; } void popFront() { _input = _input[_end .. _input.length]; if (_input.empty) { _end = size_t.max; return; } findTerminator(); } property typeof(this) save() { auto ret = this; ret._input = _input.save; return ret; } } //---- This will split on before the first element where pred is true, provided there are previous elements where pred is false: //---- void main() { "SomeGreatVariableName" .slicer!isUpper.writeln(); "someGGGreatVariableName".slicer!isUpper.writeln(); "".slicer!isUpper.writeln(); "a".slicer!isUpper.writeln(); "A".slicer!isUpper.writeln(); } //---- ["Some", "Great", "Variable", "Name"] ["some", "GGGreat", "Variable", "Name"] [] ["a"] ["A"] //---- This may or may not be what you wanted, depending on how you want to split "GGGreat". If you wanted it to simply split *ON* the left of every capital letter, then you can modify the the find terminator into: private void findTerminator() { auto r = _input.save.dropOne.find!isTerminator; _end = _input.length - r.length; } And you'd get: ["some", "G", "G", "Great", "Variable", "Name"] ******************************************* ******************************************* ******************************************* In any case, yeah, it shouldn't be too hard to shape it into what you want. A more involved solution to this problem could be to simply pass a "searchFun" predicate, in which case you'd be able to split not just according to any "unitary predicate", but according to an entire "range search strategy: //---- auto slicer(alias searchFun, Range)(Range input) if (((isRandomAccessRange!Range && hasSlicing!Range) || isSomeString!Range) && is(typeof(searchFun(input)))) { return SlicerResult!(searchFun, Range)(input); } private struct SlicerResult(alias searchFun, Range) { private Range _input; private size_t _end = 0; private void findTerminator() { auto r = searchFun(_input.save); _end = _input.length - r.length; } ... //---- And then: "SomeGGGGreatVariableName".slicer!((s)=>s.find!isLower.find!isUpper).writeln(); "someGGGreatVariableName" .slicer!((s)=>s.dropOne.find!isUpper).writeln(); ["Some", "GGGGreat", "Variable", "Name"] ["some", "G", "G", "Great", "Variable", "Name"] Just ideas.
Jun 11 2014
parent reply =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:
On Wednesday, 11 June 2014 at 08:58:58 UTC, monarch_dodra wrote:
 auto slicer(alias isTerminator, Range)(Range input)
 if (((isRandomAccessRange!Range && hasSlicing!Range) || 
 isSomeString!Range)
     && is(typeof(unaryFun!isTerminator(input.front))))
 {
     return SlicerResult!(unaryFun!isTerminator, Range)(input);
 }
 ...
Your solution copied here https://github.com/nordlow/justd/blob/master/slicer.d errors as /home/per/opt/x86_64-unknown-linux-gnu/dmd/linux/bin64/src/phobos/std/a gorithm.d(5752,24): Error: template std.functional.not!(isUpper).not cannot deduce function from argument types !()(dchar), candidates are: /home/per/opt/x86_64-unknown-linux-gnu/dmd/linux/bin64/src/phobos/std/f nctional.d(393,10): std.functional.not!(isUpper).not(T...)(T args) if (is(typeof(!unaryFun!pred(args))) || is(typeof(!binaryFun!pred(args)))) slicer.d(29,31): Error: template instance std.algorithm.find!(not, string) error instantiating slicer.d(16,12): instantiated from here: Slicer!(isUpper, string) slicer.d(85,30): instantiated from here: slicer!(isUpper, string) What's wrong?
Oct 09 2014
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 9 October 2014 at 21:55:03 UTC, Nordlöw wrote:
 On Wednesday, 11 June 2014 at 08:58:58 UTC, monarch_dodra wrote:
 auto slicer(alias isTerminator, Range)(Range input)
 if (((isRandomAccessRange!Range && hasSlicing!Range) || 
 isSomeString!Range)
    && is(typeof(unaryFun!isTerminator(input.front))))
 {
    return SlicerResult!(unaryFun!isTerminator, Range)(input);
 }
 ...
Your solution copied here https://github.com/nordlow/justd/blob/master/slicer.d errors as /home/per/opt/x86_64-unknown-linux-gnu/dmd/linux/bin64/src/phobos/std/a gorithm.d(5752,24): Error: template std.functional.not!(isUpper).not cannot deduce function from argument types !()(dchar), candidates are: /home/per/opt/x86_64-unknown-linux-gnu/dmd/linux/bin64/src/phobos/std/functional.d(393,10): std.functional.not!(isUpper).not(T...)(T args) if (is(typeof(!unaryFun!pred(args))) || is(typeof(!binaryFun!pred(args)))) slicer.d(29,31): Error: template instance std.algorithm.find!(not, string) error instantiating slicer.d(16,12): instantiated from here: Slicer!(isUpper, string) slicer.d(85,30): instantiated from here: slicer!(isUpper, string) What's wrong?
My quick guess is you are missing the *global* imports for the restraints. The compiler doesn't complain because the constraint is in a "is(typeof(...))" test. The reason the typeof fails is simply cause the compiler has no idea what unaryFun is.
Oct 09 2014
parent reply =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:
On Thursday, 9 October 2014 at 22:01:31 UTC, monarch_dodra wrote:
 My quick guess is you are missing the *global* imports for the 
 restraints. The compiler doesn't complain because the 
 constraint is in a "is(typeof(...))" test. The reason the 
 typeof fails is simply cause the compiler has no idea what 
 unaryFun is.
I don't understand. The restraints are commented out at https://github.com/nordlow/justd/blob/master/slicer.d I made a couple of changes and now it works but I don't quite understand why... Thanks anyway.
Oct 09 2014
parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 10 October 2014 at 05:55:00 UTC, Nordlöw wrote:
 On Thursday, 9 October 2014 at 22:01:31 UTC, monarch_dodra 
 wrote:
 My quick guess is you are missing the *global* imports for the 
 restraints. The compiler doesn't complain because the 
 constraint is in a "is(typeof(...))" test. The reason the 
 typeof fails is simply cause the compiler has no idea what 
 unaryFun is.
I don't understand. The restraints are commented out at https://github.com/nordlow/justd/blob/master/slicer.d I made a couple of changes and now it works but I don't quite understand why... Thanks anyway.
Sorry, I read your compile error wrong, and was on my phone. Anyways, I'm a bit worried, because it looks like an internal phobos compile error. May I request you attempt to re-create and potentially reduce the issue? I want to make sure you didn't uncover a bug...
Oct 10 2014
prev sibling parent reply Artur Skawina via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
On 06/11/14 00:31, "Nordlöw" via Digitalmars-d-learn wrote:
 Either way, it shouldn't be too hard to implement. Base it off
"splitter!pred", which is actually quite trivial. AFAIK, your
What do you mean by basing it off splitter!pred - should I start with some existing splitter algorithm in Phobos or start from scratch?
Starting from scratch is actually not a bad idea, at least for this kind of trivial functionality. A working version can be written in less time than copying, analyzing and modifying another implementation... For example (using monarch_dodra's test inputs): auto slicer(alias PRED, R)(R r) { import std.algorithm, std.array, std.range; struct Slicer { R r; size_t c; bool empty() property const { return r.empty; } auto front() property { c = r.dropExactly(1).countUntil!PRED()+1; if (c==0) c = r.length; return r.takeExactly(c); } void popFront() { r.popFrontN(c); } } return Slicer(r); } auto slicer(R)(R r) { import std.uni; return slicer!(a=>isUpper(a))(r); } void main() { import std.stdio, std.range; "SomeGreatVariableName" .slicer().writeln(); "someGGGreatVariableName".slicer().join(" ").writeln(); "".slicer().writeln(); "a".slicer().writeln(); "A".slicer().writeln(); } artur
Jun 11 2014
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Wednesday, 11 June 2014 at 11:42:42 UTC, Artur Skawina via 
Digitalmars-d-learn wrote:
 On 06/11/14 00:31, "Nordlöw" via Digitalmars-d-learn wrote:
 Either way, it shouldn't be too hard to implement. Base it 
 off "splitter!pred", which is actually quite trivial. AFAIK, 
 your
What do you mean by basing it off splitter!pred - should I start with some existing splitter algorithm in Phobos or start from scratch?
Starting from scratch is actually not a bad idea, at least for this kind of trivial functionality. A working version can be written in less time than copying, analyzing and modifying another implementation... ... artur
I don't know about "starting from scratch" entirely. Maybe not copy paste, but it always helps to have a reference implementation. For example, you should avoid "countUntil" and "takeExactly" when dealing with strings, since these are not O(1) operations, and don't actually return string slices. EG: string s = "someGGGreatVariableName".slicer().front; Error: cannot implicitly convert expression (slicer("someGGGreatVariableName").front()) of type Result to string That's why the splitter code uses the more verbose "r.length - r.find!pred".
Jun 11 2014
next sibling parent reply Artur Skawina via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
On 06/11/14 14:40, monarch_dodra via Digitalmars-d-learn wrote:
 For example, you should avoid "countUntil" and "takeExactly" when dealing with
strings, since these are not O(1) operations, and don't actually return string
slices. EG:
 string s = "someGGGreatVariableName".slicer().front;
 Error: cannot implicitly convert expression
(slicer("someGGGreatVariableName").front()) of type Result to string
That's a problem with D's string handling. For both examples and rarely called code, i always prefer correctness over performance. Of course if that code ends up being perf significant it could be further optimized by adding a specialization for D strings...
 That's why the splitter code uses the more verbose "r.length - r.find!pred".
... but that's the wrong approach. You end up writing a string-specific (or -aware) special version for practically every algorithm... If, instead, you create a string-specific 'countUntil' that returns a type that holds both the byte and code-point counts and implicitly converts to the latter, then you can have a 'takeExactly' overload that uses the extra info to avoid the unnecessary decoding and is able to directly return slices. The overhead is minimal; one extra integer that's passed around, and often completely eliminated by function inlining. But now you don't need to write a string-specific version of every algorithm. (Of course this description is slightly oversimplified) There is a reason why I never use D's std lib. artur
Jun 11 2014
parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Wednesday, 11 June 2014 at 13:44:25 UTC, Artur Skawina via 
Digitalmars-d-learn wrote:
 There is a reason why I never use D's std lib.

 artur
Well, (IMO) it's a problem with no real solution. But for what it's worth, most (if not all) of the algorithms in the standard lib know how to handle strings efficiently and correctly (split, find, etc...). Things only start getting funny once you start mixing indexing and element counts.
Jun 11 2014
parent Artur Skawina via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
On 06/11/14 16:05, monarch_dodra via Digitalmars-d-learn wrote:
 Well, (IMO) it's a problem with no real solution. But for what it's worth,
most (if not all) of the algorithms in the standard lib know how to handle
strings efficiently and correctly (split, find, etc...). Things only start
getting funny once you start mixing indexing and element counts.
If the recommended approach for writing a really trivial string transformation is to copy and modify ~60 lines of std lib code then something is very wrong. The consequence is that such code will often end up being eager, as it's much easier to write it that way... That is why, after seeing your solution, I wrote the most compact lazy version I could think of - let's not scare people away from using ranges. AFAIUI the OP basically wanted a version of splitter that does not eat the separators and a predicate that keeps around previous state. The latter can already be achieved via a lambda, so the missing bit is an optional splitter template parameter (eg consume=false). So at least in this case the problem could be easily handled. artur
Jun 11 2014
prev sibling parent Artur Skawina via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
On 06/11/14 15:44, Artur Skawina wrote:
 If, instead, you create a string-specific 'countUntil' that returns
 a type that holds both the byte and code-point counts and implicitly
 converts to the latter, then you can have a 'takeExactly' overload
 that uses the extra info to avoid the unnecessary decoding and is able
 to directly return slices. The overhead is minimal; one extra integer
 that's passed around, and often completely eliminated by function inlining.
 But now you don't need to write a string-specific version of every
 algorithm. (Of course this description is slightly oversimplified)
Well, for safety, you'd most likely want to pass around the range too (ie string slice, or just the pointer as the min-length is implied); so it's two size_ts, not one. Still, the overhead is tiny and the advantage of not having to special-case for strings everywhere is worth it. artur
Jun 11 2014