digitalmars.D.learn - Splitting Ranges using Lambda Predicates

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= (8/8) Jun 10 2014 I'm missing a version of splitter that can be used to split

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= (3/4) Jun 10 2014 My example is dumb and incorrect.

monarch_dodra (18/22) Jun 10 2014 The current splitter works on the notion of splitter "tokens",

monarch_dodra (11/14) Jun 10 2014 Thinking about this more: Do you *actually* have two different

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= (4/6) Jun 10 2014 What do you mean by basing it off splitter!pred - should I start

monarch_dodra (126/132) Jun 11 2014 I meant mostly copy pasting it, and modifying it to your needs.

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= (18/26) Oct 09 2014 Your solution copied here

monarch_dodra (5/31) Oct 09 2014 My quick guess is you are missing the *global* imports for the

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= (6/11) Oct 09 2014 I don't understand. The restraints are commented out at

monarch_dodra (6/18) Oct 10 2014 Sorry, I read your compile error wrong, and was on my phone.

Artur Skawina via Digitalmars-d-learn (34/37) Jun 11 2014 Starting from scratch is actually not a bad idea, at least for this kind

monarch_dodra (14/29) Jun 11 2014 I don't know about "starting from scratch" entirely. Maybe not

Artur Skawina via Digitalmars-d-learn (17/21) Jun 11 2014 That's a problem with D's string handling. For both examples and rarely

monarch_dodra (7/9) Jun 11 2014 Well, (IMO) it's a problem with no real solution. But for what

Artur Skawina via Digitalmars-d-learn (13/14) Jun 11 2014 If the recommended approach for writing a really trivial string transfor...

Artur Skawina via Digitalmars-d-learn (7/15) Jun 11 2014 Well, for safety, you'd most likely want to pass around the range too

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:

I'm missing a version of splitter that can be used to split 
ranges based on arbitrary predicates. I need to for conversion 
between different symbol casings, typically:

1. someName => SomeName

In this case the lambda should take two arguments (a,b)

where in

1. a should be lowercase and b should be uppercase

Have i missed a reusable component that can be applied here?

Jun 10 2014

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:

 1. someName => SomeName

My example is dumb and incorrect.

I actually want this to do the following

2. "_someGreatVariableName" => "Some Great Varible Name"

Jun 10 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Tuesday, 10 June 2014 at 21:11:17 UTC, Nordlöw wrote:
 1. someName => SomeName

 My example is dumb and incorrect.

 I actually want this to do the following

 2. "_someGreatVariableName" => "Some Great Varible Name"

The current splitter works on the notion of splitter "tokens", 
eg, it splits when it find an element or range that corresponds 
to the passed value/pred.

What exactly are you requesting though?
- Split on the "edge" lowercase to uppercase?
- Split on uppercase but keep the uppercase element?

Either way, your example should look like this:

"SomeGreatVariableName" => ["Some", "Great", "Variable", "Name"]

Since you are splitting up your range into subranges.

And also either way, AFAIK, yeah, we don't have any splitter that 
does that. We are also missing the version that takes both a 
range an predicate, which would allow things like:

"This Cool thing is COOL!!!".split((a, b)=>a == 
b.toLower())("cool")
   =>
["This ", " thing is ", "!!!"]

Looks like you'll have to roll your own :/

Jun 10 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Tuesday, 10 June 2014 at 21:26:50 UTC, monarch_dodr
 What exactly are you requesting though?
 - Split on the "edge" lowercase to uppercase?
 - Split on uppercase but keep the uppercase element?

Thinking about this more: Do you *actually* have two different 
predicates, or are they mutually exclusive? EG: predicate 
"isUpper", and split on a false->true?

Either way, it shouldn't be too hard to implement. Base it off 
"splitter!pred", which is actually quite trivial. AFAIK, your 
requirements could even make it simpler, since you don't need to 
"skip" the splitter element (which, for strings, can actually be 
tricky business...).

You'd just have to use 2 calls to find though, for "both your 
predicates" or for "not predicate then predicate".

Jun 10 2014

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:

 Either way, it shouldn't be too hard to implement. Base it off 
 "splitter!pred", which is actually quite trivial. AFAIK, your

What do you mean by basing it off splitter!pred - should I start 
with some existing splitter algorithm in Phobos or start from 
scratch?

Thx.

Jun 10 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Tuesday, 10 June 2014 at 22:31:37 UTC, Nordlöw wrote:
 Either way, it shouldn't be too hard to implement. Base it off 
 "splitter!pred", which is actually quite trivial. AFAIK, your

 What do you mean by basing it off splitter!pred - should I 
 start with some existing splitter algorithm in Phobos or start 
 from scratch?

 Thx.

I meant mostly copy pasting it, and modifying it to your needs. 
For example, I adapted it into this. For simplicity, I stripped 
infinite and forward only range support. The only functions I 
actually modified were "findTerminator", to actually find 
according to what I want, and popFront.

//----
auto slicer(alias isTerminator, Range)(Range input)
if (((isRandomAccessRange!Range && hasSlicing!Range) || 
isSomeString!Range)
     && is(typeof(unaryFun!isTerminator(input.front))))
{
     return SlicerResult!(unaryFun!isTerminator, Range)(input);
}

private struct SlicerResult(alias isTerminator, Range)
{
     alias notTerminator = not!isTerminator;

     private Range _input;
     private size_t _end = 0;

     private void findTerminator()
     {
         auto r = 
_input.save.find!(not!isTerminator).find!isTerminator();
         _end = _input.length - r.length;
     }

     this(Range input)
     {
         _input = input;

         if (!_input.empty)
             findTerminator();
         else
             _end = size_t.max;
     }

     static if (isInfinite!Range)
         enum bool empty = false;  // Propagate infiniteness.
     else
          property bool empty()
         {
             return _end == size_t.max;
         }

      property auto front()
     {
         return _input[0 .. _end];
     }

     void popFront()
     {
         _input = _input[_end .. _input.length];
         if (_input.empty)
         {
             _end = size_t.max;
             return;
         }
         findTerminator();
     }

      property typeof(this) save()
     {
         auto ret = this;
         ret._input = _input.save;
         return ret;
     }
}
//----

This will split on before the first element where pred is true, 
provided there are previous elements where pred is false:

//----
void main()
{
     "SomeGreatVariableName"  .slicer!isUpper.writeln();
     "someGGGreatVariableName".slicer!isUpper.writeln();
     "".slicer!isUpper.writeln();
     "a".slicer!isUpper.writeln();
     "A".slicer!isUpper.writeln();
}
//----
["Some", "Great", "Variable", "Name"]
["some", "GGGreat", "Variable", "Name"]
[]
["a"]
["A"]
//----

This may or may not be what you wanted, depending on how you want 
to split "GGGreat". If you wanted it to simply split *ON* the 
left of every capital letter, then you can modify the the find 
terminator into:


     private void findTerminator()
     {
         auto r = _input.save.dropOne.find!isTerminator;
         _end = _input.length - r.length;
     }

And you'd get:
["some", "G", "G", "Great", "Variable", "Name"]


*******************************************
*******************************************
*******************************************


In any case, yeah, it shouldn't be too hard to shape it into what 
you want. A more involved solution to this problem could be to 
simply pass a "searchFun" predicate, in which case you'd be able 
to split not just according to any "unitary predicate", but 
according to an entire "range search strategy:

//----
auto slicer(alias searchFun, Range)(Range input)
if (((isRandomAccessRange!Range && hasSlicing!Range) || 
isSomeString!Range)
     && is(typeof(searchFun(input))))
{
     return SlicerResult!(searchFun, Range)(input);
}

private struct SlicerResult(alias searchFun, Range)
{
     private Range _input;
     private size_t _end = 0;

     private void findTerminator()
     {
         auto r = searchFun(_input.save);
         _end = _input.length - r.length;
     }

     ...
//----

And then:

     
"SomeGGGGreatVariableName".slicer!((s)=>s.find!isLower.find!isUpper).writeln();
     "someGGGreatVariableName" 
.slicer!((s)=>s.dropOne.find!isUpper).writeln();

["Some", "GGGGreat", "Variable", "Name"]
["some", "G", "G", "Great", "Variable", "Name"]

Just ideas.

Jun 11 2014

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:

On Wednesday, 11 June 2014 at 08:58:58 UTC, monarch_dodra wrote:
 auto slicer(alias isTerminator, Range)(Range input)
 if (((isRandomAccessRange!Range && hasSlicing!Range) || 
 isSomeString!Range)
     && is(typeof(unaryFun!isTerminator(input.front))))
 {
     return SlicerResult!(unaryFun!isTerminator, Range)(input);
 }
 ...

Your solution copied here

https://github.com/nordlow/justd/blob/master/slicer.d

errors as
/home/per/opt/x86_64-unknown-linux-gnu/dmd/linux/bin64/src/phobos/std/a
gorithm.d(5752,24): 
Error: template std.functional.not!(isUpper).not cannot deduce 
function from argument types !()(dchar), candidates are:
/home/per/opt/x86_64-unknown-linux-gnu/dmd/linux/bin64/src/phobos/std/f
nctional.d(393,10): 
        std.functional.not!(isUpper).not(T...)(T args) if 
(is(typeof(!unaryFun!pred(args))) || 
is(typeof(!binaryFun!pred(args))))
slicer.d(29,31): Error: template instance 
std.algorithm.find!(not, string) error instantiating
slicer.d(16,12):        instantiated from here: Slicer!(isUpper, 
string)
slicer.d(85,30):        instantiated from here: slicer!(isUpper, 
string)

What's wrong?

Oct 09 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Thursday, 9 October 2014 at 21:55:03 UTC, Nordlöw wrote:
 On Wednesday, 11 June 2014 at 08:58:58 UTC, monarch_dodra wrote:
 auto slicer(alias isTerminator, Range)(Range input)
 if (((isRandomAccessRange!Range && hasSlicing!Range) || 
 isSomeString!Range)
    && is(typeof(unaryFun!isTerminator(input.front))))
 {
    return SlicerResult!(unaryFun!isTerminator, Range)(input);
 }
 ...

 Your solution copied here

 https://github.com/nordlow/justd/blob/master/slicer.d

 errors as
 /home/per/opt/x86_64-unknown-linux-gnu/dmd/linux/bin64/src/phobos/std/a
gorithm.d(5752,24): 
 Error: template std.functional.not!(isUpper).not cannot deduce 
 function from argument types !()(dchar), candidates are:
 /home/per/opt/x86_64-unknown-linux-gnu/dmd/linux/bin64/src/phobos/std/functional.d(393,10):
        std.functional.not!(isUpper).not(T...)(T args) if 
 (is(typeof(!unaryFun!pred(args))) || 
 is(typeof(!binaryFun!pred(args))))
 slicer.d(29,31): Error: template instance 
 std.algorithm.find!(not, string) error instantiating
 slicer.d(16,12):        instantiated from here: 
 Slicer!(isUpper, string)
 slicer.d(85,30):        instantiated from here: 
 slicer!(isUpper, string)

 What's wrong?

My quick guess is you are missing the *global* imports for the 
restraints. The compiler doesn't complain because the constraint 
is in a "is(typeof(...))" test. The reason the typeof fails is 
simply cause the compiler has no idea what unaryFun is.

Oct 09 2014

=?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:

On Thursday, 9 October 2014 at 22:01:31 UTC, monarch_dodra wrote:
 My quick guess is you are missing the *global* imports for the 
 restraints. The compiler doesn't complain because the 
 constraint is in a "is(typeof(...))" test. The reason the 
 typeof fails is simply cause the compiler has no idea what 
 unaryFun is.

I don't understand. The restraints are commented out at

https://github.com/nordlow/justd/blob/master/slicer.d

I made a couple of changes and now it works but I don't quite 
understand why...

Thanks anyway.

Oct 09 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Friday, 10 October 2014 at 05:55:00 UTC, Nordlöw wrote:
 On Thursday, 9 October 2014 at 22:01:31 UTC, monarch_dodra 
 wrote:
 My quick guess is you are missing the *global* imports for the 
 restraints. The compiler doesn't complain because the 
 constraint is in a "is(typeof(...))" test. The reason the 
 typeof fails is simply cause the compiler has no idea what 
 unaryFun is.

 I don't understand. The restraints are commented out at

 https://github.com/nordlow/justd/blob/master/slicer.d

 I made a couple of changes and now it works but I don't quite 
 understand why...

 Thanks anyway.

Sorry, I read your compile error wrong, and was on my phone.

Anyways, I'm a bit worried, because it looks like an internal 
phobos compile error. May I request you attempt to re-create and 
potentially reduce the issue? I want to make sure you didn't 
uncover a bug...

Oct 10 2014

Artur Skawina via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:

On 06/11/14 00:31, "Nordlöw" via Digitalmars-d-learn wrote:
 Either way, it shouldn't be too hard to implement. Base it off
"splitter!pred", which is actually quite trivial. AFAIK, your

 
 What do you mean by basing it off splitter!pred - should I start with some
existing splitter algorithm in Phobos or start from scratch?

Starting from scratch is actually not a bad idea, at least for this kind
of trivial functionality. A working version can be written in less time
than copying, analyzing and modifying another implementation...

For example (using monarch_dodra's test inputs):

   auto slicer(alias PRED, R)(R r) {
      import std.algorithm, std.array, std.range;
      struct Slicer {
         R r;
         size_t c;
         bool empty()  property const { return r.empty; }
         auto front()  property {
            c = r.dropExactly(1).countUntil!PRED()+1;
            if (c==0)
               c = r.length;
            return r.takeExactly(c);
         }
         void popFront() { r.popFrontN(c); }
      }
      return Slicer(r);
   }

   auto slicer(R)(R r) {
      import std.uni;
      return slicer!(a=>isUpper(a))(r);
   }

   void main() {
      import std.stdio, std.range;
      "SomeGreatVariableName"  .slicer().writeln();
      "someGGGreatVariableName".slicer().join(" ").writeln();
      "".slicer().writeln();
      "a".slicer().writeln();
      "A".slicer().writeln();
   }

artur

Jun 11 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Wednesday, 11 June 2014 at 11:42:42 UTC, Artur Skawina via 
Digitalmars-d-learn wrote:
 On 06/11/14 00:31, "Nordlöw" via Digitalmars-d-learn wrote:
 Either way, it shouldn't be too hard to implement. Base it 
 off "splitter!pred", which is actually quite trivial. AFAIK, 
 your

 
 What do you mean by basing it off splitter!pred - should I 
 start with some existing splitter algorithm in Phobos or start 
 from scratch?

 Starting from scratch is actually not a bad idea, at least for 
 this kind
 of trivial functionality. A working version can be written in 
 less time
 than copying, analyzing and modifying another implementation...

 ...

 artur

I don't know about "starting from scratch" entirely. Maybe not 
copy paste, but it always helps to have a reference 
implementation.

For example, you should avoid "countUntil" and "takeExactly" when 
dealing with strings, since these are not O(1) operations, and 
don't actually return string slices. EG:
string s = "someGGGreatVariableName".slicer().front;
Error: cannot implicitly convert expression 
(slicer("someGGGreatVariableName").front()) of type Result to 
string

That's why the splitter code uses the more verbose "r.length - 
r.find!pred".

Jun 11 2014

Artur Skawina via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:

On 06/11/14 14:40, monarch_dodra via Digitalmars-d-learn wrote:
 For example, you should avoid "countUntil" and "takeExactly" when dealing with
strings, since these are not O(1) operations, and don't actually return string
slices. EG:
 string s = "someGGGreatVariableName".slicer().front;
 Error: cannot implicitly convert expression
(slicer("someGGGreatVariableName").front()) of type Result to string

That's a problem with D's string handling. For both examples and rarely
called code, i always prefer correctness over performance. Of course if
that code ends up being perf significant it could be further optimized by
adding a specialization for D strings...

 That's why the splitter code uses the more verbose "r.length - r.find!pred".

... but that's the wrong approach. You end up writing a string-specific
(or -aware) special version for practically every algorithm...

If, instead, you create a string-specific 'countUntil' that returns
a type that holds both the byte and code-point counts and implicitly
converts to the latter, then you can have a 'takeExactly' overload
that uses the extra info to avoid the unnecessary decoding and is able
to directly return slices. The overhead is minimal; one extra integer
that's passed around, and often completely eliminated by function inlining.
But now you don't need to write a string-specific version of every
algorithm. (Of course this description is slightly oversimplified)

There is a reason why I never use D's std lib.

artur

Jun 11 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Wednesday, 11 June 2014 at 13:44:25 UTC, Artur Skawina via 
Digitalmars-d-learn wrote:
 There is a reason why I never use D's std lib.

 artur

Well, (IMO) it's a problem with no real solution. But for what 
it's worth, most (if not all) of the algorithms in the standard 
lib know how to handle strings efficiently and correctly (split, 
find, etc...). Things only start getting funny once you start 
mixing indexing and element counts.

Jun 11 2014

Artur Skawina via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:

On 06/11/14 16:05, monarch_dodra via Digitalmars-d-learn wrote:
 Well, (IMO) it's a problem with no real solution. But for what it's worth,
most (if not all) of the algorithms in the standard lib know how to handle
strings efficiently and correctly (split, find, etc...). Things only start
getting funny once you start mixing indexing and element counts.

If the recommended approach for writing a really trivial string transformation
is to copy and modify ~60 lines of std lib code then something is very wrong.
The consequence is that such code will often end up being eager, as it's much
easier to write it that way... That is why, after seeing your solution, I wrote
the most compact lazy version I could think of - let's not scare people away
from using ranges.

AFAIUI the OP basically wanted a version of splitter that does not eat the
separators and a predicate that keeps around previous state. The latter can
already be achieved via a lambda, so the missing bit is an optional splitter
template parameter (eg consume=false). So at least in this case the problem
could be easily handled.

artur

Jun 11 2014

Artur Skawina via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:

On 06/11/14 15:44, Artur Skawina wrote:
 If, instead, you create a string-specific 'countUntil' that returns
 a type that holds both the byte and code-point counts and implicitly
 converts to the latter, then you can have a 'takeExactly' overload
 that uses the extra info to avoid the unnecessary decoding and is able
 to directly return slices. The overhead is minimal; one extra integer
 that's passed around, and often completely eliminated by function inlining.
 But now you don't need to write a string-specific version of every
 algorithm. (Of course this description is slightly oversimplified)

Well, for safety, you'd most likely want to pass around the range too
(ie string slice, or just the pointer as the min-length is implied);
so it's two size_ts, not one. Still, the overhead is tiny and the
advantage of not having to special-case for strings everywhere is worth
it.

artur

Jun 11 2014

D Programming

C/C++ Programming

Other

digitalmars.D.learn - Splitting Ranges using Lambda Predicates