
digitalmars.D.learn - regular expression engine and ranges

ketmar via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
Hello.

is there any decent regular expression engine which works with input
ranges? by "decent" i mean "good D code", "[t]nfa" and "no
backtracking". support for captures and greedy/non-greedy modes is a
must.
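
For readers unfamiliar with the "no backtracking" requirement, here is a minimal sketch of a lockstep (Thompson/Pike-style) NFA simulation in D that consumes any input range one element at a time. Everything in it is an assumption made for illustration: the instruction set, the names `Inst`, `addThread`, `matches` and `exampleProgram`, and the hand-assembled program for `a(b|c)*d`. It only answers "does the whole input match?" and implements neither captures nor greedy/non-greedy modes.

```d
import std.algorithm.iteration : joiner;
import std.range.primitives : ElementType, empty, front, isInputRange, popFront;

enum Op { chr, any, split, jmp, match }

struct Inst
{
    Op op;
    dchar c;     // character to match, used by Op.chr only
    size_t x, y; // jump targets, used by Op.split (x, y) and Op.jmp (x)
}

/// Adds state `pc` to `list`, following split/jmp edges without consuming input.
/// `onList` prevents adding a state twice, so loops in the program terminate.
void addThread(const Inst[] prog, size_t pc, bool[] onList, ref size_t[] list)
{
    if (onList[pc]) return;
    onList[pc] = true;
    final switch (prog[pc].op)
    {
        case Op.jmp:
            addThread(prog, prog[pc].x, onList, list);
            break;
        case Op.split:
            addThread(prog, prog[pc].x, onList, list);
            addThread(prog, prog[pc].y, onList, list);
            break;
        case Op.chr, Op.any, Op.match:
            list ~= pc;
            break;
    }
}

/// Returns true if the entire input matches. Each element is read exactly once,
/// so a plain input range is enough; no indexing, no saving, no backtracking.
bool matches(R)(const Inst[] prog, R input)
    if (isInputRange!R && is(ElementType!R : dchar))
{
    size_t[] cur;
    auto onList = new bool[prog.length];
    addThread(prog, 0, onList, cur);

    for (; !input.empty; input.popFront())
    {
        const dchar ch = input.front;
        size_t[] next;
        onList[] = false;
        foreach (pc; cur)
        {
            const inst = prog[pc];
            if (inst.op == Op.any || (inst.op == Op.chr && inst.c == ch))
                addThread(prog, pc + 1, onList, next);
        }
        cur = next;
        if (cur.length == 0) return false; // every thread died
    }

    foreach (pc; cur)
        if (prog[pc].op == Op.match) return true;
    return false;
}

/// Hand-assembled program for the pattern a(b|c)*d.
Inst[] exampleProgram()
{
    return [
        Inst(Op.chr, 'a'),            // 0
        Inst(Op.split, '\0', 2, 7),   // 1: enter the loop body or leave it
        Inst(Op.split, '\0', 3, 5),   // 2: choose b or c
        Inst(Op.chr, 'b'),            // 3
        Inst(Op.jmp, '\0', 1),        // 4
        Inst(Op.chr, 'c'),            // 5
        Inst(Op.jmp, '\0', 1),        // 6
        Inst(Op.chr, 'd'),            // 7
        Inst(Op.match),               // 8
    ];
}

unittest
{
    auto prog = exampleProgram();
    assert(matches(prog, "abcbd"));
    // The same matcher runs over text stored in pieces, without flattening it.
    assert(matches(prog, ["ab", "cb", "d"].joiner));
    assert(!matches(prog, "abx"));
}
```

The work per input element is bounded by the program size, which is the property the "no backtracking" requirement is after. A usable engine would additionally need a pattern compiler and per-thread capture slots.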

i found that some popular regex libraries, std.regex included, assume that
the only data layout a regex engine will ever work with is a plain
text array. now, i'm writing a small text editor (yes, another one;
please, i know that there are a lot of them already! ;-) and its internal
text layout is anything but a plain array. yet i want to use regular
expressions for a lot of things -- not only for "search that piece of
text", but for syntax highlighting (i know, i know; don't think about
it, everything is much more complicated there), navigation and so on.

i thought it would not be that hard, but found that if you
want to use an existing regexp engine, you *have* to either keep a plain
array with a lot of shitcode around it for bookkeeping to please the RE, or
build that plain array each time you want to use the RE. this sucks.
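
To make the second workaround concrete, here is a small sketch of "build the plain array each time" using std.regex. The `string[]` of pieces and the helper name `findInPieces` are hypothetical stand-ins for an editor buffer, not anything from the post; `join`, `regex`, `matchFirst` and `hit` are the regular Phobos calls.

```d
import std.array : join;
import std.regex : matchFirst, regex;

/// Flattens the pieces into one freshly allocated string, then searches it.
/// Returns the matched text, or null when nothing matches.
string findInPieces(string[] pieces, string pattern)
{
    // This copy is the cost being complained about: O(total text) per search.
    string flat = pieces.join();
    auto m = matchFirst(flat, regex(pattern));
    return m.empty ? null : m.hit;
}

unittest
{
    auto pieces = ["int ma", "in() { ret", "urn 42; }"];
    // The match spans a piece boundary, which only works because of the copy.
    assert(findInPieces(pieces, `ret\w+`) == "return");
}
```

Note that the result is a slice of the temporary copy, so mapping it back to positions in the real buffer is exactly the extra bookkeeping mentioned above.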

for now it seems that i have no choice except to write yet another
regular expression engine, and this is the thing i don't want to do.
but maybe someone has already done that and just doesn't think it's worth
publishing since we have std.regex and so on?
Dec 02 2014
"MrSmith" <mrsmith33 yandex.ru> writes:
On Tuesday, 2 December 2014 at 19:17:43 UTC, ketmar via Digitalmars-d-learn wrote:
 [...]
IIRC, there was a request for a range-based regex in Phobos somewhere. Is there anything simple that can be easily ported? Btw, do you use ropes for text? What do you use for storing lines, wrapped lines and text style?
Dec 02 2014
ketmar via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
On Tue, 02 Dec 2014 22:47:05 +0000
MrSmith via Digitalmars-d-learn <digitalmars-d-learn puremagic.com>
wrote:

 IIRC, there was a request for ranged regex in phobos somewhere.
 Is there anything simple that can be easily ported?
i don't think so. there are either very simple and barely usable engines, or complex and hard-to-port ones. maybe i'll port the TRE engine someday, but for now i don't want to do that.
 Btw, do you use ropes for text? What do you use for storing
 lines, wrapped lines and text style?
i don't care about wrapping. as for the other things -- it's a slightly modified piece chain which maps down to a paging system, with "line chunks" that keep some line info together (such as line length, the state of the syntax highlighter at the beginning of the line and so on). style is generally applied to a whole piece of text, with occasional "special marks" inside a piece if the engine decides that breaking the given piece for restyling is a silly thing to do.

this is more an experiment in various structures for text processing than a real editor, so some of the structures are stupid and others are overcomplicated, it allocates like crazy and so on.
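
As a rough illustration of the layout described above, and of why an input-range interface would fit it, here is a sketch of a piece chain exposed as an input range of code units. All names (`Piece`, `LineInfo`, `PieceChainRange`) and field choices are assumptions made for the example; the actual structures, including the paging layer and the "special marks", are not shown in the thread.

```d
import std.algorithm.comparison : equal;
import std.range.primitives : isInputRange;

/// One piece of text with a single style applied to the whole piece.
/// Assumption: a real piece would refer to a page and an offset instead.
struct Piece
{
    string text;
    int styleId;
}

/// Per-line bookkeeping of the kind a "line chunk" could hold (illustrative only).
struct LineInfo
{
    size_t length;        // line length in code units
    int highlighterState; // syntax highlighter state at the start of the line
}

/// Input range over the code units of a piece chain, with no flattening copy.
struct PieceChainRange
{
    private const(Piece)[] pieces;
    private size_t pos; // offset into pieces[0].text

    this(const(Piece)[] pieces)
    {
        this.pieces = pieces;
        skipExhausted();
    }

    @property bool empty() const { return pieces.length == 0; }
    @property char front() const { return pieces[0].text[pos]; }

    void popFront()
    {
        ++pos;
        skipExhausted();
    }

    /// Drops leading pieces that are empty or fully consumed.
    private void skipExhausted()
    {
        while (pieces.length && pos >= pieces[0].text.length)
        {
            pieces = pieces[1 .. $];
            pos = 0;
        }
    }
}

static assert(isInputRange!PieceChainRange);

unittest
{
    auto chain = [Piece("ab", 0), Piece("", 1), Piece("c", 0)];
    assert(equal(PieceChainRange(chain), "abc"));
}
```

A regex engine written against `isInputRange` could walk such a chain directly, which is the point of the original question.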
Dec 02 2014