digitalmars.D - grep library?

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Hello,


A coworker implemented a system that spawns grep to rummage through a 
large log file. Apparently doing so is quite a bit faster than using a 
regex.

This is because grep is highly specialized and optimized. I was 
wondering if we could implement a grepping library that builds on 
regex's strengths and also grep's many optimization tricks: 
http://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html


Andrei
Oct 26 2013
"Kelet" <kelethunter gmail.com> writes:
On Sunday, 27 October 2013 at 02:59:58 UTC, Andrei Alexandrescu
wrote:
 Hello,


 A coworker implemented a system that spawns grep to rummage 
 through a large log file. Apparently doing so is quite a bit 
 faster than using a regex.

 This is because grep is highly specialized and optimized. I was 
 wondering if we could implement a grepping library that builds 
 on regex's strengths and also grep's many optimization tricks: 
 http://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html


 Andrei
Great idea. The author of The Silver Searcher, a similar text search tool, has also posted some information on how to make a fast search tool. It seems to use Boyer-Moore-Horspool for literal searches (similar to grep); however, regex searches are done using PCRE.

http://geoff.greer.fm/2011/12/27/the-silver-searcher-better-than-ack/
https://github.com/ggreer/the_silver_searcher
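
For readers unfamiliar with Boyer-Moore-Horspool, here is a minimal sketch in Python of the literal-search idea mentioned above. It is only an illustration of the technique, not the code used by grep or The Silver Searcher; the function name `horspool_find` is made up for this example.

```python
def horspool_find(haystack: bytes, needle: bytes) -> int:
    """Return the index of the first occurrence of needle, or -1 if absent."""
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    if m > n:
        return -1

    # Shift table: for every byte value, how far the window may slide when
    # that byte is the last byte of the current window. Bytes that do not
    # occur in the needle (ignoring its final position) allow a full jump of m.
    shift = [m] * 256
    for i in range(m - 1):
        shift[needle[i]] = m - 1 - i

    pos = 0
    while pos <= n - m:
        if haystack[pos:pos + m] == needle:
            return pos
        # Slide by the shift for the byte aligned with the needle's last position.
        pos += shift[haystack[pos + m - 1]]
    return -1


if __name__ == "__main__":
    line = b"2013-10-26 error: disk full"
    print(horspool_find(line, b"disk"))
```

The speedup comes from the shift table: on a mismatch the window can skip up to `len(needle)` bytes at once, so many input bytes are never examined. That is the property a regex engine loses when it has to inspect every byte.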
Oct 26 2013
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
27-Oct-2013 07:00, Andrei Alexandrescu wrote:
 Hello,


 A coworker implemented a system that spawns grep to rummage through a
 large log file. Apparently doing so is quite a bit faster than using a
 regex.
I would love to see what the regex usage looks like. Some numbers would be just awesome. If the regex is std.regex, then also see: https://github.com/D-Programming-Language/phobos/pull/1553
 This is because grep is highly specialized and optimized. I was
 wondering if we could implement a grepping library that builds on
 regex's strengths and also grep's many optimization tricks:
If we're talking D, then std.regex doesn't do Boyer-Moore search, hence for patterns that start with a 100% fixed prefix it must be a great deal slower. I'm using a ShiftOr search, which has the benefit of being applicable to a fairly large class of prefixes (but it does look at every byte). There are also things a grep can do that I felt were just too heavy for a general-purpose library. A dedicated grep module could spend more time on these tricks, though, knowing in advance that the input is going to be quite large.
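
To make the trade-off concrete, here is a minimal Shift-Or sketch in Python for a plain literal. It is an illustration of the general technique only and is not taken from std.regex; the name `shift_or_find` is hypothetical.

```python
def shift_or_find(haystack: bytes, needle: bytes) -> int:
    """Return the index of the first occurrence of needle, or -1 if absent."""
    m = len(needle)
    if m == 0:
        return 0

    all_ones = (1 << m) - 1
    # masks[b] has bit i cleared exactly when needle[i] == b.
    masks = [all_ones] * 256
    for i, b in enumerate(needle):
        masks[b] &= ~(1 << i)

    # Bit i of state is 0 when needle[0..i] matches the text ending at the
    # current byte. Shifting left feeds a 0 into bit 0, which starts a new
    # candidate match at every position.
    state = all_ones
    accept = 1 << (m - 1)
    for j, b in enumerate(haystack):
        state = ((state << 1) | masks[b]) & all_ones
        if state & accept == 0:
            return j - m + 1
    return -1


if __name__ == "__main__":
    line = b"2013-10-26 error: disk full"
    print(shift_or_find(line, b"disk"))
```

Unlike a skip-based search, the loop touches every input byte, which matches the "it does look at every byte" caveat. In exchange, a position can accept a set of bytes simply by clearing the same bit in several `masks` entries, which is one way the approach extends beyond fixed literal prefixes.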
 http://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html


 Andrei
-- Dmitry Olshansky
Oct 27 2013