digitalmars.D.learn - How to avoid ctRegex (solved)

cy <dlang verge.info.tm> writes:
ctRegex slows down compilation like crazy, at a cost of seconds PER 
(character range) pattern, but it's not obvious how to avoid using 
it, since Regex(Char) is kind of weird for a type. So, here's 
what I do. I think this is right.

in the module scope, you start with:
auto pattern = ctRegex!"foobar";

and you substitute with:
typeof(regex("")) pattern;
static this() {
   pattern = regex("foobar");
}

That way you don't have to worry about whether to use a 
Regex!char, or a Regex!dchar, or a Regex!ubyte. It gives you the 
same functionality, at the cost of a few microseconds of slowdown 
when running your program. And once you're done debugging, you can 
always switch back, so...

// Returns the source for a module-level regex declaration named `name`.
// Debug builds get the run-time regex() (quick to compile); all other
// builds get ctRegex (slow to compile).
string defineRegex(string name, string pattern)() {
   import std.string: replace;
   return q{
      debug {
         pragma(msg, "fast $name");
         import std.regex: regex;
         typeof(regex("")) $name;
         static this() {
            $name = regex(`$pattern`);
         }
      } else {
         pragma(msg, "slooow $name");
         import std.regex: ctRegex;
         auto $name = ctRegex!`$pattern`;
      }
   }.replace("$pattern",pattern)
    .replace("$name",name);
}

mixin(defineRegex!("naword",r"[\W]+"));
mixin(defineRegex!("alnum",r"[a-zA-Z]+"));
mixin(defineRegex!("pattern","foo([a-z]*?)bar"));
mixin(defineRegex!("pattern2","foobar([^0-9z]+)"));

void main() {
}

/*
$ time rdmd -release /tmp/derp.d
slooow naword
slooow alnum
slooow pattern
slooow pattern2
slooow naword
slooow alnum
slooow pattern
slooow pattern2
rdmd -release /tmp/derp.d  17.57s user 1.57s system 82% cpu 23.210 total

$ time rdmd -debug /tmp/derp.d
fast naword
fast alnum
fast pattern
fast pattern2
fast naword
fast alnum
fast pattern
fast pattern2
rdmd -debug /tmp/derp.d  2.92s user 0.37s system 71% cpu 4.623 total
*/

...sure would be nice if you could cache precompiled regular 
expressions as files.
Aug 21 2016
ag0aep6g <anonymous example.com> writes:
On 08/21/2016 10:06 PM, cy wrote:
 in the module scope, you start with:
 auto pattern = ctRegex!"foobar";

 and you substitute with:
 typeof(regex("")) pattern;
 static this() {
   pattern = regex("foobar");
 }
I may be missing the point here, but just putting `auto pattern = regex("foobar");` at module level works for me.
Aug 21 2016
cy <dlang verge.info.tm> writes:
On Sunday, 21 August 2016 at 21:18:11 UTC, ag0aep6g wrote:

 I may be missing the point here, but just putting `auto pattern 
 = regex("foobar");` at module level works for me.
Really? I thought global variables could only be initialized with static stuff available during compile time, and you needed a "static this() {}" block to initialize them otherwise.
Aug 22 2016
ag0aep6g <anonymous example.com> writes:
On 08/23/2016 06:06 AM, cy wrote:
 On Sunday, 21 August 2016 at 21:18:11 UTC, ag0aep6g wrote:

 I may be missing the point here, but just putting `auto pattern =
 regex("foobar");` at module level works for me.
 Really? I thought global variables could only be initialized with
 static stuff available during compile time, and you needed a
 "static this() {}" block to initialize them otherwise.
That's true, and apparently `regex("foobar")` can be evaluated at compile time.
Aug 22 2016
cy <dlang verge.info.tm> writes:
On Tuesday, 23 August 2016 at 04:51:19 UTC, ag0aep6g wrote:

 That's true, and apparently `regex("foobar")` can be evaluated 
 at compile time.
Then what's ctRegex in there for at all...?
Aug 23 2016
ag0aep6g <anonymous example.com> writes:
On 08/24/2016 03:07 AM, cy wrote:
 Then what's ctRegex in there for at all...?
Optimization. ctRegex requires that the pattern is available as a compile time constant. It uses that property to "generate optimized native machine code". The plain regex function doesn't have such a requirement. It also works with a pattern that's generated at run time, e.g. from user input. But you can use it with a compile time constant, too. And it works in CTFE then, but it does not "generate optimized native machine code".
Aug 23 2016
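A minimal sketch of the distinction described in the post above, assuming a reasonably recent Phobos. The function names `captureRuntime` and `captureStatic` are made up for illustration; `regex`, `ctRegex` and `matchFirst` are the std.regex entry points. Both functions should give the same match result; what differs is when the pattern is compiled and what kind of matcher is produced.

```d
import std.regex : ctRegex, matchFirst, regex;
import std.stdio : writeln;

// Hypothetical helper: the pattern is an ordinary run-time string, so it
// could just as well come from user input. Returns the first capture
// group, or null when there is no match.
string captureRuntime(string input, string pattern) {
    auto re = regex(pattern);
    auto m = matchFirst(input, re);
    return m.empty ? null : m[1];
}

// Hypothetical helper: the pattern is a template argument, so it has to
// be a compile-time constant. This is the slow-to-compile variant.
string captureStatic(string input) {
    auto re = ctRegex!`foo([a-z]*?)bar`;
    auto m = matchFirst(input, re);
    return m.empty ? null : m[1];
}

void main() {
    writeln(captureRuntime("fooxyzbar", `foo([a-z]*?)bar`)); // xyz
    writeln(captureStatic("fooxyzbar"));                     // xyz
}
```

Calling code does not need to care which variant it got, which is why swapping one for the other during development is possible at all.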
Seb <seb wilzba.ch> writes:
On Wednesday, 24 August 2016 at 05:29:57 UTC, ag0aep6g wrote:
 On 08/24/2016 03:07 AM, cy wrote:
 Then what's ctRegex in there for at all...?
 Optimization. ctRegex requires that the pattern is available as a
 compile time constant. It uses that property to "generate optimized
 native machine code". The plain regex function doesn't have such a
 requirement. It also works with a pattern that's generated at run
 time, e.g. from user input. But you can use it with a compile time
 constant, too. And it works in CTFE then, but it does not "generate
 optimized native machine code".
Yep, that's why ctRegex is 2x faster than the highly-tuned grep, e.g. https://github.com/dlang/phobos/pull/4286
Aug 24 2016
cy <dlang verge.info.tm> writes:
On Wednesday, 24 August 2016 at 05:29:57 UTC, ag0aep6g wrote:
 The plain regex function doesn't have such a requirement. It 
 also works with a pattern that's generated at run time, e.g. 
 from user input. But you can use it with a compile time 
 constant, too. And it works in CTFE then, but it does not 
 "generate optimized native machine code".
It's not using it with a compile time constant that struck me as weird. It's using it to assign a global variable that struck me as weird. When I saw `auto a = b;` at the module level, I thought that b had to be something you could evaluate at compile time. But I guess it can be a runtime calculated value, acting like it was assigned in a static this() clause, and the requirement for it to be compile time generated is only for immutable, like `immutable auto a = b`?
Aug 27 2016
Dicebot <public dicebot.lv> writes:
On Saturday, 27 August 2016 at 17:35:04 UTC, cy wrote:
 On Wednesday, 24 August 2016 at 05:29:57 UTC, ag0aep6g wrote:
 The plain regex function doesn't have such a requirement. It 
 also works with a pattern that's generated at run time, e.g. 
 from user input. But you can use it with a compile time 
 constant, too. And it works in CTFE then, but it does not 
 "generate optimized native machine code".
 It's not using it with a compile time constant that struck me as
 weird. It's using it to assign a global variable that struck me as
 weird.
But the actual value of that Regex struct is perfectly known at compile time. Thus it is possible and fine to use it as an initializer. You can use any struct or class as an initializer if it can be computed at compile time.
Aug 27 2016
David Nadlinger <code klickverbot.at> writes:
On Saturday, 27 August 2016 at 17:47:33 UTC, Dicebot wrote:
 But the actual value of that Regex struct is perfectly known at 
 compile time. Thus it is possible and fine to use it as an 
 initializer. You can use any struct or class as an initializer if 
 it can be computed at compile time.
Yes, regex() is CTFEable, but this still comes at a significant compile-time cost as the constructor does quite a bit of string manipulation, etc. I've seen this, i.e. inconsiderate use of regex() globals, cost tens of seconds in build time for bigger codebases. — David
Aug 27 2016
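One way to sidestep the CTFE cost mentioned above, besides the `static this()` approach from the first post, might be to build the regex lazily at run time. This is only a sketch and not something proposed in the thread: `wordPattern` is a hypothetical accessor name, and the pattern is just a placeholder.

```d
import std.regex : Regex, matchFirst, regex;
import std.stdio : writeln;

// Hypothetical accessor: the regex is built on the first call, at run
// time, so the compiler never has to interpret regex() during the build.
ref Regex!char wordPattern() {
    static Regex!char re;       // thread-local, starts as Regex!char.init
    static bool built = false;  // tracks whether re has been assigned yet
    if (!built) {
        re = regex(`[a-zA-Z]+`);
        built = true;
    }
    return re;
}

void main() {
    auto m = matchFirst("123 hello 456", wordPattern());
    writeln(m.hit); // hello
}
```

The trade-off is a small check on every access, and one regex compilation per thread, because function-level `static` variables in D are thread-local by default.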
ag0aep6g <anonymous example.com> writes:
On 08/27/2016 07:35 PM, cy wrote:
 When I saw `auto a = b;` at the module level, I thought that b had to be
 something you could evaluate at compile time.
That's right.
 But I guess it can be a
 runtime calculated value, acting like it was assigned in a static
 this() clause,
No, that's not right. The initializer for a module level variable has to be a compile-time constant. If the initializer is a function call, the compiler attempts to evaluate it at compile time. We have an acronym for that: CTFE = Compile Time Function Evaluation. `regex("foobar")` can be evaluated that way, so it can be used as an initializer for a module level variable.
Aug 27 2016
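The rule stated in the last post can be illustrated with a small sketch that has nothing to do with regexes. The names `sumTo`, `precomputed` and `startedAt` are invented for this example. A module-level initializer forces the compiler to run the function through CTFE, while a value that only exists at run time has to be assigned in a module constructor instead.

```d
import std.stdio : writeln;

// An ordinary function. Nothing marks it as "compile-time"; it is simply
// written in a way the compiler is able to interpret (no I/O and so on).
int sumTo(int n) {
    int total = 0;
    foreach (i; 1 .. n + 1)
        total += i;
    return total;
}

// Module-level initializer: sumTo(10) is evaluated during compilation
// and the result is stored in the binary.
int precomputed = sumTo(10);

// static assert forces compile-time evaluation as well.
static assert(sumTo(10) == 55);

// The current time is not known to the compiler, so writing
// `long startedAt = time(null);` here would be rejected. The value is
// assigned in a module constructor instead.
long startedAt;
static this() {
    import core.stdc.time : time;
    startedAt = time(null);
}

void main() {
    writeln(precomputed); // 55
}
```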