
digitalmars.D - Bye bye, fast compilation times

reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
One of my D projects for the past while has been taking unusually long
times to compile.  This morning, I finally decided to sit down and
figure out exactly why. What I found was rather disturbing:

------
import std.regex;
void main() {
	auto re = regex(``);
}
------

Compile command: time dmd -c test.d

Output:
------
real    0m3.113s
user    0m2.884s
sys     0m0.226s
------

Comment out the call to `regex()`, and I get:

------
real    0m0.285s
user    0m0.262s
sys     0m0.023s
------

Clearly, something is wrong if the mere act of compiling a regex causes
a 4-line program to take *3 seconds* to compile, where normally dmd
takes less than a second.

Apparently, the offending Phobos PR was merged late last year:

	https://issues.dlang.org/show_bug.cgi?id=18378

This is a serious slap-in-the-face to dmd's reputation of super-fast
compilation.  Makes our "fast code, fast" slogan look more and more
ironic. :-(

(Note: this particular regression is in *compilation* times; it's not
directly related to the *performance* of the regex code itself. The
latter department has also suffered a regression; see for example:
https://github.com/dlang/phobos/pull/5981.)


T

-- 
Little children, little troubles.
Feb 05 2018
next sibling parent reply psychoticRabbit <meagain meagain.com> writes:
On Monday, 5 February 2018 at 21:27:57 UTC, H. S. Teoh wrote:
 Comment out the call to `regex()`, and I get:

 ------
 real    0m0.285s
 user    0m0.262s
 sys     0m0.023s
 ------
regex is not the only one I avoid..

how long you think this takes to compile?
(try ldc2 too ..just for laughs ;-)

----
import std.net.isemail;

void main()
{
    auto checkEmail = "someone@somewhere.com".isEmail();
}
----
Feb 05 2018
next sibling parent psychoticRabbit <meagain meagain.com> writes:
On Tuesday, 6 February 2018 at 04:09:24 UTC, psychoticRabbit 
wrote:
 how long you think this takes to compile?
 (try ldc2 too ..just for laughs ;-)

 ----
 import std.net.isemail;

 void main()
 {
     auto checkEmail = "someone somewhere.com".isEmail();
 }
 ----
oh.. and for an even bigger laugh... -O -release (ldc2 took ~10 seconds)
Feb 05 2018
prev sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 2/5/18 11:09 PM, psychoticRabbit wrote:
 On Monday, 5 February 2018 at 21:27:57 UTC, H. S. Teoh wrote:
 Comment out the call to `regex()`, and I get:

 ------
 real    0m0.285s
 user    0m0.262s
 sys     0m0.023s
 ------
regex is not the only one I avoid.. how long you think this takes to compile? (try ldc2 too ..just for laughs ;-) ---- import std.net.isemail; void main() {     auto checkEmail = "someone somewhere.com".isEmail(); } ----
I was surprised at this, then I looked at the first line of isEmail:

    static ipRegex = ctRegex!(`\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}`~
        `(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$`.to!(const(Char)[]));

So it's really still related to regex.

-Steve
Feb 05 2018
next sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
On 06/02/2018 4:35 AM, Steven Schveighoffer wrote:
 On 2/5/18 11:09 PM, psychoticRabbit wrote:
 On Monday, 5 February 2018 at 21:27:57 UTC, H. S. Teoh wrote:
 Comment out the call to `regex()`, and I get:

 ------
 real    0m0.285s
 user    0m0.262s
 sys     0m0.023s
 ------
regex is not the only one I avoid.. how long you think this takes to compile? (try ldc2 too ..just for laughs ;-) ---- import std.net.isemail; void main() {      auto checkEmail = "someone somewhere.com".isEmail(); } ----
I was surprised at this, then I looked at the first line of isEmail:     static ipRegex = ctRegex!(`\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}`~ `(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$`.to!(const(Char)[])); So it's really still related to regex. -Steve
On that note, we really should remove it. Performance aside, you cannot really trust it.
Feb 05 2018
prev sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Tuesday, 6 February 2018 at 04:35:42 UTC, Steven Schveighoffer 
wrote:
 On 2/5/18 11:09 PM, psychoticRabbit wrote:
 On Monday, 5 February 2018 at 21:27:57 UTC, H. S. Teoh wrote:
 Comment out the call to `regex()`, and I get:

 ------
 real    0m0.285s
 user    0m0.262s
 sys     0m0.023s
 ------
regex is not the only one I avoid.. how long you think this takes to compile? (try ldc2 too ..just for laughs ;-) ---- import std.net.isemail; void main() {     auto checkEmail = "someone somewhere.com".isEmail(); } ----
I was surprised at this, then I looked at the first line of isEmail: static ipRegex = ctRegex!(`\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}`~ `(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$`.to!(const(Char)[])); So it's really still related to regex.
That’s a really bad idea - isEmail is a template, so the burden of the freaking slow ctRegex is paid on a per-instantiation basis. Could be horrible with separate compilation.
 -Steve
Feb 05 2018
next sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 2/6/18 12:35 AM, Dmitry Olshansky wrote:
 On Tuesday, 6 February 2018 at 04:35:42 UTC, Steven Schveighoffer wrote:
 On 2/5/18 11:09 PM, psychoticRabbit wrote:
 On Monday, 5 February 2018 at 21:27:57 UTC, H. S. Teoh wrote:
 Comment out the call to `regex()`, and I get:

 ------
 real    0m0.285s
 user    0m0.262s
 sys     0m0.023s
 ------
regex is not the only one I avoid.. how long you think this takes to compile? (try ldc2 too ..just for laughs ;-) ---- import std.net.isemail; void main() {      auto checkEmail = "someone somewhere.com".isEmail(); } ----
I was surprised at this, then I looked at the first line of isEmail:     static ipRegex = ctRegex!(`\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}`~ `(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$`.to!(const(Char)[])); So it's really still related to regex.
That’s really bad idea - isEmail is template so the burden of freaking slow ctRegex is paid on per instantiation basis. Could be horrible with separate compilation.
Obviously it is horrible. On my mac, it took about 2.5 seconds to compile this one line.

I'm not sure how to fix it though... I suppose you could make it 3 overloads, but this defeats a lot of the purpose of having templates in the first place.

-Steve
Feb 05 2018
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Tuesday, 6 February 2018 at 05:45:35 UTC, Steven Schveighoffer 
wrote:
 On 2/6/18 12:35 AM, Dmitry Olshansky wrote:
 
 That’s really bad idea - isEmail is template so the burden of 
 freaking slow ctRegex
 is paid on per instantiation basis. Could be horrible with 
 separate compilation.
Obviously it is horrible. On my mac, it took about 2.5 seconds to compile this one line. I'm not sure how to fix it though... I suppose you could make
Just use the run-time version, it’s not that much slower. But then again static ipRegex = regex(...) will parse and build regex at CTFE. Maybe lazy init?
 it 3 overloads, but this defeats a lot of the purpose of having 
 templates in the first place.

 -Steve
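
The "lazy init" idea mentioned above could look roughly like the sketch below. It is only an illustration under stated assumptions: `lazyIpRegex` is a hypothetical helper name, the pattern is a simplified stand-in for the one in isEmail, and no claim is made about how much compile time this saves, since a plain `regex()` call also has a compile-time cost, as the original post shows.

```d
import std.regex;

// Hypothetical helper (not part of Phobos): returns a runtime regex that is
// built on the first call in each thread and reused on later calls.
Regex!char lazyIpRegex()
{
    static Regex!char re;      // thread-local, default-initialized, no CTFE involved
    static bool initialized;   // thread-local flag guarding the one-time setup

    if (!initialized)
    {
        // Simplified dotted-quad pattern; octet range checks are left out.
        re = regex(`^\d{1,3}(\.\d{1,3}){3}$`);
        initialized = true;
    }
    return re;
}

void main()
{
    assert(!matchFirst("192.168.0.5", lazyIpRegex()).empty);
    assert(matchFirst("not an ip", lazyIpRegex()).empty);
}
```

Because the static variables are function-local, no module constructor is involved, and the pattern is parsed at run time only if the function is actually called.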
Feb 05 2018
next sibling parent reply Nathan S. <no.public.email example.com> writes:
On Tuesday, 6 February 2018 at 06:11:55 UTC, Dmitry Olshansky 
wrote:
 On Tuesday, 6 February 2018 at 05:45:35 UTC, Steven 
 Schveighoffer wrote:
 On 2/6/18 12:35 AM, Dmitry Olshansky wrote:
 
 That’s really bad idea - isEmail is template so the burden of 
 freaking slow ctRegex
 is paid on per instantiation basis. Could be horrible with 
 separate compilation.
Obviously it is horrible. On my mac, it took about 2.5 seconds to compile this one line. I'm not sure how to fix it though... I suppose you could make
Just use the run-time version, it’s not that much slower. But then again static ipRegex = regex(...) will parse and build regex at CTFE. Maybe lazy init?
FYI I've made a pull request that replaces uses of regexes in std.net.isemail. It turns out they weren't being used for anything indispensable. Import benchmark results were encouraging. https://github.com/dlang/phobos/pull/6129
Feb 06 2018
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Tuesday, 6 February 2018 at 13:51:01 UTC, Nathan S. wrote:
 Just use the run-time version, it’s not that much slower. But 
 then again static ipRegex = regex(...) will parse and build 
 regex at CTFE.

 Maybe lazy init?
FYI I've made a pull request that replaces uses of regexes in std.net.isemail. It turns out they weren't being used for anything indispensable. Import benchmark results were encouraging. https://github.com/dlang/phobos/pull/6129
Then again, you may not need a regex for IPv4 / IPv6. In theory it should have been the go-to case for ctRegex, but not at the cost of such horrible compile times.
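
As a rough illustration of checking an IPv4 address without any regex, here is a minimal sketch. `isDottedQuad` is a hypothetical name and this is not the code from the pull request mentioned above; it only covers the dotted-quad form and, like the original pattern, accepts leading zeros.

```d
import std.algorithm.iteration : splitter;
import std.ascii : isDigit;

// Hypothetical helper: true if `s` consists of exactly four dot-separated
// decimal octets, each in the range 0-255.
bool isDottedQuad(const(char)[] s)
{
    size_t parts;
    foreach (part; s.splitter('.'))
    {
        ++parts;
        // Each part must be one to three ASCII digits.
        if (part.length == 0 || part.length > 3)
            return false;

        uint value;
        foreach (c; part)
        {
            if (!isDigit(c))
                return false;
            value = value * 10 + (c - '0');
        }
        if (value > 255)
            return false;
    }
    return parts == 4;
}

void main()
{
    assert(isDottedQuad("192.168.0.5"));
    assert(!isDottedQuad("256.1.1.1"));
    assert(!isDottedQuad("1.2.3"));
}
```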
Feb 06 2018
prev sibling parent "Nick Sabalausky (Abscissa)" <SeeWebsiteToContactMe semitwist.com> writes:
On 02/06/2018 01:11 AM, Dmitry Olshansky wrote:
 On Tuesday, 6 February 2018 at 05:45:35 UTC, Steven Schveighoffer wrote:
 On 2/6/18 12:35 AM, Dmitry Olshansky wrote:
 That’s really bad idea - isEmail is template so the burden of 
 freaking slow ctRegex
 is paid on per instantiation basis. Could be horrible with separate 
 compilation.
Obviously it is horrible. On my mac, it took about 2.5 seconds to compile this one line. I'm not sure how to fix it though... I suppose you could make
Just use the run-time version, it’s not that much slower. But then again static ipRegex = regex(...) will parse and build regex at CTFE. Maybe lazy init?
If the regex string isn't dependent on the template's params, just move the regex outside the template.
Feb 06 2018
prev sibling next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Feb 06, 2018 at 05:35:44AM +0000, Dmitry Olshansky via Digitalmars-d
wrote:
 On Tuesday, 6 February 2018 at 04:35:42 UTC, Steven Schveighoffer wrote:
 On 2/5/18 11:09 PM, psychoticRabbit wrote:
[...]
 ----
 import std.net.isemail;
 
 void main()
 {
      auto checkEmail = "someone somewhere.com".isEmail();
 }
 ----
I was surprised at this, then I looked at the first line of isEmail: static ipRegex = ctRegex!(`\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}`~ `(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$`.to!(const(Char)[])); So it's really still related to regex.
Yeah, ctRegex is a bear at compile-time. Why can't we just use a runtime regex? It will at least take "only" 3 seconds to compile. :-D Or just don't use a regex at all.
 That’s really bad idea - isEmail is template so the burden of freaking
 slow ctRegex is paid on per instantiation basis. Could be horrible
 with separate compilation.
[...]

I'm not sure I'm seeing the value of using ctRegex here.  What's wrong
with a module static runtime regex initialized by a static this()?

And before anyone complains about initializing the regex if user code
never actually uses it, it's possible to use static this() on an
as-needed basis:

	template ipRegex()
	{
		// Eponymous templates FTW!
		Regex!char ipRegex;

		static this()
		{
			ipRegex = regex(`blah blah blah`);
		}
	}

	auto isEmail(... blah blah ...)
	{
		...
		if (ipRegex.match(...)) ...
		...
	}

Basically, if `ipRegex` is never referenced, the template is never
instantiated and the static this() basically doesn't exist. :-D
Pay-as-you-go FTW!


T

-- 
If you want to solve a problem, you need to address its root cause, not just its symptoms. Otherwise it's like treating cancer with Tylenol...
Feb 06 2018
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 2/6/18 2:07 PM, H. S. Teoh wrote:
 I'm not sure I'm seeing the value of using ctRegex here.  What's wrong
 with a module static runtime regex initialized by a static this()?
No, I'd rather have it initialized on first call.
 
 And before anyone complains about initializing the regex if user code
 never actually uses it, it's possible to use static this() on an
 as-needed basis:
 
 	template ipRegex()
 	{
 		// Eponymous templates FTW!
 		Regex!char ipRegex;
 
 		static this()
 		{
 			ipRegex = regex(`blah blah blah`);
 		}
 	}
 
 	auto isEmail(... blah blah ...)
 	{
 		...
 		if (ipRegex.match(...)) ...
 		...
 	}
 
 Basically, if `ipRegex` is never referenced, the template is never
 instantiated and the static this() basically doesn't exist. :-D
 Pay-as-you-go FTW!
You may not realize that this actually compiles it for ALL modules that use it, and the compiler puts in a gate to prevent it from running more than once. So you pay every time anyways (compile-time wise at least).

It also makes any importing module now a module that defines a static ctor, so cycles are much more likely.

In any case, there is a PR in the works that should eliminate the need for regex altogether: https://github.com/dlang/phobos/pull/6129

-Steve
Feb 06 2018
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/5/2018 9:35 PM, Dmitry Olshansky wrote:
 That’s really bad idea - isEmail is template so the burden of freaking slow
ctRegex
 is paid on per instantiation basis. Could be horrible with separate
compilation.
std.string.isEmail() in D1 was a simple function. Maybe regex is just the wrong solution for this problem.

---------------------- std.string.isEmail --------------------

/***************************
 * Does string s[] start with an email address?
 * Returns:
 *      null    it does not
 *      char[]  it does, and this is the slice of s[] that is that email address
 * References:
 *      RFC2822
 */

char[] isEmail(char[] s)
{
    size_t i;

    if (!isalpha(s[0]))
        goto Lno;

    for (i = 1; 1; i++)
    {
        if (i == s.length)
            goto Lno;
        auto c = s[i];
        if (isalnum(c))
            continue;
        if (c == '-' || c == '_' || c == '.')
            continue;
        if (c != '@')
            goto Lno;
        i++;
        break;
    }
    //writefln("test1 '%s'", s[0 .. i]);

    /* Now do the part past the '@'
     */
    size_t lastdot;
    for (; i < s.length; i++)
    {
        auto c = s[i];
        if (isalnum(c))
            continue;
        if (c == '-' || c == '_')
            continue;
        if (c == '.')
        {
            lastdot = i;
            continue;
        }
        break;
    }
    if (!lastdot || (i - lastdot != 3 && i - lastdot != 4))
        goto Lno;

    return s[0 .. i];

Lno:
    return null;
}
Feb 06 2018
next sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 2/6/18 3:11 PM, Walter Bright wrote:
 On 2/5/2018 9:35 PM, Dmitry Olshansky wrote:
 That’s really bad idea - isEmail is template so the burden of freaking 
 slow ctRegex
 is paid on per instantiation basis. Could be horrible with separate 
 compilation.
std.string.isEmail() in D1 was a simple function. Maybe regex is just the wrong solution for this problem.
The regex in question I think is to ensure an email address like abc@192.168.0.5 has a valid IP address. The D1 function doesn't support that requirement.

I admit, I've never used it, so I don't know why it needs to be so complex. But I assume some people depend on that functionality.

-Steve
Feb 06 2018
next sibling parent Timothee Cour <thelastmammoth gmail.com> writes:
another weird gotcha:
  auto s="foo".isEmail;
  writeln(s.toString); // ok
  writeln(s); // compile error


On Tue, Feb 6, 2018 at 12:30 PM, Steven Schveighoffer via
Digitalmars-d <digitalmars-d puremagic.com> wrote:
 On 2/6/18 3:11 PM, Walter Bright wrote:
 On 2/5/2018 9:35 PM, Dmitry Olshansky wrote:
 That’s really bad idea - isEmail is template so the burden of freaking
 slow ctRegex
 is paid on per instantiation basis. Could be horrible with separate
 compilation.
std.string.isEmail() in D1 was a simple function. Maybe regex is just the wrong solution for this problem.
The regex in question I think is to ensure an email address like abc 192.168.0.5 has a valid IP address. The D1 function doesn't support that requirement. I admit, I've never used it, so I don't know why it needs to be so complex. But I assume some people depend on that functionality. -Steve
Feb 06 2018
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/6/2018 12:30 PM, Steven Schveighoffer wrote:
 The regex in question I think is to ensure an email address like
abc 192.168.0.5 
 has a valid IP address. The D1 function doesn't support that requirement.
 
 I admit, I've never used it, so I don't know why it needs to be so complex.
But 
 I assume some people depend on that functionality.
Regex is well known to not always be the best solution for string processing tasks. For example, it does not work well at all where recursion is desired, and nobody uses regex for lexer in a compiler.
Feb 06 2018
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Feb 06, 2018 at 02:29:07PM -0800, Walter Bright via Digitalmars-d wrote:
 On 2/6/2018 12:30 PM, Steven Schveighoffer wrote:
 The regex in question I think is to ensure an email address like
 abc 192.168.0.5 has a valid IP address. The D1 function doesn't
 support that requirement.
 
 I admit, I've never used it, so I don't know why it needs to be so
 complex. But I assume some people depend on that functionality.
Regex is well known to not always be the best solution for string processing tasks. For example, it does not work well at all where recursion is desired, and nobody uses regex for lexer in a compiler.
Are you sure?  What about lex and its successors, like flex?

Of course, one could argue that the generated code isn't strictly a regex implementation in the same way as std.regex... but isn't that just a QoI issue?


T

-- 
Life would be easier if I had the source code. -- YHL
Feb 06 2018
prev sibling parent reply Nathan S. <no.public.email example.com> writes:
On Tuesday, 6 February 2018 at 22:29:07 UTC, Walter Bright wrote:
 nobody uses regex for lexer in a compiler.
Some years ago I was surprised when I saw this in Clojure's source code. It appears to still be there today:

https://github.com/clojure/clojure/blob/1215ba346ffea3fe48def6ec70542e3300b6f9ed/src/jvm/clojure/lang/LispReader.java#L66-L73

---
static Pattern symbolPat = Pattern.compile("[:]?([\\D&&[^/]].*/)?(/|[\\D&&[^/]][^/]*)");
//static Pattern varPat = Pattern.compile("([\\D&&[^:\\.]][^:\\.]*):([\\D&&[^:\\.]][^:\\.]*)");
//static Pattern intPat = Pattern.compile("[-+]?[0-9]+\\.?");
static Pattern intPat =
        Pattern.compile(
                "([-+]?)(?:(0)|([1-9][0-9]*)|0[xX]([0-9A-Fa-f]+)|0([0-7]+)|([1-9][0-9]?)[rR]([0-9A-Za-z]+)|0[0-9]+)(N)?");
static Pattern ratioPat = Pattern.compile("([-+]?[0-9]+)/([0-9]+)");
static Pattern floatPat = Pattern.compile("([-+]?[0-9]+(\\.[0-9]*)?([eE][-+]?[0-9]+)?)(M)?");
---
Feb 07 2018
parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/7/2018 1:07 PM, Nathan S. wrote:
 On Tuesday, 6 February 2018 at 22:29:07 UTC, Walter Bright wrote:
 nobody uses regex for lexer in a compiler.
 Some years ago I was surprised when I saw this in Clojure's source code. It appears to still be there today:

 https://github.com/clojure/clojure/blob/1215ba346ffea3fe48def6ec70542e3300b6f9ed/src/jvm/clojure/lang/LispReader.java#L66-L73

 ---
 static Pattern symbolPat = Pattern.compile("[:]?([\\D&&[^/]].*/)?(/|[\\D&&[^/]][^/]*)");
 //static Pattern varPat = Pattern.compile("([\\D&&[^:\\.]][^:\\.]*):([\\D&&[^:\\.]][^:\\.]*)");
 //static Pattern intPat = Pattern.compile("[-+]?[0-9]+\\.?");
 static Pattern intPat =
         Pattern.compile(
                 "([-+]?)(?:(0)|([1-9][0-9]*)|0[xX]([0-9A-Fa-f]+)|0([0-7]+)|([1-9][0-9]?)[rR]([0-9A-Za-z]+)|0[0-9]+)(N)?");
 static Pattern ratioPat = Pattern.compile("([-+]?[0-9]+)/([0-9]+)");
 static Pattern floatPat = Pattern.compile("([-+]?[0-9]+(\\.[0-9]*)?([eE][-+]?[0-9]+)?)(M)?");
 ---
Yes, I'm sure somebody does it. And now that regex has produced a match, you have to scan it again to turn it into a number, making for slow lexing. And if regex doesn't produce a match, you get a generic error message rather than something specific like "character 'A' is not allowed in a numeric literal". (Generic error messages are one of the downsides of using tools like lex and yacc.)
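
To make the single-pass point concrete, here is a small sketch of a hand-written scanner that computes the value while it scans and reports a specific error. The `IntToken` type and `lexDecimal` function are hypothetical names invented for this illustration; it handles only unsigned decimal literals and does not check for overflow.

```d
import std.ascii : isAlpha, isDigit;
import std.format : format;

// Hypothetical result type for this sketch.
struct IntToken
{
    ulong value;    // the number, accumulated during the scan itself
    size_t length;  // number of characters consumed
    string error;   // null on success, otherwise a specific message
}

// Scans a decimal integer literal at the start of `src` in one pass.
IntToken lexDecimal(string src)
{
    IntToken tok;
    foreach (i, char c; src)
    {
        if (isDigit(c))
        {
            // Overflow is not checked in this sketch.
            tok.value = tok.value * 10 + (c - '0');
            tok.length = i + 1;
        }
        else if (isAlpha(c))
        {
            tok.error = format("character '%s' is not allowed in a numeric literal", c);
            return tok;
        }
        else
            break;  // any other character ends the literal
    }
    if (tok.length == 0)
        tok.error = "expected a digit";
    return tok;
}

void main()
{
    assert(lexDecimal("123;").value == 123);
    assert(lexDecimal("12A").error == "character 'A' is not allowed in a numeric literal");
}
```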
Feb 07 2018
prev sibling parent reply bauss <jj_1337 live.dk> writes:
On Tuesday, 6 February 2018 at 20:30:42 UTC, Steven Schveighoffer 
wrote:
 The regex in question I think is to ensure an email address 
 like abc 192.168.0.5 has a valid IP address. The D1 function 
 doesn't support that requirement.
 -Steve
An invalid IP is not necessarily an invalid email though.

You'd be surprised how much __garbage__ a valid email actually can contain.

https://www.w3.org/Protocols/rfc822/

Generally the best way to validate an email is just to check if there is a value before @ and a value after.

The real way to validate an email is to check if the email exists on an SMTP server, BUT some SMTP servers will not provide such information (such as gmail, I think?) and thus you can't really rely on that either.
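
The "value before @ and a value after" check can be sketched in a few lines. `hasLocalAndDomain` is a hypothetical name, and the sketch deliberately looks only at the first '@', so it is not an RFC-level validator (quoted local parts containing '@' are not handled specially).

```d
import std.string : indexOf;

// Hypothetical helper: true if there is at least one character before the
// first '@' and at least one character after it.
bool hasLocalAndDomain(const(char)[] s)
{
    immutable at = s.indexOf('@');   // -1 when no '@' is present
    return at > 0 && cast(size_t) at + 1 < s.length;
}

void main()
{
    assert(hasLocalAndDomain("someone@somewhere.com"));
    assert(!hasLocalAndDomain("@somewhere.com"));
    assert(!hasLocalAndDomain("someone@"));
    assert(!hasLocalAndDomain("no at sign"));
}
```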
Feb 09 2018
parent reply aliak <something something.com> writes:
On Friday, 9 February 2018 at 14:19:56 UTC, bauss wrote:
 Generally the best way to validate an email is just to check if 
 there is a value before   and a value after.

 The real way to validate an email is to check if the email 
 exists on a SMTP server, BUT some SMTP servers will not provide 
 such information (Such as gmail I think?) and thus you can't 
 really rely on that either.
+1. If anyone wants to do email validation this should be read first: https://hackernoon.com/the-100-correct-way-to-validate-email-addresses-7c4818f24643
Feb 11 2018
parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Sunday, 11 February 2018 at 16:26:19 UTC, aliak wrote:
 If anyone wants to do email validation this should be read 
 first:
The isemail function isn't about validating email addresses. It is just about recognizing something that looks like one. Just like isurl doesn't actually try to fetch the site to see if it is broken, it just sees if it looks like one as a first step.
Feb 11 2018
parent aliak <something something.com> writes:
On Sunday, 11 February 2018 at 16:35:35 UTC, Adam D. Ruppe wrote:
 The isemail function isn't about validating email addresses. It 
 is just about recognizing something that looks like one. just 
 like isurl doesn't actually try to fetch the site to see if it 
 is broken, it just sees if it looks like one as a first step.
*valid email format... (is better? :) )

When someone says isurl checks if a string is a valid url, I don't think the general assumption is that it makes a network call to check if it is a resolvable url. (could be mistaken of course, but not to me at least). Isurl checks that the format is correct. Same for isemail. The isemail API and the docs all use the term valid as well.

Plus, to further see how hard it is to validate an email, these are apparently all erroneous results (granted wikipedia could be wrong as well):

import std.net.isemail, std.stdio;

void main()
{
    isEmail("john.smith(comment)@example.com").valid.writeln; // is valid, prints false
    isEmail("user@[2001:DB8::1]").valid.writeln; // is valid, prints false
    isEmail(`" "@example.org`).valid.writeln; // not valid, prints true
    isEmail(`"very.unusual.@.unusual.com"@example.com`).valid.writeln; // not valid, prints true
}
Feb 11 2018
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2018-02-06 21:11, Walter Bright wrote:

 std.string.isEmail() in D1 was a simple function. Maybe regex is just 
 the wrong solution for this problem.
If I recall correctly, the current implementation of std.net.isEmail was requested by you.

-- 
/Jacob Carlborg
Feb 06 2018
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/6/2018 2:03 PM, Jacob Carlborg wrote:
 On 2018-02-06 21:11, Walter Bright wrote:
 
 std.string.isEmail() in D1 was a simple function. Maybe regex is just the 
 wrong solution for this problem.
If I recall correctly, the current implementation of std.net.isEmail was requested by you.
Regardless of whether it was requested by me or not, if the current version is not working for us, we need to explore alternatives.
Feb 06 2018
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 2/6/18 5:23 PM, Walter Bright wrote:
 On 2/6/2018 2:03 PM, Jacob Carlborg wrote:
 On 2018-02-06 21:11, Walter Bright wrote:

 std.string.isEmail() in D1 was a simple function. Maybe regex is just 
 the wrong solution for this problem.
If I recall correctly, the current implementation of std.net.isEmail was requested by you.
Regardless of whether it was requested by me or not, if the current version is not working for us, we need to explore alternatives.
The regex problem is being solved:

https://github.com/dlang/phobos/pull/6129

-Steve
Feb 06 2018
next sibling parent reply Andres Clari <andres steelcode.net> writes:
On Tuesday, 6 February 2018 at 22:51:51 UTC, Steven Schveighoffer 
wrote:
 On 2/6/18 5:23 PM, Walter Bright wrote:
 On 2/6/2018 2:03 PM, Jacob Carlborg wrote:
 On 2018-02-06 21:11, Walter Bright wrote:

 std.string.isEmail() in D1 was a simple function. Maybe 
 regex is just the wrong solution for this problem.
If I recall correctly, the current implementation of std.net.isEmail was requested by you.
Regardless of whether it was requested by me or not, if the current version is not working for us, we need to explore alternatives.
The regex problem is being solved: https://github.com/dlang/phobos/pull/6129 -Steve
That's fixing just the "isEmail" issue which is good I guess.

But after reading this thread, I ran some tests on one of my code bases, which uses about 6 regexes throughout. Switching from ctRegex! to regex yielded a 50% build time reduction, and from what I read even the normal regex are slowing things down considerably.

Might need a warning on the docs for ctRegex! explaining it'll screw your build times if you use it, unless there's some way to speed that up to something normal.

Btw, my project which is 3517 lines of D builds in 20s disabling the ctRegex on an i7 4770k at 4.3GHz.

So I'd say once you start doing some more complex usages, D's build speed goes out the door.
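
For readers unfamiliar with the switch described above, here is a minimal sketch of what changing from ctRegex! to regex looks like at a call site. `firstNumber` is a hypothetical helper written for this illustration; the build-time numbers reported in the thread are not reproduced or verified by it.

```d
import std.regex;

// Hypothetical helper: returns the first run of digits in `text`, or null.
// It is a template so either a ctRegex or a runtime regex can be passed in.
string firstNumber(R)(string text, R re)
{
    auto m = matchFirst(text, re);
    return m.empty ? null : m.hit;
}

void main()
{
    auto rt = regex(`\d+`);          // pattern is parsed when the program runs
    // auto ct = ctRegex!(`\d+`);    // pattern is compiled during the build; call sites stay the same
    assert(firstNumber("build 42", rt) == "42");
    assert(firstNumber("no digits", rt) is null);
}
```

Since the matching functions accept both kinds of pattern object, the change is usually confined to the line where the pattern is created.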
Feb 06 2018
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Feb 06, 2018 at 11:20:53PM +0000, Andres Clari via Digitalmars-d wrote:
[...]
 Switching from ctRegex! to regex yielded a 50% build time reduction,
 and from what I read even the normal regex are slowing things down
 considerably.
I seem to vaguely recall that in some cases, ctRegex might even perform slower than regex(). But either way, my use cases for regexes generally aren't performance-sensitive enough to be worth the trouble of huge compilation time slowdown -- I just use regex() instead of ctRegex. [...]
 Btw, my project which is 3517 lines of D builds in 20s disabling the
 ctRegex on an i7 4770k at 4.3Ghz.
 
 So I'd say once you start doing some more complex usages, D's build
 speed goes out the door.
That depends on what you're doing with it, and also how you're building it.  3500+ lines isn't a lot of code; it ought to compile pretty fast unless you're using a lot of (1) templates, (2) CTFE.

Also, I find that dub builds are excruciatingly slow compared to just invoking dmd directly, due to network access and rescanning dependencies on every invocation.

I have a 4700+ line vibe.d project; Diet templates are template/CTFE-heavy and generally take the longest to build. (I dumped dub and went back to an SCons-based system with separate compilation for major subsystems -- as long as I don't recompile Diet templates, the whole thing can build within seconds; with Diet templates it takes about 30 seconds :-/.)


T

-- 
Unix was not designed to stop people from doing stupid things, because that would also stop them from doing clever things. -- Doug Gwyn
Feb 06 2018
parent reply Andres Clari <andres steelcode.net> writes:
On Wednesday, 7 February 2018 at 00:36:22 UTC, H. S. Teoh wrote:
 On Tue, Feb 06, 2018 at 11:20:53PM +0000, Andres Clari via 
 Digitalmars-d wrote: [...]
 [...]
I seem to vaguely recall that in some cases, ctRegex might even perform slower than regex(). But either way, my use cases for regexes generally aren't performance-sensitive enough to be worth the trouble of huge compilation time slowdown -- I just use regex() instead of ctRegex. [...]
 [...]
That depends on what you're doing with it, and also how you're building it. 3500+ lines isn't a lot of code; it ought to compile pretty fast unless you're using a lot of (1) templates, (2) CTFE. Also, I find that dub builds are excruciatingly slow compared to just invoking dmd directly, due to network access and rescanning dependencies on every invocation. I have a 4700+ line vibe.d project; Diet templates are template/CTFE-heavy and generally take the longest to build. (I dumped dub and went back to an SCons-based system with separate compilation for major subsystems -- as long as I don't recompile Diet templates, the whole thing can build within seconds; with Diet templates it takes about 30 seconds :-/.) T
Well I'm using vibe.d, but not templates on this project, just a minimal REST service, and a few timers and runTasks. So yeah I don't see why it should slow down that much.

Is there some tutorial or example for using SCons with dub dependencies?
Feb 06 2018
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Feb 07, 2018 at 01:22:02AM +0000, Andres Clari via Digitalmars-d wrote:
[...]
 Is there some tutorial or example for using SCons with dub
 dependencies?
Not that I know of.  Basically what I did was:

- Create a dummy dub project in a subdirectory, containing a dummy
  source file containing an empty main().

- Declare whatever dub dependencies you need in this dummy project.

- Run `dub build -v` inside this subdirectory to make dub fetch
  dependencies, build libraries, etc..

- Parse the output, esp. the last few lines that show which include
  paths, linker flags, and libraries are required to build the main
  program.

- Specify these include paths, linker flags, and libraries in your
  SConstruct file for building your real project.

- Build away.

- If you need to refresh dependencies, go into the dummy project and run
  `dub build --force` to rebuild all dependencies, then run scons in
  your real project.

Arguably, some/all of the above could be automated by SCons.  Though the whole point is to *not* run dub every single time you build, so I'd keep them separate, or as a non-default build target that only triggers when you explicitly want it to.

Also, none of this is specific to SCons; you could use whatever other build system you wish with the above steps.


T

-- 
This sentence is false.
Feb 06 2018
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/6/2018 2:51 PM, Steven Schveighoffer wrote:
 The regex problem is being solved:
 
 https://github.com/dlang/phobos/pull/6129
Great!
Feb 06 2018
prev sibling parent psychoticRabbit <meagain meagain.com> writes:
On Tuesday, 6 February 2018 at 20:11:56 UTC, Walter Bright wrote:
 std.string.isEmail() in D1 was a simple function. Maybe regex 
 is just the wrong solution for this problem.

 [...]
C .. D style. I love it! (bugs and all).
Feb 06 2018
prev sibling next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Monday, 5 February 2018 at 21:27:57 UTC, H. S. Teoh wrote:
 One of my D projects for the past while has been taking 
 unusually long times to compile.  This morning, I finally 
 decided to sit down and figure out exactly why. What I found 
 was rather disturbing:

 ------
 import std.regex;
 void main() {
 	auto re = regex(``);
 }
 ------

 Compile command: time dmd -c test.d

 Output:
 ------
 real    0m3.113s
 user    0m2.884s
 sys     0m0.226s
 ------

 Comment out the call to `regex()`, and I get:

 ------
 real    0m0.285s
 user    0m0.262s
 sys     0m0.023s
 ------
 Clearly, something is wrong if the mere act of compiling a 
 regex causes a 4-line program to take *3 seconds* to compile,
There is a fuckton of templates involved, plus a couple of tries are built at CTFE. The regression is curious though, maybe something gets recomputed at CTFE over and over again.
 where normally dmd takes less than a second.
Honestly I’m tired to hell of working with our compiler and its compile time features. When it doesn’t pee itself due to OOM I’m almost happy.

In retrospect I should have just provided a C interface and compiled the whole thing separately. And CTFE could easily be replaced by a small custom JIT compiler, it would also work at run-time(!).

Especially considering that it’s been 6 years but it’s still not practical to use ctRegex.
 The latter department as also suffered a regression; see for 
 example: https://github.com/dlang/phobos/pull/5981.)
Yup, Martin seems on top of it, thankfully.
 T
Feb 05 2018
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Feb 06, 2018 at 05:44:17AM +0000, Dmitry Olshansky via Digitalmars-d
wrote:
[...]
 Honestly I’m tired to hell of working with our compiler and its
 compile time features. When it doesn’t pee itself due to OOM I’m
 almost happy.
Heh, dmd's famous memory usage is causing me tons of grief on low-memory systems, too. Basically if you have anything less than 2GB of RAM, you might as well give up trying to compile anything non-trivial. We need to get a serious handle on dmd's memory consumption -- at least let there be an option or something that will turn on the GC or whatever. It's better for dmd to be (gosh) slow, than for it not to be able to compile anything at all due to it provoking the kernel OOM killer.
 In retrospect I should have just provided a C interface and compiled
 the whole thing separately. And CTFE could easily be replaced by a
 small custom JIT compiler, it would also work at run-time(!).
We seriously need to get newCTFE finished and merged. Stefan is very busy with other stuff ATM; I wonder if a few of us can continue his work and get newCTFE into a mergeable state. Given how much D's "compile-time" features are advertised, and D's new (ick) slogan of being fast or whatever, it's high time we actually delivered on our promises by actually making CTFE more usable. On that note, though, I think a JIT regex compiler totally makes sense. I'd totally support that.
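For readers unfamiliar with the term, here is a minimal sketch of what CTFE means in practice. The function `fib` and the constant `atCompileTime` are hypothetical names picked for the illustration; the point is only that the same ordinary function is run by the compiler's interpreter when its result is needed at compile time, which is the part newCTFE aims to speed up.

```d
/// An ordinary function: nothing marks it as compile-time only.
ulong fib(uint n)
{
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

// Initialising an enum forces the compiler to evaluate fib(20) via CTFE.
enum ulong atCompileTime = fib(20);
static assert(atCompileTime == 6765);

void main()
{
    import std.stdio : writeln;

    // The same function, this time executed at run time.
    writeln(fib(20) == atCompileTime);
}
```

Heavier CTFE workloads, such as the tables std.regex builds, go through the same interpreter, so its speed and memory use show up directly in build times.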
 Especially considering that it’s been 6 years but it’s still not
 practical to use ctRegex.
I find that using just plain `regex` is Good Enough(tm) for my purposes. Do we really need ctRegex? The idea of generating an optimal FSM at compile-time is rather appealing, but in the grand scheme of things, doesn't seem like an absolute must-have.
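As a rough side-by-side of the two options, assuming nothing beyond the documented std.regex entry points (`regex`, `ctRegex`, `matchFirst`); the variable names and the sample pattern are arbitrary:

```d
import std.regex : ctRegex, matchFirst, regex;
import std.stdio : writeln;

void main()
{
    // Plain `regex`: the pattern is parsed and compiled when the program runs.
    auto runtimeRe = regex(`[0-9]+`);

    // `ctRegex`: the matcher is generated while dmd compiles the program,
    // so the cost moves from run time to compile time.
    auto staticRe = ctRegex!(`[0-9]+`);

    auto a = matchFirst("build 1234 done", runtimeRe);
    auto b = matchFirst("build 1234 done", staticRe);

    // Both matchers are used through the same API.
    writeln(a.hit);
    writeln(b.hit);
}
```

Both are driven through identical calls, so switching between them is a one-line change if the extra compile time of ctRegex turns out not to be worth it.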
 The latter department as also suffered a regression; see for
 example: https://github.com/dlang/phobos/pull/5981.)
Yup, Martin seems on top of it, thankfully.
[...]

Unfortunately, Martin's PR is only to improve runtime performance. It's still dog-slow to *compile* std.regex. :-(


T

-- 
Dogs have owners ... cats have staff. -- Krista Casada
Feb 06 2018
next sibling parent Stefan Koch <uplink.coder googlemail.com> writes:
On Tuesday, 6 February 2018 at 18:56:44 UTC, H. S. Teoh wrote:
 We seriously need to get newCTFE finished and merged.  Stefan 
 is very busy with other stuff ATM; I wonder if a few of us can 
 continue his work and get newCTFE into a mergeable state.  
 Given how much D's "compile-time" features are advertised, and 
 D's new (ick) slogan of being fast or whatever, it's high time 
 we actually delivered on our promises by actually making CTFE 
 more usable.
There is some good news for you. I've recently allocated a few more resources to newCTFE again. I have to stress that it is not enough to get newCTFE feature-complete. It is also vital to make a performance-related pass through the code. newCTFE is currently still at a proof-of-concept quality level.

That said, newCTFE is designed with performance and JIT in mind. It can achieve a 10-30x speed-up when implemented properly.

One thing that I really need in druntime is a cross-platform way to allocate executable memory pages; this can be done by someone else.

Another thing that can be done is reviewing the code and alerting me to potential problems, i.e. missing or indecipherable comments as well as spelling mistakes. (With the correction please; just telling me something is wrong will not help, since I obliviously don't know how to spell it.)
Feb 07 2018
next sibling parent reply Bastiaan Veelo <Bastiaan Veelo.net> writes:
On Wednesday, 7 February 2018 at 09:27:47 UTC, Stefan Koch wrote:
 Another Thing that can be done is reviewing the code and 
 alerting me to potential problems. i.e. Missing or 
 indecipherable comments as well as spelling mistakes.
 (with the correction please (just telling me something is 
 wrong, will not help since I obliviously don't know how to 
 spell it))
What is the preferred place for this? https://github.com/dlang/dmd/pull/7073 or do you want PRs against a fork of yours?
Feb 07 2018
next sibling parent Stefan Koch <uplink.coder googlemail.com> writes:
On Wednesday, 7 February 2018 at 22:00:48 UTC, Bastiaan Veelo 
wrote:
 On Wednesday, 7 February 2018 at 09:27:47 UTC, Stefan Koch 
 wrote:
 Another Thing that can be done is reviewing the code and 
 alerting me to potential problems. i.e. Missing or 
 indecipherable comments as well as spelling mistakes.
 (with the correction please (just telling me something is 
 wrong, will not help since I obliviously don't know how to 
 spell it))
What is the preferred place for this? https://github.com/dlang/dmd/pull/7073 or do you want PRs against a fork of yours?
I'd prefer a pr against https://github.com/UplinkCoder/dmd/newCTFE_reboot
Feb 08 2018
prev sibling parent Stefan Koch <uplink.coder googlemail.com> writes:
On Wednesday, 7 February 2018 at 22:00:48 UTC, Bastiaan Veelo 
wrote:
 On Wednesday, 7 February 2018 at 09:27:47 UTC, Stefan Koch 
 wrote:
 Another Thing that can be done is reviewing the code and 
 alerting me to potential problems. i.e. Missing or 
 indecipherable comments as well as spelling mistakes.
 (with the correction please (just telling me something is 
 wrong, will not help since I obliviously don't know how to 
 spell it))
What is the preferred place for this? https://github.com/dlang/dmd/pull/7073 or do you want PRs against a fork of yours?
Corrected link: https://github.com/UplinkCoder/dmd/tree/newCTFE_reboot
Feb 08 2018
prev sibling parent reply Bastiaan Veelo <Bastiaan Veelo.net> writes:
On Wednesday, 7 February 2018 at 09:27:47 UTC, Stefan Koch wrote:
 One thing that I really need in druntime is a cross-platform 
 way to allocate executable memory-pages, this can be done by 
 someone else.
Is this on someone's agenda? It probably needs an enhancement request at the very least, I don't think it's there yet [1].
 Another Thing that can be done is reviewing the code and 
 alerting me to potential problems. i.e. Missing or 
 indecipherable comments as well as spelling mistakes.
I had a go at this [2].

[1] https://issues.dlang.org/buglist.cgi?bug_severity=enhancement&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&component=druntime&list_id=219522&product=D&query_format=advanced
[2] https://github.com/UplinkCoder/dmd/pull/3
Feb 11 2018
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Monday, 12 February 2018 at 00:24:32 UTC, Bastiaan Veelo wrote:
 On Wednesday, 7 February 2018 at 09:27:47 UTC, Stefan Koch 
 wrote:
 One thing that I really need in druntime is a cross-platform 
 way to allocate executable memory-pages, this can be done by 
 someone else.
Is this on someone's agenda? It probably needs an enhancement request at the very least, I don't think it's there yet [1].
Was once on my agenda, together with other OS memory manager functions, but I postponed the work indefinitely.

https://github.com/dlang/druntime/pull/1549

If someone is willing to revive that I’d gladly assist with review.

Lastly, on Windows it would need a FlushInstructionCache call before executing new memory.

And ofc JIT is cool, but it would be cooler to have a sane interpreter that doesn’t leak sooner. Simply put, JIT is x5 the work due to different architectures, and seeing first-hand how it goes I’m not sure we want that in our compiler yet.
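To make the request concrete, a cross-platform allocator for executable pages could look roughly like the sketch below. This is not the code from the druntime PR: `allocExecutable` and `freeExecutable` are hypothetical names, and the sketch assumes the OS permits pages that are writable and executable at the same time, which hardened systems may refuse.

```d
version (Posix)
{
    import core.sys.posix.sys.mman;

    /// Returns `size` bytes of read/write/execute memory, or null on failure.
    void* allocExecutable(size_t size)
    {
        auto p = mmap(null, size, PROT_READ | PROT_WRITE | PROT_EXEC,
                      MAP_PRIVATE | MAP_ANON, -1, 0);
        return p == MAP_FAILED ? null : p;
    }

    void freeExecutable(void* p, size_t size)
    {
        munmap(p, size);
    }
}
else version (Windows)
{
    import core.sys.windows.windows;

    /// Returns `size` bytes of read/write/execute memory, or null on failure.
    void* allocExecutable(size_t size)
    {
        return VirtualAlloc(null, size, MEM_COMMIT | MEM_RESERVE,
                            PAGE_EXECUTE_READWRITE);
    }

    void freeExecutable(void* p, size_t size)
    {
        // MEM_RELEASE requires a size of 0 and frees the whole allocation.
        VirtualFree(p, 0, MEM_RELEASE);
    }
}

void main()
{
    enum size = 4096;
    auto mem = cast(ubyte*) allocExecutable(size);
    assert(mem !is null, "executable allocation refused by the OS");

    // A JIT would write machine code here, and on Windows flush the
    // instruction cache for this range before jumping into it.
    mem[0] = 0xC3;

    freeExecutable(mem, size);
}
```

The example only writes a byte and never executes it, so it stays architecture-neutral; actually calling into the buffer is where the per-architecture work mentioned above begins.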
Feb 12 2018
parent Stefan Koch <uplink.coder googlemail.com> writes:
On Tuesday, 13 February 2018 at 05:47:10 UTC, Dmitry Olshansky 
wrote:
 Was once on my agenda, together with other OS memory manager 
 functions, but postponed the work indefinitely.

 https://github.com/dlang/druntime/pull/1549

 If someone is willing to revive that I’d gladly assist with 
 review.

 Lastly on Windows it would need a FlushInstructionCache call 
 before executing new memory.

 And ofc JIT is cool, but it would be more cool to have sane 
 interpreter that doesn’t leak sooner. Simply put JIT is x5 work 
 due to different architectures and seeing first-hand how it 
 goes I’m not sure we want that in our compiler yet.
Since dmd is only targeting x86/x86_64 there is really just one arch to support for now. All the others can fall back to either the interpreter or generated C code compiled into a shared lib :)

newCTFE already provides a very low-level IR that should be trivially translatable to machine code. (famous last words :o) )
Feb 13 2018
prev sibling parent reply Martin Tschierschke <mt smartdolphin.de> writes:
On Monday, 5 February 2018 at 21:27:57 UTC, H. S. Teoh wrote:
 One of my D projects for the past while has been taking 
 unusually long times to compile.  This morning, I finally 
 decided to sit down and figure out exactly why. What I found 
 was rather disturbing:

 ------
 import std.regex;
 void main() {
 	auto re = regex(``);
 }
 ------

 Compile command: time dmd -c test.d

 Output:
 ------
 real    0m3.113s
 user    0m2.884s
 sys     0m0.226s
 ------

 Comment out the call to `regex()`, and I get:

 ------
 real    0m0.285s
 user    0m0.262s
 sys     0m0.023s
 ------

 Clearly, something is wrong if the mere act of compiling a 
 regex causes a 4-line program to take *3 seconds* to compile, 
 where normally dmd takes less than a second.
Thank you for this finding! I was wondering why my little vibe.d project had started to take approximately twice as long to compile. Because of a mistake in my test setup, even my minimal program still included the file containing the regex, so even after reducing the code to a minimum, the compilation time was ~7 seconds compared to less than 4 seconds.

Would be cool if we could get fast compilation of regex. I come from scripting languages (Perl and Ruby), where I used regex a lot, so this is really disappointing for me.

Beginner question: How do I split my project so that the regex part is compiled separately as a lib and then just linked in?
Feb 08 2018
parent "Nick Sabalausky (Abscissa)" <SeeWebsiteToContactMe semitwist.com> writes:
On 02/08/2018 06:21 AM, Martin Tschierschke wrote:
 
 Beginner question:
 How to split my project, to compile the regex part separately as a lib 
 and just link them?
 
Unfortunately that depends completely on what buildsystem you're using. But if you're just calling the compiler directly, then it's really easy:

 dmd -lib -of=myLib.a [all other flags your project may need] fileYouWantInLib.d anyOtherFileYouAlsoWant.d

 dmd myLib.a [your project's usual flags, and all the rest of your .d files]

If on windows, then just replace ".a" with ".lib".
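On the source side, the file that goes into the library might look something like this sketch. The module name `myregex` and the function `isAllDigits` are made up for the example. The idea is to keep std.regex types out of the function signature and to import std.regex only inside the function body, so the rest of the project depends on a plain function rather than on regex templates.

```d
// File: myregex.d (hypothetical module name)
module myregex;

/// Returns true if `text` consists only of decimal digits.
/// No std.regex type appears in the signature, so callers never need the import.
bool isAllDigits(const(char)[] text)
{
    // Function-local import: std.regex is only referenced inside this body.
    import std.regex : matchFirst, regex;

    // Compiled on every call for simplicity; cache it if this is a hot path.
    auto re = regex(`^[0-9]+$`);
    return !matchFirst(text, re).empty;
}

unittest
{
    assert(isAllDigits("12345"));
    assert(!isAllDigits("12a45"));
}
```

The other files then just `import myregex;` and call `isAllDigits`. How much compile time this saves in practice is an assumption worth measuring: the compiler may still look into the function body in some configurations (inlining, for example), in which case a generated interface file (`dmd -H`) that omits the body is one way to make the separation explicit.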
Feb 08 2018