www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Use SIMD to accelerate comment lexing

reply Walter Bright <newshound2 digitalmars.com> writes:
https://issues.dlang.org/show_bug.cgi?id=14641

Manu, our resident god of vector instructions, do you want to take this on?
Jun 01 2015
next sibling parent reply "Brian Schott" <briancschott gmail.com> writes:
On Monday, 1 June 2015 at 19:38:59 UTC, Walter Bright wrote:
 https://issues.dlang.org/show_bug.cgi?id=14641

 Manu, our resident god of vector instructions, do you want to 
 take this on?
libdparse does this already. I added some information to that bug report that may be useful.
Jun 01 2015
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/1/2015 1:18 PM, Brian Schott wrote:
 libdparse does this already. I added some information to that bug report that
 may be useful.
Thank you!
Jun 01 2015
prev sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Monday, 1 June 2015 at 20:18:19 UTC, Brian Schott wrote:
 On Monday, 1 June 2015 at 19:38:59 UTC, Walter Bright wrote:
 https://issues.dlang.org/show_bug.cgi?id=14641

 Manu, our resident god of vector instructions, do you want to 
 take this on?
libdparse does this already. I added some information to that bug report that may be useful.
Honestly, I'm quite impressed with what I've heard you say about the lengths that you've gone to in order to optimize your lexer and parser. _Very_ cool. And all the better if dmd gets some of the same optimizations. - Jonathan M Davis
Jun 01 2015
prev sibling next sibling parent "luminousone" <rd.hunt gmail.com> writes:
On Monday, 1 June 2015 at 19:38:59 UTC, Walter Bright wrote:
 https://issues.dlang.org/show_bug.cgi?id=14641

 Manu, our resident god of vector instructions, do you want to 
 take this on?
Looking at that code, I would think that some well placed prefetch and Non Temporal move intrinsic's, would do more good then anything else. CPU speculative read ahead is not so smart as one would think it otta be.
Jun 01 2015
prev sibling parent reply Manu via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 2 June 2015 at 05:39, Walter Bright via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 https://issues.dlang.org/show_bug.cgi?id=14641

 Manu, our resident god of vector instructions, do you want to take this on?
How do you measure this? Is there a convenient setup that will produce a realistic test environment? This is more awkward in C than in D, it needs a different implementation for each compiler... will DMD's CI build with all common C compilers to prove that the implementations are correct?
Jun 02 2015
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/2/2015 5:27 AM, Manu via Digitalmars-d wrote:
 On 2 June 2015 at 05:39, Walter Bright via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 https://issues.dlang.org/show_bug.cgi?id=14641

 Manu, our resident god of vector instructions, do you want to take this on?
How do you measure this?
Time how long the compiler takes when running a non-optimized build.
 Is there a convenient setup that will produce
 a realistic test environment?
Compile Phobos.
 This is more awkward in C than in D, it needs a different
 implementation for each compiler... will DMD's CI build with all
 common C compilers to prove that the implementations are correct?
Just make it work with one compiler on one platform, the most convenient one. We can extend it to others later.
Jun 02 2015
parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Tuesday, 2 June 2015 at 15:08:07 UTC, Walter Bright wrote:
 Just make it work with one compiler on one platform, the most 
 convenient one. We can extend it to others later.
Plus, within a few months, we may have switched over to ddmd (hopefully), in which case, you can just do it the D way. - Jonathan M Davis
Jun 02 2015
parent reply Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 2 June 2015 at 19:11, Jonathan M Davis via Digitalmars-d <
digitalmars-d puremagic.com> wrote:

 On Tuesday, 2 June 2015 at 15:08:07 UTC, Walter Bright wrote:

 Just make it work with one compiler on one platform, the most convenient
 one. We can extend it to others later.
Plus, within a few months, we may have switched over to ddmd (hopefully), in which case, you can just do it the D way.
And what would the D way be?
Jun 02 2015
parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Tuesday, 2 June 2015 at 17:24:09 UTC, Iain Buclaw wrote:
 On 2 June 2015 at 19:11, Jonathan M Davis via Digitalmars-d <
 digitalmars-d puremagic.com> wrote:

 On Tuesday, 2 June 2015 at 15:08:07 UTC, Walter Bright wrote:

 Just make it work with one compiler on one platform, the most 
 convenient
 one. We can extend it to others later.
Plus, within a few months, we may have switched over to ddmd (hopefully), in which case, you can just do it the D way.
And what would the D way be?
We have simd in the language, so we can use that, whereas Manu was talking about how he would have to do it differently in C/C++. - Jonathan M Davis
Jun 02 2015
parent reply Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 2 June 2015 at 19:25, Jonathan M Davis via Digitalmars-d <
digitalmars-d puremagic.com> wrote:

 On Tuesday, 2 June 2015 at 17:24:09 UTC, Iain Buclaw wrote:

 On 2 June 2015 at 19:11, Jonathan M Davis via Digitalmars-d <
 digitalmars-d puremagic.com> wrote:

  On Tuesday, 2 June 2015 at 15:08:07 UTC, Walter Bright wrote:
  Just make it work with one compiler on one platform, the most convenient
 one. We can extend it to others later.
Plus, within a few months, we may have switched over to ddmd (hopefully), in which case, you can just do it the D way. And what would the D way be?
We have simd in the language, so we can use that, whereas Manu was talking about how he would have to do it differently in C/C++. - Jonathan M Davis
I was being deliberately quizzical because there are different takes on what you would call simd in the language, what set of types are available to you, what intrinsics are exposed (and how they are exposed), etc.
Jun 02 2015
parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Tuesday, 2 June 2015 at 17:54:38 UTC, Iain Buclaw wrote:
 I was being deliberately quizzical because there are different 
 takes on
 what you would call simd in the language, what set of types are 
 available
 to you, what intrinsics are exposed (and how they are exposed), 
 etc.
Well, Manu would know that a lot better than I would. I hadn't even heard of SIMD before he got Walter to put it into the language, and I have yet to use it, though I do think that I need to look into it at some point to see where I could take advantage of it. - Jonathan M Davis
Jun 02 2015
parent reply "weaselcat" <weaselcat gmail.com> writes:
On Tuesday, 2 June 2015 at 18:20:51 UTC, Jonathan M Davis wrote:
 On Tuesday, 2 June 2015 at 17:54:38 UTC, Iain Buclaw wrote:
 I was being deliberately quizzical because there are different 
 takes on
 what you would call simd in the language, what set of types 
 are available
 to you, what intrinsics are exposed (and how they are 
 exposed), etc.
Well, Manu would know that a lot better than I would. I hadn't even heard of SIMD before he got Walter to put it into the language, and I have yet to use it, though I do think that I need to look into it at some point to see where I could take advantage of it. - Jonathan M Davis
D's simd library is difficult to use in comparison with gcc or clang's extensions to C/C++. bye,
Jun 02 2015
parent reply Manu via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 3 June 2015 at 07:18, weaselcat via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On Tuesday, 2 June 2015 at 18:20:51 UTC, Jonathan M Davis wrote:
 On Tuesday, 2 June 2015 at 17:54:38 UTC, Iain Buclaw wrote:
 I was being deliberately quizzical because there are different takes on
 what you would call simd in the language, what set of types are available
 to you, what intrinsics are exposed (and how they are exposed), etc.
Well, Manu would know that a lot better than I would. I hadn't even heard of SIMD before he got Walter to put it into the language, and I have yet to use it, though I do think that I need to look into it at some point to see where I could take advantage of it. - Jonathan M Davis
D's simd library is difficult to use in comparison with gcc or clang's extensions to C/C++. bye,
I'll wear responsibility for this, but std.simd is proving really hard for me to finish. I think in order to get something in there to start with, I need to reduce the scope to the simplest bits, get them in, then build outwards. It's fairly large to cover everything I think is important, and there's a few tools missing still; I can't finish without some way to know the SIMD flags fed to the compiler from the command line (some standard versions?), and it's also difficult to resolve without forceinline of some sort. I've reached situations where the compiler(/s) just don't do what I want it to. Also, I think the stack of simd function influence the compilers inline heuristics, and even thought the compiler decides to inline simd functions, presence of many of them in an outer function seems to reduce the probability that the outer function will be inlined as it should. forceinline needs to be a hard statement to the compiler, and ideally, a forceinline call tree shouldn't improperly influence the compilers inline decisions for outer functions. As an aside, I need a test environment for each compiler, targetting x86, x64 and arm at least, where I can submit some code, and have it run the unittests on a matrix of appropriate targets. (ideally PPC and MIPS would also be included, so they can influence design decisions.) Does any such test system exist? A web service to provide this would be invaluable... I don't have all those systems available to me.
Jun 02 2015
next sibling parent Johannes Pfau <nospam example.com> writes:
Am Wed, 3 Jun 2015 09:08:52 +1000
schrieb Manu via Digitalmars-d <digitalmars-d puremagic.com>:

 As an aside, I need a test environment for each compiler, targetting
 x86, x64 and arm at least, where I can submit some code, and have it
 run the unittests on a matrix of appropriate targets. (ideally PPC and
 MIPS would also be included, so they can influence design decisions.)
 Does any such test system exist? A web service to provide this would
 be invaluable... I don't have all those systems available to me.
There are qemu images for most architectures supported by debian: https://people.debian.org/~aurel32/qemu/ You'll need a very recent qemu but it works better than expected. (I hope to utilize these for cross compiler testing in the future. But someone first needs to implement cross compiler testing in the dmd test suite...)
Jun 02 2015
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2015-06-03 01:08, Manu via Digitalmars-d wrote:

 It's fairly large to cover everything I think is important, and
 there's a few tools missing still; I can't finish without some way to
 know the SIMD flags fed to the compiler from the command line (some
 standard versions?), and it's also difficult to resolve without
 forceinline of some sort.
Isn't it possible to proceed without forceinline, to be able to finish the functionality. I understand that you think it's useless for performance reasons, but is it enough to get the functionality correct?
 As an aside, I need a test environment for each compiler, targetting
 x86, x64 and arm at least, where I can submit some code, and have it
 run the unittests on a matrix of appropriate targets. (ideally PPC and
 MIPS would also be included, so they can influence design decisions.)
 Does any such test system exist? A web service to provide this would
 be invaluable... I don't have all those systems available to me.
Travis CI [1] can be used for x86-64, Linux and OS X. There's also a service that uses Windows for their hosts, but I can't remember the name right now. [1] https://travis-ci.org -- /Jacob Carlborg
Jun 03 2015
next sibling parent Manu via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 3 June 2015 at 17:50, Jacob Carlborg via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On 2015-06-03 01:08, Manu via Digitalmars-d wrote:

 It's fairly large to cover everything I think is important, and
 there's a few tools missing still; I can't finish without some way to
 know the SIMD flags fed to the compiler from the command line (some
 standard versions?), and it's also difficult to resolve without
 forceinline of some sort.
Isn't it possible to proceed without forceinline, to be able to finish the functionality. I understand that you think it's useless for performance reasons, but is it enough to get the functionality correct?
The codegen is everything. Functionality is the easy part here ;) Most things are already correct, but I can't confidently proof out the codegen. The main blocker though is that I don't know what simd level the user requested on the command line. The library has no idea what hardware features to target without explicit statement by the user.
 As an aside, I need a test environment for each compiler, targetting
 x86, x64 and arm at least, where I can submit some code, and have it
 run the unittests on a matrix of appropriate targets. (ideally PPC and
 MIPS would also be included, so they can influence design decisions.)
 Does any such test system exist? A web service to provide this would
 be invaluable... I don't have all those systems available to me.
Travis CI [1] can be used for x86-64, Linux and OS X. There's also a service that uses Windows for their hosts, but I can't remember the name right now.
I use travis. I was thinking smething more d-specific, along the lines of DPaste...
Jun 03 2015
prev sibling parent Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 3 June 2015 at 11:28, Manu via Digitalmars-d <digitalmars-d puremagic.com
 wrote:
 On 3 June 2015 at 17:50, Jacob Carlborg via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 On 2015-06-03 01:08, Manu via Digitalmars-d wrote:

 It's fairly large to cover everything I think is important, and
 there's a few tools missing still; I can't finish without some way to
 know the SIMD flags fed to the compiler from the command line (some
 standard versions?), and it's also difficult to resolve without
 forceinline of some sort.
Isn't it possible to proceed without forceinline, to be able to finish
the
 functionality. I understand that you think it's useless for performance
 reasons, but is it enough to get the functionality correct?
The codegen is everything. Functionality is the easy part here ;) Most things are already correct, but I can't confidently proof out the codegen. The main blocker though is that I don't know what simd level the user requested on the command line. The library has no idea what hardware features to target without explicit statement by the user.
Well, the compiler knows whether the types are supported natively at least, and you can probe this information using CTFE.
Jun 03 2015
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/2/2015 4:08 PM, Manu via Digitalmars-d wrote:
 I'll wear responsibility for this, but std.simd is proving really hard
 for me to finish.
 I think in order to get something in there to start with, I need to
 reduce the scope to the simplest bits, get them in, then build
 outwards.
 It's fairly large to cover everything I think is important, and
 there's a few tools missing still; I can't finish without some way to
 know the SIMD flags fed to the compiler from the command line (some
 standard versions?), and it's also difficult to resolve without
 forceinline of some sort. I've reached situations where the
 compiler(/s) just don't do what I want it to. Also, I think the stack
 of simd function influence the compilers inline heuristics, and even
 thought the compiler decides to inline simd functions, presence of
 many of them in an outer function seems to reduce the probability that
 the outer function will be inlined as it should. forceinline needs to
 be a hard statement to the compiler, and ideally, a forceinline call
 tree shouldn't improperly influence the compilers inline decisions for
 outer functions.
I suggest not worrying about forceinline for the moment, and just write the code as if it existed.
 As an aside, I need a test environment for each compiler, targetting
 x86, x64 and arm at least, where I can submit some code, and have it
 run the unittests on a matrix of appropriate targets. (ideally PPC and
 MIPS would also be included, so they can influence design decisions.)
 Does any such test system exist? A web service to provide this would
 be invaluable... I don't have all those systems available to me.
Just make it work on the machine you have, and prove it out on that machine. Then worry about porting it.
Jun 03 2015
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/2/15 5:27 AM, Manu via Digitalmars-d wrote:
 On 2 June 2015 at 05:39, Walter Bright via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 https://issues.dlang.org/show_bug.cgi?id=14641

 Manu, our resident god of vector instructions, do you want to take this on?
How do you measure this? Is there a convenient setup that will produce a realistic test environment?
Probably building Phobos is a good baseline. If simple experiments show measurable speedup with Phobos, it's likely to be worth the effort. -- Andrei
Jun 02 2015
prev sibling parent reply "deadalnix" <deadalnix gmail.com> writes:
On Tuesday, 2 June 2015 at 12:27:38 UTC, Manu wrote:
 On 2 June 2015 at 05:39, Walter Bright via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 https://issues.dlang.org/show_bug.cgi?id=14641

 Manu, our resident god of vector instructions, do you want to 
 take this on?
How do you measure this? Is there a convenient setup that will produce a realistic test environment? This is more awkward in C than in D, it needs a different implementation for each compiler... will DMD's CI build with all common C compilers to prove that the implementations are correct?
Well, I discussed that with clang people a while ago and here are how they do it and their measurement : You go though character and look for a '/'. When you hit one, you check if the character before it is a *, and if so, you have the end of the comment. There is obviously various edges cases to take into account, but that is the general idea. You can find the code in Lexer::SkipBlockComment in clang/lib/Lex/Lexer.cpp Various benchmark on their side have shown that alignment is desirable before having vector operations to kick in. They used to have an AVX implementation, but it seems to be gone now, I'm not sure why at this stage.
Jun 02 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/2/2015 5:45 PM, deadalnix wrote:
 Well, I discussed that with clang people a while ago and here are how they do
it
 and their measurement :

 You go though character and look for a '/'. When you hit one, you check if the
 character before it is a *, and if so, you have the end of the comment. There
is
 obviously various edges cases to take into account, but that is the general
idea.

 You can find the code in Lexer::SkipBlockComment in clang/lib/Lex/Lexer.cpp

 Various benchmark on their side have shown that alignment is desirable before
 having vector operations to kick in. They used to have an AVX implementation,
 but it seems to be gone now, I'm not sure why at this stage.
Line numbers have to be kept track of as well.
Jun 03 2015
parent reply "deadalnix" <deadalnix gmail.com> writes:
On Wednesday, 3 June 2015 at 22:50:52 UTC, Walter Bright wrote:
 On 6/2/2015 5:45 PM, deadalnix wrote:
 Well, I discussed that with clang people a while ago and here 
 are how they do it
 and their measurement :

 You go though character and look for a '/'. When you hit one, 
 you check if the
 character before it is a *, and if so, you have the end of the 
 comment. There is
 obviously various edges cases to take into account, but that 
 is the general idea.

 You can find the code in Lexer::SkipBlockComment in 
 clang/lib/Lex/Lexer.cpp

 Various benchmark on their side have shown that alignment is 
 desirable before
 having vector operations to kick in. They used to have an AVX 
 implementation,
 but it seems to be gone now, I'm not sure why at this stage.
Line numbers have to be kept track of as well.
They retrieve line number lazily when needed, with various mechanism to speedup the lookup.
Jun 03 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/3/2015 7:05 PM, deadalnix wrote:
 On Wednesday, 3 June 2015 at 22:50:52 UTC, Walter Bright wrote:
 On 6/2/2015 5:45 PM, deadalnix wrote:
 You go though character and look for a '/'. When you hit one, you check if the
 character before it is a *, and if so, you have the end of the comment. There
is
 obviously various edges cases to take into account, but that is the general
 idea.
Line numbers have to be kept track of as well.
They retrieve line number lazily when needed, with various mechanism to speedup the lookup.
Hmm. There's no way to get the line number without counting LFs, and that means searching for them.
Jun 04 2015
next sibling parent reply "Brian Schott" <briancschott gmail.com> writes:
On Thursday, 4 June 2015 at 18:39:02 UTC, Walter Bright wrote:
 Hmm. There's no way to get the line number without counting 
 LFs, and that means searching for them.
It would be nice if it was that simple. EndOfLine: \u000D \u000A \u000D \u000A \u2028 \u2029 EndOfFile
Jun 04 2015
parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/4/2015 1:44 PM, Brian Schott wrote:
 It would be nice if it was that simple.

 EndOfLine:
      \u000D
      \u000A
      \u000D \u000A
      \u2028
      \u2029
      EndOfFile
Yeah, you're right
Jun 04 2015
prev sibling parent reply "deadalnix" <deadalnix gmail.com> writes:
On Thursday, 4 June 2015 at 18:39:02 UTC, Walter Bright wrote:
 On 6/3/2015 7:05 PM, deadalnix wrote:
 On Wednesday, 3 June 2015 at 22:50:52 UTC, Walter Bright wrote:
 On 6/2/2015 5:45 PM, deadalnix wrote:
 You go though character and look for a '/'. When you hit 
 one, you check if the
 character before it is a *, and if so, you have the end of 
 the comment. There is
 obviously various edges cases to take into account, but that 
 is the general
 idea.
Line numbers have to be kept track of as well.
They retrieve line number lazily when needed, with various mechanism to speedup the lookup.
Hmm. There's no way to get the line number without counting LFs, and that means searching for them.
Yes, the first time you query file number, clang build metadata about new line by going through the file's content and finding position of new lines. The process uses vector operation as well. Apparently, they think it is better to do that way for various reasons: - Position tracking is more compact (and position is embedded in all expression, declaration, and more) which reduce memory footprint bu quite a lot. - It makes the lexer simpler and faster. - You don't need to track new lines if you don't use them. If you don't emit debug infos in C++, and have no error, most line number are not used (not sure in D, because various language facilities like bound checking uses line number, but that is a win in C++). - Debug emission have some predictable access pattern, and algorithm to find line number from an offset in the file are special cased to handle it. - Finding new line can be vectorized on the whole file. t cannot be vectorized when done in // with lexing. Once again, I'm not sure this is a win in D, because we need line number more than in C++, but it seems to be a win in C++.
Jun 04 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/4/2015 2:44 PM, deadalnix wrote:
 On Thursday, 4 June 2015 at 18:39:02 UTC, Walter Bright wrote:
 On 6/3/2015 7:05 PM, deadalnix wrote:
 On Wednesday, 3 June 2015 at 22:50:52 UTC, Walter Bright wrote:
 On 6/2/2015 5:45 PM, deadalnix wrote:
 You go though character and look for a '/'. When you hit one, you check if the
 character before it is a *, and if so, you have the end of the comment.
 There is
 obviously various edges cases to take into account, but that is the general
 idea.
Line numbers have to be kept track of as well.
They retrieve line number lazily when needed, with various mechanism to speedup the lookup.
Hmm. There's no way to get the line number without counting LFs, and that means searching for them.
Yes, the first time you query file number, clang build metadata about new line by going through the file's content and finding position of new lines. The process uses vector operation as well. Apparently, they think it is better to do that way for various reasons: - Position tracking is more compact (and position is embedded in all expression, declaration, and more) which reduce memory footprint bu quite a lot. - It makes the lexer simpler and faster. - You don't need to track new lines if you don't use them. If you don't emit debug infos in C++, and have no error, most line number are not used (not sure in D, because various language facilities like bound checking uses line number, but that is a win in C++). - Debug emission have some predictable access pattern, and algorithm to find line number from an offset in the file are special cased to handle it. - Finding new line can be vectorized on the whole file. t cannot be vectorized when done in // with lexing. Once again, I'm not sure this is a win in D, because we need line number more than in C++, but it seems to be a win in C++.
It's an interesting approach. I generally shoot for making the debug builds the fastest, because that's when people are in the edit-compile-debug loop. And the debug output needs line numbers :-)
Jun 04 2015
parent "deadalnix" <deadalnix gmail.com> writes:
On Friday, 5 June 2015 at 00:30:44 UTC, Walter Bright wrote:
 It's an interesting approach. I generally shoot for making the 
 debug builds the fastest, because that's when people are in the 
 edit-compile-debug loop. And the debug output needs line 
 numbers :-)
In C++, you would not need line number for header, even in debug mode, unless they contains various template and/or implementation, which I assume would be false in many cases.
Jun 04 2015