digitalmars.D.learn - Want to help DMD bugfixing? Write a simple utility.

Don <nospam nospam.com> writes:
Here's the task:
Given a .d source file, strip out all of the unittest {} blocks,
including everything inside them.
Strip out all comments as well.
Print out the resulting file.

Motivation: Bug reports frequently come with very large test cases.
Even ones which look small often import from Phobos.
Reducing the test case is the first step in fixing the bug, and it's 
frequently ~30% of the total time required. Stripping out the unit tests 
is the most time-consuming and error-prone part of reducing the test case.

This should be a good task if you're relatively new to D but would like 
to do something really useful.
-Don
Mar 19 2011
David Nadlinger <see klickverbot.at> writes:
I realize that you asked for a very specific utility, but in several instances, http://delta.tigris.org/ worked fine for me for reducing large test cases. Parts of it are tailored to C/C++ though, so a port/adaptation for D would be a nice project as well.

David
Mar 19 2011
Jonathan M Davis <jmdavisProg gmx.com> writes:
Unfortunately, to do that 100% correctly, you need to actually have a working D lexer (and possibly parser). You might be able to get something close enough to work in most cases, but it doesn't take all that much to throw off a basic implementation of this sort of thing if you don't lex/parse it with something which properly understands D.

- Jonathan M Davis
Mar 19 2011
Don <nospam nospam.com> writes:
I didn't say it needs 100% accuracy. You can assume, for example, that "unittest" always occurs at the start of a line. The only other things you need to lex are {}, string literals, and comments.

BTW, the immediate motivation for this is std.datetime in Phobos. The sheer number of unittests in there is an absolute catastrophe for tracking down bugs. It makes a tool like this MANDATORY.
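A minimal sketch of the simplified lexing described above, written in Python rather than D purely for brevity; a D port could follow the same structure. The function names are hypothetical. It assumes that "unittest" starts a line (optionally indented), and it only understands //, /* */ and nesting /+ +/ comments, "..." strings with backslash escapes, r"..." and `...` raw strings, and character literals. Delimited strings (q"...") and token strings (q{...}) are deliberately not handled.

```python
import re

# Assumption from the thread: "unittest" begins a line (indentation allowed).
_UNITTEST = re.compile(r"[ \t]*unittest\b")


def scan_token(src, i):
    """Return (kind, end) for the lexical item starting at src[i].

    kind is 'comment', 'literal' or 'other'; end is the index just past it.
    """
    n = len(src)
    two = src[i:i + 2]
    if two == "//":
        end = src.find("\n", i)
        return "comment", (n if end == -1 else end)  # the newline is kept
    if two == "/*":
        end = src.find("*/", i + 2)
        return "comment", (n if end == -1 else end + 2)
    if two == "/+":  # nesting comment: track depth
        depth, j = 1, i + 2
        while j < n and depth:
            if src.startswith("/+", j):
                depth, j = depth + 1, j + 2
            elif src.startswith("+/", j):
                depth, j = depth - 1, j + 2
            else:
                j += 1
        return "comment", j
    if two == 'r"' or src[i] == "`":  # raw strings: no escapes inside
        quote = '"' if two == 'r"' else "`"
        start = i + 2 if two == 'r"' else i + 1
        end = src.find(quote, start)
        return "literal", (n if end == -1 else end + 1)
    if src[i] in "\"'":  # ordinary string or character literal
        quote, j = src[i], i + 1
        while j < n and src[j] != quote:
            j += 2 if src[j] == "\\" else 1
        return "literal", min(j + 1, n)
    return "other", i + 1


def skip_block(src, i):
    """Return the index just past the next balanced {...} block after i."""
    depth, n = 0, len(src)
    while i < n:
        kind, end = scan_token(src, i)
        if kind == "other":
            if src[i] == "{":
                depth += 1
            elif src[i] == "}":
                depth -= 1
                if depth == 0:
                    return end
        i = end
    return n


def strip_unittests_and_comments(src):
    """Drop comments and line-initial unittest blocks; keep everything else."""
    out, i, n = [], 0, len(src)
    at_line_start = True
    while i < n:
        if at_line_start:
            m = _UNITTEST.match(src, i)
            if m:
                i = skip_block(src, m.end())
                at_line_start = False
                continue
        kind, end = scan_token(src, i)
        if kind != "comment":
            out.append(src[i:end])
        at_line_start = src[end - 1] == "\n"
        i = end
    return "".join(out)
```

Because strings and comments are consumed as whole tokens, a brace or the word "unittest" inside them is never mistaken for code. Blank lines left behind by removed blocks are not cleaned up in this sketch.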
Mar 19 2011
Jonathan M Davis <jmdavisProg gmx.com> writes:
I tried to create a similar tool before and gave up because I couldn't make it 100% accurate and was running into problems with it. If someone wants to take a shot at it though, that's fine.

As for the unit tests in std.datetime making it hard to track down bugs, that only makes sense to me if you're trying to look at the whole thing at once and track down a compiler bug which happens _somewhere_ in the code, but you don't know where. Other than a problem like that, I don't really see how the unit tests get in the way of tracking down bugs. Is it that you need to compile in a version of std.datetime which doesn't have any unit tests compiled in but you still need to compile with -unittest for other stuff?

I _am_ working on streamlining the unit tests in std.datetime so that they take up fewer lines of code without reducing how well they cover the code, so depending on your problem with the amount of unit test code, that could help, but I expect that whatever your core problem with the number of unit tests is, that won't fix it.

- Jonathan M Davis
Mar 19 2011
Don <nospam nospam.com> writes:
No. All you know is that there's a bug that's being triggered somewhere in Phobos (with -unittest). It's probably not in std.datetime. But Phobos is a horrible ball of mud where everything imports everything else, and std.datetime is near the centre of that ball.

What you have to do is reduce the amount of code, and especially the number of modules, as rapidly as possible; this means getting rid of imports. To do this, you need to remove large chunks of code from the files. This is pretty simple: comment out half of the file, and if the bug still reproduces, delete that half. Normally this works well because typically only about a dozen lines are actually being used. After doing this about three or four times the file is small enough that you can usually get rid of most of the imports.

Unittests foul this up because they use functions/classes from inside the file. In the case of std.datetime it's even worse because the signal-to-noise ratio is so incredibly poor; it's really difficult to find the few lines of code that are actually being used by other Phobos modules.

My experience (obviously only over the last month or so) has been that if the reduction of a bug is non-obvious, more than 10% of the total time taken to fix that bug is the time taken to cut down std.datetime.
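The "comment out half, keep the deletion if the bug is still there" loop can be sketched roughly as follows. This is an illustration in Python, not an existing tool: `reduce_lines` and the `still_fails` predicate are hypothetical names. In practice the predicate would write the candidate to disk, run the compiler, and check for the same error message; a candidate that no longer compiles for an unrelated reason simply counts as "bug not reproduced", so that chunk is kept.

```python
from typing import Callable, List


def reduce_lines(lines: List[str],
                 still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Try deleting ever smaller chunks of lines, keeping a deletion
    only when still_fails() says the bug is still reproduced."""
    chunk = max(len(lines) // 2, 1)
    while chunk >= 1:
        i = 0
        while i < len(lines):
            candidate = lines[:i] + lines[i + chunk:]
            if still_fails(candidate):
                lines = candidate   # chunk was irrelevant; retry at same index
            else:
                i += chunk          # chunk is needed; move past it
        chunk //= 2
    return lines
```

Deleting by raw line ranges will often cut a declaration in half, which is exactly why unittest blocks that reference the rest of the file slow this process down: many candidates stop compiling and have to be rejected.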
Mar 19 2011
Jonathan M Davis <jmdavisProg gmx.com> writes:
Hmmm. I really don't know what could be done to fix that (other than making it easier to rip out the unittest blocks). And enough of std.datetime depends on other parts of std.datetime that trimming it down isn't (and can't be) exactly easy. In general, SysTime is the most likely type to be used, and it depends on Date, TimeOfDay, and DateTime, and all 4 of those depend on most of the free functions in the module. It's not exactly designed in a manner which allows you to cut out large chunks and still have it compile. And I don't think that it _could_ be designed that way and still have the functionality that it has.

I guess that this sort of problem is one that would pop up mainly when dealing with compiler bugs. I have a hard time seeing it popping up with your typical bug in Phobos itself. So, I guess that this is the sort of thing that you'd run into and I likely wouldn't.

I really don't know how the situation could be improved though other than making it easier to cut out the unit tests.

- Jonathan M Davis
Mar 20 2011
Don <nospam nospam.com> writes:
The problem is purely the large fraction of the module which is devoted to unit tests. That's all.
 
 I guess that this sort of problem is one that would pop up mainly when dealing 
 with compiler bugs. I have a hard time seeing it popping up with your typical 
 bug in Phobos itself. So, I guess that this is the sort of thing that you'd 
 run into and I likely wouldn't.
Yes.
 I really don't know how the situation could be improved though other than 
 making it easier to cut out the unit tests.
 
 - Jonathan M Davis
Hence the motivation for this utility. The problem exists in all modules, but in std.datetime it's such an obvious time-waster that I can't keep ignoring it.
Mar 20 2011
Jonathan M Davis <jmdavisProg gmx.com> writes:
Well, for the moment at least, if you remove the

    version = testStdDateTime;
    version = enableWindowsTest;

lines near the top of the file, then pretty much all of the unittest blocks will no longer be compiled in (there might be a couple which are still compiled in, but not many). So, that could help you until the utility that you want is done.

Unfortunately, that also means that the utility will have to be smarter if it's going to work on std.datetime. While most of the version(testStdDateTime) blocks are currently _inside_ of the unittest blocks, as I've been adjusting the unit tests, I've been changing them to

    version(testStdDateTime) unittest

because Andrei didn't like the extra vertical space used up by having separate blocks for the unittest and for the version. So, for instance, if the utility assumed that unittest was the first part of the line for a unittest block, it wouldn't work on std.datetime (IIRC, std.algorithm would have similar problems).

- Jonathan M Davis
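One way a tool could cope with the `version(testStdDateTime) unittest` form is to allow an optional version(...) prefix before the keyword. The following is only a heuristic sketch in Python with hypothetical names (`UNITTEST_HEAD`, `unittest_header_lines`); it works on raw text, so it does not know about strings or comments, and it ignores other prefixes such as attributes.

```python
import re
from typing import List

# A line-initial "unittest", optionally preceded by one or more
# version(identifier) conditions (possibly on the previous line).
UNITTEST_HEAD = re.compile(
    r"^[ \t]*(?:version\s*\(\s*\w+\s*\)\s*)*unittest\b",
    re.MULTILINE,
)


def unittest_header_lines(src: str) -> List[int]:
    """Return the 1-based line numbers where a unittest header starts."""
    return [src.count("\n", 0, m.start()) + 1
            for m in UNITTEST_HEAD.finditer(src)]
```

Note that `version(unittest) import std.stdio;` is not treated as a header, because the keyword inside the parentheses is consumed as the version identifier and no `unittest` keyword follows it.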
Mar 20 2011
"Regan Heath" <regan netmail.co.nz> writes:
I was just thinking: if we get a list of the symbols the linker is including, then write an app to take that list and strip everything else out of the source, would that work?

The questions are how hard it is to get the symbols from the linker, and then how hard it is to match those to source. IIRC there are functions in Phobos to convert to/from symbol names, so if the app had sufficient lexing and parsing capability it could match on those.

R
Mar 23 2011
Jonathan M Davis <jmdavisProg gmx.com> writes:
I was just thinking .. if we get a list of the symbols the linker is including, then write an app to take that list, and strip everything else out of the source .. would that work. The Q's are how hard is it to get the symbols from the linker and then how hard is it to match those to source. IIRC there are functions in phobos to convert to/from symbol names, so if the app had sufficient lexing and parsing capability it could match on those.
That would require a full-blown D lexer and parser. - Jonathan M Davis
Mar 23 2011
next sibling parent "Regan Heath" <regan netmail.co.nz> writes:
On Wed, 23 Mar 2011 15:16:46 -0000, Jonathan M Davis <jmdavisProg gmx.com>  
wrote:
 I was just thinking .. if we get a list of the symbols the linker is
 including, then write an app to take that list, and strip everything  
 else
 out of the source .. would that work.  The Q's are how hard is it to get
 the symbols from the linker and then how hard is it to match those to
 source.  IIRC there are functions in phobos to convert to/from symbol
 names, so if the app had sufficient lexing and parsing capability it  
 could
 match on those.
That would require a full-blown D lexer and parser. - Jonathan M Davis
Yeah, I thought as much. I wonder if the new guy "Ilya" who just posted on digitalmars.D would find this interesting.. -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Mar 23 2011
prev sibling parent reply Kai Meyer <kai unixlords.com> writes:
On 03/23/2011 09:16 AM, Jonathan M Davis wrote:
I was just thinking .. if we get a list of the symbols the linker is including, then write an app to take that list, and strip everything else out of the source .. would that work. The Q's are how hard is it to get the symbols from the linker and then how hard is it to match those to source. IIRC there are functions in phobos to convert to/from symbol names, so if the app had sufficient lexing and parsing capability it could match on those.
That would require a full-blown D lexer and parser. - Jonathan M Davis
Why are we talking about having to recreate a full-blown lexer and parser when there has to be one that exists for D anyway? This is sounding more and more like you're asking the wrong crowd to solve a problem. To do it right, the people who have access to the real D lexer and parser would need to write this utility, and in some ways, it's already written, since compiling without the -unittest flag already omits all the unittests. So I'm a bit confused about two things. 1) Why ask the wrong people to write the tool in the first place? 2) Why are we the wrong people anyway? -Kai Meyer
Mar 23 2011
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
Why are we talking about having to recreate a full-blown lexer and parser when there has to be one that exists for D anyway? This is sounding more and more like you're asking the wrong crowd to solve a problem. To do it right, the people who have access to the real D lexer and parser would need to write this utility, and in some ways, it's already written, since compiling without the -unittest flag already omits all the unittests. So I'm a bit confused about two things. 1) Why ask the wrong people to write the tool in the first place? 2) Why are we the wrong people anyway?
There are tasks for which you need to be able to lex and parse D code. To 100% correctly remove unit tests would be one such task. Another would be if you want a program to be able to syntax highlight some D code.

Currently, as far as I know, there are only two lexers and two parsers for D: the C++ front end which dmd, gdc, and ldc use and the D front end which ddmd uses and which is based on the C++ front end. Both of those are under the GPL (which makes them useless for a lot of stuff) and both of them are tied to compilers. Being able to lex D code and get the list of tokens in a D program and being able to parse D code and get the resultant abstract syntax tree would be very useful for a number of programs. So, while your average program may not care about being able to lex and parse D code, there _are_ programs that do, and being able to do so in D would be highly valuable for such programs.

Previously Walter asked for a volunteer to port the lexer from the C++ front end to D under the Boost license to be put into Phobos (I volunteered for that and have been working on it off and on, slowly making progress on it). Andrei's reaction was that we should have a generic lexer which uses generic programming and is not tied to D at all, and _that_ is what someone may be working on for the GSoC (there are still solid arguments for having a D-specific lexer though, so hopefully we end up with both).

Now, for this particular problem, in order to track down certain types of compiler bugs, he needs to be able to build with -unittest but not have irrelevant code compiled in. So, for instance, if he's testing a bug related to compiling std.file with -unittest and it imported std.datetime, he would want to strip out as much of std.datetime as std.file doesn't need in order to minimize the code that he has to deal with to find the bug. std.datetime's unit tests are a prime example of code that would be unnecessary. So, he wants a tool to strip the unit tests from a file.

You can't use the compiler's lexer or parser to do that without a lot of changes. To do it 100% correctly, he needs a lexer (and possibly a parser) which can be used by a utility other than the compiler to read in a source file, strip out the unit tests, and then write out the file again. However, he's willing to settle for a utility that _mostly_ works, and you can do that without a full-blown D lexer or parser.

- Jonathan M Davis
Mar 23 2011
parent reply "Regan Heath" <regan netmail.co.nz> writes:
On Wed, 23 Mar 2011 21:16:02 -0000, Jonathan M Davis <jmdavisProg gmx.com>  
wrote:
 There are tasks for which you need to be able to lex and parse D code.  
 To 100% correctly remove unit tests would be one such task.
Is that last bit true? You definitely need to be able to lex it, but instead of actually parsing it you just count { and } and remove 'unittest' plus { plus } plus everything in between right? -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Mar 25 2011
next sibling parent reply spir <denis.spir gmail.com> writes:
On 03/25/2011 12:08 PM, Regan Heath wrote:
 On Wed, 23 Mar 2011 21:16:02 -0000, Jonathan M Davis <jmdavisProg gmx.com>
wrote:
 There are tasks for which you need to be able to lex and parse D code. To
 100% correctly remove unit tests would be one such task.
Is that last bit true? You definitely need to be able to lex it, but instead of actually parsing it you just count { and } and remove 'unittest' plus { plus } plus everything in between right?
At first sight, you're both wrong: you'd need to count { } levels. Also, I think true lexing is not really needed: you'd only need to put apart strings and comments that could hold non-code { & }. (But these are only very superficial notes.) Denis -- _________________ vita es estrany spir.wikidot.com
Mar 25 2011
parent Don <nospam nospam.com> writes:
spir wrote:
 On 03/25/2011 12:08 PM, Regan Heath wrote:
 On Wed, 23 Mar 2011 21:16:02 -0000, Jonathan M Davis 
 <jmdavisProg gmx.com> wrote:
 There are tasks for which you need to be able to lex and parse D 
 code. To
 100% correctly remove unit tests would be one such task.
Is that last bit true? You definitely need to be able to lex it, but instead of actually parsing it you just count { and } and remove 'unittest' plus { plus } plus everything in between right?
At first sight, you're both wrong: you'd need to count { } levels. Also, I think true lexing is not really needed: you'd only need to put apart strings and comments that could hold non-code { & }. (But these are only very superficial notes.) Denis
Yes, exactly: you just need to lex strings (including q{}), comments (which you remove), unittest, and count levels of {. You need to worry about backslashes in comments, but that's about it. I even did this in a CTFE function once, I know it isn't complicated. Should be possible in < 50 lines of code. I just didn't want to have to do it myself.
In fact, it would be adequate to replace:
unittest { blah... }
with:
unittest{}
Then you don't need to worry about special cases like:
version(XXX) unittest { ... }
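The approach described above could look roughly like the following. This is only a sketch under stated assumptions, not anyone's actual tool: the names `at`, `skipLexeme`, `isIdentChar`, and `stripUnittests` are invented here, every unittest body is replaced with `unittest{}`, each comment becomes a single space, and q{} token strings, r"..." strings, and delimited strings get no special handling (their contents are scanned as ordinary code).

```d
import std.array : appender;
import std.ascii : isAlphaNum;

/// True if the literal `lit` occurs in `s` at index `i`.
bool at(string s, size_t i, string lit)
{
    return i + lit.length <= s.length && s[i .. i + lit.length] == lit;
}

/// Index just past the comment, string, or character literal starting at
/// s[i]; returns i itself if none starts there.
size_t skipLexeme(string s, size_t i)
{
    if (at(s, i, "//"))
    {
        while (i < s.length && s[i] != '\n') ++i; // the newline is kept
        return i;
    }
    if (at(s, i, "/*"))
    {
        i += 2;
        while (i < s.length && !at(s, i, "*/")) ++i;
        return i < s.length ? i + 2 : i;
    }
    if (at(s, i, "/+")) // nesting comment: count levels
    {
        int depth = 0;
        while (i < s.length)
        {
            if (at(s, i, "/+")) { ++depth; i += 2; }
            else if (at(s, i, "+/")) { i += 2; if (--depth == 0) break; }
            else ++i;
        }
        return i;
    }
    if (s[i] == '"' || s[i] == '\'' || s[i] == '`')
    {
        immutable quote = s[i++];
        while (i < s.length && s[i] != quote)
        {
            // Backslash escapes apply to "..." and '...', not to `...`.
            if (s[i] == '\\' && quote != '`' && i + 1 < s.length) ++i;
            ++i;
        }
        return i < s.length ? i + 1 : i;
    }
    return i;
}

bool isIdentChar(char c) { return isAlphaNum(c) || c == '_'; }

/// Removes comments and empties every unittest block in `src`.
string stripUnittests(string src)
{
    enum kw = "unittest";
    auto result = appender!string();
    size_t i = 0;
    while (i < src.length)
    {
        immutable end = skipLexeme(src, i);
        if (end > i)
        {
            // A lexeme starting with '/' is a comment: emit one space so
            // neighbouring tokens stay apart. Literals are copied as-is.
            if (src[i] == '/') result.put(' ');
            else result.put(src[i .. end]);
            i = end;
        }
        else if (at(src, i, kw)
            && (i == 0 || !isIdentChar(src[i - 1]))
            && (i + kw.length == src.length || !isIdentChar(src[i + kw.length])))
        {
            result.put("unittest{}");
            i += kw.length;
            int depth = 0;
            while (i < src.length) // skip to the matching closing brace
            {
                immutable e = skipLexeme(src, i);
                if (e > i) { i = e; continue; }
                immutable c = src[i++];
                if (c == '{') ++depth;
                else if (c == '}' && --depth == 0) break;
            }
        }
        else
        {
            result.put(src[i++]);
        }
    }
    return result.data;
}
```

To use it as the utility requested at the top of the thread, something like `write(stripUnittests(readText(args[1])))` with `std.file.readText` and `std.stdio.write` would print the stripped file. Because the keyword itself is kept, a `version(XXX) unittest { ... }` declaration still parses afterwards.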
Mar 25 2011
prev sibling parent "Nick Sabalausky" <a a.a> writes:
"Regan Heath" <regan netmail.co.nz> wrote in message 
news:op.vswbv8qj54xghj puck.auriga.bhead.co.uk...
 On Wed, 23 Mar 2011 21:16:02 -0000, Jonathan M Davis <jmdavisProg gmx.com> 
 wrote:
 There are tasks for which you need to be able to lex and parse D code. 
 To 100% correctly remove unit tests would be one such task.
Is that last bit true? You definitely need to be able to lex it, but instead of actually parsing it you just count { and } and remove 'unittest' plus { plus } plus everything in between right?
No, to do it 100% reliably, you do need lexing/parsing, and also the semantics stage. Example:
string makeATest(string str)
{
    return "unit"~"test { "~str~" }";
}
mixin(makeATest(q{
    // Do tests
}));
Mar 25 2011
prev sibling next sibling parent reply Alexey Prokhin <alexey.prokhin yandex.ru> writes:
 Currently, as far as I know, there are only two lexers and two parsers for
 D: the C++ front end which dmd, gdc, and ldc use and the D front end which
 ddmd uses and which is based on the C++ front end. Both of those are under
 the GPL (which makes them useless for a lot of stuff) and both of them are
 tied to compilers. Being able to lex D code and get the list of tokens in
 a D program and being able to parse D code and get the resultant abstract
 syntax tree would be very useful for a number of programs.
There is a third one: http://code.google.com/p/dil/. The main page says that the lexer and the parser are fully implemented for both D1 and D2. But the license is also the GPL.
Mar 24 2011
parent reply "Nick Sabalausky" <a a.a> writes:
"Alexey Prokhin" <alexey.prokhin yandex.ru> wrote in message 
news:mailman.2713.1300954193.4748.digitalmars-d-learn puremagic.com...
 Currently, as far as I know, there are only two lexers and two parsers 
 for
 D: the C++ front end which dmd, gdc, and ldc use and the D front end 
 which
 ddmd uses and which is based on the C++ front end. Both of those are 
 under
 the GPL (which makes them useless for a lot of stuff) and both of them 
 are
 tied to compilers. Being able to lex D code and get the list of tokens in
 a D program and being able to parse D code and get the resultant abstract
 syntax tree would be very useful for a number of programs.
There is a third one: http://code.google.com/p/dil/. The main page says that the lexer and the parser are fully implemented for both D1 and D2. But the license is also the GPL.
The nearly-done v0.4 of my Goldie parsing system (zlib/libpng license) comes with a mostly-complete lexing-only grammar for D2. http://www.dsource.org/projects/goldie/browser/trunk/lang/dlex.grm
The limitations of it right now:
- Doesn't do nested comments. That requires a feature (that's going to be introduced in the related tool GOLD Parsing System v4.2) that I haven't had a chance to add into Goldie just yet.
- It's possible there might be some edge-case bugs regarding either the ".." operator and/or float literals.
- It's ASCII-only. Goldie supports Unicode, but character set optimization isn't implemented yet, so Unicode grammars are technically possible but impractical ATM (this will be the top priority after I get v0.4 released).
Mar 25 2011
parent "Nick Sabalausky" <a a.a> writes:
"Nick Sabalausky" <a a.a> wrote in message 
news:imivp7$2fu$1 digitalmars.com...
 "Alexey Prokhin" <alexey.prokhin yandex.ru> wrote in message 
 news:mailman.2713.1300954193.4748.digitalmars-d-learn puremagic.com...
 Currently, as far as I know, there are only two lexers and two parsers 
 for
 D: the C++ front end which dmd, gdc, and ldc use and the D front end 
 which
 ddmd uses and which is based on the C++ front end. Both of those are 
 under
 the GPL (which makes them useless for a lot of stuff) and both of them 
 are
 tied to compilers. Being able to lex D code and get the list of tokens 
 in
 a D program and being able to parse D code and get the resultant 
 abstract
 syntax tree would be very useful for a number of programs.
There is a third one: http://code.google.com/p/dil/. The main page says that the lexer and the parser are fully implemented for both D1 and D2. But the license is also the GPL.
The nearly-done v0.4 of my Goldie parsing system (zlib/libpng license) comes with a mostly-complete lexing-only grammar for D2. http://www.dsource.org/projects/goldie/browser/trunk/lang/dlex.grm The limitations of it right now: - Doesn't do nested comments. That requires a feature (that's going to be introduced in the related tool GOLD Parsing System v4.2) that I haven't had a chance to add into Goldie just yet.
Note that this probably isn't as big of a problem as it sounds: For one thing, it still recognizes "/+" and "+/" as tokens. It'll just try to lex everything in between too. And when Goldie is used to just lex, you still get the entire source lexed even if it has errors, and the lex-error tokens get included in the resulting token array. So it would be pretty easy to just call Goldie's lex function, and then step through the token array removing balanced /+ and +/ sections manually.
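The manual pass over the token array described above could be sketched like this. It deliberately does not use Goldie's API: tokens are assumed to be plain strings, and `dropNestedComments` is a hypothetical name, so this only illustrates the level-counting idea.

```d
/// Returns `tokens` without any balanced "/+" ... "+/" sections.
/// Assumes each token is given as its source text.
string[] dropNestedComments(string[] tokens)
{
    string[] kept;
    int depth = 0; // current /+ nesting level
    foreach (tok; tokens)
    {
        if (tok == "/+")
            ++depth;
        else if (tok == "+/")
        {
            // A stray "+/" outside any comment is silently dropped.
            if (depth > 0) --depth;
        }
        else if (depth == 0)
            kept ~= tok; // only tokens outside comments survive
    }
    return kept;
}
```

Counting levels rather than looking for the first "+/" is what makes nested sections come out balanced.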
Mar 25 2011
prev sibling parent spir <denis.spir gmail.com> writes:
On 03/24/2011 08:53 AM, Alexey Prokhin wrote:
 Currently, as far as I know, there are only two lexers and two parsers for
  D: the C++ front end which dmd, gdc, and ldc use and the D front end which
  ddmd uses and which is based on the C++ front end. Both of those are under
  the GPL (which makes them useless for a lot of stuff) and both of them are
  tied to compilers. Being able to lex D code and get the list of tokens in
  a D program and being able to parse D code and get the resultant abstract
  syntax tree would be very useful for a number of programs.
I fully support this. We desperately need it, I guess, working and maintained along language evolution. This is the whole purpose of the GSOC proposal "D tools in D": http://prowiki.org/wiki4d/wiki.cgi?GSOC_2011_Ideas#DtoolsinD Semantic analysis, introduced step by step, would be a huge plus. Denis -- _________________ vita es estrany spir.wikidot.com
Mar 24 2011
prev sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 3/23/11, Jonathan M Davis <jmdavisProg gmx.com> wrote:
 That would require a full-blown D lexer and parser.

 - Jonathan M Davis
Isn't DDMD written in D? I'm not sure about how finished it is though.
Mar 23 2011
parent "Nick Sabalausky" <a a.a> writes:
"Andrej Mitrovic" <andrej.mitrovich gmail.com> wrote in message 
news:mailman.2696.1300895928.4748.digitalmars-d-learn puremagic.com...
 On 3/23/11, Jonathan M Davis <jmdavisProg gmx.com> wrote:
 That would require a full-blown D lexer and parser.

 - Jonathan M Davis
Isn't DDMD written in D? I'm not sure about how finished it is though.
I've done a little bit of playing around with DDMD for a (still only just barely-started) project, and it seems to be fairly well up to the task of building an AST and running semantics. It is still based on a somewhat older version of D2, though, and my understanding is that actually building a real-world program with it is still impractical (though I haven't tried).
Mar 25 2011
prev sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2011-03-19 20:41:09 -0400, Jonathan M Davis <jmdavisProg gmx.com> said:

 On Saturday 19 March 2011 17:11:56 Don wrote:
 Here's the task:
 Given a .d source file, strip out all of the unittest {} blocks,
 including everything inside them.
 Strip out all comments as well.
 Print out the resulting file.
 
 Motivation: Bug reports frequently come with very large test cases.
 Even ones which look small often import from Phobos.
 Reducing the test case is the first step in fixing the bug, and it's
 frequently ~30% of the total time required. Stripping out the unit tests
 is the most time-consuming and error-prone part of reducing the test case.
 
 This should be a good task if you're relatively new to D but would like
 to do something really useful.
Unfortunately, to do that 100% correctly, you need to actually have a working D lexer (and possibly parser). You might be able to get something close enough to work in most cases, but it doesn't take all that much to throw off a basic implementation of this sort of thing if you don't lex/parse it with something which properly understands D.
Well, I made a simple lexer for D strings, comments, identifiers and a few other tokens which should be up to that task. It's what I use to parse files and detect dependencies in D for Xcode. Unfortunately, it's written in Objective-C++ (but half of it is plain C)...
<https://github.com/michelf/d-for-xcode/blob/master/Sources/DXBaseLexer.h>
<https://github.com/michelf/d-for-xcode/blob/master/Sources/DXBaseLexer.mm>
<https://github.com/michelf/d-for-xcode/blob/master/Sources/DXScannerTools.h>
<https://github.com/michelf/d-for-xcode/blob/master/Sources/DXScannerTools.m>
Very short unit test:
<https://github.com/michelf/d-for-xcode/blob/master/Unit%20Tests/DXBaseLexerTest.mm>
-- Michel Fortin michel.fortin michelf.com http://michelf.com/
Mar 19 2011
prev sibling next sibling parent reply Kai Meyer <kai unixlords.com> writes:
On 03/19/2011 06:11 PM, Don wrote:
 Here's the task:
 Given a .d source file, strip out all of the unittest {} blocks,
 including everything inside them.
 Strip out all comments as well.
 Print out the resulting file.

 Motivation: Bug reports frequently come with very large test cases.
 Even ones which look small often import from Phobos.
 Reducing the test case is the first step in fixing the bug, and it's
 frequently ~30% of the total time required. Stripping out the unit tests
 is the most time-consuming and error-prone part of reducing the test case.

 This should be a good task if you're relatively new to D but would like
 to do something really useful.
 -Don
Is there a copy of the official D grammar somewhere online? I wrote a lexer for my Compiler class and would love to try and apply it to another grammar. -Kai Meyer
Mar 20 2011
next sibling parent Zirneklis <zerneklis.web gmail.com> writes:
On 20/03/2011 19:55, Kai Meyer wrote:
 On 03/19/2011 06:11 PM, Don wrote:
 Here's the task:
 Given a .d source file, strip out all of the unittest {} blocks,
 including everything inside them.
 Strip out all comments as well.
 Print out the resulting file.

 Motivation: Bug reports frequently come with very large test cases.
 Even ones which look small often import from Phobos.
 Reducing the test case is the first step in fixing the bug, and it's
 frequently ~30% of the total time required. Stripping out the unit tests
 is the most time-consuming and error-prone part of reducing the test
 case.

 This should be a good task if you're relatively new to D but would like
 to do something really useful.
 -Don
Is there a copy of the official D grammar somewhere online? I wrote a lexer for my Compiler class and would love to try and apply it to another grammar. -Kai Meyer
As far as I know, the documentation /is/ the official grammar: http://digitalmars.com/d/2.0/lex.html
Mar 20 2011
prev sibling parent Trass3r <un known.com> writes:
 Is there a copy of the official D grammar somewhere online? I wrote a  
 lexer for my Compiler class and would love to try and apply it to  
 another grammar.
The official D grammar is spread across the specification. But I recall that someone compiled a complete grammar for D1 some time ago.
Mar 24 2011
prev sibling next sibling parent reply Ary Manzana <ary esperanto.org.ar> writes:
On 3/19/11 9:11 PM, Don wrote:
 Here's the task:
 Given a .d source file, strip out all of the unittest {} blocks,
 including everything inside them.
 Strip out all comments as well.
 Print out the resulting file.

 Motivation: Bug reports frequently come with very large test cases.
 Even ones which look small often import from Phobos.
 Reducing the test case is the first step in fixing the bug, and it's
 frequently ~30% of the total time required. Stripping out the unit tests
 is the most time-consuming and error-prone part of reducing the test case.

 This should be a good task if you're relatively new to D but would like
 to do something really useful.
 -Don
Can it be done in Ruby? Or do you need it in D?
Mar 20 2011
parent "Simen kjaeraas" <simen.kjaras gmail.com> writes:
On Mon, 21 Mar 2011 01:52:45 +0100, Ary Manzana <ary esperanto.org.ar>  
wrote:

 On 3/19/11 9:11 PM, Don wrote:
 Here's the task:
 Given a .d source file, strip out all of the unittest {} blocks,
 including everything inside them.
 Strip out all comments as well.
 Print out the resulting file.

 Motivation: Bug reports frequently come with very large test cases.
 Even ones which look small often import from Phobos.
 Reducing the test case is the first step in fixing the bug, and it's
 frequently ~30% of the total time required. Stripping out the unit tests
 is the most time-consuming and error-prone part of reducing the test  
 case.

 This should be a good task if you're relatively new to D but would like
 to do something really useful.
 -Don
Can it be done in Ruby? Or do you need it in D?
Part of the idea was that someone could use it to learn D. However, the important part is that it gets done. Doing it in D would be preferable, but it is not a requirement. -- Simen
Mar 21 2011
prev sibling next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
 On 3/23/11, Jonathan M Davis <jmdavisProg gmx.com> wrote:
 That would require a full-blown D lexer and parser.
 
 - Jonathan M Davis
Isn't DDMD written in D? I'm not sure about how finished it is though.
Yes, but the lexer and parser in ddmd are not only GPL (which would be a problem for some uses but not others; for something like Don's utility, it wouldn't be a problem) but, more importantly, they are tied to the compiler code. They are not designed to be used by an arbitrary program. For that, you would need a lexer and parser which were designed with an API such that an arbitrary D program could use them. For instance, the lexer could produce a range of tokens to be processed, and a program which wants to use the lexer can then process that range. - Jonathan M Davis
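As a rough idea of what a range-of-tokens API could look like, here is a minimal sketch of a lexer exposed as an input range. Everything in it is an assumption made for illustration: `TokenKind`, `Token` and `TokenRange` are hypothetical names, and the tokenization (ASCII words versus single-character symbols, no comments or strings) is far simpler than real D lexing.

```d
import std.algorithm : count;
import std.ascii : isAlphaNum, isWhite;
import std.range.primitives : isInputRange;
import std.stdio : writeln;

enum TokenKind { word, symbol }

/// Hypothetical token type: a kind plus the slice of source text it covers.
struct Token
{
    TokenKind kind;
    string text;
}

/// Hypothetical lexer exposed as an input range of tokens.
/// Assumption: ASCII-only source without comments or string literals.
struct TokenRange
{
    private string src;    // source text not yet consumed
    private Token current; // token returned by front
    private bool done;     // true once the input is exhausted

    this(string source)
    {
        src = source;
        popFront(); // load the first token
    }

    @property bool empty() { return done; }
    @property Token front() { return current; }

    void popFront()
    {
        // Skip leading whitespace.
        size_t i = 0;
        while (i < src.length && isWhite(src[i]))
            ++i;
        src = src[i .. $];

        if (src.length == 0)
        {
            done = true;
            return;
        }

        size_t end = 1;
        if (isAlphaNum(src[0]) || src[0] == '_')
        {
            // Identifier, keyword or number: consume the whole word.
            while (end < src.length && (isAlphaNum(src[end]) || src[end] == '_'))
                ++end;
            current = Token(TokenKind.word, src[0 .. end]);
        }
        else
        {
            // Anything else is a one-character symbol in this sketch.
            current = Token(TokenKind.symbol, src[0 .. 1]);
        }
        src = src[end .. $];
    }
}

// The struct satisfies the standard input range interface.
static assert(isInputRange!TokenRange);

void main()
{
    auto tokens = TokenRange("unittest { assert(f(1) == 2); } void f() {}");
    // Any range algorithm can consume the tokens, e.g. counting unittest keywords.
    writeln(tokens.count!(t => t.kind == TokenKind.word && t.text == "unittest"));
}
```

The point of the design is that the consumer never touches lexer internals: it only sees `empty`, `front` and `popFront`, so the same lexer could feed std.algorithm functions or a tool like the one Don asked for.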
Mar 23 2011
parent "Nick Sabalausky" <a a.a> writes:
"Jonathan M Davis" <jmdavisProg gmx.com> wrote in message 
news:mailman.2700.1300915109.4748.digitalmars-d-learn puremagic.com...
 On 3/23/11, Jonathan M Davis <jmdavisProg gmx.com> wrote:
 That would require a full-blown D lexer and parser.

 - Jonathan M Davis
Isn't DDMD written in D? I'm not sure about how finished it is though.
Yes, but the lexer and parser in ddmd are not only GPL (which would be a problem for some uses but not others; for something like Don's utility, it wouldn't be a problem) but, more importantly, they are tied to the compiler code. They are not designed to be used by an arbitrary program. For that, you would need a lexer and parser which were designed with an API such that an arbitrary D program could use them. For instance, the lexer could produce a range of tokens to be processed, and a program which wants to use the lexer can then process that range.
I don't know about the license issues, but I don't think the API is a big deal. I'm in the early stages of a DDMD-based project to compile D code down to Haxe, and all I really had to do was comment out the backend-related section at the end of main(), inject my AST-walking/processing functions into the AST classes (though, admittedly, there are about 1.5 metric fucktons of these AST classes), and then add a little bit of code at the end of main() to launch my AST traversal. The main() function could easily be converted to a non-main one. The only real difficulty is the fact that the AST isn't really documented, except for what little exists on one particular Wiki4D page (sorry, don't have the link ATM). Hmm, although, depending on what you're doing with it, you may also want to hook DDMD's stdout/stderr output, or at least the error/warning functions.
Mar 25 2011
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 3/23/11, Jonathan M Davis <jmdavisProg gmx.com> wrote:
 On 3/23/11, Jonathan M Davis <jmdavisProg gmx.com> wrote:
 That would require a full-blown D lexer and parser.

 - Jonathan M Davis
Isn't DDMD written in D? I'm not sure about how finished it is though.
Yes, but the lexer and parser in ddmd are not only GPL (which would be a problem for some uses but not others; for something like Don's utility, it wouldn't be a problem) but, more importantly, they are tied to the compiler code. They are not designed to be used by an arbitrary program. For that, you would need a lexer and parser which were designed with an API such that an arbitrary D program could use them. For instance, the lexer could produce a range of tokens to be processed, and a program which wants to use the lexer can then process that range. - Jonathan M Davis
I didn't even know it was GPL. It doesn't come with a license file.
Mar 23 2011
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
What about the Artistic License? The front-end can also be used under
that license. Is it less restrictive than the GPL?
Mar 23 2011
prev sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
 What about the Artistic License? The front-end can also be used under
 that license. Is it less restrictive than the GPL?
I don't know what the exact licensing situation is. However, as I understand it, the C++ front-end is under the GPL, and because ddmd is based on the C++ front-end, it is also under the GPL. If that's not the case, I don't know what the licensing situation really is. I also don't know exactly what the Artistic License says, so I don't know what its restrictions are. - Jonathan M Davis
Mar 23 2011