digitalmars.D - Want to help D compiler development: Two possible weekend projects

"David Nadlinger" <see klickverbot.at> writes:
Hi all,

in case you have an afternoon or two to spare, here are two ideas 
how you could help with D compiler development (DMD/GDC/LDC):


  1) Provide an Open Source clean-room implementation of 
response_expand, the function the DMD frontend uses to parse 
response files. Unfortunately, the copyright belongs (partly?) 
to Symantec, so Walter can't simply re-license it for use in 
GDC/LDC, where it is needed to provide DMD compatibility 
(gdmd/ldmd). For the full discussion, see: 
http://forum.dlang.org/thread/kdce69%24303k%241 digitalmars.com


  2) Write an ABI fuzzer: The C ABI can be quite complex to 
implement on some systems, notably x86_64 Posixen (i.e. the AMD64 
System V ABI). There have been a number of issues in the past (see 
e.g. the infamous 
http://d.puremagic.com/issues/show_bug.cgi?id=5570), and while 
most of the cases that occur frequently in common C APIs are 
handled correctly now, a number of issues still remain.

What makes working on the related compiler code annoying, at 
least for me, is that test cases are few (bugs coming from ABI 
issues are hard to track down for most non-compiler people) and 
cumbersome to write/build manually, as you need to integrate the 
D part with another one compiled by the host C compiler.

Thus, the idea is to write a tool which randomly generates 
function prototypes and creates both C and D 'caller' and 
'callee' functions with that signature. The caller supplies a 
defined value for each parameter, and the callee in the 
respective other language checks if it was passed correctly. Same 
in the other direction for the return value.
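
To make the idea a bit more concrete, here is a rough Python sketch of the generator step. It only covers scalar parameters passed from a D caller to a C callee. The type table, the function names (random_prototype, emit_c_callee, emit_d_caller) and the callee_N naming scheme are all made up for illustration, and the C-to-D type mapping assumes a typical x86_64 System V target.

```python
import random

# (C type, D type, is_float): assumed mapping for a typical x86_64 System V target.
SCALARS = [
    ("signed char", "byte", False),
    ("unsigned char", "ubyte", False),
    ("short", "short", False),
    ("int", "int", False),
    ("unsigned int", "uint", False),
    ("long long", "long", False),
    ("float", "float", True),
    ("double", "double", True),
]

def random_prototype(rng, index, max_params=8):
    """Pick random parameter types plus a known value for each parameter."""
    params = []
    for _ in range(rng.randint(1, max_params)):
        c_type, d_type, is_float = rng.choice(SCALARS)
        # 1..100 fits every integer type above; multiples of 0.25 are exact
        # in binary floating point, so '!=' comparisons are safe.
        value = rng.randint(1, 100) * 0.25 if is_float else rng.randint(1, 100)
        params.append((c_type, d_type, value))
    return {"name": f"callee_{index}", "params": params}

def emit_c_callee(proto):
    """C function that returns 0 if every parameter arrived intact."""
    args = ", ".join(f"{c} p{i}" for i, (c, _, _) in enumerate(proto["params"]))
    checks = "\n".join(
        f"    if (p{i} != ({c}){v}) return {i + 1};"
        for i, (c, _, v) in enumerate(proto["params"])
    )
    return f"int {proto['name']}({args}) {{\n{checks}\n    return 0;\n}}\n"

def emit_d_caller(proto):
    """D declaration of the C callee plus a wrapper that passes the known values."""
    decl = ", ".join(f"{d} p{i}" for i, (_, d, _) in enumerate(proto["params"]))
    vals = ", ".join(f"cast({d}){v}" for _, d, v in proto["params"])
    name = proto["name"]
    return (f"extern(C) int {name}({decl});\n"
            f"int call_{name}() {{ return {name}({vals}); }}\n")
```

The reverse direction (C caller, D callee) and the return value check would follow the same pattern with the roles swapped.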

The tool then compiles the files, links them together, and 
executes the test case. It will be beneficial for throughput to 
batch multiple function pairs together into one pair of source 
files. If any check fails, the test executable returns with an 
error code (or a segfault, ...), causing the fuzzer to save away 
the offending test case. If not, the next set of tests will be 
generated, and so on.
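
The build-and-run step could look roughly like the Python sketch below. The compiler commands (cc and dmd), the file names and the run_case helper are assumptions for illustration; the D source is assumed to contain a main that returns non-zero when any check fails. The run parameter exists only so that the command runner can be swapped out.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

def run_case(c_src, d_src, save_dir, run=subprocess.run):
    """Build and execute one batch; keep the sources if anything goes wrong."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        (tmp / "callee.c").write_text(c_src)
        (tmp / "caller.d").write_text(d_src)
        steps = [
            ["cc", "-c", "callee.c", "-o", "callee.o"],   # host C compiler (assumed name)
            ["dmd", "caller.d", "callee.o", "-ofcase"],   # D compiler under test
            [str(tmp / "case")],                          # the linked test executable
        ]
        for cmd in steps:
            # Non-zero covers failed checks and build errors; a negative
            # return code means the process died from a signal (e.g. SIGSEGV).
            if run(cmd, cwd=tmp).returncode != 0:
                save_dir = Path(save_dir)
                dest = save_dir / f"failure_{len(list(save_dir.iterdir()))}"
                shutil.copytree(tmp, dest)
                return False
    return True
```

Note that this version also saves cases that fail to compile or link, which may or may not be what you want.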

It might make sense to bias the parameter type generator towards 
"complex" types containing unions, packed structs, varargs, etc. 
In any case, it should be able to find issues on x86_64 in the 
latest released versions of both DMD (3-byte structs) and LDC 
(small structs passed as varargs when there are still registers 
available).
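
Biasing the generator could be as simple as a weighted choice. The Python sketch below only handles plain small structs (no unions, packed structs or varargs); the field table, the S<N> naming and both helper names are hypothetical.

```python
import random

# (C field type, D field type): assumed to have matching size and alignment.
FIELDS = [("unsigned char", "ubyte"), ("short", "short"), ("int", "int"),
          ("float", "float"), ("double", "double")]

def random_struct(rng, index):
    """Return (C definition, D definition) for a small random struct."""
    fields = [rng.choice(FIELDS) for _ in range(rng.randint(1, 4))]
    c_body = " ".join(f"{c} f{i};" for i, (c, _) in enumerate(fields))
    d_body = " ".join(f"{d} f{i};" for i, (_, d) in enumerate(fields))
    return (f"struct S{index} {{ {c_body} }};", f"struct S{index} {{ {d_body} }}")

def pick_kind(rng, struct_weight=3):
    """Choose 'struct' struct_weight times as often as 'scalar'."""
    return rng.choices(["scalar", "struct"], weights=[1, struct_weight])[0]
```

With one to four one-byte fields allowed, a 3-byte struct of unsigned chars is among the shapes this can produce.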


David
Feb 02 2013
"Danny Arends" <Danny.Arends gmail.com> writes:
  1) Provide an Open Source clean-room implementation of 
 response_expand, the function the DMD frontend uses to parse 
 response files. Unfortunately, the copyright belongs 
 (partly?) to Symantec, so Walter can't simply 
 re-license it for use in GDC/LDC, where it is needed to provide 
 DMD compatibility (gdmd/ldmd). For the full discussion, see: 
 http://forum.dlang.org/thread/kdce69%24303k%241 digitalmars.com

Hey David,

This looks like an interesting little project, but I'm unsure about the copyright side of things. I read the source code in:
https://github.com/D-Programming-Language/dmd/blob/master/src/root/response.c

Could you explain what exactly needs to be done? E.g.:

What should such a clean-room implementation look like?
Does the structure need to change?
Is it enough to rewrite the code in D?
Should the code be in C, and if so, how much needs to change?

Well, just some thoughts, but the code looks pretty straightforward.

Gr,
Danny Arends
Feb 03 2013
Paulo Pinto <pjmlp progtools.org> writes:
On 03.02.2013 16:33, Joseph Rushton Wakeling wrote:
 On 02/03/2013 03:17 PM, jerro wrote:
 AFAIK "clean room implementation" means reimplementing it without
 looking at the source code of the original implementation.

So, is there a spec for what it should do?

Someone needs to write it, hence David's request.
Feb 03 2013
Paulo Pinto <pjmlp progtools.org> writes:
On 03.02.2013 17:53, Joseph Rushton Wakeling wrote:
 On 02/03/2013 05:33 PM, Paulo Pinto wrote:
 Someone needs to write it, hence David's request.

I thought David's request was for an implementation, not a spec! There's little point for those of us who haven't seen the original code to go and look at it to write a spec, when it merely takes away our ability to write something clean-room.

Why? You just need one person to go through the trouble of reading the code and writing the corresponding spec, on the Wiki for example. Another person in the community would then do a clean implementation from the Wiki page.

--
Paulo
Feb 03 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 2/3/2013 6:31 PM, David Nadlinger wrote:
 It's not like we realistically have to be afraid of Symantec suing anybody over
 that piece of code, but ideally Walter would be able to accept that rewritten
 piece of code back into DMD without having to fear any licensing issues.

It's more about doing the right thing than acting out of fear of being sued. I don't think we need to go so far as having a clean room team communicating with a dev team through a 3rd team of lawyers, but a common sense and good faith effort to not infringe is likely sufficient.
Feb 26 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 2/3/2013 6:52 PM, David Nadlinger wrote:
 I want to avoid getting in touch with any non-free compiler code as much as
 possible as Walter generally seems very conscious of licensing issues and I
 don't want to jeopardize the future of LDC.

Having D be IP-clean is essential to the future of D. BTW, one thing I like very much about github is it provides an audit trail of where contributed code comes from, for all to see.
Feb 26 2013
"jerro" <a a.com> writes:
   What should such a clean-room implementation look like?
   Does the structure need to change?
   Is it enough to rewrite the code in D?
   Should the code be in C, and if so, how much needs to change?

AFAIK "clean room implementation" means reimplementing it without looking at the source code of the original implementation.
Feb 03 2013
"Danny Arends" <Danny.Arends gmail.com> writes:
How is anyone supposed to know what it does, then?

Also, any new code needs to adhere to the current interface, so 
it would be pretty useless without knowing what that is. 
Furthermore, to replace it you need to know what to return on 
error / success.

Gr,


On Sunday, 3 February 2013 at 14:17:11 UTC, jerro wrote:
  What should such a clean-room implementation look like?
  Does the structure need to change?
  Is it enough to rewrite the code in D?
  Should the code be in C, and if so, how much needs to change?

AFAIK "clean room implementation" means reimplementing it without looking at the source code of the original implementation.

Feb 03 2013
"Danny Arends" <Danny.Arends gmail.com> writes:
Perhaps I should create a requirements outline after reading the 
source:
 From the post David links to:

"2) Could somebody read the source and document the quirks of the
parser in painstaking detail, so that somebody else can do a
clean room implementation?"

Gr,
Danny
Feb 03 2013
"Dicebot" <m.strashun gmail.com> writes:
On Sunday, 3 February 2013 at 14:51:59 UTC, Danny Arends wrote:
 Perhaps I should create a requirements outline after reading 
 the source:
 From the post David links to:

 "2) Could somebody read the source and document the quirks of 
 the
 parser in painstaking detail, so that somebody else can do a
 clean room implementation?"

 Gr,
 Danny

Yes, this is a common practice for avoiding copyright issues: one person reads the source and writes a detailed spec, and a second person re-implements the needed functionality from that spec without ever looking at the original sources.
Feb 03 2013
Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 02/03/2013 03:17 PM, jerro wrote:
 AFAIK "clean room implementation" means reimplementing it without looking at
 the source code of the original implementation.

So, is there a spec for what it should do?
Feb 03 2013
Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 02/03/2013 05:33 PM, Paulo Pinto wrote:
 Someone needs to write it, hence David's request.

I thought David's request was for an implementation, not a spec! There's little point for those of us who haven't seen the original code to go and look at it to write a spec, when it merely takes away our ability to write something clean-room.
Feb 03 2013
"David Nadlinger" <see klickverbot.at> writes:
On Sunday, 3 February 2013 at 15:07:50 UTC, Dicebot wrote:
 On Sunday, 3 February 2013 at 14:51:59 UTC, Danny Arends wrote:
 Perhaps I should create a requirements outline after reading 
 the source:
 From the post David links to:

 "2) Could somebody read the source and document the quirks of 
 the
 parser in painstaking detail, so that somebody else can do a
 clean room implementation?"

 Gr,
 Danny

Yes, this is a common practice for avoiding copyright issues: one person reads the source and writes a detailed spec, and a second person re-implements the needed functionality from that spec without ever looking at the original sources.

Yes, this is what I meant. Sorry, I should have made this clearer in the original post.

It's not like we realistically have to be afraid of Symantec suing anybody over that piece of code, but ideally Walter would be able to accept that rewritten piece of code back into DMD without having to fear any licensing issues.

David
Feb 03 2013
"David Nadlinger" <see klickverbot.at> writes:
On Sunday, 3 February 2013 at 14:51:59 UTC, Danny Arends wrote:
 Perhaps I should create a requirements outline after reading 
 the source:

This would be great! I want to avoid getting in touch with any non-free compiler code as much as possible, as Walter generally seems very conscious of licensing issues and I don't want to jeopardize the future of LDC. Otherwise, I'd have done the boring part (writing up the spec) myself long ago.

The general mechanics of response files are well understood, so a spec wouldn't have to describe them extensively. The interesting points are the details like handling of quotes, recursion, …

David
Feb 03 2013
"Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Sunday, 3 February 2013 at 00:37:55 UTC, David Nadlinger wrote:
  1) Provide an Open Source clean-room implementation of 
 response_expand, the function the DMD frontend uses to parse 
 response files. Unfortunately, the copyright belongs 
 (partly?) to Symantec, so Walter can't simply 
 re-license it for use in GDC/LDC, where it is needed to provide 
 DMD compatibility (gdmd/ldmd). For the full discussion, see: 
 http://forum.dlang.org/thread/kdce69%24303k%241 digitalmars.com

DMD response files use the same escaping syntax as the CommandLineToArgvW function. Here is how rdmd constructs the file:
https://github.com/D-Programming-Language/tools/blob/master/rdmd.d#L367

I had looked at the DMD code, although it was a while ago and I honestly couldn't say anything specific about it today. That's probably still not good enough for a clean-room reimplementation, though.
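
For reference, the publicly documented quoting rules (whitespace separates arguments outside quotes, 2n backslashes before a quote give n backslashes and toggle quoting, 2n+1 give n backslashes and a literal quote, other backslashes are literal) can be sketched in Python as below. This is written from those documented rules only, not from the DMD source; split_args is a hypothetical name, treating newlines as separators is an assumption about response files, and none of DMD's own quirks (or the special handling of doubled quotes in some runtimes) are covered.

```python
def split_args(text):
    """Split text into arguments using CommandLineToArgvW-style quoting."""
    args, cur = [], []
    in_quotes = False   # inside a "..." region
    started = False     # current argument exists (so "" yields an empty one)
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            # Measure the whole run of backslashes.
            j = i
            while j < len(text) and text[j] == "\\":
                j += 1
            n = j - i
            if j < len(text) and text[j] == '"':
                cur.append("\\" * (n // 2))
                if n % 2:
                    cur.append('"')             # odd run: escaped literal quote
                else:
                    in_quotes = not in_quotes   # even run: quote is a delimiter
                i = j + 1
            else:
                cur.append("\\" * n)            # not before a quote: literal
                i = j
            started = True
        elif ch == '"':
            in_quotes = not in_quotes
            started = True
            i += 1
        elif ch in " \t\r\n" and not in_quotes:
            if started:
                args.append("".join(cur))
                cur, started = [], False
            i += 1
        else:
            cur.append(ch)
            started = True
            i += 1
    if started:
        args.append("".join(cur))
    return args
```

For example, split_args('-O "-Ifoo bar" x.d') gives ['-O', '-Ifoo bar', 'x.d'].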
Feb 26 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 2/26/2013 4:50 PM, Vladimir Panteleev wrote:
 On Sunday, 3 February 2013 at 00:37:55 UTC, David Nadlinger wrote:
  1) Provide an Open Source clean-room implementation of response_expand, the
 function the DMD frontend uses to parse response files. Unfortunately, the
 copyright belongs (partly?) to Symantec, so Walter can't simply
 re-license it for use in GDC/LDC, where it is needed to provide DMD
 compatibility (gdmd/ldmd). For the full discussion, see:
 http://forum.dlang.org/thread/kdce69%24303k%241 digitalmars.com

DMD response files use the same escaping syntax as the CommandLineToArgvW function.

As I recall, that was the intent. I wrote it, however, before there was a CommandLineToArgvW function, so much of its behavior was determined by trial and error on how DOS dealt with command lines. Such is inferrable from its intent, which was to enable command lines longer than DOS allowed.
Feb 26 2013