
digitalmars.D - Proof of concept: automatically import C header files

Jacob Carlborg <doob me.com> writes:
Made a proof of concept to automatically parse, translate and import C 
header files in D using DStep. DMD is linked against DStep and does not 
start a new process to make the translation.

I added a new pragma, include, that handles everything. Use it like this:

// foo.h
void foo ();

// main.d

module main;

pragma(include, "foo.h");

void main ()
{
     foo();
}
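What the pragma automates is, roughly, writing `extern (C)` declarations by hand. As a minimal sketch (assuming the translator emits plain `extern (C)` prototypes; DStep's actual output may differ in layout), a header declaring the standard C function `size_t strlen(const char *s);` corresponds to the D binding below. `strlen` stands in for `foo` only so that the example links against the C runtime without a separate C object file.

```d
// Hand-written binding for the C prototype
//     size_t strlen(const char *s);
// C's `const char *` is spelled `const(char)*` in D; size_t is the same type.
extern (C) size_t strlen(const(char)* s);

void main()
{
    import std.stdio : writeln;

    // D string literals are zero-terminated, so .ptr is a valid C string here.
    writeln(strlen("hello".ptr)); // prints 5
}
```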

DMD: https://github.com/jacob-carlborg/dmd/tree/dstep
DStep: https://github.com/jacob-carlborg/dstep/tree/c_api

-- 
/Jacob Carlborg
Jul 16 2013
"Robert" <jfanatiker gmx.at> writes:
On Tuesday, 16 July 2013 at 14:15:55 UTC, Jacob Carlborg wrote:
 Made a proof of concept to automatically parse, translate and 
 import C header files in D using DStep. DMD is linked against 
 DStep and does not start new process to make the translation.

awesome! :-)
Jul 16 2013
"Dicebot" <public dicebot.lv> writes:
On Tuesday, 16 July 2013 at 14:15:55 UTC, Jacob Carlborg wrote:
 Made a proof of concept to automatically parse, translate and 
 import C header files in D using DStep. DMD is linked against 
 DStep and does not start new process to make the translation.

While this is a relatively common request, I don't think such stuff belongs in the compiler. It creates extra mandatory dependencies while providing little advantage over doing this as part of a build system. So far I am perfectly satisfied with invoking dstep from a Makefile.
Jul 16 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 7/16/2013 8:49 PM, Timothee Cour wrote:
 So how about a library solution instead, which doesn't require compiler change:

While semantically a great idea, technically I don't think CTFE is up to implementing a C front end yet.
Jul 16 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 7/16/2013 10:04 PM, deadalnix wrote:
 On Wednesday, 17 July 2013 at 04:14:56 UTC, Walter Bright wrote:
 On 7/16/2013 8:49 PM, Timothee Cour wrote:
 So how about a library solution instead, which doesn't require compiler change:

While semantically a great idea, technically I don't think CTFE is up to implementing a C front end yet.

This is the right path. We don't need the full front end, do we ?

Yeah, you do need the full front end. It's pretty amazing how the simplest of .h files seem determined to exercise every last, dark corner of the language. If the converter doesn't accept the full language, you're just going to get a dump truck unloading on it.
Jul 17 2013
Jacob Carlborg <doob me.com> writes:
On 2013-07-17 10:14, Walter Bright wrote:

 Yeah, you do need the full front end. It's pretty amazing how the
 simplest of .h files seem determined to exercise every last, dark corner
 of the language.

 If the converter doesn't accept the full language, you're just going to
 get a dump truck unloading on it.

When you do have a complete front end, you can choose how to handle the language constructs the tool cannot (yet) translate: just skip them, insert a comment, or similar. If you don't have a full front end and encounter something that you cannot translate, you will most likely get weird behavior.

-- 
/Jacob Carlborg
Jul 17 2013
Jacob Carlborg <doob me.com> writes:
On 2013-07-17 13:24, Paulo Pinto wrote:

 Thus we are back to the compiler as library discussion.

Yes, but for the C family of languages we already have a compiler as library, that is Clang. -- /Jacob Carlborg
Jul 17 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 7/17/2013 2:27 AM, Jacob Carlborg wrote:
 On 2013-07-17 10:14, Walter Bright wrote:

 Yeah, you do need the full front end. It's pretty amazing how the
 simplest of .h files seem determined to exercise every last, dark corner
 of the language.

 If the converter doesn't accept the full language, you're just going to
 get a dump truck unloading on it.

When you do have a complete front end you can choose how to handle the language constructs the tool cannot (yet) translate. I.e. just skip it, insert a comment or similar.

Yes, but the front end itself must be complete. Otherwise, it's not really practical when you're dealing with things like the preprocessor - because a non-compliant front end won't even know it has gone off the rails.

There are other issues when dealing with C .h files:

1. there may be various #define's necessary to compile it that would normally be supplied on the command line to the C compiler

2. there are various behavior switches (see the PR for DMD that wants to set the signedness of char types)

3. rather few .h files seem to be standard compliant C. They always rely on various compiler extensions

These problems are not insurmountable, they just are non-trivial and need to be handled for a successful .h file importer.
Jul 17 2013
Jacob Carlborg <doob me.com> writes:
On 2013-07-17 21:40, Walter Bright wrote:

 Yes, but the front end itself must be complete. Otherwise,
 it's not really practical when you're dealing with things like the
 preprocessor - because a non-compliant front end won't even know it has
 gone off the rails.

 There are other issues when dealing with C .h files:

 1. there may be various #define's necessary to compile it that would
 normally be supplied on the command line to the C compiler

 2. there are various behavior switches (see the PR for DMD that wants to
 set the signed'ness of char types)

 3. rather few .h files seem to be standard compliant C. They always rely
 on various compiler extensions

 These problems are not insurmountable, they just are non-trivial and
 need to be handled for a successful .h file importer.

Yes, I agree with all the above. That's why I'm using libclang. What I'm saying is that when I use a library like libclang, I can choose quite freely what to convert and what not to convert. For example, DStep doesn't handle the preprocessor at all. But since libclang does, it can parse any header file anyway. What happens is just that the preprocessor declarations won't be translated and won't end up in the translated file.

-- 
/Jacob Carlborg
Jul 17 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 7/17/2013 9:48 AM, deadalnix wrote:
 My understanding is that we only want to convert declaration to D. Can you give
 me an example of such corner case that would require the full frontend ?

One example:

--------------------------------
//**************************Header**********************\\
int x;
--------------------------------

Yes, this POS is real C code I got a bug report on. Note the trailing \\. Is that one line splice or two? You have to get the hairy details right. I've seen similar nonsense with trigraphs. I've seen metaprogramming tricks with token pasting. You can't dismiss this stuff.
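To answer the question concretely: in translation phase 2 only a backslash immediately followed by a newline is a splice, so the comment line above contains exactly one. That single splice pulls `int x;` onto the `//` comment line, and phase 3 then replaces the whole comment with a space, so the declaration silently disappears. The sketch below models only phase 2; the function name `spliceLines` is my own, and trigraphs (phase 1) and `\r\n` line endings are deliberately ignored.

```d
import std.array : replace;

// Simplified model of C translation phase 2: every backslash that is
// immediately followed by a newline is deleted together with that newline.
// Trigraphs and "\r\n" line endings are not handled.
string spliceLines(string source)
{
    return source.replace("\\\n", "");
}

void main()
{
    import std.stdio : writeln;

    // A // comment ending in two backslashes, followed by a declaration.
    string src = "//****Header****\\\\\nint x;\n";
    writeln(spliceLines(src)); // prints: //****Header****\int x;
}
```

Only the second backslash is consumed; the first survives inside the comment, which is why the answer is "one splice" and yet `int x;` still ends up commented out.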
Jul 17 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 7/17/2013 3:20 PM, H. S. Teoh wrote:
 Though about trigraphs... I've to admit I've never actually seen *real*
 C code that uses trigraphs, but yeah, needing to account for them can
 significantly complicate your code.

Building a correct C front end is a known technology, doing a half-baked job isn't going to impress people.
 But as for preprocessor-specific stuff, couldn't we just pipe it through
 a standalone C preprocessor and be done with it? It can't be *that*
 hard, right?

You could, but then you are left with failing to recognize:

    #define FOO 3

and converting it to:

    enum FOO = 3;
Jul 17 2013
Jacob Carlborg <doob me.com> writes:
On 2013-07-18 00:36, Walter Bright wrote:

 You could, but then you are left with failing to recognize:

      #define FOO 3

 and converting it to:

      enum FOO = 3;

And things like:

    #if linux
    short a;
    #elif _WIN32 || _WIN64
    int a;
    #endif

Should preferably be converted to:

    version (linux)
        short a;
    else version (Windows)
        int a;

Other example:

    #define foo(a, b) a + b

Should be converted to:

    auto foo (A, B) (A a, B b)
    {
        return a + b;
    }

-- 
/Jacob Carlborg
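A compilable sketch of the hand translations listed above, gathered into one module. One caveat is worth noting: because the C macro body `a + b` is not parenthesized, `foo(1, 2) * 3` expands to `1 + 2 * 3` (7) in C, whereas the D template returns 9, so a converter has to decide which behavior it reproduces.

```d
// C: #define FOO 3
enum FOO = 3;

// C: #if linux / short a; / #elif _WIN32 || _WIN64 / int a; / #endif
// D's predefined `Windows` version identifier covers both 32- and 64-bit Windows.
version (linux)
    short a;
else version (Windows)
    int a;

// C: #define foo(a, b) a + b
auto foo(A, B)(A a, B b)
{
    return a + b;
}

void main()
{
    import std.stdio : writeln;

    writeln(FOO);           // prints 3
    writeln(foo(1, 2));     // prints 3
    writeln(foo(1, 2) * 3); // prints 9; the unparenthesized C macro would yield 7
}
```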
Jul 17 2013
Jacob Carlborg <doob me.com> writes:
On 2013-07-18 00:59, H. S. Teoh wrote:

 IOW either you don't do it at all, or you have to go all the way and
 implement a fully-functional C frontend?

 If so, libclang is starting to sound rather attractive...

That's what I'm telling
 Hmm. We *could* pre-preprocess the code to do this conversion first to
 pick out these #define's, then suppress the #define's we understand from
 the input to the C preprocessor. Something like this:

 	bool isSimpleValue(string s) {
 		// basically, return true if s is something compilable
 		// when put on the right side of "enum x = ...".
 	}

 	auto pipe = spawnCPreprocessor();
 	string[string] manifestConstants;
 	foreach (line; inputFile.byLine()) {
 		if (auto m=match(line, `^\s*#define\s+(\w+)\s+(.*?)\s+`))
 		{
 			if (isSimpleValue(m.captures[2])) {
 				manifestConstants[m.captures[1]] =
 					m.captures[2];

 				// Suppress enums that we picked out
 				continue;
 			}
 			// whatever we don't understand, hand over to
 			// the C preprocessor
 		}
 		pipe.writeln(line);
 	}

 Basically, whatever #define's we can understand, we handle, and anything
 else we let the C preprocessor deal with. By suppressing the #define's
 we've picked out, we force the C preprocessor to leave any reference to
 them as unexpanded identifiers, so that later on we can just generate
 the enums and the resulting code will match up correctly.

You will just end up needing to build a full C preprocessor. Just use an existing one, that is, libclang.

-- 
/Jacob Carlborg
Jul 17 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 7/17/2013 5:31 PM, deadalnix wrote:
 On Wednesday, 17 July 2013 at 19:46:54 UTC, Walter Bright wrote:
 On 7/17/2013 9:48 AM, deadalnix wrote:
 My understanding is that we only want to convert declaration to D. Can you give
 me an example of such corner case that would require the full frontend ?

One example:

--------------------------------
//**************************Header**********************\\
int x;
--------------------------------

Yes, this POS is real C code I got a bug report on. Note the trailing \\. Is that one line splice or two? You have to get the hairy details right. I've seen similar nonsense with trigraphs. I've seen metaprogramming tricks with token pasting. You can't dismiss this stuff.

This do not require semantic analysis.

Semantic analysis for C is trivial. The real problems are the phases of translation and the preprocessor.
Jul 17 2013
Jacob Carlborg <doob me.com> writes:
On 2013-07-17 07:04, deadalnix wrote:

 This is the right path. We don't need the full front end, do we ?

Oh, yes we do. You will always run into corner cases your tool cannot handle until you have a complete front end. I tried that first, before I used libclang. -- /Jacob Carlborg
Jul 17 2013
Jacob Carlborg <doob me.com> writes:
On 2013-07-16 17:05, Dicebot wrote:

 While this a relatively common request, I don't think such stuff belongs
 to compiler. It creates extra mandatory dependencies while providing
 little advantage over doing this as part of a build system.

I started to think a bit about this. One might need to specify various options to translate the header file, options like include paths and similar. That might be quite problematic to do in a pragma, or via DMD command line options.

-- 
/Jacob Carlborg
Jul 17 2013
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jul 17, 2013 at 03:36:15PM -0700, Walter Bright wrote:
 On 7/17/2013 3:20 PM, H. S. Teoh wrote:
Though about trigraphs... I've to admit I've never actually seen
*real* C code that uses trigraphs, but yeah, needing to account for
them can significantly complicate your code.

Building a correct C front end is a known technology, doing a half-baked job isn't going to impress people.

IOW either you don't do it at all, or you have to go all the way and implement a fully-functional C frontend? If so, libclang is starting to sound rather attractive...
But as for preprocessor-specific stuff, couldn't we just pipe it
through a standalone C preprocessor and be done with it? It can't be
*that* hard, right?

You could, but then you are left with failing to recognize:

    #define FOO 3

and converting it to:

    enum FOO = 3;

Hmm. We *could* pre-preprocess the code to do this conversion first to pick out these #define's, then suppress the #define's we understand from the input to the C preprocessor. Something like this:

	bool isSimpleValue(string s) {
		// basically, return true if s is something compilable
		// when put on the right side of "enum x = ...".
	}

	auto pipe = spawnCPreprocessor();
	string[string] manifestConstants;
	foreach (line; inputFile.byLine()) {
		if (auto m=match(line, `^\s*#define\s+(\w+)\s+(.*?)\s+`))
		{
			if (isSimpleValue(m.captures[2])) {
				manifestConstants[m.captures[1]] =
					m.captures[2];

				// Suppress enums that we picked out
				continue;
			}
			// whatever we don't understand, hand over to
			// the C preprocessor
		}
		pipe.writeln(line);
	}

Basically, whatever #define's we can understand, we handle, and anything else we let the C preprocessor deal with. By suppressing the #define's we've picked out, we force the C preprocessor to leave any reference to them as unexpanded identifiers, so that later on we can just generate the enums and the resulting code will match up correctly.

T

-- 
Prosperity breeds contempt, and poverty breeds consent. -- Suck.com
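A runnable variant of the sketch above, under a few assumptions that are mine rather than the poster's: lines arrive as a `string[]` instead of through `File.byLine` (which reuses its buffer, so captured slices would otherwise need `.idup` before being stored in the associative array); the regex ends in `\s*$` rather than `\s+`, since the original would only match lines with trailing whitespace; `isSimpleValue` is a hypothetical stand-in that accepts only decimal integer literals; and the hypothetical `spawnCPreprocessor` is omitted, with the leftover lines returned instead.

```d
import std.algorithm.searching : all;
import std.ascii : isDigit;
import std.regex : matchFirst, regex;

// Hypothetical stand-in for isSimpleValue: accepts only decimal integer
// literals. A real version would accept any valid D initializer.
bool isSimpleValue(string s)
{
    return s.length > 0 && s.all!isDigit;
}

// Removes the object-like #define's we understand, records them in
// `constants`, and returns the remaining lines, which would be piped to a
// C preprocessor (not shown here).
string[] extractDefines(string[] lines, ref string[string] constants)
{
    // `\s*$` instead of a trailing `\s+`, so a line without trailing
    // whitespace still matches. Function-like macros never match, because
    // `(\w+)\s+` requires whitespace right after the name.
    auto re = regex(`^\s*#\s*define\s+(\w+)\s+(.*?)\s*$`);
    string[] rest;
    foreach (line; lines)
    {
        auto m = matchFirst(line, re);
        if (!m.empty && isSimpleValue(m[2]))
        {
            constants[m[1]] = m[2];
            continue; // suppressed: the preprocessor never sees this define
        }
        rest ~= line; // everything else goes to the C preprocessor
    }
    return rest;
}

void main()
{
    import std.stdio : writefln, writeln;

    string[string] constants;
    auto rest = extractDefines(
        ["#define FOO 3", "#define SQ(x) ((x)*(x))", "int y = FOO;"], constants);
    foreach (name, value; constants)
        writefln("enum %s = %s;", name, value); // prints: enum FOO = 3;
    writeln(rest); // the lines that would be handed to the C preprocessor
}
```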
Jul 17 2013
Timothee Cour <thelastmammoth gmail.com> writes:

On Tue, Jul 16, 2013 at 8:05 AM, Dicebot <public dicebot.lv> wrote:

 On Tuesday, 16 July 2013 at 14:15:55 UTC, Jacob Carlborg wrote:

 Made a proof of concept to automatically parse, translate and import C
 header files in D using DStep. DMD is linked against DStep and does not
 start new process to make the translation.

While this a relatively common request, I don't think such stuff belongs to compiler. It creates extra mandatory dependencies while providing little advantage over doing this as part of a build system. So far I am perfectly satisfied with invoking dstep from a Makefile.

I agree that this stuff doesn't belong in the compiler; however, Makefiles suck (they're not even portable), and build systems should be avoided whenever a more integrated solution exists.

So how about a library solution instead, which doesn't require a compiler change:

----
import parse_c_header_importer;
mixin(parse_c_header(import("foo.h")));
void main () { foo(); }
----

There are several options:

A)
mixin(parse_c_header(import("foo.h")));
=> defines D symbols for everything in foo.h (excluding things included by it)

B)
mixin(parse_c_header(import("foo.h"), recursive));
=> same, but recursively (probably not very useful, but could be useful if we instead used .i SWIG interface files)

C)
The least wasteful:
void foo();
int bar();
mixin(parse_c_header(import("foo.h"), foo, bar));
=> only defines the symbols provided

(I've proposed this syntax in an earlier thread)
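To make the proposal concrete, here is a deliberately tiny sketch of the CTFE + `mixin` mechanics. `parse_c_header` here is hypothetical (no `parse_c_header_importer` module exists) and understands only `void name();` prototypes; everything else, including the preprocessor, is dropped, which is exactly the kind of gap a real importer cannot afford. A string literal stands in for `import("foo.h")`, since string imports additionally require passing `-J` with the header's directory to the compiler.

```d
// Hypothetical toy version of parse_c_header: it recognizes only lines of
// the form `void name();` and silently drops everything else.
// It is NOT a C front end; it only shows how CTFE and mixin fit together.
string parse_c_header(string header)
{
    import std.algorithm.searching : endsWith, startsWith;
    import std.string : splitLines, strip;

    string result = "extern (C) {\n";
    foreach (line; header.splitLines)
    {
        auto decl = line.strip;
        if (decl.startsWith("void ") && decl.endsWith("();"))
            result ~= decl ~ "\n";
    }
    return result ~ "}\n";
}

// Stands in for import("foo.h").
enum fooHeader = "// foo.h\nvoid foo ();\n";

// Evaluated at compile time, then mixed in as ordinary D declarations.
mixin(parse_c_header(fooHeader));

void main()
{
    import std.stdio : write;

    // The same function also runs at run time; this prints the generated code.
    write(parse_c_header(fooHeader));

    // foo is declared with C linkage but not called, since nothing in this
    // sketch defines it.
    static assert(__traits(getLinkage, foo) == "C");
}
```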
Jul 16 2013
Timothee Cour <thelastmammoth gmail.com> writes:

On Tue, Jul 16, 2013 at 9:14 PM, Walter Bright
<newshound2 digitalmars.com>wrote:

 On 7/16/2013 8:49 PM, Timothee Cour wrote:

 So how about a library solution instead, which doesn't require compiler
 change:

While semantically a great idea, technically I don't think CTFE is up to implementing a C front end yet.

It would, trivially, with CTFE exec. Yet another enabling use case.
Jul 16 2013
"Dicebot" <public dicebot.lv> writes:
On Wednesday, 17 July 2013 at 03:49:32 UTC, Timothee Cour wrote:
 I agree that this stuff doesn't belong to compiler, however 
 Makefiles suck

Here I may agree (for different reasons, probably)
 and build systems should be avoided  whenever a more
 integrated solution exist.

..but not here. Integrated solutions suck even harder than Makefiles.
Jul 16 2013
"deadalnix" <deadalnix gmail.com> writes:
On Wednesday, 17 July 2013 at 04:32:12 UTC, Timothee Cour wrote:
 On Tue, Jul 16, 2013 at 9:14 PM, Walter Bright
 <newshound2 digitalmars.com>wrote:

 On 7/16/2013 8:49 PM, Timothee Cour wrote:

 So how about a library solution instead, which doesn't 
 require compiler
 change:

While semantically a great idea, technically I don't think CTFE is up to implementing a C front end yet.

it would trivially, with CTFE exec. Yet another enabling use case.

Just because a bad solution is faster to implement doesn't make it good.
Jul 16 2013
"deadalnix" <deadalnix gmail.com> writes:
On Wednesday, 17 July 2013 at 04:14:56 UTC, Walter Bright wrote:
 On 7/16/2013 8:49 PM, Timothee Cour wrote:
 So how about a library solution instead, which doesn't require 
 compiler change:

While semantically a great idea, technically I don't think CTFE is up to implementing a C front end yet.

This is the right path. We don't need the full front end, do we ?
Jul 16 2013
Timothee Cour <thelastmammoth gmail.com> writes:

On Tue, Jul 16, 2013 at 10:04 PM, deadalnix <deadalnix gmail.com> wrote:

 On Wednesday, 17 July 2013 at 04:14:56 UTC, Walter Bright wrote:

 On 7/16/2013 8:49 PM, Timothee Cour wrote:

 So how about a library solution instead, which doesn't require compiler
 change:

While semantically a great idea, technically I don't think CTFE is up to implementing a C front end yet.

This is the right path. We don't need the full front end, do we ?

What's a non-full C front end? If it's not a real C front end, it's gonna break with certain macros etc. Not good.

I see no point in re-implementing a C front end when we can simply use an existing one to do the job (e.g. Clang). This would also allow parsing C++ just as well.
Jul 16 2013
Timothee Cour <thelastmammoth gmail.com> writes:

On Tue, Jul 16, 2013 at 10:02 PM, deadalnix <deadalnix gmail.com> wrote:

 On Wednesday, 17 July 2013 at 04:32:12 UTC, Timothee Cour wrote:

 On Tue, Jul 16, 2013 at 9:14 PM, Walter Bright
 <newshound2 digitalmars.com>**wrote:

  On 7/16/2013 8:49 PM, Timothee Cour wrote:
  So how about a library solution instead, which doesn't require compiler
 change:

implementing a C front end yet.

Yet another enabling use case.

Just because a bad solution is faster to implement doesn't make it good.

Being lazy is good. Fewer bugs to fix, etc.
Jul 16 2013
"deadalnix" <deadalnix gmail.com> writes:
On Wednesday, 17 July 2013 at 05:12:00 UTC, Timothee Cour wrote:
 what's a non-full C front end? If it's not a real C front end 
 it's gonna
 break with certain macros etc. Not good.

Macros are processed before parsing? There's no need for a full C frontend to handle macros.
 I see no point in re-implementing a C front end when we can 
 simply use an
 existing one to do the job (eg clang). This would also allow to 
 parse C++
 just as well.

When you only need a very limited part of the frontend, it makes sense. Here we don't need to parse function bodies, for instance, and we can skip most of semantic analysis (if not all?).
Jul 16 2013
"angel" <andrey.gelman gmail.com> writes:
Possibly instead of 'include' would be better use 'include_C' as 
opposed to C++ or any other language.
Jul 16 2013
Jacob Carlborg <doob me.com> writes:
On 2013-07-17 08:29, angel wrote:
 Possibly instead of 'include' would be better use 'include_C' as opposed
 to C++ or any other language.

Or there could be an optional argument indicating the language:

    pragma(include, "foo.h", "C");

-- 
/Jacob Carlborg
Jul 17 2013
Timothee Cour <thelastmammoth gmail.com> writes:

On Tue, Jul 16, 2013 at 11:01 PM, deadalnix <deadalnix gmail.com> wrote:

 On Wednesday, 17 July 2013 at 05:12:00 UTC, Timothee Cour wrote:

 what's a non-full C front end? If it's not a real C front end it's gonna
 break with certain macros etc. Not good.

 handle macros.

 I see no point in re-implementing a C front end when we can simply use an
 existing one to do the job (eg clang). This would also allow to parse C++
 just as well.

When you only need a very limited part of the fronted, it make sense. Here we don't need to parse function body for instance, and we can skip most of semantic analysis (if not all ?).

You'd still need to parse C files recursively (textual inclusion...), handle different C function calling conventions and different C standards, and you'd need a way to forward different C compiler options to DMD (include paths to standard / custom libraries); eventually people will want to parse C++ as well anyway. That can be a lot of work, whereas using existing tools takes much less effort and is less error-prone.
Jul 17 2013
"Chad Joan" <chadjoan gmail.com> writes:
On Tuesday, 16 July 2013 at 14:15:55 UTC, Jacob Carlborg wrote:
 Made a proof of concept to automatically parse, translate and 
 import C header files in D using DStep. DMD is linked against 
 DStep and does not start new process to make the translation.

 I added a new pragma, include, that handles everything. Use 
 like this:

 // foo.h
 void foo ();

 // main.d

 module main;

 pragma(include, "foo.h");

 void main ()
 {
     foo();
 }

 DMD: https://github.com/jacob-carlborg/dmd/tree/dstep
 DStep: https://github.com/jacob-carlborg/dstep/tree/c_api

This sounds pretty cool, and the suggestion from Timothee also makes a lot of sense. Is there any way we can rig this to behave as if it were a CTFE invocation? It could be treated like an intrinsic up to the point where we have powerful-enough CTFE to replace it. I'm still not sure if Walter would be OK with this, but I figured I'd mention it, since it could give us something really nice without having to wait for CTFE to get good.
Jul 17 2013
"Paulo Pinto" <pjmlp progtools.org> writes:
On Wednesday, 17 July 2013 at 09:27:53 UTC, Jacob Carlborg wrote:
 On 2013-07-17 10:14, Walter Bright wrote:

 Yeah, you do need the full front end. It's pretty amazing how 
 the
 simplest of .h files seem determined to exercise every last, 
 dark corner
 of the language.

 If the converter doesn't accept the full language, you're just 
 going to
 get a dump truck unloading on it.

When you do have a complete front end you can choose how to handle the language constructs the tool cannot (yet) translate. I.e. just skip it, insert a comment or similar. If you don't have a full front end and encounters something that you cannot translate, you will most likely have weird behaviors.

Thus we are back to the compiler as library discussion.

--
Paulo
Jul 17 2013
"deadalnix" <deadalnix gmail.com> writes:
On Wednesday, 17 July 2013 at 07:17:07 UTC, Timothee Cour wrote:
 you'd still need to parse C files recursively (textual inclusion...),
 handle different C function calling conventions, different C standards,
 you'd need a way to forward to dmd different C compiler options (include
 paths to standard / custom libraries), and eventually people will want to
 parse C++ as well anyways. That can be a lot of work. Whereas using
 existing tools takes much less effort and is less error prone.

I'm talking about semantic analysis and you answer with parsing; I'm not sure this is going to lead anywhere. Do you understand that a parser is actually quite a small part of a front end?
Jul 17 2013
"deadalnix" <deadalnix gmail.com> writes:
On Wednesday, 17 July 2013 at 09:27:53 UTC, Jacob Carlborg wrote:
 On 2013-07-17 10:14, Walter Bright wrote:

 Yeah, you do need the full front end. It's pretty amazing how the
 simplest of .h files seem determined to exercise every last, dark
 corner of the language.

 If the converter doesn't accept the full language, you're just going
 to get a dump truck unloading on it.

 When you do have a complete front end you can choose how to handle the language constructs the tool cannot (yet) translate, e.g. just skip them or insert a comment. If you don't have a full front end and encounter something that you cannot translate, you will most likely get weird behavior.

My understanding is that we only want to convert declarations to D. Can you give me an example of such a corner case that would require the full front end?
Jul 17 2013
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jul 17, 2013 at 12:46:54PM -0700, Walter Bright wrote:
 On 7/17/2013 9:48 AM, deadalnix wrote:
 My understanding is that we only want to convert declarations to D.
 Can you give me an example of such a corner case that would require the
 full front end?

 One example:

 --------------------------------
 //**************************Header**********************\\
 int x;
 --------------------------------

 Yes, this POS is real C code I got a bug report on. Note the trailing \\. Is that one line splice or two? You have to get the hairy details right. I've seen similar nonsense with trigraphs. I've seen metaprogramming tricks with token pasting. You can't dismiss this stuff.

I've seen C code where the "header" file has function bodies in them.

Though, about trigraphs... I have to admit I've never actually seen *real* C code that uses trigraphs, but yeah, needing to account for them can significantly complicate your code.

But as for preprocessor-specific stuff, couldn't we just pipe it through a standalone C preprocessor and be done with it? It can't be *that* hard, right?


T

--
Bare foot: (n.) A device for locating thumb tacks on the floor.
Jul 17 2013
"deadalnix" <deadalnix gmail.com> writes:
On Wednesday, 17 July 2013 at 19:46:54 UTC, Walter Bright wrote:
 On 7/17/2013 9:48 AM, deadalnix wrote:
 My understanding is that we only want to convert declarations to D.
 Can you give me an example of such a corner case that would require the
 full front end?

 One example:

 --------------------------------
 //**************************Header**********************\\
 int x;
 --------------------------------

 Yes, this POS is real C code I got a bug report on. Note the trailing \\. Is that one line splice or two? You have to get the hairy details right. I've seen similar nonsense with trigraphs. I've seen metaprogramming tricks with token pasting. You can't dismiss this stuff.

This does not require semantic analysis.
Jul 17 2013
"Paulo Pinto" <pjmlp progtools.org> writes:
On Wednesday, 17 July 2013 at 15:34:54 UTC, Jacob Carlborg wrote:
 On 2013-07-17 13:24, Paulo Pinto wrote:

 Thus we are back to the compiler as library discussion.

 Yes, but for the C family of languages we already have a compiler as a library: Clang.

Agreed. I also confess that my anti-C bias was softened a bit by Clang. It does not sort out all C and C++ safety issues, but when integrated with proper tooling it helps bring Pascal-like safety to C. Unfortunately, when using C and C++, not all compilers are like Clang, and it is not always easy to convince people to add extra tooling (lint and friends).

--
Paulo
Jul 18 2013