digitalmars.D - Community input for a new C binding generator project

"James Buren" <ryu0 ymail.com> writes:
I have thought of designing a new C binding generator that tries 
to do more than DStep currently does. At the very least, that 
would include automatic conversion of simple macros such as 
characters, strings, aliases, integers, and constant integer 
expressions. Another thing I wish to add is mapping standard C 
types to their D equivalents in core.stdc.* where they are 
already defined there.
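
As a rough illustration of the kind of simple macro conversion meant here, the following Python sketch turns object-like #define lines into D declarations and looks up a few C types in core.stdc.*. The regular expressions, the names convert_define and d_type_for, and the small C_TO_D_TYPES table are assumptions made for this example only. A real tool would work from a proper C front end rather than from regular expressions.

```python
import re

# Object-like macro: "#define NAME value" with whitespace after NAME.
# Function-like macros ("#define F(x) ...") deliberately do not match.
DEFINE_RE = re.compile(r'^\s*#\s*define\s+([A-Za-z_]\w*)\s+(.+?)\s*$')

# Decimal or hex integers without suffix. Leading-zero octal literals are
# excluded on purpose, because D does not accept C-style octal literals.
INT_RE = re.compile(r'^(0[xX][0-9a-fA-F]+|0|[1-9][0-9]*)$')
STR_RE = re.compile(r'^"(?:[^"\\]|\\.)*"$')
CHAR_RE = re.compile(r"^'(?:[^'\\]|\\.)'$")
IDENT_RE = re.compile(r'^[A-Za-z_]\w*$')

# Illustrative subset only: C type name -> (D module, D name).
C_TO_D_TYPES = {
    "long": ("core.stdc.config", "c_long"),
    "unsigned long": ("core.stdc.config", "c_ulong"),
    "int32_t": ("core.stdc.stdint", "int32_t"),
    "FILE": ("core.stdc.stdio", "FILE"),
}


def convert_define(line):
    """Return a D declaration for a simple macro, or None if unsupported."""
    m = DEFINE_RE.match(line)
    if m is None:
        return None
    name, value = m.groups()
    if INT_RE.match(value) or STR_RE.match(value) or CHAR_RE.match(value):
        # Literal values become D manifest constants.
        return f"enum {name} = {value};"
    if IDENT_RE.match(value):
        # A bare identifier is treated as an alias for another symbol
        # (an assumption: without a symbol table we cannot check it).
        return f"alias {name} = {value};"
    return None


def d_type_for(c_type):
    """Return (module, name) for a known C type, or None."""
    return C_TO_D_TYPES.get(c_type)


if __name__ == "__main__":
    for src in ['#define VERSION 3', '#define NAME "demo"',
                '#define OLD_NAME NEW_NAME', '#define MAX(a,b) ((a)>(b)?(a):(b))']:
        print(src, "->", convert_define(src))
```

Anything the sketch cannot classify is returned as None, so a caller could report it instead of guessing at a translation.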

However, while doing research for the project, I have come across 
a number of concerns. So, I am asking for community input before 
I get serious with this project, because I want to make something 
people would actually want to use.

Specifically, this is what I need to know about:

1) C has many standards (C89/C99/C11) and non-standard extensions 
(GNU C). By the very nature of C grammar, it can be quite 
difficult to parse properly. Would anyone care if I excluded 
non-standard features that may be present in a header? For 
example, non-standard bitfields -- they use integers other than 
'int'. This is non-portable behavior.

2) As an extension to the above, should I bother supporting rarely 
used features? Some of them are non-standard but others are 
standard. For example, #pragma pack() and GNU attributes are 
non-standard, yet they affect struct alignment in the D interface 
module. Another example is converting inline functions, which are 
standard as of C99.

3) Should I provide an interface for mapping preprocessor macros 
which cannot be automatically converted by machine logic? For 
example, function macros cannot be converted automatically 
because you can't be sure what type of argument is expected. 
Other macros may not be worth converting because they are simply 
used for generating code.

4) As an extension to questions 1 and 2, is it worth it to you 
for non-standard C features to be supported? This would likely 
make generating a portable D interface more difficult as the D 
interface is limited by what the platform's C compiler supports. 
For example, if struct alignment is used but the platform's C 
compiler does not support this non-standard feature, then the D 
interface would have to exclude it from this platform's version 
of the module.

5) Should I avoid reusing code from a GPLv2 / GPLv3 project? I'm 
not sure if this makes anyone uncomfortable. Supposedly most GPL 
program output retains the license of the input, but I'm not 
totally convinced. I am not sure if the viral nature of the GPL 
would make people unwilling to use my tool.

6) Anything else you would like to see in a C binding generator? 
I may not implement it, but knowing what people want would help 
me anyway.
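
To make question 3 more concrete, here is a minimal Python sketch of one possible shape for such an interface: function-like macros are detected, and a user-supplied table provides the D replacement where one exists. The name translate_function_macro, the OVERRIDES table, and the D template in it are hypothetical and exist only for illustration.

```python
import re

# Function-like macro: the "(" follows the macro name with no space between.
FUNC_MACRO_RE = re.compile(r'^\s*#\s*define\s+([A-Za-z_]\w*)\(([^)]*)\)')


def translate_function_macro(line, overrides):
    """Return D code for a function-like macro, or None if it is not one."""
    m = FUNC_MACRO_RE.match(line)
    if m is None:
        return None
    name = m.group(1)
    params = ", ".join(p.strip() for p in m.group(2).split(","))
    if name in overrides:
        # The user told us how to express this macro in D.
        return overrides[name]
    # No mapping available: emit a marker instead of guessing the types.
    return f"// TODO: {name}({params}) needs a manual mapping"


# Hypothetical user-supplied mapping from macro name to hand-written D code.
OVERRIDES = {
    "MAX": "auto MAX(T)(T a, T b) { return a > b ? a : b; }",
}

if __name__ == "__main__":
    print(translate_function_macro("#define MAX(a,b) ((a)>(b)?(a):(b))", OVERRIDES))
    print(translate_function_macro("#define MIN(a,b) ((a)<(b)?(a):(b))", OVERRIDES))
```

The point of the design is that the generator never invents argument types itself: it either uses what the user supplied or leaves a visible marker in the output.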

Thank you for reading this.
Apr 02 2014
Jacob Carlborg <doob me.com> writes:
On 2014-04-02 16:33, James Buren wrote:
 I have thought of designing a new C binding generator that tries to do
 more than dstep currently does.

What's the advantage of starting from scratch by building a new tool instead of contributing to DStep?

 At the very least that would include
 automatic conversion of simple macros. Such as: characters, strings,
 aliases, integers, constant integer expressions, etc. Another thing I
 wish to add is mapping of standard C types to their D types in
 core.stdc.* if it is already defined there.

 However, while doing research for the project, I have come across a
 number of concerns I have. So, I am asking for community input before I
 get serious with this project because I want to actually make something
 people would want to use.

 Specifically, this is what I need to know about:

 1) C has many standards (C89/C99/C11) and non-standard extensions (GNU
 C). By the very nature of C grammar, it can be quite difficult to parse
 properly. Would anyone care if I excluded non-standard features that may
 be present in a header? For example, non-standard bitfields -- they use
 integers other than 'int'. This is non-portable behavior.

Use a compiler that can already parse C, i.e. Clang.

 2) As an extension to the above, should I bother supporting rarely used
 features? Some of them are non-standard but others are standard. For
 example, #pragma pack() and GNU attributes are non-standard, yet they
 affect struct alignment in the D interface module. Another example is
 converting inline functions, which are standard as of C99.

Use a compiler that can handle those extensions, i.e. Clang.

--
/Jacob Carlborg
Apr 02 2014
Jacob Carlborg <doob me.com> writes:
On 02/04/14 23:00, James Buren wrote:
 On Wednesday, 2 April 2014 at 15:49:18 UTC, Jacob Carlborg wrote:
 Use a compiler that can already parse C, i.e. Clang.

 Use a compiler that can handle those extensions, i.e. Clang.

 I'm already wanting to reuse an existing parser or compiler frontend.
 However, I have found myself frustrated and disappointed when I have
 experimented with them. Clang has an API that I find difficult to
 understand, so I can't really say what may be an issue for it. But I
 have heard that it doesn't expose any of the macros defined during the
 preprocessing phase.

Clang itself does expose macros, but libclang, the C API, does not. The solution is to contribute to Clang and extend libclang with support for macros.

 Both of these compilers seem to have issues which would limit what my
 tool could do without bringing in an external tool to reinvent the wheel
 just to make up for the absence of a critical feature.

 But I have found several promising frontends that can provide the
 information I need and are easy for me to understand.

 Pycparser combined with a suitable preprocessor seems workable, but this
 only works on ISO C99. The license is rather friendly, but is only
 available in python.

 Sparse, the tool used by the linux kernel for source code analysis,
 provides a simple library API that exposes an entire C header or source
 file into a simple to use AST. MIT licensed.

 Cparser, a compiler using libfirm to generate code, provides an API I
 find very easy to follow and use, but it is GPLv3 which does concern me
 a bit.

You will run into problems eventually if you don't use a proper front end from a real compiler. Trust me, I've learned this the hard way. In the end, it's the compiler that determines what's legal and what's not.

 So, in summary, please understand why I am wanting to redesign the
 backend logic for generating a D interface. I find the existing
 frontends most people use to be too limited for my purposes. My goal is
 to want to convert as much of the C API as can be done automatically to
 a D interface. The APIs for the major compilers most people think of do
 not expose all the information I think is necessary for the future
 expansion of a binding generator. If you have any better ideas, please
 let me know.

Yes, contribute what's missing. I'm hopefully going to do that anyway, with libclang, eventually.

--
/Jacob Carlborg
Apr 03 2014
"Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Wednesday, 2 April 2014 at 14:33:04 UTC, James Buren wrote:
 I have thought of designing a new C binding generator that 
 tries to do more than dstep currently does.

By coincidence, just the other day I started a section on the D wiki to list current binding generator projects:

http://wiki.dlang.org/Binding_generators#Binding_generators

Perhaps you can find an existing project to expand instead.
Apr 02 2014
"Dicebot" <public dicebot.lv> writes:
I am with Jacob here. Writing a custom parser for a binding 
generator is both a waste of effort and unreliable in practice 
(you have listed some of the issues yourself). I would personally 
not trust one that is not based on a mature C compiler.
Apr 02 2014
"James Buren" <ryu0 ymail.com> writes:
On Wednesday, 2 April 2014 at 15:49:18 UTC, Jacob Carlborg wrote:
 Use a compiler that can already parse C, i.e. Clang.

 Use a compiler that can handle those extensions, i.e. Clang.

I already want to reuse an existing parser or compiler frontend. 
However, I have found myself frustrated and disappointed when I 
have experimented with them. Clang has an API that I find 
difficult to understand, so I can't really say what may be an 
issue for it. But I have heard that it doesn't expose any of the 
macros defined during the preprocessing phase.

GCC has a plugin API, but it is frustratingly difficult to use. 
It has no exposure of macros that I could find. It does define 
types fully, but there are issues which make it unsuitable for 
supporting #pragma pack(). It does reveal the resulting alignment 
in this case, but it does not provide any obvious way to tell 
whether that alignment is the default or a custom one. This is a 
detail needed for a proper binding. It only reveals alignment 
changes fully if you use GNU C extensions.

Both of these compilers seem to have issues which would limit 
what my tool could do without bringing in an external tool to 
reinvent the wheel just to make up for the absence of a critical 
feature.

But I have found several promising frontends that can provide the 
information I need and are easy for me to understand.

Pycparser combined with a suitable preprocessor seems workable, 
but it only handles ISO C99. The license is rather friendly, but 
it is only available in Python.

Sparse, the tool used by the Linux kernel for source code 
analysis, provides a simple library API that exposes an entire C 
header or source file as a simple-to-use AST. MIT licensed.

Cparser, a compiler using libfirm to generate code, provides an 
API I find very easy to follow and use, but it is GPLv3, which 
does concern me a bit.

So, in summary, please understand why I want to redesign the 
backend logic for generating a D interface. I find the existing 
frontends most people use to be too limited for my purposes. My 
goal is to convert as much of the C API as can be done 
automatically to a D interface. The APIs for the major compilers 
most people think of do not expose all the information I think is 
necessary for the future expansion of a binding generator. If you 
have any better ideas, please let me know.
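
To show why the default-versus-custom alignment distinction matters for a binding, here is a small Python sketch using the standard struct module. It compares the size of a "char followed by int" record under native alignment with the size of the same fields laid out back to back, which is roughly what #pragma pack(1) asks a C compiler to do. The field choice and variable names are assumptions made for this example, and the struct module is only a stand-in for a real C compiler here.

```python
import struct

# Field list in struct-module notation: one char followed by one int.
FIELDS = "ci"

# "@" uses the platform's native sizes and alignment, so padding may be
# inserted between the char and the int (commonly giving a size of 8).
native_size = struct.calcsize("@" + FIELDS)

# "=" uses standard sizes with no alignment padding, similar in effect
# to packing the struct with 1-byte alignment: 1 + 4 bytes.
packed_size = struct.calcsize("=" + FIELDS)

if __name__ == "__main__":
    print("native layout size:", native_size)
    print("packed layout size:", packed_size)
    if native_size != packed_size:
        print("layouts differ, so the binding must record the packing")
```

If the D declaration does not reproduce the same packing as the C header, field offsets would no longer match, which is why a front end that hides this information is a problem for a generator.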
Apr 02 2014
"James Buren" <ryu0 ymail.com> writes:
On Wednesday, 2 April 2014 at 16:10:31 UTC, Vladimir Panteleev 
wrote:
 By coincidence, just the other day I started a section on the D 
 wiki to list current binding generator projects:

 http://wiki.dlang.org/Binding_generators#Binding_generators

 Perhaps you can find an existing project to expand instead.

I'll look into this, but I've found most of the binding generators are already written in D. I am not as fluent in D as I am in C, so I would prefer to write or expand something already written in C or C++ for now.
Apr 02 2014
"Dicebot" <public dicebot.lv> writes:
On Wednesday, 2 April 2014 at 21:00:25 UTC, James Buren wrote:
 Clang has an API that I find difficult to understand, so I 
 can't really say what may be an issue for it. But I have heard 
 that it doesn't expose any of the macros defined during the 
 preprocessing phase.

Clang has both a C API (libclang) and a C++ API. The former has only limited exposure of macros; the latter is much more capable. I doubt anything you create with a custom parser can be simpler than libclang, at least not with comparable language support.
Apr 02 2014
"Rikki Cattermole" <alphaglosined gmail.com> writes:
On Wednesday, 2 April 2014 at 14:33:04 UTC, James Buren wrote:
 I have thought of designing a new C binding generator that 
 tries to do more than dstep currently does. At the very least 
 that would include automatic conversion of simple macros. Such 
 as: characters, strings, aliases, integers, constant integer 
 expressions, etc. Another thing I wish to add is mapping of 
 standard C types to their D types in core.stdc.* if it is 
 already defined there.

 However, while doing research for the project, I have come 
 across a number of concerns I have. So, I am asking for 
 community input before I get serious with this project because 
 I want to actually make something people would want to use.

 Specifically, this is what I need to know about:

 1) C has many standards (C89/C99/C11) and non-standard 
 extensions (GNU C). By the very nature of C grammar, it can be 
 quite difficult to parse properly. Would anyone care if I 
 excluded non-standard features that may be present in a header? 
 For example, non-standard bitfields -- they use integers other 
 than 'int'. This is non-portable behavior.

 2) As an extension to the above, should I bother supporting 
 rarely used features? Some of them are non-standard but others 
 are standard. For example, #pragma pack() and GNU attributes 
 are non-standard, yet they affect struct alignment in the D 
 interface module. Another example is converting inline 
 functions, which are standard as of C99.

 3) Should I provide an interface for mapping preprocessor 
 macros which cannot be automatically converted by machine 
 logic? For example, function macros cannot be converted 
 automatically because you can't be sure what type of argument 
 is expected. Other macros may not be worth converting because 
 they are simply used for generating code.

 4) As an extension to questions 1 and 2, is it worth it to you 
 for non-standard C features to be supported? This would likely 
 make generating a portable D interface more difficult as the D 
 interface is limited by what the platform's C compiler 
 supports. For example, if struct alignment is used but the 
 platform's C compiler does not support this non-standard 
 feature, then the D interface would have to exclude it from 
 this platform's version of the module.

 5) Should I avoid reusing code from a GPLv2 / GPLv3 project? 
 I'm not sure if this makes anyone uncomfortable. Supposedly 
 most GPL program output retains the license of the input, but 
 I'm not totally convinced. I am not sure if the viral nature of 
 the GPL would make people unwilling to use my tool.

 6) Anything else you would like to see in a C binding 
 generator? I may not implement it, but knowing what people want 
 would help me anyway.

 Thank you for reading this.

Interestingly, I was recently playing with a CTFE'd macro preprocessor in the hope that I could push its output through a C lexer and create the entire bindings to files via template mixins. There are a few limitations, like string imports not being recursive on Windows (one day I might look into that). But nevertheless, a lot of work, and fun!
Apr 02 2014
Jacob Carlborg <doob me.com> writes:
On 03/04/14 01:16, Rikki Cattermole wrote:

 Interestingly I was having a play recently with a CTFE'd macro
 preprocessor in the hope that I could push it through a c lexer and
 create via template mixins the entire bindings to files.
 Few limitations like string imports not recursive on Windows (one day I
 might look into that).
 But nevertheless a lot of work, and fun!

I modified DMD and added a pragma which would call DStep to create bindings on the fly. It worked, but most people here seemed to prefer to have it as a separate tool.

--
/Jacob Carlborg
Apr 03 2014