digitalmars.dip.ideas - Create a full C++ parser

rempas <rempas tutanota.com> writes:
**Idea/Problem**

Ok, I know this will probably not happen, but I like D, and I'm 
going to give it a chance in the hope that I can convince you 
why you should do this. Let me start by saying that I do 
understand how hard it is to write a parser for a programming 
language, let alone C++. However, I will try to explain why the 
trouble is more than worth it: it would solve the two biggest 
problems and complaints that keep people from using D.

So let me first start by saying that I think that the D language 
is the best language that we currently have. While not without 
its flaws and things that could be massively improved, it's still 
better than the second best, which is C++. However, D has a very 
big disadvantage, which is library support. A lot of big and 
powerful libraries are written in C++, and while it is possible 
to create the bindings manually, it can be a real pain to do so 
for big projects and to keep up with every update (especially 
major version ones).

Now, you might tell me that none of this is new and that you 
already knew it, but my claim is that the only thing that stops 
D not only from being more popular than it is, but also from 
being the most used language out there, is full compatibility 
with C++! Yeah, I know that it sounds like just a bold theory, 
and you may deem that the risk isn't worth the trouble, but let 
me ask you one thing. What's the biggest reason D hasn't caught 
up with the big languages? What is the number one complaint that 
people make about D (the garbage collector is number 2)? Yes, 
libraries!

Now, D has "importC", and lots of C++ libraries have C bindings, 
but the problem is that C doesn't have classes, and you will need 
to do manual work to create a "D way" of using the library. Also, 
not every C++ library has C bindings. For example, 
[Louvre](https://github.com/CuarzoSoftware/Louvre) does not! If I 
want to use it from D, I have to:

* Wait for it to create C bindings (and hope they are maintained 
in the future)
* Manually port the library myself, which also includes porting 
other C++ libraries, because I would be starting with the 
[weston-example](https://github.com/CuarzoSoftware/Louvre/tree/main/src/examples/
ouvre-weston-clone) that they showcase in their repos.
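
For context, this is roughly what hand-written C bindings for a C++ 
class look like: the class turns into an opaque handle plus free 
functions, and the "D way" wrapper then has to be rebuilt on top. 
This is only a sketch; `Surface` and the `surface_*` functions are 
made-up names, not Louvre's actual API.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical C++ class standing in for a library type.
class Surface {
public:
    explicit Surface(std::string title) : title_(std::move(title)) {}
    void resize(int w, int h) { width_ = w; height_ = h; }
    int area() const { return width_ * height_; }

private:
    std::string title_;
    int width_ = 0;
    int height_ = 0;
};

// Hand-written C binding: the class is hidden behind an opaque handle
// and every method becomes a free function with C linkage.
extern "C" {
    typedef struct SurfaceHandle SurfaceHandle;  // never defined: opaque to C

    SurfaceHandle* surface_create(const char* title) {
        return reinterpret_cast<SurfaceHandle*>(new Surface(title));
    }
    void surface_resize(SurfaceHandle* s, int w, int h) {
        reinterpret_cast<Surface*>(s)->resize(w, h);
    }
    int surface_area(const SurfaceHandle* s) {
        return reinterpret_cast<const Surface*>(s)->area();
    }
    void surface_destroy(SurfaceHandle* s) {
        delete reinterpret_cast<Surface*>(s);
    }
}
```

Every method, constructor and destructor needs a wrapper like these, 
and each one has to be revisited when the C++ API changes.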

**Improvements**

Implementing a C++ parser will have the following advantages:
* C++ libraries will be usable natively in D. This includes 
"macros" and templates that have not been instantiated in the 
library itself and would otherwise need an *additional* 
instantiation from the project that uses them (making the 
process even more tedious, slow and overall annoying).

* C++ and D code will be able to be combined, making it possible 
for any C++ project to transition to D more easily and smoothly. 
That will bring even more popularity and trust to the language.

* C++ has smart pointers, which means that we will be able to use 
C++'s standard library for performance-sensitive projects, 
solving D's number 2 complaint (the garbage collector) and giving 
people even more reason to see D as a real competitor that can 
take the place of C++.
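
To make the template point concrete, here is a small sketch of the 
"additional" step: a template emits no code until something 
instantiates it, so a library that ships only the template gives a 
binding nothing to link against. `clamp_to` is a hypothetical library 
function used purely for illustration.

```cpp
#include <cassert>

// Hypothetical library template. No machine code exists for it until
// some translation unit instantiates it for a concrete type.
template <typename T>
T clamp_to(T value, T lo, T hi) {
    return value < lo ? lo : (value > hi ? hi : value);
}

// Explicit instantiation definitions: these force the compiler to emit
// clamp_to<int> and clamp_to<double> into this object file, which is
// the extra step a binding project has to add by hand today.
template int clamp_to<int>(int, int, int);
template double clamp_to<double>(double, double, double);
```

A compiler that understood the C++ source directly could instantiate 
`clamp_to` for whatever type the caller needs instead of relying on 
a fixed list like the one above.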

**Implementation**

First, such a project will require great knowledge of the D 
compiler and great skill at writing an efficient parser. That's 
why, if anyone is to do it, it had better be the DMD contributors. 
Second, this is something that will take time, so I believe that 
the best approach is to try parsing some real libraries and keep 
improving the parser little by little. Implement, test, fix bugs, 
test, and repeat! Let's start with the STL (so we can use the 
smart pointers as soon as possible) and then move on to more 
common and big libraries. With steady work, in 2–3 years, we will 
hopefully have a fully working C++ parser without any bugs. Or at 
least one that can parse the biggest and most important C++ 
libraries.
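
As a reminder of what the smart pointers buy you, here is a minimal 
sketch of deterministic cleanup with `std::unique_ptr` and 
`std::shared_ptr`. `Tracker` and `live_after_scope` are names invented 
for this example.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical helper that counts how many instances are alive.
struct Tracker {
    explicit Tracker(int& counter) : live(counter) { ++live; }
    ~Tracker() { --live; }
    int& live;
};

// Returns the number of live Tracker objects after the owning
// smart pointers have gone out of scope.
int live_after_scope() {
    int live = 0;
    {
        // Sole owner; no garbage collector involved.
        std::unique_ptr<Tracker> owner = std::make_unique<Tracker>(live);
        // Ownership can be handed over to reference counting.
        std::shared_ptr<Tracker> shared = std::move(owner);
        std::shared_ptr<Tracker> second = shared;  // two owners now
        assert(live == 1 && shared.use_count() == 2);
    }   // last owner destroyed here, so ~Tracker runs immediately
    return live;
}
```

The destructor runs at a known point (the closing brace), which is 
the property people usually want when they complain about a GC.
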
Jun 14
rempas <rempas tutanota.com> writes:
On Saturday, 14 June 2025 at 11:39:01 UTC, rempas wrote:
 [**Idea/Problem** .. most important C++ libraries]
Oh, and something I forgot to say. A small parser would of course mean a preprocessor. Now, this gives one more advantage: not requiring an external C/C++ compiler (both for C and C++). It would also let us avoid extra files by adding the headers directly in the D files. Something like the following:

```d
importC <gtk/gtk.h>;  // Header, as a C file (including C11 features)
importCXX <iostream>; // Header, as a C++ file
importC cpp_module;   // C++ module (no need for the "XX" suffix, as C has no modules!)

void main() {
  std.cout << "Hello from C++'s print function!\n";
}
```
Jun 14
Sergey <kornburn yandex.ru> writes:
On Saturday, 14 June 2025 at 11:47:38 UTC, rempas wrote:
 On Saturday, 14 June 2025 at 11:39:01 UTC, rempas wrote:
 [**Idea/Problem** .. most important C++ libraries]
 Oh and something I forgot to say. A small parser would of course mean a preprocessor. Now, this gives one more advantages of not requiring an external C/C++ compiler (both for C and C++). This will also give us the ability to be able to not require extra files and been able to add the headers in the D files. Something like the following:

 ```d
 importC <gtk/gtk.h>; // Header, as a C file (including C11 features)
 importCXX <iostream>; // Header, as a C++ file
 importC cpp_module; // C++ module (no reason for the two "XX" in the end as, C has no modules!)

 void main() {
   std.cout << "Hello from C++'s print function!\n";
 }
 ```
Most of the IT world agrees that even though C++ is a quite popular language, it is also very badly designed, and they want to move to something else. Some fields are moving to Go, others to Rust. Companies are also spending a lot of effort on solutions that simplify this transition: automatic transpilers from C++ to Rust by DARPA and others, Carbon by Google, and the C++ interop that Apple presented to simplify the transition to Swift.

Having C++ interop would be cool if it were ready now; it would be a huge benefit for the language. But starting to develop it now would, I think, be a waste of resources. Moreover, D doesn't have these resources even for crucial parts, and certainly not for such experimental things.

There are also other approaches: several languages have very nice C++ interop. There are projects in Python, R, Julia, Swift and Rust, and they take different approaches, from automatic binding generators (cbindgen) to seamless integrations (Rcpp). So if you really want a compiled language with C++ interop, I would suggest checking out Swift.
Jun 14
"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
If you look back in the archives you can see posts from Walter from 
the mid-2000s saying that he did not want C++ bindings in D.

Too complex, too much effort for very little gain.

Right now our AST supports a subset of C++, and does not have the 
capability to handle C++ templates. It works for C + COM and not much 
more than that.

An example of a C++ feature which we do not support is multiple 
inheritance. Our AST, semantic analysis and codegen do not support it. 
We cannot bind to it. And this is before we get into compiler specific 
stuff like type information. Dmd is even missing the Windows 64bit 
exception handling of MSVC.
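
For readers unfamiliar with why multiple inheritance is hard to bind 
to, here is a small sketch using made-up types. On the mainstream C++ 
ABIs the second polymorphic base sits at a non-zero offset inside the 
object, so converting a pointer to that base changes the address. A 
binding compiler has to reproduce both the layout and the adjustment.

```cpp
#include <cassert>

// Two unrelated polymorphic bases (hypothetical types).
struct Drawable {
    virtual ~Drawable() = default;
    virtual int draw() const { return 1; }
    int z_order = 0;
};

struct Clickable {
    virtual ~Clickable() = default;
    virtual int click() const { return 2; }
    int hit_count = 0;
};

// Button contains a Drawable subobject and a Clickable subobject,
// each with its own vtable pointer.
struct Button : Drawable, Clickable {
    int draw() const override { return 10; }
    int click() const override { return 20; }
};

// True when viewing the object through its second base yields a
// different address than the object itself. The C++ standard does not
// promise this, but the Itanium and MSVC ABIs lay objects out this way.
bool second_base_is_offset(const Button& b) {
    const void* as_button = &b;
    const void* as_clickable = static_cast<const Clickable*>(&b);
    return as_button != as_clickable;
}
```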

Now consider C, we support most of what C does. What we are missing is 
some semantics of macros, some types and of course typedef, but overall 
the compiler is fully capable of handling it.
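
To illustrate why macro semantics are the awkward part: a macro is a 
token substitution, not a declaration, so only some macros have an 
obvious counterpart in another language. The names below are invented 
for this sketch, and it makes no claim about which of these forms 
ImportC translates today.

```cpp
#include <cassert>

struct Config { int width; int height; };

// 1. Object-like constant: maps naturally onto a named constant.
#define BUFFER_SIZE 256

// 2. Function-like expression macro: untyped, accepts any operand for
//    which the expanded expression happens to compile.
#define SQUARE(x) ((x) * (x))

// 3. Token-pasting macro that generates whole declarations. There is
//    nothing to translate until it is expanded at a use site.
#define DEFINE_GETTER(field) \
    int get_##field(const Config& c) { return c.field; }

DEFINE_GETTER(width)   // expands to: int get_width(const Config& c) {...}
DEFINE_GETTER(height)  // expands to: int get_height(const Config& c) {...}
```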

Sure, ImportC is just a parser that automatically calls out to the 
macro preprocessor, but that is because all the AST and semantic 
analysis is in place and mature.

Adding C++ support isn't a 2-3 year project, even if it was just a 
parser (it's wayyyy more complex than you are thinking it is). It's all 
this other stuff.

If somebody wants to take this on, here is a list of projects that you 
can prove yourself on:

1. Add Win64 exception support to dmd (approved by Walter)
2. Add typedef to dmd (requires a DIP)
3. Implement a macro processor and figure out how to define the 
predefined macros for each target and then make system headers work out 
of the box.
4. Implement 16bit float type (requires a DIP)

These four things are not controversial, at least not compared to stuff 
like multiple inheritance. If you can implement them, then you might 
have a chance to succeed with an ImportC++ feature.

I suspect everyone would love to have ImportC++; the question is how 
much work it would take, and right now the cost is well beyond the 
benefits.
Jun 14
monkyyy <crazymonkyyy gmail.com> writes:
On Saturday, 14 June 2025 at 12:19:47 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 
 If somebody wants to take this on, here are a list of projects 
 that you can prove yourself on:

 1. Supports Win64 exceptions to dmd (approved by Walter)
 2. Add typedef to dmd (requires a DIP)
 3. Implement a macro processor and figure out how to define the 
 predefined macros for each target and then make system headers 
 work out of the box.
 4. Implement 16bit float type (requires a DIP)
What are you doing? This isn't happening, so why are you giving out a to-do list? Toxic optimism.
Jun 14
"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 15/06/2025 12:44 AM, monkyyy wrote:
 On Saturday, 14 June 2025 at 12:19:47 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 If somebody wants to take this on, here are a list of projects that 
 you can prove yourself on:

 1. Supports Win64 exceptions to dmd (approved by Walter)
 2. Add typedef to dmd (requires a DIP)
 3. Implement a macro processor and figure out how to define the 
 predefined macros for each target and then make system headers work 
 out of the box.
 4. Implement 16bit float type (requires a DIP)
 What are you doing? This isnt happening, why are you giving out a todo list. Toxic optimism
If someone is willing to shepherd this, they should:

1. Understand that this isn't a small amount of work.
2. Have a path forward if they are willing.

I would much rather explain the reasons behind the knee-jerk reaction to the concept of ImportC++, so that it is understood why it isn't planned for D, than have us tell someone 'no' without them understanding why or being able to make progress. Who knows? They might be able to.

I will give an example where this policy of mine has been a positive. Yesterday I got approval to have my DFA engine merged, as long as it's self-contained in scope and remains behind a preview switch, so that it can be experimented with. The reason I started work on it? Because I understood that Walter can't put more time into this aspect of the compiler, and it's a requirement for us to ever have RC in the language. It would have saved many months leading up to the start of implementation if I had known that I had to take charge of the implementation, not just the design.

As a community we are very bad at helping to onboard people into shepherding new features, and I want to see that fixed.
Jun 14
Lance Bachmeier <no spam.net> writes:
On Saturday, 14 June 2025 at 11:39:01 UTC, rempas wrote:

 Now, D has "importC" and, lots of C++ libraries have C 
 bindings, but the problem is, C doesn't have classes and, you 
 will need to do manual work to create a "D way" of using the 
 library. Also, not every C++ library has C bindings. For 
 example, [Louvre](https://github.com/CuarzoSoftware/Louvre) 
 does not! If I want to use it from D, I have to:

 * Wait for it to create C bindings (and hope they are 
 maintained in the future)
 * Manually port the library myself. Which also includes porting 
 other C++ libraries because I will get start with the 
 [weston-example](https://github.com/CuarzoSoftware/Louvre/tree/main/src/examples/
ouvre-weston-clone) that they showcase in their repos.
I think the best way forward would be taking advantage of the recent work on SWIG. Their last major release added experimental support for C: https://swig.org/Doc4.3/C.html#C
Jun 14
rempas <rempas tutanota.com> writes:
On Saturday, 14 June 2025 at 14:37:19 UTC, Lance Bachmeier wrote:
 I think the best way forward would be taking advantage of the 
 recent work on SWIG. Their last major release added 
 experimental support for C: https://swig.org/Doc4.3/C.html#C
That seems interesting! In general, it would be nice if we lived in a world where there would be a common IR and backend and languages would just target that, allowing you to use any symbol from any language.
Jun 15
Dejan Lekic <dejan.lekic gmail.com> writes:
On Sunday, 15 June 2025 at 12:46:09 UTC, rempas wrote:
 That seems interesting! In general, it would be nice if we 
 lived in a world where there would be a common IR and backend 
 and languages would just target that, allowing you to use any 
 symbol from any language.
In the case of C, C++ and D (and many others) that is exactly what is happening. They share the same backend.
Jun 16
Andrey Zherikov <andrey.zherikov gmail.com> writes:
On Monday, 16 June 2025 at 16:49:51 UTC, Dejan Lekic wrote:
 On Sunday, 15 June 2025 at 12:46:09 UTC, rempas wrote:
 That seems interesting! In general, it would be nice if we 
 lived in a world where there would be a common IR and backend 
 and languages would just target that, allowing you to use any 
 symbol from any language.
 In the case of C, C++ and D (and many others) that is exactly what is happening. They share the same backend.
I'm not an expert in compilers, but why can't languages be "married" at the IR level? I mean that the compiler would translate D and C++ source code to IR independently, and these representations would then be "linked" together.
Jun 17
evilrat <evilrat666 gmail.com> writes:
On Wednesday, 18 June 2025 at 01:39:00 UTC, Andrey Zherikov wrote:
 On Monday, 16 June 2025 at 16:49:51 UTC, Dejan Lekic wrote:
 On Sunday, 15 June 2025 at 12:46:09 UTC, rempas wrote:
 That seems interesting! In general, it would be nice if we 
 lived in a world where there would be a common IR and backend 
 and languages would just target that, allowing you to use any 
 symbol from any language.
 In the case of C, C++ and D (and many others) that is exactly what is happening. They share the same backend.
 I'm not an expert in compilers, but why can't languages be "married" on IR level? I mean compiler translates source code D and C++ to IR independently where these representations are "linked" together.
The problem is type safety. IIRC LLVM IR is mostly concerned with type widths, whereas C++ and D also have structs, OOP and templates. But you are right that it can be done right now: LDC has options to output IR/bitcode. However, without this rich type information you MUST always write correct code, because the compiler is unable to tell you when you have the wrong types, and your program will end up malformed, doing nonsensical calculations on nonsensical inputs. OK, in reality you still need type information in the form of manual declarations to please the type system.

BTW, check out my gentool. It partially translates C++ to D at the AST level and nicely matches D's linker-level interop feature. I originally built it to help with my gamedev needs, but since then there has not been much interest even in the D community, so now it is basically in maintenance mode and I have switched to Godot.
Jun 17
user1234 <user1234 12.de> writes:
On Wednesday, 18 June 2025 at 01:39:00 UTC, Andrey Zherikov wrote:
 On Monday, 16 June 2025 at 16:49:51 UTC, Dejan Lekic wrote:
 On Sunday, 15 June 2025 at 12:46:09 UTC, rempas wrote:
 That seems interesting! In general, it would be nice if we 
 lived in a world where there would be a common IR and backend 
 and languages would just target that, allowing you to use any 
 symbol from any language.
 In the case of C, C++ and D (and many others) that is exactly what is happening. They share the same backend.
 I'm not an expert in compilers, but why can't languages be "married" on IR level? I mean compiler translates source code D and C++ to IR independently where these representations are "linked" together.
IR is already too late a stage to put things in common. Type information is already lost; for example, nowadays LLVM IR does not make any distinction between `int*` and `int**`, and it's up to the front end to check that kind of thing. For example: https://godbolt.org/z/v6cG9Y65c. Only the front end knows the valid input types. You also have the problem of the ABI. It's just delusional to think you can call a foreign function just because you have its LLVM IR. To some extent that will work, but it's not sane.
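
A small sketch of the pointer point, using two made-up functions. In 
the C++ front end `int*` and `int**` are distinct types and mixing them 
up is a compile error. With LLVM's opaque pointers (the default since 
LLVM 15) both parameters are simply `ptr` in the IR, so nothing at that 
level would catch the mistake.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical functions; in opaque-pointer LLVM IR both take `ptr`.
int read_once(int* p) { return *p; }
int read_twice(int** pp) { return **pp; }

// The front end still knows the difference and enforces it.
static_assert(!std::is_same<int*, int**>::value,
              "int* and int** are different types in C++");
static_assert(!std::is_convertible<int**, int*>::value,
              "passing an int** where an int* is expected is rejected");
```

Calling `read_once(&p)` with `int* p` fails to compile here, whereas 
code stitched together at the IR level would have no such check.
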
Jun 18
rempas <rempas tutanota.com> writes:
On Monday, 16 June 2025 at 16:49:51 UTC, Dejan Lekic wrote:
 In the case of C, C++ and D (and many others) that is exactly 
 what is happening. They share the same backend.
Yeah, I also mean a way to compile files from multiple languages and have them read files from other languages and be able to call their symbols.
Jun 25
xoxo <xororwr gmail.com> writes:
DLang doesn't have a good parser for tooling, nor does it have a 
good LSP for D's features, and you want it to have a builtin C++ 
parser?

I suggest we first ensure that D has good tooling for D, then 
eventually somebody can begin the work on a C++ parser (even though 
it's IMO a waste of time, but whatever).
Jun 15
Johan <j j.nl> writes:
On Saturday, 14 June 2025 at 11:39:01 UTC, rempas wrote:
 **Improvements**

 Implementing a C++ parser will have the following advantages:
 * C++ libraries will be able to natively been used in D. This 
 includes "macros" and templates that have not been initialized 
 in the actual library and would need an *additional* 
 initialization from the project that would use them (making the 
 process even more tedious, slow and overall annoying).

 * C++ and D code will be able to be combined, giving the 
 ability for any C++ project to more easily and smoothly get 
 fully transit to D. That will bring even more popularity and 
 trust to the language.
It's been done already:

https://github.com/Syniurge/Calypso

-Johan
Jun 17
rempas <rempas tutanota.com> writes:
On Wednesday, 18 June 2025 at 06:33:00 UTC, Johan wrote:
 It's been done already:

 https://github.com/Syniurge/Calypso

 -Johan
Ehhhmmmm... The last commit was 5 years ago?!
Jun 25