digitalmars.D - Writing Compilers in D

Kevin A <Kevin_member pathlink.com> writes:
Hello!  I am an experienced compiler/interpreter writer and I have been
considering using D instead of C/C++ as the implementation language.  I have a
few questions that I am seeking answers to before I begin writing it:

- Has anyone else here written a compiler in D?
- Is D well-suited to compiler writing? And if so, what features of D are
particularly good for this?
- Is there a visual debugger available for D?  If not, is there any debugger?
How good is the debugger?
- Is there a good IDE for D?

Your help will be greatly appreciated.

Sincerely,
Kevin A
Aug 12 2004
Deja Augustine <deja scratch-ware.net> writes:
Kevin A wrote:
 Hello!  I am an experienced compiler/interpreter writer and I have been
 considering using D instead of C/C++ as the implementation language.  I have a
 few questions that I am seeking answers to before I begin writing it:
 
 - Has anyone else here written a compiler in D?

I've written part of one. I wrote a D preprocessor that parsed in the D code and did some rudimentary semantic analysis. I was originally going to use that as the base for D.NET until I discovered that the front-end source was available.
 - Is D well-suited to compiler writing? And if so, what features of D are
 particularly good for this?

Its string handling is definitely nicer than C++'s, as are the dynamic arrays. Most of that can be done in C++ via the STL, however.
 - Is there a visual debugger available for D?  If not, is there any debugger?
 How good is the debugger?

As I understand it, you can use a variety of "3rd party" debuggers. I've never tried, though. Contracts make it pretty easy to code without needing a separate debugger.
 - Is there a good IDE for D?

Check out the links page on the D site.

-Deja
Aug 12 2004
Stephan Wienczny <Stephan Wienczny.de> writes:
Kevin A wrote:
 Hello!  I am an experienced compiler/interpreter writer and I have been
 considering using D instead of C/C++ as the implementation language.  I have a
 few questions that I am seeking answers to before I begin writing it:
 
 - Has anyone else here written a compiler in D?
 - Is D well-suited to compiler writing? And if so, what features of D are
 particularly good for this?
 - Is there a visual debugger available for D?  If not, is there any debugger?
 How good is the debugger?
 - Is there a good IDE for D?
 
 Your help will be greatly appreciated.
 
 Sincerely,
 Kevin A
 
 

I'm trying to do such a thing, so I can tell you about my experience so far.
D makes implementing something a little easier than C/C++. A class definition sits next to its implementation, so there is no redundancy when writing a new function. Then you have some advanced D features, like dynamic arrays with slicing, which make parsers and lexers awfully fast.

IMHO D is more readable than C/C++ and can be a lot faster...

It should be possible to use a visual debugger. There is none in the package, but you should find one on the net.
There have been some efforts to write a special IDE for D (dide, leds), and there are config files for others (Eclipse, MS Visual Studio).

Stephan
Aug 12 2004
"Sampsa Lehtonen" <snlehton cc.hut.fi> writes:
On Thu, 12 Aug 2004 21:04:00 +0200, Stephan Wienczny <Stephan Wienczny.de>  
wrote:

 I'm trying to do such a thing. I can tell you my experience (so far).
 D makes implementing something a little bit more easy than C/C++.
 You have class definition near its implementation; no redundancy when  
 writing a new function. Then you have got some advanced D features, like  
   dynamic arrays with slicing which makes parsers/lexers awful fast

 IMHO D is more readable than C/C++ and can be a lot faster...

 It should be possible to use a visual debugger. There is none in the
 package, but you should find one on the net.
 There have been some efforts to write a special IDE for D (dide, leds),
 and there are config files for others (Eclipse, MS Visual Studio)

I've considered making a compiler too, perhaps for D. I've made one for MiniJava, which is a subset of Java. It produced native code (MIPS). Now I would like to try my skills on something more involved. C/C++ syntax seems too complex, and Java is a bit too abstract (it isn't meant for native code, though such compilers exist).

Do you guys have any suggestions on which tools to use? I've been thinking about writing the compiler in Java, as it is the easiest and fastest to code in (using refactoring tools). I don't care about compilation times at the moment; getting the compiler running and producing code is enough of a task in itself. I've used JavaCC and ANTLR too, but are there better alternatives?

For an industrial compiler I'd choose C++ as the development language and the x86 instruction set as output, but writing compilers for CISC targets is so much harder than for RISC targets, so maybe I'll go with MIPS here too.

My primary goal is to get my hands on different optimization techniques and to get familiar with complex flow and data analyses.

Btw, if anyone has pointers to some nice documents about OBJ file structure and such, I'd be interested.

-texmex/sampsa lehtonen
Aug 16 2004
Ilya Minkov <minkov cs.tum.edu> writes:
Sampsa Lehtonen schrieb:
 I've considered making a compiler too, perhaps for D. I've made one for  
 MiniJava which is a subset of Java. It produced native code (MIPS). Now 
 I  would like to try my skills on something more involved. C/C++ syntax 
 seems  too complex, and Java is a bit too abstract (it isn't meant for 
 native  code though such compilers exist).

Hum. D would be a large undertaking, though not as large as C++. C++ is more of a problem semantically than syntactically, I think.
 Do you guys have any suggestions which tools to use? I've been thinking  
 about making the compiler in Java, as it is easiest and fastest to code  
 (using refactoring tools). I don't care about the compilation times at 
 the  moment, getting the compiler running and producing code is such a 
 task on  itself. I've used JavaCC and Antlr too, but is there better 
 alternatives?

Refactoring tools? What do you mean?

I'm under the impression that ANTLR is the best, but my favorite is COCO/R. There are complete ports of COCO/R for C++, Java, C# and Delphi, and there is a good chance I can get a D version up. Warning: the Java and C# versions generate non-reentrant parsers.

I would not recommend writing a compiler in Java. So far, I have had a lot of fun writing my first real lexer, and I came to like the D array semantics and slicing, which are probably unique. That is, they can be easily emulated in C++, but as far as I know they are not native in any other language. The difference from, say, Python, is that slices still refer to the original array where the data is stored. For example, one would load a text into a large buffer or memory-mapped file and have each lexeme contain a slice into it, instead of a position and length. String semantics without overhead. Plus, one would insert asserts here and there to make sure that such a slice keeps pointing into the loaded text.
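A minimal sketch of the slice-per-lexeme idea, emulated in C++17 with `std::string_view` rather than written with D slices. The names `Lexeme`, `points_into` and `next_word` are made up for illustration and do not come from any real lexer; the splitting rule (runs of non-space characters) is a simplifying assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string_view>

// Hypothetical lexeme: only a view into the source buffer, never a copy.
struct Lexeme {
    std::string_view text;
};

// True if `slice` lies entirely inside `buffer`.
bool points_into(std::string_view buffer, std::string_view slice) {
    std::less_equal<const char*> le;  // total order, even for unrelated pointers
    return le(buffer.data(), slice.data()) &&
           le(slice.data() + slice.size(), buffer.data() + buffer.size());
}

// Returns the next run of non-space characters starting at `pos` and
// advances `pos` past it. The result is empty at the end of the input.
Lexeme next_word(std::string_view buffer, std::size_t& pos) {
    while (pos < buffer.size() && buffer[pos] == ' ') ++pos;
    const std::size_t start = pos;
    while (pos < buffer.size() && buffer[pos] != ' ') ++pos;
    Lexeme lex{buffer.substr(start, pos - start)};
    // The kind of assert mentioned above: the slice must stay in the loaded text.
    assert(points_into(buffer, lex.text));
    return lex;
}
```

Calling `next_word` repeatedly on a buffer holding `int x` would yield views of `int` and `x` that share storage with the buffer, so the buffer has to outlive every lexeme.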
 For industrial compiler I'd choose C++ as development language and x86  
 instruction set as output, but making cisc compilers is so much harder  
 than risc compilers, so maybe I'll go with the MIPS here too.

Non-processor targets might also be interesting, e.g. ANDF, the Architecture Neutral Distribution Format. There are converters from ANDF to target code for different architectures.
 My primary goal is to get my hands on different optimization techniques  
 and to get familiar with complex flow- and data-analyses.

I had just chatted with MadMan/TAP (aka MadenMann) yesterday. Perhaps he would also be interested... He wanted to write a custom compiler for Sega Mega-Drive.
 Btw, if anyone has pointers to some nice documents about OBJ-file  
 structure and such, I'd be interested.

These differ between toolchains. For Digital Mars, Watcom and Borland, look for OMF. Microsoft and some others use COFF. Other operating systems use either some variant of COFF or some variant of ELF, or even something completely different. The object file format need not correspond to the OS executable format, although I guess a match makes the linker's life easier.

-eye/PaC
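As a rough illustration of how differently these formats begin, here is a hedged sketch that guesses the format from the first bytes of a file. The function name `sniff` and the enum are assumptions made for this example, and the check is a heuristic only: it recognizes the ELF magic, an OMF module that starts with a THEADR record (type byte 0x80), and COFF for the i386 machine type (0x014C) only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class ObjFormat { Elf, Omf, CoffI386, Unknown };

// Guess the object file format from its leading bytes (heuristic only).
ObjFormat sniff(const std::uint8_t* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    // ELF files start with 0x7F 'E' 'L' 'F'.
    if (n >= 4 && p[0] == 0x7F && p[1] == 'E' && p[2] == 'L' && p[3] == 'F')
        return ObjFormat::Elf;
    // An OMF object module usually starts with a THEADR record, type 0x80.
    if (n >= 1 && p[0] == 0x80)
        return ObjFormat::Omf;
    // A COFF header starts with a little-endian machine field; 0x014C is i386.
    if (n >= 2 && p[0] == 0x4C && p[1] == 0x01)
        return ObjFormat::CoffI386;
    return ObjFormat::Unknown;
}
```

A real tool would validate the rest of the header as well, since one or two leading bytes can match by accident.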
Aug 16 2004
"Sampsa Lehtonen" <snlehton cc.hut.fi> writes:
On Mon, 16 Aug 2004 18:34:14 +0200, Ilya Minkov <minkov cs.tum.edu> wrote:

 Hum. D would be a large undertaking, though not as large as C++. C++ is  
 a problem more semantically than syntactically, i seem to think.

Well, I plan to implement just a subset of D first, leaving exceptions, templates and mixins for later, just to get the primitive things running. Probably the most gratifying thing in compiler construction is the moment when you actually get something compiled and it runs!
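 Do you guys have any suggestions which tools to use? I've been thinking
 about making the compiler in Java, as it is easiest and fastest to code
 (using refactoring tools).

 Refactoring tools? What do you mean?
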
By tools I meant parser generators and such. Refactoring tools are a whole different story, though they are related to code parsing. Refactoring means transformations on code that do not change its meaning. Pretty fun stuff, really. There are plenty of different refactoring types, from simple variable renaming to superclass extraction etc. Basically they are tools that aid programming by automating the tedious primitive tasks programmers do daily. Check out more at http://www.refactoring.com/

BTW, a nice thing about D is that D programs can be parsed easily, as there is no preprocessor. This makes refactoring possible, unlike in C++, where the preprocessor can really f*ck things up. Currently I'm waiting for a D IDE that would include refactoring tools ;)
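One way the C++ preprocessor can get in the way of an automated rename, shown as a small hypothetical example (the macro `DEFINE_GETTER` and the struct `Counter` are invented for illustration): a name built by token pasting does not appear anywhere in the source text before preprocessing.

```cpp
#include <cassert>

// Hypothetical macro that builds an accessor name by token pasting.
#define DEFINE_GETTER(field) \
    int get_##field() const { return field; }

struct Counter {
    int count = 0;
    DEFINE_GETTER(count)  // expands to: int get_count() const { return count; }
};

// The call site below uses `get_count`, a name that only exists after
// macro expansion. A tool that renames the field `count` has to decide
// whether the macro argument and this call change too.
int demo() {
    Counter c;
    c.count = 3;
    assert(c.get_count() == 3);
    return c.get_count();
}
```

Renaming `count` to `total` inside the struct would silently turn the accessor into `get_total`, so `demo` would no longer compile unless the tool understands the macro. Without a preprocessor, that class of problem does not arise.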
 I'm under the impression that ANTLR is the best, but my fav is COCO/R.
 There are complete ports of COCO/R for C++, Java, C# and Delphi, and  
 there is a good chance i can get a D version up. Warning: Java and C#  
 versions generate non-reentrant parsers.

COCO/R, hmm, haven't heard of it. I'll check it out. But probably I'll go with JavaCC (or Antlr, as it generates parsers in C++ too).
 I would not recommend writing a compiler in Java. So far, i have had a
 lot of fun writing my first real lexer, and i came to like the D array  
 semantics and slicing, which are probably unique. That is, they can be  
 easily emulated in C++, but they are not native in any language. The  
 difference from, say, Python, is that you can have slices still refer to  
 the original array where the data is stored. For example, one would load  
 a text into a large buffer or memory-mapped file, and have a lexeme  
 contain a slice into it, instead of a position and length. String  
 semantics without overhead. Plus one would insert asserts here and there  
 to make sure that such a slice keeps pointing into the loaded text.

Umm, I don't quite follow you. After the lexer has produced a token and the parser has accepted it, the actual text becomes unnecessary (unless it is an identifier). So why would I want to load the whole file into memory and have tokens pointing into that big piece of text?... With a lexer for an IDE, where parsing needs to be done constantly and in varying places, it is a different story, I guess...
 For industrial compiler I'd choose C++ as development language and x86   
 instruction set as output, but making cisc compilers is so much harder   
 than risc compilers, so maybe I'll go with the MIPS here too.

 Non-processor targets might also be interesting... e.g. ANDF, the
 architecture-neutral distribution format. There are converters from ANDF
 to target code for different architectures.

Yeah, but that is a bit too much rocket science for me :) Getting the compiler to produce a proper executable is hard enough; I don't want to hinder the development with unnecessarily complex target platforms :)
 My primary goal is to get my hands on different optimization  
 techniques  and to get familiar with complex flow- and data-analyses.

 I had just chatted with MadMan/TAP (aka MadenMann) yesterday. Perhaps he
 would also be interested... He wanted to write a custom compiler for Sega
 Mega-Drive.

I was thinking of making a compiler for ARM's RISC processor for GBA, but that project never really took off. It would have been an interesting project though, because the device has its restrictions and the instruction set is so simple.
 Btw, if anyone has pointers to some nice documents about OBJ-file   
 structure and such, I'd be interested.

 These are different. Digitalmars, Watcom, Borland: look for OMF.
 Microsoft and some others use COFF. Other operating systems use either
 some variant of COFF or some variant of ELF, or even something completely
 different. The format of object files need not necessarily be in
 correspondence with OS executable format, although i guess it makes
 linker's life easier.

Hmm, so different compilers need different kinds of OBJ files? So I can't use Watcom objs/libs with VC++... oh well. Thanks for the info!

-texmex/sampsa lehtonen
Aug 16 2004
Ilya Minkov <minkov cs.tum.edu> writes:
Sampsa Lehtonen schrieb:

 Well I plan to implement just a subset of D first. Leave the 
 exceptions,  templates, mixins for later, just to get the primitive 
 things running.  Probably the most gratifying thing in compiler 
 construction is the moment  when you actually get something compiled and 
 it runs!

Okeydokey.
 By tools I meant parser generators and such, and refactoring tools are 
 a  whole different story though they are related to code parsing. 
 Refactoring  means transformations on code that do not break the meaning 
 of the code.  Pretty fun stuff, really. There are plenty of different 
 refactoring types,  starting from simple variable renaming to 
 super-class extraction etc.  Basically they are tools to aid programming 
 by automating the tedious  primitive tasks programmers do daily. Check 
 out more at  http://www.refactoring.com/

Gotta look at them.
 BTW, nice thing about D is that D programs can be parsed easily as 
 there  are no preprocessor. This makes refactoring possible, unlike on 
 C++ where  the preprocessor can really f*ck up things. Currently I'm 
 waiting a D ide  that would include refactoring tools ;)

Yes. But I guess someone would have to write refactoring tools before someone else could integrate them into an editor.
 COCO/R, hmm, haven't heard of it. I'll check it out. But probably I'll 
 go  with JavaCC (or Antlr, as it generates parsers in C++ too).

Yup. ANTLR is really worth it, but I just find COCO/R nice. The whole program is tiny, and the generated compilers are small, complete, fast and readable. It has a few special features such as comment and pragma processing, extendable lookup in both parser and lexer, and support for context dependency. It should even be able to parse C++, I think, though not many people have been writing grammars for it.
 Umm, I don't quite follow you. After the lexer has tokenized a token 
 and  the parser has accepted it, the actual text comes unnecessary 
 (unless it  is an identifier). So why would I want to load the whole 
 file into memory  and have tokens pointing into that big piece of text?...
 With lexer for an ide where parsing needs to be done constantly and on  
 varying places it is a different story, I guess...

I have found that a lexeme needs to carry almost nothing but its text and a pointer to the lexer. The type of the lexeme is taken from the class hierarchy; concrete subtypes may contain further information or methods. So far I have the following types of lexeme defined:
* Identifier;
* Numerical (matching both integer and floating-point);
* Crude.

I only need to read the first symbol to guess the type of the lexeme: a letter or underscore makes it an identifier, a digit makes it numeric, and everything else is "crude" and is matched using a large switch of switches, which covers operators and everything else unwieldy. Language keywords are identifiers in the lexer and are only checked in the parser later. The lexeme is parsed in the constructor of the corresponding type, so there is no stepping back: a mismatch is a fatal error.

So far I discriminate Crudes and keywords by text in the parser, and this is very fast. No copies of data are made, and in fact there are usually only a few comparisons each time, because the first characters carry the most information.

Note also that the lexemes don't have to store their position in the file for error reporting and such. In the function that gets the file and line, I assert that the lexeme string is within the lexer's storage, and then I slice the lexer's storage from the beginning to the start of the lexeme. Then I only need to count the line ends in there to figure out the line. :) Or I can even keep a table with the offsets of all line ends and simply scan through it. It is nothing that couldn't be done some other way, but it just works so nicely!
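The first-character dispatch and the line lookup described above could look roughly like this when emulated in C++17 with `std::string_view`. This is a sketch under assumptions, not the lexer from this post: all names (`Kind`, `classify`, `line_of`, `line_end_offsets`, `line_from_table`) are invented, only `'\n'` counts as a line end, and the table variant uses a binary search where the post mentions a simple scan.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string_view>
#include <vector>

enum class Kind { Identifier, Numerical, Crude };

// Guess the lexeme kind from its first character alone.
Kind classify(char first) {
    const unsigned char c = static_cast<unsigned char>(first);
    if (std::isalpha(c) || c == '_') return Kind::Identifier;
    if (std::isdigit(c)) return Kind::Numerical;
    return Kind::Crude;  // operators, punctuation, everything else
}

// 1-based line of `lexeme`, found by counting line ends in the part of
// `storage` that precedes it. The lexeme itself stores no position.
std::size_t line_of(std::string_view storage, std::string_view lexeme) {
    std::less_equal<const char*> le;  // total order on pointers
    assert(le(storage.data(), lexeme.data()) &&
           le(lexeme.data() + lexeme.size(), storage.data() + storage.size()));
    const auto offset = static_cast<std::size_t>(lexeme.data() - storage.data());
    const std::string_view before = storage.substr(0, offset);
    return 1 + static_cast<std::size_t>(
                   std::count(before.begin(), before.end(), '\n'));
}

// Alternative: collect the offsets of all line ends once...
std::vector<std::size_t> line_end_offsets(std::string_view storage) {
    std::vector<std::size_t> ends;
    for (std::size_t i = 0; i < storage.size(); ++i)
        if (storage[i] == '\n') ends.push_back(i);
    return ends;
}

// ...then the line is 1 + the number of line ends before `offset`.
std::size_t line_from_table(const std::vector<std::size_t>& ends,
                            std::size_t offset) {
    return 1 + static_cast<std::size_t>(
                   std::lower_bound(ends.begin(), ends.end(), offset) -
                   ends.begin());
}
```

Counting on demand costs time proportional to the lexeme's offset, which is fine for occasional error messages; the table trades a little memory for logarithmic lookups when line numbers are needed often.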
 Yeah, but that is a bit too much of rocket science for me :) Getting 
 the  compiler to do proper executable is hard enough, I don't want to 
 hinder  the development with unnecessarily complex target platforms :)

I thought it might be a bit simpler. But a real simple CPU is perhaps better suited.
 I was thinking of making a compiler for ARM's RISC processor for GBA, 
 but  that project never really took off. It would have been an 
 interesting  project though, because the device has its restrictions and 
 the  instruction set is so simple.

Ever heard of the "Gamepark-32", a Korean handheld game console? It is very popular with crazy developers. There are almost no games for it other than in Korean, and the handheld itself has to be imported, but it's cheap (around 120 EUR IIRC), has GCC dev tools, is accessed via USB, and runs programs from SmartMedia cards. :) It is based on an ARM9 (as opposed to the ARM7 in the GBA) clocked at 66, 100, 133 or even (unwarranted) 166 MHz; the frequency can be set programmatically. It is a pure framebuffer device without any sprite acceleration like the GBA has, but with a much better display (320x240 hicolor), and one can have fun figuring out cool software tricks to get notable performance out of it. Specs and some links can be found, for example, here: http://darkfader.net/gp32/

-eye
Aug 16 2004