
digitalmars.D - [SAoC] "Improving DMD as a Library" project thread

Mihaela Chirea <chireamihaela99 gmail.com> writes:
Hello!

My name is Mihaela Chirea and I am a 4th year Computer 
Engineering student at Politehnica University of Bucharest.

My interest in programming languages led me to attend a D 
workshop at Ideas and Projects Workshop in 2019 and D Summer 
School this year, both held by Eduard Staniloiu and Razvan Nitu. 
Topics like meta-programming and design by introspection made me 
curious about how these concepts were implemented, thus 
increasing my interest in compilers.

For this year's edition of SAoC I will be working on improving 
dmd as a library, mainly by cleaning up the AST nodes: moving 
the semantic elements to more suitable places and creating new 
visitors when needed.
After studying the current state of dmd and identifying the parts 
I will be working on, I have decided on the following plan:

- Getting used to the structure of the compiler by working on the 
nodes that don't contain that much semantic information:

Milestone 1:
     - aliasthis.d
     - attrib.d
     - statement.d
     - aggregate.d
     - cond.d
     - staticcond.d
     - nspace.d

- Work on the files where semantic elements either appear often 
or appear in functions that are used in many other places, so 
that changing them requires changes in more files

Milestone 2:
     - mtype.d
     - dstruct.d
     - dclass.d
     - denum.d
     - dimport.d

Milestone 3:
     - dsymbol.d
     - expression.d
     - dmodule.d

Milestone 4:
     - declaration.d
     - func.d
     - dtemplate.d

However, small changes to this plan may be necessary since other 
changes to the compiler may raise unexpected issues for this 
project.

As far as time allows, and even after the end of this event, 
I would also like to work on creating a nice compiler interface, 
which this refactoring step should make much easier.
I will be posting weekly updates regarding my progress on this 
project.

Thanks!
Mihaela
Sep 14 2020
Mathias LANG <geod24 gmail.com> writes:
On Monday, 14 September 2020 at 12:47:42 UTC, Mihaela Chirea 
wrote:
 Hello!

 My name is Mihaela Chirea and I am a 4th year Computer 
 Engineering student at Politehnica University of Bucharest.

 [...]

 Thanks!
 Mihaela
Good luck! It's a much-needed improvement. I see you've already joined the dlang Slack; if you have any questions, #dmd is the place to go.
Sep 15 2020
Mihaela Chirea <chireamihaela99 gmail.com> writes:
Hello!

During the first week of working on this project I received 
multiple suggestions regarding other possible tasks that could 
better benefit the community. I started working on them in the 
second week but never formally updated the milestones.

So, based mostly on Jacob Carlborg's suggestions[1], here are the 
new plans:

Milestone 2:
- Add the start location to the AST nodes that lack this 
information
- Bring all the dmd as a library features already existing in the 
compiler under DMDLIB
- Add the token size
- Add the end location to all nodes

Some of the issues I would like to tackle during the next 
milestones are:
- Add the possibility of analyzing source code that is only in 
memory
- Reduce the global state
- Don't generate TypeInfo when not needed (as suggested here[2])

So far, I haven't had the chance to study these last topics in 
detail and I would appreciate any advice or opinions on how to 
start working on these tasks.

[1] https://github.com/dlang/dmd/pull/11788#issuecomment-698186023
[2] https://forum.dlang.org/post/iopxhnudlrgiqwjxzihe forum.dlang.org
Oct 28 2020
Jacob Carlborg <doob me.com> writes:
On Wednesday, 28 October 2020 at 19:08:01 UTC, Mihaela Chirea 
wrote:

 - Add the possibility of analyzing source code that is only in 
 memory
I've started on this [1] (very rough work in progress), if you need any pointers.

[1] https://github.com/jacob-carlborg/ddc/commit/cee56ce3750701d593dd619b27d28f18e4929e72

--
/Jacob Carlborg
Oct 29 2020
RazvanN <razvan.nitu1305 gmail.com> writes:
On Thursday, 29 October 2020 at 08:50:46 UTC, Jacob Carlborg 
wrote:
 On Wednesday, 28 October 2020 at 19:08:01 UTC, Mihaela Chirea 
 wrote:

 - Add the possibility of analyzing source code that is only in 
 memory
 I've started on this [1] (very rough work in progress), if you
 need any pointers.

 [1] https://github.com/jacob-carlborg/ddc/commit/cee56ce3750701d593dd619b27d28f18e4929e72

 --
 /Jacob Carlborg

So right now the compiler, when given a .d/.di file, opens it, reads the contents and immediately lexes and parses the string, after which the string is discarded. If the contents of the file need to be changed or reanalyzed, the whole process needs to be started from scratch. What you are proposing, Jacob, is that the contents of the file are stored somewhere for ease of reuse. Is that right?

Cheers,
RazvanN
Oct 29 2020
Jacob Carlborg <doob me.com> writes:
On Friday, 30 October 2020 at 06:03:40 UTC, RazvanN wrote:

 So right now the compiler, when given a .d/.di file, opens
 it, reads the contents and immediately lexes and parses the
 string, after which the string is discarded. If the contents
 of the file need to be changed or reanalyzed, the whole
 process needs to be started from scratch. What you are
 proposing, Jacob, is that the contents of the file are stored
 somewhere for ease of reuse. Is that right?
Kind of, or at least that's one of the reasons. The main idea is to separate reading a file from lexing and parsing it. We introduce a file manager (like a cache). The compiler will first check whether the file content is available in the file manager, and otherwise read it from disk. The important part here is that it needs to be possible to pre-populate (and also update) the file manager with a file and its content. This would allow a full compilation to be done from memory, without touching the disk.

The main reason for this is to let the compiler receive file content from sources other than the disk. Two use cases for that would be:

* An LSP server (or similar tool) receiving the data over the network from an editor with unsaved files
* The data is already in memory, think a string literal. This is useful when writing tests

The other idea is, as you mentioned, to read from memory if the file has already been read from disk when reanalyzing. For example, if you want to get the tokens of an AST node, the way the compiler works now, you probably need to re-lex the file to get the tokens. But you don't want to re-read the file from disk, because it might have been updated. For this use case, it's really important that the compiler reads exactly the same file content as it did when it originally created the AST.

Note, there's already a file cache [1], but it will not fit. It's not possible to pre-populate or update it. It also splits the file up into lines. The existing file cache [1] could perhaps take advantage of the new file manager.

Keep in mind that this new file manager needs to be used not only when reading D files, but also when reading files through import expressions.

[1] https://github.com/dlang/dmd/blob/master/src/dmd/filecache.d

--
/Jacob Carlborg
Oct 30 2020
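A minimal sketch of the file manager Jacob describes, written in D as a standalone illustration under stated assumptions. `FileManager`, `set` and `lookup` are hypothetical names, not part of dmd or of Jacob's branch; only `std.file.readText` is a real Phobos function. The sketch assumes a plain associative array keyed by file name is enough to act as the cache:

```d
import std.file : readText;

/// Hypothetical file manager (not dmd's API): an in-memory map from
/// file name to content that is consulted before the disk.
final class FileManager
{
    private string[string] contents; // file name -> file content

    /// Pre-populate or update a file, e.g. with an editor's unsaved
    /// buffer or a string literal in a test. Nothing touches the disk.
    void set(string name, string content)
    {
        contents[name] = content;
    }

    /// Return the cached content if present; otherwise read the file
    /// from disk once and cache it, so that re-lexing later sees exactly
    /// the same content even if the file on disk has changed since.
    string lookup(string name)
    {
        if (auto p = name in contents)
            return *p;
        auto content = readText(name); // throws FileException if missing
        contents[name] = content;
        return content;
    }
}

unittest
{
    auto fm = new FileManager;
    // "app.d" never exists on disk; it is served entirely from memory.
    fm.set("app.d", "module app; void main() {}");
    assert(fm.lookup("app.d") == "module app; void main() {}");

    // Updating the entry changes what later lookups return.
    fm.set("app.d", "module app;");
    assert(fm.lookup("app.d") == "module app;");
}
```

In such a design, both the code that reads D source files and the code that evaluates import expressions would go through `lookup` instead of reading the disk directly, which is what makes full in-memory compilation and consistent re-lexing possible.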