www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - [SAOC 2023] dfmt rewrite - Weekly update #1

reply Prajwal S N <snp lang.org> writes:
Hi everyone,

For SAOC 2023, I'm working on refactoring 
[dfmt](https://github.com/dlang-community/dfmt) to use the AST 
from DMD-as-a-library instead of libdparse.

The past week has been very interesting. I got up to speed with 
the dfmt codebase, and managed to do a 1-to-1 port of the lexer 
dependency from libdparse to DMD-as-a-library. Most parts were 
pretty straightforward, and the bulk of the work was replacing 
every `tok!"<token>"` instance with `TOK.<token>` and making sure 
the token coming from DMD was the same as what was previously 
being used. So far so good!

You can see the draft PR tracking the work 
[here](https://github.com/dlang-community/dfmt/pull/589).

Going forward, my mentor and I have decided that it would be 
impractical to try and replace the parser directly, for multiple 
reasons:

- It's a lot of work to replace the parser and use the DMD AST 
instead of libdparse's, and all of this work will happen without 
a working version of dfmt. If, at the end of this, dfmt is broken 
or refuses to compile, it could very well mean that all that 
effort went down the drain.
- Doing a brute force replacement of the parser will prevent us 
from testing the transformation passes in dfmt individually, and 
also brings us back to the point above.

Hence, we've decided to do an incremental rewrite of the files 
that use the parser, initially with no passes (just to ensure the 
AST is being built in the first place), and then adding each pass 
along with relevant unit tests.
Sep 22 2023
next sibling parent M.M. <matus email.cz> writes:
On Friday, 22 September 2023 at 07:12:02 UTC, Prajwal S N wrote:
 Hi everyone,

 For SAOC 2023, I'm working on refactoring 
 [dfmt](https://github.com/dlang-community/dfmt) to use the AST 
 from DMD-as-a-library instead of libdparse.

 ....
Good luck in the project!
Sep 22 2023
prev sibling next sibling parent Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Friday, 22 September 2023 at 07:12:02 UTC, Prajwal S N wrote:
 Hi everyone,

 For SAOC 2023, I'm working on refactoring 
 [dfmt](https://github.com/dlang-community/dfmt) to use the AST 
 from DMD-as-a-library instead of libdparse.

 [...]
Superb! We really really need people working on tooling ❤️
Sep 22 2023
prev sibling next sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Friday, 22 September 2023 at 07:12:02 UTC, Prajwal S N wrote:
 Hi everyone,

 For SAOC 2023, I'm working on refactoring 
 [dfmt](https://github.com/dlang-community/dfmt) to use the AST 
 from DMD-as-a-library instead of libdparse.

 The past week has been very interesting. I got up to speed with 
 the dfmt codebase, and managed to do a 1-to-1 port of the lexer 
 dependency from libdparse to DMD-as-a-library. Most parts were 
 pretty straightforward, and the bulk of the work was replacing 
 every `tok!"<token>"` instance with `TOK.<token>` and making 
 sure the token coming from DMD was the same as what was 
 previously being used. So far so good!

 You can see the draft PR tracking the work 
 [here](https://github.com/dlang-community/dfmt/pull/589).

 Going forward, my mentor and I have decided that it would be 
 impractical to try and replace the parser directly, for 
 multiple reasons:

 - It's a lot of work to replace the parser and use the DMD AST 
 instead of libdparse's, and all of this work will happen 
 without a working version of dfmt. If, at the end of this, dfmt 
 is broken or refuses to compile, it could very well mean that 
 all that effort went down the drain.
 - Doing a brute force replacement of the parser will prevent us 
 from testing the transformation passes in dfmt individually, 
 and also brings us back to the point above.

 Hence, we've decided to do an incremental rewrite of the files 
 that use the parser, initially with no passes (just to ensure 
 the AST is being built in the first place), and then adding 
 each pass along with relevant unit tests.
I don't want to sound alarming or anything, but an AST is not really what you want to work with as a formatter. The main reason is that you want to carry around a lot of information that the AST generally doesn't care about (comments, informations about layout, etc...). Consider the following example: ```d int a; // this is an int. int b; ``` We immediately recognize that the comment refers to a. However: ```d int a; // this is an int. int b; ``` Now we recognize that the comment refers to b. There is a lot of subtle semantic in there that is very hard to convey through an AST and are very hard to work with in that form. There is a lot of prior art on the matter of code formatting, and the best explanation is probably the one from dartfmt's author: https://journal.stuffwithstuff.com/2015/09/08/the-hardest-progr m-ive-ever-written/ . clang-format and many others do use this approach. Shameless plug: sdfmt uses that approach. You can get it there: https://code.dlang.org/packages/sdc%3Asdfmt . I understand this is probably out of scope to turn things around at this time, but holly hell, do we really need, as a community, to redo all the mistake other communities have done instead of learning from them, and, to add insult to injury, involve junior devs in that madness?
Sep 24 2023
parent reply RazvanN <razvan.nitu1305 gmail.com> writes:
On Sunday, 24 September 2023 at 21:11:34 UTC, deadalnix wrote:
 On Friday, 22 September 2023 at 07:12:02 UTC, Prajwal S N wrote:
 Hi everyone,

 For SAOC 2023, I'm working on refactoring 
 [dfmt](https://github.com/dlang-community/dfmt) to use the AST 
 from DMD-as-a-library instead of libdparse.

 The past week has been very interesting. I got up to speed 
 with the dfmt codebase, and managed to do a 1-to-1 port of the 
 lexer dependency from libdparse to DMD-as-a-library. Most 
 parts were pretty straightforward, and the bulk of the work 
 was replacing every `tok!"<token>"` instance with 
 `TOK.<token>` and making sure the token coming from DMD was 
 the same as what was previously being used. So far so good!

 You can see the draft PR tracking the work 
 [here](https://github.com/dlang-community/dfmt/pull/589).

 Going forward, my mentor and I have decided that it would be 
 impractical to try and replace the parser directly, for 
 multiple reasons:

 - It's a lot of work to replace the parser and use the DMD AST 
 instead of libdparse's, and all of this work will happen 
 without a working version of dfmt. If, at the end of this, 
 dfmt is broken or refuses to compile, it could very well mean 
 that all that effort went down the drain.
 - Doing a brute force replacement of the parser will prevent 
 us from testing the transformation passes in dfmt 
 individually, and also brings us back to the point above.

 Hence, we've decided to do an incremental rewrite of the files 
 that use the parser, initially with no passes (just to ensure 
 the AST is being built in the first place), and then adding 
 each pass along with relevant unit tests.
I don't want to sound alarming or anything, but an AST is not really what you want to work with as a formatter. The main reason is that you want to carry around a lot of information that the AST generally doesn't care about (comments, informations about layout, etc...). Consider the following example: ```d int a; // this is an int. int b; ``` We immediately recognize that the comment refers to a. However: ```d int a; // this is an int. int b; ``` Now we recognize that the comment refers to b. There is a lot of subtle semantic in there that is very hard to convey through an AST and are very hard to work with in that form.
But dmd attaches comments to declarations in the parsing phase. You can get the comments from the AST. Why is that not a good way to represent the data?
 There is a lot of prior art on the matter of code formatting, 
 and the best explanation is probably the one from dartfmt's 
 author: 
 https://journal.stuffwithstuff.com/2015/09/08/the-hardest-progr
m-ive-ever-written/ . clang-format and many others do use this approach.
The article states the following: "At this point, you’re probably thinking, “Wait. What’s so hard about formatting?” After you’ve parsed, can’t you just walk the AST and pretty-print it with some whitespace? If every statement fit within the column limit of the page, yup. It’s a piece of cake. (I think that’s what gofmt does.) But our formatter also keeps your code within the line length limit. That means adding line breaks (or “splits” as the formatter calls them), and determining the best place to add those is famously hard." So the fundamental problem comes from the fact that the author wants to do smart stuff with line/column limits. dfmt does not even have a line limit option, so that problem does not apply here. Even if we were to add it, that could be viewed as a separate project of transitioning dfmt to a better representation.
 Shameless plug: sdfmt uses that approach. You can get it there: 
 https://code.dlang.org/packages/sdc%3Asdfmt .

 I understand this is probably out of scope to turn things 
 around at this time, but holly hell, do we really need, as a 
 community, to redo all the mistake other communities have done 
 instead of learning from them, and, to add insult to injury, 
 involve junior devs in that madness?
As things stand dfmt already exists and is implemented by using an AST to output the formatted code. Maybe that is not the ideal approach and could be improved, but that has nothing to do with this project. This project is all about replacing a dependency (libdparse) with another one (dmd-as-a-lib) so that when dmd is upgraded, dfmt ideally would not be affected. Re-engineering the formatter such that it uses a different technique represents a separate project that is orthogonal to the current effort.
Sep 25 2023
parent reply Tim <tim.dlang t-online.de> writes:
On Monday, 25 September 2023 at 08:26:33 UTC, RazvanN wrote:
 So the fundamental problem comes from the fact that the author 
 wants to do smart stuff with line/column limits. dfmt does not 
 even have a line limit option, so that problem does not apply 
 here. Even if we were to add it, that could be viewed as a 
 separate project of transitioning dfmt to a better 
 representation.
I think dfmt has options for line limits: ``` --soft_max_line_length --max_line_length ```
Sep 25 2023
parent reply RazvanN <razvan.nitu1305 gmail.com> writes:
On Monday, 25 September 2023 at 16:12:51 UTC, Tim wrote:
 On Monday, 25 September 2023 at 08:26:33 UTC, RazvanN wrote:
 So the fundamental problem comes from the fact that the author 
 wants to do smart stuff with line/column limits. dfmt does not 
 even have a line limit option, so that problem does not apply 
 here. Even if we were to add it, that could be viewed as a 
 separate project of transitioning dfmt to a better 
 representation.
I think dfmt has options for line limits: ``` --soft_max_line_length --max_line_length ```
Yes, those refer to the length of a specific line, but the OP was also referring to constraints for the number of lines in a specific file, so when I said line limit I was referring to the limit of a line, rather to the limit of the number of lines in a file.
Sep 26 2023
parent RazvanN <razvan.nitu1305 gmail.com> writes:
On Tuesday, 26 September 2023 at 07:50:59 UTC, RazvanN wrote:

 Yes, those refer to the length of a specific line, but the OP 
 was also referring to constraints for the number of lines in a 
 specific file, so when I said line limit I was referring to the 
 limit of a line, rather to the limit of the number of lines in 
 a file.
*I was not referring to the limit of a line, rather ...
Sep 26 2023
prev sibling next sibling parent max haughton <maxhaton gmail.com> writes:
On Friday, 22 September 2023 at 07:12:02 UTC, Prajwal S N wrote:
 Hi everyone,

 For SAOC 2023, I'm working on refactoring 
 [dfmt](https://github.com/dlang-community/dfmt) to use the AST 
 from DMD-as-a-library instead of libdparse.

 [...]
Somewhat orthogonal but I'm tempted by the idea of all the "official" tooling being in with the compiler source. They should be together in the release, and one should never have to deal with them being out of sync.
Sep 25 2023
prev sibling parent sighoya <sighoya gmail.com> writes:
On Friday, 22 September 2023 at 07:12:02 UTC, Prajwal S N wrote:
[...]
Wishing you great success for the project.
Sep 27 2023