digitalmars.D - Adela Vais - SAOC Milestones - Dlang GLR Parser for GNU Bison

reply Adela Vais <adela.vais99 gmail.com> writes:
Hello all!

Starting this week, I will post the milestones for #SAOC2020 only 
on this thread, as it will be easier to follow them this way.

General milestones: [0].
Milestone 1 Update 1: [1].
Milestone 1 Update 2: [2].

My plan for last week was:
- to continue familiarizing myself with the M4 functions used 
within the repo (working on lookahead correction and continuing 
to add other features); (done)
- to start analyzing the C and C++ existing parsers, by writing 
programs that would help me understand the key differences 
between them. (not done - postponed)

As of last week:
- I continued to do tasks in the repo[3][4][5][6].
- Bison's documentation was updated to include D support[7]. I 
now have an updated list of the features in the D backend and I 
know what needs to be added or improved.
- I read up on some of the features I need to implement or 
improve as soon as possible (lookahead, error messages, etc.).
- The PR[8] for dlang/phobos is still open.

The plan for next week:
- While finishing the documentation, I realized that the LALR 
parser has a lot of missing features, so I will focus on adding 
them.
- I also plan to enhance the user interface, allowing the user to 
view the TokenKind and the semantic value as one complete symbol, 
as opposed to separate values.
- I will create a context error message function that will help 
me encapsulate the code better, keep the generated code smaller 
(the function will be emitted only if the user has selected 
certain directives), and smooth the transition to adding the 
missing custom error message.
- I will also continue working on the lookahead correction if I 
have the time. It is postponed until I add the context 
functionality.

[0]: https://forum.dlang.org/post/ngdkxwyhrcduspuslfoe@forum.dlang.org
[1]: https://forum.dlang.org/post/xxnwpbmagtzuhwmsusnu@forum.dlang.org
[2]: https://forum.dlang.org/post/qnrnrjxrcmglbgwndkby@forum.dlang.org
[3]: https://github.com/akimd/bison/commit/4bbda69f1ef10a40cfd28deb9bf619e28dcdbf6b
[4]: https://github.com/akimd/bison/commit/4855b9855465c3289f1f8f452bc0b09ba5711e05
[5]: https://github.com/akimd/bison/commit/3829bd6262ee22f445217808dd0aa2c34b72f660
[6]: https://github.com/akimd/bison/commit/7cd195b30aa615fa9996f471c1f200e016a904a5
[7]: https://github.com/akimd/bison/commit/72360b51a545a5c1d03027885466f5855c67c61e
[8]: https://github.com/dlang/phobos/pull/7642
Oct 06 2020
parent reply Adela Vais <adela.vais99 gmail.com> writes:
On Tuesday, 6 October 2020 at 20:31:41 UTC, Adela Vais wrote:
 [...]
Hello!

As of last week:
- The complete symbols and the custom error message functionalities are works in progress.
- The PR[0] for dlang/phobos is still open.

The plan for next week:
- Continue working on the above tasks.
- Add the lookahead correction if I have the time. It is postponed until I add the context functionality.

#SAOC2020

[0]: https://github.com/dlang/phobos/pull/7642
Oct 18 2020
parent reply Adela Vais <adela.vais99 gmail.com> writes:
On Sunday, 18 October 2020 at 15:48:13 UTC, Adela Vais wrote:
 [...]
Hello!

As of last week:
- I finished the Context class, which provides information about the error token, its location, and the expected tokens. It is currently under review. [0]
- I implemented the custom error message functionality. It is currently under review. [0]
- I almost finished the lookahead correction feature; I expect it to be ready by the next update. [1]
- I made the requested changes for the PR for dlang/phobos. [2]

The plan for next week:
- Continue working on lookahead correction.
- Add tests for programs that use multiple parsers.
- Implement yyerrok, which makes the parser resume normal execution immediately after it has encountered an error. [3]

#SAOC2020

[0]: https://github.com/akimd/bison/compare/master...adelavais:custom-error-message-squash
[1]: https://github.com/akimd/bison/compare/master...adelavais:add-lac
[2]: https://github.com/dlang/phobos/pull/7642
[3]: https://www.gnu.org/software/bison/manual/bison.html#Error-Recovery
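
To illustrate how an object carrying the error token, its location, and the expected tokens can drive a custom error message, here is a minimal D sketch. Every name in it (the SymbolKind members, Location, Context, syntaxErrorMessage) is a hypothetical stand-in chosen for the example, not the API that Bison generates.

```d
import std.algorithm : map;
import std.array : join;
import std.conv : to;

// Hypothetical stand-ins for illustration; not the generated Bison types.
enum SymbolKind { number, plus, minus, eof }

struct Location { int line; int column; }

// Bundles what an error handler needs to know about a syntax error.
struct Context
{
    SymbolKind token;       // the unexpected token
    Location location;      // where it was found
    SymbolKind[] expected;  // tokens the parser would have accepted
}

// Builds a message from the context; the "expecting" part is added
// only when at least one expected token is known.
string syntaxErrorMessage(const Context ctx)
{
    string msg = ctx.location.line.to!string ~ "." ~ ctx.location.column.to!string
        ~ ": syntax error, unexpected " ~ ctx.token.to!string;
    if (ctx.expected.length > 0)
        msg ~= ", expecting " ~ ctx.expected.map!(k => k.to!string).join(" or ");
    return msg;
}

unittest
{
    auto ctx = Context(SymbolKind.plus, Location(1, 5),
                       [SymbolKind.number, SymbolKind.minus]);
    assert(syntaxErrorMessage(ctx)
        == "1.5: syntax error, unexpected plus, expecting number or minus");
}
```

The sketch can be tried with `dmd -unittest -main -run example.d`. Keeping all error details in one object means the message format can change without touching the code that detects the error.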
Oct 26 2020
parent reply Adela Vais <adela.vais99 gmail.com> writes:
On Monday, 26 October 2020 at 19:11:27 UTC, Adela Vais wrote:
 [...]
Hello!

As of last week:
- I resubmitted the Context class and the custom error message patches with the needed corrections and suggestions. [0]
- I made the requested changes for the PR for dlang/phobos. [1]
- I implemented the yyerrok functionality, which is currently under review. [2]
- I made a style cleanup in the D examples, currently under review. [3]
- I worked on the tests for programs that use multiple parsers, but I still have problems integrating the tests I wrote into the test suite. If I can't make it work today, I will send a PR with what I have changed so far so I can get help.
- After the modifications in the parser, the lookahead correction is again a work in progress.

The plan for next week:
- Continue working on the above.
- Start working on the push parser[4] for D (right now, the LALR1 parser is a pull parser).

#SAOC2020

[0]: https://github.com/adelavais/bison/tree/fix-context-squash
[1]: https://github.com/dlang/phobos/pull/7642
[2]: https://github.com/adelavais/bison/tree/yyerrok
[3]: https://github.com/adelavais/bison/tree/examples-stylefix
[4]: https://www.gnu.org/software/bison/manual/html_node/Push-Decl.html#Push-Decl
Nov 06 2020
parent reply Adela Vais <adela.vais99 gmail.com> writes:
On Friday, 6 November 2020 at 16:08:55 UTC, Adela Vais wrote:
 [...]
I forgot to modify the title of the last post; it was a status report for #SAOC2020 Milestone 2 Update 2.
Nov 06 2020
parent reply Adela Vais <adela.vais99 gmail.com> writes:
On Friday, 6 November 2020 at 16:11:14 UTC, Adela Vais wrote:
 [...]
Hello!

As of last week:
- The patches sent last week were all accepted.
- I started working on internationalization, which allows the user to have the error messages translated into other languages. [1]
- The Phobos PR is about to get merged (72h no objection -> merge label). [2]
- I have a fully working version of complete external symbols [3], but I need feedback from the D community to decide if this should replace the current code version or if both versions should be supported. I opened a discussion thread. [4]
- I started reading the documentation and analyzing the push parser code from the Java parser.

The plan for next week:
- Continue working on the above.

#SAOC2020

[1]: https://github.com/adelavais/bison/tree/internationalisation
[2]: https://github.com/dlang/phobos/pull/7642
[3]: https://github.com/adelavais/bison/tree/complete-external-symbols
[4]: https://forum.dlang.org/post/afiyfqqcauxdrbkrizri@forum.dlang.org
Nov 14 2020
parent reply Adela Vais <adela.vais99 gmail.com> writes:
On Saturday, 14 November 2020 at 16:03:48 UTC, Adela Vais wrote:
[...]
Hello!

As of last week:
- I added support for lookahead correction [1].
- I changed the return value of yylex() from TokenKind to YYParser.Symbol [2], and changed the YYLocation's type from class to struct [3], based on the feedback I received [4].
- I wrote a fix regarding the name of the custom error message function from the API [5].
- The PR from dlang/phobos was merged [6].
- The internationalization feature is a work in progress.

The plan for next week:
- Continue working on internationalization.
- Start coding the push parser.

#SAOC2020

[1]: https://github.com/akimd/bison/commit/593724366f714e6c0316c51716cc507309ea9030
[2]: https://github.com/akimd/bison/commit/10305f3e941591f1f915402f1c1076024129e624
[3]: https://github.com/akimd/bison/commit/e5854bbddd10d6b834622dc1e6b67c91b9c43f48
[4]: https://forum.dlang.org/post/mailman.6687.1605555766.31109.digitalmars-d@puremagic.com
[5]: https://github.com/akimd/bison/commit/0e51f6146ad32126b9fce26fb8de34c3d2f727e6
[6]: https://github.com/dlang/phobos/pull/7642
Nov 21 2020
parent reply Adela Vais <adela.vais99 gmail.com> writes:
On Saturday, 21 November 2020 at 21:46:08 UTC, Adela Vais wrote:
[...]
Hello!

As of last week:
- I worked on internationalization. I changed my approach and moved to a code version more similar to the C parser's one. The code seems to be working, but I have some problems automating the testing process. I sent a WIP[1] for feedback.
- I opened a PR[2] in druntime to add libintl.h to the language, as it is often used with locale.h, and it would bring internationalization into D. At the moment, for internationalization, I am using functions from libintl.h and from locale.h (which exists in D, in core.stdc.locale). For the libintl functions, I am using extern(C).
- I started working on the push parser. At the moment, I am still working on a pull version, and I am extracting the local variables of parse() and making them members of the YYParser class. I do this to preserve the parser state between calls once I move to a push parser.

The plan for next week:
- Continue working on the above.

#SAOC2020

[1]: https://github.com/adelavais/bison/tree/internationalisation-gettext
[2]: https://github.com/dlang/druntime/pull/3300
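
As an illustration of calling libintl through extern(C) alongside core.stdc.locale, here is a minimal sketch. The three declarations mirror the C prototypes of gettext, textdomain and bindtextdomain; the helper names setupTranslations and translate, and their parameters, are hypothetical. It assumes a system where these functions are part of the C library (as with glibc); elsewhere, linking against libintl may be needed.

```d
import core.stdc.locale : LC_ALL, setlocale;
import std.string : fromStringz, toStringz;

// libintl has no module in druntime, so its C functions are declared by hand.
extern (C) nothrow @nogc
{
    char* gettext(const(char)* msgid);
    char* textdomain(const(char)* domainname);
    char* bindtextdomain(const(char)* domainname, const(char)* dirname);
}

// Hypothetical helper: selects the user's locale and the message catalog.
void setupTranslations(string domain, string localeDir)
{
    setlocale(LC_ALL, "");
    bindtextdomain(toStringz(domain), toStringz(localeDir));
    textdomain(toStringz(domain));
}

// Hypothetical helper: returns the translation of msgid, or msgid itself
// when no catalog provides one.
string translate(string msgid)
{
    return fromStringz(gettext(toStringz(msgid))).idup;
}

unittest
{
    // With no catalog installed, gettext falls back to the original text.
    assert(translate("syntax error") == "syntax error");
}
```

Copying the result with idup avoids holding on to memory owned by the C library.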
Dec 01 2020
parent reply Adela Vais <adela.vais99 gmail.com> writes:
On Wednesday, 2 December 2020 at 00:36:20 UTC, Adela Vais wrote:
[...]
Hello!

As of last week:
- As libintl.h is not a standard C header, it can't be added to druntime, so I will not be pursuing my PR[1] anymore. I started working on a dub package [2] instead.
- I fixed my setup and now internationalisation works as expected on my system. I sent a patch [3] and, after review, I need to make some changes before it is accepted.
- I started working on some old fixes that were requested [4]: style fixes in LALR1 and the documentation, creating an alias for the return value of yylex (now YYParser.Symbol can be referred to as Symbol), and reducing the verbosity of the location reporting in the examples and tests.
- The Lexer's return value is the Symbol struct. It receives the external token (TokenKind) and transforms it to its internal form (SymbolKind), saving only the latter. The parser had 2 variables, one for each form. I removed the variable for the external token [4], as there was no use for it anymore.

The plan for next week:
- From now until the end of #SAOC2020 I will work in parallel on the GLR and the remaining tasks from the LALR1, as the user interface (which is the same for both parsers) is almost finished.
- I will start working on the GLR, first by analyzing the already existing GLRs (in C and C++) and writing programs that would help me understand the differences between them and LALR1.
- I will add type aliases for a few internal types. The documentation specifies that the user should not use objects whose names start with "YY", as they are internal implementation details, yet the examples use such types (struct YYLocation, for example), which should be available to the user.

[1]: https://github.com/dlang/druntime/pull/3300
[2]: https://github.com/adelavais/libint
[3]: https://github.com/adelavais/bison/tree/internationalisation-squash
[4]: https://github.com/adelavais/bison/tree/fixes-from-changing-yylex-retval
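
The idea of a Symbol that accepts the external token kind but stores only the internal one can be sketched as follows. The enum values, the toInternal function and the layout of Symbol are all assumptions made for the example; a generated parser defines its own numbering and types.

```d
// Hypothetical numbering for illustration; a generated parser defines its own.
// External form, as returned by the scanner:
enum TokenKind { YYEOF = 0, NUMBER = 258, PLUS = 259 }
// Internal form, as used to index the parser tables:
enum SymbolKind { S_YYEOF = 0, S_YYUNDEF = 2, S_NUMBER = 3, S_PLUS = 4 }

// Maps an external token kind to its internal symbol kind.
SymbolKind toInternal(TokenKind token)
{
    switch (token)
    {
    case TokenKind.YYEOF:  return SymbolKind.S_YYEOF;
    case TokenKind.NUMBER: return SymbolKind.S_NUMBER;
    case TokenKind.PLUS:   return SymbolKind.S_PLUS;
    default:               return SymbolKind.S_YYUNDEF;
    }
}

struct Symbol
{
    private SymbolKind kind_; // only the internal form is kept
    int value;                // semantic value (a single int in this sketch)

    // The scanner passes the external kind; it is translated once, here.
    this(TokenKind token, int value = 0)
    {
        kind_ = toInternal(token);
        this.value = value;
    }

    SymbolKind kind() const { return kind_; }
}

unittest
{
    assert(Symbol(TokenKind.NUMBER, 7).kind == SymbolKind.S_NUMBER);
    assert(Symbol(TokenKind.PLUS).kind == SymbolKind.S_PLUS);
}
```

Because the translation happens in the constructor, the parser never needs a second variable for the external token.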
Dec 10 2020
parent reply Adela Vais <adela.vais99 gmail.com> writes:
On Thursday, 10 December 2020 at 23:44:13 UTC, Adela Vais wrote:
[...]
Hello!

As of last week:
- I added type aliases for location, position and semantic value, and sent the patches [1] for review.
- I used the existing C examples from the repo and started analyzing the code of the GLR, noting how the values progressed (and differed from the LALR1) throughout the parsing.

The plan for the next update:
- Continue analyzing the GLRs. After I feel confident enough that I understand C's GLR, I will move to the "old" C++ GLR, which is a wrapper around the C one.
- Fix the internationalisation patch.

#SAOC2020

[1]: https://github.com/adelavais/bison/tree/fixes-from-changing-yylex-retval
Dec 18 2020
parent reply Adela Vais <adela.vais99 gmail.com> writes:
On Friday, 18 December 2020 at 21:03:51 UTC, Adela Vais wrote:
[...]
Hello!

As of the last update:
- I added version identifiers for running the internationalization code. As the D backend uses functions from libintl.h, which is a non-standard C header, Bison cannot use it by default; the user can choose whether or not to install its prerequisites. [1]
- I created a dub package [2] for an easier import of the libintl.h functions in Bison and other projects. I also closed the PR [3] from druntime.
- All the patches from last week were accepted, and I submitted 3 additional ones:
  * I removed the getter methods for the semantic value and positions from the Lexer interface, now unnecessary because of the complete symbol approach [4];
  * I modified the backend to use the aliases for location, position, and semantic value throughout [5];
  * I removed a comparison inside parse(), unnecessary after I removed the variable for the external token from this method [6].

The plan for next week:
- I will use the libintl dub package with Bison.
- I will modify the error parsing by eliminating "verbose", a backward compatibility option not needed in D. At the moment, the "verbose" and "detailed" options generate the same code. With this change, I will also restructure the way the SymbolKind (the internal types) names are handled when generating the error messages.
- Continue working on the push parser.

During the next milestone of #SAOC2020 I have decided to focus on writing the last remaining parts of the LALR1 and to postpone the work on the GLR. While I will not make significant progress on the GLR itself, the 2 parsers share the same user interface, so a lot of my work on the LALR1 will carry over to the GLR.

[1]: https://github.com/adelavais/bison/commits/fix-i18n
[2]: https://code.dlang.org/packages/libintl
[3]: https://github.com/dlang/druntime/pull/3300
[4]: https://github.com/akimd/bison/commit/32bb53870bb9caa2f8de081fdb53cb3540c8ce7a
[5]: https://github.com/akimd/bison/commit/27109d9d4ac11665612119344141df0b9f440fbb
[6]: https://github.com/akimd/bison/commit/2b4451c4afb8ed90795f8bb7b198996143d769c9
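
A version identifier lets the translation code compile only when the user asks for it, so a default build has no dependency on libintl. The sketch below shows the pattern; the identifier ExampleNls and the function localized are hypothetical names chosen for this example, not the ones used in the Bison skeleton.

```d
// "ExampleNls" is a hypothetical version identifier; enable it with
// `dmd -version=ExampleNls` (and link libintl where the C library lacks it).
version (ExampleNls)
{
    import std.string : fromStringz, toStringz;

    // Declared by hand, following the C prototype in libintl.h.
    extern (C) char* gettext(const(char)* msgid) nothrow @nogc;

    // Translated build: look the message up in the active catalog.
    string localized(string msg)
    {
        return fromStringz(gettext(toStringz(msg))).idup;
    }
}
else
{
    // Default build: no dependency on libintl, messages stay untranslated.
    string localized(string msg)
    {
        return msg;
    }
}

unittest
{
    // Holds in the default build, and also in a translated build
    // that has no catalog installed.
    assert(localized("syntax error") == "syntax error");
}
```

Callers use localized() in both configurations, so the choice is made once at compile time rather than at every call site.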
Dec 22 2020
parent reply Adela Vais <adela.vais99 gmail.com> writes:
On Tuesday, 22 December 2020 at 21:12:02 UTC, Adela Vais wrote:
[...]
Hello!

As of the last update:
- I added internationalisation to the LALR1 parser. [1]
- Starting from the code generated by the calc example, I modified it to use a push parser.
- I removed a name parsing function used for error messages. It was a backward-compatible feature for the other parsers, but D does not have to use it. [2]
- I removed some imports that were unnecessary after I added the internationalization. [3]
- I removed support for the error parsing option 'verbose'. This is another backward-compatible feature for the other parsers. Before the removal, the 'detailed' and 'verbose' options generated the same output. [4]
- I created a way of reporting the number of errors found by the parser. [5]
- I submitted a patch for fixing a test function. If lookahead correction with trace debugging was used, then the output was getting mixed up with the error message reporting. [6]
- I started working on a way to introduce in the calc example the std.conv.parse enhancement I made. [7]

The plan for next week:
- Continue working on the unfinished tasks from above. For the push parser, the next step will be to see how the push parser works in a program using lookahead correction.

[1]: https://github.com/akimd/bison/commit/594cae57ca63fc7b3f18dad3d6472e043c626df0
[2]: https://github.com/akimd/bison/commit/5bac3ddcee7c6eebb4833d1954f614fced475073
[3]: https://github.com/akimd/bison/commit/dc8b16424a89297368dbc66c69787ed0882966f0
[4]: https://github.com/akimd/bison/commit/c13b3c02d39edd4d46480b8ee065466d8720939f
[5]: https://github.com/akimd/bison/commit/8d01c60e9c1aa5975e38602b8ffeb128833a8518
[6]: https://github.com/adelavais/bison/tree/local-test
[7]: https://github.com/adelavais/bison/tree/parse
Jan 07
parent reply Adela Vais <adela.vais99 gmail.com> writes:
On Thursday, 7 January 2021 at 23:30:29 UTC, Adela Vais wrote:
[...]
Hello!

As of the last update:
- I modified the calc example to use lookahead correction. Building on last week's work on an unmodified calc example, I modified this program to use a push parser.
- I made some small style fixes in the examples and submitted them for review along with the std.conv.parse modification started last week. The enhancement to parse was introduced in Dlang v2.095.0, so we will support both versions for the calc example: the one demonstrating the new parse feature, and the one using the old code. [1]
- I started working on token constructors. [2]

This feature works in the C++ parser only with the '%define api.value.type variant' option. By default, the user must write a union containing all the different types of values needed for yylex(), and then use them when declaring the tokens:

    %union { int ival; }
    [...]
    %token <ival> NUM "number"

Variant allows the user to simply write:

    %token <int> NUM "number"

D does not support this feature, but I plan to introduce it in the near future (not necessarily during SAOC).

From yylex(), the user must return a Symbol by calling its constructor. This constructor does not check whether the TokenKind and the value correspond in any way, so the user can return Symbol(TokenKind.NUMBER, "I am a string") and the error will be caught only much later, at runtime.

As a solution to this, the C++ parser provides the option of calling the make_<token_name> function (example: make_NUMBER), which calls the Symbol constructor with the correct arguments and generates compile-time errors on a mismatch. I want to provide this feature for D (modified to be called as Symbol.NUMBER(someNumber)).

The C++ parser generates the make_ functions with M4, so all the functions are put in a header file for the user. I want to limit the space occupied by these methods by generating them using D. [3]

I created a version that does not use variant, as a proof of concept that this task can be done from D. But once I add support for variant, the token constructors should work without modifications.

The plan for the next week of #SAOC2020:
- Continue working on the above.

Push-parser next steps:
  * I have to do more correctness and speed tests for both versions (with and without lookahead correction).
  * Start integrating it in the D backend.

[1]: https://github.com/adelavais/bison/tree/calc-example-fix
[2]: https://github.com/adelavais/bison/tree/token-constructors
[3]: https://github.com/adelavais/bison/blob/token-constructors/tests/testsuite.dir/582/calc.d#L544
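
One way to generate per-token constructors in D itself, rather than with M4, is to loop over the members of the token enum at compile time and mix in one static function per token. The following is only a sketch under assumptions: the tokens, the ValueType mapping and the field layout of Symbol are hypothetical and much simpler than what the backend generates.

```d
// Hypothetical tokens for illustration.
enum TokenKind { EOF, PLUS, NUMBER, IDENT }

// Hypothetical mapping from a token kind to the type of its semantic value;
// void means the token carries no value.
template ValueType(TokenKind kind)
{
    static if (kind == TokenKind.NUMBER)
        alias ValueType = int;
    else static if (kind == TokenKind.IDENT)
        alias ValueType = string;
    else
        alias ValueType = void;
}

struct Symbol
{
    TokenKind kind;
    int ival;     // used by NUMBER
    string sval;  // used by IDENT

    // Generate Symbol.EOF(), Symbol.PLUS(), Symbol.NUMBER(int), Symbol.IDENT(string).
    static foreach (name; __traits(allMembers, TokenKind))
    {
        static if (is(ValueType!(__traits(getMember, TokenKind, name)) == void))
        {
            mixin("static Symbol " ~ name ~ "() { return Symbol(TokenKind." ~ name ~ "); }");
        }
        else
        {
            mixin("static Symbol " ~ name ~ "(ValueType!(TokenKind." ~ name ~ ") value)"
                ~ " { auto s = Symbol(TokenKind." ~ name ~ "); s.store(value); return s; }");
        }
    }

    private void store(int value) { ival = value; }
    private void store(string value) { sval = value; }
}

unittest
{
    auto n = Symbol.NUMBER(42);
    assert(n.kind == TokenKind.NUMBER && n.ival == 42);
    assert(Symbol.IDENT("x").sval == "x");
    assert(Symbol.PLUS().kind == TokenKind.PLUS);
    // A value of the wrong type is rejected when compiling, not at runtime.
    static assert(!__traits(compiles, Symbol.NUMBER("I am a string")));
}
```

Since the functions are produced by the compiler from the enum, nothing per-token has to be written out in the generated source file.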
Jan 16
parent Adela Vais <adela.vais99 gmail.com> writes:
On Saturday, 16 January 2021 at 22:09:04 UTC, Adela Vais wrote:
[...]
Hello!

- I worked with Akim Demaille (one of my mentors, Bison co-maintainer) on adding the '%define api.value.type union'. He provided me with a WIP based on my work on the token constructors, and I continued it with the D code needed. Unlike C++, D allows structs, classes, etc. to be union members, so there is no need for D to implement '%define api.value.type variant'. [1]
- I worked on token-constructors. In C++, this directive works only with '%define api.value.type variant'. In D, I managed to add this feature for the default parser, which uses %union, too. For this change, I had to rewrite the Symbol's constructors. I wrote them in D, but they became too complex. In the near future, I want to rewrite them in M4, which will also make them easier to maintain. [1]
- I almost finished the push-parser. After this change, the backend will support 3 different options for this directive: pull (by default), push (which I implemented in the past month), and both (which means that the user has access to both parse functions, and the pull-parse method is a wrapper around the push one). I integrated it into the backend and it is going to be reviewed soon. [2]

Future plans:
- In the near future I will finish my work on the LALR1, which means that I will add the token-constructors and push-parser features to the backend. I also need to rewrite the Symbol's constructors in M4 and to complete the documentation.
- During the next few months I will be working on the GLR. It will not be a wrapper around the C's GLR, as initially planned, but a stand-alone parser. The only difference between a user's code for a LALR1 and a GLR parser is the presence of the '%glr-parser' declaration; otherwise, the user's code is identical. Given that the user APIs are the same, a lot of my work on the LALR1 will be used by the GLR, too.

[1]: https://github.com/adelavais/bison/tree/tok-constr
[2]: https://github.com/adelavais/bison/tree/push-parser
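
The relationship between the push and pull interfaces can be shown with a toy example. The class below is not a Bison-generated parser: it only accepts input of the form NUMBER (PLUS NUMBER)* EOF and adds up the numbers, and all of its names (SumParser, PushStatus, pushParse, parse) are hypothetical. What it demonstrates is that the state lives in members so it survives between pushParse calls, and that the pull method is a loop around the push one.

```d
enum TokenKind { EOF, NUMBER, PLUS }

struct Symbol { TokenKind kind; int value; }

// Outcome of feeding one token to the push interface.
enum PushStatus { more, accept, abort }

final class SumParser
{
    // State that a pull-only parser could keep in local variables of parse();
    // as members, it is preserved between calls to pushParse.
    private bool expectNumber = true;
    private int total;

    // Push interface: the caller feeds one token at a time.
    PushStatus pushParse(Symbol sym)
    {
        if (sym.kind == TokenKind.NUMBER && expectNumber)
        {
            total += sym.value;
            expectNumber = false;
            return PushStatus.more;
        }
        if (sym.kind == TokenKind.PLUS && !expectNumber)
        {
            expectNumber = true;
            return PushStatus.more;
        }
        if (sym.kind == TokenKind.EOF && !expectNumber)
            return PushStatus.accept;
        return PushStatus.abort; // anything else is a syntax error
    }

    // Pull interface: a wrapper that asks the scanner for tokens
    // and pushes them until the parser accepts or aborts.
    bool parse(scope Symbol delegate() yylex)
    {
        PushStatus status;
        do
        {
            status = pushParse(yylex());
        }
        while (status == PushStatus.more);
        return status == PushStatus.accept;
    }

    int result() const { return total; }
}

unittest
{
    auto tokens = [Symbol(TokenKind.NUMBER, 1), Symbol(TokenKind.PLUS),
                   Symbol(TokenKind.NUMBER, 2), Symbol(TokenKind.EOF)];
    size_t next = 0;
    auto parser = new SumParser;
    assert(parser.parse(() => tokens[next++]));
    assert(parser.result == 3);
}
```

With this split, a caller that owns the event loop can call pushParse directly, while existing pull-style code keeps calling parse.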
Jan 29