digitalmars.D - SAOC 2024 "Learning about AST Nodes and Semantic Analysis in Compiler

Dennis <dennis.onyeka.4 gmail.com> writes:
**Tasks Accomplished**



Before delving into how to decouple AST nodes from semantic 
functions, I looked at how compilers work in general and the 
stages involved.

A typical compiler works this way:

Character Stream => |**Lexer**| => Tokens => |**Parser**| => AST 
=> |**Semantic Routines**| => Intermediate Representation 
(Optimization) => |**Code Generator**| => Assembly Code

**Character stream:** The source code, i.e. the input text that 
the programmer wrote.

**Lexer/scanner:** Lexing (lexical analysis) is the process of 
breaking the character stream down into meaningful units. The 
resulting units are called tokens.
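
To make the lexing step concrete, here is a minimal sketch of a lexer for a toy language. `TokKind`, `Token` and `lex` are hypothetical names made up for this example; dmd's real lexer is far more involved and is not reproduced here.

```d
import std.ascii : isAlpha, isAlphaNum, isDigit, isWhite;
import std.stdio : writeln;

// Hypothetical token kinds for a toy language (not dmd's token enum).
enum TokKind { number, identifier, symbol }

struct Token
{
    TokKind kind;
    string text;
}

// Breaks a source string into tokens, skipping whitespace.
Token[] lex(string src)
{
    Token[] tokens;
    size_t i = 0;
    while (i < src.length)
    {
        immutable c = src[i];
        if (isWhite(c))
        {
            ++i;
            continue;
        }
        immutable start = i;
        if (isDigit(c))
        {
            while (i < src.length && isDigit(src[i])) ++i;
            tokens ~= Token(TokKind.number, src[start .. i]);
        }
        else if (isAlpha(c) || c == '_')
        {
            while (i < src.length && (isAlphaNum(src[i]) || src[i] == '_')) ++i;
            tokens ~= Token(TokKind.identifier, src[start .. i]);
        }
        else
        {
            ++i; // any other character becomes a one-character symbol token
            tokens ~= Token(TokKind.symbol, src[start .. i]);
        }
    }
    return tokens;
}

void main()
{
    foreach (tok; lex("x = 40 + 2;"))
        writeln(tok.kind, " \"", tok.text, "\"");
}
```

The lexer only groups characters. It does not know whether `x = 40 + 2;` is a valid statement; that is the parser's job.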

**Parser:** The job of the parser is to take the sequence of 
tokens from the lexical analyzer and verify that it can be 
generated by the grammar of the source language. It detects and 
reports any syntax errors and builds a tree from which 
intermediate code can later be generated.
The output of the parser is an abstract syntax tree (AST).
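
As an illustration of that step, below is a small recursive-descent parser for arithmetic expressions. It is only a sketch under assumptions: the grammar, the `Expr`/`Num`/`BinOp` classes and the `Parser` struct are invented for this example and are not how dmd's parser is written. To stay short it reads characters directly instead of a token array.

```d
import std.ascii : isDigit, isWhite;
import std.conv : to;
import std.format : format;
import std.stdio : writeln;

// Hypothetical AST node types for a toy grammar (not dmd's classes):
//   expr   := term ('+' term)*
//   term   := factor ('*' factor)*
//   factor := NUMBER | '(' expr ')'
abstract class Expr
{
    abstract string show() const;
}

final class Num : Expr
{
    long value;
    this(long value) { this.value = value; }
    override string show() const { return value.to!string; }
}

final class BinOp : Expr
{
    char op;
    Expr lhs, rhs;
    this(char op, Expr lhs, Expr rhs) { this.op = op; this.lhs = lhs; this.rhs = rhs; }
    override string show() const { return format("(%s %s %s)", op, lhs.show(), rhs.show()); }
}

struct Parser
{
    string src;
    size_t pos;

    // Skips whitespace, then returns the next character ('\0' at end of input).
    char peek()
    {
        while (pos < src.length && isWhite(src[pos])) ++pos;
        return pos < src.length ? src[pos] : '\0';
    }

    Expr parseExpr()
    {
        Expr left = parseTerm();
        while (peek() == '+') { ++pos; left = new BinOp('+', left, parseTerm()); }
        return left;
    }

    Expr parseTerm()
    {
        Expr left = parseFactor();
        while (peek() == '*') { ++pos; left = new BinOp('*', left, parseFactor()); }
        return left;
    }

    Expr parseFactor()
    {
        immutable c = peek();
        if (c == '(')
        {
            ++pos;
            Expr inner = parseExpr();
            if (peek() != ')')
                throw new Exception("syntax error: expected ')'");
            ++pos;
            return inner; // the parentheses shaped the tree but are not stored
        }
        if (!isDigit(c))
            throw new Exception("syntax error: expected a number or '('");
        immutable start = pos;
        while (pos < src.length && isDigit(src[pos])) ++pos;
        return new Num(src[start .. pos].to!long);
    }
}

// Parses a whole input string and rejects trailing garbage.
Expr parse(string src)
{
    auto p = Parser(src);
    Expr e = p.parseExpr();
    if (p.peek() != '\0')
        throw new Exception("syntax error: unexpected trailing input");
    return e;
}

void main()
{
    writeln(parse("1 + 2 * 3").show());
    writeln(parse("(1 + 2) * 3").show());
}
```

Each grammar rule becomes one function, and a syntax error is reported as soon as the input stops matching the rule being parsed.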

**Abstract syntax tree (AST):** The AST is like a blueprint that 
represents the structure of the code. It breaks the code down 
into smaller chunks and organizes them in a tree so that the 
compiler can work with them.
An important fact I learnt is that the AST only contains the 
information needed to analyze the source text and leaves out the 
extra syntactic detail that is only needed while parsing.
In the dmd compiler codebase, AST nodes are classes and structs, 
while the semantic routines are functions that are tightly 
coupled to the AST classes.

I also learnt the core difference between an AST and a parse 
tree. In summary, an AST focuses on the essential elements and 
their relationships: it captures the underlying structure and 
semantics of the code and excludes unnecessary syntactic 
details. A parse tree captures the complete structure of the 
input code, including all the syntactic details such as 
parentheses, semicolons, and other language-specific constructs.
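
The difference is easy to see by counting nodes. The sketch below builds both trees by hand for the source text `(1 + 2);`, assuming a hypothetical grammar with `stmt`, `expr`, `term` and `factor` rules. The `Node` class is made up for the example and is unrelated to dmd's node classes.

```d
import std.stdio : writeln;

// One generic node type used for both trees; hypothetical, for illustration.
final class Node
{
    string label;
    Node[] children;
    this(string label, Node[] children = null)
    {
        this.label = label;
        this.children = children;
    }
}

// Counts a node plus all of its descendants.
size_t countNodes(const Node n)
{
    size_t total = 1;
    foreach (child; n.children)
        total += countNodes(child);
    return total;
}

// Parse tree for `(1 + 2);`: every grammar rule and every token gets a node.
Node buildParseTree()
{
    auto one = new Node("term", [new Node("factor", [new Node("1")])]);
    auto two = new Node("term", [new Node("factor", [new Node("2")])]);
    auto inner = new Node("expr", [one, new Node("+"), two]);
    auto parens = new Node("factor", [new Node("("), inner, new Node(")")]);
    auto outer = new Node("expr", [new Node("term", [parens])]);
    return new Node("stmt", [outer, new Node(";")]);
}

// AST for the same source: only the operation and its operands remain.
Node buildAst()
{
    return new Node("+", [new Node("1"), new Node("2")]);
}

void main()
{
    writeln("parse tree nodes: ", countNodes(buildParseTree()));
    writeln("AST nodes: ", countNodes(buildAst()));
}
```

The parentheses, the semicolon and the intermediate grammar rules account for all of the extra nodes in the parse tree.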

A simple AST node I constructed for practice:
https://github.com/dchidindu5/test_demo/blob/main/README.md


**Semantic analysis:** This is the stage of compilation where 
the compiler checks whether the code is logical and meaningful. 
Its major role is type checking: confirming that variable 
declarations, functions, and control flow adhere to the 
semantics of the language.
The stages described so far make up the frontend of the dmd 
compiler.
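
A minimal sketch of what type checking over an AST can look like is given below. Everything in it (`TypeKind`, the node classes, `Checker`) is an assumption made for illustration. dmd's semantic analysis is organised around visitors and is far larger than this.

```d
import std.stdio : writeln;

// Hypothetical types and AST nodes for a toy language.
enum TypeKind { int_, string_, error }

abstract class Expr {}
final class IntLit : Expr { long value; this(long v) { value = v; } }
final class StrLit : Expr { string value; this(string v) { value = v; } }
final class Var : Expr { string name; this(string n) { name = n; } }
final class Add : Expr { Expr lhs, rhs; this(Expr l, Expr r) { lhs = l; rhs = r; } }

struct Checker
{
    TypeKind[string] symbols; // declared variables and their types
    string[] errors;          // diagnostics collected so far

    // Computes the type of an expression and records any semantic errors.
    TypeKind check(Expr e)
    {
        if (cast(IntLit) e) return TypeKind.int_;
        if (cast(StrLit) e) return TypeKind.string_;
        if (auto v = cast(Var) e)
        {
            if (auto t = v.name in symbols) return *t;
            errors ~= "undefined identifier `" ~ v.name ~ "`";
            return TypeKind.error;
        }
        if (auto a = cast(Add) e)
        {
            immutable l = check(a.lhs);
            immutable r = check(a.rhs);
            if (l == TypeKind.error || r == TypeKind.error)
                return TypeKind.error; // already reported further down the tree
            if (l != TypeKind.int_ || r != TypeKind.int_)
            {
                errors ~= "operands of `+` must both be int";
                return TypeKind.error;
            }
            return TypeKind.int_;
        }
        assert(0, "unknown node kind");
    }
}

void main()
{
    Checker c;
    c.symbols["x"] = TypeKind.int_;
    writeln(c.check(new Add(new Var("x"), new IntLit(2))));
    writeln(c.check(new Add(new Var("x"), new StrLit("two"))));
    writeln(c.check(new Var("y")));
    writeln(c.errors);
}
```

All three expressions in `main` are syntactically valid, so a parser would accept them. Only the semantic pass can reject the second and third.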

- To understand the directory layout of the dmd codebase, I used 
this README as a guide. It outlines the files and what each one 
does.
https://github.com/dlang/dmd/blob/master/compiler/src/dmd/README.md
- Looked through each file I would be working on.




- Chose the `attrib.d` AST node file, as recommended by my mentor.
- Examined its imports and commented out `import dmd.dsymbolsem`, 
which is a semantic import.
- Built the compiler, which produced errors.
- Looked at the error messages and moved the affected semantic 
functions to `dsymbolsem.d`, which is a semantic analysis file.
- The affected function was `newScope`.
- Converted it into a visitor, which is a design pattern that 
suits this kind of refactoring. I had trouble mastering it, so my 
mentor sent me an earlier PR on visitors as an example: 
[Extract dsymbol.Dsymbol.importAll and turn it into a 
visitor](https://github.com/dlang/dmd/pull/15870/)
- Applied the same approach to `newScope`.
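
The general shape of this refactoring can be sketched as follows. This is a simplified model under assumptions, not the code from the PR. The classes reuse dmd's names (`Dsymbol`, `AttribDeclaration`, `VisibilityDeclaration`, `Scope`, `Visitor`) but are toy stand-ins, and the body of each `visit` is invented. The point is only that the AST classes keep nothing but `accept`, while the scope logic lives in a visitor on the semantic side.

```d
import std.stdio : writeln;

// Toy stand-ins for dmd types; every definition here is hypothetical.
struct Scope
{
    string visibility;
}

class Visitor
{
    void visit(Dsymbol) {}
    void visit(AttribDeclaration d) { visit(cast(Dsymbol) d); }
    void visit(VisibilityDeclaration d) { visit(cast(AttribDeclaration) d); }
}

// AST side: the nodes only know how to accept a visitor.
class Dsymbol
{
    void accept(Visitor v) { v.visit(this); }
}

class AttribDeclaration : Dsymbol
{
    override void accept(Visitor v) { v.visit(this); }
}

class VisibilityDeclaration : AttribDeclaration
{
    string visibility;
    this(string visibility) { this.visibility = visibility; }
    override void accept(Visitor v) { v.visit(this); }
}

// Semantic side: what used to be a `newScope` method on each node class.
final class NewScopeVisitor : Visitor
{
    alias visit = Visitor.visit; // keep the base overloads visible
    Scope* sc;                   // input scope, replaced by the result

    this(Scope* sc) { this.sc = sc; }

    // Default for attribute declarations: keep the scope unchanged.
    override void visit(AttribDeclaration) {}

    override void visit(VisibilityDeclaration d)
    {
        // visit returns void, so the result is stored in a field instead.
        sc = new Scope(d.visibility);
    }
}

// Free function that replaces the former `d.newScope(sc)` method call.
Scope* newScope(Dsymbol d, Scope* sc)
{
    auto v = new NewScopeVisitor(sc);
    d.accept(v);
    return v.sc;
}

void main()
{
    auto outer = new Scope("public");
    auto inner = newScope(new VisibilityDeclaration("private"), outer);
    writeln(outer.visibility, " -> ", inner.visibility);
}
```

Callers keep a function-call interface (`newScope(d, sc)`), and the file that defines the AST nodes no longer has to import the semantic module.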

**First error encountered:**
```
src/dmd/dsymbolsem.d(7494): Error: function `extern (C++) Scope* dmd.dsymbolsem.newScopeVisitor.visit(Scope* sc)` does not override any function, did you mean to override alias `dmd.visitor.Visitor.visit`?
src/dmd/dsymbolsem.d(7494):        Functions are the only declarations that may be overridden
```


**First commit:** 
https://github.com/dlang/dmd/commit/c01f76b25b4eb210d92d0ab858dd025ee72bfc6a

**Solution**

My mentor helped me discover that the method signature in 
`newScopeVisitor` was not exactly the same as in the base class 
`Visitor`. To override a method, the name, return type, and 
parameters all have to match, and mine did not: as the error 
message shows, I had declared `Scope* visit(Scope* sc)`.
I fixed it by using the exact name and parameter from the base 
class and a `void` return type, because the `visit` functions in 
`Visitor` do not return a value.
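
A stripped-down reproduction of the mismatch, using hypothetical toy types rather than the real dmd classes, looks like this. The failing declaration is left as a comment so that the example compiles.

```d
import std.stdio : writeln;

// Hypothetical toy types; only the shape of the signatures matters here.
struct Scope { int depth; }
class Node {}

class Visitor
{
    void visit(Node) {}
}

class NewScopeVisitor : Visitor
{
    Scope* sc;
    this(Scope* sc) { this.sc = sc; }

    // Does not compile: no `visit` in the base class takes a Scope*, so
    // there is nothing for this declaration to override.
    // override Scope* visit(Scope* sc) { return sc; }

    // Compiles: same name, parameter type and `void` return type as the base.
    override void visit(Node)
    {
        sc = new Scope(sc.depth + 1); // the result is stored, not returned
    }
}

void main()
{
    auto outer = new Scope(0);
    auto v = new NewScopeVisitor(outer);
    Visitor base = v;
    base.visit(new Node); // dispatches to the override through the base type
    writeln(v.sc.depth);
}
```

Because `visit` cannot return the new scope, the visitor keeps it in a field that the caller reads after the call.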


**Challenges**

I am still refactoring the code and working through new errors.

**Current commit:**
https://github.com/dlang/dmd/compare/master...dchidindu5:dmd:practice1?expand=1
https://github.com/dlang/dmd/commit/36489c94755a502f7141168ed6e006ef95339062


**Summary:**
This week was focused on building a strong theoretical foundation 
in compiler design, particularly around AST nodes and semantic 
analysis, while also getting acquainted with the practical 
aspects of contributing to the DMD compiler project.





**Resources:**

AST and semantic analysis
https://medium.com/basecs/leveling-up-ones-parsing-game-with-asts-d7a6fc2400ff
https://pgrandinetti.github.io/compilers/page/what-is-semantic-analysis-in-compilers/

Visitors
https://www.geeksforgeeks.org/visitor-method-design-patterns-in-c/

D language Book
http://ddili.org/ders/d.en/index.html
Sep 22
Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Sunday, 22 September 2024 at 20:33:22 UTC, Dennis wrote:
> **Tasks Accomplished**
>
> [...]

Excellent work! Please do open a pull request (draft if you feel it is not yet ready) and the rest of the reviewers (and CI; e.g. your missing newline at the end of file for `dsymbolsem.d`) can give you feedback.
Sep 22
Dennis <dennis.onyeka.4 gmail.com> writes:
On Monday, 23 September 2024 at 03:37:01 UTC, Nicholas Wilson wrote:
> On Sunday, 22 September 2024 at 20:33:22 UTC, Dennis wrote:
>> **Tasks Accomplished**
>>
>> Design:
>>
>> [...]
>
> Excellent work! Please do open a pull request (draft if you feel it is not yet ready) and the rest of the reviewers (and CI; e.g. your missing newline at the end of file for `dsymbolsem.d`) can give you feedback.

**Noted**
Sep 23
Dennis <dennis.onyeka.4 gmail.com> writes:
On Monday, 23 September 2024 at 03:37:01 UTC, Nicholas Wilson wrote:
> On Sunday, 22 September 2024 at 20:33:22 UTC, Dennis wrote:
>> **Tasks Accomplished**
>>
>> Design:
>>
>> [...]
>
> Excellent work! Please do open a pull request (draft if you feel it is not yet ready) and the rest of the reviewers (and CI; e.g. your missing newline at the end of file for `dsymbolsem.d`) can give you feedback.

This is the link to the draft PR https://github.com/dlang/dmd/pull/16880
Sep 25