digitalmars.D - How can we make it easier to experiment with the compiler?

reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
I think there are many that would like to experiment with the 
compiler, but feel discouraged because they don't know how to 
approach it.

I think this not only comes down to documentation, but is also 
structural. In order to figure out what to improve, the best 
starting point is the challenges people have actually experienced.

The number one challenge I see is keeping track of DMD as it is 
released with new improvements. Basically reapplying the changes 
made to the experimental branch to the main branch (aka 
"rebasing"?). I suspect that kills many efforts, meaning people 
create a fork, start making changes, but then a new version of 
DMD is released and the fork is left to dry in the sun as 
rebasing is not fun. And well, a hobby that isn't fun, is not a 
good hobby. :-D

Better internal compiler structure would help a lot with this. So 
a prioritized list for me would be:

1. Have a clean separation between frontend and backend, that is 
close to plug-and-play. That would allow people to inject a new 
high level IR between frontend and backend that could open for 
new interesting optimizations, and allow all the compilers to 
benefit from it.

2. Break down source files into smaller units, so that stable 
parts are separated from unstable parts.

3. More encapsulation and separation of responsibility.

4. Switch to a more syntactical AST, possibly enabling AST macros 
in the future without too much hassle, then use an IR for real 
work.

5. Use directories.

6. Improved documentation.

7. Tutorials.

What other items should be on the list?

Which items are feasible in the next 6 months?
May 22
next sibling parent reply Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Sunday, 23 May 2021 at 06:12:30 UTC, Ola Fosheim Grøstad wrote:
 I think there are many that would like to experiment with the 
 compiler, but feel discouraged because they don't know how to 
 approach it.

 I think this not only comes down to documentation, but is also 
 structural. In order to figure out what to improve, the best 
 starting point is the challenges people have actually experienced.

 The number one challenge I see is keeping track of DMD as it is 
 released with new improvements. Basically reapplying the 
 changes made to the experimental branch to the main branch (aka 
 "rebasing"?).
(That is the correct terminology.) I suspect this is more of a problem for people who are less familiar with git, which might well include people wanting to play around with DMD, e.g. GSoC/SAoC students. I know this was the case for me while developing dcompute, with the added difficulty of tracking LLVM on top of LDC (which was kept in sync with DMD).
 I suspect that kills many efforts, meaning people create a 
 fork, start making changes, but then a new version of DMD is 
 released and the fork is left to dry in the sun as rebasing is 
 not fun. And well, a hobby that isn't fun, is not a good hobby. 
 :-D
The solution to this is better git skills, not so much better compiler skills or knowledge of DMD, although a merge conflict in a critical piece of code is always a PITA. We now have Slack/Discord for people to ask these kinds of questions, which I'm sure will get answered if they are trying to do something interesting or fix an annoying problem.
 Better internal compiler structure would help a lot with this. 
 So a prioritized list for me would be:
Oh god yes. The directory structure, or rather the lack thereof, is a really dire repellent for newcomers. I cannot overstate this. 173 files in dmd/src/dmd is _completely_ unacceptable; however, Walter seems to like it this way and has struck down PRs trying to remediate this in the past (because it doesn't suit his editor configuration? or something like that). We should have at least the following folders:

ast: ast_node, dsymbol, aggregate, et al.
semantic: semantic2, semantic3, ob, nogc, safe, et al.
visitors: parsetimevisitor, permissivevisitor, visitor, et al.
glue (backend interfacing files): lib[.*], scan[.*], toir, s2ir, e2ir, et al.
lex: lexer, tokens, identifier, id, utf, et al.
headers: (alas still needed until dtoh works well enough and has been stable for enough releases for GDC to bootstrap)
 1. Have a clean separation between frontend and backend, that 
 is close to plug-and-play. That would allow people to inject a 
 new high level IR between frontend and backend that could open 
 for new interesting optimizations, and allow all the compilers 
 to benefit from it.
See also https://mlir.llvm.org. I had a GSoC student try to do something with this; I don't think it got to a usable state. But this is about as state of the art as it gets, and a very interesting research direction. Rust and Swift use multiple levels of IRs. Also, from what I understand, the pointer and liveness analysis that is part of DIP 1000/1040/(other Walter DIPs?) does something like this, but in a hacked-up, nonstandard manner.
 2. Break down source files into smaller units, so that stable 
 parts are separated from unstable parts.
Urgh. Dealing with 10000 line files and 1000 line functions is such a drain when trying to get stuff done (looking at you, expressionsem.d). However, this needs to be combined with directories/packages or it will not improve the situation.
 3. More encapsulation and separation of responsibility.

 4. Switch to a more syntactical AST, possibly enabling AST 
 macros in the future without too much hassle, then use an IR 
 for real work.
That is a noble goal, but it would require _a lot_ of changes both in DMD and in downstream LDC and GDC, and in tools that consume the AST and expect it to be complete, not to mention designing said IR and redoing semantic analysis/transformations to work with it.
 5. Use directories.
Yes!!! sooo much yes! see above.
 6. Improved documentation.

 7. Tutorials.

 What other items should be on the list?
Try to make sure we use standard terminology, so that people can reliably search for things.
 Which items are feasible in the next 6 months?
Directories.
May 23
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/23/2021 7:25 PM, Nicholas Wilson wrote:
 Directories.
module" that leaves one with nowhere to start. We currently have:

    dmd
    dmd/root
    dmd/backend

I regularly fend off attempts to have dmd/root import files from dmd, and dmd/backend import files from dmd. I recently had to talk someone out of having dmd/backend import files from dmd/root. In other words, a failure of encapsulation.

Let's look at one example, picked more or less because I've looked at it recently: dmd/target.d. The reason for its existence is to abstract target information. Its imports are:

    import dmd.argtypes_x86;
    import dmd.argtypes_sysv_x64;
    import core.stdc.string : strlen;
    import dmd.cond;
    import dmd.cppmangle;
    import dmd.cppmanglewin;
    import dmd.dclass;
    import dmd.declaration;
    import dmd.dscope;
    import dmd.dstruct;
    import dmd.dsymbol;
    import dmd.expression;
    import dmd.func;
    import dmd.globals;
    import dmd.id;
    import dmd.identifier;
    import dmd.mtype;
    import dmd.statement;
    import dmd.typesem;
    import dmd.tokens : TOK;
    import dmd.root.ctfloat;
    import dmd.root.outbuffer;
    import dmd.root.string : toDString;

If I want to understand the code, I have to understand half of the rest of the compiler. On a more abstract level, why on earth would a target abstraction need to know about AST nodes? At least half of these imports shouldn't be here, and if they are, the code needs to be redesigned.

Recently I needed some target information in the ImportC lexer, and it would have been so easy to just import dmd.target. But then that drags along all the imports that I've really tried to avoid importing into the lexer. Iain came up with a clever solution to use a template parameter.

Note that Phobos suffers terribly from this disease (everything ultimately imports everything else), which makes it very hard to understand and debug.

Fixing this is not easy, it requires a lot of hard thinking about what a module *really* needs to do. But each success at eliminating an import makes it more understandable.

Creating a false hierarchy (an implied relationship that is instantly defeated by the imports) of files won't fix it.

A good rule of thumb is:

    *** Never import a file from an uplevel directory ***

Import sideways and down, never up.
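
[Editor's sketch] The rule of thumb above can be checked mechanically. The following is a minimal Python sketch, not an existing DMD tool: it assumes module names mirror directories (as they do in D), looks only at simple single-module `import` lines, and ignores anything outside the `dmd` root. The function names and the example module `dmd.backend.example` are hypothetical.

```python
import re

# Matches the first module named on a simple D import line, e.g.
# "import dmd.tokens : TOK;". Comma-separated import lists are not
# handled; this is a sketch, not a D parser.
IMPORT_RE = re.compile(r"^\s*(?:public\s+|static\s+)?import\s+([\w.]+)",
                       re.MULTILINE)


def package_of(module):
    """Return the package (directory) part of a module name as a tuple."""
    return tuple(module.split(".")[:-1])


def uplevel_imports(module, source, root="dmd"):
    """List imports in `source` that leave the importing module's own
    directory subtree, i.e. that go up (or up, sideways, then down)."""
    own = package_of(module)
    bad = []
    for imported in IMPORT_RE.findall(source):
        if imported.split(".")[0] != root:
            continue  # druntime, Phobos, etc. are out of scope here
        target = package_of(imported)
        # Sideways and down are fine: the target package must start
        # with the importer's own package.
        if target[:len(own)] != own:
            bad.append(imported)
    return bad


example = """
import dmd.backend.cc;
import dmd.globals;
import core.stdc.string : strlen;
"""
print(uplevel_imports("dmd.backend.example", example))
```

Under these assumptions, a module in dmd/backend that imports `dmd.globals` is reported, while `dmd.target` importing `dmd.root.ctfloat` is not, since that import goes down.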
May 23
next sibling parent reply Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Monday, 24 May 2021 at 02:56:14 UTC, Walter Bright wrote:
 On 5/23/2021 7:25 PM, Nicholas Wilson wrote:
 Directories.
every other module" that leaves one with nowhere to start. We currently have: dmd dmd/root dmd/backend I regularly fend off attempts to have dmd/root import files from dmd, and dmd/backend import files from dmd. I recently had to talk someone out of having dmd/backend import files from dmd/root. In other words, a failure of encapsulation.
This is a _completely_ orthogonal problem. The symptoms are completely orthogonal, although easily confused: failure of encapsulation makes _reasoning_ about the _interconnectedness_ of code difficult, failure to package makes _exploration_ and _enumeration_ of code (files, functions, classes, data structures) more difficult.

The solutions, however, are cross-enabling: we can implement and _enforce_ policies like, say, "AST node implementing modules should not import semantic analysis modules" with reasonable confidence iff we have all the AST modules in one place and all the semantic analysis modules in one place.

The symptoms of failure of encapsulation I'm going to assume you are well aware of. The symptoms of failure to use packages are as follows:

* The sheer number of files in src/dmd makes it impossible to remember what each file is for. This problem is compounded by the fact that many files have names that do not describe well what they do, _especially_ to newcomers. Principal offending example: `ob`. Compare with names like `filecache`.
* It is impossible to determine at a glance which files are related to each other:
  - Is `foreachvar.d` an AST node? What about `dcast.d`? (No and no.)
  - What's the difference between `glue.d` and `gluelayer.d`?
  - Are `visitor.d`, `transitivevisitor.d`, `strictvisitor.d`, `parsetimevisitor.d` and `permissivevisitor.d` a complete list of the public visitor modules? (No.)
  - Which of `cond.d` and `staticcond.d` is the AST node for a static condition? What does the other file do? (`cond.d`; semantic analysis.)
  - What files do semantic analysis?
  - Which files declare AST nodes?
  - Which files interface with the backend (and subsequently are not part of LDC or GDC)?
  - Where is DMD's entry point?
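
[Editor's sketch] Once modules live in packages, a policy such as "AST modules should not import semantic analysis modules" reduces to a table lookup. The Python sketch below illustrates this; the package names `dmd.ast` and `dmd.semantic` are hypothetical (DMD had no such packages when this was written), and the import map would have to come from some other tool.

```python
# Each pair (src, dst) means: modules in package `src` must not import
# modules in package `dst`. Package names here are hypothetical.
FORBIDDEN = [("dmd.ast", "dmd.semantic")]


def in_package(module, package):
    """True if `module` is `package` itself or lives somewhere below it."""
    return module == package or module.startswith(package + ".")


def policy_violations(imports, forbidden=FORBIDDEN):
    """`imports` maps each module name to the list of modules it imports.
    Returns (importer, imported) pairs that break a forbidden rule."""
    found = []
    for importer, imported_modules in sorted(imports.items()):
        for imported in imported_modules:
            for src, dst in forbidden:
                if in_package(importer, src) and in_package(imported, dst):
                    found.append((importer, imported))
    return found


example_imports = {
    "dmd.ast.expression": ["dmd.semantic.typesem", "dmd.root.outbuffer"],
    "dmd.semantic.typesem": ["dmd.ast.expression"],
}
print(policy_violations(example_imports))
```

With a flat directory the same check needs a hand-maintained list of which file counts as "AST" and which as "semantic", which is the enumeration problem described above.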
 snip example
 Fixing this is not easy, it requires a lot of hard thinking 
 about what a module *really* needs to do. But each success at 
 eliminating an import makes it more understandable.
Fixing the lack-of-directories issue only requires thinking about what a module _is_, i.e. what package it belongs to: driver/frontend (mars, errors, etc.), lexer group (lex, parse, tokens, etc.), ast, semantic analysis, backend interfacing, backend, root.
 Creating a false hierarchy (an implied relationship that is 
 instantly defeated by the imports)
You cannot seriously tell me with a straight face that e.g. the AST is not a hierarchy and should not be grouped together.
 of files won't fix [failure to encapsulate].
Indeed, it fixes a different problem, but it makes fixing failure to encapsulate much easier.
 A good rule of thumb is:

     *** Never import a file from an uplevel directory ***

 Import sideways and down, never up.
Indeed. However, you can't do much of that with just
   dmd
   dmd/root
   dmd/backend
May 23
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/24/21 1:15 AM, Nicholas Wilson wrote:
 Indeed, it fixes a different problem, but it makes fixing failure to 
 encapsulate much easier.
I think the best first step is to add `private` to the codebase. This is cheap to get into and informs any future refactoring. I find it confusing that people push for massive reorganization for years, but won't bother to create 50 line PRs that add `private` appropriately.
May 23
parent Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Monday, 24 May 2021 at 06:00:06 UTC, Andrei Alexandrescu wrote:
 refactoring. I find it confusing that people push for massive 
 reorganization for years, but won't bother to create 50 line 
 PRs that add `private` appropriately.
Yes, but I'd like this thread to be more forward-looking and focus on making compiler hacking a "fun hobby", rather than on what should have been done in the past. The key "sociological" point one could take away from this is:

1. Do boring chores together, because that makes them less unfun.

2. Then leave the smaller fun things to individuals that have sporadic activity (busy life or weak affiliation with the project).

3. Encourage a sense of autonomous ownership; experimental forks are a very good way to achieve that. Research on school children shows that a sense of autonomous ownership of the task is a good motivating factor. (Not sufficient, but close to necessary.)

(To do unfun chores together we need a plan and some sort of model or map.)
May 24
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/23/2021 10:15 PM, Nicholas Wilson wrote:
 This is a _completely_ orthogonal problem.
It's the same problem. D's support for modules and packages is literally designed around matching the hierarchy of the source files. Shuffling files around accomplishes nothing when every module imports every other module.
May 23
next sibling parent reply Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Monday, 24 May 2021 at 06:58:48 UTC, Walter Bright wrote:
 On 5/23/2021 10:15 PM, Nicholas Wilson wrote:
 This is a _completely_ orthogonal problem.
It's the same problem. Shuffling files around accomplishes nothing when every module imports every other module.
Did you read _literally nothing else_ that I wrote? Let me quote myself again so that you don't miss it:
 The symptoms are completely orthogonal, although easily 
 confused: failure of encapsulation makes _reasoning_ about the 
 _interconnectedness_ of code difficult, failure to package 
 makes _exploration_ and _enumeration_ of code (files, 
 functions, classes, data structures) more difficult.
Putting the modules into packages fixes EXACTLY the problem of horrible experience with exploration and enumeration. It explicitly does not fix failure of encapsulation because it is a _completely_ orthogonal set of symptoms.
 D's support for modules and packages is literally designed 
 around matching the hierarchy of the source files.
Yes, and?
May 24
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/24/2021 1:35 AM, Nicholas Wilson wrote:
 Did you read _literally nothing else_ that I wrote?
I read it, my response was to the entire posting.
May 24
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/24/2021 2:39 AM, Walter Bright wrote:
 On 5/24/2021 1:35 AM, Nicholas Wilson wrote:
 Did you read _literally nothing else_ that I wrote?
I read it, my response was to the entire posting.
To be a little clearer, if the files are all merely reshuffled into various packages, then they all violate the rule:

    *** Never import a file from an uplevel directory ***

and understanding is not increased at all. And it isn't even just an uplevel directory, it's up then sideways then down.

There *is*, however, documentation on the dmd source files:

https://dlang.org/phobos/index.html

Click on "dmd" on the left. For anyone wishing to get a tour of the files and what they do, this is the place. Adding better Ddoc comments to the source files will help with this, of course.
May 24
parent reply Tobias Pankrath <tobias+dlang pankrath.net> writes:
On Monday, 24 May 2021 at 10:04:49 UTC, Walter Bright wrote:
 On 5/24/2021 2:39 AM, Walter Bright wrote:
 On 5/24/2021 1:35 AM, Nicholas Wilson wrote:
 Did you read _literally nothing else_ that I wrote?
I read it, my response was to the entire posting.
To be a little clearer, if the files are all merely reshuffled into various packages, then they all violate the rule: *** Never import a file from an uplevel directory *** and understanding is not increased at all. And it isn't even just an uplevel directory, it's up then sideways then down.
Putting the files into directories would make those violations obvious and serve as documentation on how the deps should be. Then all others can start work towards that goal.
May 24
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/24/2021 3:41 AM, Tobias Pankrath wrote:
 Putting the files into directories would make those violations obvious and
 serve as documentation on how the deps should be. Then all others can
 start work towards that goal.
When you've got a rusty car, it sure is tempting to just paint it. But it's all for naught if the real work, the hard work, the boring work -repairing the rust- is not done. Just shooting color out of the sprayer is fun and looks great. But it avoids accomplishing anything worthwhile. It's an Illusion of Progress.
May 25
parent reply Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Wednesday, 26 May 2021 at 00:31:48 UTC, Walter Bright wrote:
 On 5/24/2021 3:41 AM, Tobias Pankrath wrote:
 Putting the files into directories would make those violations 
 obvious and
 serve as documentation on how the deps should be. Then all 
 others can
 start work towards that goal.
When you've got a rusty car, it sure is tempting to just paint it. But it's all for naught if the real work, the hard work, the boring work -repairing the rust- is not done. Just shooting color out of the sprayer is fun and looks great. But it avoids accomplishing anything worthwhile
That is correct for the analogy you used, however that is a false analogy because...
 It's an Illusion of Progress.
...Illusions of Progress provide no actual utility, hence illusions. Packaging DMD, otoh, provides _lots_ of utility: exploration and navigation are greatly eased, more so for people who are less familiar with the codebase.
May 25
next sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Wednesday, 26 May 2021 at 01:25:23 UTC, Nicholas Wilson wrote:
 ...Illusions of Progress provide no actual utility, hence 
 illusions.
 Packaging DMD otoh, provides _lots_ of utility: exploration and 
 navigation is greatly eased, moreso for people who are less 
 familiar with the codebase.
For what it's worth, I've found that exploration and navigation of DMD code becomes much more manageable with an editor that supports "goto definition"--ideally with history, so you can jump backwards too. I use vim's built-in ctags support for this, but I imagine most popular code editors can be configured to do something similar.
May 25
next sibling parent reply Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Wednesday, 26 May 2021 at 01:38:59 UTC, Paul Backus wrote:
 On Wednesday, 26 May 2021 at 01:25:23 UTC, Nicholas Wilson 
 wrote:
 ...Illusions of Progress provide no actual utility, hence 
 illusions.
 Packaging DMD otoh, provides _lots_ of utility: exploration 
 and navigation is greatly eased, moreso for people who are 
 less familiar with the codebase.
For what it's worth, I've found that exploration and navigation of DMD code becomes much more manageable with an editor that supports "goto definition"--ideally with history, so you can jump backwards too. I use vim's built-in ctags support for this, but I imagine most popular code editors can be configured to do something similar.
I don't know if it is funny or sad that a thread about enabling more compiler experiments ends with a note on adding basic IDE features to arcane editors...
May 25
parent reply Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Wednesday, 26 May 2021 at 05:39:19 UTC, Ola Fosheim Grostad 
wrote:
 On Wednesday, 26 May 2021 at 01:38:59 UTC, Paul Backus wrote:
 On Wednesday, 26 May 2021 at 01:25:23 UTC, Nicholas Wilson 
 wrote:
 [...]
For what it's worth, I've found that exploration and navigation of DMD code becomes much more manageable with an editor that supports "goto definition"--ideally with history, so you can jump backwards too. I use vim's built-in ctags support for this, but I imagine most popular code editors can be configured to do something similar.
I don't know if it is funny or sad that a thread about enabling more compiler experiments ends with a note on adding basic IDE features to arcane editors...
😅 You can use VS tho = bliss
May 26
parent Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Wednesday, 26 May 2021 at 08:35:04 UTC, Imperatorn wrote:
 😅

 You can use VS tho = bliss
Ok, we choose to laugh! 😅 (thanks, that helped on a rainy day :-)
May 26
prev sibling next sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 26/05/2021 1:38 PM, Paul Backus wrote:
 For what it's worth, I've found that exploration and navigation of DMD 
 code becomes much more manageable with an editor that supports "goto 
 definition"--ideally with history, so you can jump backwards too.
It also becomes significantly more manageable when you have things like the parser, ast, semantic analysis and backend all grouped together in different projects of your solution! This way you can ignore significant chunks of the compiler which are irrelevant to what you are working on. Imagine if there was a way to standardize this experience for everyone! If only...
May 26
parent reply Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Wednesday, 26 May 2021 at 12:37:58 UTC, rikki cattermole wrote:
 Imagine if there was a way to standardize this experience for 
 everyone! If only...
The experience is irrelevant in this context. Partitioning is a necessary first step for decoupling. Let us stop pretending this is a matter of taste. It is not. It is a matter of basic Software Engineering (the profession).
May 26
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 27/05/2021 1:04 AM, Ola Fosheim Grostad wrote:
 On Wednesday, 26 May 2021 at 12:37:58 UTC, rikki cattermole wrote:
 Imagine if there was a way to standardize this experience for 
 everyone! If only...
The experience is irrelevant in this context. Partitioning is a necessary first step for decoupling. Let us stop pretending this is a matter of taste. It is not. It is a matter of basic Software Engineering (the profession).
Agreed. When the directory structure does not match the concepts and complexity involved in a project, it is, in my experience, a symptom of much larger issues. Fixing it makes other issues much more visible, to the point where they HAVE TO BE FIXED RIGHT NOW.

Without rearranging the file structure to match concepts and complexities, it is a lot harder to properly scope modules to doing one and only one thing.
May 26
parent reply Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Wednesday, 26 May 2021 at 13:51:38 UTC, rikki cattermole wrote:
 Fixing it, makes other issues much more visible to the point 
 where they: HAVE TO BE FIXED RIGHT NOW.

 Without the rearranging to match concepts and complexities in 
 the file structure, it is a lot harder to properly scope 
 modules to doing one and only one thing.
Exactly. The core principle for anything that has to do with computers at basically any level is surprisingly simple: Divide and Conquer.
May 26
next sibling parent reply zjh <fqbqrr 163.com> writes:
But W.B. says you have not spent a thousand hours writing a D compiler.
What you say doesn't count.
May 26
parent Greg Strong <mageofmaple protonmail.com> writes:
On Wednesday, 26 May 2021 at 14:16:02 UTC, zjh wrote:
 but W.B. say you are not writing d compiler thousand hours.
 What you say doesn't count.
By that logic, what you say doesn't count either. Yet you post anyway.
May 26
prev sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 27/05/2021 1:58 AM, Ola Fosheim Grostad wrote:
 On Wednesday, 26 May 2021 at 13:51:38 UTC, rikki cattermole wrote:
 Fixing it, makes other issues much more visible to the point where 
 they: HAVE TO BE FIXED RIGHT NOW.

 Without the rearranging to match concepts and complexities in the file 
 structure, it is a lot harder to properly scope modules to doing one 
 and only one thing.
Exactly. The core principle for anything that has to do with computers at basically any level is surprisingly simple: Divide and Conquer.
I actually have an article on code quality and how I measure it.

https://cattermole.co.nz/article/code_qual

But the important list I use (which dmd fails completely at):

1. Organized in a way that reflects the idea/concept.
2. Separate concepts, separate areas (files/areas of a file).
3. Grouping of resource usage
4. Depth from purpose
5. Naming

1, 2 and 4 is what this part of the thread is all about.

5 is stuff like: what is STC? Variable names, etc.

3. OK, just look at the filename of this:

https://github.com/dlang/dmd/blob/master/src/dmd/libelf.d

or...

https://github.com/dlang/dmd/blob/master/src/dmd/libomf.d

I hope I don't need to say why these files fail that test when they are in the same directory as:

https://github.com/dlang/dmd/blob/master/src/dmd/doc.d
May 26
parent reply Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Wednesday, 26 May 2021 at 14:21:06 UTC, rikki cattermole wrote:
 I actually have an article on code quality and how I measure it.

 https://cattermole.co.nz/article/code_qual
I like your motto: Code is documentation!
 But the important list I use (for which dmd fails completely 
 at):

 1. Organized in a way that reflects the idea/concept.
 2. Separate concepts, separate areas (files/areas of a file).
 3. Grouping of resource usage
 4. Depth from purpose
 5. Naming

 1, 2 and 4 is what this part of the thread is all about.
But my main issues are not these; these are symptoms. My main concerns are the consequences of the underlying cause of these symptoms. The real challenge is not having a clean way of introducing new components (like an IR between front and backend, or a new solver related to the type system). What is missing is an analysis of where the compiler should allow extensions (compile time) with ease. That prevents experimentation and lowers interest in participation.

LDC has achieved a lot, and it is, I think, because they could specialize on THEIR piece and take pride in maintaining it in a (I can only assume) busy life. They can also make their own decisions, so there is a sense of autonomous control, which is a high motivation factor (generally speaking).
May 26
next sibling parent Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Wednesday, 26 May 2021 at 19:13:42 UTC, Ola Fosheim Grostad 
wrote:
 But, my main issues are not these, these are symptoms. My main 
 concerns are the consequences of the underlying cause for these 
 symptoms. The real challenge is not having a clean way of 
 introducing new components ( like an IR between front and
I guess another way of putting it is that it is ok that some authors want to maintain and fix bugs in their own component, so that component can have little documentation and so on, if there is an architecture to support having components! (Which is a desirable quality because it allows a sense of autonomous ownership etc.)
May 26
prev sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 27/05/2021 7:13 AM, Ola Fosheim Grostad wrote:
 On Wednesday, 26 May 2021 at 14:21:06 UTC, rikki cattermole wrote:
 I actually have an article on code quality and how I measure it.

 https://cattermole.co.nz/article/code_qual
I like your motto: Code is documentation!
Thanks!
 But the important list I use (for which dmd fails completely at):

 1. Organized in a way that reflects the idea/concept.
 2. Separate concepts, separate areas (files/areas of a file).
 3. Grouping of resource usage
 4. Depth from purpose
 5. Naming

 1, 2 and 4 is what this part of the thread is all about.
But my main issues are not these; these are symptoms. My main concerns are the consequences of the underlying cause of these symptoms. The real challenge is not having a clean way of introducing new components (like an IR between front and backend, or a new solver related to the type system). What is missing is an analysis of where the compiler should allow extensions (compile time) with ease.
Yeah, although I'll stay out of the whole IR thing, as I'm nowhere near thinking about something like that.
May 26
parent Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Wednesday, 26 May 2021 at 20:03:28 UTC, rikki cattermole wrote:
 Yeah, although I'll stay out of the whole IR thing, as I'm 
 nowhere near thinking about something like that.
OK, one simple way is to just have a standard high-level intermediate language, like SIL for Swift. Then authors can build auxiliary IRs that point back to the language nodes if needed, run the algorithm on the auxiliary IR, and modify the language node graph accordingly. Then drop the auxiliary IR and move on to the next stage. In the end the backend receives whatever is left of the intermediate data structure.

Another option could be to have a mediating layer between front and backend. The default no-op layer could be designed such that the optimizer will remove most of the overhead. Then people can replace the mediating layer with their own data structure that obtains what it needs from the frontend, does something with it, and passes everything the backend needs down to the backend.

There are probably more options. I have no strong opinion on what is best. Just settle for something that puts a clean separation layer between front and backend without losing sought information.
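
[Editor's sketch] The "mediating layer" option can be illustrated with a toy model. The Python sketch below is only an illustration of the shape of the idea, not DMD's architecture: `FrontendUnit`, `noop_layer`, `drop_unreachable` and `compile_unit` are all hypothetical names, and the "frontend output" is reduced to a call graph.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FrontendUnit:
    """Hypothetical stand-in for what a frontend hands over."""
    name: str
    # Function name -> names of the functions it calls.
    functions: Dict[str, List[str]] = field(default_factory=dict)


# A mediating layer maps frontend output to backend input.
Layer = Callable[[FrontendUnit], FrontendUnit]


def noop_layer(unit: FrontendUnit) -> FrontendUnit:
    """Default layer: pass everything through unchanged."""
    return unit


def drop_unreachable(unit: FrontendUnit, entry: str = "main") -> FrontendUnit:
    """Example experiment: keep only functions reachable from `entry`."""
    seen, stack = set(), [entry]
    while stack:
        fn = stack.pop()
        if fn in seen or fn not in unit.functions:
            continue
        seen.add(fn)
        stack.extend(unit.functions[fn])
    kept = {fn: callees for fn, callees in unit.functions.items() if fn in seen}
    return FrontendUnit(unit.name, kept)


def compile_unit(unit: FrontendUnit, backend, layer: Layer = noop_layer):
    """The only coupling point: frontend -> layer -> backend."""
    return backend(layer(unit))


unit = FrontendUnit("app", {"main": ["helper"], "helper": [], "unused": ["helper"]})
list_functions = lambda u: sorted(u.functions)  # toy "backend"
print(compile_unit(unit, list_functions))
print(compile_unit(unit, list_functions, drop_unreachable))
```

The design point is that an experimenter replaces one argument rather than editing frontend or backend code, so a fork touches few files and has less to rebase.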
May 26
prev sibling parent reply Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Wednesday, 26 May 2021 at 01:38:59 UTC, Paul Backus wrote:
 For what it's worth, I've found that exploration and navigation 
 of DMD code becomes much more manageable with an editor that 
 supports "goto definition"--ideally with history, so you can 
 jump backwards too.
That's if you've got a starting point in the source code; then yeah, it is a lot better (thx to all who did work on LSP related projects for D). However, it won't help when you are trying to find some functionality and don't know which module it is in.
May 26
parent rikki cattermole <rikki cattermole.co.nz> writes:
On 27/05/2021 5:28 AM, Alexandru Ermicioi wrote:
 On Wednesday, 26 May 2021 at 01:38:59 UTC, Paul Backus wrote:
 For what it's worth, I've found that exploration and navigation of DMD 
 code becomes much more manageable with an editor that supports "goto 
 definition"--ideally with history, so you can jump backwards too.
That's if you've got a starting point in the source code; then yeah, it is a lot better (thx to all who did work on LSP related projects for D). However, it won't help when you are trying to find some functionality and don't know which module it is in.
There is a file list (somewhere, I'm not looking for it) that tells you what is what. But a proper directory structure + good header comments can do this even better :D
May 26
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/25/2021 6:25 PM, Nicholas Wilson wrote:
 ...Illusions of Progress provide no actual utility, hence illusions.
 Packaging DMD otoh, provides _lots_ of utility: exploration and navigation is 
 greatly eased, moreso for people who are less familiar with the codebase.
Creating a FILES.md file, the content of which is each source file with a brief description, will accomplish the same thing.
May 26
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/26/2021 8:06 PM, Walter Bright wrote:
 On 5/25/2021 6:25 PM, Nicholas Wilson wrote:
 ...Illusions of Progress provide no actual utility, hence illusions.
 Packaging DMD otoh, provides _lots_ of utility: exploration and navigation is 
 greatly eased, moreso for people who are less familiar with the codebase.
Creating a FILES.md file, the content of which is each source file with a brief description, will accomplish the same thing.
I see this has already been done: https://github.com/dlang/dmd/blob/master/src/dmd/README.md It's a bit out of date, files like typesem.d are missing.
May 26
parent reply Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Thursday, 27 May 2021 at 04:53:12 UTC, Walter Bright wrote:
 On 5/26/2021 8:06 PM, Walter Bright wrote:
 On 5/25/2021 6:25 PM, Nicholas Wilson wrote:
 ...Illusions of Progress provide no actual utility, hence 
 illusions.
 Packaging DMD otoh, provides _lots_ of utility: exploration 
 and navigation is greatly eased, moreso for people who are 
 less familiar with the codebase.
Creating a FILES.md file, the content of which is each source file with a brief description, will accomplish the same thing.
I see this has already been done: https://github.com/dlang/dmd/blob/master/src/dmd/README.md It's a bit out of date, files like typesem.d are missing.
I know, I wrote the equivalent for the backend. And no that does not accomplish the _same_ thing, not even remotely close.

The fact you only just found out about it shows that:
a) you have never used the README, and
b) know your way around well enough to not need it, which shows the implication that
c) you have no perspective from those who would have use for either a README or better structured files and know nothing about the relative benefits of either of them.

Yes, a README is strictly better than nothing. It does not substitute for having organised files. Neither does well organised files substitute for a lack of README.
May 26
next sibling parent reply Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Thursday, 27 May 2021 at 05:36:55 UTC, Nicholas Wilson wrote:
 The fact you only just found out about it shows that:
 a) you have never used the README, and
 b) know your way around well enough to not need it, which shows 
 the implication that
 c) you have no perspective from those who would have use for 
 either a README or better structured files and know nothing 
 about the relative benefits of either of them.
d) See no value in spending effort on designing an architecture.
e) Do not see the value of having others add new features.

So basically the project is following this structure: one person runs ahead adding features faster than they are completed. Helping out means walking around grepping for things to fix.

If that is it, then we might as well close the thread and conclude that dmd is a hobby project.

Which is fair enough. Just don't pretend it aspires to be more than that, because that takes reorganization and restructuring.
May 26
next sibling parent reply Mathias LANG <geod24 gmail.com> writes:
On Thursday, 27 May 2021 at 06:08:53 UTC, Ola Fosheim Grostad 
wrote:
 If that is it, then we might as well close the thread and 
 conclude that dmd is a hobby project.

 Which is fair enough. Just don't pretend it aspires to be more 
 than that, because that takes reorganization and restructuring.
Don't forget about the many contributors that have invested thousands of hours to understand and improve the code base :)

While I disagree with many of Walter's arguments here, a refactoring has a lower barrier of entry than a bugfix, and is more prone to be subjective. It isn't always subjective, as, just like a bug fix, a refactoring *can* come with a test case. For example, if someone writes a tool that uses DMD as a library, it will be a solid ground to push a change that could otherwise be perceived as subjective.

We had work done on trying to make DMD work as a library before, and all we ended up with was a massive amount of duplication. But when such refactorings are driven by a use case (e.g. VisualD's usage of DMD), they are easy to justify and accept.

Now to talk about what can be done to improve the DMD codebase, it's fairly obvious: ELIMINATE ALL CASTS. But not by replacing `cast(XXX)e` with `e.isXXX()`, but by actually using proper encapsulation.

What I mean is that instead of switching on types, like this:
```D
if (auto tf = t.isTypeFunction())
    (cast(FunctionDeclaration)t.sym).something();
else if (auto td = t.isTypeDelegate())
    (cast(FunctionDeclaration)(cast(TypeFunction)t).sym).something();
else
    // Something else
```
We should switch on capabilities. We currently "suffer" from abstraction: almost everything is a `Type`, `Dsymbol`, `Expression`, etc... but then when we do semantic we have to `cast` or `isXXX` it all over the place. Functions that are only supposed to accept `CallExp` or `BinExp` end up accepting an `Expression` because somewhere, a field that should be `CallExp` or `BinExp` is stored as an `Expression`, or because we don't have the proper return type on a function, etc...

There are simple areas where one can start, for example making all aggregates have the most specialized type possible: `TypeDelegate.nextOf()` should return a `TypeFunction`, not a `Type`. `FunctionDeclaration.type()` should be a property that gives you a `TypeFunction`, not a `Type`, etc...
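
To illustrate the "most specialized type possible" idea, here is a minimal sketch that relies on D's covariant return types for class overrides. The classes below are hypothetical miniatures that only borrow the names from the post; they are not DMD's real declarations, and the `name` field and `targetName` helper are made up for the example.

```d
// Hypothetical miniature of the type hierarchy; NOT DMD's real classes.
class Type
{
    // Loosely typed accessor, as in the situation the post complains about.
    Type nextOf() { return null; }
}

class TypeFunction : Type
{
    string name;                     // illustrative payload only
    this(string name) { this.name = name; }
}

class TypeDelegate : Type
{
    private TypeFunction fn;         // stored with its precise type

    this(TypeFunction fn) { this.fn = fn; }

    // Covariant override: callers holding a TypeDelegate get a
    // TypeFunction back and need neither cast() nor isXXX().
    override TypeFunction nextOf() { return fn; }
}

// A caller that only accepts what it can actually handle.
string targetName(TypeDelegate td)
{
    return td.nextOf().name;
}
```

A caller that only has a `Type` reference still sees the loosely typed `nextOf()`, so the gain is limited to code that already knows it holds a `TypeDelegate`.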
May 27
next sibling parent Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Thursday, 27 May 2021 at 07:14:25 UTC, Mathias LANG wrote:
 On Thursday, 27 May 2021 at 06:08:53 UTC, Ola Fosheim Grostad 
 wrote:
 If that is it, then we might as well close the thread and 
 conclude that dmd is a hobby project.

 Which is fair enough. Just don't pretend it aspires to be more 
 than that, because that takes reorganization and restructuring.
Don't forget about the many contributors that have invested thousands of hours to understand and improve the code base :)
Yes, let us not forget that they wasted many unproductive hours on trying to understand... Let us put a number on that in dollars...
 While I disagree with many of Walter's arguments here, a 
 refactoring has a lower barrier of entry than a bugfix, and is 
 more prone to be subjective.
I don't want to enter that territory; neither bugfixes nor micro-level refactoring has much of an impact on people wanting to experiment. To enable that we have to look at the macro level and establish stable, well-designed interfaces so that changes in compiler internals have small impact on experimental components. Well-designed interfaces can hook up to internals you want to change at a later stage, but now you at least do not get more dependencies tied to things you want to replace. So in essence, if there is a mess, the first step is not to clean up the mess (could be too costly), but to hide it so that people stop depending on it. My impression is that Walter is arguing that everything should be cleaned up first. That is not realistic.
 Now to talk about what can be done to improve the DMD codebase, 
 it's fairly obvious: ELIMINATE ALL CASTS. But not by replacing 
 `cast(XXX)e` with `e.isXXX()`, but by actually using proper 
 encapsulation.
Does not enable experimentation. Only a good macro-level architecture enables experimentation. The internals can to some extent be a mess, with little impact; it does not matter unless you want to change templating or type system features. Many interesting experiments can be done by combining parser mods, runtime mods and post-frontend mods.

Other interesting improvements can be done if one identifies areas in the compiler that can be isolated from the whole and where new features could be enabled. I suspect this is needed to get a solver that provides proper type unification, but I haven't looked at this...

Some stuff being messy is not the big picture issue. The big picture is to get clean points in the codebase where you can inject your own component, and to put those injection points where they have the most potential for enabling experimentation. An easy first step is to put a separation layer between frontend and backend.
May 27
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/27/2021 12:14 AM, Mathias LANG wrote:
 Now to talk about what can be done to improve the DMD codebase, it's fairly 
 obvious: ELIMINATE ALL CASTS. But not by replacing `cast(XXX)e` with 
 `e.isXXX()`, but by actually using proper encapsulation.
 
 What I mean is that instead of switching on types, like this:
 ```D
 if (auto tf = t.isTypeFunction())
      (cast(FunctionDeclaration)t.sym).something();
 else if (auto td = t.isTypeDelegate())
 (cast(FunctionDeclaration)(cast(TypeFunction)t).sym).something();
 else
      // Something else
 ```
The isXXX() functions also make for safe casting. Your example would be:

    if (auto tf = t.isTypeFunction())
        tf.sym.something();
    else if (auto td = t.isTypeDelegate())
        t.isTypeFunction().sym.isFunctionDeclaration().something();
 There are simple areas where one can start, for example making all aggregate 
 have the most specialized type possible: `TypeDelegate.nextOf()` should return a 
 `TypeFunction`, not a `Type`. `FunctionDeclaration.type()` should be a property 
 that gives you a `TypeFunction`, not a `Type`, etc...
FunctionDeclaration.type() can also give you a TypeError.
May 27
parent reply Basile B. <b2.temp gmx.com> writes:
On Thursday, 27 May 2021 at 08:41:49 UTC, Walter Bright wrote:
 On 5/27/2021 12:14 AM, Mathias LANG wrote:
 Now to talk about what can be done to improve the DMD 
 codebase, it's fairly obvious: ELIMINATE ALL CASTS. But not by 
 replacing `cast(XXX)e` with `e.isXXX()`, but by actually using 
 proper encapsulation.
 
 What I mean is that instead of switching on types, like this:
 ```D
 if (auto tf = t.isTypeFunction())
      (cast(FunctionDeclaration)t.sym).something();
 else if (auto td = t.isTypeDelegate())
 (cast(FunctionDeclaration)(cast(TypeFunction)t).sym).something();
 else
      // Something else
 ```
The isXXX() functions also make for safe casting.
And this is actually the only way to dynamically cast nodes, as the DMD AST is extern(C++)... But TBH I think that the whole isXXX family of functions should be free functions, not member functions. All these isXXX calls are virtual but they don't need to be (although often devirtualized).
May 27
next sibling parent Basile B. <b2.temp gmx.com> writes:
On Thursday, 27 May 2021 at 08:50:32 UTC, Basile B. wrote:
 On Thursday, 27 May 2021 at 08:41:49 UTC, Walter Bright wrote:
 On 5/27/2021 12:14 AM, Mathias LANG wrote:
 Now to talk about what can be done to improve the DMD 
 codebase, it's fairly obvious: ELIMINATE ALL CASTS. But not 
 by replacing `cast(XXX)e` with `e.isXXX()`, but by actually 
 using proper encapsulation.
 
 What I mean is that instead of switching on types, like this:
 ```D
 if (auto tf = t.isTypeFunction())
      (cast(FunctionDeclaration)t.sym).something();
 else if (auto td = t.isTypeDelegate())
 (cast(FunctionDeclaration)(cast(TypeFunction)t).sym).something();
 else
      // Something else
 ```
The isXXX() functions also make for safe casting.
And this is actually the only way to dynamically cast nodes, as the DMD AST is extern(C++)... But TBH I think that the whole isXXX family of functions should be free functions, not member functions. All these isXXX calls are virtual but they don't need to be (although often devirtualized).
Another advantage of module-scope isXXX functions is that the base Expression node would not need to know about all the derived classes. We would have a real astbase module with just Type, Statement, Dsymbol, Expression. The isXXX functions would be in the module that declares the XXX class.
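
A rough sketch of what a module-scope isXXX function could look like, called through UFCS so that call sites keep the `e.isCallExp()` spelling. Everything here is an assumption made for illustration: the `Kind` tag, the `argCount` field and the class bodies are invented, and the real DMD classes are extern(C++) rather than plain D classes.

```d
// Hypothetical tag and nodes; not DMD's real declarations.
enum Kind { call, binary }

class Expression
{
    Kind kind;                       // tag checked instead of RTTI
    this(Kind kind) { this.kind = kind; }
}

class CallExp : Expression
{
    int argCount;                    // illustrative payload only
    this(int argCount) { super(Kind.call); this.argCount = argCount; }
}

class BinExp : Expression
{
    this() { super(Kind.binary); }
}

// Free function that would live next to CallExp, not inside Expression.
// The tag check makes the reinterpreting cast safe.
CallExp isCallExp(Expression e)
{
    if (e is null || e.kind != Kind.call)
        return null;
    return cast(CallExp) cast(void*) e;
}
```

With this layout the base class no longer names `CallExp`, although in this sketch the `Kind` enum still has to list every node kind.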
May 27
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/27/2021 1:50 AM, Basile B. wrote:
 All these isXXX calls are virtual but they 
 don't need to be (although often devirtualized).
They're all `final`, meaning not virtual. The intent is for them to be inlined.
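
For readers unfamiliar with the distinction, a small sketch of a `final` member next to a virtual one, checked with compile-time traits. The `Node`, `Leaf` and `Branch` classes and the `Tag` enum are hypothetical and only mimic the shape of the isXXX members discussed here.

```d
// Hypothetical classes; they only mimic the shape of DMD's isXXX members.
enum Tag : ubyte { leaf, branch }

class Node
{
    Tag tag;
    this(Tag tag) { this.tag = tag; }

    // final: no vtable slot, so the call can be resolved statically.
    final Leaf isLeaf()
    {
        return tag == Tag.leaf ? cast(Leaf) cast(void*) this : null;
    }

    // Ordinary member: virtual by default in D classes.
    string describe() { return "node"; }
}

class Leaf : Node
{
    int value;
    this(int value) { super(Tag.leaf); this.value = value; }
    override string describe() { return "leaf"; }
}

class Branch : Node
{
    this() { super(Tag.branch); }
}

// Compile-time confirmation of which member is virtual.
static assert(__traits(isFinalFunction, Node.isLeaf));
static assert(!__traits(isVirtualMethod, Node.isLeaf));
static assert(__traits(isVirtualMethod, Node.describe));
```

Whether a given compiler actually inlines such a call depends on optimization settings; `final` only removes the virtual dispatch.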
May 27
prev sibling parent Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Thursday, 27 May 2021 at 06:08:53 UTC, Ola Fosheim Grostad 
wrote:
 Which is fair enough. Just don't pretend it aspires to be more 
 than that, because that takes reorganization and restructuring.
And let me add that reorganization and restructuring is not a sign of failure. It is a sign of professionalism. Well run projects have this built into the process so that these important activities are not put aside or put on hold.
May 27
prev sibling next sibling parent Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Thursday, 27 May 2021 at 05:36:55 UTC, Nicholas Wilson wrote:
 On Thursday, 27 May 2021 at 04:53:12 UTC, Walter Bright wrote:
 On 5/26/2021 8:06 PM, Walter Bright wrote:
 [...]
I see this has already been done: https://github.com/dlang/dmd/blob/master/src/dmd/README.md It's a bit out of date, files like typesem.d are missing.
I know, I wrote the equivalent for the backend. And no that does not accomplish the _same_ thing, not even remotely close. The fact you only just found out about it shows that: a) you have never used the README, and b) know your way around well enough to not need it, which shows the implication that c) you have no perspective from those who would have use for either a README or better structured files and know nothing about the relative benefits of either of them. Yes, a README is strictly better than nothing. It does not substitute for having organised files. Neither does well organised files substitute for a lack of README.
And adding to that: citing Walter's message, "It's a bit out of date, files like typesem.d are missing." shows the inherent problem of a separate README file. Organized files are self-documenting; READMEs are not.
May 26
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/27/21 1:36 AM, Nicholas Wilson wrote:
 Yes, a README is strictly better than nothing. It does not substitute 
 for having organised files. Neither does well organised files substitute 
 for a lack of README.
Razvan found in https://github.com/dlang/dmd/pull/12560 a number of imports of backend modules that shouldn't be there. I wonder if this convention could be enforced by using package-level protection in backend (and elsewhere) in such a way that would have made it impossible for those imports to work. That would be a good way forward because as it goes (and went in the past) the discussion remains sterile. Once there is a demonstrable improvement brought about by packages and (self-evidently) you can't get package-level protection without packages, the case will be much easier to make. The overarching point is that better modularization should predate, inform, and motivate division in packages, not follow it.
May 27
prev sibling parent Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Monday, 24 May 2021 at 09:39:42 UTC, Walter Bright wrote:
 On 5/24/2021 1:35 AM, Nicholas Wilson wrote:
 Did you read _literally nothing else_ that I wrote?
I read it, my response was to the entire posting.
Then I can only conclude you have absolutely no perspective for someone who has little or no experience with the DMD codebase. I have had multiple GSoC/SAoC students and I have spoken to perhaps two dozen people at various dconfs who I consider to be well versed in D, and all of them have complained that the lack of organisation of the files in DMD is a significant hindrance to contribution, to the point where many simply do not bother. Many of these people are regular committers to phobos and druntime.
May 24
prev sibling parent reply Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Monday, 24 May 2021 at 06:58:48 UTC, Walter Bright wrote:
 On 5/23/2021 10:15 PM, Nicholas Wilson wrote:
 This is a _completely_ orthogonal problem.
It's the same problem. D's support for modules and packages is literally designed around matching the hierarchy of the source files. Shuffling files around accomplishes nothing when every module imports every other module.
It will be a huge help if they are though. At minimum it will organize the things into packages that have one purpose, which will help in understanding the structure of dmd, and also make navigation and search of desired functionality easier, compared to one flat package. This can actually be the first step at unwinding all the mess with imports you're mentioning, since packages are not just folders, but logical organization of a set of modules that are somewhat related to the purpose the package has.
May 24
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/24/2021 2:55 AM, Alexandru Ermicioi wrote:
 It will be a huge help if they are though. At minimum it will organize the 
 things into packages that have one purpose, which will help in understanding the 
 structure of dmd, and also make navigation and search of desired functionality 
 easier, compared to one flat package.
It establishes a fake hierarchy that is *not* expressed in the code. Poor encapsulation is the problem, and this does nothing to solve it.
 This can actually be the first step at 
 unwinding all the mess with imports you're mentioning, since packages are not 
 just folders, but logical organization of a set of modules that are somewhat 
 related to the purpose the package has.
It's backwards. Fix the rust on the car, then repaint it.
May 24
parent Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Monday, 24 May 2021 at 10:15:53 UTC, Walter Bright wrote:
 On 5/24/2021 2:55 AM, Alexandru Ermicioi wrote:
 It will be a huge help if they are though. At minimum it will 
 organize the things into packages that have one purpose, which 
 will help in understanding the structure of dmd, and also make 
 navigation and search of desired functionality easier, 
 compared to one flat package.
It establishes a fake hierarchy that is *not* expressed in the code. Poor encapsulation is the problem, and this does nothing to solve it.
It _is_ in the code. FFS, the AST is literally a hierarchy!
May 24
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/23/21 10:56 PM, Walter Bright wrote:
 I recently had to talk someone out of having dmd/backend import files 
 from dmd/root.
One problem with that is code duplication. There are two types, OutBuffer in the frontend and Outbuffer in the backend, that are 95% identical, yet duplicated. Recent improvements (two distinct ones) will need to be duplicated to the other, which is clearly not a good way to go. How to address this problem?

I think all of us looking to improve dmd's architecture would be well served by reading this book: https://amazon.com/gp/product/0135974445/ Really closely, cover to cover. A lot of the principles in that book are either applied with good results (sadly not as often as one would hope), or not, with the expected poor outcome, in dmd's codebase. For example, this:
 A good rule of thumb is:
 
     *** Never import a file from an uplevel directory ***
 
 Import sideways and down, never up. 
is an approximate formulation of a subset of Dependency Inversion Principle: https://en.wikipedia.org/wiki/Dependency_inversion_principle
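
A minimal single-file sketch of the dependency inversion idea, under the assumption that the "lower" layer owns the abstraction it needs and the "upper" layer implements it. `ByteSink`, `emitPrologue` and `MemorySink` are invented names, not anything from dmd.

```d
// "Lower" layer: declares the interface it needs and imports nothing
// from the layers above it. All names here are hypothetical.
interface ByteSink
{
    void put(const(ubyte)[] bytes);
}

// Lower-layer routine written purely against the abstraction.
size_t emitPrologue(ByteSink sink)
{
    static immutable ubyte[] prologue = [0x55, 0x48, 0x89, 0xE5];
    sink.put(prologue);
    return prologue.length;
}

// "Upper" layer: depends downward by implementing the interface.
final class MemorySink : ByteSink
{
    ubyte[] data;

    void put(const(ubyte)[] bytes)
    {
        data ~= bytes;               // copies the bytes into our own buffer
    }
}
```

In a real tree the interface and `emitPrologue` would sit in the lower package and `MemorySink` in the upper one, so every import points sideways or down.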
May 23
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/23/2021 10:55 PM, Andrei Alexandrescu wrote:
 One problem with that is code duplication.
Sure. But in the outbuffer case, the duplication stems from backend being used in multiple projects. It's hard to have perfection, and if getting to perfection means driving off a cliff (!) it's better to just live with a bit of imperfection here and there. I don't like having two outbuffers, but the cure is worse for that particular case. There's even another implementation of outbuffer in Phobos (because I thought outbuffer was generally very useful): https://dlang.org/phobos/std_outbuffer.html But here we run into our rule that dmd shouldn't rely on Phobos. Compromise is inevitable. Outbuffer isn't a spike we need to impale ourselves on. There are plenty of other encapsulation problems that could be improved, like target.d.
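
For reference, a short sketch of the Phobos variant mentioned above, using `std.outbuffer.OutBuffer` with its `write`, `writef` and `toString` members. The `renderDecl` helper and its output format are made up for the example.

```d
import std.outbuffer : OutBuffer;

// Hypothetical helper: builds a small textual description in an OutBuffer.
string renderDecl(string name, int arity)
{
    auto buf = new OutBuffer();
    buf.write("fn ");                // append raw characters
    buf.write(name);
    buf.writef("/%d", arity);        // formatted append
    return buf.toString();
}
```

This only shows the Phobos API; it says nothing about how closely the frontend and backend implementations match it.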
May 24
next sibling parent reply zjh <fqbqrr 163.com> writes:
There should be a base package on DMD/Druntime and Phobos.
Split large file into small files in one directory.
One big file<=>one directory.
We need big changes.
May 24
parent reply zjh <fqbqrr 163.com> writes:
We need `big changes`.
We need `todolist`(order by important).
We need to split big files into directories.
Small refactoring is useless.
`Big changes` are necessary.
We separate the `stable part` from the `unstable part` of the big 
file, and divide it into `small files`.
According to dependence, change from `the most dependent`.
Interfaces or function names need not change.
It's just that the `organization` has to be changed.
Nobody reads `thousands of lines` functions.
No one reads `>100kb` coding files because they are too large.
We just `split up` large files, not modify the function 
implementation.
Because modifying the function implementation is `most likely` to 
make mistakes.
May 24
next sibling parent reply user1234 <user1234 12.de> writes:
On Monday, 24 May 2021 at 12:38:59 UTC, zjh wrote:
 We need `big changes`.
 We need `todolist`(order by important).
 We need to split big files into directories.
 Small refactoring is useless.
 `Big changes` are necessary.
 We separate the `stable part` from the `unstable part` of the 
 big file, and divide it into `small files`.
 According to dependence, change from `the most dependent`.
 Interfaces or function names need not change.
 It's just that the `organization` has to be changed.
 Nobody reads `thousands of lines` functions.
 No one reads `>100kb` coding files because they are too large.
 We just `split up` large files, not modify the function 
 implementation.
 Because modifying the function implementation is `most likely` 
 to make mistakes.
100 kb is let's say 2500 slocs (or rather 1500 from the D-Scanner pov), that's not too crazy. Many DMD source files are big because they contain a visitor. Visitors can't be split into several files. Often you are actually only interested in a single method of a visitor, so the overall size of a source file does not matter. Eventually what could be done for the biggest methods of visitors is to extract parts of the content to several **non-nested** free functions, so that no more low level implementation details, like control loops, are visible and instead you just see do_this; do_that; with just a few, necessarily unavoidable, flow statements. The problem is that extracting and splitting the content would be tedious because of the decade of more or less well organized patchwork added to fix the bugs. PS: backticks are for inline code; for emphasis, surround with pairs of stars or pairs of underscores.
May 24
next sibling parent Iain Buclaw <ibuclaw gdcproject.org> writes:
On Monday, 24 May 2021 at 19:42:00 UTC, user1234 wrote:
 On Monday, 24 May 2021 at 12:38:59 UTC, zjh wrote:
 [...]
100 kb is let's say 2500 slocs (or rather 1500 from the D-Scanner pov), that's not too crazy. Many DMD source files are big because they contain a visitor. Visitors can't be split into several files. Often you are actually only interested in a single method of a visitor, so the overall size of a source file does not matter. Eventually what could be done for the biggest methods of visitors is to extract parts of the content to several **non-nested** free functions, so that no more low level implementation details, like control loops, are visible and instead you just see do_this; do_that; with just a few, necessarily unavoidable, flow statements.
Actually, the visitors have been slowly getting converted into nested functions and a switch table.
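
The post does not show what that conversion looks like, so the following is only a guess at the general shape: one function that dispatches on a tag with a `final switch` and keeps the per-case logic in nested functions. The `Op` enum, `Node` struct and `eval` function are invented for illustration and are not taken from DMD.

```d
// Hypothetical node kinds and payload; not DMD's AST.
enum Op { add, mul, neg }

struct Node
{
    Op op;
    int lhs;
    int rhs;
}

int eval(Node n)
{
    // Nested functions share `n` through the enclosing scope,
    // replacing the separate visit() overloads of a visitor class.
    int visitAdd() { return n.lhs + n.rhs; }
    int visitMul() { return n.lhs * n.rhs; }
    int visitNeg() { return -n.lhs; }

    // final switch: the compiler rejects the code if an Op member
    // is added later without a matching case.
    final switch (n.op)
    {
        case Op.add: return visitAdd();
        case Op.mul: return visitMul();
        case Op.neg: return visitNeg();
    }
}
```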
May 24
prev sibling parent Basile B. <b2.temp gmx.com> writes:
On Monday, 24 May 2021 at 19:42:00 UTC, user1234 wrote:
 Eventually what could be done for the biggest methods of 
 visitors is to extract parts of the content to several 
 **non-nested** free functions, so that no more low level 
 implementation details, like control loops, are visible and 
 instead you just see do_this; do_that; with just a few, 
 necessarily unavoidable, flow statements.
I've had the opportunity to do such a refactoring yesterday in styx. It makes things very clear, for example the expression semantic for binary assign exps:
```d
override void visit(BinAssExpressionAstNode node)
{
    processBinaryOperands(node);
    if (tryRewritingToOperatorOverload(node, [node.left, node.right]))
        return;
    if (tryToSetLengthExp(node))
        return;
    if (checkIfInvalidEnumSetOp(node))
        return;
    if (checkPtrArithmeticOp(node))
        return;
    ensureAssignedParamIsLvalue(node);
    tryOneWayAssImplicitConv(node);
    checkIfAssignable(node.left);
}
```
or for binary exps that are not assign and not cmp:
```d
override void visit(BinaryExpressionAstNode node)
{
    processBinaryOperands(node);
    if (tryRewritingToOperatorOverload(node, [node.left, node.right]))
        return;
    if (checkIfInvalidEnumSetOp(node))
        return;
    if (checkPtrArithmeticOp(node))
        return;
    tryTwoWaysBinaryImplicitConv(node);
}
```
It was like 200 lines before.
Jun 06
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/24/21 8:38 AM, zjh wrote:
 Small refactoring is useless.
 `Big changes` are necessary.
That evokes the couple who had problems in their relationship, so they decided to solve them by getting married.
May 24
next sibling parent Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Monday, 24 May 2021 at 20:45:11 UTC, Andrei Alexandrescu wrote:
 On 5/24/21 8:38 AM, zjh wrote:
 Small refactoring is useless.
 `Big changes` are necessary.
That evokes the couple who had problems in their relationship, so they decided to solve them by getting married.
If that meant that they encapsulated their problems and had a united front towards the rest of the world, then that is the right approach for dmd. Arranged marriages are underappreciated...
May 24
prev sibling parent Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Monday, 24 May 2021 at 20:45:11 UTC, Andrei Alexandrescu wrote:
 On 5/24/21 8:38 AM, zjh wrote:
 Small refactoring is useless.
 `Big changes` are necessary.
That evokes the couple who had problems in their relationship, so they decided to solve them by getting married.
You watched "Better Call Saul", didn't you? :-)
May 25
prev sibling next sibling parent Iain Buclaw <ibuclaw gdcproject.org> writes:
On Monday, 24 May 2021 at 09:53:42 UTC, Walter Bright wrote:
 But here we run into our rule that dmd shouldn't rely on 
 Phobos. Compromise is inevitable. Outbuffer isn't a spike we 
 need to impale ourselves on. There are plenty of other 
 encapsulation problems that could be improved, like target.d.
*ahem* https://github.com/dlang/dmd/pull/12574 It's a start at least.
May 24
prev sibling parent reply Johan Engelen <j j.nl> writes:
On Monday, 24 May 2021 at 09:53:42 UTC, Walter Bright wrote:
 On 5/23/2021 10:55 PM, Andrei Alexandrescu wrote:
 One problem with that is code duplication.
Sure. But in the outbuffer case, the duplication stems from backend being used in multiple projects. It's hard to have perfection, and if getting to perfection means driving off a cliff (!) it's better to just live with a bit of imperfection here and there. I don't like having two outbuffers, but the cure is worse for that particular case. There's even another implementation of outbuffer in Phobos (because I thought outbuffer was generally very useful): https://dlang.org/phobos/std_outbuffer.html But here we run into our rule that dmd shouldn't rely on Phobos. Compromise is inevitable. Outbuffer isn't a spike we need to impale ourselves on. There are plenty of other encapsulation problems that could be improved, like target.d.
Outbuffer is a case of a data structure that is useful throughout the compiler. So it is put in a package of the compiler that is OK to be imported from other packages (and should avoid importing other packages). I think the `dmd.root` package is exactly like such a package (compare with `ADT` in LLVM). From that standpoint, I don't see why the `dmd.backend` package cannot import `dmd.root`. If `dmd.backend` is to be used in different projects, then those should also use `dmd.root` and that's where the dependency chain stops.

Better: if it is in Phobos, great, use that! If you need a certain data structure you know where to look: Phobos or `dmd.root`. Is it not in there? Don't create a new structure elsewhere, add it to `dmd.root` and import it.

In LDC, we use the C++ stdlib and we use Phobos, because why not? We are programming in D after all, and it is the standard library that is available in all the D compiler versions required for bootstrapping. We do take care not to rely on _latest_ Phobos, but on Phobos from the oldest bootstrapping D compiler version supported up to the latest version. Same for the C++ stdlib (C++20 is not ok, but C++14 is much encouraged). For example: LDC uses MD5 hashing for its machine codegen cache. `import std.digest.md; auto md5hash = md5Of(slice);` Done. LLVM does the same, e.g. the project removed its own unique_ptr implementation (`OwningPtr`), and now uses `std::unique_ptr`.

My standpoint on the original topic of "make it easier to experiment with the compiler": I disagree with making the code more stable. If anything, we should be refactoring much more aggressively, changing function names etc. Nicholas mentions that it is a pain to keep up with LLVM, where even function names are renamed from "SortSomeName" to "SortSomeNames" (made-up example), because plural is correct. The pain would be _much_ more if all unfixed small incorrectness/clumsiness/etc. accumulates over time and you end up with a convoluted codebase...

The main stumbling block is already mentioned: ownership. If you want contributors, you have to give up some ownership and be willing to make compromises between your own and the new contributors' viewpoints. The lack of willingness to give up ownership is what keeps me out (and I suspect, indirectly, others too). The frontend source code is not nice, but I'm not drawn to fix it at all (even if paid for) because I am not ashamed by it as I would be if I would have some shared 'ownership' of it. -Johan
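
Expanding the one-liner from the post into something compilable: `md5Of` and `toHexString` are real Phobos functions reachable through `std.digest.md`, while the `cacheKey` wrapper and the idea of using the hex string as a key are assumptions made for the example, not a description of LDC's cache.

```d
import std.digest.md;                // md5Of, plus toHexString via std.digest

// Hypothetical helper: derive a printable key from a blob of bytes.
string cacheKey(const(ubyte)[] blob)
{
    ubyte[16] hash = md5Of(blob);    // MD5 digests are 16 bytes
    auto hex = toHexString(hash);    // fixed-size char[32] on the stack
    return hex[].idup;               // copy to the heap before returning
}
```

The explicit `hex[].idup` matters: `toHexString` on a static array returns a stack-allocated buffer, so returning a slice of it without copying would dangle.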
May 24
next sibling parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Monday, 24 May 2021 at 10:47:16 UTC, Johan Engelen wrote:
 My standpoint on the original topic of "make it easier to 
 experiment with the compiler": I disagree with making the code 
 more stable. If anything, we should be refactoring much more 
 aggressively, changing function names etc.
Thank you for bringing us back on topic. Yes, or at least have a map of what is considered stable and well encapsulated and what is considered unstable and likely to change. I don't believe this is a matter for git rebasing tooling/understanding. I just don't want to build _directly_ on top of something that looks like it is likely to change (from a software engineering point of view). I consider every hour spent on rebasing, dealing with regressions etc to be losses, or more importantly "not fun". I only want to do "not fun" things if I can learn something from them. D has to rely on hobbyists, so getting "not fun"/"no learning potential" out of the way is important.
 others too). The frontend source code is not nice, but I'm not 
 drawn to fix it at all (even if paid for) because I am not 
 ashamed by it as I would be if I would have some shared 
 'ownership' of it.
That is a bit harsh, of course all code bases that have evolved over a long time are not nice, parts of LDC too.

Anyway, my main wish is just to be able to inject my own IR between the frontend and backend. My feeling right now is that to do that I have to choose LDC and then heavily modify it. I sense that in the end I basically will end up with my own backend, something I don't want to maintain...

Think of it like LEGOs. The front end is a green brick and the back end a red brick. I want to insert a white brick between them. I don't want to modify the bricks more than "cleaning" the studs.

Another analogy: if the frontend is an engine, my IR is the transmission and the backend is the wheels, then I don't mind that the current engine is oily and greasy, I leave that to other mechanics to clean up. Same with the wheels. I just want to be an expert on the transmission and evolve it from a manual transmission into a nice automatic transmission. Right now the engine is coupled directly to the wheels... which basically means being forced to drive in the same gear all the time. I am less interested in getting my fingers greasy and am happy to leave that to others as long as I can focus on polishing the chrome on my transmission line...

(I believe many things could be done with an intermediary high level IR, such as ARC, stackless coroutines, heap optimizations... LLVM is too low level. AST is too cumbersome.)
May 24
parent reply sighoya <sighoya gmail.com> writes:
On Monday, 24 May 2021 at 14:37:45 UTC, Ola Fosheim Grøstad wrote:
I sense that in the end I basically will end up with my own 
backend, something I don't want to maintain...
I think you will end up with your own compiler :)
May 24
parent Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Monday, 24 May 2021 at 15:16:34 UTC, sighoya wrote:
 On Monday, 24 May 2021 at 14:37:45 UTC, Ola Fosheim Grøstad 
 wrote:
I sense that in the end I basically will end up with my own 
backend, something I don't want to maintain...
I think you will end up with your own compiler :)
I think we need to learn from Apple and Microsoft. They are doing well, not only because of resources, but because they let people be specialists on certain aspects of the compiler. D has people who have specialized in the GC and LLVM, but it isn't a deliberate strategy... Yet. Building a racing car is not a one-man project...
May 24
prev sibling next sibling parent Bruce Carneal <bcarneal gmail.com> writes:
On Monday, 24 May 2021 at 10:47:16 UTC, Johan Engelen wrote:

[...]
 My standpoint on the original topic of "make it easier to 
 experiment with the compiler": I disagree with making the code 
 more stable. If anything, we should be refactoring much more 
 aggressively, changing function names etc.
Yes. It's easier to understand shallow trees with modest leaves than arbitrary graphs with 1000+ LOC "leaves". Getting there will take some work. Fortunately, it looks like much of that work can be done "bottom up", i.e. incrementally.

When simplifying code, readability is a commonly applied metric: how long does it take for an intelligent but "outside" developer to understand the code? Another useful metric is the degree of dynamic dependence: could this code run in parallel? If not, why not? Examining the ability to run in parallel can also be done "bottom up", and is at least as valuable for simplification/correctness as it is for parallel speedup potential.

That said, a taskification that followed our, sometimes extreme, code expansion contours could yield speedups that coarser approaches to multi-threading do not. It could also bring vibe style sanity in place of manually managed asynchrony where the dependencies are carried in your head. When looking to foster task independence, building around dependency graphs which are immutable/committed in the interior and expanding/mutating/synchronizing at the frontier is one way to go. (__traits compiles is interesting in this context...) The SDC people will have other ideas/experience to share if taskification becomes a thing.

Finally, my thanks again to the current front end crew and the LDC/dcompute crew. The tool chain may not be perfect but, boy, it's way better than falling back to C++/CUDA.
May 24
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/24/21 6:47 AM, Johan Engelen wrote:
 On Monday, 24 May 2021 at 09:53:42 UTC, Walter Bright wrote:
 On 5/23/2021 10:55 PM, Andrei Alexandrescu wrote:
 One problem with that is code duplication.
Sure. But in the outbuffer case, the duplication stems from backend being used in multiple projects. It's hard to have perfection, and if getting to perfection means driving off a cliff (!) it's better to just live with a bit of imperfection here and there. I don't like having two outbuffers, but the cure is worse for that particular case. There's even another implementation of outbuffer in Phobos (because I thought outbuffer was generally very useful): https://dlang.org/phobos/std_outbuffer.html But here we run into our rule that dmd shouldn't rely on Phobos. Compromise is inevitable. Outbuffer isn't a spike we need to impale ourselves on. There are plenty of other encapsulation problems that could be improved, like target.d.
Outbuffer is a case of a data structure that is useful throughout the compiler. So it is put in a package of the compiler that is OK to be imported from other packages (and should avoid importing other packages). I think the `dmd.root` package is exactly like such a package  (compare with `ADT` in LLVM). From that standpoint, I don't see why the `dmd.backend` package cannot import `dmd.root`. If `dmd.backend` is to be used in different projects, then those should also use `dmd.root` and that's where the dependency chain stops.
Thanks. I set out to write pretty much exactly that. To add to it: A. High-level modules should not depend on low-level modules. Both should depend on abstractions (e.g., interfaces). B. Abstractions should not depend on details. Details (concrete implementations) should depend on abstractions. (Source: https://en.wikipedia.org/wiki/Dependency_inversion_principle) Applied here: A. The back-end should not depend on the front end. Both should depend on abstractions (e.g., interfaces) such as OutBuffer. B. OutBuffer should not depend on memory-mapped minutia. Memory-mapped work should be done to serve OutBuffer.
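A minimal D sketch of the dependency direction described above, under stated assumptions: `Sink`, `ArraySink`, `emitDeclaration` and `emitLabel` are hypothetical names invented for illustration and are not dmd's `OutBuffer` API. Both "front end" and "back end" code see only the abstraction, and the storage detail sits behind it.

```d
// Hypothetical sketch: none of these names are dmd declarations.

/// The abstraction that both layers depend on.
interface Sink
{
    void put(const(char)[] text);
}

/// A concrete detail: buffers in a GC array. A memory-mapped variant
/// could implement the same interface without either caller changing.
final class ArraySink : Sink
{
    private char[] data;

    void put(const(char)[] text) { data ~= text; }

    const(char)[] contents() const { return data; }
}

/// "Front end" code: knows only about Sink.
void emitDeclaration(Sink sink, string name)
{
    sink.put("int ");
    sink.put(name);
    sink.put(";\n");
}

/// "Back end" code: also knows only about Sink, not about the front end.
void emitLabel(Sink sink, string label)
{
    sink.put(label);
    sink.put(":\n");
}

unittest
{
    auto sink = new ArraySink;
    emitDeclaration(sink, "x");
    emitLabel(sink, "L1");
    assert(sink.contents == "int x;\nL1:\n");
}
```

Compiling with `-unittest -main` runs the check. Whether an interface, a template parameter, or a shared `dmd.root` module is the right vehicle for the abstraction is a separate design question that the sketch does not settle.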
 Better: if it is in Phobos, great, use that!
 If you need a certain data structure you know where to look: Phobos or 
 `dmd.root`. Is it not in there? Don't create a new structure elsewhere, 
 add it to `dmd.root` and import it.
I am sympathetic to the cause of not adding a large number of moving pieces to the compiler codebase. But yes, the point stands. Hopefully good versioning could help a lot with all that.
 In LDC, we use the C++ stdlib and we use Phobos, because why not? We are 
 programming in D after all, and it is the standard library that is 
 available in all bootstrapping compiler required versions. We do take 
 care not to rely on _latest_ Phobos, but on Phobos from the oldest 
 bootstrapping D compiler version supported up to the latest version. 
 Same for the C++ stdlib (C++20 is not ok, but C++14 is much encouraged).
 
 For example: LDC uses MD5 hashing for its machine codegen cache. `import 
 std.digest.md; auto md5hash = md5Of(slice);`  Done.
 LLVM does the same, e.g. the project removed its own unique_ptr 
 implementation (`OwningPtr`), and now uses `std::unique_ptr`.
Cool stuff!
 My standpoint on the original topic of "make it easier to experiment 
 with the compiler": I disagree with making the code more stable. If 
 anything, we should be refactoring much more aggressively, changing 
 function names etc.
Doesn't aggressive refactoring require massive unittests?
May 24
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/24/2021 3:47 AM, Johan Engelen wrote:
 The main stumbling block is already mentioned: ownership. If you want 
 contributors, you have to give up some ownership and be willing to make 
 compromises between your own and the new contributors' viewpoints. The lack of 
 willingness to give up ownership is what keeps me out (and I suspect, 
 indirectly, others too). The frontend source code is not nice, but I'm not drawn 
 to fix it at all (even if paid for) because I am not ashamed by it as I would be 
 if I would have some shared 'ownership' of it.
Good points, but part of the reason the front end code is what it is is because of many contributors with diverse viewpoints on what good code should look like. How should we reconcile that?
May 24
prev sibling next sibling parent reply poffer <poffer poffer.net> writes:
On Monday, 24 May 2021 at 02:56:14 UTC, Walter Bright wrote:
 On 5/23/2021 7:25 PM, Nicholas Wilson wrote:
 Directories.
every other module" that leaves one with nowhere to start. We currently have:

   dmd
   dmd/root
   dmd/backend

I regularly fend off attempts to have dmd/root import files from dmd, and dmd/backend import files from dmd. I recently had to talk someone out of having dmd/backend import files from dmd/root. In other words, a failure of encapsulation.

Let's look at one example, picked more or less because I've looked at it recently, dmd/target.d. The reason for its existence is to abstract target information. Its imports are:

   import dmd.argtypes_x86;
   import dmd.argtypes_sysv_x64;
   import core.stdc.string : strlen;
   import dmd.cond;
   import dmd.cppmangle;
   import dmd.cppmanglewin;
   import dmd.dclass;
   import dmd.declaration;
   import dmd.dscope;
   import dmd.dstruct;
   import dmd.dsymbol;
   import dmd.expression;
   import dmd.func;
   import dmd.globals;
   import dmd.id;
   import dmd.identifier;
   import dmd.mtype;
   import dmd.statement;
   import dmd.typesem;
   import dmd.tokens : TOK;
   import dmd.root.ctfloat;
   import dmd.root.outbuffer;
   import dmd.root.string : toDString;

If I want to understand the code, I have to understand half of the rest of the compiler. On a more abstract level, why on earth would a target abstraction need to know about AST nodes? At least half of these imports shouldn't be here, and if they are, the code needs to be redesigned.

Recently I needed some target information in the ImportC lexer, and it would have been so easy to just import dmd.target. But then that drags along all the imports that I've really tried to avoid importing into the lexer. Iain came up with a clever solution to use a template parameter.

Note that Phobos suffers terribly from this disease (everything ultimately imports everything else), which makes it very hard to understand and debug.

Fixing this is not easy, it requires a lot of hard thinking about what a module *really* needs to do. But each success at eliminating an import makes it more understandable.

Creating a false hierarchy (an implied relationship that is instantly defeated by the imports) of files won't fix it. A good rule of thumb is:

    *** Never import a file from an uplevel directory ***

Import sideways and down, never up.
A good enhancement to the language would be adding some sort of module declaration that just states the admitted import packages or modules. I know that could be done by an external tool, but I feel that this one is a common problem.
May 24
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/24/2021 1:40 AM, poffer wrote:
 A good enhancement to the language would be adding some sort of module 
 declaration that just states the admitted import packages or modules. I know 
 that could be done by an external tool, but I feel that this one is a common 
 problem.
Importing unused modules is a problem, but a minor one. The larger problem is needing those modules.
May 24
parent reply poffer <poffer poffer.net> writes:
On Monday, 24 May 2021 at 09:41:12 UTC, Walter Bright wrote:
 On 5/24/2021 1:40 AM, poffer wrote:
 A good enhancement to the language would be adding some sort 
 of module declaration that just states the admitted import 
 packages or modules. I know that could be done by an external 
 tool, but I feel that this one is a common problem.
Importing unused modules is a problem, but a minor one. The larger problem is needing those modules.
No. What I mean is a declaration that, for example, allows only imports from dmd in dmd/backend, or declares that imports from dmd/root are forbidden. Aren't you the guy pushing for declarations over conventions? Conventions do not scale.
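For comparison, the external-tool variant mentioned earlier could look roughly like the D sketch below. It is only an illustration under stated assumptions: `forbiddenImports` is a hypothetical helper, the allow-list is passed in by hand rather than declared in the module, and the scan handles only plain `import a.b;` and `import a.b : c;` lines (no `static`/`public` imports, no comma-separated lists).

```d
import std.algorithm.searching : startsWith;
import std.string : lineSplitter, strip;

/// Returns the imported module names in `source` that do not start with
/// any of the allowed prefixes. Hypothetical helper, not part of dmd.
string[] forbiddenImports(string source, string[] allowedPrefixes)
{
    string[] bad;
    foreach (line; source.lineSplitter)
    {
        auto text = line.strip;
        if (!text.startsWith("import "))
            continue;

        // Take the module name up to ';', ':' or a space.
        auto name = text["import ".length .. $].strip;
        size_t end = 0;
        while (end < name.length && name[end] != ';'
               && name[end] != ':' && name[end] != ' ')
            ++end;
        name = name[0 .. end];

        bool allowed = false;
        foreach (prefix; allowedPrefixes)
            if (name.startsWith(prefix))
                allowed = true;
        if (!allowed)
            bad ~= name;
    }
    return bad;
}

unittest
{
    enum src = "module dmd.backend.cg;\n"
             ~ "import dmd.backend.cc;\n"
             ~ "import dmd.root.outbuffer;\n"
             ~ "import core.stdc.string : strlen;\n";
    // Allow-list for a backend module in this example: backend and druntime.
    assert(forbiddenImports(src, ["dmd.backend", "core."])
           == ["dmd.root.outbuffer"]);
}
```

A declaration in the language itself would move the allow-list next to the `module` statement and let the compiler enforce it, which is the part a text scan like this cannot provide.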
May 24
parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/24/2021 3:18 AM, poffer wrote:
 On Monday, 24 May 2021 at 09:41:12 UTC, Walter Bright wrote:
 On 5/24/2021 1:40 AM, poffer wrote:
 A good enhancement to the language would be adding some sort of module 
 declaration that just states the admitted import packages or modules. I know 
 that could be done by an external tool, but I feel that this one is a common 
 problem.
Importing unused modules is a problem, but a minor one. The larger problem is needing those modules.
No. What I mean is a declaration that, for example, allows only imports from dmd in dmd/backend, or declares that imports from dmd/root are forbidden.
Ok, now I understand what you meant.
 Aren't you the guy pushing for declarations over conventions?
Snark isn't necessary.
 Conventions do not scale.
Please propose a DIP for your idea.
May 24
prev sibling next sibling parent reply Iain Buclaw <ibuclaw gdcproject.org> writes:
On Monday, 24 May 2021 at 02:56:14 UTC, Walter Bright wrote:
   import dmd.argtypes_x86;
   import dmd.argtypes_sysv_x64;
   import core.stdc.string : strlen;
   import dmd.cond;
   import dmd.cppmangle;
   import dmd.cppmanglewin;
   import dmd.dclass;
   import dmd.declaration;
   import dmd.dscope;
   import dmd.dstruct;
   import dmd.dsymbol;
   import dmd.expression;
   import dmd.func;
   import dmd.globals;
   import dmd.id;
   import dmd.identifier;
   import dmd.mtype;
   import dmd.statement;
   import dmd.typesem;
   import dmd.tokens : TOK;
   import dmd.root.ctfloat;
   import dmd.root.outbuffer;
   import dmd.root.string : toDString;

 If I want to understand the code, I have to understand half of 
 the rest of the compiler. On a more abstract level, why on 
 earth would a target abstraction need to know about AST nodes? 
 At least half of these imports shouldn't be here, and if they 
 are, the code needs to be redesigned.
To be fair, most of this is imported because a function needs the definition of one or more symbols. This can be made better by either: 1. Making these selective imports, or... 2. Moving type definitions of AST nodes into modules that _only_ contain definitions.
May 24
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/24/2021 2:46 AM, Iain Buclaw wrote:
 To be fair, most of this is imported because a function needs the definition of 
 one or more symbols.  This can be made better by either:
 
 1. Making these selective imports, or...
That doesn't really help, the dependencies are still there.
 2. Moving type definitions of AST nodes into modules that _only_ contain 
 definitions.
It is not critical that we fix target.d. It's just that it would be better if its API was not AST nodes, but just values. Let the caller construct the AST node from the information provided. Like what we did for the C parser. I was happy to have it not indirectly import everything in dmd when all it needed was a couple values. I'm not saying any of this is easy.
May 24
parent reply Iain Buclaw <ibuclaw gdcproject.org> writes:
On Monday, 24 May 2021 at 10:21:44 UTC, Walter Bright wrote:
 That doesn't really help, the dependencies are still there.
It makes it clear what they are for, which makes this statement:
 If I want to understand the code, I have to understand half of 
 the rest of the compiler.
obsolete.
 It is not critical that we fix target.d. It's just that it 
 would be better if its API was not AST nodes, but just values. 
 Let the caller construct the AST node from the information 
 provided.
The majority of the API are values, but it still needs to be fed AST information in order to make informative decisions. For instance, how else would we be able to infer `isReturnOnStack` without a `TypeFunction`? Even GDC needs the completed `TypeFunction`, as I generate a `tree` on-the-fly and pass that to GCC's back-end API to get said information.
 Like what we did for the C parser. I was happy to have it not 
 indirectly import everything in dmd when all it needed was a 
 couple values.

 I'm not saying any of this is easy.
Target's first goal of removing all `global.params.isXXX` fields was never going to be easy either. :-)
May 24
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/24/2021 3:51 AM, Iain Buclaw wrote:
 For instance, how else would we be able to infer `isReturnOnStack` without a 
 `TypeFunction`?
Challenge accepted! Let's see. The only things the TypeFunction is needed for are:

1. the return type
2. the function linkage
3. if the function returns a ref

Pass those in instead, and the need for TypeFunction goes away.

https://github.com/dlang/dmd/blob/master/src/dmd/target.d#L762

(1) can be further broken down into "is it a POD", etc. Breaking all this info out of a TypeFunction takes some code, but this "decomposition" can be done by a wrapper function in another module. The end result is target.d can be completely independent of the compiler's internal AST structures.

But wait! There's more! Notice how isReturnOnStack depends on several random global variables like os, is64bit, and isPOSIX. They can be passed in as arguments, too (or passed in as a const ref to a struct containing those as members). isReturnOnStack() then becomes a pure function! (and safe, nogc, nothrow, all that good stuff)

Those initialize() functions go away, too.

The beauty now becomes that we can (at last!) easily and correctly write unittests for it. target.d now becomes INDEPENDENT of the rest of the compiler.

How sweet it will be!
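A rough D sketch of that decomposition, under loud assumptions: the type names (`TargetInfo`, `ReturnTypeInfo`, `Linkage`, `OS`) are hypothetical, and the decision rule in the body is a made-up simplification for illustration, not the ABI logic in dmd's target.d. What it is meant to show is the shape: plain values in, a `pure nothrow @nogc @safe` function, and a unittest that needs no AST.

```d
// Hypothetical names throughout; this is not dmd's actual target.d API.
enum Linkage { d, c, cpp }

enum OS { linux, windows, osx, freebsd }

/// Plain-value summary of a return type; no AST node required.
struct ReturnTypeInfo
{
    size_t size;    // size in bytes
    bool isPOD;     // plain old data
    bool isStruct;  // struct or static array
}

/// Target settings passed in explicitly instead of read from globals.
struct TargetInfo
{
    OS os;
    bool is64bit;
}

/// Simplified rule for illustration only; real ABIs have many more cases.
bool isReturnOnStack(in TargetInfo target, in ReturnTypeInfo ret,
                     Linkage linkage, bool returnsRef) pure nothrow @nogc @safe
{
    if (returnsRef)
        return false;            // a ref comes back as a pointer
    if (!ret.isStruct)
        return false;            // scalars go in registers in this sketch
    if (!ret.isPOD && linkage == Linkage.cpp)
        return true;             // non-POD C++ aggregates go through memory
    const size_t limit = target.is64bit ? 16 : 8;
    return ret.size > limit;     // large aggregates go through memory
}

unittest
{
    const t = TargetInfo(OS.linux, true);
    assert(!isReturnOnStack(t, ReturnTypeInfo(4, true, false), Linkage.d, false));
    assert( isReturnOnStack(t, ReturnTypeInfo(64, true, true), Linkage.d, false));
    assert(!isReturnOnStack(t, ReturnTypeInfo(64, true, true), Linkage.d, true));
}
```

The wrapper that fills `ReturnTypeInfo` from a `TypeFunction` would live in a different module, so only that wrapper needs to import the AST.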
May 24
next sibling parent Iain Buclaw <ibuclaw gdcproject.org> writes:
On Monday, 24 May 2021 at 22:18:35 UTC, Walter Bright wrote:
 On 5/24/2021 3:51 AM, Iain Buclaw wrote:
 For instance, how else would we be able to infer 
 `isReturnOnStack` without a `TypeFunction`?
Challenge accepted! Let's see. The only things the TypeFunction is needed for are:

1. the return type
2. the function linkage
3. if the function returns a ref

Pass those in instead, and the need for TypeFunction goes away.
I still see a Type though. :-) On my side, in pseudo-code this would become:
```
tree type = build_gcc_type (return_type);
if (isref)
  type = build_reference_type (type);
return targetm.calls.return_in_memory (type);
```
Or alternatively, I could just abandon all accuracy and go with:
```
return (return_type.ty == Tstruct || return_type.ty == Tsarray) && !isref;
```
Because I know that this function doesn't affect the code generator, though users won't be able to reliably do introspection.
 Notice how isReturnOnStack depends on several random global 
 variables like os, is64bit, and isPOSIX. They can be passed in 
 as arguments, too (or passed in as a const ref to a struct 
 containing those as members).
They have been moved to the internal state of Target, so no longer random globals. Information such as the target OS or CPU should not be floating around the front-end. It should all be constrained to either the dmd.target interface or dmd.backend, leaving the front-end to only handle matters relating to language semantics.
 isReturnOnStack() then becomes a pure function! (and safe, 
 nogc, nothrow, all that good stuff)

 Those initialize() functions go away, too.

 The beauty now becomes that we can (at last!) easily and 
 correctly write unittests for it. target.d now becomes 
 INDEPENDENT of the rest of the compiler.

 How sweet it will be!
I think target.d could instead benefit from breaking out into per-backend modules though, such as target_linux.d, target_freebsd.d, target_x86.d, target_x86_64.d, ... to separate out concerns of the OS with concerns of the CPU. It would be something completely dmd-specific though, as I don't use/re-use any part of what's present in dmd's source tree around this module.
May 24
prev sibling parent zjh <fqbqrr 163.com> writes:
We should add a `favor function` to the forum post.
May 24
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/24/21 5:46 AM, Iain Buclaw wrote:
 On Monday, 24 May 2021 at 02:56:14 UTC, Walter Bright wrote:
   import dmd.argtypes_x86;
   import dmd.argtypes_sysv_x64;
   import core.stdc.string : strlen;
   import dmd.cond;
   import dmd.cppmangle;
   import dmd.cppmanglewin;
   import dmd.dclass;
   import dmd.declaration;
   import dmd.dscope;
   import dmd.dstruct;
   import dmd.dsymbol;
   import dmd.expression;
   import dmd.func;
   import dmd.globals;
   import dmd.id;
   import dmd.identifier;
   import dmd.mtype;
   import dmd.statement;
   import dmd.typesem;
   import dmd.tokens : TOK;
   import dmd.root.ctfloat;
   import dmd.root.outbuffer;
   import dmd.root.string : toDString;

 If I want to understand the code, I have to understand half of the 
 rest of the compiler. On a more abstract level, why on earth would a 
 target abstraction need to know about AST nodes? At least half of 
 these imports shouldn't be here, and if they are, the code needs to be 
 redesigned.
To be fair, most of this is imported because a function needs the definition of one or more symbols.  This can be made better by either: 1. Making these selective imports, or... 2. Moving type definitions of AST nodes into modules that _only_ contain definitions.
Yes, and these are good incremental steps that help a lot, are low cost, and inform larger refactorings. There should be active work on pushing imports down to where they're used. My dream: top-level imports will become an antipattern in large D code.
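As a small illustration of pushing imports down to where they are used, D allows selective imports at function scope. The function names below are made up for the example; the imported symbols (`core.stdc.string.strlen`, `std.conv.to`) are standard.

```d
// Sketch: each import is visible only inside the function that needs it,
// so the module has no top-level imports at all.
size_t cLength(const(char)* s)
{
    import core.stdc.string : strlen;   // selective and function-local
    return strlen(s);
}

string describe(int value)
{
    import std.conv : to;               // not visible inside cLength
    return "value=" ~ value.to!string;
}

unittest
{
    assert(cLength("abc".ptr) == 3);    // string literals are 0-terminated
    assert(describe(7) == "value=7");
}
```

Reading either function now shows exactly which outside symbols it depends on. As noted earlier in the thread, this documents the dependency rather than removing it.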
May 24
prev sibling next sibling parent reply Dukc <ajieskola gmail.com> writes:
On Monday, 24 May 2021 at 02:56:14 UTC, Walter Bright wrote:
 A good rule of thumb is:

     *** Never import a file from an uplevel directory ***

 Import sideways and down, never up.
You may want to reconsider what you just said. Do you really insist that `std.stdio` should copy-paste the CLib headers instead of importing `core.stdc.stdio`?
May 24
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/24/2021 1:35 PM, Dukc wrote:
 On Monday, 24 May 2021 at 02:56:14 UTC, Walter Bright wrote:
 A good rule of thumb is:

     *** Never import a file from an uplevel directory ***

 Import sideways and down, never up.
You may want to reconsider what you just said. Do you really insist that `std.stdio` should copy-paste the CLib headers instead of importing `core.stdc.stdio`?
I knew someone would bring that up. :-) It's a good question. `core` is a separate library in its own, independent hierarchy, it is not in the `std` hierarchy. It is not "up sideways and down". So it's good. Now, if std.stdio imported core.stdc.stdio, and core.stdc.stdio imported std.stdio, then you've got a really bad design.
May 24
parent Dukc <ajieskola gmail.com> writes:
On Monday, 24 May 2021 at 22:21:46 UTC, Walter Bright wrote:
 `core` is a separate library in its own, independent hierarchy, 
 it is not in the `std` hierarchy. It is not "up sideways and 
 down". So it's good.
Shouldn't the same reasoning apply to `import`ing `dmd.root` from `dmd.backend`? If I understood right, `dmd.root` is designed to act like an external utility library. It should be no problem to `import` it, as long as `dmd.root` does not try to import the rest of DMD.
 Now, if std.stdio imported core.stdc.stdio, and core.stdc.stdio 
 imported std.stdio, then you've got a really bad design.
It sounds like your real issue is circular imports, not parent package imports. That sounds more reasonable to me.
May 24
prev sibling parent reply Dibyendu Majumdar <mobile majumdar.org.uk> writes:
On Monday, 24 May 2021 at 02:56:14 UTC, Walter Bright wrote:

 every other module" that leaves one with nowhere to start.
 I regularly fend off attempts to have dmd/root import files 
 from dmd, and dmd/backend import files from dmd. I recently had 
 to talk someone out of having dmd/backend import files from 
 dmd/root.
Wow - that's pretty fundamental. How does such code get in? I assume you own the DMD code - all changes should be vetted and approved by you? Btw, one lesson from Linux maintenance is that the owners should spend all their time reviewing code - not writing code!
May 25
parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/25/2021 3:16 AM, Dibyendu Majumdar wrote:
 Wow - that's pretty fundamental. How does such code get in? I assume you own the 
 DMD code - all changes should be vetted and approved by you?
There are many people who have pull privileges.
May 27
prev sibling next sibling parent Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Monday, 24 May 2021 at 02:25:33 UTC, Nicholas Wilson wrote:
 On Sunday, 23 May 2021 at 06:12:30 UTC, Ola Fosheim Grøstad 
 wrote:
 The number one challenge I see is keeping track of DMD as it 
 is released with new improvements. Basically reapplying the 
 changes made to the experimental branch to the main branch 
 (aka "rebasing"?).
(that is the correct terminology). I suspect this is more of a problem for people that are less familiar with git, which might well also include people wanting to play around with DMD, e.g. GSoC/SAoC students. I know this was the case for me while developing dcompute with the added difficulty of tracking LLVM on top of LDC (which was kept in sync with DMD).
 I suspect that kills many efforts, meaning people create a 
 fork, start making changes, but then a new version of DMD is 
 released and the fork is left to dry in the sun as rebasing is 
 not fun. And well, a hobby that isn't fun, is not a good 
 hobby. :-D
The solution to this is better git skills, not so much better compiler skills/knowledge of DMD, although a merge conflict in a critical piece of code is always a PiTA. We now have slack/discord for people to ask these kinds of questions, which I'm sure will get answered if they are trying to do something interesting or fix an annoying problem.
I think I should have used the term "boring" rather than "challenging". I doubt that git skills would solve it as I think it is more related to what a hobby is to people who are older and have a very long spare time todo-list. Any "unproductive" and "unfun" chore will go to the bottom of the todo-list. My I-really-ought-todo-list is so long that it could fill up the rest of my life... So it is basically easier to just stay on an outdated dmd-branch for a couple of years, rather than keeping track of it... which is not a good strategy.

Think of it like this: I have 2-5 hours a week for completely unnecessary, but fun things like hacking a new IR + optimization in between DMD and LLVM. So, what should I do: do my taxes, rebase my fork, watch Eurovision with family? Rebasing is down there with taxes, except I have to do the taxes eventually, just not this Saturday... (Ok, so we watch Eurovision then just to find out how bad it is? :-)

I think it would not be too difficult to get to a situation where you have well-defined entry points, hooks and layers that make it more of a plugin experience. Examples of potential plug-and-play:

1. Add new experimental syntax: The parser is quite close. It would not take a lot of work to encapsulate a manager of (file-extension, Parser) pairs that has no overhead (compile time). Ok, so if you want to extend the language as an experiment, just duplicate the parser, modify it and plug it in. This is a low-hanging fruit.

2. Add new semantics: add a new file with functions with custom intrinsics that are somehow added to the runtime, use your custom parser to lower your custom syntax to these custom runtime functions. Inject yourself between the front-end and backend (assuming a high level IR), pick up the custom intrinsics and do the analysis/transforms you want.

3. Add new high level optimization, like ARC: same as 2, except you only add new passes in a new file and possibly some new fields to the high level IR. Then edit a config file that makes the pass available and executed at the right time (with respect to other passes).

So, the basic idea is that instead of _modifying_ the compiler, you add new files to it and bring them into the compiler by hooks, configuration files etc. Then you can also much more easily merge and combine contributions from many different extension authors and easily replace one extension with a better one.
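To make the first idea concrete, here is a minimal D sketch of a manager of (file-extension, parser) pairs. Everything in it is hypothetical: `Ast`, `ParserFn`, `ParserRegistry`, the `.dx` extension and the two stand-in parse functions are invented for illustration, and dmd has no such hook today.

```d
// Hypothetical sketch; none of these names exist in dmd.

struct Ast
{
    string dialect;     // which parser produced the tree
    string source;      // stand-in for real AST nodes
}

alias ParserFn = Ast function(string source);

struct ParserRegistry
{
    private ParserFn[string] byExtension;

    /// Associate a parser with a file extension such as ".d".
    void add(string extension, ParserFn parser)
    {
        byExtension[extension] = parser;
    }

    /// Pick the parser from the file name; null if none is registered.
    ParserFn lookup(string fileName)
    {
        // Find the last '.' to isolate the extension.
        size_t dot = fileName.length;
        while (dot > 0 && fileName[dot - 1] != '.')
            --dot;
        if (dot == 0)
            return null;                          // no extension at all
        auto found = fileName[dot - 1 .. $] in byExtension;
        return found ? *found : null;
    }
}

// Stand-ins for the stock parser and a duplicated, modified copy.
Ast parseD(string source)            { return Ast("D", source); }
Ast parseExperimental(string source) { return Ast("experimental", source); }

unittest
{
    ParserRegistry registry;
    registry.add(".d", &parseD);
    registry.add(".dx", &parseExperimental);      // ".dx" is made up

    auto parse = registry.lookup("demo.dx");
    assert(parse !is null);
    assert(parse("int x;").dialect == "experimental");
    assert(registry.lookup("notes.txt") is null);
}
```

An experiment would then consist of one new file plus one `add` call, which is the "add files instead of modifying the compiler" property being argued for. A pass registry for the third idea could follow the same pattern with an ordering key.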
 Urgh. Dealing with 10000 line files and 1000 line functions is 
 such a drain on trying to get stuff done (looking at you 
 expressionsem.d). However this needs to be combined with 
 directories/packages or it will not improve the situation.
Yes, but one can create virtual directories though. E.g. in some editors you can group files from different directories so it looks like they are in one directory. You can do something similar with "ln -s", but it isn't optimal...
 Which items are feasible in the next 6 months?
Directories.
Sounds like a good start. I still think the high level IR is the most pressing one, as not having that abstraction makes adding new experimental semantics too time consuming for hobbyists. I had the idea that I could do ARC by adding intrinsics to LLVM, but Apple engineers strongly advised against it and strongly suggested working on a high level IR instead. ARC is something well suited for a hobbyist as you can implement it in a gradual manner if you have a high level IR (one tweak here, one tweak there).

Anyway, I think more experimentation is needed. Say, if 1 out of 10 experiments made it into the main dmd, then there could be more interesting options that would make dmd stand out in the crowd. IMHO the key challenge is to make experimentation fun for people who have limited time (which happens as you get older). Imagine if D could get some of the people that were active with D 10-15 years ago, but currently have very limited time, to create their own experiments? I am sure that many of those have grown into capable programmers since then, so that could be something to think about. It has to be a fun experience throughout for people to spend those 3-4 spare hours a week on compiler hacking.
May 24
prev sibling parent reply Iain Buclaw <ibuclaw gdcproject.org> writes:
On Monday, 24 May 2021 at 02:25:33 UTC, Nicholas Wilson wrote:
 On Sunday, 23 May 2021 at 06:12:30 UTC, Ola Fosheim Grøstad
 5. Use directories.
Yes!!! sooo much yes! see above.
You can't complain unless you've had a go at making a change to Ada.
```
[gcc] (master) $ ls gcc/c | wc -l
19
[gcc] (master) $ ls gcc/cp | wc -l
87
[gcc] (master) $ ls gcc/fortran | wc -l
99
[gcc] (master) $ ls gcc/d/dmd | wc -l
114
[gcc] (master) $ ls gcc/ada | wc -l
565
```
I do tend to agree though that we should try to respect Dunbar's number when it comes to these things. But the individual file count does not map to reality.
May 24
parent Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Monday, 24 May 2021 at 09:41:08 UTC, Iain Buclaw wrote:
 On Monday, 24 May 2021 at 02:25:33 UTC, Nicholas Wilson wrote:
 On Sunday, 23 May 2021 at 06:12:30 UTC, Ola Fosheim Grøstad
 5. Use directories.
Yes!!! sooo much yes! see above.
You can't complain unless you've had a go at making a change to Ada.
```
[gcc] (master) $ ls gcc/c | wc -l
19
[gcc] (master) $ ls gcc/cp | wc -l
87
[gcc] (master) $ ls gcc/fortran | wc -l
99
[gcc] (master) $ ls gcc/d/dmd | wc -l
114
[gcc] (master) $ ls gcc/ada | wc -l
565
```
Eee gads!
 I do tend to agree though that we should try to respect 
 Dunbar's number when it comes to these things.  But the 
 individual file count does not map to reality.
It makes it difficult to navigate, especially so when you are unfamiliar with the code base.
May 24
prev sibling parent reply Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Sunday, 23 May 2021 at 06:12:30 UTC, Ola Fosheim Grøstad wrote:
 7. Tutorials.
8. Proper module naming, not abbreviations. Abbreviations need to be remembered, and that is additional mental workload for a new volunteer.

9. Proper variable naming, not abbreviations. I really tried to understand some code, but got discouraged once I met all those abbreviated variable names. I literally had to stuff all my memory with what those abbreviations meant instead of trying to keep the thread of the logic that the code is implementing.

10. Split up humongous methods and objects. They are rude to new volunteers and discourage any code improvement.

11. Perhaps some tutorial on how to orient oneself in all the dmd internals, with a nice abstract class diagram explaining the key elements of dmd objects and how they interact with each other. This would allow at least some kind of overview of what does what in dmd.

I really was interested in doing some dmd bug fixes, but 8, 9 and 10 make the code take too much time and willpower to just understand. It was and is a huge barrier for me to try and fix/improve dmd.

P.S. And no, e, exp, aa and other kinds of abbreviations, except for loop indexes, are not always obvious, and do take mental power and memory while trying to understand existing code. They are not simple for new volunteers to dmd.

Best regards, Alexandru.
May 24
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/24/2021 2:44 AM, Alexandru Ermicioi wrote:
 They are not simple for new volunteers to dmd.
You're right, they are not. They're optimized for the people who spend thousands of hours working on it. This inevitably happens with every profession, every discipline, and every project. A jargon specific to it grows up around it, for the convenience of the people who work on it every day. If the jargon is consistent and reasonably logical, it can be a great aid to understanding once one gets familiar with it. Unfortunately, I have failed at my original design goal of making DMD a simple compiler. Reshuffling files around and renaming things will not help. What will help is better encapsulation - unfortunately, that is hard to do. There are some reasonably well-encapsulated parts. The lexer, the parser, and the files in the root package. To understand the compiler, I'd start there.
May 24
next sibling parent reply Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Monday, 24 May 2021 at 10:34:35 UTC, Walter Bright wrote:
 On 5/24/2021 2:44 AM, Alexandru Ermicioi wrote:
 They are not simple for new volunteers to dmd.
You're right, they are not. They're optimized for the people who spend thousands of hours working on it. This inevitably happens with every profession, every discipline, and every project. A jargon specific to it grows up around it, for the convenience of the people who work on it every day. If the jargon is consistent and reasonably logical, it can be a great aid to understanding once one gets familiar with it.
Well, there is no dictionary for those abbreviations, and it is hard to decipher them when looking at kilometer-long code. This is my experience with dmd code:

1. I stumble on a compiler bug.
2. File a bug report.
3. No one fixes it in a couple of days, and I think perhaps I can fix this bug, since it's not complicated and should be a couple of lines.
4. I download dmd and try to compile it somehow. The dub compilation either froze or failed, but I somehow manage by using the older build system dmd had.
5. Then finally I can start changing code?
6. No, first find which module and class is responsible for that code, in an ocean of modules named from an ocean of abbreviations, or with misleading names.
7. Oh well, after wasting an hour or two of the 3 to 4 you have, you find it.
8. Then you look into the kilometer-long function. You seem to find the piece of code that might be the cause of the bug, and try to understand it better.
9. You read said code, try keeping in mind the entire code flow you've read up to this point, and suddenly there is an 'aa'.
10. You try to figure out what 'aa' means, but fail to do so, therefore you need to look at its declaration to know the type and figure it out from its type.
11. You find the type of the variable, and rejoice at deciphering it as 'associative array', yay.
12. Okay, let's go back to the line with 'aa'.
13. First find said line, if for some reason your IDE didn't retain it.
14. Once there, you continue reading, but wait, what was before the line with 'aa'?
15. Damn, I forgot. Sigh, I have to read all the code again.

That is my experience with all abbreviations in dmd, which are like an ocean. It is OK to have a couple of well-defined and documented abbreviations, not an ocean of them without any documentation. It is not my job to fix dmd; I wanted to do something when I had a couple of hours to invest. It is not rewarding when those couple of hours are wasted on deciphering abbreviations, without even understanding the flow of the code itself. Please limit use of abbreviations to minimum, and those that are used, should be documented.
 There are some reasonably well-encapsulated parts. The lexer, 
 the parser, and the files in the root package. To understand 
 the compiler, I'd start there.
Yet there is no official guidance on where to start. Also, please note that not all volunteers prefer reading source code, and invest hours at understanding the architecture and inner workings, starting from lexer or parser, some of them just want to fix a small bug, and be done with it. It is extremely hard to do that now. Best regards, Alexandru.
May 24
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/24/2021 4:24 AM, Alexandru Ermicioi wrote:
 Please limit use of abbreviations to minimum, and those that are used, should
 be documented.
grep -w aa *.d

yields:

argtypes_aarch64.d: * https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst.
clone.d: * int[S] aa; // Currently AA key uses bitwise comparison
dinterpret.d: * aa[i][j] op= newval;
dinterpret.d: * aa = [i:[j:T.init]];
dinterpret.d: * aa[j] op= newval;
dinterpret.d: // Create a CTFE pointer &aa[index]
dinterpret.d:private Expression interpret_aaApply(UnionExp* pue, InterState* istate, Expression aa, Expression deleg)
dinterpret.d: aa = interpret(aa, istate);
dinterpret.d: if (exceptionOrCantInterpret(aa))
dinterpret.d: return aa;
dinterpret.d: if (aa.op != TOK.assocArrayLiteral)
dinterpret.d: AssocArrayLiteralExp ae = cast(AssocArrayLiteralExp)aa;
dmangle.d: private extern(D) bool backrefImpl(T)(ref AssocArray!(T, size_t) aa, T key)
dmangle.d: auto p = aa.getLvalue(key);
dsymbol.d: AliasAssign aa = new AliasAssign(loc, ident,
dsymbol.d: return aa;
e2ir.d: elem *aa = toElem(ie.e2, irs);
e2ir.d: // aaInX(aa, keyti, key);
e2ir.d: elem *ep = el_params(key, keyti, aa, null);
e2ir.d: // *aaGetY(aa, aati, valuesize, &key);
e2ir.d: // *aaGetRvalueX(aa, keyti, valuesize, &key);
expression.d: * aa[k1][k2][k3] op= val;
expression.d: * auto ref __aatmp = aa;
expressionsem.d: * aa.remove(arg) into delete aa[arg]
expressionsem.d: ce.error("expected key as argument to `aa.remove()`");
expressionsem.d: * aa[key] = e2;
expressionsem.d: * ref __aatmp = aa;
sideeffect.d: * S[int] aa;
sideeffect.d: * aa[1] = 0;
sideeffect.d: * 1 in aa ? aa[1].value = 0 : (aa[1] = 0, aa[1].this(0)).value;
sideeffect.d: * int value = (aa[1] = 0); // value = aa[1].value

which is not perfect, but very helpful. It usually means "Associative Array", but sometimes "AliasAssign". grep is very, very handy at this sort of thing. I use it constantly.
 Yet there is no official guidance on where to start. Also, please note that
 not all volunteers prefer reading source code, and invest hours at
 understanding the architecture and inner workings, starting from lexer or
 parser, some of them just want to fix a small bug, and be done with it. It is
 extremely hard to do that now.
Start here:

https://github.com/dlang/dmd/blob/master/src/dmd/README.md

Each source file has handy links at the start. For example, dsymbol.d:

https://github.com/dlang/dmd/blob/master/src/dmd/dsymbol.d

has a link to its documentation generated from Ddoc:

https://dlang.org/phobos/dmd_dsymbol.html
May 26
next sibling parent reply zjh <fqbqrr 163.com> writes:
On Thursday, 27 May 2021 at 05:08:41 UTC, Walter Bright wrote:
 On 5/24/2021 4:24 AM, Alexandru Ermicioi wrote:
 Please limit use of abbreviations to minimum, and those that 
 are used, should be documented.
grep -w aa *.d
We have good articles and good posts, but no systematic arrangement of them. So you say it again and again, but others still don't know. We need to open another section to organize the good posts/information. We need good organization of man/info/code/....
May 26
next sibling parent Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Thursday, 27 May 2021 at 05:22:26 UTC, zjh wrote:
 On Thursday, 27 May 2021 at 05:08:41 UTC, Walter Bright wrote:
 On 5/24/2021 4:24 AM, Alexandru Ermicioi wrote:
 Please limit use of abbreviations to minimum, and those that 
 are used, should be documented.
grep -w aa *.d
We have good articles and good posts, but no systematic arrangement of them. So you say it again and again, but others still don't know. We need to open another section to organize the good posts/information. We need good organization of man/info/code/....
No need, just do this:
May 26
prev sibling parent zjh <fqbqrr 163.com> writes:
On Thursday, 27 May 2021 at 05:22:26 UTC, zjh wrote:
 On Thursday,
Stability is very important, and a `good architecture/organization` lets problems be fixed as quickly as possible. D cannot rely on only `1/2` people, or just keep adding features and fixing bugs. What D needs is good organization of `code/people/information`. D needs people to participate. Good organization is very important: `good organization` gets twice the result with half the effort. When things are well organized, people are naturally willing to participate, and errors can be fixed quickly. Changing the organization doesn't mean changing the implementation, so it should not introduce many errors.
May 26
prev sibling parent Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Thursday, 27 May 2021 at 05:08:41 UTC, Walter Bright wrote:
 grep is very,  very handy at this sort of thing. I use it 
 constantly.
We are stuck in a 70s mainframe.
May 26
prev sibling parent reply 12345swordy <alexanderheistermann gmail.com> writes:
On Monday, 24 May 2021 at 10:34:35 UTC, Walter Bright wrote:
 On 5/24/2021 2:44 AM, Alexandru Ermicioi wrote:
 They are not simple for new volunteers to dmd.
You're right, they are not. They're optimized for the people who spend thousands of hours working on it. This inevitably happens with every profession, every discipline, and every project. A jargon specific to it grows up around it, for the convenience of the people who work on it every day. If the jargon is consistent and reasonably logical, it can be a great aid to understanding once one gets familiar with it. Unfortunately, I have failed at my original design goal of making DMD a simple compiler. Reshuffling files around and renaming things will not help. What will help is better encapsulation - unfortunately, that is hard to do. There are some reasonably well-encapsulated parts. The lexer, the parser, and the files in the root package. To understand the compiler, I'd start there.
I seriously question the "optimized for people who spend thousands of hours working on it" line, as a very intelligent person posted on Slack asking what a function does, because there are no comments for said function. -Alex
May 24
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/24/21 9:53 AM, 12345swordy wrote:
 On Monday, 24 May 2021 at 10:34:35 UTC, Walter Bright wrote:
 On 5/24/2021 2:44 AM, Alexandru Ermicioi wrote:
 They are not simple for new volunteers to dmd.
You're right, they are not. They're optimized for the people who spend thousands of hours working on it. This inevitably happens with every profession, every discipline, and every project. A jargon specific to it grows up around it, for the convenience of the people who work on it every day. If the jargon is consistent and reasonably logical, it can be a great aid to understanding once one gets familiar with it. Unfortunately, I have failed at my original design goal of making DMD a simple compiler. Reshuffling files around and renaming things will not help. What will help is better encapsulation - unfortunately, that is hard to do. There are some reasonably well-encapsulated parts. The lexer, the parser, and the files in the root package. To understand the compiler, I'd start there.
I seriously question the "optimized for people who spend thousands of hours working on it" line, as a very intelligent person posted on Slack asking what a function does, because there are no comments for said function.
Adding documentation would be another good investment with terrific dividends. Again it minds my boggle that people talk about big changes (and no doubt would be willing to try them) but can't be bothered to make small changes with disproportionately good impact.
May 24
parent reply Max Haughton <maxhaton gmail.com> writes:
On Monday, 24 May 2021 at 20:47:39 UTC, Andrei Alexandrescu wrote:
 On 5/24/21 9:53 AM, 12345swordy wrote:
 On Monday, 24 May 2021 at 10:34:35 UTC, Walter Bright wrote:
 [...]
I seriously question the "optimized for people who spend thousands of hours working on it" line, as a very intelligent person posted on Slack asking what a function does, because there are no comments for said function.
Adding documentation would be another good investment with terrific dividends. Again it minds my boggle that people talk about big changes (and no doubt would be willing to try them) but can't be bothered to make small changes with disproportionately good impact.
Where do you start? i.e. there's always work to be done but unless you enforce change from the top you're blocking a river at the mouth (to play devil's advocate)
May 24
next sibling parent Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Monday, 24 May 2021 at 21:01:05 UTC, Max Haughton wrote:
 On Monday, 24 May 2021 at 20:47:39 UTC, Andrei Alexandrescu 
 wrote:
 On 5/24/21 9:53 AM, 12345swordy wrote:
 On Monday, 24 May 2021 at 10:34:35 UTC, Walter Bright wrote:
 [...]
I seriously question the "optimized for people who spend thousands of hours working on it" line, as a very intelligent person posted on Slack asking what a function does, because there are no comments for said function.
Adding documentation would be another good investment with terrific dividends. Again it minds my boggle that people talk about big changes (and no doubt would be willing to try them) but can't be bothered to make small changes with disproportionately good impact.
Where do you start? i.e. there's always work to be done but unless you enforce change from the top you're blocking a river at the mouth (to play devil's advocate)
The reason I put documentation low on my list is that it has a high maintenance cost if you are going to redesign. Also, it has not been a hindrance for experimentation for me. Probably a hindrance for fixing bugs, but that is not the topic. In general, let us try to focus on macro issues; there is no need for dmd to be perfect in order to better support experimentation. Partitioning and interfacing are more important than statement and block level issues. Micro issues such as imports and the number of OutBuffer implementations are low impact issues, those are more aesthetical in nature...
May 24
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/24/2021 2:01 PM, Max Haughton wrote:
 Where do you start?
At the first function you notice that has poor/missing/wrong documentation. Like this one I just did: https://github.com/dlang/dmd/pull/12570
May 24
next sibling parent reply Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 25 May 2021 at 00:03:56 UTC, Walter Bright wrote:
 On 5/24/2021 2:01 PM, Max Haughton wrote:
 Where do you start?
At the first function you notice that has poor/missing/wrong documentation. Like this one I just did: https://github.com/dlang/dmd/pull/12570
This is not helpful. Too much commenting makes the code even harder to read and drowns out important comments. This corporate illness (which assumes that programmers are idiots) is why editors now ship with hide-all-comments functionality... Good code with good naming needs only few comments and those are on an _algorithmic_ level. Nobody that has read an introductory book on compilers need a comment explaining a function that is looking up a symbol from a symboltable. If that is a problem, improve the name, use a longer name.
May 24
next sibling parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 25 May 2021 at 04:21:08 UTC, Ola Fosheim Grostad 
wrote:
 Nobody that has read an introductory book on compilers need a 
 comment explaining a function that is looking up a symbol from 
 a symboltable. If that is a problem, improve the name, use a 
 longer name.
An improvement that would have made any comment on "lookup(symbol)" superfluous is to have a signature that indicates whether a returned pointer can be null or not.
May 25
prev sibling next sibling parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 25 May 2021 at 04:21:08 UTC, Ola Fosheim Grostad 
wrote:
 Nobody that has read an introductory book on compilers need a 
 comment explaining a function that is looking up a symbol from 
 a symboltable. If that is a problem, improve the name, use a 
 longer name.
Of course, it is nice to know that that lookup(symbol) can return null, but that should be visible in the signature. That can be covered by some "nullable/notnullable" or "optional" wrapper. Just follow some established coding guidelines for signatures. I guess my point is: there is a big difference between documentation for a public API where the user is not supposed to read the code and internal relations in an application.
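The point about making a possible null result visible in the signature could be sketched roughly as follows. This is only a minimal illustration, assuming Phobos' `std.typecons.Nullable` as the "optional" wrapper; `Symbol`, `SymbolTable` and `lookUpSymbol` are hypothetical stand-ins and not dmd's actual types or functions.

```d
import std.typecons : Nullable, nullable;

// Hypothetical stand-in for a compiler symbol; not dmd's real Dsymbol.
struct Symbol
{
    string name;
}

// Hypothetical symbol table, used only for illustration.
struct SymbolTable
{
    private Symbol[string] symbolsByName;

    void insert(Symbol symbol)
    {
        symbolsByName[symbol.name] = symbol;
    }

    // The return type itself says that the lookup may find nothing,
    // so no comment is needed to warn the caller about a missing result.
    Nullable!Symbol lookUpSymbol(string symbolName)
    {
        if (auto found = symbolName in symbolsByName)
            return nullable(*found);
        return Nullable!Symbol.init;
    }
}

void main()
{
    SymbolTable table;
    table.insert(Symbol("foo"));

    auto hit = table.lookUpSymbol("foo");
    assert(!hit.isNull && hit.get.name == "foo");

    auto miss = table.lookUpSymbol("bar");
    assert(miss.isNull);
}
```

With a signature like this the caller has to check `isNull` before using the result, which is the information a "may return null" comment would otherwise carry.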
May 25
prev sibling parent reply sighoya <sighoya gmail.com> writes:
On Tuesday, 25 May 2021 at 04:21:08 UTC, Ola Fosheim Grostad 
wrote:

 This is not helpful. Too much commenting makes the code even 
 harder to read and drowns out important comments.  This 
 corporate illness (which assumes that programmers are idiots) 
 is why editors now ship with hide-all-comments functionality... 
 Good code with good naming needs only few comments and those 
 are on an _algorithmic_ level.
You can't encode the full semantics into one function name with parameter names without overblowing these names. Though I concur with you on better naming, or at least no abbreviations. I even find the code to be more structured with comment blocks; in my eyes they help to visualize the code structure better.
 Nobody that has read an introductory book on compilers need a 
 comment explaining a function that is looking up a symbol from 
 a symboltable. If that is a problem, improve the name, use a 
 longer name.
+1 for `Symbol lookUpSymbol(string symbolName)` However, small comments inside the function would also be beneficial. A good example of comments is the ast module of nim: https://github.com/nim-lang/Nim/blob/devel/compiler/ast.nim
May 25
next sibling parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 25 May 2021 at 08:32:46 UTC, sighoya wrote:
 You can't encode the full semantics into one function name with 
 parameter names without overblowing these names.
We can assume that the reader has read a book on compiler design and is familiar with the terminology and the most common algorithms. Provide a reference to wikipedia if unsure if the reader is with you... Functions that are only called from a few places can have long descriptive names, that is not a negative.
 However, small comments inside the function would also be 
 beneficial.
Yes, obviously. But adding 6 lines of comments for every trivial function is not helpful. It is a useless policy. It is a policy for the sake of having a policy. If time is invested in documenting things that should be changed... then change becomes less likely: "look, the documentation is over there, change not needed". Anyway, documentation is the wrong solution to structural issues. It does not enable anything. It is kinda like saying a city does not need road signs because there is a good map available. Or that a city that is a labyrinth of one-way streets is easy to navigate with the right kind of map. Driving while looking at a map is not a good experience. And when things change, can you then trust the map? *shrug*
May 25
next sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Tuesday, 25 May 2021 at 09:05:26 UTC, Ola Fosheim Grøstad 
wrote:
 On Tuesday, 25 May 2021 at 08:32:46 UTC, sighoya wrote:
 [...]
We can assume that the reader has read a book on compiler design and is familiar with the terminology and the most common algorithms. Provide a reference to wikipedia if unsure if the reader is with you... Functions that are only called from a few places can have long descriptive names, that is not a negative.
 [...]
Yes, obviously. But adding 6 lines of comments for every trivial function is not helpful. It is a useless policy. It is a policy for the sake of having a policy. If time is invested in documenting things that should be changed... then change becomes less likely: "look, the documentation is over there, change not needed". Anyway, documentation is the wrong solution to structural issues. It does not enable anything. It is kinda like saying a city does not need road signs because there is a good map available. Or that a city that is a labyrinth of one-way streets is easy to navigate with the right kind of map. Driving while looking at a map is not a good experience. And when things change, can you then trust the map? *shrug*
I don't know...I mean it's a start... I feel like this forum has ADHD sometimes. A week ago it was all up in arms about ImportC, now it's focused on this, two weeks from now this will be forgotten and on to something else...
May 25
next sibling parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 25 May 2021 at 11:22:24 UTC, jmh530 wrote:
 I don't know...I mean it's a start...
Many functions have two-liners documentation already, but it is kinda like a forest. You see lots of individual trees, but the shape of the forest is hard to grasp. More documentation on individual functions won't enable anything. Just like stapling "this is spruce", "this is birch" to individual trees does not help much.
 I feel like this forum has ADHD sometimes. A week ago it was 
 all up in arms about ImportC, now it's focused on this, two 
 weeks from now this will be forgotten and on to something 
 else...
You have to build consensus somehow. Most of the new cool features other languages get are better done using a dedicated high level IR. There is currently no easy way to experiment with that for D (short of writing your own backend). I think that is a road block.
May 25
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 25 May 2021 at 11:40:42 UTC, Ola Fosheim Grøstad 
wrote:
 On Tuesday, 25 May 2021 at 11:22:24 UTC, jmh530 wrote:
 I don't know...I mean it's a start...
Many functions have two-liners documentation already, but it is kinda like a forest. You see lots of individual trees, but the shape of the forest is hard to grasp. More documentation on individual functions won't enable anything. Just like stapling "this is spruce", "this is birch" to individual trees does not help much.
Or let me explain it another way. Assume writing good useful documentation takes 10-20% of your coding time. What should you formally document?

1. High level structure.
2. Stuff that is stable and well encapsulated and needs to be explained.
3. FAQs.
4. Functions that deviate from the norm (behave in surprising ways).

Stuff you want to replace, not so much I think. Given that those 10-20% would be better spent refactoring.
May 25
parent reply jmh530 <john.michael.hall gmail.com> writes:
On Tuesday, 25 May 2021 at 12:21:23 UTC, Ola Fosheim Grøstad 
wrote:
 [snip]

 Stuff you want to replace, not so much I think. Given that 
 those 10-20% would be better spent refactoring.
Ultimately Walter needs to think about how he best spends his time. Refactoring won't happen overnight, and the more people who understand the compiler, the more can assist with that and other things in the meantime.
May 25
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 25 May 2021 at 12:38:56 UTC, jmh530 wrote:
 Refactoring won't happen overnight, and the more people who 
 understand the compiler, the more can assist with that and other 
 things in the meantime.
The best way to refactor is to partition and encapsulate; then you can replace one item at a time. The only people who can do this are people who are willing to dig deep into the codebase. But this thread is about experimentation. Experimentation on top of parts that are considered unstable is futile. You don't need to understand every single piece of the compiler to have fun extending it. And you should focus your efforts on the stable parts. Parts that are considered unstable need to be encapsulated and provide interfaces so that people can build on those interfaces instead of making change hard by tying more stuff to the code that you want to replace.
May 25
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 25 May 2021 at 12:47:53 UTC, Ola Fosheim Grøstad 
wrote:
 Parts that are considered unstable need to be encapsulated and 
 provide interfaces so that people can build on those interfaces 
 instead of making change hard by tying more stuff to the code 
 that you want to replace.
The key point here is designing new and better interfaces. You cannot refactor yourself into heaven with no redesign. But it does not have to be disruptive. As an example, let's pretend we want a new AST and a new IR. Here is a non-disruptive sequence:

1. Write new AST and translation to old AST. Not disruptive.

2. Write translation from old AST to new IR, and encourage backends to transition. Not disruptive.

3. Transition to new IR, by making old AST private. Backends are ready. Not disruptive.

4. Move passes one by one to new IR. Not disruptive.

5. Write translation from new AST to new IR. Done.

You don't want to document your old interface, because you don't want people to depend on it. You want to document your new interface and encourage people to transition. Then you eventually can make the old interface private and can in peace replace the old parts.
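The first two steps of such a sequence could be sketched as a pair of small translation layers. This is only a toy illustration under loud assumptions: `NewAstNode`, `OldAstNode`, `IrInstruction`, `toOldAst` and `toNewIr` are all hypothetical placeholders and have nothing to do with dmd's real data structures.

```d
// All types and functions below are hypothetical placeholders for illustration.
struct NewAstNode { string kind; string text; }
struct OldAstNode { string description; }
struct IrInstruction { string opcode; }

// Step 1: lower the new AST to the old AST, so existing passes keep working.
OldAstNode[] toOldAst(NewAstNode[] nodes)
{
    OldAstNode[] result;
    foreach (node; nodes)
        result ~= OldAstNode(node.kind ~ ":" ~ node.text);
    return result;
}

// Step 2: translate the old AST to the new IR, which backends can adopt early.
IrInstruction[] toNewIr(OldAstNode[] nodes)
{
    IrInstruction[] result;
    foreach (node; nodes)
        result ~= IrInstruction("op(" ~ node.description ~ ")");
    return result;
}

void main()
{
    auto ast = [NewAstNode("call", "foo")];

    // New AST -> old AST -> new IR; each layer can be swapped out independently.
    auto ir = ast.toOldAst.toNewIr;
    assert(ir.length == 1);
    assert(ir[0].opcode == "op(call:foo)");
}
```

Because each layer only depends on the interface of the one next to it, step 5 would amount to replacing the two calls with a single direct translation once nothing else depends on the old AST.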
May 25
next sibling parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 25 May 2021 at 13:11:17 UTC, Ola Fosheim Grøstad 
wrote:
 1. Write new AST and translation to old AST. Not disruptive.

 2. Write translation from old AST to new IR, and encourge 
 backends to transition. Not disruptive.

 3. Transition to new IR, by making old AST private. Backends 
 are ready. Not disruptive.

 4. Move passes one by one to new IR. Not disruptive.

 5. Write translation from new AST to new IR. Done.
If you are unsure if the new IR is stable you can rearrange the sequence like this instead: new AST -> new IR -> old AST Perhaps better as it gives more time for backends to transition.
May 25
prev sibling parent reply zjh <fqbqrr 163.com> writes:
+10086.
May 25
parent reply zjh <fqbqrr 163.com> writes:
On Tuesday, 25 May 2021 at 13:28:02 UTC, zjh wrote:
 +10086.
Refactoring doesn't take much time, because the functionality has already been implemented. Refactoring has great benefits: a clear hierarchy, clear dependencies and clear interfaces.
May 25
parent Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 25 May 2021 at 13:33:25 UTC, zjh wrote:
 On Tuesday, 25 May 2021 at 13:28:02 UTC, zjh wrote:
 +10086.
Refactoring doesn't take much time, because the functionality has already been implemented. Refactoring has great benefits: a clear hierarchy, clear dependencies and clear interfaces.
It takes time, but it is a necessary part of the life cycle, and does not have to be disruptive. It can happen bit by bit as long as you have a new design that is clean. The nice thing is that one can easily detect regressions by comparing a dump from the old compiler with a dump from the new compiler. Do this for all D programs on github and you can feel confident that the new compiler has not introduced new errors. So: you compare the new IR translated to old AST from the new compiler with the old AST from the old compiler for a D program. If they are equal, then the new compiler passed the test.
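The dump comparison could be as simple as the sketch below. It assumes, purely for illustration, that both compilers can emit their AST as plain text; `firstDifference` is a hypothetical helper and not an existing dmd tool.

```d
import std.algorithm.comparison : min;
import std.string : splitLines;

// Hypothetical helper: returns the 1-based number of the first line where the
// two textual dumps differ, or 0 if the dumps are identical.
size_t firstDifference(string oldDump, string newDump)
{
    auto oldLines = oldDump.splitLines;
    auto newLines = newDump.splitLines;
    immutable shared_ = min(oldLines.length, newLines.length);

    foreach (i; 0 .. shared_)
        if (oldLines[i] != newLines[i])
            return i + 1;

    // One dump is a strict prefix of the other: the first extra line differs.
    if (oldLines.length != newLines.length)
        return shared_ + 1;

    return 0;
}

void main()
{
    assert(firstDifference("a\nb\n", "a\nb\n") == 0);
    assert(firstDifference("a\nb\n", "a\nc\n") == 2);
    assert(firstDifference("a\n", "a\nb\n") == 2);
}
```

A result of 0 for every program in the test corpus would correspond to "the new compiler passed the test" in the sense described above.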
May 25
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/25/2021 4:22 AM, jmh530 wrote:
 I feel like this forum has ADHD sometimes. A week ago it was all up in arms 
 about ImportC, now it's focused on this, two weeks from now this will be 
 forgotten and on to something else...
Meanwhile, Iain and I are putting out PRs on ImportC.
May 27
next sibling parent reply Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Thursday, 27 May 2021 at 09:53:19 UTC, Walter Bright wrote:
 On 5/25/2021 4:22 AM, jmh530 wrote:
 I feel like this forum has ADHD sometimes. A week ago it was 
 all up in arms about ImportC, now it's focused on this, two 
 weeks from now this will be forgotten and on to something 
 else...
Meanwhile, Iain and I are putting out PRs on ImportC.
Yes, 2 people run ahead while 10 equally capable people throw their hands up in the air, then walk off and start writing their own compilers.
May 27
parent reply Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Thursday, 27 May 2021 at 09:58:31 UTC, Ola Fosheim Grostad 
wrote:
 On Thursday, 27 May 2021 at 09:53:19 UTC, Walter Bright wrote:
 On 5/25/2021 4:22 AM, jmh530 wrote:
 I feel like this forum has ADHD sometimes. A week ago it was 
 all up in arms about ImportC, now it's focused on this, two 
 weeks from now this will be forgotten and on to something 
 else...
Meanwhile, Iain and I are putting out PRs on ImportC.
Yes, 2 people run ahead while 10 equally capable people throw their hands up in the air, then walk off and start writing their own compilers.
2 years later they were all steamrolled by C++ and Swift because they failed to organize and coordinate between themselves. The sadness of Open Source is that, unlike businesses, they don't see productivity losses. They can just ignore them and pretend they don't exist. Thus they fail to benefit from the synergies that are available to them.
May 27
parent zjh <fqbqrr 163.com> writes:
On Thursday, 27 May 2021 at 11:27:00 UTC, Ola Fosheim Grostad 
wrote:
 On Thursday, 27 May 2021 at 09:58:31 UTC, Ola Fosheim Grostad
You're right. We have few people, and if we don't organize ourselves, we cannot compete with other languages. They have all organized. Wake up, Walter (repeat 3 times).
May 27
prev sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Thursday, 27 May 2021 at 09:53:19 UTC, Walter Bright wrote:
 On 5/25/2021 4:22 AM, jmh530 wrote:
 I feel like this forum has ADHD sometimes. A week ago it was 
 all up in arms about ImportC, now it's focused on this, two 
 weeks from now this will be forgotten and on to something 
 else...
Meanwhile, Iain and I are putting out PRs on ImportC.
Of course, forum =/= dmd
May 27
parent Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Thursday, 27 May 2021 at 12:53:25 UTC, jmh530 wrote:
 On Thursday, 27 May 2021 at 09:53:19 UTC, Walter Bright wrote:
 On 5/25/2021 4:22 AM, jmh530 wrote:
 I feel like this forum has ADHD sometimes. A week ago it was 
 all up in arms about ImportC, now it's focused on this, two 
 weeks from now this will be forgotten and on to something 
 else...
Meanwhile, Iain and I are putting out PRs on ImportC.
Of course, forum =/= dmd
The forums are drying up when it comes to people who are interested in compilers. You better do something to recruit the ones that are still there unless you want to struggle with DIP1000 all by yourselves forever.
May 27
prev sibling parent reply sighoya <sighoya gmail.com> writes:
On Tuesday, 25 May 2021 at 09:05:26 UTC, Ola Fosheim Grøstad 
wrote:
 On Tuesday, 25 May 2021 at 08:32:46 UTC, sighoya wrote:
 You can't encode the full semantics into one function name with 
 parameter names without overblowing these names.
We can assume that the reader has read a book on compiler design and is familiar with the terminology and the most common algorithms. Provide a reference to wikipedia if unsure if the reader is with you...
For very general things, yes, this is possible, but there are structures and algorithms out there that don't resemble what you've learned, or for which nobody has invented or discovered a simple name. Everyone has a different intuition for how to solve a problem, which can be pretty hard to follow without comments when reading only index operations, shifts and type names that are as specific as the cosmos.
 Functions that are only called from a few places can have long 
 descriptive names, that is not a negative.
Trade off, but I appreciate this in tests for instance.
 Yes, obviously. But adding 6 lines of comments for every 
 trivial function is not helpful. It is a useless policy. It is 
 a policy for the sake of having a policy.
Yes, I agree with this, and six lines is mostly too much. Look at the example I linked before; that was mostly a one-line comment.
 If time is invested in documenting things that should be 
 changed... then change becomes less likely: "look, the 
 documentation is over there, change not needed".
Okay, that may be true, but it also makes it easier to dive in and have fun changing things.
 Anyway, documentation is the wrong solution to structural 
 issues. It does not enable anything.
Agree.
 It is kinda like saying a city does not need road signs because 
 there is a good map available. Or that a city that is a 
 labyrinth of one-way streets is easy to navigate with the 
 right kind of map. Driving while looking at a map is not a good 
 experience. And when things change, can you then trust the map?

 *shrug*
I think the metaphor speaks against you as the map is the wiki article you mentioned :)
May 25
parent Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 25 May 2021 at 19:08:18 UTC, sighoya wrote:
 I think the metaphor speaks against you as the map is the wiki 
 article you mentioned :)
Nah, because that would be in the documentation, so you are already looking at the map. But I think for a compiler you should assume basic terminology to be known; if people have an interest in this, they will pick up a book on compiler design and implementation.
May 25
prev sibling parent reply Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Tuesday, 25 May 2021 at 08:32:46 UTC, sighoya wrote:
 You can't encode the full semantics into one function name with 
 parameter names without overblowing these names.
In this case, it might be good to have a documentation comment, otherwise behavior should be known from the function name and args.
 However, small comments inside the function would also be 
 beneficial.
Having such comments inside a function body means you've failed to make the code easy to read and understand. Instead of such inline comments, consider extracting that piece into a function with the right name. Adding such comments should be the last option when deciding what to do with that piece of code. Note that the next dev, if he changes that piece of code, will most probably just forget to update that comment, meaning that it will tell a lie instead of the truth.
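To make the suggestion concrete, here is a minimal D sketch of replacing an inline comment with a named function. All names (`totalBefore`, `applyDiscount`, `totalAfter`) are hypothetical and have nothing to do with the dmd sources; they only illustrate the refactoring.

```d
// Before: an inline comment explains one step inside a larger function.
int totalBefore(const int[] prices, int discountPercent)
{
    int sum = 0;
    foreach (p; prices)
        sum += p;
    // apply the percentage discount (integer division)
    return sum - (sum * discountPercent) / 100;
}

// After: the step gets a name of its own, so the comment becomes redundant.
int applyDiscount(int amount, int discountPercent)
{
    return amount - (amount * discountPercent) / 100;
}

int totalAfter(const int[] prices, int discountPercent)
{
    int sum = 0;
    foreach (p; prices)
        sum += p;
    return applyDiscount(sum, discountPercent);
}

unittest
{
    // Both variants compute the same result.
    assert(totalBefore([100, 50], 10) == 135);
    assert(totalAfter([100, 50], 10) == 135);
}
```

Whether the extracted version reads better depends on how much surrounding context the step needs, which is the trade-off discussed in the reply that follows.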
May 25
parent sighoya <sighoya gmail.com> writes:
On Tuesday, 25 May 2021 at 16:00:32 UTC, Alexandru Ermicioi wrote:
 On Tuesday, 25 May 2021 at 08:32:46 UTC, sighoya wrote:
 You can't encode the full semantics into one function name with 
 parameter names without overblowing these names.
In this case, it might be good to have a documentation comment, otherwise behavior should be known from the function name and args.
Agree.
 Having such comments inside function body, means you've failed 
 to make the code easy to read and understand. Instead of such 
 inline comments, consider extracting that piece into a function 
 with right name.
It's a trade-off. Over-modularization can also be an anti-pattern, as it significantly reduces locality. The other point is how to deal with dynamic context, which may be solved with templates, what a hack. Anyhow, you don't always code at a very high level; sometimes it's a bit more low-level or indirect, and then it's good to have some thread to follow.
 Adding such comments should be the last option in your decision 
 on what to do with that piece of code.
Naming is more important, definitely. But succinct comments for small sections aren't that bad and are sometimes better than modularizing with a function:
```D
void firstAddToThenUpdateStructureThenFinalize...
```
Giving this function a shorter, more non-functional name would be OK, but such a name is sometimes too general to understand in your context. Splitting the function into smaller parts may allow shorter names for these operations, but the point is the context: it's not always clear even with correct semantic naming, which is mostly not possible without being too general.
It's like commit messages. I like to start the commit with the technical detail:
```
Update ClassA:
```
Then in the following lines I add some points describing the newly added semantics, which is too much to compact into one single line.
If you could add context some other way, that would be pretty good. For instance, Swift parameter labels are a first step in the right direction: `send(message:"Hello World",from:"Earth",to:"Mars")`
 Note that the next dev, if he changes that piece of 
 code, will most probably just forget to update that comment, 
 meaning that it will tell a lie instead of the truth.
Yes, but I can argue the same for modularization: someone who changes the body without renaming the function fails in the same way.
May 25
prev sibling next sibling parent Basile B. <b2.temp gmx.com> writes:
On Tuesday, 25 May 2021 at 00:03:56 UTC, Walter Bright wrote:
 On 5/24/2021 2:01 PM, Max Haughton wrote:
 Where do you start?
At the first function you notice that has poor/missing/wrong documentation. Like this one I just did: https://github.com/dlang/dmd/pull/12570
Related, all the BUG:, TODO:, etc. comments should be moved to bugzilla. For example [here](https://github.com/dlang/dmd/blob/7fafcd213ac82c58e7b8fb8143c837c7595c4e8f/src/dmd/expressionsem.d#L10140)
```d
/* BUG: Should handle things like:
 *      char c;
 *      c ~ ' '
 *      ' ' ~ c;
 */
```
this is a request to have `char() ~ char()` produce `char[]`. It has no business being in the code.
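For readers unfamiliar with the case, a minimal D sketch of what the comment is about. It assumes, as the BUG comment implies, that concatenating two bare `char` values is not accepted, and shows the usual workaround of wrapping one operand in an array literal. The helper names `appendSpace` and `prependSpace` are made up for illustration.

```d
// Assumption (implied by the BUG comment): `c ~ ' '` with two bare chars
// is not accepted, so one operand is turned into a char[] first.
char[] appendSpace(char c)
{
    return [c] ~ ' ';   // char[] ~ char gives char[]
}

char[] prependSpace(char c)
{
    return ' ' ~ [c];   // char ~ char[] gives char[]
}

unittest
{
    assert(appendSpace('x') == "x ");
    assert(prependSpace('x') == " x");
}
```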
May 25
prev sibling parent Iain Buclaw <ibuclaw gdcproject.org> writes:
On Tuesday, 25 May 2021 at 00:03:56 UTC, Walter Bright wrote:
 On 5/24/2021 2:01 PM, Max Haughton wrote:
 Where do you start?
At the first function you notice that has poor/missing/wrong documentation. Like this one I just did: https://github.com/dlang/dmd/pull/12570
Another place you can make a big impact with zero change in language behavior is this:
```
Error: none of the overloads of size are callable using a const object, candidates are:
    dmd.mtype.Type.size()
    dmd.mtype.Type.size(ref const(Loc) loc)
Error: mutable method dmd.mtype.TypeVector.elementType is not callable using a const object
    Consider adding const or inout here
Error: mutable method dmd.mtype.TypeVector.isscalar is not callable using a const object
    Consider adding const or inout here
Error: mutable method dmd.mtype.TypeVector.isintegral is not callable using a const object
    Consider adding const or inout here
Error: mutable method dmd.mtype.TypeVector.isfloating is not callable using a const object
    Consider adding const or inout here
```
Which is but a small portion of the monster error that occurs when you use `const` instead of `auto` for any AST `Type` or `Dsymbol`.
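The same class of error can be reproduced outside the compiler. The sketch below uses a hypothetical `Node` class (not the real `dmd.mtype.Type`) to show why a method without a `const` qualifier cannot be called through a `const` reference, and how adding `const` fixes the call site without changing behavior.

```d
class Node
{
    private int width = 4;

    // Not marked const: cannot be called through a const reference.
    int sizeMutable() { return width; }

    // Marked const: callable on mutable, const and immutable objects.
    int size() const { return width; }
}

int describe(const Node n)
{
    // n.sizeMutable() would be rejected here:
    // "mutable method ... is not callable using a const object"
    return n.size();
}

unittest
{
    auto n = new Node;
    assert(describe(n) == 4);

    // The compiler rejects the mutable method on a const object
    // and accepts the const one.
    static assert(!__traits(compiles, (const Node c) => c.sizeMutable()));
    static assert(__traits(compiles, (const Node c) => c.size()));
}
```

In a large class hierarchy the qualifier has to be added to every method that a `const` caller reaches, directly or indirectly, which is presumably why the real error list is so long.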
May 25
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/24/2021 6:53 AM, 12345swordy wrote:
 I seriously question the "Optimized for people who spend thousands of hours 
 working on it" line, as I had a very intelligent person post on Slack asking 
 what this function does, as there are no comments for said function.
"Use correct Ddoc function comment blocks."
https://github.com/dlang/dmd/blob/master/CONTRIBUTING.md
It's up to contributors to read and follow the guidelines, and up to those with pull privileges to require conformance. It's also up to you and me, all of us, to go and fix documentation problems we run across, like this: https://github.com/dlang/dmd/pull/12570
May 24