digitalmars.D - Writing a (dis-)assembler for 8-bit code in D - blog posts

reply Dukc <ajieskola gmail.com> writes:
You remember Brian Callahan, the one who finished OpenBSD support 
for the D language? He has more posts that I think people here 
might find interesting. He has written a disassembler and an 
assembler for a Z80 processor in D.

The main point of his articles seems to be to demonstrate how 
programs behave; the choice of language is an implementation 
detail. His D is still rough, as it is for anyone new to the 
language, but he knows a lot about low-level programming in 
general. If you're thinking about low-level programming or 
compiler technology, these are worth a look.

https://briancallahan.net/blog/archive.html
Apr 19
parent reply Brian <bcallah openbsd.org> writes:
Hello Dukc --

On Monday, 19 April 2021 at 15:01:07 UTC, Dukc wrote:
 You remember Brian Callahan, the one who finished OpenBSD 
 support for the D language? He has more posts that I think 
 people here might find interesting. He has written a 
 disassembler and an assembler for a Z80 processor in D.

 The main point in his articles is seemingly to demonstrate how 
 programs behave, choice of the language is an implementation 
 detail. His D is still rough, as it is for anyone new to the 
 language, but he knows a lot about low-level programming in 
 general. If you're thinking about low-level programming or 
 compiler technology, these are worth a look.

 https://briancallahan.net/blog/archive.html
I do exist in these parts and on this mailing list. :)

Turns out I have at least one more post in the series--I decided to make the parser match the CP/M assembler for strings after all. But most certainly fixed that in a "C" way. Guess which language I'm in my head mapping my D on top of.

And you are correct, choice of language is an implementation detail. It wasn't happenstance though. D has some nice facilities that, while perhaps not exclusive to D, are nonetheless quite useful for achieving the goals of the dis/assembler. Also, perhaps people want to collect D tutorials for new D coders/new coders in general and this dis/assembler can eventually be a part of that.

But... that doesn't mean I can't or won't take constructive critiques.

But perhaps some context on the blog series: The whole point of the disassembler and assembler was to answer (for myself, really) if I could successfully teach someone with effectively no formal CS education how to write such tools. Imagine someone who never learned data structures wanting to write their own tools. You could hand them this dis/assembler, they could learn everything about them with just the code, the blog posts, and the skills they have now. And once they've done that, then you say "great, now go learn DS and algo and ..." So ways to make the code more readable to that target audience are appreciated and will almost certainly become the topic of their own post on the blog in that series.

I'll also take tips for better idiomatic D in general, for my own sake.

As an unrelated aside: I'm giving a talk about all the different languages I have helped port to OpenBSD (about 40 or so that I can remember as of now). It won't all be about D, but D will be an exclusive, highlighted, part of it. Humorously, everyone is going around calling it "the D on OpenBSD talk" because that one blog post really gained traction in the *BSD community too. Anyhow, it's May 5 at 18:45 NY time. Free and virtual (Zoom): https://www.nycbug.org/index?action=view&id=10683

~Brian
Apr 19
next sibling parent starcanopy <starcanopy protonmail.com> writes:
On Monday, 19 April 2021 at 20:05:57 UTC, Brian wrote:
 The whole point of the disassembler and assembler was to answer 
 (for myself, really) if I could successfully teach someone with 
 effectively no formal CS education how to write such tools. 
 Imagine someone who never learned data structures wanting to 
 write their own tools. You could hand them this dis/assembler, 
 they could learn everything about them with just the code, the 
 blog posts, and the skills they have now. And once they've done 
 that, then you say "great, now go learn DS and algo and ..." So 
 ways to make the code more readable to that target audience are 
 appreciated and will almost certainly become the topic of their 
 own post on the blog in that series.
I'm in your target audience (never taken a CS class), and while I've only read the initial entry in the series, I think your prose is good, and your explanations are just as good. I look forward to reading the subsequent posts, so thanks for writing!

With regard to using D, I think it's a good choice if only for the fact that it appears similar to C, and one may translate code to and fro with relatively little hassle, but it doesn't necessitate the degree of circumspection that the latter demands of a beginner. So, perhaps, progressing from "un-idiomatic" D (i.e. C-like?) to more modern practices could be a sub-plot of sorts.
Apr 19
prev sibling next sibling parent reply Dukc <ajieskola gmail.com> writes:
On Monday, 19 April 2021 at 20:05:57 UTC, Brian wrote:
 I'll also take tips for better idiomatic D in general, for my 
 own sake.
Here are some tips.

Don't bother with `static` before the functions. It does nothing in D, unless the definition is in local scope (inside a struct, class, union or another function). If you want to limit symbol visibility, see https://dlang.org/spec/attribute.html#visibility_attributes. TlDr: `private` means the symbol can only be used in the same file. `public` is the default and means the symbol can be `import`ed from another file. `export` is used when making .so files and tells the compiler that the symbol must be dynamically linkable.

You probably want to learn about the `foreach` loop. For starters, you can replace almost all for loops this way: `foreach(i; 0 .. array1.length) array2[i] = array1[i];`. There are even better ways to do the same; http://ddili.org/ders/d.en/foreach.html is a good introduction. A word of warning about `foreach` though: it should not be used if the length of the array is going to change while iterating. In my example, `array1.length` is only calculated once, at the start of the loop, so shortening `array1` in the body would lead to an out-of-bounds condition.

`static foreach` has a lot of potential to shorten your assembler code. For example, instead of

```D
if (op == "nop")
    nop();
else if (op == "lxi")
    lxi();
else if (op == "stax")
    stax();
else if (op == "inx")
    inx();
else if (op == "inr")
    inr();
else if (op == "dcr")
    dcr();
else if (op == "mvi")
    mvi();
<...>
else
    err("unknown opcode: " ~ op);
```

it's better to write

```D
sw: switch (op)
{   //iterates over the array at compile time
    static foreach(opStr; ["nop", "lxi", "stax", "inx", "inr", "dcr", "mvi", <...>])
    {	case opStr:
        mixin(opStr)(); //inserts a call to function with name opStr.
        break sw; //breaks out of the switch statement
    }

    default: err("unknown opcode: " ~ op);
}
```

Even better is if you make a global array of opcodes and pass that to `static foreach`. The opcode array needs to be available at compile time, so mark it `enum` instead of `immutable`, like: `enum opcodes = ["nop", "lxi", "stax", <...>];`
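For anyone who wants to try the dispatch idea above in isolation, here is a minimal, self-contained sketch. The handler functions, the `lastOp` variable and the `dispatch` wrapper are hypothetical stand-ins, not the assembler's real code, and the opcode list is cut down to three entries. It uses the statement form `mixin(opStr ~ "();")`, which builds the whole call as a string.

```d
import std.stdio : writeln;

// Hypothetical stand-ins for the assembler's per-opcode handlers.
string lastOp;
void nop()  { lastOp = "nop"; }
void lxi()  { lastOp = "lxi"; }
void stax() { lastOp = "stax"; }

// `enum` makes the array a compile-time constant that `static foreach` can read.
enum opcodes = ["nop", "lxi", "stax"];

// Returns true if `op` was recognised and its handler was called.
bool dispatch(string op)
{
    sw: switch (op)
    {
        // Unrolled at compile time: one `case` per entry in `opcodes`.
        static foreach (opStr; opcodes)
        {
            case opStr:
                mixin(opStr ~ "();"); // becomes nop(); lxi(); or stax();
                break sw;             // a labeled break is needed inside static foreach
        }

        default:
            return false;
    }
    return true;
}

void main()
{
    const ok = dispatch("lxi");
    writeln(ok, " ", lastOp);   // prints "true lxi"
    writeln(dispatch("bogus")); // prints "false"
}
```

Adding an opcode then only requires a new handler function and a new entry in `opcodes`.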
 As an unrelated aside: I'm giving a talk about all the 
 different languages I have helped port to OpenBSD (about 40 or 
 so that I can remember as of now).
Wow, that's a lot! Congratulations!
Apr 20
next sibling parent reply Brian <bcallah openbsd.org> writes:
On Tuesday, 20 April 2021 at 10:02:05 UTC, Dukc wrote:
 TlDr: `private` means the symbol can only be used in the same 
 file.
That's clearly what I was looking for, thanks.
 You probably want to learn about `foreach` loop.
Oh, yes. I certainly know what a foreach loop is. D is hardly the only language to provide it. I actively chose against using it; long boring story short, it's the result of some self-bias I have from many years on a particular kind of coding research team I used to be on as a grad student.

Looking back at the finished code, it was a decision that didn't bear out as I was hoping. As there are only 12 for loops (and no other types of loops, again, on purpose) in the whole assembler, that's probably a design decision that I would not have selected if I were going to start over. In fact, the new parser abandons the idea already. So I agree: converting all the for loops to foreach loops would be a nice additional blog post. Thanks.
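As a rough illustration of the for-to-foreach conversion being discussed, the sketch below copies an array three ways. The function names are made up for this example and are not taken from the assembler.

```d
import std.stdio : writeln;

// C-style loop: the condition is re-evaluated on every iteration.
int[] copyWithFor(const int[] src)
{
    auto dst = new int[src.length];
    for (size_t i = 0; i < src.length; i++)
        dst[i] = src[i];
    return dst;
}

// Range foreach: the upper bound is evaluated once, before the loop starts.
int[] copyWithForeachRange(const int[] src)
{
    auto dst = new int[src.length];
    foreach (i; 0 .. src.length)
        dst[i] = src[i];
    return dst;
}

// Element foreach: index and value are both supplied by the loop.
int[] copyWithForeachElems(const int[] src)
{
    auto dst = new int[src.length];
    foreach (i, value; src)
        dst[i] = value;
    return dst;
}

void main()
{
    const data = [3, 1, 4, 1, 5];
    writeln(copyWithFor(data));
    writeln(copyWithForeachRange(data));
    writeln(copyWithForeachElems(data));
}
```

All three return the same copy; the foreach forms simply leave less bookkeeping for the reader to check.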
 `static foreach` has a lot of potential to shorten your 
 assembler code.
 ```D
 sw: switch (op)
 {   //iterates over the array at compile time
     static foreach(opStr; ["nop", "lxi", "stax", "inx", "inr", "dcr", "mvi", <...>])
     {	case opStr:
         mixin(opStr)(); //inserts a call to function with name opStr.
         break sw; //breaks out of the switch statement
     }

     default: err("unknown opcode: " ~ op);
 }
 ```

 Even better is if you make a global array of opCodes and pass 
 that to `static foreach`. The opcode array needs to be 
 available in compile time, so mark it `enum` instead of 
 `immutable`, like: `enum opcodes = ["nop", "lxi", "stax", 
 <...>];`
I should spend some time looking at mixins. The rest you have there makes intuitive sense looking at it. (Crazy question: is there a way to dump internal state after semantic analysis?)

Looking at the mixins page here: https://dlang.org/articles/mixin.html, I am already disappointed to learn about the monkey business it won't let me do :) I was really hoping to radically alter the syntax of D to create my own language and then implement an entirely different language in that (I kid, but only slightly. That's exactly what Arthur Whitney does in his language development: K being a good example of this.)

~Brian
Apr 20
next sibling parent reply Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Tuesday, 20 April 2021 at 13:38:37 UTC, Brian wrote:
 [..]

 Looking at the mixins page here: 
 https://dlang.org/articles/mixin.html, I am already 
 disappointed to learn about the monkey business it won't let me 
 do :) I was really hoping to radically alter the syntax of D to 
 create my own language and then implement an entirely different 
 language in that (I kid, but only slightly. That's exactly what 
 Arthur Whitney does in his language development: K being a good 
 example of this.)

 ~Brian
You can almost do what you want, which may be sufficient for your needs :P

Have a look at these two projects as general examples:
* https://github.com/PhilippeSigaud/Pegged
* https://vibed.org/docs#html-templates

Basically, you can embed a DSL with arbitrary syntax and semantics in a D program, just as long as the DSL code is encapsulated in a string mixin. So, you can either have every D file be just a big string mixin - `mixin(myDSL("<lots of code here>"))` (see Pegged), or you can put the DSL code in separate files and then string-import it at compile-time from the rest of the D code (see vibe.d).
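To make the `mixin(myDSL("..."))` pattern concrete, here is a toy sketch. The `name=value;` mini-language and this particular `myDSL` function are invented for illustration and have nothing to do with how Pegged or vibe.d work internally. The function is ordinary D that returns D source as a string; the compiler runs it through CTFE and mixes in the result.

```d
import std.stdio : writeln;

// Hypothetical mini-DSL: "name=value;name=value" becomes a list of
// `enum int name = value;` declarations, returned as D source code.
string myDSL(string src)
{
    string code, name, value;
    bool inValue = false;

    foreach (c; src ~ ";") // trailing ';' flushes the last entry
    {
        if (c == '=')
            inValue = true;
        else if (c == ';')
        {
            if (name.length)
                code ~= "enum int " ~ name ~ " = " ~ value ~ ";\n";
            name = null;
            value = null;
            inValue = false;
        }
        else if (inValue)
            value ~= c;
        else
            name ~= c;
    }
    return code;
}

// The DSL text is translated at compile time and mixed in as declarations.
mixin(myDSL("width=80;height=24"));

// Proof that the declarations exist as compile-time constants.
static assert(width == 80 && height == 24);

void main()
{
    writeln(width * height); // prints "1920"
}
```

The string-import variant mentioned above would read the DSL text from a file with `import("file")`, which needs the `-J` compiler switch, instead of using an inline literal.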
Apr 20
parent Brian <bcallah openbsd.org> writes:
On Tuesday, 20 April 2021 at 14:52:08 UTC, Petar Kirov 
[ZombineDev] wrote:
 You can almost do what you want, which may be sufficient for 
 your needs :P

 Have a look these two projects as general examples:
 * https://github.com/PhilippeSigaud/Pegged
 * https://vibed.org/docs#html-templates

 Basically, you can embed a DSL with arbitrary syntax and 
 semantics in a D program, just as long as the DSL code is 
 encapsulated in a string mixin. So, you can either have every D 
 file be just a big string mixin - `mixin(myDSL("<lots of code 
 here>"))` (see Pegged), or you can put the DSL code as separate 
 files and then string-import it at compile-time from the rest 
 of the D code (see vibe.d).
Oh, that was definitely tongue-in-cheek to the point of being obnoxious on my part :) I was amused that the mixins page specifically called out that specific potential as monkey business. But I appreciate the links. ~Brian
Apr 20
prev sibling next sibling parent Paul Backus <snarwin gmail.com> writes:
On Tuesday, 20 April 2021 at 13:38:37 UTC, Brian wrote:
 I should spend some time looking at mixins. The rest you have 
 there makes intuitive sense looking at it. (Crazy question: is 
 there a way to dump internal state after semantic analysis?)
`-vcg-ast` will dump the AST after semantic analysis.
Apr 20
prev sibling parent reply Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Tuesday, 20 April 2021 at 13:38:37 UTC, Brian wrote:
 `static foreach` has a lot of potential to shorten your 
 assembler code.
 ```D
 sw: switch (op)
 {   //iterates over the array at compile time
     static foreach(opStr; ["nop", "lxi", "stax", "inx", "inr", "dcr", "mvi", <...>])
     {	case opStr:
         mixin(opStr)(); //inserts a call to function with name opStr.
         break sw; //breaks out of the switch statement
     }

     default: err("unknown opcode: " ~ op);
 }
 ```

 Even better is if you make a global array of opCodes and pass 
 that to `static foreach`. The opcode array needs to be 
 available in compile time, so mark it `enum` instead of 
 `immutable`, like: `enum opcodes = ["nop", "lxi", "stax", 
 <...>];`
I should spend some time looking at mixins. The rest you have there makes intuitive sense looking at it. (Crazy question: is there a way to dump internal state after semantic analysis?)
Not sure if that's quite what you want, but you can use [`pragma(msg, typeOrValueKnownAtCompileTime)`][1] [²][2] to print stuff at CT. The following program:

```d
void main()
{
    string op;

    sw: switch (op)
    {
        static foreach(opStr; ["nop", "lxi", "stax", "inx", "inr", "dcr", "mvi"])
        {
            case opStr:
                pragma(msg, "Inside case: '", opStr, "'");
                break sw;
        }

        default:
            return;
    }
}
```

Prints during compilation:

```
Inside case: 'nop'
Inside case: 'lxi'
Inside case: 'stax'
Inside case: 'inx'
Inside case: 'inr'
Inside case: 'dcr'
Inside case: 'mvi'
```

(Try online: https://run.dlang.io/is/czbSXe)

If you want to read an in-depth explanation regarding the fine distinction between `static` compile-time code and "dynamic" compile-time code, you can check this wiki page: https://wiki.dlang.org/User:Quickfur/Compile-time_vs._compile-time

[1]: https://dlang.org/spec/pragma.html#msg
[2]: http://ddili.org/ders/d.en/pragma.html

P.S. As Paul mentioned, you can also use the `-vcg-ast` compiler switch (not sure if compilers other than dmd support it): https://run.dlang.io/is/AHHwAs
Apr 20
parent reply Brian <bcallah openbsd.org> writes:
On Tuesday, 20 April 2021 at 15:26:28 UTC, Petar Kirov 
[ZombineDev] wrote:
 Not sure if that's quite what you want, but you can use 
 [`pragma(msg, typeOrValueKnownAtCompileTime)`][1] [²][2] to 
 print stuff at CT.
I suppose what I want to do is traverse the compiler's transformation of the mixin. The mixin page suggests it performs that work at semantic evaluation time. ~Brian
Apr 20
next sibling parent Dukc <ajieskola gmail.com> writes:
On Tuesday, 20 April 2021 at 15:37:25 UTC, Brian wrote:
 I suppose what I want to do is traverse the compiler's 
 transformation of the mixin. The mixin page suggests it 
 performs that work at semantic evaluation time.

 ~Brian
I'll try to explain how it does that.

First off, as the article says, a mixin must always expand to either a complete statement/declaration, or to an expression. This lets the compiler complete the grammar pass (of stuff outside the mixin) before the semantic analysis, without needing to worry about the content of the mixin. This is why `mixin("{")` won't work - the grammar pass would have to analyze what is in the string to understand that.

Secondly, a mixin can accept any string available at compile time. D has a compile-time function execution engine that is used if the result of a function is needed at compile time. This is the case for `mixin` arguments, but it's not the only thing. Behold:

```D
import std;

//a regular function, callable at both runtime and compile time
string makeSymbol(const char c){return "in" ~ c;}

//a template
alias Type(T) = T;

//compile-time constant that behaves like a variable
enum intStr = makeSymbol('t');

void main()
{   //same as int x = 15;
    //all work to determine the type done by CTFE engine at semantic pass
    Type!(mixin(intStr)) x = 15;
    x += 10;
    x.writeln;
}
```
Apr 20
prev sibling parent reply Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Tuesday, 20 April 2021 at 15:37:25 UTC, Brian wrote:
 On Tuesday, 20 April 2021 at 15:26:28 UTC, Petar Kirov 
 [ZombineDev] wrote:
 Not sure if that's quite what you want, but you can use 
 [`pragma(msg, typeOrValueKnownAtCompileTime)`][1] [²][2] to 
 print stuff at CT.
I suppose what I want to do is traverse the compiler's transformation of the mixin. The mixin page suggests it performs that work at semantic evaluation time. ~Brian
`-vcg-ast` is your friend then! See for example: https://run.dlang.io/is/RlX9Ks

Other than using the compiler frontend as a library (which is possible, but not yet straightforward), and the above-mentioned switch, the typical way most people check the generated code is by simply printing the string that is being mixed in (either at compile-time with `pragma(msg)`, or at run-time).

Unlike some languages, where macros are essentially a separate language, in D, if you can call a function at compile-time you can certainly call it at run-time as well. So to check that `mixin(generateSomeCode())` does what you want, you can simply unit test `generateSomeCode` as usual.
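A small sketch of that workflow, assuming a made-up generator called `generateGetter` (it is not from any real project): the same function is printed at compile time with `pragma(msg)`, mixed in, and checked at run time in a `unittest` block.

```d
import std.stdio : writeln;

// Hypothetical generator: returns the source of a function `name`
// that returns the integer literal given in `value`.
string generateGetter(string name, string value)
{
    return "int " ~ name ~ "() { return " ~ value ~ "; }";
}

// Compile time: show the generated source in the compiler's output.
pragma(msg, generateGetter("answer", "42"));

// Compile time: turn the generated source into a real declaration.
mixin(generateGetter("answer", "42"));

unittest
{
    // Run time: the generator is an ordinary function, so it can be
    // tested like one (build with -unittest to run this block).
    assert(generateGetter("answer", "42") == "int answer() { return 42; }");
    assert(answer() == 42);
}

void main()
{
    writeln(answer()); // prints "42"
}
```

Because the generator is plain D, a failing test points at the string it produced, which is usually easier to read than an error reported inside the mixed-in code.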
Apr 20
parent reply Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Tuesday, 20 April 2021 at 17:58:07 UTC, Petar Kirov 
[ZombineDev] wrote:
 On Tuesday, 20 April 2021 at 15:37:25 UTC, Brian wrote:
 On Tuesday, 20 April 2021 at 15:26:28 UTC, Petar Kirov 
 [ZombineDev] wrote:
 Not sure if that's quite what you want, but you can use 
 [`pragma(msg, typeOrValueKnownAtCompileTime)`][1] [²][2] to 
 print stuff at CT.
I suppose what I want to do is traverse the compiler's transformation of the mixin. The mixin page suggests it performs that work at semantic evaluation time. ~Brian
`-vcg-ast` is your friend then! See for example: https://run.dlang.io/is/RlX9Ks Other than using the compiler frontend as a library (which is possible, but not yet straightforward), and the above mentioned switch, the typical way most people check the generated code is by simply printing the string that is being mixed-in (either at compile-time with `pragma(msg)`, or at run-time). Unlike some languages, where macros are essentially a separate language, in D, if you can call a function at compile-time you can certainly call it at run-time as well. So to check that `mixin(generateSomeCode())` does what you want, you can simply unit test `generateSomeCode` as usual.
D rox!

How do we get more ppl to know about it? Is it because it can do "anything" and therefore ppl get suspicious? Like "that's not possible" (which can be partly true ofc).

I've only known D2 for about 2 years, and recently started writing real stuff in it. Imo, if we continue polishing D2 it will be awesome. (Then for those wanting a D3, fork it so you can make the changes without worrying about compat, then see if it could be made compatible). For example, a conversion bw D1 and D2 could be done. And if we got a D3, the same thing could apply. The other way around is not as important imo.

Anyway, D roks
Apr 23
parent reply Paulo Pinto <pjmlp progtools.org> writes:
On Friday, 23 April 2021 at 09:43:30 UTC, Imperatorn wrote:
 On Tuesday, 20 April 2021 at 17:58:07 UTC, Petar Kirov 
 [ZombineDev] wrote:
 [...]
D rox! How do we get more ppl to know about it? Is it because it can do "anything" and therefore ppl get suspicious? Like "that's not possible" (which can be partly true ofc). I've only known D2 for about 2 years, and recently started writing real stuff in it. Imo, if we continue polishing D2 it will be awesome. (Then for those wanting a D3, fork it so you can make the changes without worrying about compact, then see if it could be made compatible). For example, a conversion bw D1 and D2 could be done. And if we got a D3, the same thing could apply. The other way around is not as important imo. Anyway, D roks
some of their latest features, while enjoying the large ecosystem of libraries and IDE tooling. Where do you think stuff like constexpr if or System.Span<T> was inspired from?
Apr 23
parent Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Friday, 23 April 2021 at 11:27:50 UTC, Paulo Pinto wrote:
 On Friday, 23 April 2021 at 09:43:30 UTC, Imperatorn wrote:
 [...]
in some of their latest features, while enjoying the large ecosystem of libraries and IDE tooling. Where do you think stuff like constexpr if or System.Span<T> was inspired from?
Hmm, true. Well, then I guess it's the ecosystem we have to work on! That's a bit of a chicken and egg thing though
Apr 23
prev sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/20/21 6:02 AM, Dukc wrote:
 Even better is if you make a global array of opCodes and pass that to 
 `static foreach`. The opcode array needs to be available in compile 
 time, so mark it `enum` instead of `immutable`, like: `enum opcodes = 
 ["nop", "lxi", "stax", <...>];`
Just FYI, immutables initialized with a compile-time value are accessible at compile-time. It can sometimes be advantageous to use immutable instead of enum, as an enum doesn't exist at runtime (so e.g. every use will construct a new array).

-Steve
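A short sketch of the point about `immutable`, using a made-up three-entry opcode list and a hypothetical `isKnown` helper: both declarations can be read at compile time, and the `immutable` one also works as the source for `static foreach`.

```d
import std.stdio : writeln;

// Manifest constant: no storage of its own; the literal is substituted at each use.
enum enumOps = ["nop", "lxi", "stax"];

// Immutable module-level array: a single object that also exists at run time.
immutable immOps = ["nop", "lxi", "stax"];

// Both are readable during compilation.
static assert(enumOps.length == 3);
static assert(immOps.length == 3);
static assert(immOps[1] == "lxi");

// The immutable array drives compile-time case generation.
bool isKnown(string op)
{
    switch (op)
    {
        static foreach (known; immOps)
        {
            case known:
                return true;
        }

        default:
            return false;
    }
}

void main()
{
    writeln(isKnown("lxi"));   // prints "true"
    writeln(isKnown("bogus")); // prints "false"
    writeln(immOps.length);    // prints "3"
}
```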
Apr 23
prev sibling parent reply =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
I can't make it due to a conflict but reminding everyone:

On 4/19/21 1:05 PM, Brian wrote:

 I'm giving a talk about all the different
 languages I have helped port to OpenBSD (about 40 or so that I can
 remember as of now). It won't all be about D, but D will be an
 exclusive, highlighted, part of it. Humorously, everyone is going around
 calling it "the D on OpenBSD talk" because that one blog post really
 gained traction in the *BSD community too. Anyhow, it's May 5 at 18:45
 NY time. Free and virtual (Zoom):
 https://www.nycbug.org/index?action=view&id=10683

 ~Brian
Ali
May 04
parent reply Matheus <matheus gmail.com> writes:
On Tuesday, 4 May 2021 at 23:03:55 UTC, Ali Çehreli wrote:
 I can't make it due to a conflict but reminding everyone:
Thanks for the info, and I hope someone will share this on youtube later, for those who can't access through current alternative. Matheus.
May 04
parent reply Brian <bcallah openbsd.org> writes:
On Wednesday, 5 May 2021 at 00:18:07 UTC, Matheus wrote:
 On Tuesday, 4 May 2021 at 23:03:55 UTC, Ali Çehreli wrote:
 I can't make it due to a conflict but reminding everyone:
Thanks for the info, and I hope someone will share this on youtube later, for those who can't access through current alternative. Matheus.
Yes. Speaking as one of the admins of NYC*BUG (where I'm giving this talk), we record all of our talks and they are posted. ~Brian
May 05
parent matheus <matheus gmail.com> writes:
On Wednesday, 5 May 2021 at 15:21:17 UTC, Brian wrote:
 On Wednesday, 5 May 2021 at 00:18:07 UTC, Matheus wrote:
 On Tuesday, 4 May 2021 at 23:03:55 UTC, Ali Çehreli wrote:
 I can't make it due to a conflict but reminding everyone:
Thanks for the info, and I hope someone will share this on youtube later, for those who can't access through current alternative. Matheus.
Yes. Speaking as one of the admins of NYC*BUG (where I'm giving this talk), we record all of our talks and they are posted. ~Brian
Awesome and I'll wait for it! Thanks. Matheus.
May 05