digitalmars.D.announce - j2d - translating Java to D with the language machine

Peri Hankey (32/32) Mar 27 2006 Hello

kris (2/42) Mar 27 2006 Nice! :)
Walter Bright (3/7) Mar 27 2006 What are you using for a lexer?

Brad Anderson (31/39) Mar 27 2006 Walter, I just started with The Language Machine, and am no expert by

Peri Hankey (59/117) Mar 28 2006 Brad - that was good. Walter, as Brad says, lexing and grammar are not

Peri Hankey <mpah thegreen.co.uk> writes:

Hello

Following an exchange of emails with Brad Anderson, I have adapted my
d-to-d translator to translate from Java to D. It's at a very early
stage (I started a couple of days ago). It seems to me that the first
thing to do is to apply it to the GNU Classpath sources.

Results so far:

               j2d ok? gdc syntax ok? gdc compile?  source lines
   java.lang   y       y              n             33871
   java.util   y       y              n             56636
   java.io     y       y              n             23187

This is quick and dirty so far: the code is probably wrong in many
places, and mapping Java classes and run-time class information may be
tricky - I'm sure many of you will have already thought about this. At
present compilation fails because modules are not being found, but when
they are found there will be a different crop of incompatible
this-that-the-other errors. Also, the java.lang modules need to be
included automatically, since Java imports java.lang implicitly.

So there is a great deal to do. The sources (about 700 rules, 1100 
lines) are in SVN at dsource - it's easiest to start at

    http://languagemachine.sourceforge.net/j2d.html

and follow the link. I am developing on Linux, and this is definitely a 
project that needs shared libraries. The easiest way to join the chase 
is to use:

* the language machine (required)
* gdc (0.17) - for shared libraries on Linux
* GNU make - (or roll your own build tools)
* GNU Classpath sources: http://www.gnu.org/software/classpath/

Suggestions, feedback, assistance all welcome.

Peri

-- 
Peri Hankey                               mpah thegreen.co.uk
http://languagemachine.sourceforge.net - The language machine

Mar 27 2006

kris <foo bar.com> writes:

Nice! :)

Mar 27 2006

"Walter Bright" <newshound digitalmars.com> writes:

"Peri Hankey" <mpah thegreen.co.uk> wrote in message 
news:e09ta6$2rco$1 digitaldaemon.com...
> Following an exchange of emails with Brad Anderson I have adapted my
> d-to-d translator to translate from java to D. It's at a very early stage
> (I started a couple of days ago). It seems to me that the first thing to
> do is to apply it to the gnu classpath sources.

What are you using for a lexer?

Mar 27 2006

Brad Anderson <brad dsource.dot.org> writes:

Walter Bright wrote:
> "Peri Hankey" <mpah thegreen.co.uk> wrote in message
> news:e09ta6$2rco$1 digitaldaemon.com...
>> Following an exchange of emails with Brad Anderson I have adapted my
>> d-to-d translator to translate from java to D. It's at a very early stage
>> (I started a couple of days ago). It seems to me that the first thing to
>> do is to apply it to the gnu classpath sources.
>
> What are you using for a lexer?

Walter, I just started with The Language Machine, and am no expert by 
any means, but I'll give it a shot...  The LM is kind of a lexer and 
parser rolled into one.

http://languagemachine.sourceforge.net/ is a good intro, as is the
"paradigm shift" page.

Rules like:
[a-z_A-Z] % { { repeat [a-zA-Z_0-9] % } toSym:X } <- identifier symbol :X ;

say that a letter or underscore, followed by zero or more letters,
digits or underscores, is collected into a symbol X, and the right-hand
side of the rule says these are symbols (tokens in a normal lexer).
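For comparison with a conventional lexer, here is a rough Python analogue of that identifier rule. It is a sketch, not Language Machine code; the name lex_identifier and the ("symbol", name) token shape are made up for illustration:

```python
import re

# Sketch only: a Python analogue of the identifier rule, not LM code.
# A letter or underscore, then zero or more letters, digits or underscores.
IDENT = re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*")

def lex_identifier(text, pos):
    """Return (("symbol", name), end_pos) if an identifier starts at pos, else None."""
    m = IDENT.match(text, pos)
    if m is None:
        return None
    return ("symbol", m.group()), m.end()
```

For example, lex_identifier("foo_1 + x", 0) yields the symbol foo_1 and the position just after it, while "9abc" is rejected because an identifier may not start with a digit.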

Later on, rules like:
"instanceof" type :B <- op opnd:{ ft  :"ty"   :A :B };

tell us that the Java 'instanceof' operator becomes an 'ft' in the
intermediate representation.
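A rough Python sketch of what that frontend rule does, assuming the left operand A has already been recognised. Only the ("ft", "ty", A, B) node shape comes from the rule; the function name and the list-of-strings token stream are hypothetical:

```python
# Sketch only: the frontend rule's effect, in Python. Only the node shape
# ("ft", "ty", A, B) comes from the rule; the token list and the function
# name are hypothetical.

def instanceof_to_ir(left, tokens):
    """`left` is an already-recognised operand (A); `tokens` is the input that
    follows it. Return (ir_node, remaining_tokens), or None if no instanceof."""
    if len(tokens) >= 2 and tokens[0] == "instanceof":
        type_name = tokens[1]                     # B: the type after the keyword
        return ("ft", "ty", left, type_name), tokens[2:]
    return None
```

So instanceof_to_ir("obj", ["instanceof", "Foo", ")"]) produces the node ("ft", "ty", "obj", "Foo") and leaves [")"] for the rest of the parse.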

These are the Java to X frontend rules contained here:
http://www.dsource.org/projects/languagemachine/browser/trunk/languagemachine/src/j2d/j2xfe.lmn
(X being the intermediate representation).

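Then a set of X-to-D backend rules takes that 'ft' and turns it into
the D equivalent of instanceof wherever the two languages differ, as
they do in this case: D has no instanceof operator, and uses a class
cast instead, which yields null when the object is not an instance of
the class.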

The rule in the backend turns it into D code:
ft :F :A :B  <- code - "(cast(" B ")" A ")";
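The same backend step as a Python sketch. The function name emit_d is hypothetical, and F is ignored just as it is in the rule:

```python
# Sketch only: a Python stand-in for the backend rule
#     ft :F :A :B  <- code - "(cast(" B ")" A ")";
# F is ignored, as in the rule; emit_d is a hypothetical name.

def emit_d(node):
    """Turn an ("ft", F, A, B) node into D source text for a class cast."""
    if node[0] != "ft":
        raise ValueError("no backend rule for " + repr(node[0]))
    _kind, _f, a, b = node
    return "(cast(" + b + ")" + a + ")"
```

emit_d(("ft", "ty", "obj", "Foo")) gives (cast(Foo)obj). In D that expression is a Foo reference or null, so it reads naturally as an if or while condition; in other contexts an explicit test such as `!is null` may still be needed.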

The D backend can be found here:
http://www.dsource.org/projects/languagemachine/browser/trunk/languagemachine/src/j2d/x2dbe.lmn

It's a little bit like Scott Sanders' Molt project, which hacked the
Jikes compiler into emitting an XML intermediate representation of the
Java code and then used XSLT to transform the XML document into D.

hth,
BA

P.S. Peri, how'd I do?

Mar 27 2006

Peri Hankey <mpah thegreen.co.uk> writes:

Brad Anderson wrote:
 P.S. Peri, how'd I do?

Brad - that was good. Walter, as Brad says, lexing and grammar are not 
really distinguished in LM except that one writes rules in different 
ways for different purposes. The 5-minute grok:

Rules do recognition and substitution - think macro substitution with 
grammar. Rules are triggered by mismatch between a goal symbol and an 
actual or substituted input symbol - symbols are just values in a 
stream. See languagemachine.sf.net/language_machine_notes.pdf.
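To make the "triggered by mismatch" idea concrete, here is a toy Python recogniser. It only sketches the idea and is not the LM algorithm: rules are selected by the current input symbol (bottom-up), each rule's result is substituted back into the input, and recognition of the goal is retried. The rule format and the tiny grammar are invented for the example:

```python
# Toy recogniser, for illustration only. A rule is (lhs_symbols, result_symbol).
# A rule is triggered when its first lhs symbol equals the current input symbol;
# the remaining lhs symbols become sub-goals, and the result is pushed back
# onto the input as a substituted symbol.

def recognise(goal, stream, rules):
    """Try to recognise `goal` at the front of `stream` (a list of symbols).
    Return the unconsumed remainder on success, or None on failure."""
    if not stream:
        return None
    if stream[0] == goal:
        return stream[1:]                      # goal and input agree: consume it
    for lhs, result in rules:                  # mismatch: look for a rule to try
        if lhs[0] != stream[0]:
            continue                           # not triggered by this input symbol
        rest = stream[1:]
        for sym in lhs[1:]:                    # remaining lhs symbols are sub-goals
            rest = recognise(sym, rest, rules)
            if rest is None:
                break
        if rest is None:
            continue                           # this alternative failed; try the next
        out = recognise(goal, [result] + rest, rules)  # substitute, retry the goal
        if out is not None:
            return out
    return None

# A tiny invented grammar: identifiers are terms, terms can be added.
RULES = [
    (["id"], "term"),
    (["term", "+", "term"], "expr"),   # listed before the shorter rule below
    (["term"], "expr"),
]
```

recognise("expr", ["id", "+", "id"], RULES) returns [], meaning the whole input was recognised as an expr, while recognise("expr", ["id", "+"], RULES) returns ["+"], leaving the dangling operator unconsumed.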

In LMN, an initial space distinguishes rule text from wiki-style
comment. The '.grammar(...)' directives select a grammar and priority
attributes for subsequent rules - the grammar for the j2d rules is
called 'd' because that's what it was in the d2d frontend.

* single quotes: actual text (unicode)
* double quotes: arbitrary nonterminal symbols (unicode)
* identifiers (initial '_' or lowercase): nonterminal symbols
* identifiers (initial uppercase): variables
* ":"  on left side, variable binding;   on right side, provide value to binding
* "%"  on left side, acquire one symbol; on right side, provide resulting sequence
* .[a-z_]  a character in the range (initial '^' for exclusion)
*  [a-z_]  same as above, but now deprecated
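A small Python sketch of how a `.[...]` range might be tested against one character, with a leading '^' meaning exclusion. The helper name is hypothetical, and the escape handling (\t, \n, \r, \f, \v decoded, any other backslash just quoting the next character) is an assumption, not LM's documented behaviour:

```python
# Sketch only: test one character against the body of a .[...] range,
# e.g. "a-zA-Z_0-9" or "^\\'" (a leading '^' excludes the listed characters).
# Assumption: \t \n \r \f \v are decoded; any other backslash just quotes
# the next character. This is not LM's documented escape handling.
ESCAPES = {"t": "\t", "n": "\n", "r": "\r", "f": "\f", "v": "\v"}

def char_in_class(spec, ch):
    negate = spec.startswith("^")
    if negate:
        spec = spec[1:]
    found = False
    i = 0
    while i < len(spec):
        c = spec[i]
        if c == "\\" and i + 1 < len(spec):            # escaped character
            i += 1
            c = ESCAPES.get(spec[i], spec[i])
        if i + 2 < len(spec) and spec[i + 1] == "-":   # a range such as a-z
            if c <= ch <= spec[i + 2]:
                found = True
            i += 3
        else:                                          # a single character
            if ch == c:
                found = True
            i += 1
    return found != negate
```

For instance, char_in_class("a-zA-Z_0-9", "7") is true, and with the spec "^\\'" (everything except a single quote) a quote character is rejected while any other character is accepted.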

There are four possible ways of selecting a rule to try:

by goal and input symbols:      most efficient, hashed on the pair
by input symbol only:           bottom-up, any context
at each mismatch, for any pair: speculative (deprecated)
by goal symbol only:            top-down recursive descent
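One way to picture the three non-speculative selection modes is as three lookup tables consulted in the order specific, bottom-up, top-down. This Python sketch is an assumption about how such an index could look, not a description of LM's internals; RuleIndex and its methods are invented names, and the deprecated speculative mode is left out:

```python
from collections import defaultdict

class RuleIndex:
    """Hypothetical rule index: each rule is filed under one key, following
    the selection modes listed above (speculative mode omitted)."""

    def __init__(self):
        self.by_pair = defaultdict(list)   # (goal, input) -> rules, hashed on the pair
        self.by_input = defaultdict(list)  # input -> rules: bottom-up, any context
        self.by_goal = defaultdict(list)   # goal -> rules: top-down recursive descent

    def add(self, rule, goal=None, input_sym=None):
        if goal is not None and input_sym is not None:
            self.by_pair[(goal, input_sym)].append(rule)
        elif input_sym is not None:
            self.by_input[input_sym].append(rule)
        elif goal is not None:
            self.by_goal[goal].append(rule)
        else:
            raise ValueError("a rule needs a goal, an input symbol, or both")

    def candidates(self, goal, input_sym):
        """Candidate rules at a mismatch, class by class: specific, bottom-up, top-down."""
        yield from self.by_pair.get((goal, input_sym), [])
        yield from self.by_input.get(input_sym, [])
        yield from self.by_goal.get(goal, [])
```

Using .get rather than indexing keeps lookups from creating empty entries in the defaultdicts; the pair table comes first because a hash on both symbols is the cheapest and most specific match.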

The '-' symbol outside arithmetic is used to mean 'never mind':

  'wabe' <- nonsense; // recognise 'wabe' as "nonsense"
  stuff  <- -;        // never mind goal, ignore "stuff"
  - term <- expr;     // never mind input, look for "term" as "expr"
  br :X  <- code - X; // we promise "code", never mind, provide X

Rule priorities limit the ways in which rule recognition phases are
permitted to nest. Rule classes are tried in order: specific, bottom-up,
speculative, top-down. Alternatives in each class are tried 'longest'
and 'newest' first, with "fastback" backtracking to the state at the
triggering mismatch.
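A Python sketch of the "longest first, then newest first" ordering within one class of alternatives. The details of LM's "fastback" are not covered here; the sketch only shows every alternative restarting from the same saved state at the triggering mismatch. All names are hypothetical:

```python
# Sketch only: "longest first, then newest first" ordering, with every
# alternative restarted from the same saved state. Rules are stored as
# (lhs_length, definition_index, rule); all names here are hypothetical.

def order_alternatives(alternatives):
    """Longest left-hand side first; among equal lengths, most recently defined first."""
    return sorted(alternatives, key=lambda alt: (-alt[0], -alt[1]))

def try_alternatives(alternatives, saved_state, attempt):
    """attempt(rule, state) returns a new state on success or None on failure.
    Because saved_state is never mutated, a failed attempt costs nothing to
    undo: the next alternative simply starts again from the triggering state."""
    for _length, _index, rule in order_alternatives(alternatives):
        new_state = attempt(rule, saved_state)
        if new_state is not None:
            return new_state
    return None
```

With alternatives (1, 0, "a"), (2, 1, "b") and (2, 2, "c"), the order is c, b, a: the two longer rules come first, and of those the later definition wins.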

The lexical rule Brad quoted came from the d-to-d translator. Here are 
some (tidied) lexical rules from j2xfe.lmn:

  .d(b)       // rules to distinguish reserved symbols
  "w?" "r?"  :X  <-- X;                   // eg: "if" <- "r?" :"if";
  - anything :X  <-  "r?" :{ symbol :X }; // not reserved, wrap it

  .d(1000L)   // some lexical rules - high priority, left associative

  .[a-zA-Z_] { %  { repeat .[a-zA-Z_0-9] % } toSys :X }  <-- "w?" X ;

  '\'' { repeat e1 .[^\'] % } '\'' toStr:Str     <- - squote :{ Str } ;
  '\"' { repeat e2 .[^\"] % } '\"' toStr:Str     <- - dquote :{ Str } ;
  '.'    % decimal   %   dexp   % rtype % type:T <- - number % ;
  '0'    % znumber   %                    type:T <- - number % ;
  .[1-9] % { repeat .[0-9] % }  dpoint %  type:T <- - number % ;

  .[ \t\n\r\f\v]     <-- ;   // ignore white space
  '//' line          <- - ;  // remainder of line is comment
  '/*' comment '*/'  <- - ;  // delimited comments (no nesting)

  .d(1010R)   // within atoms - higher priority, right associative

Subsequent rules deal with what happens within a lexical atom - numeric
literals, quoted strings, comment text etc. They have higher priority,
so white space is not ignored within quotes, numbers, etc., and they are
right associative to permit nesting and recursion.
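The lexical rules above map onto a fairly conventional tokenizer. Here is a rough Python sketch for comparison; it is not a translation of j2xfe.lmn. The reserved-word set is a hypothetical subset, number handling is simplified (no hex or octal), and the token kinds borrow the names squote, dquote, number and symbol from the rules. As with the higher-priority atom rules, white space inside a quoted literal is kept rather than skipped:

```python
import re

# Hypothetical subset of Java reserved words, for illustration only.
RESERVED = {"if", "else", "while", "for", "return", "class", "new", "instanceof"}

# Alternatives are tried left to right at each position, so a quoted string
# or a comment is consumed whole and the white space inside it is kept.
TOKEN_RE = re.compile(r"""
      (?P<ws>[ \t\n\r\f\v]+)                  # white space: ignored
    | (?P<line>//[^\n]*)                      # '//' comment: rest of the line
    | (?P<block>/\*.*?\*/)                    # '/* ... */' comment, no nesting
    | (?P<word>[a-zA-Z_][a-zA-Z_0-9]*)        # identifier or reserved word
    | (?P<squote>'(?:\\.|[^'\\])*')           # single-quoted literal
    | (?P<dquote>"(?:\\.|[^"\\])*")           # double-quoted literal
    | (?P<number>(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?[a-zA-Z]?)  # simplified
    | (?P<punct>.)                            # anything else: one character
""", re.VERBOSE | re.DOTALL)

def tokenize(src):
    """Return (kind, text) tokens; white space and comments are dropped."""
    tokens = []
    for m in TOKEN_RE.finditer(src):
        kind, text = m.lastgroup, m.group()
        if kind in ("ws", "line", "block"):
            continue
        if kind == "word":
            # Like the reserved-symbol rules: reserved words stay as they are,
            # anything else is wrapped as a symbol.
            tokens.append(("reserved", text) if text in RESERVED else ("symbol", text))
        elif kind in ("squote", "dquote"):
            tokens.append((kind, text[1:-1]))      # keep the body, drop the quotes
        else:
            tokens.append((kind, text))
    return tokens
```

For example, tokenize('return "a b"; // done') keeps the space inside "a b" and drops the trailing comment.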

I'm sorry if that was too long - won't do it again.
Peri

Mar 28 2006
