digitalmars.D.learn - Get address of label?

Heywood Floyd (21/21) Dec 25 2010 Is this possible somehow:

Simen kjaeraas (23/40) Dec 25 2010 Absolutely:

Heywood Floyd (38/83) Dec 25 2010 Thanks for the answer!

Simen kjaeraas (52/98) Dec 25 2010 enum opTable : int {

bearophile (167/171) Dec 26 2010 This is true in theory, and I remember Walter liking this optimization. ...

Heywood Floyd (5/207) Dec 26 2010 Thank you bearophile and Simen for your replies! Very helpful!

bearophile (4/5) Dec 25 2010 In this simple case Simen kjaeraas has shown you a solution. But in gene...

Heywood Floyd <soul8o8 gmail.com> writes:

Is this possible somehow:

	int op(int r, int i)
	{
		static auto tbl = [&add, &sub, &mul];
		goto tbl[i % 3];
	
		add:
			r++;
			goto end;
		sub:
			r--;
			goto end;
		mul:
			r*=r;
			goto end;
		end:
		return r;
	}

Happy holidays!
BR
/HF

Dec 25 2010

"Simen kjaeraas" <simen.kjaras gmail.com> writes:

Heywood Floyd <soul8o8 gmail.com> wrote:

 Is this possible somehow:
      int op(int r, int i)
     {
         static auto tbl = [&add, &sub, &mul];
         goto tbl[i % 3];
            add:
             r++;
             goto end;
         sub:
             r--;
             goto end;
         mul:
             r*=r;
             goto end;
         end:
         return r;
     }

Absolutely:

enum ops : int {
     add = 0,
     sub = 1,
     mul = 2;
}

int op( int r, int i ) {
     switch( i % 3 ) {
         case add:
             r++;
             break;
         case sub:
             r--;
             break;
         case mul:
             r*=r;
             break;
     }
     return r;
}

-- 
Simen

Dec 25 2010

Heywood Floyd <soul8o8 gmail.com> writes:

Thanks for the answer!
Ok, hm, how about this then:

	auto opTable = [&op_add, &op_cmp, &op_end]; //etc.

	ubyte[] eval(ubyte[] prog)
	{
		int ip = 0, sp = 0;
		ubyte[4096] stack;

		next:
			goto opTable[prog[ip++] & OP_TABLE_MASK];
		
		op_add:
			stack[sp] += stack[sp-1];
			goto next;
		
		op_cmp:
			stack[sp] = stack[sp-1] == stack[sp-2];
			goto next;
	
		/// and so on...
	
		op_end:
			return stack;
	}


What I'm looking for here is a way of interpreting code without creating
branches in the machine code, unless the interpreted code actually does a
branch (ie a conditional jump). Seems to me a switch would introduce branching
(?) of some sort.

I mean, even if switch is implemented as a jump table, it would still do some
basic bounds checking, or?

I'm also interested in trying to inline the "next"-operation here, ie like

	string op_next(){ return "goto opTable[prog[ip++] & OP_TABLE_MASK];"; }
	//...
	op_add:
		stack[sp] += stack[sp-1];
		mixin(op_next());

..in order to reduce a jump. Of course I'm just playing around with different
strategies for creating a fast interepreter. In C, at least, using a jump table
instead of a switch is faster, especially in 32-bit mode (according to some
very simple experiments, which may or may not hold water in reality�).

Any ideas for how to make a jump-table interpreter in D? Is it doable with
inline asm perhaps? If at least not for any other reason than to investigate if
it's faster. (Or is it a stupid idea to begin with? Is this overkill? : )

cheers!
BR
/HF


PS. How do you export assembler code from the DMD-compiler?










Simen kjaeraas Wrote:

 Heywood Floyd <soul8o8 gmail.com> wrote:
 
 Is this possible somehow:
      int op(int r, int i)
     {
         static auto tbl = [&add, &sub, &mul];
         goto tbl[i % 3];
            add:
             r++;
             goto end;
         sub:
             r--;
             goto end;
         mul:
             r*=r;
             goto end;
         end:
         return r;
     }

 
 Absolutely:
 
 enum ops : int {
      add = 0,
      sub = 1,
      mul = 2;
 }
 
 int op( int r, int i ) {
      switch( i % 3 ) {
          case add:
              r++;
              break;
          case sub:
              r--;
              break;
          case mul:
              r*=r;
              break;
      }
      return r;
 }
 
 -- 
 Simen

Dec 25 2010

"Simen kjaeraas" <simen.kjaras gmail.com> writes:

Heywood Floyd <soul8o8 gmail.com> wrote:

 Thanks for the answer!
 Ok, hm, how about this then:
 	auto opTable =3D [&op_add, &op_cmp, &op_end]; //etc.
 	ubyte[] eval(ubyte[] prog)
 	{
 		int ip =3D 0, sp =3D 0;
 		ubyte[4096] stack;
 		next:
 			goto opTable[prog[ip++] & OP_TABLE_MASK];
 		=

 		op_add:
 			stack[sp] +=3D stack[sp-1];
 			goto next;
 		=

 		op_cmp:
 			stack[sp] =3D stack[sp-1] =3D=3D stack[sp-2];
 			goto next;
 	=

 		/// and so on...
 	=

 		op_end:
 			return stack;
 	}


enum opTable : int {
     op_add,
     op_cmp,
     op_end,
     // etc
}

ubyte[] eval(ubyte[] prog) pure {
     int ip =3D 0, sp =3D 0;
     ubyte[4096] stack;

     while ( true ) {
         final switch ( cast( opTable )( prog[ip++] & OP_TABLE_MASK ) ) =
{
             case op_add:
                 stack[sp] +=3D stack[sp-1];
                 continue;
             case op_cmp:
                 // blahblahblah
                 continue;
             // ???
             case op_end: // Profit!
                 return stack;
         }
     }
}

 What I'm looking for here is a way of interpreting code without creati=

ng  =

 branches in the machine code, unless the interpreted code actually doe=

s  =

 a branch (ie a conditional jump). Seems to me a switch would introduce=

  =

 branching (?) of some sort.

Seems to me a goto would introduce a branch, so I'm not sure doing if
your way actually causes less branching.


 I mean, even if switch is implemented as a jump table, it would still =

do  =

 some basic bounds checking, or?

Final switch to the rescue!
http://digitalmars.com/d/2.0/statement.html#FinalSwitchStatement

Essentially, mark the switch as final, and cover every option.
Likely, the optimizer does that for you if you cover every option but
don't mark the switch as final.


 I'm also interested in trying to inline the "next"-operation here, ie =

 =

 like
 	string op_next(){ return "goto opTable[prog[ip++] & OP_TABLE_MASK];";=

 }
 	//...
 	op_add:
 		stack[sp] +=3D stack[sp-1];
 		mixin(op_next());
 ..in order to reduce a jump. Of course I'm just playing around with  =

 different strategies for creating a fast interepreter. In C, at least,=

  =

 using a jump table instead of a switch is faster, especially in 32-bit=

  =

 mode (according to some very simple experiments, which may or may not =

 =

 hold water in reality=E2=84=A2).

Now this, this would not work with a simple switch, no.


 Any ideas for how to make a jump-table interpreter in D? Is it doable =

 =

 with inline asm perhaps? If at least not for any other reason than to =

 =

 investigate if it's faster. (Or is it a stupid idea to begin with? Is =

 =

 this overkill? : )

It likely is overkill, but I'm no expert in these matters. My forays
into the inline asm idea proved fruitless, but there may yet be ways.


 PS. How do you export assembler code from the DMD-compiler?

Not sure what you mean here. Do you want an assembler-listing as the
output of the compiler? If so, I don't think there's a way except for
obj2asm or similar disassemblers.

-- =

Simen

Dec 25 2010

bearophile <bearophileHUGS lycos.com> writes:

Simen kjaeraas:

 Essentially, mark the switch as final, and cover every option.
 Likely, the optimizer does that for you if you cover every option but
 don't mark the switch as final.

This is true in theory, and I remember Walter liking this optimization. But in
practice I don't know if DMD performs this optimization already, so you need to
take a look at the produced asm to be sure.

--------------------------------

This is a little test, D2 code:

enum E { e1, e2, e3 }

int foo1(E e) {
    switch (e) {
        case E.e1: return 1;
        case E.e2: return 2;
        case E.e3: return 3;
        default: assert(0);
    }
}

int foo2(E e) {
    switch (e) {
        case E.e1: return 1;
        case E.e2: return 2;
        default: return 3;
    }
}

int foo3(E e) {
    final switch (e) {
        case E.e1: return 1;
        case E.e2: return 2;
        case E.e3: return 3;
    }
}
void main() {}


_D5test4foo1FE5test31EZi
        push EAX
        test EAX,EAX
        je  L11
        cmp EAX,1
        je  L18
        cmp EAX,2
        je  L1F
        jmp short L26
L11:    pop ECX
        mov EAX,1
        ret
L18:    pop ECX
        mov EAX,2
        ret
L1F:    pop ECX
        mov EAX,3
        ret
L26:    hlt

_D5test4foo2FE5test31EZi
        push EAX
        test EAX,EAX
        je  LC
        cmp EAX,1
        je  L13
        jmp short L1A
LC:     pop ECX
        mov EAX,1
        ret
L13:    pop ECX
        mov EAX,2
        ret
L1A:    pop ECX
        mov EAX,3
        ret

_D5test4foo3FE5test31EZi
        push EAX
        test EAX,EAX
        je  L11
        cmp EAX,1
        je  L18
        cmp EAX,2
        je  L1F
        jmp short L26
L11:    pop ECX
        mov EAX,1
        ret
L18:    pop ECX
        mov EAX,2
        ret
L1F:    pop ECX
        mov EAX,3
        ret
L26:    pop EAX
        ret

--------------------------------

Some C code compiled with GCC 4.5.1:

typedef enum { e1, e2, e3 } E;

int foo2(E e) {
    switch (e) {
        case e1: return 1;
        case e2: return 2;
        default: return 3;
    }
}

int foo3(E e) {
    switch (e) {
        case e1: return 1;
        case e2: return 2;
        case e3: return 3;
    }
}

int foo4(E e) {
    static void *array[] = { &&E1, &&E2, &&E3 };

    goto *array[e];

    E1: return 1;
    E2: return 2;
    E3: return 3;
}

int main() {
    return 0;
}


_foo2:
    movl    4(%esp), %edx
    movl    $3, %eax
    cmpl    $1, %edx
    jbe L5
    rep
    ret
    .p2align 4,,7
L5:
    movl    _CSWTCH.1(,%edx,4), %eax
    ret
    .p2align 4,,15

_foo3:
    movl    4(%esp), %edx
    cmpl    $1, %edx
    je  L11
    movl    $1, %eax
    jb  L6
    cmpl    $2, %edx
    je  L13
    .p2align 4,,3
    rep
    ret
    .p2align 4,,7
L11:
    movl    $2, %eax
L6:
    .p2align 4,,3
    rep
    ret
    .p2align 4,,7
L13:
    movb    $3, %al
    ret

_foo4:
	movl	4(%esp), %eax
	jmp	*_array.1639(,%eax,4)
	.p2align 4,,7
L15:
	movl	$1, %eax
	ret
	.p2align 4,,7
L17:
	movl	$2, %eax
	ret
	.p2align 4,,7
L18:
	movl	$3, %eax
	ret

--------------------------------

 My forays into the inline asm idea proved fruitless, but there may yet be ways.

In D+DMD inline asm kills inlining, so you may use inline asm only if you need
to do lot of computations.
In LDC there are pragma(allow_inline) and asm expressions that some most of
this problem.

Going back to the OP problem: in D there are no computed gotos, that are useful
if you want to write very fast interpreters and other things. But keep in mind
that DMD supports normal gotos from and in inlined asm (LLVM-LDC doesn't
supports this well), plus naked asm, this gives you some possibilities.
An option on Linux is to write the interpreter core using GNU C (that has
computed gotos) and then link the core to the D code compiled with DMD/GDC.

It's strange how something as basic, old and necessary as a switch, to create a
basic but fast interpreter, is so hard to compile well for compilers :-)

Bye,
bearophile

Dec 26 2010

Heywood Floyd <soul8o8 gmail.com> writes:

Thank you bearophile and Simen for your replies! Very helpful! 
I'll keep looking into it...

BR
/HF



bearophile Wrote:

 Simen kjaeraas:
 
 Essentially, mark the switch as final, and cover every option.
 Likely, the optimizer does that for you if you cover every option but
 don't mark the switch as final.

 
 This is true in theory, and I remember Walter liking this optimization. But in
practice I don't know if DMD performs this optimization already, so you need to
take a look at the produced asm to be sure.
 
 --------------------------------
 
 This is a little test, D2 code:
 
 enum E { e1, e2, e3 }
 
 int foo1(E e) {
     switch (e) {
         case E.e1: return 1;
         case E.e2: return 2;
         case E.e3: return 3;
         default: assert(0);
     }
 }
 
 int foo2(E e) {
     switch (e) {
         case E.e1: return 1;
         case E.e2: return 2;
         default: return 3;
     }
 }
 
 int foo3(E e) {
     final switch (e) {
         case E.e1: return 1;
         case E.e2: return 2;
         case E.e3: return 3;
     }
 }
 void main() {}
 
 
 _D5test4foo1FE5test31EZi
         push EAX
         test EAX,EAX
         je  L11
         cmp EAX,1
         je  L18
         cmp EAX,2
         je  L1F
         jmp short L26
 L11:    pop ECX
         mov EAX,1
         ret
 L18:    pop ECX
         mov EAX,2
         ret
 L1F:    pop ECX
         mov EAX,3
         ret
 L26:    hlt
 
 _D5test4foo2FE5test31EZi
         push EAX
         test EAX,EAX
         je  LC
         cmp EAX,1
         je  L13
         jmp short L1A
 LC:     pop ECX
         mov EAX,1
         ret
 L13:    pop ECX
         mov EAX,2
         ret
 L1A:    pop ECX
         mov EAX,3
         ret
 
 _D5test4foo3FE5test31EZi
         push EAX
         test EAX,EAX
         je  L11
         cmp EAX,1
         je  L18
         cmp EAX,2
         je  L1F
         jmp short L26
 L11:    pop ECX
         mov EAX,1
         ret
 L18:    pop ECX
         mov EAX,2
         ret
 L1F:    pop ECX
         mov EAX,3
         ret
 L26:    pop EAX
         ret
 
 --------------------------------
 
 Some C code compiled with GCC 4.5.1:
 
 typedef enum { e1, e2, e3 } E;
 
 int foo2(E e) {
     switch (e) {
         case e1: return 1;
         case e2: return 2;
         default: return 3;
     }
 }
 
 int foo3(E e) {
     switch (e) {
         case e1: return 1;
         case e2: return 2;
         case e3: return 3;
     }
 }
 
 int foo4(E e) {
     static void *array[] = { &&E1, &&E2, &&E3 };
 
     goto *array[e];
 
     E1: return 1;
     E2: return 2;
     E3: return 3;
 }
 
 int main() {
     return 0;
 }
 
 
 _foo2:
     movl    4(%esp), %edx
     movl    $3, %eax
     cmpl    $1, %edx
     jbe L5
     rep
     ret
     .p2align 4,,7
 L5:
     movl    _CSWTCH.1(,%edx,4), %eax
     ret
     .p2align 4,,15
 
 _foo3:
     movl    4(%esp), %edx
     cmpl    $1, %edx
     je  L11
     movl    $1, %eax
     jb  L6
     cmpl    $2, %edx
     je  L13
     .p2align 4,,3
     rep
     ret
     .p2align 4,,7
 L11:
     movl    $2, %eax
 L6:
     .p2align 4,,3
     rep
     ret
     .p2align 4,,7
 L13:
     movb    $3, %al
     ret
 
 _foo4:
 	movl	4(%esp), %eax
 	jmp	*_array.1639(,%eax,4)
 	.p2align 4,,7
 L15:
 	movl	$1, %eax
 	ret
 	.p2align 4,,7
 L17:
 	movl	$2, %eax
 	ret
 	.p2align 4,,7
 L18:
 	movl	$3, %eax
 	ret
 
 --------------------------------
 
 My forays into the inline asm idea proved fruitless, but there may yet be ways.

 
 In D+DMD inline asm kills inlining, so you may use inline asm only if you need
to do lot of computations.
 In LDC there are pragma(allow_inline) and asm expressions that some most of
this problem.
 
 Going back to the OP problem: in D there are no computed gotos, that are
useful if you want to write very fast interpreters and other things. But keep
in mind that DMD supports normal gotos from and in inlined asm (LLVM-LDC
doesn't supports this well), plus naked asm, this gives you some possibilities.
 An option on Linux is to write the interpreter core using GNU C (that has
computed gotos) and then link the core to the D code compiled with DMD/GDC.
 
 It's strange how something as basic, old and necessary as a switch, to create
a basic but fast interpreter, is so hard to compile well for compilers :-)
 
 Bye,
 bearophile

Dec 26 2010

bearophile <bearophileHUGS lycos.com> writes:

Heywood Floyd:

 Is this possible somehow:

In this simple case Simen kjaeraas has shown you a solution. But in general
D-DMD doesn't support computed gotos yet. I have asked for them some times, in
some different ways. I guess they will added as a non-standard D exception,
hopefully with the same syntax across different compilers.

Bye,
bearophile

Dec 25 2010

D Programming

C/C++ Programming

Other

digitalmars.D.learn - Get address of label?