digitalmars.D - Self-Modifying code for user settings optimization

Jason Jeffory <JasonJeffory doodle.com> writes:

Instead of something like

DoSomething(UserSettings["width"]);

which requires an access to UserSettings that may be slow in 
time-critical code (but you still want to provide some way to 
configure various behaviors), why not use self-modifying code?


DoSomething(3); // 3 may be the default

A hack is then somehow introduced, such as modifying the "push 3" 
instruction to "push width". Width is constant for all runs, so 
the instruction itself only changes value when the default value 
is changed, which could be done at setup. (push could be mov or 
whatever.)

This would avoid pipelining issues and provide the absolute 
fastest way to have settings?

(Of course, we'd have to know where all the "push" instructions 
are located, since the modification could not be done serially; 
that would be somewhat pointless.)

Not even sure if CPUs allow SMC anymore?

Jan 09 2016

Rikki Cattermole <alphaglosined gmail.com> writes:

I've been looking into this issue for web routing.
Overall it's definitely more performant.

But:
- You need some way to generate code
- ABI compatibility
- Host binary compatibility (not the same as ABI)
- Front end for the "language" to specify what to generate

I'm either going the sljit way or my own.
ATM I'm looking at building a C frontend to help with porting sljit, 
and for future AOT generation of binaries.

Most of the work to get x86 done for sljit has been done, about 2-3k left.
https://github.com/rikkimax/sljitd

Regarding whether CPUs still allow JIT'ing code: yes, they do.
If they didn't, that CPU would be next to useless.

However, an OS is not required to expose this. But if you're dealing 
with Windows and *nix, don't worry about it.

If you're interested in helping to port sljit, please do.
Just note that it isn't a very optimized JIT, but it is fairly small and 
easy to use. Important to me is that it can be fully ported to D without 
much worry, unlike LLVM, which is a pain to compile anyway.

Jan 09 2016

Jason Jeffory <JasonJeffory doodle.com> writes:

On Saturday, 9 January 2016 at 10:41:23 UTC, Rikki Cattermole 
wrote:
> Regarding if CPU's allow for JIT'ing code, yup they do allow it still.
> If they didn't, that CPU would be next to useless.

Well, I wasn't thinking of interpreted/JIT code but native.

I suppose D could possibly do it with CTFE? (Use CTFE to keep 
track of the addresses, if possible, of where the variables are 
in memory, so they can be updated.)

e.g.,

DoSomething(Settings!"Width"); // Somehow puts in a dummy variable and keeps track of its address

x = Settings!"Width"; // Similar but different behavior. Basically turn a complex call (dictionary lookup or something similar) into a "mov x, const" instruction, etc.
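A minimal sketch of the `Settings!"Width"` idea that patches no code at all, assuming it is acceptable for each setting to live in its own global: an eponymous template that declares one `__gshared` variable per setting name. Each distinct name instantiates a separate variable, so a read becomes a load from a fixed address instead of a dictionary lookup. `Setting` and `loadSetting` are hypothetical names, not an existing library API.

```d
import std.conv : to;

// Hypothetical: each (name, type) pair instantiates its own __gshared variable,
// so Setting!"Width" and Setting!("FastCode", bool) are two separate globals.
template Setting(string name, T = int)
{
    __gshared T Setting;
}

// Hypothetical loader, run at setup time: one AA lookup per setting,
// after which the hot code only ever reads the global.
void loadSetting(string name, T = int)(string[string] cfg)
{
    if (auto p = name in cfg)
        Setting!(name, T) = to!T(*p);
}
```

Usage: call `loadSetting!"Width"(cfg)` once at setup, then read `Setting!"Width"` in the hot code. Changing a value at run time is a plain store to that global, with no recompile; the read is still a memory load, so it does not remove the cache-miss concern.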

Mainly I'm thinking about switches like

if (Settings["FastCode"])
{

}

but I want to remove the lookup.

Hence maybe something like

if (volatile bool x = true) { }

But then somehow capture x's address (not sure if we could 
accomplish that in D?) so that we can easily change its value 
outside the time-critical code when needed.


Sorry, I can't help with sljit... I have way too many things on 
my plate; at some point I might if stuff changes. I'll look into 
it a little, though.

Thanks

Jan 09 2016

Rikki Cattermole <alphaglosined gmail.com> writes:

On 10/01/16 12:32 AM, Jason Jeffory wrote:
> I suppose D could possibly do it with CTFE? (Create the CTFE to keep
> track of the the addresses, if possible, of where the variables are at
> in memory, so they can be updated).

What I think you're wanting is a little too 'magical' for compilers, 
especially dmd, to do.

I would recommend using enums and static if, or just go ahead and set 
it to a global variable.
Enums are free, and global variables may have cache-miss issues, but 
that will be better than doing an AA lookup every time.
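A minimal sketch of the two options, assuming a hypothetical `ComputeEXACT` flag held in a `bool[string]` settings table: an `enum` tested with `static if` removes the untaken branch at compile time (changing it needs a recompile), while a `__gshared` global is filled once at startup so the hot path reads one variable instead of doing an AA lookup. The arithmetic is placeholder work, not a real computation.

```d
// Compile-time option: flip the enum and rebuild to change behaviour.
enum computeExactCT = false;

// Run-time option: filled once at startup, then read as a plain global.
__gshared bool computeExactRT;

// Hypothetical setup step: the only AA lookup, done outside the hot path.
void initSettings(bool[string] userSettings)
{
    computeExactRT = userSettings.get("ComputeEXACT", false);
}

// Placeholder arithmetic stands in for the exact and approximate paths.
int stepCT(int x)
{
    static if (computeExactCT)
        return x * 2;       // exact path: not compiled while the enum is false
    else
        return x + x / 2;   // approximate path
}

int stepRT(int x)
{
    // One load of a global per call; no hashing.
    return computeExactRT ? x * 2 : x + x / 2;
}
```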

Jan 09 2016

John Colvin <john.loughran.colvin gmail.com> writes:

On Saturday, 9 January 2016 at 11:38:06 UTC, Rikki Cattermole 
wrote:
> Enums are free and global variables may have cache misses issue

An enum isn't guaranteed to be embedded in the instruction 
stream, there's still plenty of opportunities for cache misses.

Jan 09 2016

Rikki Cattermole <alphaglosined gmail.com> writes:

On 10/01/16 3:50 AM, John Colvin wrote:
> On Saturday, 9 January 2016 at 11:38:06 UTC, Rikki Cattermole wrote:
>> Enums are free and global variables may have cache misses issue
>
> An enum isn't guaranteed to be embedded in the instruction stream,
> there's still plenty of opportunities for cache misses.

enum FOO = true;

static if (FOO) {
	doThis();
} else {
	doThat();
}

No need for the enum to be embedded in the instruction stream, 
because it won't be. The else block just doesn't get compiled in.

Jan 09 2016

John Colvin <john.loughran.colvin gmail.com> writes:

On Saturday, 9 January 2016 at 14:55:27 UTC, Rikki Cattermole 
wrote:
> enum FOO = true;
>
> static if (FOO) {
> 	doThis();
> } else {
> 	doThat();
> }
>
> No need for enum to be embedded in the instruction stream.
> Because it won't be. The else block just doesn't get compiled 
> in.

Of course, I just meant that when reading a global or an enum, 
the enum isn't necessarily cheaper. static if f.t.w.

Jan 09 2016

Jason Jeffory <JasonJeffory doodle.com> writes:

On Saturday, 9 January 2016 at 11:38:06 UTC, Rikki Cattermole 
wrote:
> What I think you're wanting is a little to 'magical' for 
> compilers especially dmd to do.

It might, which is why I asked. It seems like it would be 
something trivial to do if the address of the function and the 
relative address of the "variable" can be obtained at "compile 
time" (not sure it is possible, but maybe one could write an 
object parser).

> I would recommend using enum's and static if or just go ahead 
> and set it to a global variable.
> Enums are free and global variables may have cache misses 
> issue, but it will be better then doing an AA lookup every time.


I don't think either of these works. The globals have the cache 
issue and pollute the namespace. Enums are static; changing a 
setting requires a recompile. I'm talking about changing them at 
run time. Do you follow?

e.g.,

void Compute()
{
...
    for(;;)
    {
       if (Settings["ComputeEXACT"])
       {
          // Slower but more exact
       } else
       {
          // Fast but worse approximation
       }
    }
}

Obviously in some cases we can rearrange the order to avoid 
looking up ComputeEXACT in the loop, but assume this is not the 
case. The AA lookup is too slow and adds too much overhead.

Now suppose we use a local variable

void Compute()
{
    // We could use Settings["ComputeEXACT"] here, but assume
    // Compute() may be used in other loops through chaining.
    bool ComputeEXACT = false;
...
    for(;;)
    {
       if (ComputeEXACT)
       {
          // Slower but more exact
       } else
       {
          // Fast but worse approximation
       }
    }
}


But now ComputeEXACT behaves like the enum; changing it 
essentially requires recompilation. The only way around that is 
to modify ComputeEXACT by code. This is very easy to do if we know 
where it is. We should be able to know, assuming it isn't 
optimized out (prevent that using volatile or whatever). It 
should be on the stack, right? Since false is a constant, a simple 
instruction should be generated to push it there. We can change 
this!! (It would be platform dependent, but not hard.)

Alternatively, we could use a simple array to hold all the 
settings, this requires an indirection.
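A minimal sketch of the array variant, assuming boolean settings and a hypothetical `Opt` enum that names them. Because `opts` is a fixed-size `__gshared` static array and each index is a compile-time constant, `opts[Opt.computeExact]` should typically compile to a load from a fixed address rather than a pointer chase; the exact code depends on the compiler and target.

```d
// Hypothetical list of boolean settings; the enum doubles as the index.
enum Opt { computeExact, fastCode }

// One slot per setting, stored contiguously in a single global array.
__gshared bool[Opt.max + 1] opts;

// Hypothetical setup step: copy values from a name-keyed table,
// using each enum member's name ("computeExact", ...) as the key.
void loadOpts(bool[string] table)
{
    import std.conv : to;
    import std.traits : EnumMembers;

    foreach (o; EnumMembers!Opt)
        opts[o] = table.get(to!string(o), false);
}
```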

Even better, just patch the branch in the loop directly, e.g. turn 
a jne into a je. This would be the fastest way!?
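Without patching instructions, a similar effect to flipping the `jne`/`je` can be had by hoisting the test out of the loop: read the flag once per call and dispatch to one of two instantiations of a loop templated on a `bool`, so neither instantiation contains the branch. This is a minimal sketch of that swapped-in technique (manual loop unswitching), assuming hypothetical `computeLoop`/`compute` names and placeholder arithmetic for the two paths.

```d
// `exact` is a compile-time parameter, so each instantiation has no branch.
double computeLoop(bool exact)(const double[] xs)
{
    double sum = 0;
    foreach (x; xs)
    {
        static if (exact)
            sum += x / 3.0;   // placeholder for the slower, exact path
        else
            sum += x * 0.33;  // placeholder for the fast approximation
    }
    return sum;
}

// The setting is tested once per call, outside the loop.
double compute(bool exact, const double[] xs)
{
    return exact ? computeLoop!true(xs) : computeLoop!false(xs);
}
```

The cost is code size: each boolean template parameter doubles the number of loop bodies generated.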

This all requires knowing how to get the addresses of stuff at 
compile time.

The goals:

1. Avoid any lookup of memory locations except possibly off the 
stack.
2. Modify specific code-memory locations at runtime.

My guess is that D can't do this out of the box, but maybe it can 
be accomplished with an object-code parser that builds a list of 
all the addresses that need modifying (this might require a 
two-pass compilation, or the object parser could modify the code 
to "correct" it). Then the core update routine could be written in 
D directly (assume all settings are bool for now):

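// N and each entry's address are filled in by the object-code parser after compilation.
struct PatchSite { void* address; bool value; }
__gshared PatchSite[N] Settings;

void UpdateSettings()
{
    foreach (ref s; Settings)
        Modify(s.address, s.value); // patch the instruction at s.address
}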

So you call UpdateSettings, and it patches every address with the 
setting's value. The object parser comes in after the code has 
been compiled and fills in the correct values for N and the 
addresses.

(Of course, this is dangerous, needless to say!)

Jan 09 2016

Rikki Cattermole <alphaglosined gmail.com> writes:

alias Lookup = bool[string]; // assumed: settings table keyed by name

interface IFoo {
	void a();
	void b();
}

__gshared IFoo a, b;
__gshared IFoo instance;

class Foo(bool bar) : IFoo {
	void a() {
		static if (bar) {
			// do something
		} else {
			// do nothing
		}
	}

	void b() {
		// specialized on bar in the same way
	}
}

shared static this() {
	a = new Foo!true;
	b = new Foo!false;
}

void update(Lookup lookup) {
	if (lookup["bar"])
		instance = a;
	else
		instance = b;
}

There is a small indirection when executing, to find which function 
to run, but that is the best we can do within the language semantics, 
and it only works for booleans.
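A lighter variant of the same idea, sketched under the assumption that only one hot function depends on the flag: template the function on a `bool`, keep a `__gshared` function pointer, and retarget it in `update`. Each call then costs one indirect call with no branch in the body, and no extra classes are needed. `work`, `hot` and `update` are hypothetical names, and `Lookup` is assumed to be a `bool[string]`.

```d
alias Lookup = bool[string]; // assumption: settings table keyed by name

// Two copies of the body are generated; the untaken branch is compiled out.
int work(bool bar)(int x)
{
    static if (bar)
        return x + 1;   // placeholder for "do something"
    else
        return x;       // placeholder for "do nothing"
}

// The pointer the hot code calls through; retargeted outside the hot path.
__gshared int function(int) hot = &work!false;

void update(Lookup lookup)
{
    hot = lookup.get("bar", false) ? &work!true : &work!false;
}
```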

Jan 09 2016

Jason Jeffory <JasonJeffory doodle.com> writes:

On Saturday, 9 January 2016 at 23:43:32 UTC, Rikki Cattermole 
wrote:
> Small indirection when executing to find which function to 
> execute but that is the best out of language semantics we have 
> and only works for booleans.

I see what you are saying... it should work. It seems like a lot 
of bloat for something relatively trivial, though. Changing the 
whole context to change a single branch, and multiplying the 
number of types, might have some long-term consequences. It also 
complicates the code quite a bit... more prone to errors.

Maybe with a bit of ingenuity these can be overcome.

Jan 09 2016

Jay Norwood <jayn prismnet.com> writes:

On Saturday, 9 January 2016 at 21:09:05 UTC, Jason Jeffory wrote:
> It might, which is why I asked, seems like it would be 
> something trivial to do if the address of the function and 
> relative address of the "variable" can be gotten at "compile 
> time"(not sure it is possible by maybe one could write an 
> object parser).

There is debug line info, but good luck with most of that after 
the optimizer gets through with the code.

This project provides an API for code patching. Maybe it will 
help, or at least give you ideas.
http://www.dyninst.org/dyninst

I also have some interest in the ability to add arbitrary named 
markers to code at compile time that could be accessed from 
symbol info.  I'm not interested in modifying the code, but in 
using the addresses to create windows for code measurement.  Our 
hardware supports performance analysis limited to a specified 
address range without instrumenting the code, but with optimized 
code it is difficult to use.

Jan 10 2016
