digitalmars.D - Self-Modifying code for user settings optimization

reply Jason Jeffory <JasonJeffory doodle.com> writes:
Instead of something like

DoSomething(UserSettings["width"]);

which requires a lookup in UserSettings that may be slow in 
time-critical code (while you still want to provide some way to 
configure various behaviors), why not use self-modifying code?


DoSomething(3); // 3 may be the default, but a hack is somehow 
introduced, such as modifying the "push 3" instruction to "push 
width". (width is constant for all runs; the instruction itself 
changes only when the default value is changed, which could be 
done at setup. The push could be a mov or whatever.)

Would this avoid pipelining issues and provide the absolute 
fastest way to have settings?

(Of course, we'd have to know where all the "push" instructions 
are located, since the modification could not occur serially; 
that would be somewhat pointless.)

I'm not even sure whether CPUs still allow SMC (self-modifying code)?
Jan 09
parent reply Rikki Cattermole <alphaglosined gmail.com> writes:
I've been looking into this issue for web routing.
Overall it's definitely more performant.

But:
- You need some way to generate code
- ABI compatibility
- Host binary compatibility (not the same as ABI)
- Front end for the "language" to specify what to generate

I'm either going the sljit way or writing my own.
ATM I'm looking at building a C frontend to help with porting sljit 
and for future AOT generation of binaries.

Most of the work to get x86 done for sljit has been done, about 2-3k left.
https://github.com/rikkimax/sljitd

Regarding whether CPUs allow JIT'ing code: yup, they still allow it.
If they didn't, that CPU would be next to useless.

However, an OS is not required to expose this. But if you're dealing 
with Windows and *nix, don't worry about it.

If you're interested in helping to port sljit, please do.
Just note that it isn't a very optimized JIT, but it is fairly small and 
easy to use. Important to me is that it can be fully ported to D without 
much worry, unlike LLVM, which is a pain to compile anyway.
Jan 09
parent reply Jason Jeffory <JasonJeffory doodle.com> writes:
On Saturday, 9 January 2016 at 10:41:23 UTC, Rikki Cattermole 
wrote:
 I've been looking into this issue for web routing.
 Overall it's definitely more performant.

 But:
 - You need some way to generate code
 - ABI compatibility
 - Host binary compatibility (not the same as ABI)
 - Front end for the "language" to specify what to generate

 I'm either going the sljit way or writing my own.
 ATM I'm looking at building a C frontend to help with porting 
 sljit and for future AOT generation of binaries.

 Most of the work to get x86 done for sljit has been done, about 
 2-3k left.
 https://github.com/rikkimax/sljitd

 Regarding whether CPUs allow JIT'ing code: yup, they still 
 allow it.
 If they didn't, that CPU would be next to useless.

 However, an OS is not required to expose this. But if you're 
 dealing with Windows and *nix, don't worry about it.

 If you're interested in helping to port sljit, please do.
 Just note that it isn't a very optimized JIT, but it is fairly 
 small and easy to use. Important to me is that it can be fully 
 ported to D without much worry, unlike LLVM, which is a pain 
 to compile anyway.
Well, I wasn't thinking of interpreted/JIT code but native code. I suppose D could possibly do it with CTFE? (Use CTFE to keep track of the addresses, if possible, of where the variables are in memory, so they can be updated.)

e.g.,

DoSomething(Settings!"Width"); // Somehow puts in a dummy variable and keeps track of its address

x = Settings!"Width"; // similar but different behavior

Basically, turn a complex call (a dictionary lookup or something similar) into a "mov x, const" instruction, etc.

Mainly I'm thinking about switches like

if (Settings["FastCode"]) { }

where I want to remove the lookup. Hence maybe something like

if (volatile bool x = TRUE) { }

but then somehow capture x's address (not sure if we could accomplish that in D?), so that we can easily change its value outside the time-critical code when needed.

Sorry, I can't help with sljit... I have way too many things on my plate. At some point I might, if stuff changes. I'll look into it a little, though.

Thanks
Jan 09
parent reply Rikki Cattermole <alphaglosined gmail.com> writes:
On 10/01/16 12:32 AM, Jason Jeffory wrote:
 Well, I wasn't thinking of interpreted/JIT code but native code. I suppose D could possibly do it with CTFE? (Use CTFE to keep track of the addresses, if possible, of where the variables are in memory, so they can be updated.)

 e.g.,

 DoSomething(Settings!"Width"); // Somehow puts in a dummy variable and keeps track of its address

 x = Settings!"Width"; // similar but different behavior

 Basically, turn a complex call (a dictionary lookup or something similar) into a "mov x, const" instruction, etc.

 Mainly I'm thinking about switches like

 if (Settings["FastCode"]) { }

 where I want to remove the lookup. Hence maybe something like

 if (volatile bool x = TRUE) { }

 but then somehow capture x's address (not sure if we could accomplish that in D?), so that we can easily change its value outside the time-critical code when needed.

 Sorry, I can't help with sljit... I have way too many things on my plate. At some point I might, if stuff changes. I'll look into it a little, though.

 Thanks
What I think you're wanting is a little too 'magical' for compilers, especially dmd, to do.

I would recommend using enums and static if, or just go ahead and set it to a global variable.
Enums are free, and global variables may have cache-miss issues, but that will still be better than doing an AA lookup every time.
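
A minimal sketch of the global-variable suggestion, assuming the settings live in a bool[string] associative array. The AA is consulted once at setup and the hot loop reads a plain global instead. The names userSettings, gFastCode, loadSettings and hotLoop are hypothetical and do not come from the thread.

```d
import std.stdio : writeln;

// Hypothetical settings store; the thread only calls it "Settings".
__gshared bool[string] userSettings;

// Plain global that the hot path reads instead of doing an AA lookup.
__gshared bool gFastCode;

// Run once at setup, or whenever the user changes a setting.
void loadSettings()
{
    // get() returns the supplied default when the key is absent.
    gFastCode = userSettings.get("FastCode", false);
}

// Time-critical code: the test reads a global, no hashing involved.
int hotLoop(int n)
{
    int fastIterations = 0;
    foreach (i; 0 .. n)
    {
        if (gFastCode)
            ++fastIterations;
    }
    return fastIterations;
}

void main()
{
    userSettings["FastCode"] = true;
    loadSettings();
    writeln(hotLoop(1000));
}
```

The setting can still change at run time; the cost of the lookup is simply moved out of the loop and into loadSettings.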
Jan 09
next sibling parent reply John Colvin <john.loughran.colvin gmail.com> writes:
On Saturday, 9 January 2016 at 11:38:06 UTC, Rikki Cattermole 
wrote:
 Enums are free and global variables may have cache misses issue
An enum isn't guaranteed to be embedded in the instruction stream; there are still plenty of opportunities for cache misses.
Jan 09
parent reply Rikki Cattermole <alphaglosined gmail.com> writes:
On 10/01/16 3:50 AM, John Colvin wrote:
 On Saturday, 9 January 2016 at 11:38:06 UTC, Rikki Cattermole wrote:
 Enums are free and global variables may have cache misses issue
 An enum isn't guaranteed to be embedded in the instruction stream; there are still plenty of opportunities for cache misses.
enum FOO = true;

static if (FOO) {
	doThis();
} else {
	doThat();
}

No need for enum to be embedded in the instruction stream. Because it won't be. The else block just doesn't get compiled in.
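
As a self-contained sketch of the snippet above: doThis and doThat are given hypothetical bodies here, and run is a hypothetical wrapper added only so the result can be observed.

```d
import std.stdio : writeln;

// Compile-time setting: changing it requires a rebuild.
enum FOO = true;

// Hypothetical bodies; the post only names these functions.
string doThis() { return "this"; }
string doThat() { return "that"; }

string run()
{
    static if (FOO)
    {
        // Only this branch is compiled when FOO is true.
        return doThis();
    }
    else
    {
        // Not compiled into run() at all when FOO is true.
        return doThat();
    }
}

void main()
{
    writeln(run());
}
```

Because the choice is made during compilation, there is no run-time test of FOO inside run, which is also why this approach cannot react to a setting changed by the user at run time.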
Jan 09
parent John Colvin <john.loughran.colvin gmail.com> writes:
On Saturday, 9 January 2016 at 14:55:27 UTC, Rikki Cattermole 
wrote:
 enum FOO = true;

 static if (FOO) {
 	doThis();
 } else {
 	doThat();
 }

 No need for enum to be embedded in the instruction stream. Because it won't be. The else block just doesn't get compiled in.
Of course, I just meant that when reading a global or an enum, enum isn't necessarily cheaper. static if f.t.w.
Jan 09
prev sibling parent reply Jason Jeffory <JasonJeffory doodle.com> writes:
On Saturday, 9 January 2016 at 11:38:06 UTC, Rikki Cattermole 
wrote:
 What I think you're wanting is a little too 'magical' for compilers, especially dmd, to do.
It might be, which is why I asked. It seems like it would be trivial to do if the address of the function and the relative address of the "variable" can be obtained at "compile time" (not sure it is possible, but maybe one could write an object parser).
 I would recommend using enums and static if, or just go ahead 
 and set it to a global variable.
 Enums are free, and global variables may have cache-miss 
 issues, but that will still be better than doing an AA lookup every time.
I don't think either of these works. The globals have the cache issue and pollute the namespace. Enums are static: changing a setting requires a recompile. I'm talking about changing them at run time. Do you follow?

e.g.,

void Compute()
{
    ...
    for(;;)
    {
        if (Settings["ComputeEXACT"])
        {
            // Slower but exact
        } else
        {
            // Fast but worse approximation
        }
    }
}

Obviously, in some cases we can rearrange the code to avoid looking up ComputeEXACT inside the loop, but assume this is not the case. The AA lookup is too slow and adds too much overhead.

Now suppose we use a local variable:

void Compute()
{
    bool ComputeEXACT = false; // We could use Settings["ComputeEXACT"] here, but assume Compute() may be used in other loops through chaining.
    ...
    for(;;)
    {
        if (ComputeEXACT)
        {
            // Slower but exact
        } else
        {
            // Fast but worse approximation
        }
    }
}

But now ComputeEXACT behaves like the enum and essentially requires recompilation. The only way around that is to modify ComputeEXACT from code. This is very easy to do if we know where it is. We should be able to know, assuming it isn't optimized out (which could be prevented using volatile or whatever). It should be on the stack, right? Since false is a constant, a simple instruction should be generated to push it there. We can change this! (It would be platform dependent, but not hard.)

Alternatively, we could use a simple array to hold all the settings, but this requires an indirection.

Even better, just modify the if statement's branch directly, from a jne to a je type of thing. This would be the fastest way!?

This all requires knowing how to get the addresses of stuff at compile time.

The goals:

1. Avoid any lookup of memory locations except possibly off the stack.
2. Modify specific code-memory locations at runtime.

My guess is that D can't do this out of the box, but maybe it can be accomplished with an object code parser that builds a list of all the addresses that need modifying? (This might require a two-pass compilation, or the object parser could modify the code to "correct" it.) The core update routine could then be written in D directly (assume all settings are bool for now):

void UpdateSettings()
{
    volatile bool Settings[N];
    for(int i = 0; i < N; i++)
        Modify(Settings[i].address, Settings[i].value);
}

So, you call UpdateSettings, and it modifies all the addresses with the settings' values. The object parser comes in after the code has compiled and fills in the correct info for N and the addresses. (Of course, this is dangerous, needless to say!)
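
The "simple array" alternative mentioned above can be sketched in plain D without any code patching. This is only an illustration of that one option, not of the self-modifying approach: the Setting enum, the settings array, updateSetting and the arithmetic in compute are all hypothetical stand-ins.

```d
import std.stdio : writeln;

// Hypothetical list of settings; each member doubles as an array index.
enum Setting { computeExact, fastCode }

// One bool per setting; the array size is known at compile time.
__gshared bool[Setting.max + 1] settings;

// Called outside the time-critical code when a setting changes.
void updateSetting(Setting s, bool value)
{
    settings[s] = value;
}

double compute(const(double)[] data)
{
    double sum = 0;
    foreach (x; data)
    {
        // Indexed load from a fixed array: no hashing, no string compare.
        if (settings[Setting.computeExact])
            sum += x * x; // stand-in for the slower, exact path
        else
            sum += x;     // stand-in for the fast approximation
    }
    return sum;
}

void main()
{
    updateSetting(Setting.computeExact, true);
    writeln(compute([1.0, 2.0, 3.0]));
}
```

The test inside the loop is still a memory read followed by a branch, so it does not meet goal 1 as stated; it only replaces the AA lookup with a constant-index load.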
Jan 09
next sibling parent reply Rikki Cattermole <alphaglosined gmail.com> writes:
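// Assumed settings type; the original post leaves Lookup undefined.
alias Lookup = bool[string];

interface IFoo {
	void a();
}

// One pre-built instance per value of the setting.
__gshared IFoo a, b;
// The instance selected by the current settings.
__gshared IFoo instance;

class Foo(bool bar) : IFoo {
	void a() {
		static if (bar) {
			// do something
		} else {
			// do nothing
		}
	}
}

shared static this() {
	a = new Foo!true;
	b = new Foo!false;
}

// Call whenever the settings change; hot code then calls instance.a().
void update(Lookup lookup) {
	if (lookup["bar"])
		instance = a;
	else
		instance = b;
}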

There is a small indirection at call time to find which function to 
execute, but that is the best the language semantics give us, and it 
only works for booleans.
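
A variation on the same idea, sketched with a function pointer instead of classes. This is not taken from the post above; it assumes the work can be written as a function templated on the boolean, and the names compute, computeImpl and the bool[string] parameter type are hypothetical.

```d
import std.stdio : writeln;

// Both variants are generated at compile time; static if removes the
// branch from each variant's loop body.
int compute(bool exact)(int n)
{
    int total = 0;
    foreach (i; 0 .. n)
    {
        static if (exact)
            total += 2; // stand-in for the slower, exact path
        else
            total += 1; // stand-in for the fast approximation
    }
    return total;
}

// Points at the variant currently selected by the settings.
__gshared int function(int) computeImpl;

shared static this()
{
    computeImpl = &compute!false;
}

// Call whenever the settings change, outside the time-critical code.
void update(bool[string] lookup)
{
    computeImpl = lookup.get("bar", false) ? &compute!true : &compute!false;
}

void main()
{
    update(["bar": true]);
    writeln(computeImpl(5));
    update(["bar": false]);
    writeln(computeImpl(5));
}
```

The indirection is paid once per call to computeImpl rather than once per loop iteration, and each extra boolean setting doubles the number of generated variants.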
Jan 09
parent Jason Jeffory <JasonJeffory doodle.com> writes:
On Saturday, 9 January 2016 at 23:43:32 UTC, Rikki Cattermole 
wrote:
 Small indirection when executing to find which function to 
 execute but that is the best out of language semantics we have 
 and only works for booleans.
I see what you are saying... it should work. It seems like a lot of bloat for something relatively trivial, though. Changing the whole context to change a single branch, and multiplying the number of types, might have some long-term consequences. It's also complicating the code quite a bit, making it more prone to errors. Maybe with a bit of ingenuity these can be overcome.
Jan 09
prev sibling parent Jay Norwood <jayn prismnet.com> writes:
On Saturday, 9 January 2016 at 21:09:05 UTC, Jason Jeffory wrote:
 It might, which is why I asked, seems like it would be 
 something trivial to do if the address of the function and 
 relative address of the "variable" can be gotten at "compile 
 time"(not sure it is possible by maybe one could write an 
 object parser).
There is debug line info, but good luck with most of that after the optimizer gets through with the code.

This project provides an API for code patching. Maybe it will help, or at least give you ideas.
http://www.dyninst.org/dyninst

I also have some interest in the ability to add arbitrary named markers to code at compile time that could be accessed from symbol info. I'm not interested in modifying the code, but in using the addresses to create windows for code measurement. Our hardware supports performance analysis limited to a specified address range without instrumenting the code, but with optimized code it is difficult to use.
Jan 10