
digitalmars.D - Self Optimizing Code

reply Mark "J" Twain <Juckleberry Twain.com> writes:
Another new D feature here!

Self-Optimizing Code is a fantastic research area that can lead 
to real improvements in programs, making them more responsive to 
individual applications, and it requires very little effort.


To demonstrate:

In many programs, memory behavior is fixed to a large degree, if 
not completely. Take the typical dynamic data structure that 
over-allocates the memory required to store its data by some 
factor, usually 2.

We might have code like

if (length >= capacity)
     expand(2*length);

These kinds of expressions are used throughout a program. They 
are hard-coded values that are generally somewhat arbitrary, and 
they are definitely not optimal in all cases.

Instead, a better solution would be to use variables:

if (n*length > m*capacity) expand(l*length);

n, m, and l are now variables and can change.

How is this useful, you say? Because of mathematics!

In a typical program we will have N tuples, where each tuple t_i 
is a parameter set, such as t_i(n, m, l).

This forms a vector space and, with a bit of mathematics, we can 
find optimal values for the different parameter sets by observing 
the outcome of small perturbations over long periods.

e.g., we initially start with our hard-coded values as defaults:

t_i_0(1,1,2) for our example above.

A background process has these variables, their valid ranges, and 
the ability to measure the app's performance.

It then, in the background, perturbs t_i, say to t_i_1(1.01, 1, 
2), monitors the performance impact, and stores the results.

This can be done efficiently since the algorithm is quite simple, 
can be done without threading concerns (since the parameters are 
effectively immutable to the app), and can run in the background 
just monitoring the app's performance.

Using statistics, we can find out if the change had any 
significant effect (remember, we are doing this over the long 
haul, so any deviations will cancel out).
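The perturb-and-measure loop described above can be sketched as follows. This is an illustrative Python toy, not D and not any real runtime: the `tune` function, the bounds layout, and the `cost` function (a noisy stand-in for measured app performance, minimized at the defaults n=1, m=1 with l best at 2) are all invented for this sketch.

```python
import random
import statistics

def tune(params, bounds, measure, rounds=150, samples=30):
    """Perturb one parameter at a time; keep a change only when the
    measured cost improves beyond the noise level."""
    params = list(params)
    baseline = [measure(params) for _ in range(samples)]
    for _ in range(rounds):
        i = random.randrange(len(params))
        lo, hi, step = bounds[i]
        candidate = list(params)
        candidate[i] = min(hi, max(lo, candidate[i] + random.choice((-step, step))))
        trial = [measure(candidate) for _ in range(samples)]
        # crude significance test: the mean improvement must exceed
        # twice the combined standard error, so random spikes from the
        # environment "cancel out over the long haul"
        noise = (statistics.stdev(baseline) + statistics.stdev(trial)) / samples ** 0.5
        if statistics.mean(baseline) - statistics.mean(trial) > 2 * noise:
            params, baseline = candidate, trial
    return params

# toy stand-in for "app performance": a noisy cost minimized at (1, 1, 2)
def cost(p):
    n, m, l = p
    return (n - 1) ** 2 + (m - 1) ** 2 + (l - 2) ** 2 + random.gauss(0, 0.01)

random.seed(0)  # reproducible demo
tuned = tune([1.0, 1.0, 2.5], [(0.5, 3.0, 0.1)] * 3, cost)
```

Starting from the hard-coded default l = 2.5, the loop drifts toward the better value 2 while leaving the already-good n and m alone, since perturbing them never passes the significance test.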

Eventually the program will optimize itself to run faster by 
choosing the values that best suit how the program is used during 
the optimization process (which can be done continually, and 
using better techniques than I have described).

To accomplish this task easily, all one needs to do is use 
special variable syntax:

Perf("Memory_Capacity")
if ($n*length >= $m*capacity)
     expand($l*length);

The compiler then stores the values (n, m, l) in a list (in this 
case we named it), along with the file position, etc. This is the 
only extra work done to the program. The special variable symbol 
$ is arbitrary and not necessarily desirable.

A background thread is created that then does the optimization 
part. We might need to specify valid ranges for the values (the 
sample space for the algorithm), though that could be done 
externally.
Aug 02 2016
next sibling parent reply ketmar <ketmar ketmar.no-ip.org> writes:
hello, you just invented JIT compiler.
Aug 03 2016
next sibling parent reply Luis <luis.panadero gmail.com> writes:
On Wednesday, 3 August 2016 at 07:36:21 UTC, ketmar wrote:
 hello, you just invented JIT compiler.
This is not JIT.
Aug 03 2016
parent ketmar <ketmar ketmar.no-ip.org> writes:
On Wednesday, 3 August 2016 at 13:45:13 UTC, Luis wrote:
 On Wednesday, 3 August 2016 at 07:36:21 UTC, ketmar wrote:
 hello, you just invented JIT compiler.
This is not JIT.
yes, it is. a fairly advanced one, that does dynamic recompilation of some code paths, but still a JIT. having been in touch with JIT-related things for more than a decade, i can recognize a JIT when i smell it.
Aug 03 2016
prev sibling next sibling parent Jack Stouffer <jack jackstouffer.com> writes:
On Wednesday, 3 August 2016 at 07:36:21 UTC, ketmar wrote:
 hello, you just invented JIT compiler.
Yup, 100% a JIT. I'm pretty sure PyPy does this to some degree.
Aug 03 2016
prev sibling parent reply Mark "J" Twain <Juckleberry Twain.com> writes:
On Wednesday, 3 August 2016 at 07:36:21 UTC, ketmar wrote:
 hello, you just invented JIT compiler.
Um, no. JIT = just-in-time compilation. The code is already compiled and in binary form. The only difference is that hard-coded values, i.e., literals in the binary, become variables. There is no recompilation. The variables are simply updated by an optimization routine. They could be manually updated by the user, for example, but the user is not intelligent with regard to these things (although an interface for the user could be used). Regardless, this is not JIT.

Having seen some of your posts, I know you won't admit your ignorance here, but unless you are going to claim that DMD is a JIT, then this is not JIT. Why? It can already be done in DMD without using any advanced techniques. The problem is that it requires the programmer to write more boilerplate code than is needed.

This is more akin to GC, but instead of managing memory references, it manages variables. Instead of freeing stuff, it perturbs them. And it does so in the background too, but it does not need to "stop the world", since as far as the program is concerned, these special variables act like literals (immutable, at least).
Aug 03 2016
parent ketmar <ketmar ketmar.no-ip.org> writes:
On Wednesday, 3 August 2016 at 19:22:05 UTC, Mark "J" Twain wrote:
you know what? i'm sorry that i stepped in your thread. yet i 
fixed this, as i added you to my ignore list, which means that i 
won't get any more notifications with your name on 'em, and won't 
see 'em in web frontend. sorry, i have zero tolerance to people 
like you these days. hope we won't meet again.
Aug 03 2016
prev sibling next sibling parent reply ikod <geller.garry gmail.com> writes:
On Tuesday, 2 August 2016 at 22:06:38 UTC, Mark "J" Twain wrote:
 Another new D feature here!

 Self-Optimizing Code is a fantastic research area that can lead 
 to greater improvements in code, make them more responsive to 
 individual applications, and requires very little effort.
Looks like this: https://wiki.dlang.org/LDC_LLVM_profiling_instrumentation
Aug 03 2016
parent reply Mark "J" Twain <Juckleberry Twain.com> writes:
On Wednesday, 3 August 2016 at 18:18:51 UTC, ikod wrote:
 On Tuesday, 2 August 2016 at 22:06:38 UTC, Mark "J" Twain wrote:
 Another new D feature here!

 Self-Optimizing Code is a fantastic research area that can 
 lead to greater improvements in code, make them more 
 responsive to individual applications, and requires very 
 little effort.
Looks like this: https://wiki.dlang.org/LDC_LLVM_profiling_instrumentation
No, I don't think so. As best I can tell, that is just profiling. This would be having the program profile itself and make modifications itself (limited, of course, to these "special variables").
Aug 03 2016
parent reply ikod <geller.garry gmail.com> writes:
On Wednesday, 3 August 2016 at 19:25:08 UTC, Mark "J" Twain wrote:
 On Wednesday, 3 August 2016 at 18:18:51 UTC, ikod wrote:
 On Tuesday, 2 August 2016 at 22:06:38 UTC, Mark "J" Twain 
 wrote:
 Another new D feature here!

 Self-Optimizing Code is a fantastic research area that can 
 lead to greater improvements in code, make them more 
 responsive to individual applications, and requires very 
 little effort.
Looks like this: https://wiki.dlang.org/LDC_LLVM_profiling_instrumentation
No, I don't think so. The best I can tell is this is just profiling. This would be having the program itself profile
This is not just profiling, but "Profile-Guided Optimization (PGO)"
Aug 03 2016
next sibling parent Mark "J" Twain <Juckleberry Twain.com> writes:
On Wednesday, 3 August 2016 at 21:09:46 UTC, ikod wrote:
 On Wednesday, 3 August 2016 at 19:25:08 UTC, Mark "J" Twain 
 wrote:
 On Wednesday, 3 August 2016 at 18:18:51 UTC, ikod wrote:
 On Tuesday, 2 August 2016 at 22:06:38 UTC, Mark "J" Twain 
 wrote:
 Another new D feature here!

 Self-Optimizing Code is a fantastic research area that can 
 lead to greater improvements in code, make them more 
 responsive to individual applications, and requires very 
 little effort.
Looks like this: https://wiki.dlang.org/LDC_LLVM_profiling_instrumentation
No, I don't think so. The best I can tell is this is just profiling. This would be having the program itself profile
This is not just profiling, but "Profile-Guided Optimization (PGO)"
Yes, but that is still profiling. It is static profiling, while I am talking about dynamic profiling. It is similar, but it doesn't happen while the program is in use by the user. That is, the programmer simply runs the profiler, it optimizes the code, and then it is shipped as final to the user. I am talking about the user running the program and the program optimizing itself. Big difference!
Aug 03 2016
prev sibling parent reply Chris Wright <dhasenan gmail.com> writes:
On Wed, 03 Aug 2016 21:09:46 +0000, ikod wrote:
 This is not just profiling, but "Profile-Guided Optimization (PGO)"
The person is talking about algorithms with tuning parameters (like array growth rate when appending) and adjusting those parameters based on observed characteristics of the program. For instance, if it's observed that arrays of a given type tend to grow to ~1800 entries or stay under 100 entries, the runtime might automatically extend your array from length 128 to 2048 directly instead of growing to 256, then 512, then 1024, and finally 2048. A compiler would not be allowed to make these changes automatically.

There are significant downsides. You need a lot of bookkeeping to determine what values you need. The values are not preserved across runs. You need to implement it separately for each algorithm. You generally only bother when the potential performance gains are great, both as an absolute measure and relative to the amount of bookkeeping you would have to do. For instance, array appending is a terribly common operation, so you can only use this technique productively there if gathering the data is dirt cheap.
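The array-growth example above can be sketched like this. It is a Python toy rather than anything from druntime: the `AdaptiveArray` class, the 1800-entry workload, and the 8-observation threshold are all invented to illustrate the "jump straight to the observed size" idea.

```python
from collections import Counter

class AdaptiveArray:
    """Toy append-only array whose growth policy learns from the
    observed final lengths of earlier arrays of the same type."""
    final_lengths = Counter()   # the per-type "bookkeeping"

    def __init__(self):
        self.capacity = 16
        self.length = 0
        self.reallocations = 0

    def append(self, item):
        if self.length >= self.capacity:
            self.capacity = self._grow()
            self.reallocations += 1
        self.length += 1

    def _grow(self):
        # if enough earlier arrays outgrew this capacity, jump straight
        # to a power of two covering their largest observed length,
        # instead of doubling through 32, 64, ..., 2048
        bigger = [n for n in self.final_lengths.elements() if n > self.capacity]
        if len(bigger) >= 8:
            return 1 << (max(bigger) - 1).bit_length()
        return self.capacity * 2

    def close(self):
        type(self).final_lengths[self.length] += 1

for _ in range(10):                 # first generation: plain doubling
    a = AdaptiveArray()
    for _ in range(1800):
        a.append(None)
    a.close()

b = AdaptiveArray()                 # later arrays jump 16 -> 2048 at once
for _ in range(1800):
    b.append(None)
```

The first arrays pay seven reallocations on their way to 2048; once enough final sizes have been recorded, later arrays pay exactly one.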
Aug 03 2016
parent reply Mark "J" Twain <Juckleberry Twain.com> writes:
On Thursday, 4 August 2016 at 01:54:57 UTC, Chris Wright wrote:
 On Wed, 03 Aug 2016 21:09:46 +0000, ikod wrote:
 This is not just profiling, but "Profile-Guided Optimization 
 (PGO)"
The person is talking about algorithms with tuning parameters (like array growth rate when appending) adjusting those parameters based on observed characteristics of the program. For instance, if it's observed that arrays of a given type tend to grow to ~1800 entries or stay under 100 entries, the runtime might automatically extend your array from length 128 to 2048 directly instead of growing to 256, then 512, then 1024, and finally 2048. A compiler would not be allowed to make these changes automatically. There are significant downsides. You need a lot of bookkeeping to determine what values you need. The values are not preserved across runs. You need to implement it separately for each algorithm. You generally only bother when the potential performance gains are great as an absolute measure and relative to the amount of bookkeeping you would have to do. For instance, with array appending, it's a terribly common operation, so you can only use this technique productively if gathering the data is dirt cheap.
Close. I am only talking about adding the features to do such things, not having the compiler involved in the analysis. We can already do this sort of thing, but it requires more boilerplate code, and the benefit may be negligible.

What it does for us is allow us to change hard-coded values into special variables that can then be manipulated (if desired, or left alone) while the program is executing. All the compiler does is take these special variables, store them in chunks (tuples), and possibly store some discriminating info about them, like the file and function they were used in. Then it is up to the optimizing algorithm to decide how to approach using them. If it does nothing, the only performance hit is that variables were used instead of hard-coded literals.

A bit more info may be required for the optimizer to make good choices, though, since the data collected could be extremely large (thousands of tuples). This creates a very large parameter space, and knowing which parameter changes created which performance changes is crucial (although a blind parameter-space Monte Carlo method could work, it would produce erratic behavior). The optimizations would persist between program executions simply by writing out all the current parameter values; otherwise it would be difficult for the program to ever actually optimize itself. Neural networks could be used to find more optimized conditions.

It could be extended to include more dynamic characteristics of a program, like what a JIT does, but I think this adds far more complexity, would require much more compiler support, and would probably just end up as a JIT-like environment.

Another simple example. Suppose one has a sleep(10) in a thread. The 10 was "arbitrarily" chosen. Instead, if we could write sleep($10), 10 becomes a parameter. The compiler emits it to a file along with its address (it is a global in the program's address space).

The optimizer then reads the file and attempts to modify this value with a slight perturbation, say to 11. It checks the performance impact (over many uses, to remove statistical deviations), and if the result is more performant, it uses that value as the new default. It can then try 12 and repeat the process. While the algorithm is not advanced, it produces a more optimal result than hard coding the value to 10, which then cannot change for the life of the program.

To keep the program stable, such changes would have to occur very slowly. Better algorithms and better profiling tools (which, say, could do wider sampling and better performance analysis) could home in on good values from the get-go. Then the program itself adjusts to the individual user's case by analysis.
Aug 03 2016
parent reply Chris Wright <dhasenan gmail.com> writes:
In your example, you have a size_t or double factor for array growth. If 
you set it to -1, you would have an unpleasant time. You need a way to 
specify the range of valid values. In more complex algorithms, you need a 
way to evaluate several parameters together to determine whether they are 
valid.

Since you're not always optimizing for the same thing (time vs memory; 
wasted memory vs number of reallocations; for approximation algorithms, 
how good an approximation is generated), you need a way to specify that 
as well.

Putting that in the language would be horribly complex. It would be much 
simpler for everyone if it were in a library instead. I could probably 
code up a basic version in a couple hours. Which I'll do for US$150 -- 
below market rate, especially since I'd continue supporting it.

Furthermore, your idea of changing compile-time constants without 
recompiling the program is unworkable. Constants are folded and inlined. 
That "* 2" might turn into "<< 1". A loop based on a constant can be 
unrolled.

You suggested that we can do a random walk on values to find optimal 
values. This will find a local minimum, which might not be the globally 
optimal value. Something like simulated annealing would work better in 
the general case. Simulated annealing is a fair bit more complex than a 
random walk; it's not going to be implemented inside druntime.
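For contrast with a plain random walk, a bare-bones simulated annealing loop looks roughly like this. This is a Python sketch with an invented cost surface standing in for "measured program performance"; the temperatures and step sizes are arbitrary choices, not recommendations.

```python
import math
import random

def anneal(x, cost, lo, hi, step=0.5, temp=2.0, cooling=0.999, iters=5000):
    """Simulated annealing: unlike a pure random walk, a *worse*
    candidate is sometimes accepted (with probability exp(-delta/temp)),
    which lets the search climb out of a local minimum."""
    best = cur = x
    for _ in range(iters):
        cand = min(hi, max(lo, cur + random.uniform(-step, step)))
        delta = cost(cand) - cost(cur)
        if delta < 0 or random.random() < math.exp(-delta / temp):
            cur = cand
            if cost(cur) < cost(best):
                best = cur
        temp *= cooling   # cool down: gradually become greedy
    return best

# a bumpy cost surface: local minima near every integer, global near x = 3
def cost(x):
    return 0.02 * (x - 3) ** 2 - math.cos(2 * math.pi * x)

random.seed(42)
found = anneal(0.0, cost, -5.0, 5.0)
```

A greedy walk started at x = 0 would sit in that local minimum forever; the early high-temperature phase is what gives annealing a chance to cross the barriers between wells.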
Aug 03 2016
parent reply Mark "J" Twain <Juckleberry Twain.com> writes:
On Thursday, 4 August 2016 at 04:41:43 UTC, Chris Wright wrote:
 In your example, you have a size_t or double factor for array 
 growth. If you set it to -1, you would have an unpleasant time. 
 You need a way to specify the range of valid values. In more 
 complex algorithms, you need a way to evaluate several 
 parameters together to determine if they are valid.
Yes, I mentioned that, but there is no ideal way to do this. I think it shouldn't be done in code, since it clutters the code. The parameter tuples are the groupings; I think the compiler can infer groupings based on source code closeness (if 3 parameters show up on the same line, it's obviously safe to assume they go together). Ultimately there is no big deal about grouping, because one can view the parameter space of tuples as flat.
 Since you're not always optimizing for the same thing (time vs 
 memory; wasted memory vs number of reallocations; for 
 approximation algorithms, how good an approximation is 
 generated), you need a way to specify that as well.
Well, it depends. If performance means CPU cycles and memory, then those are relatively easy to get, although absolute accuracy would be required.
 Putting that in the language would be horribly complex. It 
 would be much simpler for everyone if it were in a library 
 instead. I could probably code up a basic version in a couple 
 hours. Which I'll do for US$150 -- below market rate, 
 especially since I'd continue supporting it.
Or you could pay me $150 and I'd do it. But I have better things to do, and without compiler support, or a preprocessor, I'm not interested in the code complexity a library solution would have. Having to specify every parameter with some long sequence of characters just makes it messy.

I would have to disagree about the language complexity, though. At most the hardest part would be figuring out what syntax to use for symbols.

$(10,0,100,1)

could be used. Whenever the compiler comes across such a thing, it simply:

1. creates an "anonymous" variable with the value 10.
2. emits something like test.d#43(10,0,100,1), 0x043245 (file, line, default value, min, max, step, address).

One could allow for group indexing or "closeness" by code. E.g., $(10,0,100,1,43) is of group 43. 0x043245 is the address. That would be it on the compiler side. The optimizer then has enough information to work from. While such a method may lack enough information to be optimized effectively, in theory it would be enough to find the optimal parameters in the long term.

One could probably get away with two or three lines of code for the variables that do the same:

auto x = Opt_Var(10, 0, 100, 1, 43);

then use x in the code. The hard part might be getting the relative and absolute addresses set up properly for the optimizer. It might work, but it is still a bit cluttery, IMO. If you feel like implementing it for the benefit of humanity, be my guest. Else I'll end up trying in the future when I get around to it.
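A library-only version of that Opt_Var idea could look something like the following. It is sketched in Python rather than D for brevity, and the registry, the name "array.growth", and the JSON persistence are all my assumptions, not an existing API.

```python
import json

class OptVar:
    """A tunable "constant": used like its value, but registered so an
    external optimizer can perturb it within [lo, hi] and persist it."""
    registry = {}

    def __init__(self, name, default, lo, hi, step, group=0):
        self.lo, self.hi, self.step, self.group = lo, hi, step, group
        self.value = default
        OptVar.registry[name] = self

    def __int__(self):
        return int(self.value)

    @classmethod
    def save(cls, f):
        # write current values out so optimization survives restarts
        json.dump({name: v.value for name, v in cls.registry.items()}, f)

    @classmethod
    def load(cls, f):
        for name, value in json.load(f).items():
            if name in cls.registry:
                cls.registry[name].value = value

# the hard-coded 2 from `expand(2*length)` becomes a tunable parameter
growth = OptVar("array.growth", 2, 1, 4, 1)

def next_capacity(length, capacity):
    if length >= capacity:
        return int(growth) * length
    return capacity
```

The program reads the parameter through `int(growth)` exactly where it used to read a literal; a background optimizer (or even a curious user editing the saved file) can change `growth.value` without touching the code.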
 Furthermore, your idea of changing compile-time constants 
 without recompiling the program is unworkable. Constants are 
 folded and inlined. That "* 2" might turn into "<< 1". A loop 
 based on a constant can be unrolled.
This is why I said they were variables. I said they were effectively constants; they obviously can't be actual constants. We have to have an address that holds the value so it can be changed by the optimizer.
 You suggested that we can do a random walk on values to find 
 optimal values. This will find a local minimum, which might not 
 be the globally optimal value. Something like simulated 
 annealing would work better in the general case. Simulated 
 annealing is a fair bit more complex than a random walk; it's 
 not going to be implemented inside druntime.
Well, there are many optimization methods; that isn't the point. The optimizer can be exposed for modification, and different methods could be used. My main point was to introduce the idea. I have never seen it mentioned in the literature before. The ability for a program to slowly optimize itself over its life cycle seems to be a new and novel concept. The benefit of such a thing? Not sure. It may only improve performance 1 or 2%. The real benefit? Who knows, but as time goes on it might become really useful, be part of the profiling process, and find better ways to optimize programs that give better results.
Aug 04 2016
parent Chris Wright <dhasenan gmail.com> writes:
On Thu, 04 Aug 2016 18:39:36 +0000, Mark "J" Twain wrote:
 I would have to disagree about the language complexity though. At most
 the hardest part would be figuring out what syntax to use for symbols.
The cost with language changes isn't just implementation. The largest part is that people reading code in the language need to be able to recognize what a piece of code is doing. If it's not immediately obvious, that is a cost borne by everyone using the language. Your suggested syntax is terribly inscrutable, and the easiest way to fix that problem is to use a library instead of a language change.
 Having to specify every parameter with some long sequence of 
 characters just makes it messy.
It would be far faster to implement a library solution. It would be much easier to maintain a library solution. A library solution could use functions in phobos and other libraries available in dub, not just druntime. A library solution could release much more often. It would be easier to configure -- for instance, random walk vs simulated annealing. It would be easier to specify and create constraints using a library solution. It would take no political effort to publish a library on dub. But it would cost you an extra handful of characters, so it would be unusable. Are you trolling?
Aug 04 2016
prev sibling parent reply crimaniak <crimaniak gmail.com> writes:
On Tuesday, 2 August 2016 at 22:06:38 UTC, Mark "J" Twain wrote:
 Instead, a better solution would be to use variables:

 if (n*length > m*capacity) expand(l*length)
Some time ago I played with a self-optimizing cache layer.

Problem: the time to obtain cache items is unknown and server-dependent. For example, the network can be involved; sometimes it is localhost, sometimes a server on another continent. The time to cache an item is unknown too. For example, if the memcached extension is not installed on a server, then the fallback to the disk backend will slow cache performance very much. So we don't know whether it is worth caching at all.

Solution: the cache measures its own performance and acts accordingly.

So I did the next things:

1. A functional interface instead of a save/load-type interface. cacheLevel.get(functor_to_get_item) allows measuring the item-obtaining time.

2. All measuring/controlling logic implemented in a separate class with the interface AbstractStatist; the CacheLevel class just has hooks for it. So by changing the AbstractStatist implementation I can switch between a mode that measures statistics and uses them, and working as a usual cache (EmptyStatist with all empty hooks). I made an implementation that skips all items not worth caching (calcTime/calcHits*totalHits-totalTime < 0), but it's possible to do smarter things (like caching only the most efficient items that do not exceed the cache size).

From my experience I can draw some conclusions:

1. Measure mode and control mode need to be separated. You can't have accurate statistics while changing the system's behavior according to the current statistics.

2. Statistics can differ between applications, but for a specific application in specific conditions they can in most cases be approximated as constant.

So for an array allocating strategy, a more realistic scenario is, I think:

1. The application is compiled in some 'array_debug' mode, a statist trait is added to arrays, usage statistics are collected, and optimal constants are written out at application exit.

2. The programmer configures the array allocator in the application according to these constants.

3. The application is built in release mode with the optimal allocation strategy and without any statist overhead, and works fast. Users are happy.
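A stripped-down sketch of that functor-based, self-measuring cache (in Python for illustration; the class, the running-average lookup cost, and the worth-caching rule are simplifications of the calcTime/calcHits criterion described above, not the original code):

```python
import time

class SelfMeasuringCache:
    """Cache layer that times each loader call and stores an item only
    when recomputing it costs more than a (measured) cache lookup."""
    def __init__(self):
        self.store = {}
        self.lookup_cost = 0.0          # running estimate, in seconds

    def get(self, key, loader):
        t0 = time.perf_counter()
        if key in self.store:
            value = self.store[key]
            elapsed = time.perf_counter() - t0
            self.lookup_cost = 0.9 * self.lookup_cost + 0.1 * elapsed
            return value
        value = loader()                # functor, so we can time it
        load_time = time.perf_counter() - t0
        if load_time > self.lookup_cost:    # worth caching at all?
            self.store[key] = value
        return value

cache = SelfMeasuringCache()
calls = []

def remote_fetch():                     # stands in for a network call
    calls.append(1)
    time.sleep(0.01)
    return "payload"

first = cache.get("item", remote_fetch)
second = cache.get("item", remote_fetch)   # served from the cache
```

Because the loader is passed in as a functor, the cache can time it per item: a memcached-fast loader would never be cached, while the slow "remote" one is cached after its first call.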
Aug 04 2016
parent Mark "J" Twain <Juckleberry Twain.com> writes:
On Thursday, 4 August 2016 at 11:35:41 UTC, crimaniak wrote:
 On Tuesday, 2 August 2016 at 22:06:38 UTC, Mark "J" Twain wrote:
 Instead, a better solution would be to use variables:

 if (n*length > m*capacity) expand(l*length)
 Some time ago I played with a self-optimizing cache layer.

 Problem: the time to obtain cache items is unknown and 
 server-dependent. For example, the network can be involved; 
 sometimes it is localhost, sometimes a server on another 
 continent. The time to cache an item is unknown too. For 
 example, if the memcached extension is not installed on a 
 server, then the fallback to the disk backend will slow cache 
 performance very much. So we don't know whether it is worth 
 caching at all.

 Solution: the cache measures its own performance and acts 
 accordingly.
I did not think of this in terms of networking. I suppose it can be used there too but, as you mention, latency could be a factor.
 So I did the next things:
 1. A functional interface instead of a save/load-type 
 interface. cacheLevel.get(functor_to_get_item) allows 
 measuring the item-obtaining time.
 2. All measuring/controlling logic implemented in a separate 
 class with the interface AbstractStatist; the CacheLevel class 
 just has hooks for it. So by changing the AbstractStatist 
 implementation I can switch between a mode that measures 
 statistics and uses them, and working as a usual cache 
 (EmptyStatist with all empty hooks). I made an implementation 
 that skips all items not worth caching 
 (calcTime/calcHits*totalHits-totalTime < 0), but it's possible 
 to do smarter things (like caching only the most efficient 
 items that do not exceed the cache size).
If it is more complex than what I described, it would have to be thought out a great deal. My goal was simply to have the program expose optimization points (variables), then allow an optimizer to change those to find better points. The program itself would be virtually unmodified, with no code to interact with the optimization process except using variables instead of constants (which is minimal and necessary).

Exposing an interface for the program itself to guide the optimization process seems like a lot more work, but of course it is ultimately better, as it allows more information to flow into the optimization process. But that design is beyond what I'm willing to attempt (that way could take months or years to get done right), while my method could take just a few hours to code up, and is rather general, although a bit dumb (fire and forget and hope for the best).
 From my experience I can draw some conclusions:
 1. Measure mode and control mode need to be separated. You 
 can't have accurate statistics while changing the system's 
 behavior according to the current statistics.
 2. Statistics can differ between applications, but for a 
 specific application in specific conditions they can in most 
 cases be approximated as constant.
Yes, it is tricky to make the algorithm stable. This is why I think, for a simple optimizer, it would need to do this over long periods (months of program use). Because there are so many aberrations (other programs, user behavior, etc.), these can only be statistically removed by repeated runs. Essentially, "low-pass the data to remove all the spikes", then compare the averaged result with the previous one.
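"Low-passing" the measurements can be as simple as an exponential moving average. A small Python sketch (the 10 ms baseline and the single 100 ms spike are made-up numbers standing in for noisy performance samples):

```python
def low_pass(samples, alpha=0.05):
    """Exponential moving average: isolated spikes (another program
    stealing CPU, odd user behavior) decay away, while a sustained
    change in the long-run level would survive the filter."""
    level = samples[0]
    smoothed = []
    for s in samples:
        level = (1 - alpha) * level + alpha * s
        smoothed.append(level)
    return smoothed

# a steady 10 ms cost with one 100 ms aberration in the middle
samples = [10.0] * 50 + [100.0] + [10.0] * 50
smoothed = low_pass(samples)
```

The spike barely moves the filtered signal (to 14.5 at its peak), and the estimate settles back near 10, so comparing averaged levels before and after a parameter change is far more reliable than comparing raw samples.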
 So for array allocating strategy more realistic scenario the 
 next, I think:
 1. Application compiled in some 'array_debug' mode then some 
 statist trait added to array, collect usage statistics and 
 writes optimal constants at the application exit.
 2. Programmer configure array allocator in application 
 according to these constants.
 3. Application builds in release mode with optimal allocation 
 strategy and without any statist overhead and works fast. Users 
 are happy.
This is a more static-profiling type of optimization. I am talking about something a bit different. Both methods could be used together for a better result, but mine is for simplicity.

We generally blindly set constants for things that affect performance. Let's simply turn those constants into variables and let a global blind optimizer try to figure out better values than what we "blindly" set. There is no guarantee that it would find a better result, and it may even introduce program instability. But all this stuff can be somewhat measured by CPU and memory usage, and given enough parameters, there are probably at least several optimal points the optimizer could find.

Ultimately, for just a little work (setting the variables and specifying their ranges and step, say), we could have most programs being created, in D for now at least, optimizing themselves (while they are being used by the user, after they have been shipped) to some degree. This, I believe, is unheard of. It represents the next level in program optimization.

Imagine one day a program that could optimize itself depending on the user's hardware, the user's habits, etc. Well, this method attempts to get that ball rolling, and it does all those things in a general way (albeit an ignorant one, but maybe just as effective). It's hard to know how effective until it is done, but we do know it can't be any worse than what we already have.
Aug 04 2016