digitalmars.D.learn - Can this implementation of Damm algorithm be optimized?
- Nestor (34/34) Feb 09 2017 Hi,
- Cym13 (12/47) Feb 09 2017 I can't see how to make it more efficient than it is (aside from
- Era Scarecrow (25/39) Feb 09 2017 Well one thing is you can probably reduce them from chars to just
- Nestor (43/85) Feb 09 2017 OK I changed the approach using a multidimensional array for the
- Daniel Kozak via Digitalmars-d-learn (3/90) Feb 09 2017 Maybe you can try use static array instead of dynamic
- Nestor (55/57) Feb 09 2017 I shaved it to this to discard unneccessary time-consuming
- Daniel Kozak via Digitalmars-d-learn (2/52) Feb 09 2017 Did you try it with different backends? llvm (ldc), gcc(gdc)?
- Nestor (9/11) Feb 09 2017 Not really, just standard dmd.
- =?UTF-8?Q?Ali_=c3=87ehreli?= (8/10) Feb 09 2017 avgtime profiles the whole process, right? It measures everything that
- Daniel Kozak via Digitalmars-d-learn (3/14) Feb 09 2017 Yes this is important, and for example when you used shared libphobos it...
- Cym13 (69/160) Feb 09 2017 The question is, why do you expect it to be noticably faster? On
- Era Scarecrow (36/41) Feb 09 2017 Truthfully, because you'll need tens of millions or hundreds of
- Nestor (3/18) Feb 10 2017 Thank you for the detailed reply. I wasn't able to follow you
- Era Scarecrow (28/30) Feb 11 2017 The idea behind it is like this (which you can scale up):
- Nestor (7/37) Feb 11 2017 Notice this is no ordinary matrix, but an Anti-Simmetric
- Era Scarecrow (6/11) Feb 11 2017 Yes i know, which is why i had 3 to calculate 2 inputs because
- Era Scarecrow (32/34) Feb 11 2017 Alright I've found the bug and fixed it, and it passes with
- Era Scarecrow (12/14) Feb 11 2017 Just ran the unittests under the dmd profiler, says the
- Era Scarecrow (14/16) Feb 11 2017 Ran some more tests.
- Nestor (3/19) Feb 12 2017 Wow!
- Era Scarecrow (23/28) Feb 12 2017 Certainly. But the bulk of the answer comes down that the 2
- Nestor (5/40) Feb 11 2017 I fail to see where you are declaring QG10Matrix2, because
- Era Scarecrow (17/20) Feb 11 2017 I declared it here:
- Nestor (4/7) Feb 09 2017 I forgot to comment that what is multiplied by ten is not the
Hi, I was trying to port C code from the article in Wikiversity [1] to D, but I'm not sure this implementation is the most efficient way to do it in D, so suggestions to optimize it are welcome: import std.stdio; static immutable char[] QG10Matrix = "03175986427092154863420687135917509834266123045978" ~ "36742095815869720134894536201794386172052581436790"; char checkDigit(string str) { char tmpdigit = '0'; foreach(chr; str) tmpdigit = QG10Matrix[(chr - '0') + (tmpdigit - '0') * 10]; return tmpdigit; } enum { EXIT_SUCCESS = 0, EXIT_FAILURE = 1, } int main(string[] args) { scope(failure) { writeln("Invalid arguments. You must pass a number."); return EXIT_FAILURE; } assert(args.length == 2); char digit = checkDigit(args[1]); if(digit == '0') writefln("%s is a valid number.", args[1]); else { writefln("%s is not a valid number (but it would be, appending digit %s).", args[1], digit); } return EXIT_SUCCESS; } [1] https://en.wikiversity.org/wiki/Damm_algorithm
Feb 09 2017
On Thursday, 9 February 2017 at 17:36:11 UTC, Nestor wrote:Hi, I was trying to port C code from the article in Wikiversity [1] to D, but I'm not sure this implementation is the most efficient way to do it in D, so suggestions to optimize it are welcome: import std.stdio; static immutable char[] QG10Matrix = "03175986427092154863420687135917509834266123045978" ~ "36742095815869720134894536201794386172052581436790"; char checkDigit(string str) { char tmpdigit = '0'; foreach(chr; str) tmpdigit = QG10Matrix[(chr - '0') + (tmpdigit - '0') * 10]; return tmpdigit; } enum { EXIT_SUCCESS = 0, EXIT_FAILURE = 1, } int main(string[] args) { scope(failure) { writeln("Invalid arguments. You must pass a number."); return EXIT_FAILURE; } assert(args.length == 2); char digit = checkDigit(args[1]); if(digit == '0') writefln("%s is a valid number.", args[1]); else { writefln("%s is not a valid number (but it would be, appending digit %s).", args[1], digit); } return EXIT_SUCCESS; } [1] https://en.wikiversity.org/wiki/Damm_algorithmI can't see how to make it more efficient than it is (aside from finding a better algorithm which isn't a language concern). You are using C style and will get the same machine code as if you'd written it with C. The only difference would be boundchecking that you can disable by using the -boundscheck=off compiler switch. By the way note that you can use the -profile switch to see where the bottlenecks are, in your case writting the result to stdout is *by far* what takes the most time (and I'm quite certain the bottleneck is the same in the C version for it's a fact that I/O is slow).
Feb 09 2017
On Thursday, 9 February 2017 at 17:36:11 UTC, Nestor wrote:I was trying to port C code from the article in Wikiversity [1] to D, but I'm not sure this implementation is the most efficient way to do it in D, so suggestions to optimize it are welcome: import std.stdio; static immutable char[] QG10Matrix = "03175986427092154863420687135917509834266123045978" ~ "36742095815869720134894536201794386172052581436790"; char checkDigit(string str) { char tmpdigit = '0'; foreach(chr; str) tmpdigit = QG10Matrix[(chr - '0') + (tmpdigit - '0') * 10]; return tmpdigit; }Well one thing is you can probably reduce them from chars to just bytes, instead of having to subtract you can instead add at the end. Although unless you're working with a VERY large input you won't see a difference. Actually since you're also multiplying by 10, you can incorporate that in the table too... (although a mixin might be better for the conversion than by hand) static immutable char[] QG10Matrix = [ 00,30,10,70,50,90,80,60,40,20, 70,00,90,20,10,50,40,80,60,30, 40,20,00,60,80,70,10,30,50,90, 10,70,50,00,90,80,30,40,20,60, 60,10,20,30,00,40,50,90,70,80, 30,60,70,40,20,00,90,50,80,10, 50,80,60,90,70,20,00,10,30,40, 80,90,40,50,30,60,20,00,10,70, 90,40,30,80,60,10,70,20,00,50, 20,50,80,10,40,30,60,70,90,00]; char checkDigit(string str) { char tmpdigit = 0; foreach(chr; str) tmpdigit = QG10Matrix[(chr - '0') + tmpdigit]; return (tmpdigit/10) + '0'; }
Feb 09 2017
On Thursday, 9 February 2017 at 18:34:30 UTC, Era Scarecrow wrote:On Thursday, 9 February 2017 at 17:36:11 UTC, Nestor wrote:OK I changed the approach using a multidimensional array for the matrix so I could ditch arithmetic operations altogether, but curiously after measuring a few thousand runs of both implementations through avgtime, I see no noticeable difference. Why? import std.stdio; static immutable ubyte[][] QG10Matrix = [ [0,3,1,7,5,9,8,6,4,2],[7,0,9,2,1,5,4,8,6,3], [4,2,0,6,8,7,1,3,5,9],[1,7,5,0,9,8,3,4,2,6], [6,1,2,3,0,4,5,9,7,8],[3,6,7,4,2,0,9,5,8,1], [5,8,6,9,7,2,0,1,3,4],[8,9,4,5,3,6,2,0,1,7], [9,4,3,8,6,1,7,2,0,5],[2,5,8,1,4,3,6,7,9,0], ]; static int charToInt(char chr) { scope(failure) return -1; return cast(int)(chr - '0'); } ubyte checkDigit(string str) { ubyte tmpdigit; foreach(chr; str) tmpdigit = QG10Matrix[tmpdigit][charToInt(chr)]; return tmpdigit; } enum { EXIT_SUCCESS = 0, EXIT_FAILURE = 1, } int main(string[] args) { scope(failure) { writeln("Invalid arguments. You must pass a number."); return EXIT_FAILURE; } assert(args.length == 2); ubyte digit = checkDigit(args[1]); if(digit == 0) writefln("%s is a valid number.", args[1]); else { writefln("%s is not a valid number (but it would be, appending digit %s).", args[1], digit); } return EXIT_SUCCESS; }I was trying to port C code from the article in Wikiversity [1] to D, but I'm not sure this implementation is the most efficient way to do it in D, so suggestions to optimize it are welcome: import std.stdio; static immutable char[] QG10Matrix = "03175986427092154863420687135917509834266123045978" ~ "36742095815869720134894536201794386172052581436790"; char checkDigit(string str) { char tmpdigit = '0'; foreach(chr; str) tmpdigit = QG10Matrix[(chr - '0') + (tmpdigit - '0') * 10]; return tmpdigit; }Well one thing is you can probably reduce them from chars to just bytes, instead of having to subtract you can instead add at the end. Although unless you're working with a VERY large input you won't see a difference. Actually since you're also multiplying by 10, you can incorporate that in the table too... (although a mixin might be better for the conversion than by hand) static immutable char[] QG10Matrix = [ 00,30,10,70,50,90,80,60,40,20, 70,00,90,20,10,50,40,80,60,30, 40,20,00,60,80,70,10,30,50,90, 10,70,50,00,90,80,30,40,20,60, 60,10,20,30,00,40,50,90,70,80, 30,60,70,40,20,00,90,50,80,10, 50,80,60,90,70,20,00,10,30,40, 80,90,40,50,30,60,20,00,10,70, 90,40,30,80,60,10,70,20,00,50, 20,50,80,10,40,30,60,70,90,00]; char checkDigit(string str) { char tmpdigit = 0; foreach(chr; str) tmpdigit = QG10Matrix[(chr - '0') + tmpdigit]; return (tmpdigit/10) + '0'; }
Feb 09 2017
Dne 9.2.2017 v 20:39 Nestor via Digitalmars-d-learn napsal(a):On Thursday, 9 February 2017 at 18:34:30 UTC, Era Scarecrow wrote:Maybe you can try use static array instead of dynamic static immutable ubyte[10][10] QG10Matrix = ...On Thursday, 9 February 2017 at 17:36:11 UTC, Nestor wrote:OK I changed the approach using a multidimensional array for the matrix so I could ditch arithmetic operations altogether, but curiously after measuring a few thousand runs of both implementations through avgtime, I see no noticeable difference. Why? import std.stdio; static immutable ubyte[][] QG10Matrix = [ [0,3,1,7,5,9,8,6,4,2],[7,0,9,2,1,5,4,8,6,3], [4,2,0,6,8,7,1,3,5,9],[1,7,5,0,9,8,3,4,2,6], [6,1,2,3,0,4,5,9,7,8],[3,6,7,4,2,0,9,5,8,1], [5,8,6,9,7,2,0,1,3,4],[8,9,4,5,3,6,2,0,1,7], [9,4,3,8,6,1,7,2,0,5],[2,5,8,1,4,3,6,7,9,0], ]; static int charToInt(char chr) { scope(failure) return -1; return cast(int)(chr - '0'); } ubyte checkDigit(string str) { ubyte tmpdigit; foreach(chr; str) tmpdigit = QG10Matrix[tmpdigit][charToInt(chr)]; return tmpdigit; } enum { EXIT_SUCCESS = 0, EXIT_FAILURE = 1, } int main(string[] args) { scope(failure) { writeln("Invalid arguments. You must pass a number."); return EXIT_FAILURE; } assert(args.length == 2); ubyte digit = checkDigit(args[1]); if(digit == 0) writefln("%s is a valid number.", args[1]); else { writefln("%s is not a valid number (but it would be, appending digit %s).", args[1], digit); } return EXIT_SUCCESS; }I was trying to port C code from the article in Wikiversity [1] to D, but I'm not sure this implementation is the most efficient way to do it in D, so suggestions to optimize it are welcome: import std.stdio; static immutable char[] QG10Matrix = "03175986427092154863420687135917509834266123045978" ~ "36742095815869720134894536201794386172052581436790"; char checkDigit(string str) { char tmpdigit = '0'; foreach(chr; str) tmpdigit = QG10Matrix[(chr - '0') + (tmpdigit - '0') * 10]; return tmpdigit; }Well one thing is you can probably reduce them from chars to just bytes, instead of having to subtract you can instead add at the end. Although unless you're working with a VERY large input you won't see a difference. Actually since you're also multiplying by 10, you can incorporate that in the table too... (although a mixin might be better for the conversion than by hand) static immutable char[] QG10Matrix = [ 00,30,10,70,50,90,80,60,40,20, 70,00,90,20,10,50,40,80,60,30, 40,20,00,60,80,70,10,30,50,90, 10,70,50,00,90,80,30,40,20,60, 60,10,20,30,00,40,50,90,70,80, 30,60,70,40,20,00,90,50,80,10, 50,80,60,90,70,20,00,10,30,40, 80,90,40,50,30,60,20,00,10,70, 90,40,30,80,60,10,70,20,00,50, 20,50,80,10,40,30,60,70,90,00]; char checkDigit(string str) { char tmpdigit = 0; foreach(chr; str) tmpdigit = QG10Matrix[(chr - '0') + tmpdigit]; return (tmpdigit/10) + '0'; }
Feb 09 2017
On Thursday, 9 February 2017 at 20:46:06 UTC, Daniel Kozak wrote:Maybe you can try use static array instead of dynamic static immutable ubyte[10][10] QG10Matrix = ...I shaved it to this to discard unneccessary time-consuming functions: static immutable ubyte[10][10] QG10Matrix = [ [0,3,1,7,5,9,8,6,4,2],[7,0,9,2,1,5,4,8,6,3], [4,2,0,6,8,7,1,3,5,9],[1,7,5,0,9,8,3,4,2,6], [6,1,2,3,0,4,5,9,7,8],[3,6,7,4,2,0,9,5,8,1], [5,8,6,9,7,2,0,1,3,4],[8,9,4,5,3,6,2,0,1,7], [9,4,3,8,6,1,7,2,0,5],[2,5,8,1,4,3,6,7,9,0], ]; static immutable string number = "0123456789012345678901234567890123456789012345678901234567890123456789"; static int charToInt(char chr) { return ((chr >= '0') || (chr <= '9')) ? cast(int)(chr - '0') : -1; } ubyte checkDigit(string str) { ubyte tmpdigit; foreach(chr; str) tmpdigit = QG10Matrix[tmpdigit][charToInt(chr)]; return tmpdigit; } void main() { auto digit = checkDigit(number); } I even tried making checkDigit static, but surprisingly this increased average execution time by 1ms. Anyway, the previopus version (modified to benefit from a little optimization too) is almost as performant, even though it includes multiplication and sum: static immutable char[] QG10Matrix = "03175986427092154863420687135917509834266123045978" ~ "36742095815869720134894536201794386172052581436790"; static immutable string number = "0123456789012345678901234567890123456789012345678901234567890123456789"; static int charToInt(char chr) { return ((chr >= '0') || (chr <= '9')) ? cast(int)(chr - '0') : -1; } char checkDigit(string str) { char tmpdigit = '0'; foreach(chr; str) tmpdigit = QG10Matrix[charToInt(chr) + (charToInt(tmpdigit) * 10)]; return tmpdigit; } void main() { auto digit = checkDigit(number); } I compiled both with -inline -noboundscheck -release and the multidimensional array version does perform 1ms faster for a couple hundred runs, but I expected the difference to be much more noticeable. Any idea of what might be happening here?
Feb 09 2017
Dne 9.2.2017 v 22:29 Nestor via Digitalmars-d-learn napsal(a):On Thursday, 9 February 2017 at 20:46:06 UTC, Daniel Kozak wrote:Did you try it with different backends? llvm (ldc), gcc(gdc)?Maybe you can try use static array instead of dynamic static immutable ubyte[10][10] QG10Matrix = ...I shaved it to this to discard unneccessary time-consuming functions: static immutable ubyte[10][10] QG10Matrix = [ [0,3,1,7,5,9,8,6,4,2],[7,0,9,2,1,5,4,8,6,3], [4,2,0,6,8,7,1,3,5,9],[1,7,5,0,9,8,3,4,2,6], [6,1,2,3,0,4,5,9,7,8],[3,6,7,4,2,0,9,5,8,1], [5,8,6,9,7,2,0,1,3,4],[8,9,4,5,3,6,2,0,1,7], [9,4,3,8,6,1,7,2,0,5],[2,5,8,1,4,3,6,7,9,0], ]; static immutable string number = "0123456789012345678901234567890123456789012345678901234567890123456789"; static int charToInt(char chr) { return ((chr >= '0') || (chr <= '9')) ? cast(int)(chr - '0') : -1; } ubyte checkDigit(string str) { ubyte tmpdigit; foreach(chr; str) tmpdigit = QG10Matrix[tmpdigit][charToInt(chr)]; return tmpdigit; } void main() { auto digit = checkDigit(number); } I even tried making checkDigit static, but surprisingly this increased average execution time by 1ms. Anyway, the previopus version (modified to benefit from a little optimization too) is almost as performant, even though it includes multiplication and sum: static immutable char[] QG10Matrix = "03175986427092154863420687135917509834266123045978" ~ "36742095815869720134894536201794386172052581436790"; static immutable string number = "0123456789012345678901234567890123456789012345678901234567890123456789"; static int charToInt(char chr) { return ((chr >= '0') || (chr <= '9')) ? cast(int)(chr - '0') : -1; } char checkDigit(string str) { char tmpdigit = '0'; foreach(chr; str) tmpdigit = QG10Matrix[charToInt(chr) + (charToInt(tmpdigit) * 10)]; return tmpdigit; } void main() { auto digit = checkDigit(number); } I compiled both with -inline -noboundscheck -release and the multidimensional array version does perform 1ms faster for a couple hundred runs, but I expected the difference to be much more noticeable. Any idea of what might be happening here?
Feb 09 2017
On Thursday, 9 February 2017 at 21:43:08 UTC, Daniel Kozak wrote:Not really, just standard dmd. I tried running each algoritm a few times through avgtime using different digit lengths (upto 10,000, my PC won't handle much more) and different amount of repetitions, and the results aren't consistent, some times one algorithm is marginally faster, sometimes the other. Apparently the compiler is capable of optimizing the unidimensional array version. Thank you all nevertheless for the suggestions.Any idea of what might be happening here?Did you try it with different backends? llvm (ldc), gcc(gdc)?
Feb 09 2017
On 02/09/2017 02:04 PM, Nestor wrote:I tried running each algoritm a few times through avgtime using different digit lengthsavgtime profiles the whole process, right? It measures everything that is involved in that little program. At least OS starting the program, D runtime initializing, program interacting with the console, etc. Since you're comparing algorithms, you need to take everything else out of the equation. A better approach is to use std.datetime.benchmark, which would repeat just the algorithm that you're interested in. Ali
Feb 09 2017
Dne 9.2.2017 v 23:08 Ali Çehreli via Digitalmars-d-learn napsal(a):On 02/09/2017 02:04 PM, Nestor wrote:Yes this is important, and for example when you used shared libphobos it makes really big differenceI tried running each algoritm a few times through avgtime using different digit lengthsavgtime profiles the whole process, right? It measures everything that is involved in that little program. At least OS starting the program, D runtime initializing, program interacting with the console, etc. Since you're comparing algorithms, you need to take everything else out of the equation. A better approach is to use std.datetime.benchmark, which would repeat just the algorithm that you're interested in. Ali
Feb 09 2017
On Thursday, 9 February 2017 at 19:39:49 UTC, Nestor wrote:On Thursday, 9 February 2017 at 18:34:30 UTC, Era Scarecrow wrote:The question is, why do you expect it to be noticably faster? On modern hardware it the optimizations are such that so little a change in code is very hard to link to a difference in running time. If you really want to show one the following code does the benchmark the right way (ie: using a very long input, tens of thousands of runs, and avoiding I/O and load time of the program to compare the bare function implementations): import std.conv; import std.stdio; import std.range; import std.array; import std.algorithm; string testcase; static immutable char[] QG10MatrixOne = "03175986427092154863420687135917509834266123045978" ~ "36742095815869720134894536201794386172052581436790"; char checkDigitOne(string str) { char tmpdigit = 0; foreach(chr; str) tmpdigit = QG10MatrixOne[(chr - '0') + (tmpdigit - '0') * 10]; return tmpdigit; } void testCheckDigitOne() { checkDigitTwo(testcase); } static immutable ubyte[][] QG10MatrixTwo = [ [0,3,1,7,5,9,8,6,4,2],[7,0,9,2,1,5,4,8,6,3], [4,2,0,6,8,7,1,3,5,9],[1,7,5,0,9,8,3,4,2,6], [6,1,2,3,0,4,5,9,7,8],[3,6,7,4,2,0,9,5,8,1], [5,8,6,9,7,2,0,1,3,4],[8,9,4,5,3,6,2,0,1,7], [9,4,3,8,6,1,7,2,0,5],[2,5,8,1,4,3,6,7,9,0], ]; static int charToInt(char chr) { scope(failure) return -1; return cast(int)(chr - '0'); } ubyte checkDigitTwo(string str) { ubyte tmpdigit; foreach(chr; str) { tmpdigit = QG10MatrixTwo[tmpdigit][charToInt(chr)]; } return tmpdigit; } void testCheckDigitTwo() { checkDigitTwo(testcase); } void main(string[] args) { testcase = iota(10).cycle.take(100000).map!(to!string).array.join; import std.datetime: benchmark; benchmark!(testCheckDigitOne, testCheckDigitTwo)(10000).each!writeln; } Yet on my machine I have the following times (note that the time itself depends on my hardware, what's important is the difference): TickDuration(15785255852) TickDuration(15784826803) So it seems that the first version is slower than the second one, but by so little that it's hard to link to the actual implementation. If anything it shows the futility of such low-level tricks for meaningless optimization. And by the way note that the benchmark avoid measuring IO because we were interested in the functions. But if we were measuring it it would represent 4 fifth of the running time (on my machine, with numbers as long as the one used in the benchmark). This means even a 20% improvement in checkDigit would actually only represent a 4% improvement of the program.On Thursday, 9 February 2017 at 17:36:11 UTC, Nestor wrote:OK I changed the approach using a multidimensional array for the matrix so I could ditch arithmetic operations altogether, but curiously after measuring a few thousand runs of both implementations through avgtime, I see no noticeable difference. Why? import std.stdio; static immutable ubyte[][] QG10Matrix = [ [0,3,1,7,5,9,8,6,4,2],[7,0,9,2,1,5,4,8,6,3], [4,2,0,6,8,7,1,3,5,9],[1,7,5,0,9,8,3,4,2,6], [6,1,2,3,0,4,5,9,7,8],[3,6,7,4,2,0,9,5,8,1], [5,8,6,9,7,2,0,1,3,4],[8,9,4,5,3,6,2,0,1,7], [9,4,3,8,6,1,7,2,0,5],[2,5,8,1,4,3,6,7,9,0], ]; static int charToInt(char chr) { scope(failure) return -1; return cast(int)(chr - '0'); } ubyte checkDigit(string str) { ubyte tmpdigit; foreach(chr; str) tmpdigit = QG10Matrix[tmpdigit][charToInt(chr)]; return tmpdigit; } enum { EXIT_SUCCESS = 0, EXIT_FAILURE = 1, } int main(string[] args) { scope(failure) { writeln("Invalid arguments. You must pass a number."); return EXIT_FAILURE; } assert(args.length == 2); ubyte digit = checkDigit(args[1]); if(digit == 0) writefln("%s is a valid number.", args[1]); else { writefln("%s is not a valid number (but it would be, appending digit %s).", args[1], digit); } return EXIT_SUCCESS; }I was trying to port C code from the article in Wikiversity [1] to D, but I'm not sure this implementation is the most efficient way to do it in D, so suggestions to optimize it are welcome: import std.stdio; static immutable char[] QG10Matrix = "03175986427092154863420687135917509834266123045978" ~ "36742095815869720134894536201794386172052581436790"; char checkDigit(string str) { char tmpdigit = '0'; foreach(chr; str) tmpdigit = QG10Matrix[(chr - '0') + (tmpdigit - '0') * 10]; return tmpdigit; }Well one thing is you can probably reduce them from chars to just bytes, instead of having to subtract you can instead add at the end. Although unless you're working with a VERY large input you won't see a difference. Actually since you're also multiplying by 10, you can incorporate that in the table too... (although a mixin might be better for the conversion than by hand) static immutable char[] QG10Matrix = [ 00,30,10,70,50,90,80,60,40,20, 70,00,90,20,10,50,40,80,60,30, 40,20,00,60,80,70,10,30,50,90, 10,70,50,00,90,80,30,40,20,60, 60,10,20,30,00,40,50,90,70,80, 30,60,70,40,20,00,90,50,80,10, 50,80,60,90,70,20,00,10,30,40, 80,90,40,50,30,60,20,00,10,70, 90,40,30,80,60,10,70,20,00,50, 20,50,80,10,40,30,60,70,90,00]; char checkDigit(string str) { char tmpdigit = 0; foreach(chr; str) tmpdigit = QG10Matrix[(chr - '0') + tmpdigit]; return (tmpdigit/10) + '0'; }
Feb 09 2017
On Thursday, 9 February 2017 at 19:39:49 UTC, Nestor wrote:OK I changed the approach using a multidimensional array for the matrix so I could ditch arithmetic operations altogether, but curiously after measuring a few thousand runs of both implementations through avgtime, I see no noticeable difference. Why?Truthfully, because you'll need tens of millions or hundreds of millions in length to determine if it makes a difference and how much. Addition, subtraction and simple memory lookups take very little time, and since the entire array (100 bytes) fits in the cache, it is going to perform very very very well regardless if you can optimize it further. If you tested this on a much slower system, say an 8bit 6502 the differences would be far more pronounced, but not that much different. Since the algorithm is more or less O(n) optimizing it won't make many differences. It's possible you could get a speedup by making them ints instead of chars, since then it might not have a penalty for the 'address not divisible by 4' that applies (which is more for ARM architectures and less for x86). Other optimizations could be to make it multiple levels, taking the basic 100 elements and expanding them 2-3 levels deep in a lookup and having it do it in more or less a single operation. (100 bytes for 1 level, 10,000 for 2 levels, 1,000,000 for 3 levels, 100,000,000 for 4 levels, etc), but the steps of converting it to the array lookup won't give you that much gain, although fewer memory lookups but none of them will be cached, so any advantage from that is probably lost. Although if you bump up the size to 16x10 instead of 10x10, you could use a shift instead of *10 which will make that slightly faster (there will be unused empty padded spots) In theory if you avoid the memory lookup at all, you could gain some amount of speed, depending on how it searches a manual table, although using a switch-case and a mixin to do all the details it feels like it wouldn't give you any gain... Division operations are the slowest operations you can do, but otherwise most instructions run really fast. Unless you're trying to get it smaller (fewer bytes for the call) or shaving for speed by instruction cycle counting (like on the 6502) i doubt you'll get much benefit.
Feb 09 2017
On Thursday, 9 February 2017 at 23:49:19 UTC, Era Scarecrow wrote:Other optimizations could be to make it multiple levels, taking the basic 100 elements and expanding them 2-3 levels deep in a lookup and having it do it in more or less a single operation. (100 bytes for 1 level, 10,000 for 2 levels, 1,000,000 for 3 levels, 100,000,000 for 4 levels, etc), but the steps of converting it to the array lookup won't give you that much gain, although fewer memory lookups but none of them will be cached, so any advantage from that is probably lost. Although if you bump up the size to 16x10 instead of 10x10, you could use a shift instead of *10 which will make that slightly faster (there will be unused empty padded spots) In theory if you avoid the memory lookup at all, you could gain some amount of speed, depending on how it searches a manual table, although using a switch-case and a mixin to do all the details it feels like it wouldn't give you any gain...Thank you for the detailed reply. I wasn't able to follow you regarding the multilevel stuff though :(
Feb 10 2017
On Friday, 10 February 2017 at 11:27:02 UTC, Nestor wrote:Thank you for the detailed reply. I wasn't able to follow you regarding the multilevel stuff though :(The idea behind it is like this (which you can scale up): static immutable int[] QG10Matrix2 = buildMatrix2(); int[] buildMatrix2() { string digits = "0123456789"; int[] l = new int[16*16*10]; char[3] s; foreach(a; digits) foreach(b; digits) foreach(c; digits) { s[] = [a,b,c]; l[(a-'0')<< 8|(b-'0')<<4|(c-'0')]=checkDigit(cast(string) s) - '0'; } return l; } Using that it SHOULD allow you to get the result of 2 inputs simply by using 2 characters (plus the old result) char checkDigit2(string str) { int tmpdigit = 0; for(;str.length >= 2;str=str[2 .. $]) { tmpdigit = QG10Matrix2[tmpdigit<<8|(str[0]-'0')<< 4|(str[1]-'0')]; } // handle remainder single character and return value While it should be easy, I'm having issues trying to get the proper results via unittests and I'm not sure why. Probably something incredibly simple on my part.
Feb 11 2017
On Saturday, 11 February 2017 at 11:45:02 UTC, Era Scarecrow wrote:On Friday, 10 February 2017 at 11:27:02 UTC, Nestor wrote:Notice this is no ordinary matrix, but an Anti-Simmetric QuasiGroup of order 10, and tmpdigit (called interim in the algorithm) is used in each round (although the function isn't recursive) together with each digit to calculate final check digit.Thank you for the detailed reply. I wasn't able to follow you regarding the multilevel stuff though :(The idea behind it is like this (which you can scale up): static immutable int[] QG10Matrix2 = buildMatrix2(); int[] buildMatrix2() { string digits = "0123456789"; int[] l = new int[16*16*10]; char[3] s; foreach(a; digits) foreach(b; digits) foreach(c; digits) { s[] = [a,b,c]; l[(a-'0')<< 8|(b-'0')<<4|(c-'0')]=checkDigit(cast(string) s) - '0'; } return l; } Using that it SHOULD allow you to get the result of 2 inputs simply by using 2 characters (plus the old result) char checkDigit2(string str) { int tmpdigit = 0; for(;str.length >= 2;str=str[2 .. $]) { tmpdigit = QG10Matrix2[tmpdigit<<8|(str[0]-'0')<< 4|(str[1]-'0')]; } // handle remainder single character and return value While it should be easy, I'm having issues trying to get the proper results via unittests and I'm not sure why. Probably something incredibly simple on my part.
Feb 11 2017
On Saturday, 11 February 2017 at 20:19:51 UTC, Nestor wrote:Notice this is no ordinary matrix, but an Anti-Simmetric QuasiGroup of order 10, and tmpdigit (called interim in the algorithm) is used in each round (although the function isn't recursive) together with each digit to calculate final check digit.Yes i know, which is why i had 3 to calculate 2 inputs because the third is the temp/previous calculation. If however you were calculating a fixed number of digits a single table could be made and do a single lookup, assuming it wasn't too large to make it uncumbersome or impractical.
Feb 11 2017
On Saturday, 11 February 2017 at 21:02:40 UTC, Era Scarecrow wrote:Yes i know, which is why i had 3 to calculate 2 inputs because the third is the temp/previous calculation.Alright I've found the bug and fixed it, and it passes with flying colors (brute force tests up to 6 digits); However it doesn't use the original function to build the table. So I'm satisfied it will handle any length now. But it seriously is a lot of overhead for such a simple function. int[] buildMatrix2() { string digits = "0123456789"; int[] l = new int[16*16*10]; l[] = -1; //printing the array it's obvious to see what is padding foreach(a; digits) foreach(b; digits) foreach(c; digits) { int t = (a-'0')*10, t2 = (QG10Matrix[(b - '0') + t]-'0') * 10, off = (a - '0') << 8 | (b - '0') << 4 | (c - '0'); l[off] = (QG10Matrix[(c - '0') + t2]-'0')<<8; } return l; } char checkDigit2(string str) { int tmpdigit = 0; for(;str.length >= 2;str = str[2 .. $]) tmpdigit = QG10Matrix2[tmpdigit|(str[0]-'0')<<4|(str[1]-'0')]; tmpdigit>>=8; if (str.length==1) return QG10Matrix[(str[0]-'0')+tmpdigit*10]; return (tmpdigit+'0') & 0xff; }
Feb 11 2017
On Saturday, 11 February 2017 at 21:41:11 UTC, Era Scarecrow wrote:But it seriously is a lot of overhead for such a simple function.Just ran the unittests under the dmd profiler, says the algorithm is 11% faster now. So yeah slightly more optimized. Another level and we could probably get 25%, but the built matrix will blow up far larger than the 10k it is now. Num Tree Func Per Calls Time Time Call 12000000 1281989 1281989 0 char damm.checkDigit(immutable(char)[]) 12000000 1146308 1146308 0 char damm.checkDigit2(immutable(char)[])
Feb 11 2017
On Saturday, 11 February 2017 at 21:56:54 UTC, Era Scarecrow wrote:Just ran the unittests under the dmd profiler, says the algorithm is 11% faster now. So yeah slightly more optimized.Ran some more tests. Without optimization but with with 4 levels (a 2.5Mb table), it gains to a whopping 27%! However with optimizations turned on it dwindled to a mere 15% boost And optimization + no bounds checking, 2 & 4 levels both give a 9% boost total. Testing purely on 8byte inputs (Brute forced all combinations) receives the same 9% boost with negligible difference. Safe to say going higher levels isn't going to give you sufficient improvement; Also exe file is 3Mb big (but compresses to 150k).
Feb 11 2017
On Sunday, 12 February 2017 at 05:54:34 UTC, Era Scarecrow wrote:On Saturday, 11 February 2017 at 21:56:54 UTC, Era Scarecrow wrote:Wow! Thanks for the interest and effort.Just ran the unittests under the dmd profiler, says the algorithm is 11% faster now. So yeah slightly more optimized.Ran some more tests. Without optimization but with with 4 levels (a 2.5Mb table), it gains to a whopping 27%! However with optimizations turned on it dwindled to a mere 15% boost And optimization + no bounds checking, 2 & 4 levels both give a 9% boost total. Testing purely on 8byte inputs (Brute forced all combinations) receives the same 9% boost with negligible difference. Safe to say going higher levels isn't going to give you sufficient improvement; Also exe file is 3Mb big (but compresses to 150k).
Feb 12 2017
On Monday, 13 February 2017 at 00:56:37 UTC, Nestor wrote:On Sunday, 12 February 2017 at 05:54:34 UTC, Era Scarecrow wrote:Certainly. But the bulk of the answer comes down that the 2 levels that I've already provided are the fastest you're probably going to get. Certainly we can test using shorts or bytes instead, but it's likely the results will only go down. To note my tests are strictly on my x86 system and it would be better to also test this on other systems like PPC, Linux, ARM, and other architectures to see how they perform, and possibly tweak them as appropriate. Still we did find out there is some optimization that can be done and successfully for the Damm algorithm, it just isn't going to be a lot. Hmmm... A thought does come to mind. Parallelizing the code; However that would require probably 11 instances to get a 2x speedup (calculating the second half with all 10 possibilities for the carry over, and also calculating the first half, then choosing which of the 10 based on the first half's output), which only really works if you have a ton of cores, and the input is REALLY REALLY large, like a meg or something. While the usage of the Damm code is more useful for adding a digit to the end of a code like UPC or Barcodes as error detection, and expecting larger than 32 for real applications is unlikely. But at this point I'm rambling.Ran some more tests.Wow! Thanks for the interest and effort.
Feb 12 2017
On Saturday, 11 February 2017 at 21:41:11 UTC, Era Scarecrow wrote:On Saturday, 11 February 2017 at 21:02:40 UTC, Era Scarecrow wrote:I fail to see where you are declaring QG10Matrix2, because apparently it's an array of chars, but buildMatrix2 returns an array of int (2560 elements??) with lots of -1 values.Yes i know, which is why i had 3 to calculate 2 inputs because the third is the temp/previous calculation.Alright I've found the bug and fixed it, and it passes with flying colors (brute force tests up to 6 digits); However it doesn't use the original function to build the table. So I'm satisfied it will handle any length now. But it seriously is a lot of overhead for such a simple function. int[] buildMatrix2() { string digits = "0123456789"; int[] l = new int[16*16*10]; l[] = -1; //printing the array it's obvious to see what is padding foreach(a; digits) foreach(b; digits) foreach(c; digits) { int t = (a-'0')*10, t2 = (QG10Matrix[(b - '0') + t]-'0') * 10, off = (a - '0') << 8 | (b - '0') << 4 | (c - '0'); l[off] = (QG10Matrix[(c - '0') + t2]-'0')<<8; } return l; } char checkDigit2(string str) { int tmpdigit = 0; for(;str.length >= 2;str = str[2 .. $]) tmpdigit = QG10Matrix2[tmpdigit|(str[0]-'0')<<4|(str[1]-'0')]; tmpdigit>>=8; if (str.length==1) return QG10Matrix[(str[0]-'0')+tmpdigit*10]; return (tmpdigit+'0') & 0xff; }
Feb 11 2017
On Sunday, 12 February 2017 at 00:43:55 UTC, Nestor wrote:I fail to see where you are declaring QG10Matrix2, because apparently it's an array of chars, but buildMatrix2 returns an array of int (2560 elements??) with lots of -1 values.I declared it here: http://forum.dlang.org/post/fiamjuhiddbzwvaplxkr forum.dlang.org and it's not chars, it's ints. Part of it is to avoid the mem access penalty for non divisible by 4 addresses, and another is that the array returns not chars, but numbers from 0-2560, which is the answer *256 (<<8), which offers hopefully some slight speed gain on longer inputs. Also the size of the array is guesstimated, and the -1's just signify the padding (which can be any value, but -1 makes it obvious). Since it's a 10x10 array it's based on, but using 4bits per section, there's 6 elements lost, and 6 whole rows lost. It's a necessary loss to gain speed; Thankfully it's only using 10k (2560 members) and not 700k as was my original guess when i was calculating it wrong earlier. Doing it wrong earlier, the compiler kept crashing... :P running out of memory.
Feb 11 2017
On Thursday, 9 February 2017 at 18:34:30 UTC, Era Scarecrow wrote:... Actually since you're also multiplying by 10, you can incorporate that in the table too...I forgot to comment that what is multiplied by ten is not the value but the starting position in the array (a way to emulate a matrix), but anyway see my comments in previous post.
Feb 09 2017