digitalmars.D.bugs - [Issue 12176] New: Possible std.algorithm.sum optimization for short fixed-size arrays
- d-bugmail puremagic.com (109/109) Feb 15 2014 https://d.puremagic.com/issues/show_bug.cgi?id=12176
https://d.puremagic.com/issues/show_bug.cgi?id=12176 Summary: Possible std.algorithm.sum optimization for short fixed-size arrays Product: D Version: D2 Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Phobos AssignedTo: nobody puremagic.com ReportedBy: bearophile_hugs eml.cc I suggest to modify std.algorithm.sum (to add an overload) to let it accept fixed-sized arrays too, and then add a specialization for short fixed-sized arrays like: int[3] a = [10, 20, 30]; immutable tot = a.sum; This expands the places where you are willing to use sum() because with such specialization a normal D compiler is able to produce inlined loop-free asm code similar to writing down a manual sum of the array items: immutable tot = a[0] + a[1] + a[2]; Generic library code should not be slower than built-in features, where possible, and short fixed-sized arrays are not uncommon in high performance code (and currently DMD doesn't inline code containing a loop. And sometimes even with LDC2 loops are not unrolled if their size is a run-time value. To unroll them it should inline the function and then see that the run-time length is actually a compile-time one). If you know at compile-time the length of an array, this is precious information, and it's bad to just throw such information away, especially when such length is small (when the length is large it makes no difference). This specialization should be present only for small arrays (like Length <= 5). For the other cases and to reduce template bloat the sum() specialized on fixed-sized arrays should just call (with a "static if") the normal sum for dynamic arrays with a slicing: auto sum(size_t N)(T[N] arr) { static if (N < 6) { // short-case implementation here ... } else { return sum(arr[]); } } Using the sum() code from std.algorithm, this main() gets compiled by the latest ldc2: int main() { int[3] a = [10, 20, 30]; return cast(int)a[].sum; } As (-O -release -inline -noboundscheck -output-s): __Dmain: pushl %ebp pushl %ebx pushl %edi pushl %esi subl $28, %esp movl $10, 16(%esp) movl $20, 20(%esp) movl $30, 24(%esp) leal 16(%esp), %edi movl %edi, 4(%esp) movl $3, (%esp) xorl %esi, %esi calll __D3std5array12__T5emptyTiZ5emptyFNaNbNdNfxAiZb subl $8, %esp testb $1, %al jne LBB0_3 movl $3, %eax movl %edi, %ecx leal 20(%esp), %ebx xorl %esi, %esi movl $2, %ebp xorl %edi, %edi .align 16, 0x90 LBB0_2: movl %ecx, 4(%esp) movl %eax, (%esp) calll __D3std5array12__T5frontTiZ5frontFNaNbNcNdNfAiZi subl $8, %esp movl (%eax), %eax movl %eax, %ecx sarl $31, %ecx addl %eax, %esi adcl %ecx, %edi movl %ebx, 8(%esp) movl %ebp, 12(%esp) movl %ebx, 4(%esp) movl %ebp, (%esp) calll __D3std5array12__T5emptyTiZ5emptyFNaNbNdNfxAiZb subl $8, %esp movl 8(%esp), %ecx decl %ebp addl $4, %ebx testb $1, %al movl 12(%esp), %eax je LBB0_2 LBB0_3: movl %esi, %eax addl $28, %esp popl %esi popl %edi popl %ebx popl %ebp ret The idea proposed here is just an optimization, so in theory a sufficiently smart compiler should not need it. -- Configure issuemail: https://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Feb 15 2014