
digitalmars.D.ldc - Struct copies

reply "bearophile" <bearophileHUGS lycos.com> writes:
The following code is compiled with the ldc2 compiler based on 
LLVM 3.3.1.

This swaps two values in-place:

void swap(T)(ref T x, ref T y) pure nothrow {
     immutable aux = x;
     x = y;
     y = aux;
}


If I swap uint values I get the asm and IR:

__D5test611__T4swapTkZ4swapFNaNbNfKkKkZv:
	pushl	%esi
	movl	8(%esp), %ecx
	movl	(%ecx), %edx
	movl	(%eax), %esi
	movl	%esi, (%ecx)
	movl	%edx, (%eax)
	popl	%esi
	ret	$4


; Function Attrs: nounwind
define x86_stdcallcc void @"\01__D5test65swap1FNaNbKkKkZv"(i32* 
inreg nocapture %y_arg, i32* nocapture %x_arg) #0 {
entry:
   %tmp = load i32* %x_arg, align 4
   %tmp2 = load i32* %y_arg, align 4
   store i32 %tmp2, i32* %x_arg, align 4
   store i32 %tmp, i32* %y_arg, align 4
   ret void
}


Often I have a simple struct like this, with a sizeof equal to a 
size_t or two size_t (a size_t is a 32 bit unsigned on this 
system):

struct Foo {
     ushort a;
     char b, c;
}


If I instantiate the swap function template on values of type Foo 
I get the asm and IR:

__D5test621__T4swapTS5test63FooZ4swapFNaNbNfKS5test63FooKS5test63FooZv:
	pushl	%edi
	pushl	%esi
	movl	12(%esp), %ecx
	movw	(%ecx), %dx
	movw	2(%ecx), %si
	movl	(%eax), %edi
	movl	%edi, (%ecx)
	movw	%dx, (%eax)
	movw	%si, 2(%eax)
	popl	%esi
	popl	%edi
	ret	$4


; Function Attrs: nounwind
define x86_stdcallcc void 
@"\01__D5test65swap2FNaNbKS5test63FooKS5test63FooZv"(%test6.Foo* 
inreg nocapture %y_arg, %test6.Foo* nocapture %x_arg) #0 {
entry:
   %0 = getelementptr inbounds %test6.Foo* %x_arg, i32 0, i32 0
   %1 = load i16* %0, align 1
   %2 = getelementptr inbounds %test6.Foo* %x_arg, i32 0, i32 1
   %3 = load i8* %2, align 1
   %4 = getelementptr inbounds %test6.Foo* %x_arg, i32 0, i32 2
   %5 = load i8* %4, align 1
   %6 = bitcast %test6.Foo* %y_arg to i32*
   %7 = bitcast %test6.Foo* %x_arg to i32*
   %8 = load i32* %6, align 1
   store i32 %8, i32* %7, align 1
   %9 = getelementptr inbounds %test6.Foo* %y_arg, i32 0, i32 0
   store i16 %1, i16* %9, align 1
   %10 = getelementptr inbounds %test6.Foo* %y_arg, i32 0, i32 1
   store i8 %3, i8* %10, align 1
   %11 = getelementptr inbounds %test6.Foo* %y_arg, i32 0, i32 2
   store i8 %5, i8* %11, align 1
   ret void
}


I can create a new union Bar that contains a 32 bit integer 
overlapping all three Foo fields:

union Bar {
     uint all;
     struct {
         ushort a;
         char b, c;
     }
}


Now I can define a new swap function that works on values of type 
Bar:


void swap2(ref Bar x, ref Bar y) pure nothrow {
     immutable Bar aux = x;
     x.all = y.all;
     y.all = aux.all;
}


Its asm and IR are shorter:

__D5test65swap2FNaNbKS5test63BarKS5test63BarZv:
     pushl   %esi
     movl    8(%esp), %ecx
     movl    (%ecx), %edx
     movl    (%eax), %esi
     movl    %esi, (%ecx)
     movl    %edx, (%eax)
     popl    %esi
     ret $4


; Function Attrs: nounwind
define x86_stdcallcc void 
@"\01__D5test65swap3FNaNbKS5test63BarKS5test63BarZv"(%test6.Bar* 
inreg nocapture %y_arg, %test6.Bar* nocapture %x_arg) #0 {
entry:
   %0 = getelementptr inbounds %test6.Bar* %x_arg, i32 0, i32 0
   %1 = load i32* %0, align 1
   %tmp4 = getelementptr %test6.Bar* %y_arg, i32 0, i32 0
   %tmp5 = load i32* %tmp4, align 4
   store i32 %tmp5, i32* %0, align 4
   store i32 %1, i32* %tmp4, align 4
   ret void
}


In the case of swapping Foos, why isn't LLVM optimizing the swap 
function to a shorter asm like swap2? I have asked this on the 
LLVM IRC channel, and aKor has told me that on similar C code 
Clang swaps two Foos using a memcpy, so it uses a single 32 bit 
copy. So perhaps ldc2 can do the same for this common case.
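
As a minimal sketch of that idea (swapWord is a hypothetical name, and the code assumes Foo.sizeof == uint.sizeof, which the static assert checks), the copy can be routed through uint temporaries with memcpy. Whether a given ldc2/LLVM version lowers this to single 32 bit moves is an assumption that has to be verified by reading the generated asm:

```d
import core.stdc.string : memcpy;

struct Foo {
    ushort a;
    char b, c;
}

// The trick only makes sense when Foo is exactly one 32 bit word.
static assert(Foo.sizeof == uint.sizeof);

// Hypothetical helper: swaps two Foos as whole words.
void swapWord(ref Foo x, ref Foo y) nothrow @trusted {
    uint tx, ty;
    // Read both values first, so the function also works when x and y alias.
    memcpy(&tx, &x, uint.sizeof);
    memcpy(&ty, &y, uint.sizeof);
    memcpy(&x, &ty, uint.sizeof);
    memcpy(&y, &tx, uint.sizeof);
}
```

Unlike a cast to uint*, the memcpy form makes no claim about Foo being 4-byte aligned, so it stays valid on targets where unaligned loads are a problem.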

Bye,
bearophile
Jan 26 2014
next sibling parent reply "Kai Nacke" <kai redstar.de> writes:
On Sunday, 26 January 2014 at 13:02:50 UTC, bearophile wrote:
 In the case of swapping Foos why isn't LLVM optimizing the swap 
 function to a shorter asm like swap2? I have asked this on the 
 LLVM IRC channel, and aKor has told me that similar C code 
 Clang on swaps two Foo using a memcpy so uses a single 32 bit 
 copy. So perhaps ldc2 can do the same for this common case.
Hi bearophile!

In fact, ldc uses llvm.memcpy in the swap function. This is what 
I get with ldc 0.13.0-alpha1 using LLVM 3.4 on mingw32 with no 
optimization:

define weak_odr x86_stdcallcc void 
@"\01__D4swap20__T4swapTS4swap3FooZ4swapFNaNbNfKS4swap3FooKS4swap3FooZv"(%swap.Foo* 
inreg %y_arg, %swap.Foo* %x_arg) {
entry:
   %aux = alloca %swap.Foo, align 2
   %tmp = bitcast %swap.Foo* %aux to i8*
   %tmp1 = bitcast %swap.Foo* %x_arg to i8*
   call void @llvm.memcpy.p0i8.p0i8.i32(i8* %tmp, i8* %tmp1, i32 4, i32 1, i1 false)
   %tmp2 = load %swap.Foo* %aux
   %tmp3 = bitcast %swap.Foo* %x_arg to i8*
   %tmp4 = bitcast %swap.Foo* %y_arg to i8*
   call void @llvm.memcpy.p0i8.p0i8.i32(i8* %tmp3, i8* %tmp4, i32 4, i32 1, i1 false)
   %tmp5 = load %swap.Foo* %x_arg
   %tmp6 = bitcast %swap.Foo* %y_arg to i8*
   %tmp7 = bitcast %swap.Foo* %aux to i8*
   call void @llvm.memcpy.p0i8.p0i8.i32(i8* %tmp6, i8* %tmp7, i32 4, i32 1, i1 false)
   %tmp8 = load %swap.Foo* %y_arg
   ret void
}

Using -O2 or -O3, I get IR and ASM similar to the one you posted. 
I do not understand this. I'll check what clang is doing here.

Regards,
Kai
Jan 26 2014
parent "Kai Nacke" <kai redstar.de> writes:
On Monday, 27 January 2014 at 07:00:18 UTC, Kai Nacke wrote:
 Using -O2 or -O3, I get IR and ASM similar to the one you 
 posted. I do not understand this. I'll check what clang is 
 doing here.
The obvious difference between ldc and clang is that clang 
generates better alignment information. Otherwise, the IR is 
almost identical.

Regards,
Kai
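
The alignment the compiler assumes for Foo can be queried with .alignof. The sketch below only shows how to inspect it and how an align attribute raises it; AlignedFoo is a hypothetical type introduced for illustration, and whether the higher alignment changes the optimized swap code is not something this thread establishes:

```d
struct Foo {
    ushort a;
    char b, c;
}

// Hypothetical variant: same fields, but the struct itself is 4-byte aligned.
align(4) struct AlignedFoo {
    ushort a;
    char b, c;
}

// Foo is one 32 bit word in size, but only needs the alignment of its
// widest field (the ushort).
static assert(Foo.sizeof == 4);
static assert(Foo.alignof == 2);

// The align attribute raises the alignment without changing the size.
static assert(AlignedFoo.sizeof == 4);
static assert(AlignedFoo.alignof == 4);
```

Comparing the -O2 IR for swap instantiated on both types would be one way to test whether alignment information is what blocks the single 32 bit copy.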
Jan 26 2014
prev sibling parent reply "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
It would seem that ldc is performing a memberwise assignment. It 
could probably be optimized away, since it's known at compile time 
whether the fields have their own assignment overloaded or not. 
With unions it's straightforward: just a memcpy of the largest 
size (sadly dmd doesn't do that yet, but it also does all sorts of 
nasty things with unions). With structs it's a little more 
involved.

Generally though, pure code generation issues aside, that is one 
very strange swap function, bearophile :)
Jan 27 2014
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Stanislav Blinov:

 Generally though, pure code generation issues aside, that is 
 one very strange swap function, bearophile :)
What's strange about this?

void swap(T)(ref T x, ref T y) pure nothrow {
     immutable aux = x;
     x = y;
     y = aux;
}

Bye,
bearophile
Jan 27 2014
parent "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
On Tuesday, 28 January 2014 at 01:39:47 UTC, bearophile wrote:
 Stanislav Blinov:

 Generally though, pure code generation issues aside, that is 
 one very strange swap function, bearophile :)
 What's strange on this?

 void swap(T)(ref T x, ref T y) pure nothrow {
      immutable aux = x;
      x = y;
      y = aux;
 }
It won't swap references or pointers (due to immutable), or 
structs with a disabled postblit (due to the assignment). The 
solution to the first is simple: immutable -> auto. The second 
would basically require you to perform the memcpy manually anyway.
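
Both points can be sketched as follows. The names swapAuto, rawSwap and NoCopy are hypothetical and chosen for illustration; the bitwise version assumes T holds no pointers into itself, since such pointers would become stale after the bytes are moved:

```d
import core.stdc.string : memcpy;

// First fix: auto instead of immutable, so mutable pointers and
// references can be copied into the temporary.
void swapAuto(T)(ref T x, ref T y) {
    auto aux = x;
    x = y;
    y = aux;
}

// Second fix: move the raw bytes, so no postblit or opAssign is invoked.
void rawSwap(T)(ref T x, ref T y) nothrow @trusted {
    if (&x is &y) return;   // memcpy must not be given overlapping buffers
    ubyte[T.sizeof] tmp = void;
    memcpy(tmp.ptr, &x, T.sizeof);
    memcpy(&x, &y, T.sizeof);
    memcpy(&y, tmp.ptr, T.sizeof);
}

// A struct that cannot be copied by assignment.
struct NoCopy {
    int value;
    @disable this(this);
}
```

swapAuto still fails to compile for NoCopy because it copies x into aux, which is why the bitwise version is needed for such types.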
Jan 28 2014