
digitalmars.D.ldc - printf performance problem

reply "bearophile" <bearophileHUGS lycos.com> writes:
I have reduced another performance problem, Windows 32 bit.

--------------------------

A test C program:


#include <stdio.h>
int main() {
     for (double i = 0; i < 200000; i++)
         printf("%f\n", i);
     return 0;
}

--------------------------

A similar D program:


import core.stdc.stdio: printf;
int main() {
     for (double i = 0; i < 200000; i++)
         printf("%f\n", i);
     return 0;
}

--------------------------

I compile with:

gcc -std=gnu99 -Ofast -flto -s test1.c -o test1
gcc 4.8.0

ldmd2 -wi -O -release -inline -noboundscheck test2.d
ldc2 0.13.0-alpha1

If I redirect the output to file, the run-times for me are about 
0.30 seconds for the C version, and about 1.12 seconds for the D 
version.
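For anyone who wants to repeat the measurement inside the program rather than with an external timer, here is a minimal C sketch. The helper names print_doubles and time_print_doubles are made up for this illustration and do not come from the post, and clock() reports processor time on most platforms, so the numbers are only roughly comparable to the wall-clock figures quoted above.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: writes 0.0, 1.0, ..., n-1 to `out` using "%f\n".
   Returns the total number of characters written, or -1 on an output error. */
long print_doubles(FILE *out, int n) {
    long total = 0;
    for (double i = 0; i < n; i++) {
        int written = fprintf(out, "%f\n", i);
        if (written < 0)
            return -1;
        total += written;
    }
    return total;
}

/* Hypothetical helper: returns the time spent in print_doubles,
   as reported by clock(), converted to seconds. */
double time_print_doubles(FILE *out, int n) {
    clock_t start = clock();
    print_doubles(out, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_print_doubles(stdout, 200000) with the output redirected to a file would approximate the test above.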

--------------------------

GCC asm:

_main:
	pushl	%ebp
	movl	%esp, %ebp
	andl	$-16, %esp
	subl	$32, %esp
	call	___main
	fldz
	.p2align 4,,7
L4:
	fstl	4(%esp)
	movl	$LC1, (%esp)
	fstpl	24(%esp)
	call	_printf
	fldl	24(%esp)
	fadds	LC2
	flds	LC3
	fcomip	%st(1), %st
	ja	L4

	fstp	%st(0)
	xorl	%eax, %eax
	leave
	ret

--------------------------

LDC2 asm:

__Dmain:
	pushl	%esi
	subl	$24, %esp
	xorps	%xmm0, %xmm0
	movl	$200000, %esi
	.align	16, 0x90
LBB0_1:
	movsd	%xmm0, 16(%esp)
	movsd	%xmm0, 4(%esp)
	movl	$_.str, (%esp)
	calll	___mingw_printf
	movsd	16(%esp), %xmm0
	addsd	LCPI0_0, %xmm0
	decl	%esi
	jne	LBB0_1

	xorl	%eax, %eax
	addl	$24, %esp
	popl	%esi
	ret

--------------------------

Bye,
bearophile
Mar 17 2014
parent reply David Nadlinger <code klickverbot.at> writes:
On Tue 18 Mar 2014 04:31:30 AM CET, bearophile wrote:
 I have reduced another performance problem, Windows 32 bit.

 A test C program: […]
LDC/Win32 uses the MinGW output/formatting functions, as e.g. the printf() from the MSCRT can't handle reals. I don't really see a reason why the LDC-generated code should be that much slower otherwise (you can verify this by just calling __mingw_printf directly or I think also by passing the -posix/-ansi flags to GCC).

As to why the MinGW printf() is actually slower, no idea. Probably just a question of an optimized-to-hell version against a simple hack to make the C99 format specifiers work.

David
Mar 18 2014
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
David Nadlinger:

 LDC/Win32 uses the MinGW output/formatting functions, as e.g. 
 the printf() from the MSCRT can't handle reals.
I am compiling the C code with the same gcc that ldc2 uses by default on Windows, as explained in the ldc2 installation procedure.
 I don't really see a reason why the LDC-generated code should 
 be that much slower otherwise
Nor do I.
 (you can verify this by just calling __mingw_printf directly or 
 I think also by passing the -posix/-ansi flags to GCC).
OK. If I compile this C code:

#include <stdio.h>
int main() {
     double i;
     for (i = 0; i < 200000; i++)
         printf("%f\n", i);
     return 0;
}

With (no optimizations):

gcc -ansi test2.c -o test2

The run-time is about the same (about 0.31 seconds).

But if I compile this code:

#include <stdio.h>
int main() {
     double i;
     for (i = 0; i < 200000; i++)
         __mingw_printf("%f\n", i);
     return 0;
}

The run-time is about 1.13 seconds, that is the same as the D version.

If I compile this version that uses __mingw_printf with:

gcc -Ofast -flto -s -ansi test2.c -o test2

The run-time is still about 1.13 seconds. So the experiment you have suggested has given an interesting answer :-)
 As to why the MinGW printf() is actually slower, no idea. 
 Probably just a question of an optimized-to-hell version 
 against a simple hack to make the C99 format specifiers work.
I don't fully understand this part of your answer, and I don't understand how to fix the D code to make it about four times faster. Can you fix ldc2 to use the same printing function that C code compiled by GCC uses by default? When I have to write a lot of floating point values, this could make a significant difference in run-time.

Bye,
bearophile
Mar 18 2014
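The experiment above can be packaged so that one source file builds against either implementation. This is only a sketch: the USE_MINGW_PRINTF switch and the print_range helper are invented for this example, and __mingw_printf is assumed to be declared by MinGW's <stdio.h>, as the test program in the post suggests. On other toolchains the code simply falls back to the ordinary printf().

```c
#include <stdio.h>

/* USE_MINGW_PRINTF is a hypothetical switch for this sketch: pass
   -DUSE_MINGW_PRINTF to a MinGW gcc to route calls to __mingw_printf.
   Everywhere else the regular printf() is used. */
#if defined(__MINGW32__) && defined(USE_MINGW_PRINTF)
#define PRINT_FN __mingw_printf
#else
#define PRINT_FN printf
#endif

/* Hypothetical helper: prints 0.0 .. n-1 with "%f\n" through the selected
   function and returns how many lines were printed successfully. */
int print_range(int n) {
    int lines = 0;
    for (double i = 0; i < n; i++) {
        if (PRINT_FN("%f\n", i) > 0)
            lines++;
    }
    return lines;
}
```

Building the same file once with and once without -DUSE_MINGW_PRINTF, then timing both with redirected output, should isolate the cost of the formatting function from everything else in the program.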
parent reply "David Nadlinger" <code klickverbot.at> writes:
On 19 Mar 2014, at 1:30, bearophile wrote:
 David Nadlinger:
 LDC/Win32 uses the MinGW output/formatting functions, as e.g. the 
 printf() from the MSCRT can't handle reals.
 I am compiling the C code with the same gcc that ldc2 uses by 
 default on Windows, as explained in the ldc2 installation 
 procedure. […] Can you fix ldc2 to use the same printing 
 function that C code compiled by GCC uses by default?
Doing this would be easy; in fact, that's how it was before we started serious work on LDC/MinGW. However, as mentioned in my last message, GCC by default uses the printf() function from the Microsoft C runtime, which can't handle reals (i.e. C long doubles) and some of the C99 format specifiers. Thus, using the MSCRT functions would cause serious problems for many D programs (and the DMD/Phobos test suites), as all the floating point formatting and printing functions in Phobos depend on C runtime functions like snprintf().

Long doubles and C99 format strings are not as widespread in C code, so you have to explicitly define __MINGW_USE_ANSI_STDIO to get printf() and friends mapped to the MinGW functions in C code (I would have thought that some command line switches also affect that, but apparently I misremembered).

If you are actually using C printf() directly in your program (and not the Phobos formatting functions) and the Microsoft runtime covers the format specifiers you need, then you can just manually write the function declarations in question ("extern(C) int printf(const char*, ...)"). core.stdc.stdio merely contains an alias from __mingw_printf() to printf().

If the Phobos string formatting performance is not good enough for you, then the best thing to do would be to write a D floating point formatting implementation and finally ditch the C formatting functions.

David
Mar 19 2014
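For C code built with MinGW, the opt-in described above might look like the following minimal sketch. One caveat: the MinGW headers I am aware of check a macro spelled __USE_MINGW_ANSI_STDIO, which differs slightly from the spelling in the post, so treat the exact name as something to verify against your toolchain. The format_long_double helper is made up for this illustration.

```c
/* Assumption: __USE_MINGW_ANSI_STDIO is the opt-in macro checked by the
   MinGW headers. It has to be defined before the first standard header is
   included. Other toolchains ignore it. */
#define __USE_MINGW_ANSI_STDIO 1
#include <stdio.h>

/* Hypothetical helper: formats a long double with the C99 "%Lf" specifier,
   which the post says the Microsoft runtime printf() cannot handle.
   Returns the character count reported by snprintf(). */
int format_long_double(char *buf, size_t size, long double value) {
    return snprintf(buf, size, "%Lf", value);
}
```

With the macro in effect the call is expected to go through the slower MinGW formatter, so this trades speed for correct long double output, which is the same trade-off LDC makes by default.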
parent "bearophile" <bearophileHUGS lycos.com> writes:
David Nadlinger:

 However, as mentioned in my last message, GCC by default uses 
 the printf() function from the Microsoft C runtime, which can't 
 handle reals (i.e. C long doubles) and some of the C99 format 
 specifiers.
I missed that part of your compressed answer :-)
 If you are actually using C printf() directly in your program 
 (and not the Phobos formatting functions)
I sometimes use printf when I have to print a lot of data, because writeln is usually quite a bit slower.
 and the Microsoft runtime covers the format specifiers you 
 need, then you can just manually write the function 
 declarations in question ("extern(C) int printf(const char*, 
 …)"). core.stdc.stdio merely contains an alias from 
 __mingw_printf() to printf().
Good, this D code runs in about 0.31 seconds, about the same as the C version:

extern(C) nothrow int printf(const char*, ...);
int main() {
     for (double i = 0; i < 200000; i++)
         printf("%f\n", i);
     return 0;
}

This is enough for my purposes, thank you (I don't need to print large amounts of reals).
 If the Phobos string formatting performance is not good enough 
 for you, then the best thing to do would be to write a D 
 floating point formatting implementation and finally ditch the 
 C formatting functions.
Printing floating point values correctly and quickly is a very complex project :-)

Bye,
bearophile
Mar 19 2014