c++.dos.32-bits - X32 bug???

"Laurentiu Pancescu" <lpancescu fastmail.fm> writes:

I'm using NASM to assemble an external function, then I link the OBJ file
with the rest of the DMC-compiled files, but the EXE crashes (GPF, 0DH) at
run time.  Only -mx is affected: -mn works fine, and so do builds made with
Borland C++ 5.5.1, Cygwin, MinGW and DJGPP (same assembly code; NASM can
generate all the required object formats).

Here's an example:

; test.asm
; use "nasm -f obj test.asm -o test.obj"
segment code public use32
global _get_value
_get_value:
    push ebp
    mov ebp, esp
    mov eax, [ebp + 8]
    add eax, eax
    leave
    retn

/* main.c */
/* sc -mx main.c test.obj x32.lib */
#include <stdio.h>
unsigned get_value(unsigned);
int main(void)
{
  printf("Result is %u\n", get_value(9));
  return 0;
}

On all other configurations, the displayed value is 18, as expected.  I
looked at the DMC-generated code, and everything is okay (duh!).  Maybe this
is a problem with the DOS extender?  Has anyone else encountered this problem?
If you don't use NASM, I think my asm example should be straightforward to
convert to TASM or MASM syntax.

My environment is Win2k SP2, DMC 8.26, X32 from May 15th (latest version,
you know what I'm talking about).  Any feedback would be appreciated -
thanks!

Is there a debugger for DOSX programs?  WUDEBUG only debugs WDOSX
programs... :(

Laurentiu

Jan 31 2002

"Walter" <walter digitalmars.com> writes:

Try a simple hello world program with -mx and verify that it works on your
system. -Walter

Jan 31 2002

"Laurentiu Pancescu" <plaur crosswinds.net> writes:

"Walter" <walter digitalmars.com> wrote in message
news:a3csdi$4ac$1 digitaldaemon.com...
 Try a simple hello world program with -mx and verify that works on your
 system. -Walter

It does, and so do pretty large programs, both in C and C++.  There's no
problem when it's only high-level source code.  Problems arise when I try to
use externally defined functions (I use NASM for portability reasons, and
because of its cleaner syntax: I can use the same ASM file for any 32-bit
compiler, and virtually any operating system!  It's very suitable for
portable MMX or 3DNow! optimizations).

It works with DMC in Win32 mode, so I think the problem is in X32, not in my
code.  I "heard" about some stack alignment problems when X32 runs under
real-mode DOS: maybe they're not the only ones?  Do you think I should
contact Mr. Doug Huffman about this issue?

Laurentiu

Jan 31 2002

Jan Knepper <jan smartsoft.cc> writes:

Check http://www.dosextender.com/
I think Doug Huffman put out a new version...

Feb 01 2002

"Laurentiu Pancescu" <plaur crosswinds.net> writes:

I was already using the latest version when I saw those problems... I
downloaded it 2 days ago, and the result is the same!  It may be related to
NTVDM bugs, I don't know... I'll try booting from a DOS disk to see whether
it still crashes.

Laurentiu

"Jan Knepper" <jan smartsoft.cc> wrote in message
news:3C5AAA54.88A9D7CA smartsoft.cc...
 Check http://www.dosextender.com/
 I think Doug Huffman put out a new version...

Feb 01 2002

"Walter" <walter digitalmars.com> writes:

I use assembler files with x all the time. You can view them at
\dm\src\core32\*.asm and \dm\src\dos32\*.asm.

Can I suggest taking your asm file and assembling it with nasm. Try it again
using dmc's inline assembler. Obj2asm the results and compare!

Feb 01 2002

"Laurentiu Pancescu" <lpancescu fastmail.fm> writes:

"Walter" <walter digitalmars.com> wrote in message
news:a3fcef$rir$1 digitaldaemon.com...
 Can I suggest taking your asm file and assembling it with nasm. Try it again
 using dmc's inline assembler. Obj2asm the results and compare!

I ran obj2asm on the object file generated by NASM and saw only db lines
there, instead of actual assembly code.  So I added 'class=CODE' to the
segment declaration, and it's fine now.  X32 probably raised the GPF when
calling code that sat inside a DATA segment.  I don't understand why this is
okay with all the Windows compilers, including DMC, and also with DJGPP
(which also uses a DOS extender).  Perhaps it's related to how the linker and
the OS loader work?
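
For reference, the fix described above amounts to one extra qualifier on the segment line of test.asm. The sketch below records that line in a comment (presumably how the corrected declaration reads; the post only says that 'class=CODE' was added) and pairs it with a plain C stand-in for the routine. The names get_value_ref and check_get_value are hypothetical, not from the thread; they are only a convenient way to state what _get_value is expected to return.

```c
#include <assert.h>

/*
 * Segment declaration for test.asm after the fix reported in the post
 * (assumed form: the original line plus the added class qualifier):
 *
 *     segment code public use32 class=CODE
 *
 * Without class=CODE, obj2asm showed the routine only as db lines,
 * i.e. the bytes were not being treated as code.
 */

/* Plain C stand-in for _get_value: returns twice its argument. */
static unsigned get_value_ref(unsigned x)
{
    return x + x;
}

/* Check any implementation against the result quoted in the thread. */
static void check_get_value(unsigned (*fn)(unsigned))
{
    assert(fn(9) == 18u);
}
```

Passing the real get_value to check_get_value after linking the NASM object would give a quick regression check for the -mx build.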

Thanks,
  Laurentiu

Feb 02 2002

"Walter" <walter digitalmars.com> writes:

"Laurentiu Pancescu" <lpancescu fastmail.fm> wrote in message
news:a3gdvj$1i17$1 digitaldaemon.com...
 "Walter" <walter digitalmars.com> wrote in message
 news:a3fcef$rir$1 digitaldaemon.com...
 Can I suggest taking your asm file and assembling it with nasm. Try it again
 using dmc's inline assembler. Obj2asm the results and compare!

 I ran obj2asm on the object file generated by NASM and saw only db lines
 there, instead of actual assembly code.  So I added 'class=CODE' to the
 segment declaration, and it's fine now.  X32 probably raised the GPF when
 calling code that sat inside a DATA segment.  I don't understand why this is
 okay with all the Windows compilers, including DMC, and also with DJGPP
 (which also uses a DOS extender).  Perhaps it's related to how the linker and
 the OS loader work?

Glad you found what was going wrong. The reason you got the crash is that
X32 marks the code segment as execute-only and the data as non-executable.
Other DOS extenders apparently don't do that.

Feb 02 2002

"Walter" <walter digitalmars.com> writes:

Your solution is now in the FAQ! Thanks, -Walter

Feb 02 2002

"Laurentiu Pancescu" <lpancescu fastmail.fm> writes:

"Walter" <walter digitalmars.com> wrote in message
news:a3hkt8$232q$2 digitaldaemon.com...
 Your solution is now in the FAQ! Thanks, -Walter

Great, thanks!  And I'm also glad because my MMX code now works fine with
DMC.  However, I notice that the performance of my loop is about 20% worse
than in the Borland or gcc cases (no external calls, only MOVQ, PXOR and
POR!).  I expected it to be the same for any compiler, since they don't
touch it.  Then I tried to force alignment to a paragraph boundary for my
assembly function, but this only made things worse by an additional 10% - I
guess OPTLINK knows better about alignments... :)

Is it possible that the way different runtime libraries initialize the FPU
affects the MMX performance (since both MMX and FPU instructions use the
same physical registers)???  There's also a slight difference between
Borland and gcc generated EXEs, about 2-3% - I don't see another reason.

Laurentiu

Feb 03 2002

"Walter" <walter digitalmars.com> writes:

"Laurentiu Pancescu" <lpancescu fastmail.fm> wrote in message
news:a3jgvm$2u58$2 digitaldaemon.com...
 "Walter" <walter digitalmars.com> wrote in message
 news:a3hkt8$232q$2 digitaldaemon.com...
 Your solution is now in the FAQ! Thanks, -Walter

 Great, thanks!  And I'm also glad because my MMX code now works fine with
 DMC.  However, I notice that the performance of my loop is about 20% worse
 than in the Borland or gcc cases (no external calls, only MOVQ, PXOR and
 POR!).  I expected it to be the same for any compiler, since they don't
 touch it.  Then I tried to force alignment to a paragraph boundary for my
 assembly function, but this only made things worse by an additional 10% - I
 guess OPTLINK knows better about alignments... :)

Alignment probably is the issue. Try putting in NOPs one at a time before
your loop, and time each time.

 Is it possible that the way different runtime libraries initialize the FPU
 affects the MMX performance (since both MMX and FPU instructions use the
 same physical registers)???  There's also a slight difference between
 Borland and gcc generated EXEs, about 2-3% - I don't see another reason.

I can't imagine how that would affect things. If it does, please let me
know!
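
The "time each time" step can be sketched as a small C harness. This is only an illustration under stated assumptions: time_calls and count_call are hypothetical names, count_call is a placeholder for the real assembly routine, and clock() measures CPU time at a coarse resolution, so the iteration count has to be large enough for the difference between builds to show.

```c
#include <assert.h>
#include <time.h>

/* Placeholder workload: stands in for the routine being measured. */
static void count_call(void *arg)
{
    ++*(long *)arg;
}

/* Run fn(arg) `iterations` times; return the elapsed CPU time in seconds. */
static double time_calls(void (*fn)(void *), void *arg, long iterations)
{
    clock_t start, stop;
    long i;

    assert(fn != 0 && iterations > 0);
    start = clock();
    for (i = 0; i < iterations; i++)
        fn(arg);
    stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Rebuilding with one more NOP in front of the loop and calling time_calls again with the same iteration count gives one data point per padding amount.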

Feb 03 2002

"Laurentiu Pancescu" <lpancescu fastmail.fm> writes:

"Walter" <walter digitalmars.com> wrote in message
news:a3knen$dig$1 digitaldaemon.com...
 I notice that the performance of my loop is about 20%
 weaker than in the Borland or gcc cases (no external calls, only MOVQ,
 PXOR and POR).

 Alignment probably is the issue. Try putting in NOPs one at a time before
 your loop, and time each time.

I did some testing, with very interesting results: when I compiled the C
source files with -o+space, the C code performance dropped slightly, but the
MMX loop performed the same as in the EXEs generated by BCC or gcc (even
slightly better).  I'm really confused about this, since NASM handles my MMX
loop the same way each time, and I called OPTLINK directly, so it doesn't
know about any request for space optimization (just in case it cares about
SC's -o+space).  What's more, I had gotten used to the corresponding DOSX
program, compiled from the same source, running about 5-10% slower than its
Win32 counterpart, but now, with -o+space, it runs faster!

I also did another test, using a source with a simple C loop, seen on one of
BCC's newsgroups some months ago:
- -o, -o+speed, -o+all: execution time is 13 seconds
- no optimization flags specified: execution time is 4 seconds
- -o+space: execution time is 3 seconds

I thought -o+all was *always* the best choice, but that turns out not to be
the case...  I can send you the sources for those two tests, if you want -
perhaps they could help improve the optimizer?

Laurentiu

Feb 04 2002

"Walter" <walter digitalmars.com> writes:

Since you said the critical loop is in the assembler code, it cannot be the
optimizer. The optimizer does not affect the assembler. I bet it's
alignment. Try the NOP suggestion. -Walter

Feb 04 2002

Heinz Saathoff <hsaat bre.ipnet.de> writes:

Walter wrote...
 
 Since you said the critical loop is in the assembler code, it cannot be the
 optimizer. The optimizer does not affect the assembler. I bet it's
 alignment. Try the NOP suggestion. -Walter

Right. The code might fit into the processor cache in one case 
and not in the other depending on the starting address of the critical 
code. Due to optimization the assembly part can move to a base address 
that is not optimal for caching.

Just a guess,

	Heinz
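
To make the guess above concrete, here is a minimal sketch of how the starting address changes the number of cache lines a block of code or data touches. The function name lines_spanned is hypothetical, and the line size is a parameter because it depends on the CPU; the 32 bytes used in the usage note is an assumption, not something stated in the thread.

```c
#include <assert.h>

/* Number of cache lines touched by `length` bytes starting at `address`. */
static unsigned long lines_spanned(unsigned long address,
                                   unsigned long length,
                                   unsigned long line_size)
{
    unsigned long first, last;

    assert(line_size != 0 && length != 0);
    first = address / line_size;                /* line holding the first byte */
    last = (address + length - 1) / line_size;  /* line holding the last byte */
    return last - first + 1;
}
```

With an assumed 32-byte line, a 32-byte loop starting at offset 0 touches one line, while the same loop starting at offset 16 touches two.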

Feb 05 2002

"Laurentiu Pancescu" <lpancescu fastmail.fm> writes:

"Walter" <walter digitalmars.com> wrote in message
news:a3nvlq$2f0q$2 digitaldaemon.com...
 I bet it's alignment. Try the NOP suggestion. -Walter

You'd win the bet... almost!  It was indeed an alignment problem, though not
of the code but of the data that the MMX instructions access.  Playing with
NOPs only improved performance by 2%, which is not significant compared to
the improvement from 2.5 seconds to 1.8 (execution time).

One of the operands of my instructions cannot be aligned, but the other one
can.  I used an automatic array (char p[48]), declared in main(), and
passed a pointer to it.  With "-o+all", p ends up aligned on a 4-byte
boundary, while with "-o+space" it lands on an 8-byte boundary, which is
vital for MMX performance.  Both BCC and GCC align automatic arrays on 8 or
16 bytes by default, so this is where the performance penalty came from!
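
One way to stop depending on where the compiler happens to place the array is to over-allocate it and round the pointer up by hand. The sketch below is an illustration, not what was actually done in this thread: align_up and is_aligned are hypothetical helper names, and uintptr_t from <stdint.h> is a C99 type that an older compiler may not provide (any unsigned integer type wide enough to hold a pointer would serve the same purpose).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round p up to the next multiple of `alignment` (must be a power of two). */
static void *align_up(void *p, size_t alignment)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t mask = (uintptr_t)alignment - 1;

    assert(alignment != 0 && (alignment & mask) == 0);
    return (void *)((addr + mask) & ~mask);
}

/* Report whether p sits on an `alignment`-byte boundary. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```

Declaring char raw[48 + 7] and passing align_up(raw, 8) to the MMX routine yields 48 usable bytes on an 8-byte boundary regardless of the optimization flags.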

I did more tests related to alignment in code generated by DMC and other
compilers, but I will post a separate message in c++, since we're already
pretty far away from the original crash of NASM generated code... :)

Many thanks for your help and suggestions!

Laurentiu

Feb 05 2002
