www.digitalmars.com         C & C++   DMDScript  

c++.dos.32-bits - X32 bug???

"Laurentiu Pancescu" <lpancescu fastmail.fm> writes:
I'm using NASM to assemble an external function, then I link the OBJ file
with the rest of DMC compiled files, but the EXE crashes (GPF, 0DH) at run.
Only -mx is affected by this, -mn works fine, also tests made with Borland
C++ 5.5.1, Cygwin, MinGW and DJGPP (same assembly code, NASM can generate
all suitable formats).

Here's an example:

; test.asm
; use "nasm -f obj test.asm -o test.obj"
segment code public use32
global _get_value
_get_value:
    push ebp
    mov ebp, esp
    mov eax, [ebp + 8]
    add eax, eax
    leave
    retn

/* main.c */
/* sc -mx main.c test.obj x32.lib */
#include <stdio.h>
unsigned get_value(unsigned);
int main(void)
{
  printf("Result is %u\n", get_value(9));
  return 0;
}

On all other configurations, the displayed value is 18, as expected.  I
looked in the DMC generated code, and everything is okay (duh!).  Maybe this
is a problem with the DOS extender?  Did anyone else encounter this problem?
If you don't use NASM, I think my asm example should be straightforward to
convert to TASM or MASM syntax.

My environment is Win2k SP2, DMC 8.26, X32 from May 15th (latest version,
you know what I'm talking about).  Any feedback would be appreciated -
thanks!

Is there a debugger for DOSX programs?  WUDEBUG only debugs WDOSX
programs... :(

Laurentiu
Jan 31 2002
"Walter" <walter digitalmars.com> writes:
Try a simple hello world program with -mx and verify that works on your
system. -Walter
Jan 31 2002
"Laurentiu Pancescu" <plaur crosswinds.net> writes:
"Walter" <walter digitalmars.com> wrote in message
news:a3csdi$4ac$1 digitaldaemon.com...
 Try a simple hello world program with -mx and verify that works on your
 system. -Walter

It does, and even pretty large programs, both in C and C++.  It's no problem
when it's only high-level source code.  Problems arise when I try to use
externally defined functions (I use NASM for portability reasons, and
because of its cleaner syntax: I can use the same ASM file for any 32-bit
compiler, and virtually any operating system!  It's very suitable for
portable MMX or 3DNow! optimizations).

It works with DMC in Win32 mode, so I think the problem is in X32, not in my
code.  I "heard" about some stack alignment problems when X32 runs under
real-mode DOS: maybe they're not the only ones?  Do you think I should
contact Mr. Doug Huffman about this issue?

Laurentiu
Jan 31 2002
Jan Knepper <jan smartsoft.cc> writes:
Check http://www.dosextender.com/
I think Doug Huffman put out a new version...



Feb 01 2002
"Laurentiu Pancescu" <plaur crosswinds.net> writes:
I used the latest version when I saw those problems... I downloaded it 2
days ago, but got the same result!  It may be related to NTVDM bugs, I
don't know... I'll try to boot from a DOS disk and see if it still crashes.

Laurentiu

"Jan Knepper" <jan smartsoft.cc> wrote in message
news:3C5AAA54.88A9D7CA smartsoft.cc...
 Check http://www.dosextender.com/
 I think Doug Huffman put out a new version...

Feb 01 2002
"Walter" <walter digitalmars.com> writes:
 I use assembler files with x all the time. You can view them at
\dm\src\core32\*.asm and \dm\src\dos32\*.asm.

Can I suggest taking your asm file and assembling it with nasm. Try it again
using dmc's inline assembler. Obj2asm the results and compare!
Feb 01 2002
"Laurentiu Pancescu" <lpancescu fastmail.fm> writes:
"Walter" <walter digitalmars.com> wrote in message
news:a3fcef$rir$1 digitaldaemon.com...
 Can I suggest taking your asm file and assembling it with nasm. Try it again
 using dmc's inline assembler. Obj2asm the results and compare!

I tried to obj2asm the object generated by NASM: I only saw db lines there,
instead of actual assembly code.  So, I added 'class=CODE' in the segment
declaration (the line becomes "segment code public use32 class=CODE"), and
it's fine now.  Probably X32 got GPF when calling code inside of a DATA
segment.  I don't understand why this is okay with all Windows compilers,
including DMC, and also DJGPP (which also uses a DOS extender).  Probably
it's related to how the linker and the OS loader work??

Thanks,
  Laurentiu
Feb 02 2002
"Walter" <walter digitalmars.com> writes:
"Laurentiu Pancescu" <lpancescu fastmail.fm> wrote in message
news:a3gdvj$1i17$1 digitaldaemon.com...
 "Walter" <walter digitalmars.com> wrote in message
 news:a3fcef$rir$1 digitaldaemon.com...
 Can I suggest taking your asm file and assembling it with nasm. Try it again
 using dmc's inline assembler. Obj2asm the results and compare!

 I tried to obj2asm the object generated by NASM: I only saw db lines there,
 instead of actual assembly code.  So, I added 'class=CODE' in the segment
 declaration, and it's fine now.  Probably X32 got GPF when calling code
 inside of a DATA segment.  I don't understand why this is okay with all
 Windows compilers, including DMC, and also DJGPP (which also uses a DOS
 extender).  Probably it's related to how the linker and the OS loader work??

Glad you found what was going wrong. The reason you got the crash is that
X32 marks the code segment as execute only, and the data as not executable.
Without class=CODE your segment was treated as data, so calling into it
faulted. Other DOS extenders apparently don't do that.
Feb 02 2002
"Walter" <walter digitalmars.com> writes:
Your solution is now in the FAQ! Thanks, -Walter
Feb 02 2002
"Laurentiu Pancescu" <lpancescu fastmail.fm> writes:
"Walter" <walter digitalmars.com> wrote in message
news:a3hkt8$232q$2 digitaldaemon.com...
 Your solution is now in the FAQ! Thanks, -Walter

Great, thanks!  And I'm also glad because my MMX code works now fine with
DMC.  However, I notice that the performance of my loop is about 20% weaker
than in the Borland or gcc cases (no external calls, only MOVQ, PXOR and
POR!).  I expect this to be the same for any compiler, since they don't
touch it.  Then, I tried to force an alignment to a paragraph border for my
assembly function, but this only made things worse by an additional 10% - I
guess OPTLINK knows better about alignments... :)

Is it possible that the way different runtime libraries initialize the FPU
affects the MMX performance (since both MMX and FPU instructions use the
same physical registers)???  There's also a slight difference between
Borland and gcc generated EXEs, about 2-3% - I don't see another reason.

Laurentiu
Feb 03 2002
"Walter" <walter digitalmars.com> writes:
"Laurentiu Pancescu" <lpancescu fastmail.fm> wrote in message
news:a3jgvm$2u58$2 digitaldaemon.com...
 "Walter" <walter digitalmars.com> wrote in message
 news:a3hkt8$232q$2 digitaldaemon.com...
 Your solution is now in the FAQ! Thanks, -Walter

 Great, thanks!  And I'm also glad because my MMX code works now fine with
 DMC.  However, I notice that the performance of my loop is about 20% weaker
 than in the Borland or gcc cases (no external calls, only MOVQ, PXOR and
 POR!).  I expect this to be the same for any compiler, since they don't
 touch it.  Then, I tried to force an alignment to a paragraph border for my
 assembly function, but this only made things worse by an additional 10% - I
 guess OPTLINK knows better about alignments... :)

Alignment probably is the issue. Try putting in NOPs one at a time before
your loop, and time each time.

 Is it possible that the way different runtime libraries initialize the FPU
 affects the MMX performance (since both MMX and FPU instructions use the
 same physical registers)???  There's also a slight difference between
 Borland and gcc generated EXEs, about 2-3% - I don't see another reason.

I can't imagine how that would affect things. If it does, please let me know!
Feb 03 2002
"Laurentiu Pancescu" <lpancescu fastmail.fm> writes:
"Walter" <walter digitalmars.com> wrote in message
news:a3knen$dig$1 digitaldaemon.com...
 I notice that the performance of my loop is about 20%
 weaker than in the Borland or gcc cases (no external calls, only MOVQ,
 PXOR and POR).

 Alignment probably is the issue. Try putting in NOPs one at a time before
 your loop, and time each time.

I did some testing, with very interesting results: when I specified -o+space
for the compiling of the C source files, the C code performance dropped
slightly, but the MMX loop performance is the same as in the EXEs generated
by BCC or gcc (even slightly better).  I'm really confused about this, since
NASM handles my MMX loop in the same way each time, and I called OPTLINK
directly, so that it doesn't know about requirements to do space
optimization (just in case it cares about SC's -o+space).  Even more, I got
used to the fact that the corresponding DOSX program, compiled from the same
source, runs about 5-10% slower than its Win32 counterpart, but now,
with -o+space, it runs faster!!!

I also did another test, using a source with a simple C loop, seen on one of
BCC's newsgroups some months ago:
- -o, -o+speed, -o+all: execution time is 13 seconds
- no optimization flags specified: execution time is 4 seconds
- -o+space: execution time is 3 seconds

I thought -o+all is *always* the best to use, but it proves not to be the
case...  I can send you the sources for those two tests, if you want -
perhaps it could help improving the optimizer?

Laurentiu
Feb 04 2002
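
For anyone repeating timing comparisons like the ones above, here is a minimal sketch of a harness in standard C. It assumes clock() has enough resolution for the run lengths involved. The names time_runs and sample_work are hypothetical, and sample_work is only a stand-in byte loop, not the MMX routine or the BCC newsgroup test mentioned in the post.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define BUF_SIZE 48  /* assumed buffer size, chosen only for illustration */

/* Hypothetical stand-in workload: XOR then OR every byte of the buffer. */
static void sample_work(void *arg)
{
    unsigned char *buf = (unsigned char *)arg;
    size_t i;
    for (i = 0; i < BUF_SIZE; ++i)
        buf[i] = (unsigned char)((buf[i] ^ 0x5A) | 0x01);
}

/* Calls fn(arg) `repeats` times and returns the elapsed CPU time in seconds. */
static double time_runs(void (*fn)(void *), void *arg, unsigned long repeats)
{
    unsigned long i;
    clock_t start;
    assert(fn != NULL);
    start = clock();
    for (i = 0; i < repeats; ++i)
        fn(arg);
    return (double)(clock() - start) / (double)CLOCKS_PER_SEC;
}
```

Calling time_runs with the same repeat count in each build (no flags, -o+space, -o+all) gives numbers that can be compared directly, provided the run is long enough to dwarf the clock() granularity.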
"Walter" <walter digitalmars.com> writes:
Since you said the critical loop is in the assembler code, it cannot be the
optimizer. The optimizer does not affect the assembler. I bet it's
alignment. Try the NOP suggestion. -Walter
Feb 04 2002
Heinz Saathoff <hsaat bre.ipnet.de> writes:
Walter wrote...
 
 Since you said the critical loop is in the assembler code, it cannot be the
 optimizer. The optimizer does not affect the assembler. I bet it's
 alignment. Try the NOP suggestion. -Walter

Right. The code might fit into the processor cache in one case and not in
the other depending on the starting address of the critical code. Due to
optimization the assembly part can move to a base address that is not
optimal for caching.

Just a guess,
   Heinz
Feb 05 2002
"Laurentiu Pancescu" <lpancescu fastmail.fm> writes:
"Walter" <walter digitalmars.com> wrote in message
news:a3nvlq$2f0q$2 digitaldaemon.com...
 I bet it's alignment. Try the NOP suggestion. -Walter

You'd win the bet... almost!  It was an alignment problem, indeed, but not
of the code: of the data that the MMX instructions access.  Playing with NOP
only improved performance by 2%, not significant when compared to a boost
from 2.5 seconds to 1.8 (execution time).

One of the operands of my instructions cannot be aligned, but the other one
could.  I used an automatic vector (char p[48]), declared in main(), and
passed the pointer to that.  The option "-o+all" causes p to be aligned at a
4-byte boundary, while "-o+space" aligns p at an 8-byte boundary, which is
vital for MMX performance.  Both BCC and GCC align automatic vectors at 8 or
16 bytes by default, so this is where the performance penalty came from!

I did more tests related to alignment in code generated by DMC and other
compilers, but I will post a separate message in c++, since we're already
pretty far away from the original crash of NASM generated code... :)

Many thanks for your help and suggestions!

Laurentiu
Feb 05 2002
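
The alignment effect described in the last post can be checked, and worked around, from the C side. What follows is a minimal sketch, assuming a C99 <stdint.h> with uintptr_t is available (older compilers may lack it). The helper names is_aligned and align_up are hypothetical and do not come from DMC or its runtime library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>  /* uintptr_t: C99, an assumption for older toolchains */

/* Nonzero if p sits on a `boundary`-byte boundary (boundary: power of two). */
static int is_aligned(const void *p, size_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return ((uintptr_t)p & (boundary - 1)) == 0;
}

/* Returns the first address >= p that lies on a `boundary`-byte boundary. */
static void *align_up(void *p, size_t boundary)
{
    size_t misalign;
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    misalign = (size_t)((uintptr_t)p & (boundary - 1));
    return misalign ? (void *)((char *)p + (boundary - misalign)) : p;
}
```

With a raw automatic buffer declared as char raw[48 + 7], align_up(raw, 8) yields a pointer to at least 48 usable bytes on an 8-byte boundary whichever optimization switch is in effect, and is_aligned(p, 8) shows what the compiler chose for a plain char p[48].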