c++.command-line - Handling MMX Instructions
- Isma'il Adeniran (80/80) Mar 25 2005 I think there's an error in the way dmc handles inline assembler MMX
- Isma'il Adeniran (2/99) Mar 25 2005
- Walter (4/5) Mar 28 2005 If you could post assembler code generated by OW C++ for that function, ...
- Isma'il Adeniran (131/134) Mar 29 2005 This is the assembler code generated with bits cut out. I haven't really...
- Isma'il Adeniran (39/49) Mar 30 2005 Jack (check the other posts) just found the bug.
- Jack (9/108) Mar 30 2005 Spotted the bug. There is a bug with the with 'pextrw' instruction in th...
- Jack (4/114) Mar 30 2005 Just found out not only first operand is different from the source code ...
- Isma'il Adeniran (11/136) Mar 30 2005 You're right. The problem's with the 'pextrw' instruction. It uses the
- Isma'il Adeniran (8/8) Mar 30 2005 I also just posted the assembly output from both DMC and OW for the
- Walter (3/5) Apr 02 2005 You're right. I have it fixed now, it'll go out in the next update.
I think there's an error in the way dmc handles inline assembler MMX instructions. I compiled and ran this code using dmc:
<code>
#include <stdio.h>

int main(void)
{
    int cnt;
    int a1[4] = {12, 1, 34, 17};
    int b1[4] = {17, 7, 4, 33};
    int c1[4];

    printf(" a1: ");
    for (cnt = 0; cnt < 4; cnt++)
        printf("%d\t", a1[cnt]);
    printf("\n b1: ");
    for (cnt = 0; cnt < 4; cnt++)
        printf("%d\t", b1[cnt]);

    _asm {
        movq mm0, qword ptr a1
        movq mm1, qword ptr a1+8
        packssdw mm0, mm1
        movq mm1, qword ptr b1
        movq mm2, qword ptr b1+8
        packssdw mm1, mm2
        paddw mm0, mm1
        lea ESI, c1
        xor EDI,EDI
        pextrw EDI,mm0, 0
        mov dword ptr [ESI], EDI
        add ESI, 4
        pextrw EDI,mm0, 1
        mov dword ptr [ESI], EDI
        add ESI, 4
        pextrw EDI,mm0, 2
        mov dword ptr [ESI], EDI
        add ESI, 4
        pextrw EDI,mm0, 3
        mov dword ptr [ESI], EDI
        add ESI, 4
        emms
    };

    printf("\n\n c1: \n");
    for (cnt = 0; cnt < 4; cnt++)
        printf(" a1[%d] + b1[%d] = %d\n", cnt, cnt, c1[cnt]);
    return 0;
}
</code>
<Output on execution:>
D:>pack (using dmc)
 a1: 12 1 34 17
 b1: 17 7 4 33

 c1:
 a1[0] + b1[0] = 0
 a1[1] + b1[1] = 0
 a1[2] + b1[2] = 0
 a1[3] + b1[3] = 0

This is incorrect. I compiled and ran it with Open Watcom C/C++ and got the correct result below:
D:>pack (using Open Watcom)
 a1: 12 1 34 17
 b1: 17 7 4 33

 c1:
 a1[0] + b1[0] = 29
 a1[1] + b1[1] = 8
 a1[2] + b1[2] = 38
 a1[3] + b1[3] = 50

Is this a bug with dmc?
Best regards
Isma'il
-----
Mar 25 2005
It also compiles correctly with VC++.NET.
Mar 25 2005
If you could post assembler code generated by OW C++ for that function, I can compare the two.
"Isma'il Adeniran" <ismail tamarindseed.com> wrote in message news:d21smg$mi7$1 digitaldaemon.com...
I compiled and ran it with Open Watcom C/C++ and I got the correct result
Mar 28 2005
This is the assembler code generated with bits cut out. I haven't really
combed through it extensively but both compilers generate practically
identical code for the inline assembly (as expected)!
<assembly>
_TEXT SEGMENT BYTE PUBLIC USE32 'CODE'
ASSUME CS:_TEXT, DS:DGROUP, SS:DGROUP
L$1:
DB 0cH, 0, 0, 0, 1, 0, 0, 0
DB 22H, 0, 0, 0, 11H, 0, 0, 0
L$2:
DB 11H, 0, 0, 0, 7, 0, 0, 0
DB 4, 0, 0, 0, 21H, 0, 0, 0
main:
push 54H
call near ptr FLAT:__CHK
push ebx
push esi
push edi
push ebp
mov ebp,esp
sub esp,30H
lea edi,-30H[ebp]
mov esi,offset FLAT:L$1
movsd
movsd
movsd
movsd
lea edi,-20H[ebp]
mov esi,offset FLAT:L$2
movsd
movsd
movsd
movsd
push offset FLAT:L$9
call near ptr FLAT:printf
add esp,4
xor ebx,ebx
L$3:
mov edx,dword ptr -30H[ebp+ebx*4]
push edx
push offset FLAT:L$10
call near ptr FLAT:printf
add esp,8
inc ebx
cmp ebx,4
jl L$3
push offset FLAT:L$11
call near ptr FLAT:printf
add esp,4
xor ebx,ebx
L$4:
mov ecx,dword ptr -20H[ebp+ebx*4]
push ecx
push offset FLAT:L$10
call near ptr FLAT:printf
add esp,8
inc ebx
cmp ebx,4
jl L$4
movq mm0,-30H[ebp]
movq mm1,-28H[ebp]
packssdw mm0,mm1
movq mm1,-20H[ebp]
movq mm2,-18H[ebp]
packssdw mm1,mm2
paddw mm0,mm1
lea esi,-10H[ebp]
xor edi,edi
pextrw edi,mm0,0
mov dword ptr [esi],edi
add esi,4
pextrw edi,mm0,1
mov dword ptr [esi],edi
add esi,4
pextrw edi,mm0,2
mov dword ptr [esi],edi
add esi,4
pextrw edi,mm0,3
mov dword ptr [esi],edi
add esi,4
emms
push offset FLAT:L$12
call near ptr FLAT:printf
add esp,4
xor ebx,ebx
L$5:
mov esi,dword ptr -10H[ebp+ebx*4]
push esi
push ebx
push ebx
push offset FLAT:L$13
call near ptr FLAT:printf
add esp,10H
inc ebx
cmp ebx,4
jl L$5
mov edi,dword ptr FLAT:__iob+4
test edi,edi
jle L$6
mov eax,dword ptr FLAT:__iob
xor ebx,ebx
mov bl,byte ptr [eax]
sub ebx,0dH
cmp ebx,0dH
ja L$7
L$6:
push offset FLAT:__iob
call near ptr FLAT:fgetc
add esp,4
jmp L$8
L$7:
lea edx,-1[edi]
mov dword ptr FLAT:__iob+4,edx
inc eax
mov dword ptr FLAT:__iob,eax
L$8:
xor eax,eax
mov esp,ebp
pop ebp
pop edi
pop esi
pop ebx
ret
_TEXT ENDS
<\assembly>
Walter wrote:
If you could post assembler code generated by OW C++ for that function, I
can compare the two.
--
Knowledge comes from finding the answers, yes but
understanding what the answers mean
is what brings wisdom.
- Lionel Luthor
Mar 29 2005
Jack (check the other posts) just found the bug.
It's with the 'pextrw' instruction.
The word is extracted into the EAX register, but it's the EDI register
(which has been zeroed out) that actually gets stored to [ESI].
This produces the zeroes on output.
Comparing the assembly output from DMC and OW compilers confirms this.
Reposting the pertinent assembly listing for the function. Apologies for
the one I posted yesterday.
****************DMC************     ********Open Watcom************
-------------------------------     -------------------------------
X$5:
movq mm0,-0x30[ebp]                 movq mm0,-0x30[ebp]
movq mm1,-0x28[ebp]                 movq mm1,-0x28[ebp]
packssdw mm0,mm1                    packssdw mm0,mm1
movq mm1,-0x20[ebp]                 movq mm1,-0x20[ebp]
movq mm2,-0x18[ebp]                 movq mm2,-0x18[ebp]
packssdw mm1,mm2                    packssdw mm1,mm2
paddw mm0,mm1                       paddw mm0,mm1
lea esi,-0x10[ebp]                  lea esi,-0x10[ebp]
xor edi,edi                         xor edi,edi
pextrw eax,mm7,0x00                 pextrw edi,mm0,0x00
mov [esi],edi                       mov [esi],edi
add esi,0x00000004                  add esi,0x00000004
pextrw eax,mm7,0x01                 pextrw edi,mm0,0x01
mov [esi],edi                       mov [esi],edi
add esi,0x00000004                  add esi,0x00000004
pextrw eax,mm7,0x02                 pextrw edi,mm0,0x02
mov [esi],edi                       mov [esi],edi
add esi,0x00000004                  add esi,0x00000004
pextrw eax,mm7,0x03                 pextrw edi,mm0,0x03
mov [esi],edi                       mov [esi],edi
add esi,0x00000004                  add esi,0x00000004
emms                                emms
Mar 30 2005
Spotted the bug. There is a bug with the 'pextrw' instruction in the code compiled with DMC (the first operand is always changed when linked). With obj2asm, everything is correct in the compiled object code (.obj): the first operand of 'pextrw' is the same as specified in the source code. When linked to an executable, the first operand of the 'pextrw' instruction changes to EAX (this always happens, no matter what the first operand in the source code is changed to). Perhaps a bug with the linker?
Mar 30 2005
Just found out that not only the first operand differs from the source code but the second operand too! The second operand of the 'pextrw' instruction is always mm7 when linked.
Mar 30 2005
You're right. The problem's with the 'pextrw' instruction. It uses the wrong register. EDI is zeroed. The word is extracted into the EAX register, but the instruction mov [esi], edi is carried out instead of mov [esi], eax. Consequently, the value stored at [esi] is always 0. Nice work Jack!!!
Mar 30 2005
I also just posted the assembly output from both DMC and OW for the function in my reply to Walter above. Check it out (the columns are skewed).
Isma'il
Mar 30 2005
"Isma'il Adeniran" <ismail tamarindseed.com> wrote in message news:d21qsd$kq1$1 digitaldaemon.com...
I think there's an error in the way dmc handles inline assembler MMX instructions.

You're right. I have it fixed now, it'll go out in the next update.
Apr 02 2005








