digitalmars.D - Incorrect struct and union size. Possible bug?

MicroWizard <MicroWizard_member pathlink.com> writes:
I wrote a program to access the Win32 character-based console from D, but got
strange screen addressing problems. When I dug deeper, I found this.

In the following program, where I mix structs and unions, the compiler does not
count the last struct element(s) in the size.

alias char CHAR;
alias wchar WCHAR;
alias ushort WORD;

struct TST {
    union Char {
        WCHAR UnicodeChar;
        CHAR  AsciiChar;
    };
    WORD Attributes;
};

struct TST2 {
    WCHAR UnicodeChar;
    WORD Attributes;
};

struct TST3 {
    union Char {
        WCHAR UnicodeChar;
        CHAR  AsciiChar;
    };
    union Char2 {
        WCHAR UnicodeChar;
        CHAR  AsciiChar;
    };
};

void main(char[][] arg)
{
    printf("TST.sizeof=%d,TST2.sizeof=%d,TST3.sizeof=%d\n",
           TST.sizeof, TST2.sizeof, TST3.sizeof);
}

The result is:
TST.sizeof=2,TST2.sizeof=4,TST3.sizeof=1

And what I expect:
TST.sizeof=4,TST2.sizeof=4,TST3.sizeof=4

Or have I missed something completely?

Best regards,
Tamas Nagy
Nov 20 2004
Anders F Björklund <afb algonet.se> writes:
MicroWizard wrote:

 struct TST {
 union Char{
 WCHAR UnicodeChar;
 CHAR   AsciiChar;
 };
 WORD Attributes;
 };
 
 struct TST2 {
 WCHAR UnicodeChar;
 WORD Attributes;
 };
 
 struct TST3 {
 union Char{
 WCHAR UnicodeChar;
 CHAR   AsciiChar;
 };
 union Char2{
 WCHAR UnicodeChar;
 CHAR   AsciiChar;
 };
 };
 
 void main(char[][] arg)
 {
 printf("TST.sizeof=%d,TST2.sizeof=%d,TST3.sizeof=%d\n",
 TST.sizeof,   TST2.sizeof,   TST3.sizeof);
 }
 
 The result is:
 TST.sizeof=2,TST2.sizeof=4,TST3.sizeof=1
 
 And what I expect:
 TST.sizeof=4,TST2.sizeof=4,TST3.sizeof=4
 
 Or missed I something completely?

In D, declaring a union does *not* mean allocating a variable... (even an empty structure, { }, has a .sizeof of 1, a little quirk)

This works:
 union CharU{
 WCHAR UnicodeChar;
 CHAR   AsciiChar;
 }
 union Char2U{
 WCHAR UnicodeChar;
 CHAR   AsciiChar;
 }
 
 struct TST {
 CharU Char;
 WORD Attributes;
 }
 struct TST2 {
 WCHAR UnicodeChar;
 WORD Attributes;
 }
 struct TST3 {
 CharU Char;
 Char2U Char2;
 }

Outputs: TST.sizeof=4,TST2.sizeof=4,TST3.sizeof=4

--anders
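
The distinction described in this reply can be checked at compile time. The following is only a minimal sketch, assuming a present-day D compiler; the struct names TypeOnly and WithField are made up for illustration and do not come from the thread. A named union nested in a struct is just a type declaration, and storage appears only once a field of that type is declared.

```d
import std.stdio;

alias WCHAR = wchar;
alias CHAR = char;
alias WORD = ushort;

// A named union nested in a struct only declares a type; it adds no storage.
struct TypeOnly
{
    union Char
    {
        WCHAR UnicodeChar;
        CHAR AsciiChar;
    }
    WORD Attributes;
}

// Declaring a field of the union type is what reserves the space.
struct WithField
{
    union CharU
    {
        WCHAR UnicodeChar;
        CHAR AsciiChar;
    }
    CharU Char;
    WORD Attributes;
}

static assert(TypeOnly.sizeof == 2);      // only Attributes is stored
static assert(TypeOnly.Char.sizeof == 2); // the nested type itself is 2 bytes
static assert(WithField.sizeof == 4);     // union field + Attributes

void main()
{
    writefln("TypeOnly.sizeof=%s, WithField.sizeof=%s",
             TypeOnly.sizeof, WithField.sizeof);
}
```

Using static assert means a layout surprise like the one in the original post would show up as a compile error rather than as a screen addressing problem at run time.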
Nov 20 2004
Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
Anders F Björklund wrote:
 In D, declaring a union does *not* mean allocating a variable...
 (even an empty structure: { } has a .sizeof of 1, a little quirk)

Actually, you aren't quite correct here. The syntax that MicroWizard used was correct, and is known as an "anonymous union." The documentation mentions that they are supported, provided that they are part of a struct. See http://digitalmars.com/d/struct.html

I tested this program on Linux (0.106), and it works as expected:
 struct foo {
   align(1):
   union {
     char c;
     short s;
   }
   int i;
 }
  
 import std.stdio;
 void main() {
   foo f[2];
   foo *ptr = cast(foo*)0;
   writef("%x %x\n"
          "%d %d %d\n"
          "%d %d %d\n"
          "%d %d %d\n",
          cast(int)&f[0], cast(int)&f[1],
          f.sizeof,f[0].sizeof,foo.sizeof,
          foo.c.sizeof, foo.s.sizeof, foo.i.sizeof,
          cast(int)&ptr.c, cast(int)&ptr.s, cast(int)&ptr.i);
 }

It outputs the following:
 ffffffffbfee0710 ffffffffbfee0716
 12 6 6
 1 2 4
 0 0 2
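
The program above derives the member offsets by taking addresses through a null pointer. A possible alternative, sketched here under the assumption of a present-day D compiler, is the built-in .offsetof property. The struct name Packed is hypothetical; its layout mirrors foo. No expected output is shown because the exact numbers depend on how the compiler applies align(1); the post above reports a size of 6 and offsets 0, 0 and 2.

```d
import std.stdio;

struct Packed
{
align(1):
    union
    {
        char c;
        short s;
    }
    int i;
}

// Union members overlap, so they always share one offset.
static assert(Packed.c.offsetof == Packed.s.offsetof);

void main()
{
    // Sizes and offsets are compile-time constants; no pointer casts needed.
    writefln("sizeof=%s", Packed.sizeof);
    writefln("offsets: c=%s s=%s i=%s",
             Packed.c.offsetof, Packed.s.offsetof, Packed.i.offsetof);
}
```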

Nov 22 2004
Anders F Björklund <afb algonet.se> writes:
Russ Lewis wrote:

 In D, declaring a union does *not* mean allocating a variable...

Actually, you aren't quite correct here. The syntax that MicroWizard used was correct, and is known as an "anonymous union." The documentation mentions that they are supported, provided that they are part of a struct. See http://digitalmars.com/d/struct.html

True, but that wasn't the syntax used in the previous example... :-P The unions there were all named:
 struct TST {
 union Char{
   WCHAR UnicodeChar;
   CHAR   AsciiChar;
 }
 WORD Attributes;
 }
 
 struct TST2 {
 WCHAR UnicodeChar;
 WORD Attributes;
 }
 
 struct TST3 {
 union Char{
   WCHAR UnicodeChar;
   CHAR   AsciiChar;
 }
 union Char2{
   WCHAR UnicodeChar;
   CHAR   AsciiChar;
 }
 }

TST.sizeof=2,TST2.sizeof=4,TST3.sizeof=1 (small sizes due to no Char fields present)

They could be anonymized:
 struct TST {
 union {
   WCHAR UnicodeChar;
   CHAR   AsciiChar;
 }
 WORD Attributes;
 }
 
 struct TST2 {
 WCHAR UnicodeChar;
 WORD Attributes;
 }
 
 struct TST3 {
 union {
   WCHAR UnicodeChar;
   CHAR   AsciiChar;
 }
 union {
   WCHAR UnicodeChar2;
   CHAR   AsciiChar2;
 }
 }

TST.sizeof=4,TST2.sizeof=4,TST3.sizeof=4 (and losing one layer of indirection too)

Thanks for pointing that out, the D spec is kinda terse sometimes...

--anders

PS. This is still a little quirky:
 struct NULL
 {
 }
 import std.stdio;
 void main()
 {
   writefln("%d", NULL.sizeof);
 }

(Hint: it does not print zero bytes)
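
Both observations in this post, that anonymous unions do contribute storage and that an empty struct is not zero-sized, can be pinned down with static assert. This is a rough sketch assuming a present-day D compiler and default (extern(D)) structs; Cell and Empty are illustrative names, not taken from the thread.

```d
import std.stdio;

struct Cell
{
    // Anonymous union: its members become members of Cell and share storage.
    union
    {
        wchar UnicodeChar;
        char AsciiChar;
    }
    ushort Attributes;
}

struct Empty
{
}

static assert(Cell.sizeof == 4);  // 2 bytes for the union + 2 for Attributes
static assert(Empty.sizeof == 1); // an empty struct still occupies one byte

void main()
{
    Cell c;
    c.UnicodeChar = 'A'; // accessed directly, with no intermediate field name
    c.Attributes = 0x07;
    writefln("Cell.sizeof=%s, Empty.sizeof=%s", Cell.sizeof, Empty.sizeof);
}
```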
Nov 22 2004
Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
Eek!  You're right, sorry...

Anders F Björklund wrote:
 Russ Lewis wrote:
 
 In D, declaring a union does *not* mean allocating a variable...

Actually, you aren't quite correct here. The syntax that MicroWizard used was correct, and is known as an "anonymous union." The documentation mentions that they are supported, provided that they are part of a struct. See http://digitalmars.com/d/struct.html

True, but that wasn't the syntax used in the previous example... :-P

Nov 22 2004
Sean Kelly <sean f4.ca> writes:
In article <cnt11q$2qra$1 digitaldaemon.com>,
Anders F Björklund says...
PS. This is still a little quirky:

 struct NULL
 {
 }
 import std.stdio;
 void main()
 {
   writefln("%d", NULL.sizeof);
 }

(Hint: it does not print zero bytes)

It should print 1 byte (or 4 bytes, I'm not sure how big empty classes are in D). The reason is that for a class to be uniquely addressable it has to occupy space in memory. C++ is the same way. C++ does have something called the "base class optimization", however, which allows empty base classes to have zero size in derived classes, i.e.

 class Base {}
 class Derived : Base {}
 printf( "base size: %u\nderived size: %u\n", Base.sizeof, Derived.sizeof );

This should print:

 base size: 4
 derived size: 4

(it looks like the size of an empty class is int.sizeof after all)

Sean
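
One way to look closer at the class numbers above, offered only as a sketch and assuming a present-day D compiler: as far as I know, .sizeof applied to a class type reports the size of the reference, which would explain a value of 4 on a 32-bit target, while the size of the instance itself is available through __traits(classInstanceSize, ...). Base and Derived are the names from the post; everything else is illustrative.

```d
import std.stdio;

class Base {}
class Derived : Base {}

// .sizeof on a class type is the size of the reference, i.e. of a pointer.
static assert(Base.sizeof == (void*).sizeof);
static assert(Derived.sizeof == (void*).sizeof);

// The instance is at least pointer-sized, and an empty derived class
// adds nothing to the instance of its base.
static assert(__traits(classInstanceSize, Base) >= (void*).sizeof);
static assert(__traits(classInstanceSize, Base) ==
              __traits(classInstanceSize, Derived));

void main()
{
    writefln("reference size: %s", Base.sizeof);
    writefln("instance sizes: base=%s derived=%s",
             __traits(classInstanceSize, Base),
             __traits(classInstanceSize, Derived));
}
```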
Nov 22 2004
Anders F Björklund <afb algonet.se> writes:
Sean Kelly wrote:

 It should print 1 byte (or 4 bytes, I'm not sure how big empty classes are in
 D).  The reason is that for a class to be uniquely addressable it has to occupy
 space in memory.  C++ is the same way.  C++ does have something called the
 "base class optimization" however, that allows empty base classes to have zero
 size in derived classes.  ie.

(structs are 1 byte when compiled with gdc, and both the base and derived class are 4)

Interesting! The following little C snippet:
 #include <stdio.h>
 
 struct empty
 {
 };
 
 int main(void)
 {
   printf("%zu\n", sizeof(struct empty)); /* sizeof yields size_t, so %zu rather than %d */
   return 0;
 }

Prints "0" compiled with gcc, and "1" with g++ (since structs are classes in C++, no doubt...).

Just a little weird, not that it's ever used. :-)

--anders

PS. But gcc -Wall -pedantic croaks: "warning: struct has no members"
Nov 22 2004
"Walter" <newshound digitalmars.com> writes:
"Anders F Björklund" <afb algonet.se> wrote in message
news:cnt11q$2qra$1 digitaldaemon.com...
 PS. This is still a little quirky:

 struct NULL
 {
 }
 import std.stdio;
 void main()
 {
   writefln("%d", NULL.sizeof);
 }

(Hint: it does not print zero bytes)

It should print 1. The reason is to be compatible with C.
Dec 04 2004
Anders F Björklund <afb algonet.se> writes:
Walter wrote:

struct NULL
{
}

(Hint: it does not print zero bytes)

It should print 1. The reason is to be compatible with C.

And I understood the rationale there was just so that it would allocate *something*, to be able to address it later on?

Null structures and classes are probably quite rare in reality, I just found it interesting that I got different {} results:

gdc:
 struct: 4
 class:  1

 struct: 1
 class:  4

--anders
Dec 05 2004
Sean Kelly <sean f4.ca> writes:
In article <coukni$1kke$1 digitaldaemon.com>,
Anders F Björklund says...
Walter wrote:

struct NULL
{
}

(Hint: it does not print zero bytes)

It should print 1. The reason is to be compatible with C.

And I understood the rationale there was just so that it would allocate *something*, to be able to address it later on?

Yup. C++ requires that all objects have a unique address, and thus they all occupy at least one byte. The empty base class optimization tends to help keep derived class sizes down, however.

null structures and classes are probably quite rare in reality,
I just found it interesting that I got different {} results:

gdc:
 struct: 4
 class:  1

 struct: 1
 class:  4


That's odd. g++ produces different sized objects just from switching 'struct' with 'class'? Good to know, I suppose.

Sean
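
The unique-address rationale discussed in this thread can be illustrated in D itself. This is a small sketch, assuming a present-day D compiler and a default (extern(D)) struct; the names Empty and pair are hypothetical. Because the empty struct has a nonzero size, the two elements of an array of them cannot share an address.

```d
import std.stdio;

struct Empty
{
}

void main()
{
    Empty[2] pair;

    // The array is exactly two elements wide...
    assert(pair.sizeof == 2 * Empty.sizeof);
    // ...so each element has its own distinct address.
    assert(&pair[0] !is &pair[1]);

    // Distance in bytes between the two elements.
    auto stride = cast(ubyte*)&pair[1] - cast(ubyte*)&pair[0];
    writefln("element stride: %s byte(s)", stride);
}
```

If Empty.sizeof were zero, both elements would start at the same address and pointer arithmetic over the array would no longer tell them apart.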
Dec 05 2004