digitalmars.D - misaligned read handling on various processors

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Consider:

struct A {
     char a;
     align(1) int b;
}

Accesses to b will be rather slow because it's a misaligned read. My 
question is, how exactly is that handled on various processors? I seem 
to recall various anecdotes (including that misaligned reads on Intel 
cause a trap that does the needed double reading, shifting, and 
masking), but Google search has surprisingly little on the matter.
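
For reference, a small sketch of what align(1) does to the layout above. The struct B is a hypothetical counterpart with default alignment, added only for comparison; it is not part of the original post. The checks assume int has 4-byte alignment on the target.

```d
import std.stdio : writeln;

struct A {
    char a;
    align(1) int b;   // packed right after a
}

// Hypothetical counterpart with default alignment, for comparison only.
struct B {
    char a;
    int b;            // the compiler inserts padding before b
}

// b in A starts at byte 1, which is not a multiple of int's alignment.
static assert(A.b.offsetof == 1);
static assert(A.b.offsetof % int.alignof != 0);

// b in B is padded out to the next multiple of int's alignment.
static assert(B.b.offsetof == int.alignof);

void main() {
    writeln("A.b.offsetof = ", A.b.offsetof);
    writeln("B.b.offsetof = ", B.b.offsetof);
}
```

Whatever address an instance of A is aligned to, a 4-byte load of b starts one byte past it, which is what makes the read misaligned.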


Thanks,

Andrei
Oct 06 2009
Don <nospam nospam.com> writes:
Not on Intel. IIRC the trapping happens on SPARC. Misalignment on x86 doesn't hurt much at all, except for doubles and reals. For the case you mention there'll probably be no misalignment penalty at all; the latency gets hidden in the early stages of the pipeline. There may be a penalty if you cross a cache line boundary, though.
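
The cache line condition can be checked with simple arithmetic. This is a minimal sketch: the helper name crossesCacheLine is hypothetical, and the 64-byte line size is an assumption (common on x86, but not universal).

```d
// Assumed line size; 64 bytes is common on x86 but not guaranteed.
enum size_t cacheLineSize = 64;

/// Returns true if an access of `size` bytes starting at byte address
/// `addr` touches two different cache lines.
bool crossesCacheLine(size_t addr, size_t size,
                      size_t lineSize = cacheLineSize) {
    assert(size > 0);
    // Compare the line index of the first and last byte accessed.
    return addr / lineSize != (addr + size - 1) / lineSize;
}

void main() {
    // A 4-byte read at offset 1 stays inside the first line.
    assert(!crossesCacheLine(1, 4));
    // Bytes 60..63 still fit in the first line.
    assert(!crossesCacheLine(60, 4));
    // Bytes 61..64 straddle the boundary between lines 0 and 1.
    assert(crossesCacheLine(61, 4));
}
```

For the struct in question, b at offset 1 only straddles a line when the struct itself starts within the last four bytes of one.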
Oct 06 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Thanks! Are there some online docs that discuss that in detail?

Andrei
Oct 06 2009
"Jb" <jb nowhere.com> writes:
http://www.intel.com/products/processor/manuals/
Check the optimization manual at the bottom. Chapter 3.6.3

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF
chapter 5.2

http://www.agner.org/optimize/optimizing_assembly.pdf
chapter: optimizing memory access
Oct 06 2009
Sean Kelly <sean invisibleduck.org> writes:
== Quote from Don (nospam nospam.com)'s article
Not on Intel. IIRC the trapping happens on SPARC. Misalignment on x86 doesn't hurt much at all, except for doubles and reals. For the case you mention there'll probably be no misalignment penalty at all; the latency gets hidden in the early stages of the pipeline. There may be a penalty if you cross a cache line boundary, though.

By default, an unaligned read on SPARC will cause a bus error. The trap is enabled via a compiler switch, and as one might expect is ridiculously slow. I believe unaligned ops are nearly as fast as aligned ops on x86, as you say.
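
On targets where a misaligned load faults, one common way to stay safe without relying on the trap is to copy the bytes instead of dereferencing a misaligned pointer. This is a minimal sketch: loadUnaligned and storeUnaligned are hypothetical helper names, and whether the compiler lowers the memcpy to a single load is target dependent.

```d
import core.stdc.string : memcpy;

/// Hypothetical helper: reads an int from any address by copying bytes,
/// so no int-sized load is issued on a misaligned pointer.
int loadUnaligned(const(ubyte)* p) {
    int value;
    memcpy(&value, p, int.sizeof);
    return value;
}

/// Hypothetical helper: writes an int to any address by copying bytes.
void storeUnaligned(ubyte* p, int value) {
    memcpy(p, &value, int.sizeof);
}

void main() {
    ubyte[8] buf;
    // Round-trip through every offset 0..4, so at least three of the
    // addresses are not 4-byte aligned. The round trip is independent
    // of the machine's byte order.
    foreach (offset; 0 .. 5) {
        storeUnaligned(buf.ptr + offset, 0x12345678);
        assert(loadUnaligned(buf.ptr + offset) == 0x12345678);
    }
}
```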
Oct 06 2009
Michel Fortin <michel.fortin michelf.com> writes:
On 2009-10-06 09:58:42 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Consider:
 
 struct A {
      char a;
      align(1) int b;
 }
 
 Accesses to b will be rather slow because it's a misaligned read. My 
 question is, how exactly is that handled on various processors? I seem 
 to recall various anecdotes (including that misaligned reads on Intel 
 cause a trap that does the needed double reading, shifting, and 
 masking), but Google search has surprisingly little on the matter.

Wikipedia: <http://en.wikipedia.org/wiki/Data_structure_alignment#Architectures>

RISC

Most RISC processors will generate an alignment fault when a load or store instruction accesses a misaligned address. This allows the operating system to emulate the misaligned access using other instructions. For example, the alignment fault handler might use byte loads or stores (which are always aligned) to emulate a larger load or store instruction.

Some architectures like MIPS have special unaligned load and store instructions. One unaligned load instruction gets the bytes from the memory word with the lowest byte address and another gets the bytes from the memory word with the highest byte address. Similarly, store-high and store-low instructions store the appropriate bytes in the higher and lower memory words respectively.

The Alpha architecture has a two-step approach to unaligned loads and stores. The first step is to load the upper and lower memory words into separate registers. The second step is to extract or modify the memory words using special low/high instructions similar to the MIPS instructions. An unaligned store is completed by storing the modified memory words back to memory. The reason for this complexity is that the original Alpha architecture could only read or write 32-bit or 64-bit values. This proved to be a severe limitation that often led to code bloat and poor performance. To address this limitation, an extension called the Byte Word Extensions (BWX) was added to the original architecture. It consisted of instructions for byte and word loads and stores.

Because these instructions are larger and slower than the normal memory load and store instructions they should only be used when necessary. Most C and C++ compilers have an “unaligned” attribute that can be applied to pointers that need the unaligned instructions.
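
The two-step load described above (fetch the two aligned words that contain the value, then shift and merge) can be sketched in portable code. This is only an illustration under stated assumptions: memory is simulated as an array of 32-bit words, byte numbering within a word is assumed little-endian, and loadUnaligned32 is a hypothetical name, not an actual MIPS or Alpha instruction.

```d
/// Hypothetical sketch: reads a 32-bit value at an arbitrary byte
/// address from word-addressed memory, using only aligned word reads.
/// Assumes little-endian byte numbering within each word.
uint loadUnaligned32(const(uint)[] words, size_t byteAddr) {
    immutable size_t index = byteAddr / 4;              // lower word
    immutable uint shift = cast(uint)(byteAddr % 4) * 8;
    immutable uint low = words[index];
    if (shift == 0)
        return low;   // already aligned; also avoids a shift by 32
    immutable uint high = words[index + 1];             // upper word
    // Take the top bytes of the lower word and the bottom bytes of the
    // upper word, then merge them.
    return (low >> shift) | (high << (32 - shift));
}

void main() {
    // As a byte stream this is 11 22 33 44 55 66 77 88.
    immutable uint[2] words = [0x44332211u, 0x88776655u];
    assert(loadUnaligned32(words[], 0) == 0x44332211);
    assert(loadUnaligned32(words[], 1) == 0x55443322);
    assert(loadUnaligned32(words[], 3) == 0x77665544);
}
```

An alignment fault handler that emulates the access in software ends up doing roughly this work, plus the cost of taking the trap.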

x86 and x86-64

While the x86 architecture originally did not require aligned memory access and still works without it, SSE2 and x86-64 instructions on x86 CPUs do require the data to be 128-bit (16-byte) aligned and there can be substantial performance advantages from using aligned data on these architectures.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Oct 06 2009