digitalmars.D - misaligned read handling on various processors

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Consider:

struct A {
     char a;
     align(1) int b;
}

Accesses to b will be rather slow because it's a misaligned read. My 
question is, how exactly is that handled on various processors? I seem 
to recall various anecdotes (including that misaligned reads on Intel 
cause a trap that does the needed double reading, shifting, and 
masking), but Google search has surprisingly little on the matter.
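
The "double reading, shifting, and masking" mentioned above can be sketched in C. This only illustrates the idea and does not show what any particular processor or trap handler does. `load_u32_from_words` is a hypothetical helper, and the sketch assumes 32-bit words in which the least significant byte has the lowest address (little-endian numbering).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fetch the 32-bit value that starts at byte offset
   `off` in an array of aligned 32-bit words, using aligned word reads only.
   Assumes little-endian byte numbering within each word, and that
   words[off / 4 + 1] exists whenever `off` is not a multiple of 4. */
static uint32_t load_u32_from_words(const uint32_t *words, size_t off)
{
    size_t idx = off / 4;                      /* word holding the first byte */
    unsigned shift = (unsigned)(off % 4) * 8;  /* bit offset inside that word */
    uint32_t lo = words[idx];                  /* first aligned read */

    if (shift == 0)
        return lo;                             /* aligned: one read suffices */

    uint32_t hi = words[idx + 1];              /* second aligned read */

    /* The shifts discard the unwanted bytes (the "masking" step) and the OR
       merges the two halves. */
    return (lo >> shift) | (hi << (32 - shift));
}
```

With the two words 0x44332211 and 0x88776655, an offset of 1 yields 0x55443322.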


Thanks,

Andrei

Oct 06 2009

Don <nospam nospam.com> writes:

Andrei Alexandrescu wrote:
> Consider:
>
> struct A {
>     char a;
>     align(1) int b;
> }
>
> Accesses to b will be rather slow because it's a misaligned read. My
> question is, how exactly is that handled on various processors? I seem
> to recall various anecdotes (including that misaligned reads on Intel
> cause a trap that does the needed double reading, shifting, and
> masking), but Google search has surprisingly little on the matter.

Not on Intel. IIRC the trapping happens on Sparc. Misalignment on x86 
doesn't hurt much at all, except for doubles and reals.
For the case you mention there'll probably be no misalignment penalty at 
all, the latency gets hidden in the early stages of the pipeline.
Although there may be a penalty if you cross a cache line boundary.
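
Whether a given access crosses a cache line boundary is simple address arithmetic. A minimal sketch, assuming 64-byte cache lines (common on x86 but not guaranteed); `CACHE_LINE_SIZE` and `crosses_cache_line` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. Adjust for the target processor. */
#define CACHE_LINE_SIZE 64u

/* Returns true if the `size` bytes starting at address `addr` touch two
   different cache lines. */
static bool crosses_cache_line(uintptr_t addr, size_t size)
{
    if (size == 0)
        return false;
    uintptr_t first_line = addr / CACHE_LINE_SIZE;
    uintptr_t last_line  = (addr + size - 1) / CACHE_LINE_SIZE;
    return first_line != last_line;
}
```

Under that assumption a 4-byte read at address 61 straddles two lines, while the same read at address 1 (as for `b` in a struct placed at the start of a line) does not.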

Oct 06 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Don wrote:
> Andrei Alexandrescu wrote:
>> Consider:
>>
>> struct A {
>>     char a;
>>     align(1) int b;
>> }
>>
>> Accesses to b will be rather slow because it's a misaligned read. My
>> question is, how exactly is that handled on various processors? I seem
>> to recall various anecdotes (including that misaligned reads on Intel
>> cause a trap that does the needed double reading, shifting, and
>> masking), but Google search has surprisingly little on the matter.
>
> Not on Intel. IIRC the trapping happens on Sparc. Misalignment on x86
> doesn't hurt much at all, except for doubles and reals.
> For the case you mention there'll probably be no misalignment penalty at
> all, the latency gets hidden in the early stages of the pipeline.
> Although there may be a penalty if you cross a cache line boundary.

Thanks! Are there some online docs that discuss that in detail?

Andrei

Oct 06 2009

"Jb" <jb nowhere.com> writes:

"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
news:hafmb2$15lj$1 digitalmars.com...
> Don wrote:
>> Andrei Alexandrescu wrote:
>>> Consider:
>>>
>>> struct A {
>>>     char a;
>>>     align(1) int b;
>>> }
>>>
>>> Accesses to b will be rather slow because it's a misaligned read. My
>>> question is, how exactly is that handled on various processors? I seem
>>> to recall various anecdotes (including that misaligned reads on Intel
>>> cause a trap that does the needed double reading, shifting, and
>>> masking), but Google search has surprisingly little on the matter.
>>
>> Not on Intel. IIRC the trapping happens on Sparc. Misalignment on x86
>> doesn't hurt much at all, except for doubles and reals.
>> For the case you mention there'll probably be no misalignment penalty at
>> all, the latency gets hidden in the early stages of the pipeline.
>> Although there may be a penalty if you cross a cache line boundary.
>
> Thanks! Are there some online docs that discuss that in detail?

http://www.intel.com/products/processor/manuals/

Check the optimization manual at the bottom of that page, chapter 3.6.3.

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF

Chapter 5.2.

http://www.agner.org/optimize/optimizing_assembly.pdf

Chapter: Optimizing memory access.

Oct 06 2009

Sean Kelly <sean invisibleduck.org> writes:

== Quote from Don (nospam nospam.com)'s article
> Andrei Alexandrescu wrote:
>> Consider:
>>
>> struct A {
>>     char a;
>>     align(1) int b;
>> }
>>
>> Accesses to b will be rather slow because it's a misaligned read. My
>> question is, how exactly is that handled on various processors? I seem
>> to recall various anecdotes (including that misaligned reads on Intel
>> cause a trap that does the needed double reading, shifting, and
>> masking), but Google search has surprisingly little on the matter.
>
> Not on Intel. IIRC the trapping happens on Sparc. Misalignment on x86
> doesn't hurt much at all, except for doubles and reals.
> For the case you mention there'll probably be no misalignment penalty at
> all, the latency gets hidden in the early stages of the pipeline.
> Although there may be a penalty if you cross a cache line boundary.

By default, an unaligned read on SPARC will cause a bus error.  The trap is
enabled via a compiler switch, and as one might expect is ridiculously slow.
I believe unaligned ops are nearly as fast as aligned ops on x86, as you say.

Oct 06 2009

Michel Fortin <michel.fortin michelf.com> writes:

On 2009-10-06 09:58:42 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

> Consider:
>
> struct A {
>      char a;
>      align(1) int b;
> }
>
> Accesses to b will be rather slow because it's a misaligned read. My
> question is, how exactly is that handled on various processors? I seem
> to recall various anecdotes (including that misaligned reads on Intel
> cause a trap that does the needed double reading, shifting, and
> masking), but Google search has surprisingly little on the matter.

Wikipedia: 
<http://en.wikipedia.org/wiki/Data_structure_alignment#Architectures>

RISC

Most RISC processors will generate an alignment fault when a load or 
store instruction accesses a misaligned address. This allows the 
operating system to emulate the misaligned access using other 
instructions. For example, the alignment fault handler might use byte 
loads or stores (which are always aligned) to emulate a larger load or 
store instruction.
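
The byte-load emulation described above might look like the following C sketch. `load_u32_le_bytes` is a hypothetical name, and the value is assumed to be stored in little-endian byte order; a real fault handler would also have to decode the faulting instruction, which is omitted here.

```c
#include <stdint.h>

/* Assemble a 32-bit little-endian value from four single-byte loads.
   Byte loads are always aligned, so `p` may point anywhere. */
static uint32_t load_u32_le_bytes(const unsigned char *p)
{
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Four loads, three shifts and three ORs stand in for one word load, which gives a feel for why the emulated path is slow.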

Some architectures like MIPS have special unaligned load and store 
instructions. One unaligned load instruction gets the bytes from the 
memory word with the lowest byte address and another gets the bytes 
from the memory word with the highest byte address. Similarly, 
store-high and store-low instructions store the appropriate bytes in 
the higher and lower memory words respectively.

The Alpha architecture has a two-step approach to unaligned loads and 
stores. The first step is to load the upper and lower memory words into 
separate registers. The second step is to extract or modify the memory 
words using special low/high instructions similar to the MIPS 
instructions. An unaligned store is completed by storing the modified 
memory words back to memory. The reason for this complexity is that the 
original Alpha architecture could only read or write 32-bit or 64-bit 
values. This proved to be a severe limitation that often led to code 
bloat and poor performance. To address this limitation, an extension 
called the Byte Word Extensions (BWX) was added to the original 
architecture. It consisted of instructions for byte and word loads and 
stores.

Because these instructions are larger and slower than the normal memory 
load and store instructions they should only be used when necessary. 
Most C and C++ compilers have an “unaligned” attribute that can be 
applied to pointers that need the unaligned instructions.
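
The spelling of such attributes is compiler-specific. As a rough C analogue of the struct from the original question, the sketch below assumes the GCC/Clang `__attribute__((packed))` extension; `read_i32_unaligned` is a hypothetical helper that uses `memcpy`, which leaves the choice of a safe load sequence to the compiler.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: GCC/Clang extension. Other compilers use different syntax. */
struct A {
    char a;
    int32_t b;   /* placed at offset 1, so it is misaligned */
} __attribute__((packed));

/* Portable unaligned read: copy the bytes into an aligned local variable. */
static int32_t read_i32_unaligned(const void *p)
{
    int32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

With the packed attribute, `offsetof(struct A, b)` is 1 and `sizeof(struct A)` is 5. Accessing `s.b` directly is also fine, because the compiler knows the field may be misaligned.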

x86 and x86-64

While the x86 architecture originally did not require aligned memory 
access and still works without it, some SSE2 instructions on x86 CPUs 
(for example the aligned moves such as MOVDQA) do require the data to be 
128-bit (16-byte) aligned and there can be substantial performance 
advantages from using aligned data on these architectures.
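
Instructions that demand 16-byte alignment, such as the aligned SSE2 moves, fault on other addresses, so code often checks a pointer before choosing the aligned path. A minimal sketch; `is_aligned_16` is a hypothetical helper, and it assumes the usual flat address model in which the low bits of a pointer converted to `uintptr_t` reflect its alignment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if `p` is a multiple of 16, i.e. suitable for a 128-bit
   aligned load or store. Assumes a flat address space. */
static bool is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}
```

A caller could use the aligned instruction when this returns true and fall back to an unaligned load otherwise.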


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Oct 06 2009
