Data Alignment
Download: alignment.zip
What is Data Alignment?
In a programming language, a data object (variable) has 2 properties: its value and its storage location (address). Data alignment describes which power of 2 (1, 2, 4, 8 and so on) evenly divides the address of the data. In other words, a data object can have 1-byte, 2-byte, 4-byte, 8-byte alignment, or any other power of 2. For instance, if the address of a data object is 12FEECh (1244908 in decimal), then it has 4-byte alignment because the address is evenly divisible by 4. (It is also divisible by 2 and 1, but 4 is the largest power of 2 that divides it evenly.)
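The divisibility rule above can be checked with a couple of bit operations. The following is only a minimal sketch; the function names `alignmentOfAddress` and `isAligned` are hypothetical helpers made up for this illustration, not part of any library.

```cpp
#include <cassert>
#include <cstdint>

// Returns the largest power of 2 that evenly divides the address,
// i.e. the alignment of that address. Address 0 is reported as 0,
// because every power of 2 divides it.
std::uintptr_t alignmentOfAddress(std::uintptr_t address)
{
    // Isolate the lowest set bit of the address.
    return address & (~address + 1);
}

// Returns true if the address is a multiple of alignment.
// alignment is assumed to be a power of 2.
bool isAligned(std::uintptr_t address, std::uintptr_t alignment)
{
    return (address & (alignment - 1)) == 0;
}
```

For the address 12FEECh from the text, `alignmentOfAddress(0x12FEEC)` yields 4, and `isAligned(0x12FEEC, 8)` is false.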
A CPU does not read from or write to memory one byte at a time. Instead, it accesses memory in chunks of 2, 4, 8, 16, or 32 bytes at a time, depending on the processor. The reason is performance: accessing an address on a 4-byte or 16-byte boundary is a lot faster than accessing an address on an arbitrary 1-byte boundary.
The following diagram illustrates how a CPU accesses a 4-byte chunk of data with 4-byte memory access granularity.
If the data is not aligned on a 4-byte boundary, the CPU has to perform extra work to access it: load 2 chunks of data, shift out the unwanted bytes, then combine the remaining bytes. This extra work slows down the access and wastes CPU cycles just to get the right data from memory.
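The load, shift and combine steps can be imitated in software to see where the extra work comes from. This is only an illustrative model, not how any particular CPU is implemented; it assumes little-endian byte order, and the names `loadAlignedChunk` and `loadWord` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reads one aligned 4-byte chunk starting at chunkAddress (a multiple of 4).
// The bytes are assembled in little-endian order regardless of the host.
std::uint32_t loadAlignedChunk(const std::uint8_t* memory, std::size_t chunkAddress)
{
    return  static_cast<std::uint32_t>(memory[chunkAddress])
         | (static_cast<std::uint32_t>(memory[chunkAddress + 1]) << 8)
         | (static_cast<std::uint32_t>(memory[chunkAddress + 2]) << 16)
         | (static_cast<std::uint32_t>(memory[chunkAddress + 3]) << 24);
}

// Emulates a 4-byte load at any address using only aligned chunk reads.
// For a misaligned address the caller must make sure the following
// aligned chunk is also readable.
std::uint32_t loadWord(const std::uint8_t* memory, std::size_t address)
{
    std::size_t offset = address & 3;      // bytes past the 4-byte boundary
    std::size_t base = address - offset;   // aligned chunk holding the first byte
    std::uint32_t low = loadAlignedChunk(memory, base);
    if (offset == 0)
        return low;                        // aligned: a single access is enough

    // Misaligned: load a second chunk, shift out the unwanted bytes, combine.
    std::uint32_t high = loadAlignedChunk(memory, base + 4);
    unsigned shift = static_cast<unsigned>(offset) * 8;
    return (low >> shift) | (high << (32 - shift));
}
```

In this model an aligned load touches one chunk, while a load at offset 1, 2 or 3 touches two chunks and needs two shifts and an OR.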
Structure Member Alignment
On 32-bit x86 systems, the alignment of a data type is mostly the same as its size. The compiler aligns variables on their natural length boundaries. The CPU will handle misaligned data properly, so you do not need to align addresses explicitly.
Data Type | Alignment (bytes) |
---|---|
char | 1 |
short | 2 |
int | 4 |
float | 4 |
double | 4 or 8 |
However, the story is a little different for member data in struct, union or class objects. To prevent performance penalties, each member is placed at an offset that satisfies its own alignment, and the struct (or union, class) as a whole is aligned to the largest alignment of any of its members. For example, if you have 1 char variable (1 byte) followed by 1 int variable (4 bytes) in a struct, the compiler pads 3 bytes between these two variables. Therefore, the total size of this struct is 8 bytes instead of 5 bytes. As a result, the address of the struct, and of its int member, is evenly divisible by 4. This is called structure member alignment. Of course, the size of the struct grows as a consequence.
// size = 2 bytes, alignment = 1-byte, address can be divisible by 1
struct S1 {
char m1; // 1-byte
char m2; // 1-byte
};
// size = 4 bytes, alignment = 2-byte, address can be divisible by 2
struct S2 {
char m1; // 1-byte
// padding 1-byte space here
short m2; // 2-byte
};
// size = 8 bytes, alignment = 4-byte, address can be divisible by 4
struct S3 {
char m1; // 1-byte
// padding 3-byte space here
int m2; // 4-byte
};
// size = 16 bytes, alignment = 8-byte, address can be divisible by 8
struct S4 {
char m1; // 1-byte
// padding 7-byte space here
double m2; // 8-byte
};
// size = 16 bytes, alignment = 8-byte, address can be divisible by 8
struct S5 {
char m1; // 1-byte
// padding 3-byte space here
int m2; // 4-byte
double m3; // 8-byte
};
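The sizes and padding in the comments above can be verified by the compiler itself with `sizeof`, `alignof` and `offsetof`. The sketch below assumes a C++11 compiler; the struct names and the helper `paddingBefore` are hypothetical, and the exact numbers for `double` depend on the platform, so the checks are written in terms of alignment rather than fixed sizes.

```cpp
#include <cassert>
#include <cstddef>

// Number of padding bytes needed so that a member placed at or after
// 'offset' starts on a multiple of 'alignment'.
std::size_t paddingBefore(std::size_t offset, std::size_t alignment)
{
    return (alignment - offset % alignment) % alignment;
}

struct CharThenInt {      // same layout as S3
    char m1;
    int  m2;
};

struct CharIntDouble {    // same layout as S5
    char   m1;
    int    m2;
    double m3;
};

// The int member is pushed to the first offset that satisfies its alignment.
static_assert(offsetof(CharThenInt, m2) == alignof(int),
              "padding is inserted between the char and the int");
// A struct takes the largest alignment of its members...
static_assert(alignof(CharThenInt) == alignof(int),
              "struct alignment follows its most strictly aligned member");
// ...and its size is rounded up to a multiple of that alignment.
static_assert(sizeof(CharIntDouble) % alignof(CharIntDouble) == 0,
              "struct size is a multiple of its alignment");
```

For example, `paddingBefore(1, 4)` gives the 3 bytes of padding in S3, and `paddingBefore(1, 8)` gives the 7 bytes in S4.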
You may use the "pack" pragma directive to specify a different packing alignment for struct, union or class members.
// 1-byte struct member alignment
// size = 9, alignment = 1-byte, no padding for these struct members
#pragma pack(push, 1)
struct S6 {
char m1; // 1-byte
double m2; // 8-byte
};
#pragma pack(pop)
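To see the effect of packing, the same members can be declared twice, once with default alignment and once packed, and then compared. This is a small sketch with hypothetical struct names; it assumes a compiler that supports `#pragma pack(push, 1)` (MSVC, GCC and Clang do) and an 8-byte `double`.

```cpp
#include <cassert>
#include <cstddef>

struct DefaultRecord {    // default member alignment: padding after 'tag'
    char   tag;
    double value;
};

#pragma pack(push, 1)
struct PackedRecord {     // 1-byte member alignment: no padding
    char   tag;
    double value;
};
#pragma pack(pop)

// With 1-byte packing the members sit back to back.
static_assert(sizeof(PackedRecord) == sizeof(char) + sizeof(double),
              "packed struct has no padding");
static_assert(offsetof(PackedRecord, value) == 1,
              "the double starts right after the char");
```

Note that `value` in `PackedRecord` starts at offset 1, so it is misaligned and accessing it carries the penalty described earlier.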
Be careful when using custom struct member alignment. It may cause serious compatibility issues, for example, when linking an external library built with a different packing alignment. It is better to use the default alignment whenever possible.
Data Alignment for SSE
SSE (Streaming SIMD Extensions) defines 128-bit (16-byte) packed data types (4 32-bit floats), and access to this data is faster if its address is 16-byte aligned, that is, evenly divisible by 16.
In MSVC, you can declare a 16-byte aligned variable using the __declspec(align(16)) keyword:
__declspec(align(16)) float array[SIZE];
...
struct __declspec(align(16)) S1
{
float v[4];
};
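Outside MSVC, a similar declaration can be written with the standard C++11 `alignas` specifier. This is offered only as a portable sketch of the same idea, not as what the example project uses; the names `Vec4` and `samples` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// 16-byte aligned struct holding 4 floats, suitable for one SSE register.
struct alignas(16) Vec4 {
    float v[4];
};

// 16-byte aligned static array.
alignas(16) float samples[8];

static_assert(alignof(Vec4) == 16, "Vec4 is 16-byte aligned");
static_assert(sizeof(Vec4) == 16, "4 floats fill exactly one 16-byte slot");

// Returns true if the pointer value is evenly divisible by 16.
bool isAligned16(const void* pointer)
{
    return reinterpret_cast<std::uintptr_t>(pointer) % 16 == 0;
}
```

Any `Vec4` declared as a local or static variable, and the `samples` array, should then pass `isAligned16()`.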
A dynamic array can be allocated using the _aligned_malloc() function, and deallocated using _aligned_free().
// allocate 16-byte aligned data
float* array = (float*)_aligned_malloc(SIZE*sizeof(float), 16);
...
// deallocate memory
_aligned_free(array);
Or, you can align the address manually like this:
// allocate 15 extra bytes,
// because in the worst case the data can be misaligned by up to 15 bytes
float* array = (float*)malloc(SIZE*sizeof(float)+15);
// find the aligned position
// and use this pointer to read or write data into array
// (uintptr_t from <stdint.h> can hold a pointer on both 32-bit and 64-bit targets;
// unsigned long is only 32 bits on 64-bit Windows)
float* alignedArray = (float*)(((uintptr_t)array + 15) & ~(uintptr_t)0x0F);
...
// deallocate the original "array", NOT alignedArray
free(array);
array = alignedArray = 0;
Because a 16-byte aligned address must be divisible by 16, its least significant hex digit is always 0. That is why the bitwise AND with ~0x0F is used: it clears the lowest hex digit of the address.
The address returned by the allocator may be misaligned by anywhere from 0 to 15 bytes. In the worst case, you have to move the address 15 bytes forward before the bitwise AND operation. Therefore, you need to allocate 15 extra bytes. For example, the 16-byte aligned addresses starting from 1000h are 1000h, 1010h, 1020h, 1030h, and so on. If malloc() or the C++ new operator allocates a memory block at 1011h, then we need to move 15 bytes forward to 1020h, which is the next 16-byte aligned address.
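The add-15-then-mask arithmetic can be wrapped in a small helper and checked against the addresses in the example. The following is a minimal sketch; `alignUp`, `AlignedFloats` and `allocateAlignedFloats` are hypothetical names introduced here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Rounds address up to the next multiple of alignment (a power of 2).
// An address that is already aligned is returned unchanged.
std::uintptr_t alignUp(std::uintptr_t address, std::uintptr_t alignment)
{
    return (address + (alignment - 1)) & ~(alignment - 1);
}

// Keeps the original pointer next to the aligned one,
// because only the original may be passed to free().
struct AlignedFloats {
    void*  raw;    // pointer returned by malloc(); pass this to free()
    float* data;   // 16-byte aligned pointer inside the same block
};

// Allocates room for 'count' floats plus 15 spare bytes and aligns the start.
// On allocation failure both pointers are null.
AlignedFloats allocateAlignedFloats(std::size_t count)
{
    AlignedFloats result = { nullptr, nullptr };
    result.raw = std::malloc(count * sizeof(float) + 15);
    if (result.raw != nullptr) {
        std::uintptr_t start = reinterpret_cast<std::uintptr_t>(result.raw);
        result.data = reinterpret_cast<float*>(alignUp(start, 16));
    }
    return result;
}
```

With these definitions, `alignUp(0x1011, 16)` returns 0x1020 and `alignUp(0x1000, 16)` returns 0x1000, matching the addresses discussed above.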
Example
Download the source and binary: alignment.zip
This example includes an MS Visual Studio project file and source code that prints out the addresses used for structure member alignment and for data alignment for SSE. Note that it uses MS-specific keywords: __declspec() and __alignof().