C Struct Layout Rules

The layout of fields in C structs has come up a number of times recently, so it seems appropriate to document the apparent layout rules. This matters to developers who access netcdf-4 datatypes from a language other than C, such as Python or Fortran.

These rules are taken from the HDF5 code. Within netcdf, they are used in ncgen4 and in the (soon to be released) DAP->netcdf-4 translator.

The key to the layout is the notion of alignment. The alignment of a primitive data type (e.g. char, short, int, etc.) is the memory boundary on which all instances of the type should occur. As a rule, the alignment of a primitive type is equal to its sizeof() value. Thus, the alignment of a char is 1, that of a short is 2, and so on. Note that the alignment of long depends on the machine: it is typically 4 on 32-bit machines and typically 8 on 64-bit machines.

However, the above rule is not always correct. On some machines, the alignment boundary may be smaller than sizeof() indicates. For example, on a SPARC, double values can be aligned on a 4-byte boundary instead of the expected 8-byte boundary. This means the alignment must be computed on a per-machine basis (though hopefully not on a per-compiler basis). To compute these true alignments, one must construct the following set of C structs.

    struct S { char f1; T f2; };

T ranges over all of the possible primitive types: char, short, int, float, double, etc. For each such struct, the value of the offsetof(S,f2) macro (from stddef.h) must be calculated and used as the alignment for type T. The offset of a field in a C struct is the relative address of the field from the beginning of the struct, where the initial offset is zero. Thus, on a SPARC, offsetof(S,f2) when T = double is 4, whereas on a 64-bit x86 machine, offsetof(S,f2) when T = double is 8. This value is the alignment that must be used when computing struct offsets as defined below.
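As a rough sketch of this probe technique, the following C fragment declares one struct per primitive type and reads the alignment back with offsetof. The PROBE and ALIGNMENT_OF macros and the print_alignments function are names made up for this illustration; they are not taken from the HDF5 or netcdf sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: declares struct probe_NAME { char f1; T f2; }; */
#define PROBE(T, NAME) struct probe_##NAME { char f1; T f2; }

PROBE(char, char);
PROBE(short, short);
PROBE(int, int);
PROBE(long, long);
PROBE(float, float);
PROBE(double, double);

/* offsetof(S, f2) is the alignment the compiler actually uses for T. */
#define ALIGNMENT_OF(NAME) offsetof(struct probe_##NAME, f2)

/* Print sizeof() next to the measured alignment for each probed type. */
void print_alignments(void)
{
    /* A char can sit at any address, so its alignment must be 1. */
    assert(ALIGNMENT_OF(char) == 1);

    printf("char   size=%zu align=%zu\n", sizeof(char),   ALIGNMENT_OF(char));
    printf("short  size=%zu align=%zu\n", sizeof(short),  ALIGNMENT_OF(short));
    printf("int    size=%zu align=%zu\n", sizeof(int),    ALIGNMENT_OF(int));
    printf("long   size=%zu align=%zu\n", sizeof(long),   ALIGNMENT_OF(long));
    printf("float  size=%zu align=%zu\n", sizeof(float),  ALIGNMENT_OF(float));
    printf("double size=%zu align=%zu\n", sizeof(double), ALIGNMENT_OF(double));
}
```

Calling print_alignments() from main on each target machine shows where the measured alignment differs from sizeof(). The printed values depend on the machine and compiler, so none are listed here.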

To test whether a value of a primitive type is properly aligned, the following should be true, where A is the address of the value and alignment is the alignment of its type.

    ((unsigned long)A) % alignment == 0
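A minimal sketch of that test as a C function follows. It swaps unsigned long for uintptr_t from stdint.h, on the assumption that the platform provides it, because unsigned long is not guaranteed to be wide enough to hold a pointer on every 64-bit platform. The name is_aligned is invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if address A is a multiple of alignment, 0 otherwise. */
int is_aligned(const void *A, size_t alignment)
{
    assert(alignment > 0);  /* an alignment of zero is meaningless */
    return ((uintptr_t)A) % alignment == 0;
}
```

For instance, is_aligned(&x, sizeof(x)) checks a variable x against the simple sizeof() rule, while passing a measured alignment checks it against the true per-machine value.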

Given this, the rules for layout of a C struct are as follows.

  1. The initial offset is zero.
  2. Given a current offset, O, and a field F whose alignment is A, the offset of F is O + P, where P is the padding needed to be added to make sure that F is aligned to A. P is defined as
    (O % A == 0) ? 0 : (A - (O % A)).
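  3. After adding field F, the offset is then O = O + P + S, where S is the size of F (its sizeof() value). S equals A only when the alignment of F happens to match its size.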
  4. One more rule is needed to complete the description. It appears that the alignment of a nested structure is the alignment of the most stringent field in the nested structure. "Stringent" effectively means the largest alignment.
  5. The size of a struct is the offset after the last field is added, rounded up to a multiple of the most stringent field alignment.

More simply put, when adding a field, bump the offset until the offset is at the alignment required by the field.
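The rules above can be sketched in C roughly as follows. This is an illustration under stated assumptions, not the HDF5 or netcdf implementation: the field_desc struct and the padding_for and layout_struct functions are hypothetical names, and each field is assumed to be described by its size and its measured alignment. Note that the sketch advances the offset by the size of the field, which differs from its alignment in cases such as the SPARC double mentioned earlier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one field: its size and measured alignment. */
struct field_desc {
    size_t size;
    size_t alignment;
};

/* Rule 2: padding needed to bring offset up to a multiple of alignment. */
size_t padding_for(size_t offset, size_t alignment)
{
    assert(alignment > 0);
    return (offset % alignment == 0) ? 0 : alignment - (offset % alignment);
}

/*
 * Lays out nfields fields in order. Stores each field's offset in
 * offsets[] and the struct's own alignment in *struct_alignment
 * (if non-NULL), and returns the total size of the struct.
 */
size_t layout_struct(const struct field_desc *fields, size_t nfields,
                     size_t *offsets, size_t *struct_alignment)
{
    size_t offset = 0;     /* rule 1: start at zero */
    size_t strictest = 1;  /* largest field alignment seen so far */

    for (size_t i = 0; i < nfields; i++) {
        offset += padding_for(offset, fields[i].alignment);  /* rule 2 */
        offsets[i] = offset;
        offset += fields[i].size;                            /* rule 3 */
        if (fields[i].alignment > strictest)
            strictest = fields[i].alignment;                 /* rule 4 */
    }

    offset += padding_for(offset, strictest);                /* rule 5 */
    if (struct_alignment != NULL)
        *struct_alignment = strictest;
    return offset;
}
```

To handle a nested struct, one could lay out the inner struct first and then pass its returned size and alignment to the outer layout as a single field_desc entry.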

Posted by: dmh
Mar 30, 2009
