What is Structure Padding in C?
Structure padding describes how the compiler lays out the members of a structure in memory: members are allocated in declaration order, and each one is placed at an address that suits the size of its type. For example, a 'char' of 1 byte can be placed at any address, such as 0x5000 or 0x5001, whereas a 4-byte 'int' is normally placed at an address that is a multiple of 4. This is where "Structure padding in C" comes in. The compiler adjusts the memory layout for the different data types so that the processor needs fewer CPU cycles to read them. It does this by adding one or more empty bytes between the members so that the data is aligned in memory.
Syntax of Structure Padding in C
Let us see an example to understand the syntax of Structure Padding in C.
Explanation:
In the above example, a structure is declared with members of different types. The first member is a 'char', which takes 1 byte. The next member is an 'int', so padding is done here: three empty bytes are inserted after the 'char', and then the 'int' takes 4 bytes. The remaining 'int' and 'double' members take 4 and 8 bytes respectively.
How Does Structure Padding Work in C?
Let us take an example to understand the concept of structure padding in C: First, we will create a structure of type "bag" and declare its members of different data types.
As we can see in the above example, after the object is created, memory is allocated to the structure members in sequence: first variable 'x', then variable 'y', and then variable 'z'. The two chars 'x' and 'y' take 1 byte each, and the int 'z' takes 4 bytes. However, the members are not simply packed one after another, because a 32-bit processor reads 4 bytes at a time and a 64-bit processor reads 8 bytes at a time.
Let us assume that we are using a 32-bit processor. The memory allocation would look like this:
[Image: memory allocation of the structure members on a 32-bit processor]
Some Examples of Structure Padding in C
Here are some examples of structure padding for better understanding:
Example 1
Why Structure Padding?
Suppose we have a 32-bit processor and a structure with three members of type char, char, and int. Without padding, the members would be allocated contiguously: '1 byte' for char x, '1 byte' for char y, and '4 bytes' for int z. A 32-bit architecture reads 4 bytes in a single CPU cycle, so the first read fetches x, y, and only the first 2 bytes of z. To access the remaining 2 bytes of z, the processor needs one more CPU cycle.
Now assume that we want to access only the int variable. Here the int variable is split across two 4-byte words. The processor should be able to access int 'z' in one CPU cycle, but it needs two, so one CPU cycle is wasted. To avoid this problem, the concept of 'Structure Padding' was introduced. The compiler does this automatically to reduce the number of CPU cycles.
How is Structure Padding Done?
To achieve structure padding, the compiler leaves the 2 bytes after 'y' empty and shifts the first two bytes of 'z' into the next 4-byte row of memory. All four bytes of 'z' then sit in the same row, and the whole 'z' variable can be accessed in a single CPU cycle. The occupied memory size is now different after structure padding: the structure takes (1+1+2+4) = 8 bytes instead of the 6 bytes needed by the members alone. After structure padding, memory consumption increases but the number of CPU cycles is reduced.
Let us take an example for better understanding.
Output:
8 bytes
Explanation: In the above example, a structure named "bag" is created and its members are declared inside it. The sizeof operator is used to calculate the total size of the bag. The output is 8 bytes rather than 6 because of structure padding.
Changing Order of the Variables
We have seen how the memory size differs when the structure members are declared in order. Now we will see whether changing the order of the variables affects the result. We will take the same example for better understanding, but change the order of the variables.
Output:
12 bytes
Here we can see the result is changed to 12 bytes when we change the order of variables inside the structure. Now let us discuss why the result is different in the two cases.
- When the variables are in an ordered manner inside the structure (char x, char y, int z), the memory allocation will be like this: x (1 byte), y (1 byte), 2 padding bytes, z (4 bytes), for a total of 8 bytes.
- When the variables are not in an ordered manner inside the structure (char x, int z, char y), the memory allocation will be like this: x (1 byte), 3 padding bytes, z (4 bytes), y (1 byte), 3 padding bytes, for a total of 12 bytes.
The first variable is of char type, so 'x' occupies 1 byte of memory. The second variable is of int type, so 'z' needs 4 bytes. Only 3 bytes remain in the first 4-byte row, which is not enough, so the 'int' shifts to the next 4-byte row and 3 empty bytes are added after the char variable. This way the integer variable can be accessed in a single CPU cycle.
Now it is time to allocate memory for char 'y'. It occupies the first byte of the next 4-byte row, and the remaining 3 bytes of that row are padding, so that the size of the structure stays a multiple of 4 bytes. Hence the total memory allocated for these three variables is (4 + 4 + 4 = 12 bytes).
How to Avoid the Structure Padding in C?
There are times when structure padding should be avoided. Structure padding in C is an automatic process carried out by the compiler. Because of it, the size of a structure can become greater than the sum of the sizes of its members, and when that extra memory is a problem we need to stop structure padding.
Following are the various ways in which we can avoid structure padding in C language. Each of them is explained with suitable examples.
1. Using #pragma pack(1) Directive:
The #pragma pack(1) directive avoids structure padding by forcing the compiler to place the structure members end to end in memory, with no padding between them.
Example:
Output:
16 bytes
13 bytes
Here, we can see that the size of the structure is 16 bytes by default, but after avoiding structure padding it is 13 bytes. In this way, the #pragma pack(1) directive can be used to avoid structure padding and reduce memory wastage.
2. Using Attribute:
Let us see an example.
Output:
In the above example, we can see how an attribute is used to avoid structure padding and reduce memory wastage.
3. Avoid Relying on Structure Layout:
Another approach is to avoid depending on the padding at all. The data is accessed through the individual structure members rather than through the structure as a whole block of bytes. This works only when the contents of the padding can be safely ignored in every use case, for example while solving data structure questions. Some tools can detect structure padding in a program: the '-Wpadded' warning option, supported by GCC and Clang, makes the compiler warn whenever it adds padding to a structure. If any padding is found, we can fill the empty spaces with explicit placeholder members.
4. Use memset to Zero-initialize Padding Bits:
memset is a standard C library function, declared in the 'string.h' header, that fills a block of memory with a given byte value. Applied to a whole structure, it sets every byte of the structure, including the padding bits, to that value. memset is not applied automatically: the programmer calls it explicitly, usually before the members are assigned, so that the padding holds a known value.
Syntax of memset:
Structure (Zero) Initialization
Objects with static storage duration are initialized only once, before the program starts, and they exist for the whole life of the program. For objects with static storage duration that are not explicitly initialized, the padding bits are initialized to 0 (zero).
In other cases padding is not guaranteed to hold any particular value. According to the C11 standard, §6.2.6.1 paragraph 6:
- When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.
The Current State
We can set the current state of a structure in four different ways. Let us first describe an example structure.
Let us now see the various initialization strategies:
1. memset to Zeros
Example:
2. Individually Set all Members to 0
Example:
3. Use { 0 } Zero-initializer
Example:
4. Use {} GCC Extension Zero-initializer
Example:
Understanding of Structure Padding in C with Alignment
When we create a structure in C, its members may be of any data type, and the compiler automatically inserts some extra bytes between the structure members so that each member is properly aligned. The extra bytes that the compiler inserts between the structure members are called 'padding bytes'. This process of aligning the structure members in memory is called 'Structure Padding in C'.
With structure padding we use more memory, but the processor needs fewer CPU cycles to access the data, which improves performance.
When Does this Matter?
Now that we have an understanding of structure padding, we will see when it makes a difference and when it does not. First, when does it matter?
Comparing padded structs:
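Let us take an example of a comparison between two padded structs using 'memcmp':
In the above program, the structs are compared using memcmp. We have to keep in mind that this may give an incorrect result, because memcmp compares the padding bytes as well, and their values are unspecified. Two structs whose members are all equal can therefore compare as different.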
Serializing structs outside the application:
When we serialize structs outside the application, whether padding matters depends on the context.
If the data will be read by different software, or if a new field may be added to the structure later, it may be sensible to pack the struct so that the layout of the data is simple and clear.
Security Issues:
Structure padding in C is a very important and useful concept, but it can also be a security risk, because the values stored in the padding spaces between the members can leak sensitive information or data. This is possible if the structures cross trust boundaries. The padding spaces may still contain data from other objects that previously occupied the same memory, for example on the stack.
When it Doesn’t Matter?
There are many situations where structure padding will not cause any issue, for example when the structure is used only internally in the application or library. Internally means that the structure is never serialized to external storage or sent over an external communication interface. Security issues of structure padding can be avoided by accessing the structure on a per-member basis.
Conclusion:
- Structure padding is defined as the process of adding one or more empty bytes between the different data types to align data in memory.
- Structure padding increases memory consumption but reduces CPU cycles.
- A processor reads memory in fixed-size chunks, for example 4 bytes at a time on a 32-bit processor, so aligned structure members can be accessed in fewer cycles.
- Changing the order of the variables inside a structure can change its size.
- Sometimes we need to avoid structure padding, when the size of the structure is greater than the sum of the sizes of its members and the extra memory matters.
- Padding can be avoided or handled in the following ways: the #pragma pack(1) directive, an attribute, avoiding reliance on the structure layout, and using memset to zero-initialize padding bits.
- Sometimes, structure padding may lead to a security breach.
- But it will not matter when the structure is used only inside the application or library.