Basic Blocks in Compiler Design

Overview

A linear sequence of lines of code without any internal branching except at its beginning or end is referred to as a Basic Block in Compiler Design. These sets of instructions are executed sequentially without any interruption. The source code of high-level language is converted into intermediate code during compilation. This intermediate code is composed of such basic blocks and the flow of execution between them is depicted with the help of flow graphs.

What are Basic Blocks in Compiler Design?

A Basic Block in Compiler Design is a block of code that contains a sequence of instructions that has only one entry point and one exit point. Let us break this statement into smaller pieces to understand it better.

Single Entry Point: There is only one way to enter a basic block. The control flow can only get to the beginning of the basic block from the immediately preceding block.

Single Exit Point: There is only one way to exit from a basic block which is the last statement of the block, we cannot jump from in between the blocks.

Three-address code It is an intermediate code generated by the compiler. Three-address code as the name suggests can only have at max three operands in a single statement.

Consider the code below for the expression $a = b * c + 5 / 6$ :

The above code doesn't contain any branches or jump statements. It is only composed of sequences of sentences. There is only one way to get to this block of code which is the first statement function and the last statement is the only exit point for this block of code.

Consider another sequence of code:

The above code contains a goto statement and therefore, it cannot be considered as a basic block.

Security Considerations

Compilers modify block structures to identify vulnerabilities in code and ensure valid control flow and transitions. The compiler changes the block layout to detect vulnerabilities. Various security checks are performed at block ends.

Two types of attacks possible are control flow integrity and data flow integrity.

Control Flow Integrity(CFI) attacks are attacks performed by manipulating control flow to execute malicious code.
Data Flow Integrity(DFI) attacks are attacks performed on data by unauthorized personnel leading to security breaches.

Performance

Positive Impact:

Code is effectively optimized thus, leading to smaller and faster code.
Performs code analysis and transformations for performance improvements.

Potential Overhead:

Basic block identification and analysis add to compilation time.
Optimizations can introduce extra costs sometimes.

Comparing positive and overhead impacts, the benefits of basic blocks outweigh potential overhead.

Algorithm

Before moving to the algorithm to partition into basic blocks, let us first understand the leader statement. We initially determine the set of leader statements, the first statements of each basic block.

Input

Input to the algorithm is a sequence of the three-address statements.

Output

The algorithm returns a list of basic blocks. Each of these basic blocks is returned with three-address statements in exactly one block.

Method

1. Leader Statement:

The very first instruction in a block of code is considered a leader statement.
Any target statement to the conditional or unconditional jump is also a leader statement.
A statement that is immediately after a jump(conditional or unconditional) statement is also a leader statement.

2. Basic Block:

Each of these leader statements is the very first statement of each of the basic blocks. The statement immediately before the next leader statement is the last statement of the previous basic block.

Example

Consider the source code for the dot product of two vectors a and b of length 20.

The three-address code for the above block of code can be generated as follows:

Let us identify the leader statements to get the basic blocks from the above three-address code.

Statement 1 is always a leader statement.
In the $3^{rd}$ statement we enter a loop and thus, its first statement is the leader statement.

Thus, there are two leader statements and the basic blocks formed are:

Basic Block 1: STATEMENT (1) TO (2)
Basic Block 2: STATEMENT (3) TO (12)

Basic Block Construction

Let us look at another example to understand the construction of a basic block.

Certainly! Let's consider another example and generate its three-address code. After that, we'll identify the leader statements to form basic blocks.

The three-address code for the above block of code can be generated as follows:

There are four basic blocks formed:

Basic Block 1: STATEMENT (1) TO (3)
Basic Block 2: STATEMENT (4) TO (6)
Basic Block 3: STATEMENT (7) TO (9)
Basic Block 4: STATEMENT (10) TO (11)

Explanation

Now, let's identify the leader statements to form Basic Block in Compiler Design:

Statement 1 is always a leader statement.
Statement 4 is a leader statement as it is the target of a conditional jump.
Statement 7 is a leader statement, the first statement after the conditional jump.
Statement 10 is a leader statement as it is the target of an unconditional jump.

Each Basic Block in Compiler Design represents a sequence of instructions with a single entry and exit point.

Transformations on Basic Blocks

Optimization

Code written may not always be optimal. This is due to machine-level instructions and addresses not known to the programmer, optimization by hand requires more effort and time, and advanced architecture requires certain optimization.

There are two types of optimizations:

Machine Independent Optimization
Machine Dependent Optimization

Basic block construction is part of Machine Dependent Optimization.

Basic Block in Compiler Design enables optimization at local and global levels.

Local optimizations focus on individual blocks improving their efficiency.
Global optimizations include loop optimizations, function inlining, register allocation, etc. by analyzing interactions between blocks across larger code sections.

These optimizations aim to reduce redundancy, improve execution speed, and minimize overall resource usage.

Transformations

Several transformations are applied to these basic blocks without changing the set of expressions computed by the block. In other words, without modifying the output or meaning of the code.

The transformations are performed on Basic Block in Compiler Design to improve the code's performance. The transformations allow compilers to apply a variety of analysis and transformation techniques resulting in structured and effective code without compromising the correctness of the program.

Structure-Preserving Transformations

Let us explore various structure-preserving transformations.

Common Subexpression Elimination: Common subexpression elimination is the optimization technique in the compiler that eliminates redundant computations by recognizing previously computed expressions.

In the above example, b + c is already computed and stored in a. This computation can be eliminated for f.

After elimination:

Dead-Code Elimination: Dead code elimination in compiler optimization techniques removes code that is never going to be executed and is not going to affect the program's output. It is often caused by the variables or computations which are never used in the program.

After dead-code elimination:

Renaming Temporary Variables Renaming temporary variables is done to remove name conflicts and improve code clarity. Using proper names for the variables is important as it defines their meaning as well as avoids conflicts.

After renaming temporary variables:

Interchange of Statements Interchanging statements refers to changing the positions of statements without changing the output or meaning of code. This improves performance and simplifies the code structure.

After statement interchange

Statement 1 and Statement 2 are independent and don't depend on each other's results. The compiler has determined that swapping their order could improve performance. Possible reasons for swapping could be reduced pipeline stalls and instruction level parallelism.

Algebraic Transformations

Algebraic transformations refer to changing a set of expressions computed by a basic block into an algebraically equivalent set but with reduced cost and computations.

For example:

$x := x + 0$ - Wasting time and space for this kind of computation is useless.
Another popular example is $x := x * 1$ . There is no point in such calculations and they must be avoided.
$a := pow(5,2)$ calculates the square of 5. To simply calculate the square we can use the multiplication operator as $a := 5 * 5$ which reduces overall cost.

FAQs

Q. What is a Basic Block?

A. In compiler construction, a basic block is a fundamental unit of code that has the following key properties:

Single entry point: It can only be entered at its first instruction.
Single exit point: It can only be exited at its last instruction.
No internal branches: It doesn't contain any conditional or unconditional jumps within itself, except for the one at the end.
Straight-line code: The instructions within a basic block are always executed sequentially, one after the other.

Q. How can we represent a basic block?

A. A basic block can be represented using three-address code, which breaks down complex statements into simpler instructions. Each basic block is usually identified by a leader statement, and the instructions within the block are organized linearly. Control flow between basic blocks is often represented using directed edges in a control flow graph.

Q. What are the basic blocks in the control flow diagram?

A. - In a control flow diagram, basic blocks are represented by nodes, and the control flow between them is represented by directed edges.

Each basic block corresponds to a set of instructions executed sequentially without any branching within the block.
The entry and exit points of a basic block align with the points in the program where control flow can enter or exit that block.
They provide a structured representation of a program's control flow, making it easier to apply transformations and improvements.
The flow graph for the first example can be represented as follows:

basic block

Conclusion

A Basic Block in Compiler Design is a block having only one entry point and one exit point. Each block only consists of a linear sequence of instructions.
To construct a basic block we must first identify leader statements.
Each leader statement is the first statement of the block and the statement right before the next leader statement is the last statement of the basic block.
Transformations are performed on the basic blocks to improve their efficiency and performance. Transformations can be algebraic transformations or structure-preserving transformations.
Local and global optimizations are performed on basic blocks. Basic block structures have several vulnerabilities such as CFI and DFI but their benefits outweigh the potential overheads.