Function Inlining by Compiler
Compilers often insert a function's code directly into the caller's code stream. This is called inline expansion of the function.
But before discussing the performance aspect of inlining a function call, let's review the One Definition Rule (ODR) in C++ and see how the inline specifier allows multiple definitions of a function across different translation units.
One Definition Rule [1]: The One Definition Rule (ODR) in C++ states that a non-inline function or variable that is used must have exactly one definition in the entire program. Some entities, such as classes, templates, and inline functions, may be defined in more than one translation unit, but then all of those definitions must be identical.
If you define a function in multiple translation units without using the inline specifier, the linker will report a multiple definition error.
Let’s see this through an example:
// add.h
#pragma once
// Non-inline definition in a header: every file that includes it gets its own copy.
int add(int a, int b) {
    return a + b;
}

// a.cpp
#include "add.h"
#include <iostream>
int foo();  // defined in b.cpp
int main() {
    int result_1 = add(1, 1);
    std::cout << result_1 << std::endl;
    int result_2 = foo();
    std::cout << result_2 << std::endl;
}

// b.cpp
#include "add.h"
int foo() {
    int result = add(1, 2);
    return result;
}
Here we have one header file, add.h, and we have included it in two source files, a.cpp and b.cpp. The preprocessor copies the content of the header file into both source files, so each translation unit ends up with its own definition of add.
Compiling the program with clang produces the following error:
shivamverma@Shivams-MacBook-Air ~/Temp/inline % clang a.cpp b.cpp -o prog
duplicate symbol 'add(int, int)' in:
/private/var/folders/5x/brsly38n1yn9ry7bd6qbrzw80000gn/T/a-abe000.o
/private/var/folders/5x/brsly38n1yn9ry7bd6qbrzw80000gn/T/b-e5c28b.o
ld: 1 duplicate symbols
clang: error: linker command failed with exit code 1 (use -v to see invocation)
But when we add the inline specifier to the add function, the linker reports no error.
// add.h
#pragma once
inline int add(int a, int b) {
    return a + b;
}
The inline specifier changes this behavior by indicating that the function's definition can appear in multiple translation units. Compilers typically emit such a function as a weak symbol [2], so the linker keeps a single copy instead of reporting a duplicate.
Functions defined in header files are often declared as inline because the header may be included in multiple translation units.
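The inline keyword is not the only way a definition can legally appear in several translation units. As a minimal sketch, member functions defined inside a class body, function templates, and constexpr functions can also be defined in a header without a duplicate symbol error. The names Calculator, sub, mul, square, and check_all are hypothetical and not part of the example above.

```cpp
#include <cassert>

// Explicitly inline: may be defined in every translation unit that includes it.
inline int add(int a, int b) {
    return a + b;
}

// Member functions defined inside the class body are implicitly inline.
struct Calculator {
    int sub(int a, int b) const { return a - b; }
};

// Function templates may also be defined in several translation units,
// as long as every definition is identical.
template <typename T>
T mul(T a, T b) {
    return a * b;
}

// constexpr functions are implicitly inline as well.
constexpr int square(int x) {
    return x * x;
}

// Hypothetical caller that exercises each definition.
inline void check_all() {
    assert(add(1, 2) == 3);
    assert(Calculator{}.sub(5, 3) == 2);
    assert(mul(2, 3) == 6);
    static_assert(square(4) == 16, "square is usable at compile time");
}
```

In all of these cases the rule from the ODR still applies: each translation unit must see the same definition.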
I recommend reading this post written by my friend [3] if you want to delve into how the inline specifier affects the compilation and linking of a function.
Performance Improvement?
Now that we are comfortable with the ODR and how marking a function inline helps with it, let's discuss function inlining from a performance point of view.
Why Inlining
When the compiler inline-expands a function call, the function’s code gets inserted into the caller’s code stream.
When a program makes a function call, the instruction pointer (IP) jumps to a different memory address, executes the instructions at that location, and then jumps back to the original location.
This jump to a new address can be inefficient because the next instruction to be executed may not be present in the L1 instruction cache (L1-I cache).
If the function is small, it often makes more sense for it to be inlined in the caller's code stream. In such cases, there is no jump to an arbitrary location, and the L1-I cache remains warm.
The overheads of the stack operations, jump instructions, and return instructions are also eliminated if the function's body is directly executed as part of the caller's code [4].
Additionally, compilers are generally better suited to apply optimizations when the code is inlined, compared to optimizing across multiple distinct functions.
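To illustrate the last point, here is a small sketch of how inlining can expose further optimizations. The functions scale, double_it, and check_double_it are hypothetical, and whether a given compiler actually performs the simplification depends on the compiler and the optimization level.

```cpp
#include <cassert>

// Hypothetical general-purpose helper with a branch on `factor`.
inline int scale(int x, int factor) {
    if (factor == 0) {
        return 0;
    }
    return x * factor;
}

// The call site passes a constant. Once scale() is inlined here, the compiler
// can see that factor is always 2, so it is free to remove the branch and
// reduce the body to a single multiplication (or shift) of x.
int double_it(int x) {
    return scale(x, 2);
}

// Hypothetical usage check.
inline void check_double_it() {
    assert(double_it(21) == 42);
    assert(double_it(0) == 0);
}
```

If scale were compiled as a separate, opaque function, the compiler would have to keep the branch, because it could not assume anything about the value of factor.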
Why not always inline
Inlining all function calls can lead to code bloat, increasing the size of the executable and potentially causing cache thrashing.
Consider a scenario in the hot path: before sending an order to the exchange, we perform a sanity check. If there is an error, we call the function logAndDebug, which handles some bookkeeping internally. In the typical case (the happy path), the order is sent to the exchange.
bool isError = checkOrder(order);
if (isError) {
    logAndDebug(order);
} else {
    sendOrderToExchange(order);
}
Here, isError is rarely true, and the happy path is executed most of the time.
If the function logAndDebug were inlined, instructions that are executed only in rare cases would occupy space in the instruction cache, potentially polluting it. This could slow down the program instead of improving performance.
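One way to keep the rare path out of the hot code is to tell the compiler about it explicitly. The sketch below is an assumption-laden illustration, not the original implementation: the Order type, the counters, and the function bodies are hypothetical, and the noinline and cold attributes and __builtin_expect are GCC/Clang extensions rather than standard C++.

```cpp
#include <cassert>

// Hypothetical order type and counters, for illustration only.
struct Order {
    int quantity;
};
int g_sent = 0;
int g_logged = 0;

// Returns true when the order fails the sanity check.
bool checkOrder(const Order& order) {
    return order.quantity <= 0;
}

// GCC/Clang extension: keep the rarely used error handler out of line and
// mark it cold so the compiler can place it away from the hot code.
__attribute__((noinline, cold)) void logAndDebug(const Order& order) {
    (void)order;
    ++g_logged;
}

inline void sendOrderToExchange(const Order& order) {
    (void)order;
    ++g_sent;
}

void processOrder(const Order& order) {
    bool isError = checkOrder(order);
    // GCC/Clang builtin: hint that isError is expected to be false.
    if (__builtin_expect(isError, 0)) {
        logAndDebug(order);
    } else {
        sendOrderToExchange(order);
    }
}

// Hypothetical usage check: one good order, one bad order.
inline void check_process_order() {
    processOrder(Order{10});
    processOrder(Order{0});
    assert(g_sent == 1);
    assert(g_logged == 1);
}
```

These are hints, so the effect on the generated code should be confirmed by inspecting the assembly or by measuring.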
Summary
As far as inline expansion is concerned, the inline specifier is just a suggestion that the compiler is allowed to ignore. Modern compilers are pretty smart and may choose not to inline a function if they deem it too large or complex based on their heuristics. They may also inline functions that are not marked inline at all.
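When the compiler's heuristics need to be overridden, most compilers provide non-standard attributes for that purpose. The following is a minimal sketch, assuming GCC, Clang, or MSVC. The macro names FORCE_INLINE and NO_INLINE and the functions below are hypothetical.

```cpp
#include <cassert>

// Compiler-specific spellings; none of these are standard C++.
#if defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#define NO_INLINE __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
#define FORCE_INLINE inline __attribute__((always_inline))
#define NO_INLINE __attribute__((noinline))
#else
#define FORCE_INLINE inline
#define NO_INLINE
#endif

// Asks the compiler to inline this function at its call sites.
FORCE_INLINE int addInlined(int a, int b) {
    return a + b;
}

// Asks the compiler to keep this function as a real call.
NO_INLINE int addOutOfLine(int a, int b) {
    return a + b;
}

// Hypothetical usage check: both variants compute the same result.
inline void check_add_variants() {
    assert(addInlined(1, 2) == 3);
    assert(addOutOfLine(1, 2) == 3);
}
```

Both functions behave identically. Only the generated code differs, so these attributes are best reserved for cases where profiling shows that the default heuristics are making the wrong choice.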
Another recommended reference: C++ FAQs
Footnotes