C++ Linkers: From Object Files to Executables
Symbol Resolution · Static & Dynamic Linking · Name Mangling · Debug Techniques
1. What Is a Linker?
The linker is the final step in the build pipeline. It combines multiple object files and libraries into a single executable by:
- Resolving symbol references — matching function/variable uses to their definitions across translation units
- Relocating — assigning final memory addresses to all symbols
- Stripping or keeping debug info, depending on build flags
Source Files (.cpp)
↓ [compiler]
Object Files (.o)
↓ [linker: ld / lld / link.exe]
Executable (ELF / Mach-O / PE)
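The relocation step can be pictured with a toy model. This is only a sketch, not how ld or lld are implemented: `ObjectFile`, `layout`, and the base address used in the usage note are hypothetical names and values chosen for illustration. Each object file carries symbols at offsets relative to its own start; the "linker" places the files back to back and turns every offset into a final address.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, heavily simplified view of an object file:
// a blob of code/data plus symbols at offsets inside that blob.
struct ObjectFile {
    std::string name;
    std::uint64_t size;                             // bytes occupied by this file
    std::map<std::string, std::uint64_t> symbols;   // symbol -> offset within the file
};

// Place the objects one after another starting at `base` and
// return the final address of every symbol.
std::map<std::string, std::uint64_t>
layout(const std::vector<ObjectFile>& objects, std::uint64_t base) {
    std::map<std::string, std::uint64_t> addresses;
    std::uint64_t cursor = base;
    for (const ObjectFile& obj : objects) {
        for (const auto& entry : obj.symbols) {
            assert(entry.second < obj.size);              // symbol must lie inside its object
            addresses[entry.first] = cursor + entry.second; // offset -> final address
        }
        cursor += obj.size;                               // next object starts right after
    }
    return addresses;
}
```

With a 0x20-byte foo.o defining add at offset 0, followed by a 0x40-byte main.o defining main at offset 0x10, and an assumed base of 0x401000, add lands at 0x401000 and main at 0x401030. A real linker also merges sections, applies alignment, and patches every relocation entry that refers to these addresses.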
2. The Compilation Pipeline
Each .cpp file is compiled independently into a relocatable object file.
// foo.cpp — defines the symbol
int add(int a, int b) {
return a + b;
}
// main.cpp — references the symbol
extern int add(int, int); // declaration only
int main() {
int r = add(3, 4); // unresolved reference until link time
return r;
}
g++ -c foo.cpp -o foo.o # compile only, no link
g++ -c main.cpp -o main.o
g++ foo.o main.o -o app # link step
The four stages: Preprocessing → Compilation → Assembly → Linking
3. Symbol Resolution
The linker maintains a symbol table and matches every UNDEF reference to a GLOBAL definition.
// math.cpp — defines the symbol
double square(double x) { return x * x; }
// main.cpp — references the symbol
double square(double); // extern declaration
int main() {
return (int)square(5.0); // unresolved until link
}
| Symbol | Type | Binding |
|---|---|---|
| _Z6squared | FUNC | GLOBAL |
| __gxx_personality_v0 | UNDEF | GLOBAL |
Strong symbols (definitions) must be unique. Weak symbols can be overridden. UNDEF means referenced but not yet defined.
nm -C -g math.o # list exported symbols (demangled)
nm -u main.o # show unresolved (UNDEF) symbols
objdump -t main.o # full symbol table dump
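The strong/weak/UNDEF rules above can be sketched as a small resolver. This is a toy model under stated assumptions: `Kind`, `Symbol`, and `resolve` are hypothetical names, and real linkers track far more (sections, visibility, COMDAT groups). It keeps one definition per name, lets a strong definition replace a weak one, rejects two strong definitions, and reports any reference that never finds a definition.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

enum class Kind { Undef, Weak, Strong };

struct Symbol {
    std::string name;
    Kind kind;
    std::string object;   // object file this entry came from
};

// Returns symbol name -> object file that provides the chosen definition.
// Throws on duplicate strong definitions or unresolved references.
std::map<std::string, std::string> resolve(const std::vector<Symbol>& symbols) {
    std::map<std::string, Symbol> chosen;     // best definition seen so far
    std::vector<std::string> referenced;      // names that appeared as UNDEF
    for (const Symbol& s : symbols) {
        assert(!s.name.empty());
        if (s.kind == Kind::Undef) { referenced.push_back(s.name); continue; }
        auto it = chosen.find(s.name);
        if (it == chosen.end()) { chosen.emplace(s.name, s); continue; }
        if (s.kind == Kind::Strong && it->second.kind == Kind::Strong)
            throw std::runtime_error("multiple definition of '" + s.name + "'");
        if (s.kind == Kind::Strong) it->second = s;   // strong overrides weak
        // otherwise a weak duplicate: keep the definition already chosen
    }
    for (const std::string& name : referenced)
        if (chosen.find(name) == chosen.end())
            throw std::runtime_error("undefined reference to '" + name + "'");
    std::map<std::string, std::string> result;
    for (const auto& entry : chosen) result[entry.first] = entry.second.object;
    return result;
}
```

Feeding it an UNDEF `square` from main.o, a weak `square` from a hypothetical fallback.o, and a strong `square` from math.o selects math.o. The two error messages mirror the wording GNU ld uses for the same situations.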
4. Name Mangling
C++ encodes namespaces, class names, and parameter types into symbol names so the linker can distinguish overloads.
// C++ overloaded functions → different mangled names
int process(int x); // _Z7processi
int process(double x); // _Z7processd
int process(int x, double y); // _Z7processid
namespace Math {
double sqrt(double x); // _ZN4Math4sqrtEd
}
class Vector {
double dot(const Vector& v); // _ZN6Vector3dotERKS_
};
Each compiler uses its own ABI scheme (Itanium ABI on Linux/macOS, MSVC on Windows), so mixing compiler-built objects requires caution.
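To make the encoding in the examples above concrete, here is a toy mangler for the simplest Itanium-style cases. It is a sketch only: `mangle` is a hypothetical helper, it takes parameter types as ready-made one-letter codes ("i" for int, "d" for double), and it ignores templates, substitutions such as `S_`, cv-qualifiers, and everything else a real compiler handles.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy Itanium-style mangler for plain functions with builtin parameter types.
// `scopes` lists enclosing namespaces/classes from outermost to innermost.
std::string mangle(const std::vector<std::string>& scopes,
                   const std::string& function,
                   const std::vector<std::string>& paramCodes) {
    assert(!function.empty());
    auto lengthPrefixed = [](const std::string& id) {
        return std::to_string(id.size()) + id;   // e.g. "sqrt" -> "4sqrt"
    };
    std::string out = "_Z";                      // prefix of every mangled name
    if (scopes.empty()) {
        out += lengthPrefixed(function);
    } else {
        out += "N";                              // start of a nested name
        for (const std::string& s : scopes) out += lengthPrefixed(s);
        out += lengthPrefixed(function);
        out += "E";                              // end of the nested name
    }
    if (paramCodes.empty()) out += "v";          // empty parameter list is encoded as void
    for (const std::string& code : paramCodes) out += code;
    return out;
}
```

Called with no scopes, the name `process`, and the codes `i` and `d`, it reproduces `_Z7processid`; with the scope `Math`, the name `sqrt`, and the code `d`, it reproduces `_ZN4Math4sqrtEd`. The member function example needs the `S_` substitution, which this sketch does not compute.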
// Disable mangling for C interoperability
extern "C" {
int legacy_init(void); // symbol stays: legacy_init
void legacy_free(void*); // symbol stays: legacy_free
}
# Demangle a symbol manually
c++filt _ZN4Math4sqrtEd # → Math::sqrt(double)
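The same demangling is available inside a program. The sketch below assumes a toolchain that ships `<cxxabi.h>` (GCC's libstdc++ or Clang's libc++abi; it is not available with MSVC) and wraps `abi::__cxa_demangle`. The wrapper name `demangle` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Returns the demangled form of an Itanium-ABI symbol,
// or the input unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* raw = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0) return mangled;   // non-zero status: invalid name or allocation failure
    assert(raw != nullptr);            // status 0 implies a valid buffer
    std::string result(raw);
    std::free(raw);                    // the buffer is malloc-allocated, so release it with free
    return result;
}
```

Passing `_ZN4Math4sqrtEd` yields `Math::sqrt(double)`, matching the c++filt output. This is handy for logging or for printing readable names from `typeid(x).name()`.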
5. Static Linking
The linker copies the needed object files from .a archives directly into the executable.
# Build a static library
ar rcs libmymath.a vec.o mat.o quat.o
# Link statically — no runtime dependencies
g++ main.o -L. -lmymath -static -o app_static
# Verify: no shared lib deps
ldd app_static # → "not a dynamic executable"
Pros: single self-contained binary, no runtime dependency issues, faster startup.
Cons: larger binary, security patches require a full rebuild, code is duplicated across binaries.
Note: Link order matters. List object files before libraries: g++ main.o -lmymath, not g++ -lmymath main.o.
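Why the order matters can be shown with a small simulation of a single-pass, left-to-right linker in the style of GNU ld. This is a sketch under simplifying assumptions: `Input` and `linkInOrder` are hypothetical names, and each archive is treated as a single member, whereas a real archive is scanned member by member.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one item on the link command line.
struct Input {
    std::string name;
    bool isArchive;                   // true for .a libraries
    std::set<std::string> defines;    // symbols defined here
    std::set<std::string> needs;      // symbols referenced but not defined here
};

// Processes inputs left to right and returns the symbols that are still
// undefined at the end (an empty set means the link would succeed).
std::set<std::string> linkInOrder(const std::vector<Input>& commandLine) {
    std::set<std::string> defined, undefined;
    for (const Input& in : commandLine) {
        assert(!in.name.empty());
        if (in.isArchive) {
            // An archive is pulled in only if it defines a symbol that is
            // already on the undefined list when the linker reaches it.
            bool wanted = false;
            for (const std::string& d : in.defines)
                if (undefined.count(d)) wanted = true;
            if (!wanted) continue;    // nothing needed yet: the archive is skipped
        }
        for (const std::string& d : in.defines) { defined.insert(d); undefined.erase(d); }
        for (const std::string& n : in.needs)
            if (!defined.count(n)) undefined.insert(n);
    }
    return undefined;
}
```

With main.o needing a symbol that libmymath.a defines, the order main.o then libmymath.a leaves nothing undefined, while libmymath.a then main.o skips the archive and leaves the symbol unresolved. Object files, unlike archives, are always included, which is why only library placement is sensitive.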
6. Dynamic Linking
Shared libraries (.so / .dll / .dylib) are loaded at runtime by the dynamic linker (ld.so).
# Build a shared library (objects must be compiled with -fPIC)
g++ -fPIC -c vec.cpp mat.cpp
g++ -shared vec.o mat.o -o libmymath.so
# Link dynamically (default behavior)
g++ main.o -L. -lmymath -Wl,-rpath,'$ORIGIN' -o app
# Inspect runtime dependencies
ldd app
# libmymath.so => ./libmymath.so
# libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
# Override a library at runtime (useful for mocking)
LD_PRELOAD=./mock_net.so ./app
PLT/GOT mechanism: external calls go through the Procedure Linkage Table (PLT); the Global Offset Table (GOT) holds the resolved addresses, which are filled in lazily on first call. To resolve all symbols at startup instead, link with -Wl,-z,now or set LD_BIND_NOW=1 in the environment.
Pros: shared memory between processes, hot-patching by replacing .so, smaller binaries.
Cons: dependency management ("DLL hell"), slight startup overhead, harder single-file deployment.
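The lazy binding performed through the PLT and GOT can be imitated with an ordinary function pointer. This is an analogy, not the real mechanism: `gotSlot`, `addStub`, `realAdd`, and `callAdd` are hypothetical names, and the actual PLT is a few machine instructions emitted by the linker. The slot starts out pointing at a resolver stub; the first call patches the slot, and later calls go straight to the target.

```cpp
#include <cassert>

using AddFn = int (*)(int, int);

// Stands in for the function that lives inside the shared library.
int realAdd(int a, int b) { return a + b; }

int resolverCalls = 0;               // how many times the "dynamic linker" had to run
int addStub(int a, int b);           // stands in for the resolver path behind the PLT

// The "GOT entry": initially points at the stub, not at the real function.
AddFn gotSlot = addStub;

int addStub(int a, int b) {
    ++resolverCalls;
    gotSlot = realAdd;               // patch the slot with the resolved address
    return gotSlot(a, b);            // then complete the original call
}

// What a call through the PLT amounts to: an indirect jump via the slot.
int callAdd(int a, int b) {
    assert(gotSlot != nullptr);
    return gotSlot(a, b);
}
```

Calling `callAdd` twice runs the resolver only once. Eager binding corresponds to filling `gotSlot` with `realAdd` before `main` starts, which trades startup time for predictable call latency.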
7. Static vs. Dynamic: At a Glance
| Aspect | Static (.a) | Dynamic (.so / .dll) |
|---|---|---|
| Resolution | Link time | Load / runtime |
| Binary size | Larger (code embedded) | Smaller (references only) |
| Memory sharing | No — each process has its own copy | Yes — single copy in RAM |
| Deployment | One self-contained file | Must ship .so alongside |
| Hot patching | Full relink required | Replace .so and restart |
| Startup overhead | Minimal | Dynamic loader adds ~ms |
| Security updates | Manual rebuild | OS-level update propagates |
8. Linking with C Libraries
Use extern "C" to suppress name mangling when calling C code from C++ (or exposing C++ to C callers).
// wrapper.h — expose C++ code to C callers
#pragma once
#ifdef __cplusplus
extern "C" { // disables mangling for these symbols
#endif
void vec_create(void** out);
void vec_destroy(void* vec);
void vec_push(void* vec, double val);
double vec_get(void* vec, int idx);
#ifdef __cplusplus
}
#endif
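One possible wrapper.cpp behind that header is sketched below, assuming the opaque handle is a `std::vector<double>`. The original does not show the implementation, so the storage type and the out-of-range policy of `vec_get` are assumptions. The definitions omit `noexcept` to match the header's declarations and instead make sure no exception can escape into C code.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

extern "C" {

// Stores a new handle in *out, or nullptr if allocation fails.
void vec_create(void** out) {
    assert(out != nullptr);
    *out = new (std::nothrow) std::vector<double>();
}

void vec_destroy(void* vec) {
    delete static_cast<std::vector<double>*>(vec);   // deleting nullptr is a no-op
}

void vec_push(void* vec, double val) {
    try {
        static_cast<std::vector<double>*>(vec)->push_back(val);
    } catch (...) {
        // Swallow std::bad_alloc: an exception must not cross into C code.
    }
}

// Assumed policy: out-of-range indices return 0.0 instead of throwing.
double vec_get(void* vec, int idx) {
    const std::vector<double>& v = *static_cast<std::vector<double>*>(vec);
    if (idx < 0 || static_cast<std::size_t>(idx) >= v.size()) return 0.0;
    return v[static_cast<std::size_t>(idx)];
}

}  // extern "C"
```

A C caller would use it as `void* v; vec_create(&v); vec_push(v, 1.5); vec_get(v, 0); vec_destroy(v);`. A production API would more likely return error codes so that failures are not silent.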
// Calling a C library from C++
struct sqlite3; // opaque handle type declared by sqlite3.h
extern "C" int sqlite3_open(const char* filename, sqlite3** ppDb);
g++ main.cpp wrapper.cpp -lsqlite3 -o app
# -l<name> → links lib<name>.so or lib<name>.a
# -L<path> → add directory to library search path
# -Wl,--as-needed → skip libs that aren't actually used
Common pitfalls:
- Forgetting extern "C" → mangled name doesn't match the C header
- C struct padding may differ across compilers, breaking ABI
- C code cannot unwind C++ exceptions: use noexcept at boundaries
- Link order still matters: objects first, then libraries
9. Common Linker Errors
undefined reference to 'add(int, int)'
Cause: definition is missing or the library wasn't linked.
Fix: add -lmylib, or add the .cpp (or .o) that defines it to the link command.
multiple definition of 'globalVar'
Cause: variable defined (not just declared) in a header included by multiple TUs.
Fix: use inline (C++17), or extern declaration in the header + one definition in a .cpp.
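Both fixes for the multiple-definition error are shown side by side in the sketch below. The file split is indicated by comments, and `requestCount`, `nextRequestId`, the file names, and the value 42 are hypothetical; the inline variable requires C++17.

```cpp
#include <cassert>

// ---- would live in config.h, included by many TUs ----
extern int globalVar;          // declaration only: creates no definition in the including TU
inline int requestCount = 0;   // C++17 inline variable: the linker merges the copies into one

// ---- would live in exactly one source file, e.g. config.cpp ----
int globalVar = 42;            // the single definition the linker finds

// Any TU can use both variables through the header.
int nextRequestId() {
    assert(requestCount >= 0);
    return ++requestCount;
}
```

The extern form works in every C++ standard but needs a matching .cpp; the inline form keeps everything in the header at the cost of requiring C++17.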
cannot find -lmylib
Cause: linker can't locate libmylib.so or libmylib.a.
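Fix: add -L/path/to/lib, or set LIBRARY_PATH (read by GCC at link time). LD_LIBRARY_PATH is for the runtime loader and does not affect -l lookup; PKG_CONFIG_PATH only helps when the flags come from pkg-config.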
10. Debugging Linker Issues
# Inspect symbols
nm -C -g lib.a # demangled, global symbols only
nm -u main.o # undefined (unresolved) symbols
objdump -d my.o # disassembly
readelf -s my.o # ELF symbol table
# Trace linker decisions
g++ main.o -lmylib -Wl,--verbose 2>&1 | grep "attempt"
g++ main.o -lmylib -Wl,--trace # shows each file the linker reads
# Check shared lib deps
ldd ./app
chrpath -l ./app # show embedded RPATH
# Demangle a mangled symbol
c++filt _ZN4Math4sqrtEd # → Math::sqrt(double)
Useful flags:
- -Wl,--no-undefined: catch unresolved symbols at build time
- -Wl,--as-needed: skip unused shared libraries
- -Wl,--start-group ... --end-group: resolve circular dependencies between archives
Rule of thumb: "When in doubt, nm it out."
11. Bonus: Link-Time Optimization (LTO)
Without LTO, the compiler can only optimize within a single translation unit. With LTO, it embeds IR (Intermediate Representation) in .o files and performs whole-program optimization at link time.
# Enable LTO
g++ -flto -O2 -c foo.cpp -o foo.o
g++ -flto -O2 -c main.cpp -o main.o
g++ -flto -O2 foo.o main.o -o app_lto
# Thin LTO — faster, scales to large codebases (clang)
clang++ -flto=thin -O2 *.cpp -o app
LTO enables cross-TU inlining, dead code elimination, inter-procedural constant propagation, and whole-program devirtualization. The speedup is workload-dependent; treat the 10–25% range as a rough guide rather than a guarantee, and measure on your own codebase.
Gotcha: for the full effect, every TU should be compiled with -flto. Third-party archives compiled without it will still link, but that code won't be optimized across boundaries.
Summary & Quick Reference
Key concepts
- Linker: resolves symbols, relocates addresses, produces the binary
- Symbol resolution: a strong definition overrides a weak one, and every UNDEF reference must be matched by a definition
- Name mangling encodes C++ type info into flat symbol names
- extern "C" disables mangling for C interoperability
Essential commands
| Command | Purpose |
|---|---|
| nm -C -g lib.a | List exported symbols (demangled) |
| c++filt <sym> | Demangle a symbol |
| ldd app | Show shared library dependencies |
| objdump -d obj | Disassemble object file |
| ar rcs lib.a *.o | Create a static library |
| readelf -s obj | ELF symbol table |
Build flags
| Flag | Effect |
|---|---|
| -static | Link everything statically |
| -fPIC -shared | Build a position-independent shared library |
| -Wl,--no-undefined | Fail at link time on unresolved symbols |
| -Wl,-rpath,... | Embed library search path in binary |
| -flto | Enable link-time optimization |
| -Wl,--as-needed | Only link libraries that are actually used |