C++ linker presentation
cpplinker/cpp_linkers.md

# C++ Linkers: From Object Files to Executables

*Symbol Resolution · Static & Dynamic Linking · Name Mangling · Debug Techniques*

---

## 1. What Is a Linker?

The linker is the final step in the build pipeline. It combines multiple object files and libraries into a single executable by:

- **Resolving symbol references**: matching each use of a function or variable to its definition across translation units
- **Relocating**: assigning final addresses to sections and symbols, then patching every reference to use them
- **Stripping or keeping** debug info, depending on build flags

```
Source Files (.cpp)
    ↓ [compiler]
Object Files (.o)
    ↓ [linker: ld / lld / link.exe]
Executable (ELF / Mach-O / PE)
```

---

## 2. The Compilation Pipeline

Each `.cpp` file is compiled independently into a relocatable object file.

```cpp
// foo.cpp: defines the symbol
int add(int a, int b) {
    return a + b;
}

// main.cpp: references the symbol
extern int add(int, int);   // declaration only
int main() {
    int r = add(3, 4);      // unresolved reference until link time
    return r;
}
```

```bash
g++ -c foo.cpp -o foo.o     # compile only, no link
g++ -c main.cpp -o main.o
g++ foo.o main.o -o app     # link step
```

The four stages: **Preprocessing** → **Compilation** → **Assembly** → **Linking**

---

## 3. Symbol Resolution

The linker builds a global symbol table from all of its inputs and matches every undefined reference (`UNDEF`) to a `GLOBAL` definition.

```cpp
// math.cpp: defines the symbol
double square(double x) { return x * x; }

// main.cpp: references the symbol
double square(double);          // declaration only (extern is implicit)
int main() {
    return (int)square(5.0);    // unresolved until link
}
```

| Object | Symbol | Type | Binding | Section |
|--------|--------|------|---------|---------|
| `math.o` | `_Z6squared` | FUNC | GLOBAL | `.text` |
| `main.o` | `_Z6squared` | NOTYPE | GLOBAL | UND |
| `main.o` | `main` | FUNC | GLOBAL | `.text` |

**Strong symbols** (ordinary definitions) must be unique across the link. **Weak symbols** may be overridden by a strong definition of the same name. `UNDEF` (shown as `UND` by `readelf` and `U` by `nm`) means the symbol is referenced in that object but defined elsewhere.
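
To make the strong/weak distinction concrete, here is a minimal sketch of a weak definition used as an overridable default. It assumes GCC or Clang on an ELF platform, since `__attribute__((weak))` is a compiler extension rather than standard C++; the function names `default_log_level` and `effective_log_level` are hypothetical.

```cpp
// weak_default.cpp: a weak definition acts as an overridable default.
// Assumes GCC or Clang; __attribute__((weak)) is a compiler extension.

// Weak definition: used only if no other object file supplies a strong one.
__attribute__((weak)) int default_log_level() {
    return 1;
}

// Callers bind to whichever definition the linker finally selects.
int effective_log_level() {
    return default_log_level();
}
```

Linked on its own, `effective_log_level()` returns 1. If another translation unit defines a plain `int default_log_level()`, the linker should select that strong definition instead, without a multiple-definition error.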

```bash
nm -C -g math.o       # list external (global) symbols, demangled
nm -u main.o          # show unresolved (UNDEF) symbols
objdump -t main.o     # full symbol table dump
```

---

## 4. Name Mangling

C++ encodes namespaces, class names, and parameter types into symbol names so the linker can distinguish overloads.

```cpp
// C++ overloaded functions → different mangled names
int process(int x);                 // _Z7processi
int process(double x);              // _Z7processd
int process(int x, double y);       // _Z7processid

namespace Math {
    double sqrt(double x);          // _ZN4Math4sqrtEd
}

class Vector {
    double dot(const Vector& v);    // _ZN6Vector3dotERKS_
};
```

Mangling is defined by the platform's C++ ABI: GCC and Clang follow the Itanium C++ ABI on Linux and macOS, while MSVC uses its own scheme on Windows. Objects built by compilers that follow different ABIs generally cannot be linked together, and even within one ABI, mixing compiler-built objects requires caution.

```cpp
// Disable mangling for C interoperability
extern "C" {
    int legacy_init(void);      // symbol stays: legacy_init
    void legacy_free(void*);    // symbol stays: legacy_free
}
```

```bash
# Demangle a symbol manually
c++filt _ZN4Math4sqrtEd   # → Math::sqrt(double)
```
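
The demangling that `c++filt` performs is also available inside a program. The sketch below assumes GCC or Clang with an Itanium-ABI C++ runtime, which provides `abi::__cxa_demangle` in `<cxxabi.h>`; the helper name `demangle` is hypothetical.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>    // std::free
#include <string>

// Returns the demangled form of an Itanium-ABI symbol name,
// or the input unchanged if demangling fails.
std::string demangle(const char* mangled) {
    int status = 0;
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return mangled;
    }
    std::string result(text);
    std::free(text);  // the buffer is malloc-allocated by __cxa_demangle
    return result;
}
```

For example, `demangle("_ZN4Math4sqrtEd")` yields `Math::sqrt(double)` and `demangle("_Z7processid")` yields `process(int, double)`, matching the comments in the listing above. This can be handy when logging symbol names from a crash handler or a plugin loader.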

---

## 5. Static Linking

The linker copies the needed object files (archive members) from `.a` archives directly into the executable.

```bash
# Build a static library
ar rcs libmymath.a vec.o mat.o quat.o

# Link statically: -static also pulls in the C and C++ runtimes statically
g++ main.o -L. -lmymath -static -o app_static

# Verify: no shared lib deps
ldd app_static        # → reports "not a dynamic executable" (or "statically linked")
```

**Pros:** single self-contained binary, no runtime dependency issues, faster startup.
**Cons:** larger binary, security patches require a full rebuild, code is duplicated across binaries.

> **Note:** Link order matters. Traditional linkers scan inputs left to right and pull an archive member only if it defines a symbol that is already undefined, so list object files before libraries: `g++ main.o -lmymath`, not `g++ -lmymath main.o`.
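
The link-order rule is easier to see in a toy model. The sketch below simulates a single left-to-right pass over the command line in the style of traditional Unix linkers; it is not how any particular linker is implemented (some linkers are more forgiving about archive order), and the types `ObjectFile` and `LinkInput` and the function `simulate_link` are hypothetical.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one object file (or one archive member).
struct ObjectFile {
    std::set<std::string> defines;     // symbols this object defines
    std::set<std::string> references;  // symbols it uses but does not define
};

// One command-line input: a plain .o (exactly one member) or a .a archive.
struct LinkInput {
    bool is_archive;
    std::vector<ObjectFile> members;
};

// Adds an object to the link: its definitions satisfy pending references,
// and its own references become pending unless already defined.
static void add_object(const ObjectFile& obj,
                       std::set<std::string>& defined,
                       std::set<std::string>& undefined) {
    for (const auto& s : obj.defines) { defined.insert(s); undefined.erase(s); }
    for (const auto& s : obj.references) {
        if (defined.count(s) == 0) undefined.insert(s);
    }
}

// Scans inputs left to right and returns the symbols still undefined at the end.
std::set<std::string> simulate_link(const std::vector<LinkInput>& command_line) {
    std::set<std::string> defined, undefined;
    for (const auto& input : command_line) {
        if (!input.is_archive) {
            add_object(input.members.front(), defined, undefined);
            continue;
        }
        // Archive: keep pulling members while one satisfies a pending reference.
        std::vector<bool> pulled(input.members.size(), false);
        bool progress = true;
        while (progress) {
            progress = false;
            for (std::size_t i = 0; i < input.members.size(); ++i) {
                if (pulled[i]) continue;
                bool needed = false;
                for (const auto& s : input.members[i].defines) {
                    if (undefined.count(s) != 0) { needed = true; break; }
                }
                if (needed) {
                    add_object(input.members[i], defined, undefined);
                    pulled[i] = true;
                    progress = true;
                }
            }
        }
    }
    return undefined;
}
```

With `main.o` referencing `add` and an archive member defining it, `simulate_link({main_o, libmymath})` returns an empty set, while `simulate_link({libmymath, main_o})` leaves `add` undefined: when the archive is scanned first, nothing is pending yet, so no member is pulled in.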

---

## 6. Dynamic Linking

Shared libraries (`.so` / `.dll` / `.dylib`) are loaded at runtime by the dynamic linker (`ld.so` on Linux).

```bash
# Compile position-independent objects: -fPIC is required at compile time
# (source file names assumed to match the object names)
g++ -fPIC -c vec.cpp -o vec.o
g++ -fPIC -c mat.cpp -o mat.o

# Build the shared library
g++ -shared vec.o mat.o -o libmymath.so

# Link dynamically (default behavior)
g++ main.o -L. -lmymath -Wl,-rpath,'$ORIGIN' -o app

# Inspect runtime dependencies
ldd app
#   libmymath.so => ./libmymath.so
#   libc.so.6    => /lib/x86_64-linux-gnu/libc.so.6

# Override a library at runtime (useful for mocking)
LD_PRELOAD=./mock_net.so ./app
```

**PLT/GOT mechanism:** on ELF systems, calls to external functions go through the *Procedure Linkage Table* (PLT); the *Global Offset Table* (GOT) holds the resolved addresses, which by default are filled in lazily on the first call. Link with `-Wl,-z,now` or set `LD_BIND_NOW=1` in the environment to resolve all symbols at startup instead.

**Pros:** code pages shared between processes, library updates by replacing the `.so` without relinking, smaller binaries.
**Cons:** dependency management ("DLL hell"), slight startup overhead, harder single-file deployment.
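
The set of libraries that `ldd` reports can also be observed from inside a running process. The sketch below assumes Linux with glibc (or another ELF libc that provides `dl_iterate_phdr` in `<link.h>`); the helper names `collect_name` and `loaded_objects` are hypothetical.

```cpp
#include <link.h>     // dl_iterate_phdr, struct dl_phdr_info (Linux/ELF)
#include <cstddef>
#include <string>
#include <vector>

// Callback invoked once per loaded ELF object; records its path.
static int collect_name(struct dl_phdr_info* info, std::size_t /*size*/, void* data) {
    auto* names = static_cast<std::vector<std::string>*>(data);
    const bool has_name = info->dlpi_name != nullptr && info->dlpi_name[0] != '\0';
    // The main executable is typically reported with an empty name.
    names->push_back(has_name ? info->dlpi_name : "(main executable)");
    return 0;  // a non-zero return would stop the iteration
}

// Lists the executable and every shared object the dynamic linker has mapped.
std::vector<std::string> loaded_objects() {
    std::vector<std::string> names;
    dl_iterate_phdr(collect_name, &names);
    return names;
}
```

Printing the result of `loaded_objects()` at startup is a quick way to confirm which copy of a library was actually picked up, for instance after setting `LD_PRELOAD` or changing the RPATH.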

---

## 7. Static vs. Dynamic: At a Glance

| Aspect | Static (`.a`) | Dynamic (`.so` / `.dll`) |
|---|---|---|
| Resolution | Link time | Load time / run time |
| Binary size | Larger (code embedded) | Smaller (references only) |
| Memory sharing | No: each executable embeds its own copy | Yes: code pages are shared between processes |
| Deployment | One self-contained file | Must ship the `.so` alongside |
| Hot patching | Full relink required | Replace the `.so` and restart |
| Startup overhead | Minimal | Loader resolves and relocates symbols (typically milliseconds) |
| Security updates | Rebuild and redeploy each binary | Updating the shared library fixes every dependent program |

---

## 8. Linking with C Libraries

Use `extern "C"` to suppress name mangling when calling C code from C++ (or exposing C++ to C callers).

```cpp
// wrapper.h: expose C++ code to C callers
#pragma once
#ifdef __cplusplus
extern "C" {                // disables mangling for these symbols
#endif
void   vec_create(void** out);
void   vec_destroy(void* vec);
void   vec_push(void* vec, double val);
double vec_get(void* vec, int idx);
#ifdef __cplusplus
}
#endif
```
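
One possible `wrapper.cpp` behind this header is sketched below. The source does not show the implementation, so everything here is an assumption: the handle is taken to be a `std::vector<double>`, out-of-range reads are assumed to return `0.0`, and exceptions are caught inside each function so that none can escape into a C caller.

```cpp
// wrapper.cpp: hypothetical implementation of the vec_* functions from wrapper.h
#include <cstddef>
#include <new>
#include <vector>

namespace {
using Vec = std::vector<double>;
Vec* as_vec(void* p) { return static_cast<Vec*>(p); }
}  // namespace

extern "C" {

void vec_create(void** out) {
    if (out == nullptr) return;
    *out = new (std::nothrow) Vec();   // nullptr on allocation failure, never throws
}

void vec_destroy(void* vec) {
    delete as_vec(vec);                // deleting nullptr is a no-op
}

void vec_push(void* vec, double val) {
    if (vec == nullptr) return;
    try {
        as_vec(vec)->push_back(val);
    } catch (...) {
        // Swallow allocation failures: exceptions must not reach C callers.
    }
}

double vec_get(void* vec, int idx) {
    if (vec == nullptr || idx < 0 ||
        static_cast<std::size_t>(idx) >= as_vec(vec)->size()) {
        return 0.0;                    // assumption: out-of-range reads yield 0.0
    }
    return (*as_vec(vec))[static_cast<std::size_t>(idx)];
}

}  // extern "C"
```

A C caller would use it as `void* v; vec_create(&v); vec_push(v, 1.5); vec_get(v, 0); vec_destroy(v);`. A real API would probably report errors through return codes rather than silently ignoring them.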

```cpp
// Calling a C library from C++
// In real code prefer #include <sqlite3.h>: the header already wraps
// its declarations in extern "C" when compiled as C++.
struct sqlite3;
extern "C" int sqlite3_open(const char* filename, sqlite3** ppDb);
```

```bash
g++ main.cpp wrapper.cpp -lsqlite3 -o app
# -l<name>          → links lib<name>.so or lib<name>.a
# -L<path>          → adds a directory to the library search path
# -Wl,--as-needed   → skips shared libs that aren't actually used
```

**Common pitfalls:**
- Forgetting `extern "C"` → the C++ compiler references a mangled name that the C library does not define
- C struct padding may differ across compilers or packing settings, breaking ABI
- C++ exceptions must not propagate through C frames: catch them at the boundary, or mark boundary functions `noexcept`
- Link order still matters: objects first, then libraries

---

## 9. Common Linker Errors

### `undefined reference to 'add(int, int)'`

**Cause:** the definition is missing, or the object file or library that contains it wasn't linked.
**Fix:** add `-lmylib` or include the `.cpp` that defines it.

### `multiple definition of 'globalVar'`

**Cause:** a variable is defined (not just declared) in a header included by multiple TUs.
**Fix:** make it an `inline` variable (C++17), or put an `extern` declaration in the header and one definition in a single `.cpp`.
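
Both fixes can be sketched side by side. The file names `config.h` and `config.cpp`, the variable `legacyCounter`, and the initial values are hypothetical, and the `inline` variable requires compiling as C++17 or later.

```cpp
// config.h (hypothetical header included by several translation units)

// Fix 1, C++17 inline variable: every TU sees the definition,
// and the linker keeps a single copy.
inline int globalVar = 42;

// Fix 2, works before C++17: the header only declares the variable ...
extern int legacyCounter;

// config.cpp (hypothetical): ... and exactly one source file defines it.
int legacyCounter = 7;
```

Either way the linker ends up with one definition per variable, so the `multiple definition` error goes away.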

### `cannot find -lmylib`

**Cause:** the linker can't locate `libmylib.so` or `libmylib.a` in its search path.
**Fix:** add `-L/path/to/lib`, set `LIBRARY_PATH` (GCC), or take the flags from `pkg-config --libs` (with `PKG_CONFIG_PATH` pointing at the `.pc` file). `LD_LIBRARY_PATH` affects only the runtime loader, not this link-time search.

---

## 10. Debugging Linker Issues

```bash
# Inspect symbols
nm -C -g lib.a        # demangled, global symbols only
nm -u main.o          # undefined (unresolved) symbols
objdump -d my.o       # disassembly
readelf -s my.o       # ELF symbol table

# Trace linker decisions
g++ main.o -lmylib -Wl,--verbose 2>&1 | grep "attempt"   # library search attempts
g++ main.o -lmylib -Wl,--trace                           # each input file the linker loads

# Check shared lib deps
ldd ./app
chrpath -l ./app      # show embedded RPATH/RUNPATH

# Demangle a mangled symbol
c++filt _ZN4Math4sqrtEd   # → Math::sqrt(double)
```

**Useful flags:**
- `-Wl,--no-undefined`: report unresolved symbols at build time when linking a shared library
- `-Wl,--as-needed`: skip unused shared libraries
- `-Wl,--start-group ... --end-group`: resolve circular dependencies between archives

> *Rule of thumb: "When in doubt, `nm` it out."*

---

## 11. Bonus: Link-Time Optimization (LTO)

Without LTO, the compiler can only optimize within a single translation unit. With LTO, the compiler embeds its intermediate representation (IR) in the `.o` files, and whole-program optimization runs at link time.

```bash
# Enable LTO
g++ -flto -O2 -c foo.cpp -o foo.o
g++ -flto -O2 -c main.cpp -o main.o
g++ -flto -O2 foo.o main.o -o app_lto

# Thin LTO: faster, scales to large codebases (clang)
clang++ -flto=thin -O2 *.cpp -o app
```

LTO enables cross-TU inlining, dead code elimination, inter-procedural constant propagation, and whole-program devirtualization. The gain is workload-dependent, so treat a figure such as 10-25% as a rough guide and measure on your own codebase.

**Gotcha:** only TUs compiled with `-flto` take part, and the link step needs `-flto` as well. Third-party archives compiled without it will still link, but that code won't be optimized across boundaries.

---

## Summary & Quick Reference

**Key concepts**
- Linker: resolves symbols, relocates addresses, produces the binary
- Symbol precedence: a strong definition overrides a weak one; every `UNDEF` reference must be satisfied by one of them
- Name mangling encodes C++ type info into flat symbol names
- `extern "C"` disables mangling for C interoperability

**Essential commands**

| Command | Purpose |
|---|---|
| `nm -C -g lib.a` | List exported symbols (demangled) |
| `c++filt <sym>` | Demangle a symbol |
| `ldd app` | Show shared library dependencies |
| `objdump -d obj` | Disassemble object file |
| `ar rcs lib.a *.o` | Create a static library |
| `readelf -s obj` | ELF symbol table |

**Build flags**

| Flag | Effect |
|---|---|
| `-static` | Link everything statically |
| `-fPIC` / `-shared` | Compile position-independent code / link a shared library |
| `-Wl,--no-undefined` | Fail at link time on unresolved symbols |
| `-Wl,-rpath,...` | Embed library search path in binary |
| `-flto` | Enable link-time optimization |
| `-Wl,--as-needed` | Only link libraries that are actually used |