Last updated: 14 July 2022
Polymorphism is centred on the idea of function dispatch. Whenever we write code that invokes different behaviour depending on the type of an object, we are performing a polymorphic operation. Because we might not necessarily know the type of an object, polymorphic code exhibits different behaviour that depends on other (compile time or run time) factors.
It’s easier to show by example.
struct Expr {
    enum Op { PLUS, MINUS, VALUE } op;
    Expr* lhs;  // Only if `op != VALUE`
    Expr* rhs;  // Only if `op != VALUE`
    int value;  // Only if `op == VALUE`

    int evaluate() const {
        switch (op) {  // <-- Polymorphism!
        case PLUS:     // <-- type
            return lhs->evaluate() + rhs->evaluate();
        case MINUS:    // <-- type
            return lhs->evaluate() - rhs->evaluate();
        case VALUE:    // <-- type
            return value;
        }
    }
};
struct SortedInts {
    // v-- Polymorphism!
    bool (*comp)(int, int);
    // Note: we can use a template
    // instead, which would let us
    // use function objects as well
    std::vector<int> ints;

    void insert(int x) {
        // don't @ me for this algo
        ints.push_back(x);
        std::sort(ints.begin(), ints.end(), comp);
    }
};
template <typename T>
T min(T a, T b) {
    //       v-- Polymorphism!
    return a < b ? a : b;
}
int main() {
    using namespace std::literals;
    // T = int,
    // < compares ints
    min(1, 2);
    // T = double,
    // < compares doubles
    min(1.0, 2.0);
    // T = std::string,
    // < compares std::strings
    min("Hi"s, "mark"s);
    // Different assembly is
    // emitted in all 3 cases
}
// Polymorphism?
// vv vv
std::cout << "Hello" << "world\n"s;
template <typename T>
void print(T x) { std::cout << x; }
// ^^
// Polymorphism!
// Polymorphism! Polymorphism?
// v~~~~~~~~~~~v v
std::unordered_map<std::string, int> m1;
std::unordered_map<int, int> singtel;
// Polymorphism?
m1.emplace("Hi", 1);
// emplace calls
// `std::hash<std::string>{}("Hi"s)`
// which is some string hashing algo
singtel.emplace(2, 3);
// emplace calls
// `std::hash<int>{}(2)`
// which is the identity function
In this lecture, we cover polymorphism, starting with philosophy, moving on to relatively language-agnostic patterns for obtaining polymorphism, and finally showing the C++ language features that help us achieve polymorphism.
Polymorphism is centred on the idea of dispatch, but there are (at least) two places where dispatch can be done.
The first is during compile time, where the compiler inspects the types of the arguments and decides what implementation to call.
This is also called static dispatch, and here we sometimes qualify the definition further to specify that the compiler is inspecting the static types of the arguments during the dispatch.
Everything we mentioned in the previous lecture more or less falls into this category.
The second is during runtime, where we use the “runtime type” of the arguments to decide what implementation to call. We do this by finding some way to encode the “runtime type” of an object into the object itself.
At runtime, the static types of the arguments have obviously been fixed, so static dispatch, even if it occurred, does not affect what this second round of dispatch will do.
Philosophically, I am rather loose about what runtime information we might inspect in order to determine the “runtime type”, as I think it’s better not to pigeonhole yourself into one mode of thinking. This might be the dynamic type, or some other information entirely, like function pointers or enum values.
// Static dispatch, static types
// decide which impl to call
int min(int, int);
long min(long, long);
// Dispatch happens here
min(1, 2);

// Dynamic dispatch, runtime
// information decides
struct Expr {
    void* data_ptr;
    int (*evaluate_ptr)(void*);

    int evaluate() {
        // Dispatch happens here
        return evaluate_ptr(data_ptr);
    }
};
Within static polymorphism, there is another dimension: ad-hoc vs parametric. The distinction is that parametric polymorphism allows us to write a single implementation that operates identically over a range of types. We know this in C++ as templates, since their bodies are identical (specialisations notwithstanding), but they can be instantiated with different types.
On the other hand, ad-hoc polymorphism is the case where a single polymorphic function can have potentially distinct implementations based on the input types; in C++ we know this as function overloading.
Similarly, we can have either closed or open dynamic polymorphism.
With dynamic polymorphism, the compiler will have fixed some dispatching algorithm at compile time, but this dispatching algorithm will call an “actual” implementation at runtime.
The question is how many different “actual” implementations are possible with such a system. If the number of possible actual implementations is arbitrary, we say that the polymorphism is open. Otherwise, we say it is closed.
struct Expr {
    void* data;
    int (*evaluate_ptr)(void* data);

    int evaluate() {
        // Many implementations
        return evaluate_ptr(data);
    }
};

struct Expr {
    enum Op { PLUS, MINUS, VALUE } op;
    Expr* lhs;
    Expr* rhs;
    int value;

    int evaluate() {
        switch (op) {
            // At most 3 implementations
        }
    }
};
Lastly, C++ also has a little-known kind of polymorphism — keyword polymorphism. This refers to the case where a single keyword has different behaviour depending on where it’s used. Here’s a (probably non-exhaustive) list of them:
- requires
- noexcept
- template
- virtual
- class / typename (in template parameters)
- enum class
- typename
- static
- ... (template <typename...>, void foo(Args...), void printf(const char*, ...))
- -> (foo->bar, trailing return types)
- auto
As you can see, there are many dimensions to polymorphism :D. C++ is probably the language with the greatest amount of keyword polymorphism (that we know of, at least).
Is this polymorphic?
int add(int a, int b) { return a + b; }
long add(long a, long b) { return a + b; }
add overloads
Most people would say no. And yet, if we invoked add with
unknown types, we require static dispatch to be done before we can see
what implementation is used, which is polymorphism.
template <typename T>
requires std::same_as<int, T> || std::same_as<long, T>
T f(T a, T b) {
    return add(a, b);
}
add with unknown types
However, the only polymorphic part is the add itself. While
the individual implementations of add wouldn’t be considered
“polymorphic” by most people, the effect of having both overloads means
that add may be used in a context where the type is unknown. In such cases,
the use of add would definitely be considered polymorphic.
Just like how lvalues and lvalue references are different kinds of terminology (value category vs type), there is also a slight nuance here. We might have non-polymorphic code that makes use of polymorphic functions, or polymorphic code that doesn’t use any polymorphic functions.
template <class T>
bool is_int(T) {
    return false;
}

template <>
bool is_int(int) {
    return true;
}

int my_min(int a, int b) {
    return std::min(a, b);
}

template <typename T>
T my_max(T a, T b) {
    return -std::min(-a, -b);
}
All the tools that we covered in the previous lecture (templates,
constexpr, SFINAE, concepts, etc.) can be used to implement
static polymorphism in C++.
Since templates are instantiated at compile-time with constant types and values, they are clearly an example of static polymorphism; at the callsite, the compiler performs the instantiation (deduction and substitution) and calls exactly that function. Specifically, this is a form of parametric polymorphism.
In this section, we’ll talk more specifically about how these tools enable you to write polymorphic code.
println example
Revisiting our println example yet again, we’ll annotate each
place in the code where some kind of polymorphism appears:
template <typename Arg>  // <<-- (1) -- vv
void print_one(const char*& fmt, size_t& len, Arg&& arg) {
    for (size_t i = 0; i < len; i++) {
        if (fmt[i] == '{' && i + 1 < len && fmt[i + 1] == '}') {
            //                                    vv -- (2)
            std::cout << std::string_view(fmt, i) << arg;
            //                             (3) -- ^^
            fmt += (i + 2);
            len -= (i + 2);
            return;  // handle one "{}" per argument
        }
    }
}

void println(const char* fmt) {
    //        vv     vv -- (4)
    std::cout << fmt << "\n";
}

template <typename... Args>  // << vv -- (5)
void println(const char* fmt, Args&&... args) {
    auto len = strlen(fmt);
    //  vv -- (6)
    (print_one(fmt, len, std::forward<Args>(args)), ...);
    std::cout << "\n";
    //        ^^ -- (7)
}

int main() {
    // vv -- (8)
    println("hello, {} #{}", "world", 42);
}
println function from our previous lecture
Let’s go through each site:

1. print_one is a polymorphic function, since it is templated
2. A call to operator<< (which is overloaded)
3. A call to operator<<, since it varies depending on the type of arg
4. Calls to the overloaded operator<<
5. println is a polymorphic function. We can also consider it “multiply polymorphic”, since it is also polymorphic on the number of arguments (but still static, since it is known at compile-time)
6. A call to the polymorphic print_one
7. A call to the overloaded operator<<
8. A call to the polymorphic println
As you can see, we’ve covered the different ways that polymorphism can appear in code; in this example, templates were the main mechanism that we used to achieve static polymorphism. In fact, all static polymorphism in C++ must be done with either overloading or templates.
constexpr
Another way we can implement static polymorphism is through
constexpr, in a way that is very similar to our switch-case example above. Rather
than switching on the runtime value of a type variable, we
now switch on the compile-time type of the expression. For example:
struct UnaryOp {};
struct BinaryOp {};

template <typename Expr>
auto evaluate(const Expr& expr) {
    if constexpr (std::is_same_v<Expr, UnaryOp>) {
        // do some stuff
    } else if constexpr (std::is_same_v<Expr, BinaryOp>) {
        // do some other stuff
    } else {
        // do some other other stuff
        (void) expr;
    }
}
constexpr
Since we can make templated functions but also overload them, we can get a combination of both ad-hoc and parametric polymorphism in C++.
Together with the overload resolution rules we covered last lecture, this lets us have finer-grained control over exactly which function is called given a set of argument types. For example, we can have a set of overloads to handle specific cases, while also having a templated function as a fallback.
Note that explicit function specialisation serves much the same purpose as overloading, and can be thought of as a form of ad-hoc polymorphism as well (since the specialisations are distinct implementations).
Here, we introduce yet another initialism into our ever-growing collection — the Curiously Recurring Template Pattern, or CRTP. The pattern is essentially to make a templated base class, and template it with the derived class, like this:
template <typename Impl>
struct Base {
    void foo() {
        auto derived = static_cast<Impl*>(this);
        derived->implementation();
    }
    // more methods and stuff
};

struct Derived : Base<Derived> {
    void implementation() {
        std::cout << "uwu\n";
    }
    // more methods and stuff
};
Why would we want to do something like that? Consider what we know from non-virtual inheritance (the kind introduced in Lecture 2):
struct Base {
    void foo() {
        std::cout << "Base::foo\n";
        this->impl();
    }
    void impl() {
        std::cout << "Base::impl\n";
    }
};

struct Derived : Base {
    void impl() {
        std::cout << "Derived::impl\n";
    }
};

Base b{};
b.foo();
std::cout << "\n";

Derived d{};
d.foo();
$ ./main.out
Base::foo
Base::impl
Base::foo
Base::impl
What we might have wanted to see was
Derived::impl in the second call, but we got
Base::impl instead. If we think about it, this is completely
expected; the base class will perform static dispatch, and will
only ever call the base class methods — even if the runtime type of the
object is Derived.
This pattern is common when we have a derived class that needs to override some specific behaviour, but still wants to re-use most of the implementation from the base class (eg. setup/teardown code).
How CRTP works is that we rely on the derived class to provide some (or all) of the actual implementation, while the base class is responsible for either providing the interface, or handling any setup or wrapping that needs to happen.
We make use of
static_cast
here to cast the base class’s
this pointer
to a pointer to the derived class, which is safe as long as a class
only ever inherits from the base instantiated with itself.
Then, we are able use the derived class — call both member and static functions, access fields, etc. This happens entirely at compile-time, since the type of the derived class is known (as a template argument) by the compiler, making CRTP a kind of static polymorphism. At each call site we know exactly what derived type we have, and thus exactly what derived implementation method to call.
You may wonder how the base class can even know about the methods of the derived class, let alone call them. This is possible thanks to the way C++ compiles templates; the body of a templated function is only instantiated when it is called — lazily.
So, as long as the definition of the derived class is available by the time you call the methods, then everything works!
template <typename T>
struct Foo {
    void foo() {
        std::cout << "hello\n";
    }
    void poison() {
        static_assert(std::is_same_v<T, char>);
    }
};

int main() {
    Foo<int> f{};
    f.foo();
}
The above code works without issues! Of course, if you actually try to
call poison(), then you get a compile error.
As mentioned above, one of the main reasons that CRTP is employed is to reuse code from the base class while customising some aspect of it, or providing some functionality that the base class requires. For example, we might have code that is designed to download stuff from websites:
template <typename T>
struct Downloader {
    void download() {
        auto impl = static_cast<T*>(this);
        connect_to_website();
        impl->authenticate();
        do_download();
    }

protected:
    void send(...) {}
    void cookie(...) {}
};

struct Luminus : Downloader<Luminus> {
    void authenticate() {
        cookie("oauth", "asdf");
    }
};

struct Github : Downloader<Github> {
    void authenticate() {
        send("torvalds", "*******");
        cookie("my_cookie", 6969);
        cookie("csrf", "aaaaaaaa");
    }
};
Here, we can factor out the common code (opening the socket, sending HTTP requests, that kind of thing) into the base class. Then, we can use the fact that the base class knows exactly what the derived class is (via the template argument) to call website-specific functionality (like authentication).
Another good example of CRTP used in the interface/code-reuse way is in
Boost’s iterator_facade.
Another instance where CRTP is commonly used is when we want the derived class to be correctly “preserved” when returning from a base class method. Let’s implement our own printing class that makes use of method chaining:
struct MyBadCout {
    MyBadCout& print(auto x) {
        std::cout << x;
        return *this;
    }
};

struct ColourfulBadCout : MyBadCout {
    ColourfulBadCout& set_colour(int colour) {
        std::cout << "\x1b[" << colour << "m";
        return *this;
    }
};
The problem with this class is that once we call print(), we
can no longer call set_colour() any more:
ColourfulBadCout colr{};
colr.set_colour(1)
    .print("hello");
    // .set_colour(10); <-- this won't compile!
This is because print() always returns an instance of
MyBadCout, and never anything else. We can solve this with
CRTP, and now our original example will compile:
template <typename T>
struct MyCout {
    T& print(auto x) {
        std::cout << x;
        return static_cast<T&>(*this);
    }
};

struct BasicPrinter : MyCout<BasicPrinter> {};

struct ColourfulPrinter : MyCout<ColourfulPrinter> {
    ColourfulPrinter& set_colour(int colour) {
        std::cout << "\x1b[" << colour << "m";
        return *this;
    }
};

ColourfulPrinter colr{};
colr.set_colour(1)
    .print("hello")
    .set_colour(10)
    .print(" world");
Note that we had to create a BasicPrinter here, because
MyCout needs to be instantiated with some class.
Even though CRTP seems quite powerful, there are still some limitations, the main one being that now there are lots of templates in your code, and any user of the base class must now be templated:
template <typename T>
void do_stuff(const Base<T>& obj) {
    // ...
}
This is because Base itself isn’t a valid type, and we can
only refer to a specific instantiation of it. This also applies to storage
— it’s now hard to have a container of bases (that can be
any derived type). For instance,
std::vector<Base<A>*> can only store instances of
A and not B, which sort of defeats the purpose.
One way around it is to make a new, “AbstractBase” class that the base CRTP class inherits from:
struct AbstractBase {};

template <typename T>
struct Base : AbstractBase {
    // ...
};
Now, we can have std::vector<AbstractBase*>, or pass it
to functions that are not templated. Of course, this doesn’t avoid the
issue entirely, only postpones it. When we want to actually use the
object, we still need to cast it — either directly to the derived type, or
to a
Base<T>.
The main thing to keep in mind is that at the end of the day, all dynamic polymorphism depends on some runtime state in order to execute different code. This can be achieved in many different ways.
We’ll first show examples of closed polymorphism patterns, and later show examples of open polymorphism patterns.
We’ve already shown many examples of this so far.
struct Expr {
    enum Op { PLUS, MINUS, VALUE } op;
    Expr* lhs;  // Only if `op != VALUE`
    Expr* rhs;  // Only if `op != VALUE`
    int value;  // Only if `op == VALUE`

    int evaluate() const {
        switch (op) {
        case PLUS:
            return lhs->evaluate() + rhs->evaluate();
        case MINUS:
            return lhs->evaluate() - rhs->evaluate();
        case VALUE:
            return value;
        }
    }
};

int main() {
    Expr a{.op = Expr::VALUE, .value = 5};
    Expr b{.op = Expr::VALUE, .value = 8};
    Expr c{.op = Expr::PLUS, .lhs = &a, .rhs = &b};
    std::cout << "5 + 8 = " << c.evaluate() << std::endl;
}
We can also use a union to shrink the struct layout, by overlapping the fields that are never used at the same time.
struct Expr {
    struct BinaryArgs {
        Expr* lhs;
        Expr* rhs;
    };
    union {
        BinaryArgs args;
        int value;
    };
    enum Op { PLUS, MINUS, VALUE } op;
    // ... evaluate, etc. ...
};

int main() {
    Expr a{.value = 5, .op = Expr::VALUE};
    Expr b{.value = 8, .op = Expr::VALUE};
    Expr c{.args = {&a, &b}, .op = Expr::PLUS};
    std::cout << "5 + 8 = " << c.evaluate() << std::endl;
}
std::variant
Rather than managing the tagged union ourselves, we could also throw the
problem at the standard library. While we get much better exception safety
and type checking, this comes at a cost of slightly more annoying code,
since we have to use std::visit or std::get and
all that jazz.
template <typename... T>
struct overloaded : T... {
    using T::operator()...;
};

template <class... Ts>
overloaded(Ts...) -> overloaded<Ts...>;
struct Expr {
    struct PlusExpr {
        Expr* lhs;
        Expr* rhs;
    };
    struct MinusExpr {
        Expr* lhs;
        Expr* rhs;
    };
    struct ValueExpr {
        int value;
    };

    std::variant<PlusExpr, MinusExpr, ValueExpr> expr;

    // ... evaluate ...
};
int evaluate() const {
    return std::visit(
        overloaded{
            [](const PlusExpr& expr) {
                return expr.lhs->evaluate() + expr.rhs->evaluate();
            },
            [](const MinusExpr& expr) {
                return expr.lhs->evaluate() - expr.rhs->evaluate();
            },
            [](const ValueExpr& expr) {
                return expr.value;
            },
        },
        expr);
}
int main() {
    Expr a{Expr::ValueExpr{5}};
    Expr b{Expr::ValueExpr{8}};
    Expr c{Expr::PlusExpr{&a, &b}};
    std::cout << "5 + 8 = " << c.evaluate() << std::endl;
}
std::variant for closed polymorphism
Since tags can only encode a limited number of enumeration values that are fixed at the point of declaration, they cannot be used to implement open polymorphism.
Similarly, switch case statements are useless because they only have a fixed number of branches.
Instead, we need to use function pointers, so that the number of possible
behaviours simply depends on how many functions we can create, which is
effectively unlimited. To give some runtime object a particular
behaviour, we simply need to give it access to this function pointer.
There are many ways this function pointer can be encoded. In general, there are two dimensions to this:
|  | Internal (in struct) | External (alongside pointer) |
|---|---|---|
| Directly | Linux container_of, Zig @fieldParentPtr | Not really used |
| Through a vtable | C++ virtual | Rust dyn Trait, Go interface |
We’ll actually first go through an even more naive pattern, where the data struct and the polymorphic handle are the same struct. We’ll call this the “naive” approach; it’s a sort of combination of both internal and external. It has the downsides of both, and is arguably not even openly polymorphic :P
Another thing we’ll discuss is when we want a particular concrete object to support multiple orthogonal interfaces. This turns out to be very easy in the external case (e.g. Rust and Go), but very tricky in the internal case (e.g. Linux, Zig, and C++).
Finally, throughout these examples, try to be aware of where the code that does the dispatching lives. You will see subtle differences in all the solutions we’re discussing today.
The most basic way to implement dynamic polymorphism is to store function pointers directly in the struct we want to imbue with polymorphic behaviour. We’ll also just use this struct as the handle, which means that the dispatching code lives in the same struct.
Here’s an example (and we will reuse this interface for the rest of the section):
struct Expr {
    // Contains the "runtime type"
    int (*evaluate_ptr)(Expr* lhs, Expr* rhs, int value);
    void (*print_ptr)(Expr* lhs, Expr* rhs, int value);

    // Data fields that are needed for correct `Expr`ing
    Expr* lhs;
    Expr* rhs;
    int value;

    int evaluate() {
        // Dispatcher
        return evaluate_ptr(lhs, rhs, value);
    }
    void print() {
        // Dispatcher
        print_ptr(lhs, rhs, value);
    }
};
While this is certainly powerful enough to create what we want, in order to actually get anything useful out of this struct, we need to populate the function pointer member fields with actual functions that do the behaviour we want.
It’s unreasonable to simply ship this and expect the user of our library to do it all. So we can create some factory methods:
Expr make_plus(Expr* lhs, Expr* rhs) {
    return {
        [](Expr* lhs, Expr* rhs, int) {
            return lhs->evaluate() + rhs->evaluate();
        },
        [](Expr* lhs, Expr* rhs, int) {
            std::cout << '(';
            lhs->print();
            std::cout << " + ";
            rhs->print();
            std::cout << ')';
        },
        lhs,
        rhs,
        0,
    };
}

Expr make_minus(Expr* lhs, Expr* rhs) {
    return {
        [](Expr* lhs, Expr* rhs, int) {
            return lhs->evaluate() - rhs->evaluate();
        },
        [](Expr* lhs, Expr* rhs, int) {
            std::cout << '(';
            lhs->print();
            std::cout << " - ";
            rhs->print();
            std::cout << ')';
        },
        lhs,
        rhs,
        0,
    };
}

Expr make_value(int value) {
    return {
        [](Expr*, Expr*, int value) { return value; },
        [](Expr*, Expr*, int value) { std::cout << value; },
        nullptr,
        nullptr,
        value,
    };
}
In order to use this, the user can now do the following:
int main() {
    Expr a = make_value(3);
    Expr b = make_value(23);
    Expr c = make_plus(&a, &b);
    c.print();
    std::cout << " = " << c.evaluate() << std::endl;
}
$ ./internal-direct-1.out
(3 + 23) = 26
Since the polymorphism is open, if the user is unhappy with the library, they might choose to add new possible behaviours, though they cannot change the actual interface.
Expr make_times(Expr* lhs, Expr* rhs) {
    return {
        [](Expr* lhs, Expr* rhs, int) {
            return lhs->evaluate() * rhs->evaluate();
        },
        [](Expr* lhs, Expr* rhs, int) {
            std::cout << '(';
            lhs->print();
            std::cout << " * ";
            rhs->print();
            std::cout << ')';
        },
        lhs,
        rhs,
        0,
    };
}

int main() {
    Expr a = make_value(3);
    Expr b = make_value(23);
    Expr c = make_times(&a, &b);
    c.print();
    std::cout << " = " << c.evaluate() << std::endl;
}
$ ./internal-direct-2.out
(3 * 23) = 69
Notice that in our example, we had a total of 6 custom behaviours:

- evaluate and print for plus
- evaluate and print for minus
- evaluate and print for value
We can put the pointers to these implementations in
evaluate_ptr and print_ptr as required, but
there’s no reason we would ever want to mix and match:
evaluate_ptr = plus_behaviour_for_evaluate;
print_ptr = minus_behaviour_for_print;
I’m pointing this out because as the size of our interface grows, so does
the number of possible pairings. Suppose we added 2 more functions to our
interface, say take_derivative and simplify; the
number of possible pairings would grow from
3² = 9 to
3⁴ = 81, which is a lot!
Knowing that most pairings are useless, we notice that we only really have as many possible runtime states as the number of distinct expression types. In our case, that’s 3: plus, minus, and value.
Rather than storing the function pointers separately, we can decrease size
of the Expr struct by storing a pointer to the function
pointers. A single pointer is sufficient as we won’t have exponentially
many distinct expression types.
We call the struct of function pointers a virtual table, or vtable.
struct Expr {
    struct Vtable {
        int (*evaluate_ptr)(Expr* lhs, Expr* rhs, int value);
        void (*print_ptr)(Expr* lhs, Expr* rhs, int value);
    };

    // Expr now only needs 1 pointer instead of 2
    // to store dynamic type information
    Vtable* vtable_ptr;

    Expr* lhs;
    Expr* rhs;
    int value;

    int evaluate() {
        return vtable_ptr->evaluate_ptr(lhs, rhs, value);
    }
    void print() {
        vtable_ptr->print_ptr(lhs, rhs, value);
    }
};
// Each dynamic type first creates a global vtable struct
// containing behaviour function pointers
Expr::Vtable impl_expr_for_plus{
    [](Expr* lhs, Expr* rhs, int) {
        return lhs->evaluate() + rhs->evaluate();
    },
    [](Expr* lhs, Expr* rhs, int) {
        std::cout << '(';
        lhs->print();
        std::cout << " + ";
        rhs->print();
        std::cout << ')';
    },
};

Expr make_plus(Expr* lhs, Expr* rhs) {
    // Each instance can just point to the corresponding vtable
    return {&impl_expr_for_plus, lhs, rhs, 0};
}
int main() {
    Expr a = make_value(3);
    Expr b = make_value(23);
    Expr c = make_plus(&a, &b);
    c.print();
    std::cout << " = " << c.evaluate() << std::endl;
}
$ ./internal-vtable-1.out
(3 + 23) = 26
So far, our behaviour functions simply take in as parameters the members
of the Expr struct.
int evaluate_behaviour(Expr* lhs, Expr* rhs, int value);
void print_behaviour(Expr* lhs, Expr* rhs, int value);
This has some downsides. For one, some fields may not be used by some
behaviours (e.g. print for values doesn’t touch
lhs or rhs), so the caller is passing unnecessary
information to print. Another downside is that the behaviour
functions cannot mutate the Expr struct containing the data
itself, as the fields have all been copied out.
To solve this, we can have our behaviour functions take in a pointer to
the Expr itself, just like a member function. We can add a
new function to our interface that actually mutates the Expr,
called triple.
int evaluate_behaviour(Expr* self);
void print_behaviour(Expr* self);
void triple_behaviour(Expr* self);
Let’s see what this looks like:
struct Expr {
    int (*evaluate_ptr)(Expr* self);
    void (*print_ptr)(Expr* self);
    void (*triple_ptr)(Expr* self);

    Expr* lhs;
    Expr* rhs;
    int value;

    int evaluate() {
        return evaluate_ptr(this);
    }
    void print() {
        print_ptr(this);
    }
    void triple() {
        triple_ptr(this);
    }
};
Expr make_plus(Expr* lhs, Expr* rhs) {
    return {
        [](Expr* self) {
            return self->lhs->evaluate() + self->rhs->evaluate();
        },
        [](Expr* self) {
            std::cout << '(';
            self->lhs->print();
            std::cout << " + ";
            self->rhs->print();
            std::cout << ')';
        },
        [](Expr* self) {
            self->lhs->triple();
            self->rhs->triple();
        },
        lhs,
        rhs,
        0,
    };
}

// ...

Expr make_value(int value) {
    return {
        [](Expr* self) { return self->value; },
        [](Expr* self) { std::cout << self->value; },
        [](Expr* self) { self->value *= 3; },
        nullptr,
        nullptr,
        value,
    };
}
int main() {
    Expr a = make_value(3);
    Expr b = make_value(23);
    Expr c = make_plus(&a, &b);
    c.triple();
    std::cout << "Tripled: ";
    c.print();
    std::cout << " = " << c.evaluate() << std::endl;
}
$ ./internal-direct-3.out
Tripled: (9 + 69) = 78
Notice that whenever we created a value Expr, we still had to
fill in the lhs and rhs fields and set them to
nullptr even
though they are unused.
In fact, in every single type of Expr, we were either leaving
value unused or leaving lhs and
rhs unused. That’s a waste of memory!
It would be far better if we could instead only store the fields necessary
for that particular type of Expr.
Earlier, we showed how vtable pointers can optimise the struct layout by pulling the common function pointer pairings out to an external struct.
Similarly, we can pull out the information required for each type of
Expr into external structs, and store a pointer to it in
Expr.
struct PlusExpr {
    Expr* lhs;
    Expr* rhs;
};

struct MinusExpr {
    Expr* lhs;
    Expr* rhs;
};

struct ValueExpr {
    int value;
};
We can obviously implement their behaviours quite simply, because the types are no longer polymorphic:
struct PlusExpr {
    Expr* lhs;
    Expr* rhs;

    int evaluate() {
        return lhs->evaluate() + rhs->evaluate();
    }
};

struct ValueExpr {
    int value;

    int evaluate() {
        return value;
    }
};
But we get stuck when we try to actually create a PlusExpr of
two ValueExprs:
ValueExpr a{3};
ValueExpr b{23};
PlusExpr c{ ??? };
We cannot construct the PlusExpr, since it
requires polymorphic Expr* pointers
Well that’s obvious! We created non-polymorphic types when we needed
polymorphic ones. PlusExpr cannot be written with anything
other than a polymorphic type!
We now have to look at Expr itself. We need it to store a
pointer to some unknown type, because we’re trying to implement open
polymorphism. Since no single type will work, we use void*,
which means “pointer to unknown type”.
struct Expr {
    int (*evaluate_ptr)(void* self);
    void* expr_ptr;

    int evaluate() { return evaluate_ptr(expr_ptr); }
};
We can now create instances of Expr by creating adapters
between the concrete types like PlusExpr to
Expr, and the other way around from void* to
PlusExpr as well.
We’ll also change PlusExpr to simply take an
Expr directly, now that the Expr is really a fat
pointer to some polymorphic Expr type.
struct PlusExpr {
    Expr lhs;
    Expr rhs;

    int evaluate() {
        return lhs.evaluate() + rhs.evaluate();
    }

    Expr toExpr() {
        // Adapter to Expr
        return {
            // When Expr::evaluate is called, we have to
            // adapt the void* back to a PlusExpr
            [](void* self) {
                return reinterpret_cast<PlusExpr*>(self)->evaluate();
            },
            reinterpret_cast<void*>(this)};
    }
};
Now it works:
int main() {
    ValueExpr a{3};
    ValueExpr b{23};
    PlusExpr c{a.toExpr(), b.toExpr()};
    c.print();
    std::cout << " = " << c.evaluate() << std::endl;
}
$ ./external-direct-1.out
(3 + 23) = 26
Notice that when we switched to fat pointers, we only really use the
polymorphic Expr type when we need to make use of
polymorphism. PlusExpr needs to take in two handles to
“actual” expressions that are fixed in size so that it can operate on any
kind of Expr. However, when we create stuff like
ValueExpr, we need a concrete instance to exist so that the
data is stored somewhere.
Just as we can use vtables to optimise the struct layout for internally stored function pointers, they also work with externally stored function pointers (fat pointers).
struct Expr {
    int (*evaluate_ptr)(void* self);
    void (*print_ptr)(void* self);
    void* expr_ptr;

    int evaluate() {
        return evaluate_ptr(expr_ptr);
    }
    void print() {
        print_ptr(expr_ptr);
    }
};

struct Expr {
    struct Vtable {
        int (*evaluate_ptr)(void* self);
        void (*print_ptr)(void* self);
    };

    Vtable* vtable_ptr;
    void* expr_ptr;

    int evaluate() {
        return vtable_ptr->evaluate_ptr(expr_ptr);
    }
    void print() {
        vtable_ptr->print_ptr(expr_ptr);
    }
};
Initialising the vtable pointer is the same as before:
struct PlusExpr {
    Expr lhs;
    Expr rhs;
    // ...

    static Expr::Vtable impl_expr;

    Expr toExpr() {
        return {&impl_expr, reinterpret_cast<void*>(this)};
    }
};

Expr::Vtable PlusExpr::impl_expr{
    [](void* self) {
        return reinterpret_cast<PlusExpr*>(self)->evaluate();
    },
    [](void* self) { reinterpret_cast<PlusExpr*>(self)->print(); },
};
std::function is a fat pointer
std::function
is a polymorphic fat pointer for any callable type.
For example, we might have the following function object:
struct EvaluableValueExpr {
    int value;

    int operator()() {
        return value;
    }
};
A version of ValueExpr that uses
operator()
instead of evaluate
We can create a std::function from this as follows:
// int() is the type of a function
// that has no parameters and returns int
EvaluableValueExpr a{3};
std::function<int()> a_func{a};
// The type can be inferred thanks to CTAD
std::function a_func2{a};
Creating a std::function fat pointer
The way this works is pretty much the same as how we created our
Expr fat pointer. A subtle difference is that
std::function will copy the given callable and own the copy,
whereas our Expr pointer simply stored a (non-owning)
pointer.
We can use this to adapt PlusExpr as well:
struct EvaluablePlusExpr {
std::function<int()> lhs;
std::function<int()> rhs;
int operator()() {
return lhs() + rhs();
}
};
EvaluableValueExpr a{3};
EvaluableValueExpr b{23};
EvaluablePlusExpr c{a, b};
std::cout << "(3 + 23) = " << c() << std::endl;
$ ./function-1.out
(3 + 23) = 26
Using std::function to adapt PlusExpr
Here’s a simplified implementation of std::function that does
NOT own the callable, and instead just takes a pointer to it:
#include <iostream>
#include <utility>
template <typename Fn>
struct SimpleFunction;
template <typename Ret, typename... Args>
struct SimpleFunction<Ret(Args...)> {
void* callable_ptr;
Ret (*fn_ptr)(void*, Args...);
// What gets called when we do simple_func_obj(...)
auto operator()(Args... args) -> decltype(auto) {
// We call the fn_ptr we stored, which takes in the state pointer plus
// any additional arguments, just like a member function
return fn_ptr(callable_ptr, std::forward<Args>(args)...);
}
// But what is fn_ptr initialised to? This is a function that casts the
// type of callable_ptr from void* to the correct Callable* type, so
// that it can be called.
// This is a templated global variable that dispatches the fn_ptr call
// to a call of the correct type for Callables.
template <typename Callable>
static inline Ret (*fn_ptr_for_func_object)(void*, Args...) =
[](void* callable_ptr, Args... args) {
// Here is where we cast the void* back to a Callable*, so that it
// can be called.
return (*static_cast<Callable*>(callable_ptr))(
std::forward<Args>(args)...);
};
// In the constructor, we simply have to store the pointer to the
// callable in callable_ptr, and initialise fn_ptr with our templated
// global variable, instantiated for the Callable type.
template <typename Callable>
SimpleFunction(Callable* c)
: callable_ptr{reinterpret_cast<void*>(c)},
fn_ptr{fn_ptr_for_func_object<Callable>} {}
};
int main() {
int i = 23;
auto mult_i = [&i](int j) { i *= j; };
SimpleFunction<void(int)> mult_func{&mult_i};
mult_func(3);
std::cout << "(23 * 3) = " << i << std::endl;
}
$ ./simple-function.out
(23 * 3) = 69
With fat pointers, it becomes very easy to support multiple interfaces. All we need is more fat pointer types, each with their own interface. Any type that implements that interface can expose a function to cast to that fat pointer type.
For example, while Exprs are both evaluable and printable,
there are other types that are only printable (like
SimpleString), or some types that are only evaluable (like
SimpleFunction).
Let’s model that!
struct Evaluable {
int (*evaluate_ptr)(void* self);
void* data_ptr;
int evaluate() {
return evaluate_ptr(data_ptr);
}
};
struct Printable {
void (*print_ptr)(void* self);
void* data_ptr;
void print() {
print_ptr(data_ptr);
}
};
struct EvaluablePrintable {
int (*evaluate_ptr)(void* self);
void (*print_ptr)(void* self);
void* data_ptr;
int evaluate() {
return evaluate_ptr(data_ptr);
}
void print() {
print_ptr(data_ptr);
}
Evaluable toEvaluable() {
return {evaluate_ptr, data_ptr};
}
Printable toPrintable() {
return {print_ptr, data_ptr};
}
};
Evaluable and Printable fat
pointers, as well as their combination
We can then add the appropriate cast functions to the various
Exprs, as well as to SimpleString and
SimpleFunction (stubbed).
struct PlusExpr {
EvaluablePrintable lhs;
EvaluablePrintable rhs;
// ...
EvaluablePrintable toEvaluablePrintable() {
return {
[](void* self) {
return reinterpret_cast<PlusExpr*>(self)->evaluate();
},
[](void* self) { reinterpret_cast<PlusExpr*>(self)->print(); },
reinterpret_cast<void*>(this),
};
}
Evaluable toEvaluable() {
return toEvaluablePrintable().toEvaluable();
}
Printable toPrintable() {
return toEvaluablePrintable().toPrintable();
}
};
PlusExpr
struct SimpleString {
std::string str;
void print() {
std::cout << str;
}
Printable toPrintable() {
return {
[](void* self) {
reinterpret_cast<SimpleString*>(self)->print();
},
reinterpret_cast<void*>(this),
};
}
};
SimpleString
struct SimpleFunction {
std::function<int()> fn;
int evaluate() {
return fn();
}
Evaluable toEvaluable() {
return {
[](void* self) {
return reinterpret_cast<SimpleFunction*>(self)->evaluate();
},
reinterpret_cast<void*>(this),
};
}
};
SimpleFunction
This lets us do some really cool stuff: we can store vectors containing both expressions and strings or functions!
int main() {
ValueExpr a{3};
ValueExpr b{23};
PlusExpr c{a.toEvaluablePrintable(), b.toEvaluablePrintable()};
SimpleString s{"wow"};
SimpleFunction f{[]() { return 42; }};
// ,- Being able to do this is the crux of interfaces!
// | This is also called "type erasure"
// v
std::vector<Printable> printables{c.toPrintable(), s.toPrintable()};
std::vector<Evaluable> evaluables{c.toEvaluable(), f.toEvaluable()};
for (auto& p : printables) {
std::cout << "Printing: ";
p.print();
std::cout << std::endl;
}
for (auto& e : evaluables) {
std::cout << "Evaluating: " << e.evaluate() << std::endl;
}
}
$ ./external-multiple-1.out
Printing: (3 + 23)
Printing: wow
Evaluating: 26
Evaluating: 42
We’ve shown how multiple interfaces are possible with externally stored function pointers. The last combination, and the trickiest, is storing multiple interfaces internally.
Before we move on, let’s first solve the problem of supporting the interface pattern, but where we only have one such interface.
Here’s the very first solution for open polymorphism we’ve seen in this lecture.
struct Expr {
int (*evaluate_ptr)(Expr* lhs, Expr* rhs, int value);
void (*print_ptr)(Expr* lhs, Expr* rhs, int value);
Expr* lhs;
Expr* rhs;
int value;
};
Let’s consider how we might separate the
EvaluablePrintable interface from this
Expr class.
To be clear about what we want to achieve, imagine that we created a new
type called TernaryExpr in order to get around the limitation
that Expr supports only up to two sub-expressions.
struct TernaryExpr {
int (*evaluate_ptr)(Expr* arg1, Expr* arg2, Expr* arg3, int value);
void (*print_ptr)(Expr* arg1, Expr* arg2, Expr* arg3, int value);
Expr* arg1;
Expr* arg2;
Expr* arg3;
int value;
};
TernaryExpr struct layout
In order for our existing Exprs to coexist with the new
TernaryExpr, we might want to create a
std::vector containing both such objects, so we can iterate
over them simultaneously. Since Expr and
TernaryExpr have different sizes, let’s settle for a
std::vector of pointers.
We also don’t want to simply use a std::vector of fat
pointers, since we’ve already done that earlier.
std::vector<SomeTypeGoesHere*> exprs_and_new_exprs;
// ...
for (SomeTypeGoesHere* p : exprs_and_new_exprs) {
p->print();
std::cout << " = " << p->evaluate() << std::endl;
}
Iterating over Exprs and
TernaryExprs at the same time
Whatever p->evaluate() calls, it is some dispatcher code
that needs to call evaluate_ptr of the object. Right now, the
evaluate_ptrs of Expr and
TernaryExpr don’t even have the same types, which means the
dispatcher code, whatever it is, can’t possibly perform the function call
correctly.
We can fix that by using void* pointers to the object rather
than passing the full contents of the object as the argument.
struct Expr {
int (*evaluate_ptr)(void* self);
void (*print_ptr)(void* self);
Expr* lhs;
Expr* rhs;
int value;
};
struct TernaryExpr {
int (*evaluate_ptr)(void* self);
void (*print_ptr)(void* self);
Expr* arg1;
Expr* arg2;
Expr* arg3;
int value;
};
Now, notice that the structs share a common base! Let’s simply factor that out into a base class.
struct EvalPrint {
int(*evaluate_ptr) //
(void* self);
void(*print_ptr) //
(void* self);
};
struct Expr {
EvalPrint fns;
Expr* lhs;
Expr* rhs;
int value;
};
struct TernaryExpr {
EvalPrint fns;
Expr* arg1;
Expr* arg2;
Expr* arg3;
int value;
};
Refactoring Expr and
TernaryExpr to share a common base
Now, notice that EvalPrint is at offset 0 in both
Expr and TernaryExpr. This means that any
pointer to the EvalPrint sub-object in an
Expr has the same value as a pointer to Expr,
and it means we can do the following:
Expr a = make_...;
EvalPrint* a_interface_ptr = &a.fns;
// This is legal!
Expr* a_ptr = reinterpret_cast<Expr*>(a_interface_ptr);
a_ptr->evaluate();
Casting an EvalPrint* to an
Expr*
We now have all the ingredients necessary to build our
std::vector! Let’s put the dispatcher code in the
EvalPrint struct, and get evaluate_ptr for
Expr and TernaryExpr to cast the
EvalPrint* it receives to Expr* and
TernaryExpr* respectively, before performing the call.
struct EvalPrint {
int (*evaluate_ptr)(EvalPrint* self);
void (*print_ptr)(EvalPrint* self);
// For fat pointers, note how we would
// have passed expr_ptr or similar.
// Here, we just pass `this`
int evaluate() {
return evaluate_ptr(this);
}
void print() {
print_ptr(this);
}
};
struct Expr {
EvalPrint fns;
Expr* lhs;
Expr* rhs;
int value;
// Convenience methods
int evaluate() {
return fns.evaluate();
}
void print() {
fns.print();
}
};
struct TernaryExpr {
EvalPrint fns;
Expr* arg1;
Expr* arg2;
Expr* arg3;
int value;
// Convenience methods
int evaluate() {
return fns.evaluate();
}
void print() {
fns.print();
}
};
Expr make_plus(Expr* lhs, Expr* rhs) {
// Note that the old `make_plus` would've accepted an `Expr* self`
// Here, we accept an `EvalPrint*` since the dispatcher code doesn't
// know whether we were an `Expr` or a `TernaryExpr`.
EvalPrint fns{
[](EvalPrint* void_self) {
Expr* self = reinterpret_cast<Expr*>(void_self);
return self->lhs->evaluate() + self->rhs->evaluate();
},
[](EvalPrint* void_self) {
Expr* self = reinterpret_cast<Expr*>(void_self);
std::cout << '(';
self->lhs->print();
std::cout << " + ";
self->rhs->print();
std::cout << ')';
},
};
return {fns, lhs, rhs, 0};
}
So this solves the puzzle of how we can get interface pointers where the
interface is simply embedded at the top of the concrete object. We can
easily convert between an interface pointer and concrete object pointer by
simply using reinterpret_cast.
Now, what about concrete types that support multiple interfaces? Can we do something like the following?
struct Eval {
int (*eval_ptr)(Eval* self);
int evaluate() {
return eval_ptr(this);
}
};
struct Print {
void (*print_ptr)(Print* self);
void print() {
print_ptr(this);
}
};
struct EvalPrint {
Eval e;
Print p;
int evaluate() {
return e.evaluate();
}
void print() {
return p.print();
}
};
struct Expr {
EvalPrint fns;
Expr* lhs;
Expr* rhs;
int value;
int evaluate() {
return fns.evaluate();
}
void print() {
fns.print();
}
};
The problem now is that a pointer to the Print inside an
Expr is not necessarily the same as a pointer to the
Expr itself. So if we were to naively port over our type
casting dispatch handlers, it is going to explode.
Expr make_plus(Expr* lhs, Expr* rhs) {
EvalPrint fns{
{// Note the change in types, this takes in Eval*
[](Eval* eval_self) {
Expr* self = reinterpret_cast<Expr*>(eval_self);
return self->lhs->evaluate() + self->rhs->evaluate();
}},
{// This takes in Print*
[](Print* print_self) {
// Here lies the bug.
// The Print subobject does not live at offset 0!
Expr* self = reinterpret_cast<Expr*>(print_self);
// So at this point, self is an illegal pointer.
std::cout << '(';
self->lhs->print();
std::cout << " + ";
self->rhs->print();
std::cout << ')';
}},
};
return {fns, lhs, rhs, 0};
}
int main() {
Expr a = make_value(3);
Expr b = make_value(23);
Expr c = make_plus(&a, &b);
c.print();
}
$ ./internal-multiple-2-broken.asan
=================================================================
==1337==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffffffea78 at pc 0x0000004ceae9 bp 0x7fffffffe850 sp 0x7fffffffe848
READ of size 4 at 0x7fffffffea78 thread T0
#0 0x4ceae8 in make_value(int)::$_3::operator()(Print*) const /__w/ccc-2/ccc-2/lectures/l10/open/internal-multiple-2-broken.cpp:88:28
#1 0x4ce684 in make_value(int)::$_3::__invoke(Print*) /__w/ccc-2/ccc-2/lectures/l10/open/internal-multiple-2-broken.cpp:86:8
#2 0x4d0756 in Print::print() /__w/ccc-2/ccc-2/lectures/l10/open/internal-multiple-2-broken.cpp:18:5
#3 0x4d0718 in EvalPrint::print() /__w/ccc-2/ccc-2/lectures/l10/open/internal-multiple-2-broken.cpp:33:14
#4 0x4ceb24 in Expr::print() /__w/ccc-2/ccc-2/lectures/l10/open/internal-multiple-2-broken.cpp:50:9
#5 0x4ce9ad in make_plus(Expr*, Expr*)::$_1::operator()(Print*) const /__w/ccc-2/ccc-2/lectures/l10/open/internal-multiple-2-broken.cpp:70:21
#6 0x4ce434 in make_plus(Expr*, Expr*)::$_1::__invoke(Print*) /__w/ccc-2/ccc-2/lectures/l10/open/internal-multiple-2-broken.cpp:64:8
#7 0x4d0756 in Print::print() /__w/ccc-2/ccc-2/lectures/l10/open/internal-multiple-2-broken.cpp:18:5
#8 0x4d0718 in EvalPrint::print() /__w/ccc-2/ccc-2/lectures/l10/open/internal-multiple-2-broken.cpp:33:14
#9 0x4ceb24 in Expr::print() /__w/ccc-2/ccc-2/lectures/l10/open/internal-multiple-2-broken.cpp:50:9
#10 0x4ce814 in main /__w/ccc-2/ccc-2/lectures/l10/open/internal-multiple-2-broken.cpp:99:5
#11 0x7ffff7b46082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082)
#12 0x41d37d in _start (/__w/ccc-2/ccc-2/lectures/l10/open/internal-multiple-2-broken.asan+0x41d37d)
Address 0x7fffffffea78 is located in stack of thread T0 at offset 152 in frame
#0 0x4ce69f in main /__w/ccc-2/ccc-2/lectures/l10/open/internal-multiple-2-broken.cpp:95
This frame has 3 object(s):
[32, 72) 'a' (line 96)
[112, 152) 'b' (line 97) <== Memory access at offset 152 overflows this variable
[192, 232) 'c' (line 98)
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
(longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow /__w/ccc-2/ccc-2/lectures/l10/open/internal-multiple-2-broken.cpp:88:28 in make_value(int)::$_3::operator()(Print*) const
Shadow bytes around the buggy address:
0x10007fff7cf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007fff7d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007fff7d10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007fff7d20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007fff7d30: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
=>0x10007fff7d40: 00 00 00 00 00 f2 f2 f2 f2 f2 00 00 00 00 00[f2]
0x10007fff7d50: f2 f2 f2 f2 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
0x10007fff7d60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007fff7d70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007fff7d80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007fff7d90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==1337==ABORTING
reinterpret_casting
Print* to Expr* is a bug
However, it turns out to be quite easy to account for the discrepancy by simply subtracting away the offset from the invalid pointer to get back a pointer that is actually valid.
Expr make_plus(Expr* lhs, Expr* rhs) {
EvalPrint fns{
{[](Eval* eval_self) {
// This adjustment subtracts 0,
// but we might as well include it for symmetry
Expr* self = reinterpret_cast<Expr*>(
reinterpret_cast<char*>(eval_self) - offsetof(Expr, fns.e));
return self->lhs->evaluate() + self->rhs->evaluate();
}},
{[](Print* print_self) {
// Here is the magic adjustment
// We first cast it to `char*` so that the pointer arithmetic
// operates on individual bytes.
// After that we have a valid Expr* pointer,
// so we cast it to `Expr*`
Expr* self = reinterpret_cast<Expr*>(
reinterpret_cast<char*>(print_self) - offsetof(Expr, fns.p));
// Now self is a valid pointer
std::cout << '(';
self->lhs->print();
std::cout << " + ";
self->rhs->print();
std::cout << ')';
}},
};
return {fns, lhs, rhs, 0};
}
int main() {
Expr a = make_value(3);
Expr b = make_value(23);
Expr c = make_plus(&a, &b);
c.print();
}
$ ./internal-multiple-3.asan
(3 + 23)
Using offsetof to compute the difference
between the Print* and Expr* for
Exprs
We’ve covered many different ways to do open dynamic polymorphism in this
lecture. Let’s run through them one last time. In these examples,
Data stores the actual data that the concrete object will
hold. Handles are polymorphic, and dispatch happens through them.
// Most basic
// No data struct. Simply ensure
// Handle has all necessary fields
struct Handle {
fn ptr
fn ptr
data field
data field
};
// Vtable optimisation
struct Handle {
struct Vtable {
fn ptr
fn ptr
};
Vtable* v ptr
data field
data field
};
// Fat pointer
struct Data {
data field
data field
};
struct Handle {
fn ptr
fn ptr
Data*
};
// Fat pointer with vtable
struct Data {
data field
data field
};
struct Handle {
struct Vtable {
fn ptr
fn ptr
};
Vtable*
Data*
};
// Embedded interface
struct Data {
struct Iface {
fn ptr
fn ptr
};
Iface
data field
data field
};
struct Handle {
Iface*
};
// Multiple
// embedded interfaces
struct Data {
struct IfaceA {
fn ptr
};
struct IfaceB {
fn ptr
};
IfaceA
IfaceB
data field
data field
};
struct HandleA {
IfaceA*
};
struct HandleB {
IfaceB*
};
// Embedded interface
// with vtable
struct Data {
struct Vtable {
fn ptr
fn ptr
};
struct Iface {
Vtable*
};
Iface
data field
data field
};
// Note:
// double pointer :^)
struct Handle {
Iface*
};
virtual:
Built-in vtables for C++
Under the hood, C++
virtual simply
uses the function-pointer-in-vtable pattern to implement virtual methods.
Of course, it also stores the type tag (the dynamic type of the object)
alongside the vtable pointer.
In this section, we cover roughly how
virtual
methods are compiled by showing how they map to the vtable pattern we
showed earlier.
Let’s look at a simple case of using virtual methods. As seasoned OOP developers, we’ll dispense with the classic “animal taxonomy” example, and revisit our expressions from above:
struct Expr {
virtual int evaluate() = 0;
virtual void print() = 0;
};
struct Value : Expr {
Value(int x) : value(x) {}
virtual int evaluate() override {
return value;
}
virtual void print() override {
std::cout << value;
}
int value;
};
struct Plus : Expr {
Plus(Expr* l, Expr* r) //
: lhs(l), rhs(r) {}
virtual int evaluate() override {
return lhs->evaluate() //
+ rhs->evaluate();
}
virtual void print() override {
std::cout << '(';
lhs->print();
std::cout << " + ";
rhs->print();
std::cout << ')';
}
Expr* lhs;
Expr* rhs;
};
We’ve implemented evaluate and print from above,
and they do exactly what you’d expect. To make a function virtual, we
prepend the
virtual
specifier to it, and to override the base implementation, we append
override (in
the same place as we would put
const).
Note that if a method has the same declaration as a virtual function in a
base class, then that method is also
virtual, even if it did not specify the
virtual
keyword. Similarly, it will override base class implementations whether or
not the
override
keyword is present.
Finally, note the use of
= 0
in declaring the evaluate and print methods in
the base Expr class — doing this declares them as
pure virtual functions. If a class has at least one such
function, then it is abstract, and instances of it cannot be
created. If you’re familiar with Java, then these mechanics should not be
surprising.
Let’s run some of this code:
Expr* a = new Value(69);
Expr* b = new Value(420);
Expr* c = new Plus(a, b);
c->print();
std::cout << " = " //
<< c->evaluate() //
<< "\n";
$ ./expr.out | sed '1,/<1/d;/1>/,$d'
(69 + 420) = 489
Even though the static types of a, b,
and c were all Expr, we called the correct
functions at runtime due to dynamic dispatch, which is done
automatically by the compiler thanks to us using
virtual
functions.
When calling a virtual method (eg. expr->evaluate()), the
compiler still does member lookup as usual. However, after lookup
is completed, if the function is virtual, it then inserts code to perform
the dynamic dispatch instead.
This means that, in the following example:
struct Animal {
virtual void eat() { /* ... */ }
};
struct Bird : Animal {
virtual void fly() { /* ... */ }
};
// later:
Animal* a = new Bird();
a->fly(); // won't compile!
We can’t call fly() on a, since its
static type is still Animal, and member lookup for a
fly function will fail. However, since
Bird still inherits from Animal, a call like
a->eat() works: member lookup finds the eat
virtual function, and the compiler inserts the appropriate dynamic
dispatch code.
What virtual corresponds to
virtual on a
method roughly corresponds to adding dispatcher code.
override-ing a
virtual
method corresponds to changing some behaviour in the vtable corresponding
to that virtual, and the pure specifier
= 0
corresponds to having no vtable for that class, making that class
non-constructible.
Our earlier example roughly corresponds to the following “manually written” polymorphism:
struct Expr {
struct Vtable {
int (*evaluate_ptr)(Expr*);
void (*print_ptr)(Expr*);
};
Vtable* vtable_ptr;
Expr() = delete;
protected:
Expr(Vtable* vtable_ptr) : vtable_ptr{vtable_ptr} {}
public:
int evaluate() {
return vtable_ptr->evaluate_ptr(this);
}
void print() {
vtable_ptr->print_ptr(this);
}
};
struct Value : Expr {
int value;
Value(int x) : Expr{&impl_Expr_for_Value}, value(x) {}
static int evaluate_impl(Expr* expr) {
return static_cast<Value*>(expr)->value;
}
static void print_impl(Expr* expr) {
std::cout << static_cast<Value*>(expr)->value;
}
static inline Expr::Vtable impl_Expr_for_Value{
evaluate_impl,
print_impl,
};
};
struct Plus : Expr {
Expr* lhs;
Expr* rhs;
Plus(Expr* l, Expr* r) //
: Expr{&impl_Expr_for_Plus}, lhs(l), rhs(r) {}
static int evaluate_impl(Expr* expr) {
Plus* self = static_cast<Plus*>(expr);
return self->lhs->evaluate() //
+ self->rhs->evaluate();
}
static void print_impl(Expr* expr) {
Plus* self = static_cast<Plus*>(expr);
std::cout << '(';
self->lhs->print();
std::cout << " + ";
self->rhs->print();
std::cout << ')';
}
static inline Expr::Vtable impl_Expr_for_Plus{
evaluate_impl,
print_impl,
};
};
How
virtual
corresponds to the manual vtable pattern
In terms of memory layout, a Plus object looks something
like this:
This shouldn’t be surprising, since the vtable implementation of dynamic
dispatch was already discussed above. In this case, since
Plus didn’t introduce any new virtual functions (it only
overrode those from the base class), it does not have a vtable pointer of
its own.
Let’s go back to animal taxonomy for a moment:
struct Animal {
~Animal() {
std::cout << "x_x\n";
}
virtual void eat() {
std::cout << "nom nom\n";
}
virtual void sleep() {
std::cout << "zzz\n";
}
};
struct Bird : Animal {
~Bird() {
std::cout << "splat\n";
}
virtual void eat() override {
std::cout << "gobble\n";
}
virtual void fly() {
std::cout << "wheee\n";
}
};
Animal and Bird classes
In this case, the layout would look something like this:
Here, since Bird introduces a new virtual method
(fly), it also has a vtable pointer. Note that the vtable
impl Animal for Bird points to Animal::sleep(),
since Bird did not override this particular function.
As an optimisation, we can actually collapse the two vtables into one, and
have only one vtable pointer for the entire struct. Since
Bird inherits from Animal, it would make sense
that the vtable for Bird would also “inherit” the vtable for
Animal, ie. contain the same functions.
In fact, all major compilers perform this optimisation, so that the actual layout looks like this:
Nothing much has changed, we simply put the vtable for
Animal directly into Bird.
dynamic_cast:
checking the type at runtime
Sometimes, it might be necessary to check whether an object has a specific dynamic type; this can be useful if we only need to filter for one or two classes in the hierarchy and don’t want to make a new method to abstract out this tiny behaviour.
To do that, we can use
dynamic_cast, which is another kind of cast in C++; it allows you to cast between
types in an inheritance hierarchy depending on the
dynamic type of an object. Implementation-wise, it simply checks
the hidden type tag at runtime to see if the target type is the same as
the actual runtime type.
For instance, if we wanted to check if an expression was specifically a
Value, we could do this:
Expr* a = new Value(100);
Expr* b = new Plus(a, a);
auto va = dynamic_cast<Value*>(a);
auto vb = dynamic_cast<Value*>(b);
std::cout << "va = " //
<< (void*) va //
<< "\n";
std::cout << "vb = " //
<< (void*) vb //
<< "\n";
$ ./expr.out | sed '1,/<2/d;/2>/,$d'
va = 0x407310
vb = (nil)
dynamic_cast
It’s pretty simple — if the object indeed has that type, then it returns a pointer to it (note that because of multiple inheritance, the pointer returned is not necessarily the pointer given). Otherwise, it returns a null pointer.
Of course, “OOP experts” would say not to use
dynamic_cast
at all, since we should just rely on dynamic dispatch entirely.
Fortunately, C++ still lets you do it (:
Note that we can also use
static_cast to
perform a downcast in the same way that we used
dynamic_cast. However, you should be sure that the dynamic type of the
object is what you claim it to be. Otherwise, you get undefined behaviour!
Lastly, note that the conversion from a derived class to a base class like this:
Base* b = new Derived();
is allowed implicitly.
Just like in Java, we can declare classes and variables as
final, and they have the same behaviour:
class Foo final { // note the location of 'final'
};
// not allowed! 'Foo' is final
class Bar : Foo {
};
Again,
final on a
method works the same way:
class Foo {
virtual void foo() final;
};
class Bar : Foo {
// not allowed! 'Foo::foo' is final
virtual void foo() override;
};
There’s one glaring flaw with our examples above (both the animals and the expressions), though. What happens when we want to destruct our objects?
Naturally, if we have a Bird object, we’d like to call its
destructor so that it can clean up any resources:
struct Animal {
~Animal() {
std::cout << "x_x\n";
}
};
struct Bird : Animal {
~Bird() {
std::cout << "splat\n";
}
};
Animal* a = new Bird();
a->eat();    // eat and sleep as defined earlier
a->sleep();
delete a;
$ ./animal.out
gobble
zzz
x_x
Unfortunately, we didn’t call the right destructor! Just like normal functions, destructor calls are also looked up by their static type only. If the compiler doesn’t find a virtual function, then it just performs static dispatch.
Here, since the destructor wasn’t virtual (it was the implicitly declared
compiler-generated one), we only call ~Animal when we delete
it, instead of ~Bird. To fix this, all we need to do is mark
the destructor in the base class as
virtual:
struct Animal {
virtual ~Animal() {
std::cout << "x_x\n";
}
};
// later
Animal* a = new Bird();
a->eat();
a->sleep();
delete a;
$ ./animal2.out
gobble
zzz
splat
x_x
Note that we didn’t need to make the Bird destructor
virtual — this is due to the “implicitly virtual if any base class
functions are virtual” mechanism discussed earlier. Also, we see that
~Bird calls ~Animal, which is completely in-line
with typical destructor behaviour; we just made the initial dispatch
happen dynamically instead of statically.
In short, whenever you’re using any virtual functions, you should almost always have a virtual destructor as well. In fact, compilers can even warn you:
clang++ -g -stdlib=libc++ -std=c++20 -Wpedantic -Wall -Wextra -Wconversion -Wno-unused-variable -Wno-unused-but-set-variable -c -o animal.o animal.cpp
clang++ -g -stdlib=libc++ -std=c++20 -Wpedantic -Wall -Wextra -Wconversion -Wno-unused-variable -Wno-unused-but-set-variable animal.o -o animal.out
animal.cpp:50:3: warning: delete called on non-final 'Animal' that has virtual functions but non-virtual destructor [-Wdelete-non-abstract-non-virtual-dtor]
delete a;
^
1 warning generated.
Everything is rather simple when we’re only dealing with single inheritance, but what if we have multiple inheritance? For example, let’s expand our animal taxonomy:
struct Animal {
int hp;
virtual ~Animal() {
std::cout << "x_x\n";
}
virtual void eat() {
std::cout << "nom nom\n";
}
virtual void sleep() {
std::cout << "zzz\n";
}
};
struct Beaked {
virtual ~Beaked() {}
virtual void peck() {
std::cout << "whoosh\n";
}
};
struct Bird : Animal, Beaked {
~Bird() {
std::cout << "splat\n";
}
virtual void fly() {
std::cout << "wheee\n";
}
virtual void eat() override {
std::cout << "gobble\n";
}
virtual void peck() override {
std::cout << "ouch\n";
}
};
Of course, the code still works as we expect and the output is the same. However, the layout in memory is now quite different:
Note that we added an int field (hp) to the
Animal class, and that base top-offset for our
impl Beaked for Bird is now 16. For multiple inheritance, the
vtable must also store the offset from the base pointer for each subclass;
in this case, the Beaked object really only starts after the
Animal object in a Bird.
We can verify this with
dynamic_cast:
Animal* birb = new Bird();
auto anim = dynamic_cast<Animal*>(birb);
auto beak = dynamic_cast<Beaked*>(birb);
std::cout << "birb = " //
<< (void*) birb << "\n";
std::cout << "anim = " //
<< (void*) anim << "\n";
std::cout << "beak = " //
<< (void*) beak << "\n";
$ ./animal3.out
birb = 0x4062a0
anim = 0x4062a0
beak = 0x4062b0
Beaked is
offset from the address of Bird
Indeed, we see that the address of beak is higher than that
of birb. As for why the offset is 16 and not just 4: the
Animal sub-object carries its own vtable pointer (8 bytes on
this platform) in front of the 4-byte hp field, and the total
is then padded up for alignment. Either way, we do see that the address of
the Beaked sub-object is not the same as the
Animal (which is the same as Bird).
The actual implementation of the “base top-offset” varies, but one method
is to have vtable point to a thunk (small function), which first
performs the offset to
this, and then calls the actual implementation.
Alternatively, we can implement the base-to-top offset in our dispatch handler like so:
struct Expr {
struct Vtable {
ptrdiff_t base_top_offset;
int (*evaluate_ptr)(Expr*);
void (*print_ptr)(Expr*);
};
Vtable* vtable_ptr;
Expr() = delete;
protected:
Expr(Vtable* vtable_ptr) : vtable_ptr{vtable_ptr} {}
public:
int evaluate() {
return vtable_ptr->evaluate_ptr(this);
}
void print() {
vtable_ptr->print_ptr(this);
}
};
struct Value : Expr {
int value;
Value(int x) : Expr{&impl_Expr_for_Value}, value(x) {}
static int evaluate_impl(Expr* expr) {
Value* self =
reinterpret_cast<Value*>(reinterpret_cast<char*>(expr) -
expr->vtable_ptr->base_top_offset);
return self->value;
}
static void print_impl(Expr* expr) {
Value* self =
reinterpret_cast<Value*>(reinterpret_cast<char*>(expr) -
expr->vtable_ptr->base_top_offset);
std::cout << self->value;
}
static inline Expr::Vtable impl_Expr_for_Value{
0,
evaluate_impl,
print_impl,
};
};
We used
dynamic_cast
above, but
static_cast
would have also worked, and it does perform the correct address offsetting
in multiple-inheritance scenarios:
Bird* birb = new Bird();
auto beak1 = static_cast<Beaked*>(birb);
auto beak2 = (Beaked*) birb;
std::cout << "birb = " //
<< (void*) birb << "\n";
std::cout << "beak1 = " //
<< (void*) beak1 << "\n";
std::cout << "beak2 = " //
<< (void*) beak2 << "\n";
$ ./animal4.out
birb = 0x4062a0
beak1 = 0x4062b0
beak2 = 0x4062b0
static_cast
also performs the offset computation
As an aside, C-style casts (like
(Beaked*) birb) will first try
static_cast
before
reinterpret_cast, so in this case it does the correct thing. If we explicitly use
reinterpret_cast
however, we don’t get the automatic offset computation!
Bird* birb = new Bird();
auto beak = reinterpret_cast<Beaked*>(birb);
std::cout << "birb = " //
<< (void*) birb << "\n";
std::cout << "beak = " //
<< (void*) beak << "\n";
$ ./animal5.out
birb = 0x4062a0
beak = 0x4062a0
reinterpret_cast
does not do offset computation!
Thankfully though, the compiler helpfully warns us:
clang++ -g -stdlib=libc++ -std=c++20 -Wpedantic -Wall -Wextra -Wconversion -Wno-unused-variable -Wno-unused-but-set-variable -c -o animal5.o animal5.cpp
clang++ -g -stdlib=libc++ -std=c++20 -Wpedantic -Wall -Wextra -Wconversion -Wno-unused-variable -Wno-unused-but-set-variable animal5.o -o animal5.out
animal5.cpp:50:15: warning: 'reinterpret_cast' from class 'Bird *' to its base at non-zero offset 'Beaked *' behaves differently from 'static_cast' [-Wreinterpret-base-class]
auto beak = reinterpret_cast<Beaked*>(birb);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
animal5.cpp:50:15: note: use 'static_cast' to adjust the pointer correctly while upcasting
auto beak = reinterpret_cast<Beaked*>(birb);
^~~~~~~~~~~~~~~~
static_cast
1 warning generated.
The compiler warns us about reinterpret_cast in this case
Since C++ allows for multiple inheritance, we can run into the “dreaded” diamond inheritance problem:
struct Widget {
virtual void event() {
std::cout << "boop\n";
};
};
struct Image : Widget {};
struct Button : Widget {
virtual void event() override {
std::cout << "clicked\n";
};
};
struct ImageButton : Image, Button {};
In the memory layout diagram on the right (vtable pointers omitted), we
see that there are two instances of the Widget class in an
ImageButton — one belonging to the Image base
class, and one belonging to Button.
This is clearly suboptimal, especially if Widget has many
fields. Even worse, ImageButton becomes a pain to use:
clang++ -g -stdlib=libc++ -std=c++20 -Wpedantic -Wall -Wextra -Wconversion -Wno-unused-variable -Wno-unused-but-set-variable -c -o widget-broken.o widget-broken.cpp
widget-broken.cpp:24:7: error: member 'event' found in multiple base classes of different types
ib->event();
^
widget-broken.cpp:6:16: note: member found by ambiguous name lookup
virtual void event() {
^
widget-broken.cpp:14:16: note: member found by ambiguous name lookup
virtual void event() override {
^
1 error generated.
make: *** [Makefile:25: widget-broken.o] Error 1
There are two ways around it; either we perform a cast to the appropriate base class whose method we want to call:
auto ib = new ImageButton();
static_cast<Image*>(ib)->event();
static_cast<Button*>(ib)->event();
$ ./wig.out | sed '1,/<1/d;/1>/,$d'
boop
clicked
Or we explicitly qualify the function call:
auto ib = new ImageButton();
ib->Image::event();
ib->Button::event();
$ ./wig.out | sed '1,/<2/d;/2>/,$d'
boop
clicked
Either way, this is not very nice to use. Enter virtual inheritance, where we simply make Image and Button inherit from Widget virtually:
struct Image : virtual Widget {};
struct Button : virtual Widget {
virtual void event() override {
std::cout << "clicked\n";
};
};
auto ib = new ImageButton();
ib->event();
$ ./wig2.out | sed '1,/<1/d;/1>/,$d'
clicked
Virtual inheritance ensures that there is only one instance of the base
class (in this case, Widget) per object. Since there is now no
longer any ambiguity, we can also just call event() without
qualification or casting, and it’ll do the right thing.
Note that we had to add virtual to the classes inheriting from Widget — not the top-level class. To guarantee that only one instance of Widget exists in your class hierarchy, every class deriving from Widget should use virtual inheritance.
The layout and implementation of virtual inheritance is quite complex, and we won’t go into it too much (and most of the time, it’s not necessary to know the full details). For starters, the layout now looks something like this:
Notice that there is now only one copy of Widget in the
object. Also, there are two new pointers in Image and
Button — these tell each class the location of its base class (both
inherit Widget). This is necessary because it is
impossible to know the exact offset at compile-time. It’s called a virtual
base pointer because… it points to the virtual base class, though these
are commonly stored as offsets instead.
Also omitted from the diagram are other offsets, notably one in each base class giving the offset to the start of the complete object, which is required for some pointer operations.
Since virtual methods require an extra level of indirection that cannot be resolved at compile-time, they generally have a non-zero performance penalty. Furthermore, they can’t be inlined (again, because in general we do not know the target at compile-time), so an optimisation opportunity is lost.
This can materialise as instruction cache misses or branch mis-predictions, or it can be a complete non-issue if the compiler was able to sufficiently optimise the code. As with all performance-sensitive code though, please benchmark your own use-case first!
Since virtual methods are so widespread (not just in C++, but also in other languages like Java), compiler designers have come up with many novel ways to optimise them, and you should not abandon virtual dispatch just because it might be slow.
Reference: Eli Bendersky - The cost of dynamic (virtual calls) vs. static (CRTP) dispatch in C++
We won’t repeat the article above, but the TL;DR is that because everything is known at compile-time for CRTP, the compiler is able to optimise a lot better — mostly by inlining code. The additional cost of an indirect call, especially in a tight loop, is very high.
Directly stored function pointers require fewer indirections, but vtable pointers are more space-efficient (when there is more than one function pointer).
Reference: Arthur O’Dwyer - When can the C++ compiler devirtualise a call?
In some situations, the compiler can devirtualise a virtual function call — that is, transform it into a normal function call that doesn’t go through the vtable and doesn’t depend on the dynamic type of the object; we’ll cover some cases here.
The first (and simplest) case is when the compiler knows exactly what the dynamic type of the object is, like in these cases:
// this case is trivial
Derived d;
d.foo();
// a sufficiently smart compiler can also detect this case
Base* b = new Derived();
b->foo();
// it needs to be a little smarter to still handle this case
Derived d1, d2;
Base* b = rand() % 2 ? &d1 : &d2;
b->foo();
Another case is when the compiler knows that a given class is a “leaf” class — that is, no other class derives from it. One easy way to guarantee this is to mark the method final:
struct Derived : Base {
void foo() final {
// ...
}
};
Derived* d = ...;
d->foo();
Another more esoteric case is when a class has internal linkage — if nobody outside of the translation unit can even name the class, then of course nobody else can derive from it.
Reference/case study: allocgate (zig)
When comparing between internally stored function pointers (embedded interfaces) and vtable pointers (either internal like C++, or external as fat pointers like Rust), the primary concern is usually object size — copying around the list of function pointers (especially if there are many) can be slow.
However, a large factor is also performance, specifically the ability for the compiler to devirtualise the virtual call. As mentioned above, compiler authors have come up with many techniques and heuristics to detect and optimise virtual calls, but of course they can only detect a certain “shape” of usage.
In Zig’s case above, the LLVM backend was unable to optimise the embedded interface pointers well, whereas it devirtualised the fat-pointer implementation much more effectively.
© 14 July 2022, Thomas Tan, Ng Zhia Yang, All Rights Reserved