Written by:
Last updated: 24 June 2022
Please fill in the feedback form for lecture 3 here.
std::vector usage
// vectors.cpp
#include <iostream>
#include <vector>
// just skip this thing, it's not that important
template <typename Container>
struct Printer {
char Open;
char Close;
char Separator;
const Container& container;
friend std::ostream& operator<<(std::ostream& os, const Printer& p) {
os << p.Open;
for (auto it = p.container.begin(); it != p.container.end(); ++it) {
if (it != p.container.begin()) {
os << p.Separator << ' ';
}
os << *it;
}
os << p.Close;
return os;
}
};
template <typename Container>
Printer<Container> print(const Container& cont,
char open = '[',
char close = ']',
char sep = ',') {
return Printer<Container>{open, close, sep, cont};
}
// start reading from here:
int main() {
std::vector<int> xs{1, 2, 3};
std::cout << "xs = " << print(xs) << "\n";
std::cout << "xs.size() = " //
<< xs.size() << "\n";
std::cout << "x[0] = " //
<< xs[0] << "\n";
std::cout << "x[1] = " //
<< xs[1] << "\n";
xs.push_back(10);
xs.push_back(20);
std::cout << "xs.size() = " //
<< xs.size() << "\n";
xs.pop_back();
std::cout << "xs = " << print(xs) << "\n";
std::vector<int> ys{};
ys = xs;
std::cout << "ys = " << print(ys) << "\n";
#if 0
std::cout << xs[99] << "\n";
std::cout << xs.at(99) << "\n";
#endif
{
std::vector<int> default_constructed;
std::vector<int> initialiser_list{1, 2, 3, 4};
std::vector<int> copy_constructed{initialiser_list};
std::cout << "default = " << print(default_constructed) << "\n";
std::cout << "init_list = " << print(initialiser_list) << "\n";
std::cout << "copy_ctor = " << print(copy_constructed) << "\n";
}
{
std::vector<int> a{1, 2, 3};
std::vector<int> b{4, 5, 6};
std::cout << "a = " //
<< print(a) //
<< "\n";
std::cout << "b = " //
<< print(b) //
<< "\n";
a.swap(b);
std::cout << "a = " //
<< print(a) //
<< "\n";
std::cout << "b = " //
<< print(b) //
<< "\n";
b.swap(a);
std::cout << "a = " //
<< print(a) //
<< "\n";
std::cout << "b = " //
<< print(b) //
<< "\n";
}
}
One of the most useful standard container types that you will use in C++
is the humble std::vector, which is basically a
dynamically-resizeable array — you might know this as lists in Python,
ArrayList in Java, etc.
Since the full implementation of std::vector is quite
involved, we’ll cover only the basics for now, and use our own
SimpleVector as a surrogate for explaining how containers and
(basic) templates work.
C++ templates are syntactically (superficially) similar to Java generics
(if you’re familiar with those); to instantiate a templated type
like std::vector, simply put angle brackets after the type
name, and the type you want to instantiate with within. For example, to
make a vector of integers, you can do this:
std::vector<int> xs{1, 2, 3};
std::cout << "xs = " << print(xs) << "\n";
$ ./vectors.out | head -n1
xs = [1, 2, 3]
std::vector usage
You can think of templated classes as a sort of higher-kinded type — by
passing in arguments (in this case, we passed in
int), you can create a new type (in this case,
std::vector<int>). Note that
std::vector
(or any templated class) on its own is not a valid type; all template
parameters must be provided1.
ArrayList<int>
for instance, you can’t!). In fact, C++ even supports non-type template
parameters (aka values) as template arguments — making them sort of like
compile-time arguments.
If you’re familiar with Java, note that C++ templates do not perform type erasure — there will be multiple copies of the function present in the final executable, one for each instantiation that appears in the program. This means that there are no performance drawbacks involved with type-checking/casting at runtime, since each function instantiation is specific for that type.
As a result, templated things (functions, types) are usually quite efficient, since they tend to give the compiler more opportunities to optimise things (given that they are now known at compile-time). The only (potential) drawback here is executable size.
std::vector
Now that we have introduced how to create a
std::vector<int>
object, it’s time to start using it. std::vector supports
most of the basic things that we expect a dynamic array to support:
std::cout << "xs.size() = " //
<< xs.size() << "\n";
std::cout << "x[0] = " //
<< xs[0] << "\n";
std::cout << "x[1] = " //
<< xs[1] << "\n";
xs.push_back(10);
xs.push_back(20);
std::cout << "xs.size() = " //
<< xs.size() << "\n";
xs.pop_back();
std::cout << "xs = " << print(xs) << "\n";
std::vector<int> ys{};
ys = xs;
std::cout << "ys = " << print(ys) << "\n";
$ ./vectors.out | head -n7
xs = [1, 2, 3]
xs.size() = 3
x[0] = 1
x[1] = 2
xs.size() = 5
xs = [1, 2, 3, 10]
ys = [1, 2, 3, 10]
std::vector usage
As we can see above:
.size() can be used to query the number of elements in the
vector
.push_back() can be used to add items to the end of the
vector
.pop_back() can be used to remove elements from the end of
the vector
[] can be used to access elements by index (the operator is
overloaded)
= can be used to assign the left-hand side a copy of the
right-hand side
Note that removing items at arbitrary indices requires knowledge of iterators, which we’ll cover later :D
std::vector
Note that
operator[]
does not perform bounds checking by default (even in debug builds,
unless you’re using MSVC). If we tried to access say the 100th element
in xs, we would invoke undefined behaviour; it might crash,
but it might also give a garbage value:
std::cout << xs[99] << "\n";
$ ./vectors.out
xs[99] = 0
On the other hand, if we used .at(99) instead, we get an
exception:
std::cout << xs.at(99) << "\n";
$ ./vectors.out
libc++abi: terminating with uncaught exception of
type std::out_of_range: vector
make: *** [vectors.out.run] Abort trap: 6
You may be wondering what the implementation of
print() looks like, since trying to use
std::cout <<
xs
directly won’t work. Well, the implementation is quite involved (and
requires knowledge of iterators), but we’ll just include it here for
reference:
template <typename Container>
struct Printer {
char Open;
char Close;
char Separator;
const Container& container;
friend std::ostream& operator<<(std::ostream& os, const Printer& p) {
os << p.Open;
for (auto it = p.container.begin(); it != p.container.end(); ++it) {
if (it != p.container.begin()) {
os << p.Separator << ' ';
}
os << *it;
}
os << p.Close;
return os;
}
};
template <typename Container>
Printer<Container> print(const Container& cont,
char open = '[',
char close = ']',
char sep = ',') {
return Printer<Container>{open, close, sep, cont};
}
The idea is basically to accept any type in our
print function, and the
operator<<
in the printer expects the container to be iterable (which STL
containers are). It also expects the container’s element type to support
operator<<, which for our case is usually true.
std::vector constructors
You might have noticed that we constructed our xs vector and
gave it the 3 elements directly; this is one of the constructor overloads
for std::vector. We also constructed ys using
the default constructor. We’ll cover some of the useful ones here:
std::vector<int> default_constructed;
std::vector<int> initialiser_list{1, 2, 3, 4};
std::vector<int> copy_constructed{initialiser_list};
std::vector
We can verify that these constructors do what they’re supposed to:
std::cout << "default = " << print(default_constructed) << "\n";
std::cout << "init_list = " << print(initialiser_list) << "\n";
std::cout << "copy_ctor = " << print(copy_constructed) << "\n";
$ ./vectors.out | head -n10 | tail -n3
default = []
init_list = [1, 2, 3, 4]
copy_ctor = [1, 2, 3, 4]
One related function (which we’ll need later) is swap. As the name
suggests, it swaps the left vector (the one that .swap() is
called on) with the argument passed to it. Suppose we start with these:
std::vector<int> a{1, 2, 3};
std::vector<int> b{4, 5, 6};
std::cout << "a = " //
<< print(a) //
<< "\n";
std::cout << "b = " //
<< print(b) //
<< "\n";
$ ./vectors.out | head -n12 | tail -n2
a = [1, 2, 3]
b = [4, 5, 6]
After we swap them, as expected, they swap:
a.swap(b);
std::cout << "a = " //
<< print(a) //
<< "\n";
std::cout << "b = " //
<< print(b) //
<< "\n";
$ ./vectors.out | head -n14 | tail -n2
a = [4, 5, 6]
b = [1, 2, 3]
Lastly, we can swap them back (but using b.swap(a) instead):
b.swap(a);
std::cout << "a = " //
<< print(a) //
<< "\n";
std::cout << "b = " //
<< print(b) //
<< "\n";
$ ./vectors.out | head -n16 | tail -n2
a = [1, 2, 3]
b = [4, 5, 6]
As expected, swapping behaves very intuitively. Note that instead of doing
a.swap(b), we can also do
std::swap(a,
b)
— which does the same thing.
std::vector layout
std::vector uses a layer of indirection, so the contained
elements are not stored directly within the vector object itself
(otherwise it would not be resizeable). The buffer itself lives on the
heap, and elements are contiguous in the buffer, just like normal array.
Taking that into account, the layout of std::vector looks
something like this:
// simple-vector.cpp
// NOTE: SimpleVector1 and SimpleVector2 don't handle all the edge cases,
// so use SimpleVector for reference.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <iostream>
#include <iterator>
#include <ostream>
#include <utility>
template <typename ElemTy>
struct SimpleVector1 {
private:
ElemTy* m_buffer;
size_t m_size;
public:
SimpleVector1() : m_buffer(nullptr), m_size(0) {}
SimpleVector1(const SimpleVector1& other) : m_size(other.m_size) {
m_buffer = new ElemTy[m_size];
for (size_t i = 0; i < m_size; i++) {
m_buffer[i] = other.m_buffer[i];
}
}
SimpleVector1(std::initializer_list<ElemTy> xs) : m_size(xs.size()) {
m_buffer = new ElemTy[m_size];
for (size_t i = 0; i < xs.size(); i++) {
m_buffer[i] = std::data(xs)[i];
}
}
~SimpleVector1() {
delete[] m_buffer;
}
// note: broken
SimpleVector1& operator=(const SimpleVector1& other) {
delete[] m_buffer;
m_buffer = new ElemTy[other.m_size];
m_size = other.m_size;
for (size_t i = 0; i < m_size; i++) {
m_buffer[i] = other.m_buffer[i];
}
return *this;
}
void push_back(const ElemTy& element) {
ElemTy* new_buffer = new ElemTy[m_size + 1];
for (size_t i = 0; i < m_size; i++) {
new_buffer[i] = m_buffer[i];
}
new_buffer[m_size] = element;
m_size += 1;
delete[] m_buffer;
m_buffer = new_buffer;
}
ElemTy pop_back() {
assert(m_size > 0);
ElemTy* new_buffer = new ElemTy[m_size - 1];
for (size_t i = 0; i < m_size - 1; i++) {
new_buffer[i] = m_buffer[i];
}
ElemTy last_elem = m_buffer[m_size - 1];
m_size -= 1;
delete[] m_buffer;
m_buffer = new_buffer;
return last_elem;
}
size_t size() const {
return m_size;
}
ElemTy& operator[](size_t idx) {
return m_buffer[idx];
}
const ElemTy& operator[](size_t idx) const {
return m_buffer[idx];
}
void swap(SimpleVector1& other) {
std::swap(m_buffer, other.m_buffer);
std::swap(m_size, other.m_size);
}
friend void swap(SimpleVector1& a, SimpleVector1& b) {
a.swap(b);
}
friend std::ostream& operator<<(std::ostream& os, SimpleVector1& vec) {
os << "[";
for (size_t i = 0; i < vec.size(); i++) {
if (i > 0) {
os << ", ";
}
os << vec[i];
}
os << "]";
return os;
}
// the stuff below lets us conform to the iterator API, which will
// be covered a little later in this lecture.
using value_type = ElemTy;
using reference = ElemTy&;
using const_reference = const ElemTy&;
using iterator = ElemTy*;
using const_iterator = const ElemTy*;
using difference_type = std::ptrdiff_t;
using size_type = std::size_t;
iterator begin() {
return m_buffer;
}
iterator end() {
return m_buffer + m_size;
}
const_iterator begin() const {
return m_buffer;
}
const_iterator end() const {
return m_buffer + m_size;
}
const_iterator cbegin() const {
return m_buffer;
}
const_iterator cend() const {
return m_buffer + m_size;
}
};
// --------------------------------------------------------
template <typename ElemTy>
struct SimpleVector2 {
private:
ElemTy* m_buffer;
size_t m_size;
public:
SimpleVector2() : m_buffer(nullptr), m_size(0) {}
SimpleVector2(const SimpleVector2& other) : m_size(other.m_size) {
m_buffer = new ElemTy[m_size];
for (size_t i = 0; i < m_size; i++) {
m_buffer[i] = other.m_buffer[i];
}
}
SimpleVector2(std::initializer_list<ElemTy> init_list)
: m_size(init_list.size()) {
m_buffer = new ElemTy[m_size];
for (size_t i = 0; i < init_list.size(); i++) {
m_buffer[i] = std::data(init_list)[i];
}
}
~SimpleVector2() {
delete[] m_buffer;
}
// note: (still) broken
SimpleVector2& operator=(const SimpleVector2& other) {
if (this == &other) {
return *this;
}
delete[] m_buffer;
m_buffer = new ElemTy[other.m_size];
m_size = other.m_size;
for (size_t i = 0; i < m_size; i++) {
m_buffer[i] = other.m_buffer[i];
}
return *this;
}
void push_back(const ElemTy& element) {
ElemTy* new_buffer = new ElemTy[m_size + 1];
for (size_t i = 0; i < m_size; i++) {
new_buffer[i] = m_buffer[i];
}
new_buffer[m_size] = element;
m_size += 1;
delete[] m_buffer;
m_buffer = new_buffer;
}
ElemTy pop_back() {
assert(m_size > 0);
ElemTy* new_buffer = new ElemTy[m_size - 1];
for (size_t i = 0; i < m_size - 1; i++) {
new_buffer[i] = m_buffer[i];
}
ElemTy last_elem = m_buffer[m_size - 1];
m_size -= 1;
delete[] m_buffer;
m_buffer = new_buffer;
return last_elem;
}
size_t size() const {
return m_size;
}
ElemTy& operator[](size_t idx) {
return m_buffer[idx];
}
const ElemTy& operator[](size_t idx) const {
return m_buffer[idx];
}
void swap(SimpleVector2& other) {
std::swap(m_buffer, other.m_buffer);
std::swap(m_size, other.m_size);
}
friend void swap(SimpleVector2& a, SimpleVector2& b) {
a.swap(b);
}
friend std::ostream& operator<<(std::ostream& os, SimpleVector2& vec) {
os << "[";
for (size_t i = 0; i < vec.size(); i++) {
if (i > 0) {
os << ", ";
}
os << vec[i];
}
os << "]";
return os;
}
using value_type = ElemTy;
using reference = ElemTy&;
using const_reference = const ElemTy&;
using iterator = ElemTy*;
using const_iterator = const ElemTy*;
using difference_type = std::ptrdiff_t;
using size_type = std::size_t;
iterator begin() {
return m_buffer;
}
iterator end() {
return m_buffer + m_size;
}
const_iterator begin() const {
return m_buffer;
}
const_iterator end() const {
return m_buffer + m_size;
}
const_iterator cbegin() const {
return m_buffer;
}
const_iterator cend() const {
return m_buffer + m_size;
}
};
// --------------------------------------------------------
template <typename ElemTy>
struct SimpleVector {
private:
ElemTy* m_buffer;
size_t m_size;
public:
SimpleVector() : m_buffer(nullptr), m_size(0) {}
SimpleVector(const SimpleVector& other) : m_size(other.m_size) {
m_buffer = new ElemTy[m_size];
for (size_t i = 0; i < m_size; i++) {
m_buffer[i] = other.m_buffer[i];
}
}
SimpleVector(std::initializer_list<ElemTy> init_list)
: m_size(init_list.size()) {
m_buffer = new ElemTy[m_size];
for (size_t i = 0; i < init_list.size(); i++) {
m_buffer[i] = std::data(init_list)[i];
}
}
~SimpleVector() {
delete[] m_buffer;
}
// note: correct (finally)
SimpleVector& operator=(const SimpleVector& other) {
SimpleVector copy{other};
this->swap(copy);
return *this;
}
#if 0
SimpleVector& operator=(SimpleVector copy) {
this->swap(copy);
return *this;
}
#endif
void push_back(const ElemTy& element) {
ElemTy* new_buffer = new ElemTy[m_size + 1];
for (size_t i = 0; i < m_size; i++) {
new_buffer[i] = m_buffer[i];
}
new_buffer[m_size] = element;
m_size += 1;
delete[] m_buffer;
m_buffer = new_buffer;
}
ElemTy pop_back() {
assert(m_size > 0);
ElemTy* new_buffer = new ElemTy[m_size - 1];
for (size_t i = 0; i < m_size - 1; i++) {
new_buffer[i] = m_buffer[i];
}
ElemTy last_elem = m_buffer[m_size - 1];
m_size -= 1;
delete[] m_buffer;
m_buffer = new_buffer;
return last_elem;
}
size_t size() const {
return m_size;
}
ElemTy& operator[](size_t idx) {
return m_buffer[idx];
}
const ElemTy& operator[](size_t idx) const {
return m_buffer[idx];
}
void swap(SimpleVector& other) {
std::swap(m_buffer, other.m_buffer);
std::swap(m_size, other.m_size);
}
friend void swap(SimpleVector& a, SimpleVector& b) {
a.swap(b);
}
friend std::ostream& operator<<(std::ostream& os, SimpleVector& vec) {
os << "[";
for (size_t i = 0; i < vec.size(); i++) {
if (i > 0) {
os << ", ";
}
os << vec[i];
}
os << "]";
return os;
}
using value_type = ElemTy;
using reference = ElemTy&;
using const_reference = const ElemTy&;
using iterator = ElemTy*;
using const_iterator = const ElemTy*;
using difference_type = std::ptrdiff_t;
using size_type = std::size_t;
iterator begin() {
return m_buffer;
}
iterator end() {
return m_buffer + m_size;
}
const_iterator begin() const {
return m_buffer;
}
const_iterator end() const {
return m_buffer + m_size;
}
const_iterator cbegin() const {
return m_buffer;
}
const_iterator cend() const {
return m_buffer + m_size;
}
};
int main() {
SimpleVector1<int> sv1{1, 2, 3, 4};
sv1 = sv1;
std::cout << "sv1[0] = " << sv1[0] << "\n";
SimpleVector2<int> sv2{1, 2, 3, 4};
sv2 = sv2;
std::cout << "sv2[0] = " << sv2[0] << "\n";
struct Nest {
SimpleVector2<Nest> nests;
};
Nest nested{{Nest{{Nest{}, Nest{}}}}};
nested = nested.nests[0];
}
Now that we’ve seen what some of the basic functionality of
std::vector is, we can try to replicate that with our own
implementation, which we’ll unimaginatively call
SimpleVector.
Because we haven’t covered some of the advanced concepts yet, our vector
cannot actually fulfil all the duties of a std::vector, but
it’ll suffice for illustrative purposes. We also won’t implement anything
smart — it will have no “excess capacity” and always resize on each
append, and performs too many copies — but we’ll fix those in due time.
Before we get to any of the methods in SimpleVector, we need
to start with its declaration. How do we write a templated class?
(somewhat) simple:
template <typename ElemTy>
struct SimpleVector1 {
A template declaration (or definition) starts with the
template
keyword, followed by a pair of < and >.
Within those angle brackets, one or more template parameters can be
declared. Here, we used
typename ElemTy
to say that we accept one parameter that is a type (hence
typename)2, and the name of that parameter is ElemTy.
We then follow that with a normal struct definition.
We start with the members of the vector. For the most simple use case, we just need a pointer (to our array) and a size (the number of elements):
private:
ElemTy* m_buffer;
size_t m_size;
SimpleVector
Since we declared ElemTy as a template parameter, we can use
it inside the struct definition; here we made a pointer to
ElemTy. Once our SimpleVector is instantiated,
ElemTy will be replaced with the type we passed in the
template arguments, eg.
int.
Note that we declared these fields
private, and prefixed their names with m_ — this is just a
convention that makes it easier to distinguish members from non-members
when not using this-> explicitly.
Next, we’ll have the constructors. We’ll support the 3 kinds of
constructors that we saw from std::vector earlier (there are
more, of course, but those will come later).
This simply creates an empty vector:
SimpleVector1() : m_buffer(nullptr), m_size(0) {}
SimpleVector
We initialize m_buffer with nullptr, and
m_size with 0. The body itself is empty, since we don’t need
to actually do anything else.
Next, we have the copy constructor, which makes a copy of the elements from another vector:
SimpleVector1(const SimpleVector1& other) : m_size(other.m_size) {
m_buffer = new ElemTy[m_size];
for (size_t i = 0; i < m_size; i++) {
m_buffer[i] = other.m_buffer[i];
}
}
SimpleVector
The steps we need to perform are:
Note that our SimpleVector follows the rule of 3 mentioned in
Lecture 3; since we are writing a custom copy constructor (since we want
to make a copy of the buffer contents), we also need a custom
destructor and copy-assignment operator, both of which we will write
below.
std::initializer_list constructor
Lastly, we have the constructor that takes in a
std::initializer_list. A braced list of items, eg.
{1, 2, 3} will prefer to deduce to
std::initializer_list<int>
(we’ll cover this in more detail in a future lecture), and for our
purposes right now, we can just treat it as an array of elements.
We can use the size and data functions to get
the number of elements as well as a pointer to the first one, so we can
perform very similar steps as the copy constructor above:
SimpleVector1(std::initializer_list<ElemTy> xs) : m_size(xs.size()) {
m_buffer = new ElemTy[m_size];
for (size_t i = 0; i < xs.size(); i++) {
m_buffer[i] = std::data(xs)[i];
}
}
std::initializer_list constructor for
SimpleVector
Since we own the buffer we used to hold the elements (and allocated it
with new), we must also define a destructor that is responsible for performing
delete on the
array:
~SimpleVector1() {
delete[] m_buffer;
}
SimpleVector
Note that, even though m_buffer can be
nullptr (since
that’s what an empty array would be), it’s not necessary to check for this
before performing the
delete, since it is explicitly defined to do nothing when passed a
nullptr (this
also applies when we do
delete later
on).
Now that the boilerplate is out of the way, we can implement the actual
functionality, which are push_back and pop_back.
Again, note that we are not implementing this optimally; we’ll do that in
a future lecture.
First, appending:
void push_back(const ElemTy& element) {
ElemTy* new_buffer = new ElemTy[m_size + 1];
for (size_t i = 0; i < m_size; i++) {
new_buffer[i] = m_buffer[i];
}
new_buffer[m_size] = element;
m_size += 1;
delete[] m_buffer;
m_buffer = new_buffer;
}
push_back for
SimpleVector
This is quite straightforward; we make a new buffer, copy the existing elements from the old buffer to the new one, then add the new element, before deleting the old buffer.
Some things to note here:
const ElemTy&.
This is to avoid making a copy, since we will already be copy-assigning
into the new buffer.
memcpy to copy the old buffer
to the new; this is because it only performs a bytewise copy, which is
insufficient for non-POD3
types (eg. std::string) which have copy-constructors and
copy-assignment operators.
Next, popping from the back; this is much the same (with the same caveats as above), just in reverse:
ElemTy pop_back() {
assert(m_size > 0);
ElemTy* new_buffer = new ElemTy[m_size - 1];
for (size_t i = 0; i < m_size - 1; i++) {
new_buffer[i] = m_buffer[i];
}
ElemTy last_elem = m_buffer[m_size - 1];
m_size -= 1;
delete[] m_buffer;
m_buffer = new_buffer;
return last_elem;
}
push_back for
SimpleVector
Note that we also return the element we popped here just for convenience —
std::vector doesn’t do that, which sometimes can be kind of
annoying.
Neither of these methods are const-qualified, since they modify the vector.
Next, we can implement various bits and bobs, starting with getting the size of our vector:
size_t size() const {
return m_size;
}
That’s it, very simple. It is const-qualified, since it doesn’t need to modify any of the fields.
After that, the subscript operators:
ElemTy& operator[](size_t idx) {
return m_buffer[idx];
}
const ElemTy& operator[](size_t idx) const {
return m_buffer[idx];
}
Note that we need to define two separate operators:
Finally, the swap methods:
void swap(SimpleVector1& other) {
std::swap(m_buffer, other.m_buffer);
std::swap(m_size, other.m_size);
}
friend void swap(SimpleVector1& a, SimpleVector1& b) {
a.swap(b);
}
swap to implement some other methods
later on
In order for our type to be swappable, we need to implement the non-member
swap, done here with a friend function. We also implement the
swap member function, which simply swaps the individual
fields of the vector. This implementation is quite simple since the value
of the vector is really just its fields.
operator<<
As a convenience for ourselves, let’s overload
operator<<
so that we can easily print our SimpleVector. The
implementation is quite straightforward; it simply iterates through all
the elements and calls their own overload of
operator<<
(assuming it exists).
friend std::ostream& operator<<(std::ostream& os, SimpleVector1& vec) {
os << "[";
for (size_t i = 0; i < vec.size(); i++) {
if (i > 0) {
os << ", ";
}
os << vec[i];
}
os << "]";
return os;
}
operator<<
for our SimpleVector
Finally, we come to the tricky part — the copy-assignment operator. Its purpose is to copy the contents of the right hand operand into the left-hand operand, replacing its old contents.
The straightforward way to implement this operator would be something like this:
SimpleVector1& operator=(const SimpleVector1& other) {
delete[] m_buffer;
m_buffer = new ElemTy[other.m_size];
m_size = other.m_size;
for (size_t i = 0; i < m_size; i++) {
m_buffer[i] = other.m_buffer[i];
}
return *this;
}
Again, the idea is quite simple — we delete our old buffer, make a new one
of the appropriate size, and copy the elements over. Note that we must
return *this
as mentioned in Lecture 2, to allow for chaining assignments a-la
a = b = c.
However, the keener-eyed amongst you might have noticed that the header says “(broken)”. Why is that? Suppose we had the following code, which performs a self-assignment:
SimpleVector1<int> sv1{1, 2, 3, 4};
sv1 = sv1;
std::cout << "sv1[0] = " << sv1[0] << "\n";
Now, the most intuitive behaviour from this is that nothing happens, and
sv is left unmodified. However, if we actually run this code:
$ ./simple-vector.out | head -n1
Our vector suddenly changed! Clearly, this didn’t actually perform a
no-op. Keep in mind that in this scenario,
this and
other refer to the exact same vector. When we do
delete[]
m_buffer
on the first line, there’s only one buffer! So we also delete
other.m_buffer, and when we read from it later (in the loop),
we’re actually reading uninitialized memory.
Note that this is the same broken behaviour for self-assignment as we saw
in OwnedInt from the previous lecture.
Given that, we can fix it with the same pattern — simply check for self-assignment and return early:
SimpleVector2& operator=(const SimpleVector2& other) {
if (this == &other) {
return *this;
}
delete[] m_buffer;
m_buffer = new ElemTy[other.m_size];
m_size = other.m_size;
for (size_t i = 0; i < m_size; i++) {
m_buffer[i] = other.m_buffer[i];
}
return *this;
}
SimpleVector2<int> sv2{1, 2, 3, 4};
sv2 = sv2;
std::cout << "sv2[0] = " << sv2[0] << "\n";
Now we get the correct behaviour, and no garbage values:
$ ./simple-vector.out | head -n2 | tail -n1
However, we still have a problem if we have nested data structures (the heading still says “(broken)”, doesn’t it? Hehe). Suppose we have a struct like this:
struct Nest {
SimpleVector2<Nest> nests;
};
Nest nested{{Nest{{Nest{}, Nest{}}}}};
While it may look contrived, these kinds of data structures can appear in problems that are best expressed with a tree structure, like abstract syntax trees for compilers or interpreters.
Let’s try to assign from one of the inner pieces to the outer piece:
nested = nested.nests[0];
$ ./simple-vector.out
=================================================================
==1337==ERROR: AddressSanitizer: heap-use-after-free on address 0x603000000020 at pc 0x0000004ce8e9 bp 0x7fffffffe610 sp 0x7fffffffe608
READ of size 8 at 0x603000000020 thread T0
#0 0x4ce8e8 in SimpleVector2<main::Nest>::operator=(SimpleVector2<main::Nest> const&) /__w/ccc-2/ccc-2/lectures/l04/simple-vector/simple-vector.cpp:209:33
#1 0x4ce740 in main::Nest::operator=(Nest const&) /__w/ccc-2/ccc-2/lectures/l04/simple-vector/simple-vector.cpp:437:10
#2 0x4cdd2b in main /__w/ccc-2/ccc-2/lectures/l04/simple-vector/simple-vector.cpp:445:10
#3 0x7ffff7a71082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082)
#4 0x41c34d in _start (/__w/ccc-2/ccc-2/lectures/l04/simple-vector/simple-vector.out+0x41c34d)
0x603000000020 is located 16 bytes inside of 24-byte region [0x603000000010,0x603000000028)
freed by thread T0 here:
#0 0x4cb21d in operator delete[](void*) (/__w/ccc-2/ccc-2/lectures/l04/simple-vector/simple-vector.out+0x4cb21d)
#1 0x4ce8c2 in SimpleVector2<main::Nest>::operator=(SimpleVector2<main::Nest> const&) /__w/ccc-2/ccc-2/lectures/l04/simple-vector/simple-vector.cpp:208:5
#2 0x4ce740 in main::Nest::operator=(Nest const&) /__w/ccc-2/ccc-2/lectures/l04/simple-vector/simple-vector.cpp:437:10
#3 0x4cdd2b in main /__w/ccc-2/ccc-2/lectures/l04/simple-vector/simple-vector.cpp:445:10
#4 0x7ffff7a71082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082)
previously allocated by thread T0 here:
#0 0x4ca9cd in operator new[](unsigned long) (/__w/ccc-2/ccc-2/lectures/l04/simple-vector/simple-vector.out+0x4ca9cd)
#1 0x4ce3bf in SimpleVector2<main::Nest>::SimpleVector2(std::initializer_list<main::Nest>) /__w/ccc-2/ccc-2/lectures/l04/simple-vector/simple-vector.cpp:192:16
#2 0x4cdc6a in main /__w/ccc-2/ccc-2/lectures/l04/simple-vector/simple-vector.cpp:441:15
#3 0x7ffff7a71082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082)
SUMMARY: AddressSanitizer: heap-use-after-free /__w/ccc-2/ccc-2/lectures/l04/simple-vector/simple-vector.cpp:209:33 in SimpleVector2<main::Nest>::operator=(SimpleVector2<main::Nest> const&)
Shadow bytes around the buggy address:
0x0c067fff7fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c067fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c067fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c067fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c067fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c067fff8000: fa fa fd fd[fd]fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==1337==ABORTING
Oops! Note the scary output here is because we compiled with AddressSanitizer, which is a useful tool that we’ll cover later on. Basically, it lets us know when we’ve done something bad with memory (eg. here, a heap use-after-free). Let’s analyse what happens here:
this ==
&other. Since we are not doing a self-assignment, this is false, so we don’t
return early.
delete[]
m_buffer, which is the buffer of the left-hand side. Since our buffer contains
all the inner SimpleVectors by value, they also get
deleted! This means the entire data structure is now gone — including
other.
other.m_buffer, which is a
use-after-free.
Note that without AddressSanitizer, we’ll most likely not even know about this bug, and instead will be faced with silent memory corruption, which is often very difficult to debug. So this edge case is even more insidious!
At last, let’s look at the correct version, which handles this edge case as well:
SimpleVector& operator=(const SimpleVector& other) {
SimpleVector copy{other};
this->swap(copy);
return *this;
}
That’s it! This approach of writing the copy-assignment operator is known
as the Copy-and-Swap idiom, which concisely expresses what we’re
doing — first we copy the RHS into a temporary (copy), then
swap it with the LHS (ourselves). When the function returns,
copy — which now has the old contents of
this — is
destroyed, so the semantics are identical.
For self-assignment, since we do not delete the buffer early, the problem
is avoided. Note that this incurs a performance penalty for
self-assignment, since we end up with two copies of the data for a brief
moment. You can add an explicit
this ==
&other
check to fix it, or don’t — self-assignment is almost always a
bug anyway.
In case of the second edge case, where we copied from an inner part of
ourselves, we also avoid the bug because a copy of the inner part is made,
and the old data (ie. the old contents of
this) is only destroyed at the end of the function.
In summary, you should always be cognisant of self-assignment and copy-from-inner-part when writing copy assignment operators, and a good approach is to follow the copy-and-swap idiom.
An alternative approach which has been advocated by some people is to write the copy-assignment operator as taking its argument by value, like this:
SimpleVector& operator=(SimpleVector copy) {
this->swap(copy);
return *this;
}
This still functions as a copy-assignment operator according to the
standard (taking T by value is one of the allowed forms).
This way, the caller is responsible for making the copy, and otherwise
the code functions identically.
So far, we’ve shown some of the possible failure modes when implementing
a templated class like vector, and showed the copy and swap
idiom as a good way to avoid such bugs by composing two operations that
always work (copy-construct and swap).
We can take a peek at standard libraries to see how they actually implement these things!
Let’s start with libc++, whose implementation can be found here, as of 23 June 2022: https://github.com/llvm/llvm-project/blob/681cde7dd8b5613dbafc9ca54e0288477f946be3/libcxx/include/vector#L1314-L1350
All they basically do is a self-assign check, then it checks if it has enough capacity to store the new elements. If so, it will copy-assign (or move-assign) as much as possible, and copy-construct or destroy the excess. If not, it will first deallocate its buffer, before copy-constructing a new buffer.
// (simplified!)
vector<_Tp, _Allocator>& vector<_Tp, _Allocator>::operator=(
const vector& __x) {
if (this != &__x) {
if (__x.size() <= capacity()) {
// Copy-assign as much as possible which is the first min(size(),
// __x.size()) elements. The remaining elements are either excess
// (we have too many), which we call the destructor for, or we have
// less elements and there are still more elements to copy, so we
// call the copy-constructor for that.
} else {
__vdeallocate();
__vallocate(__recommend(__new_size));
__construct_at_end(__first, __last, __new_size);
}
}
return *this;
}
If you’ve been keenly following our examples so far, this is obviously
wrong, and indeed it explodes with our Nest struct.
// boom-vector.cpp
#include <iostream>
#include <vector>
struct Nest {
std::vector<Nest> nests;
};
int main() {
Nest nested{{Nest{{Nest{}, Nest{}}}, Nest{}}};
nested = nested.nests[0];
return 0;
}
Running it, we get a boom as expected!
$ ./boom-vector.libc++.out
=================================================================
==1337==ERROR: AddressSanitizer: container-overflow on address 0x6040000000a8 at pc 0x0000004ce331 bp 0x7fffffffe600 sp 0x7fffffffe5f8
READ of size 8 at 0x6040000000a8 thread T0
#0 0x4ce330 in std::__1::vector<Nest, std::__1::allocator<Nest> >::operator=(std::__1::vector<Nest, std::__1::allocator<Nest> > const&) /usr/lib/llvm-13/bin/../include/c++/v1/vector:1427:20
#1 0x4cdba0 in Nest::operator=(Nest const&) /__w/ccc-2/ccc-2/lectures/l04/boom-vector/boom-vector.cpp:6:8
#2 0x4cf14e in Nest* std::__1::__copy_constexpr<Nest*, Nest*>(Nest*, Nest*, Nest*) /usr/lib/llvm-13/bin/../include/c++/v1/__algorithm/copy.h:35:19
#3 0x4cf0f4 in Nest* std::__1::__copy<Nest*, Nest*>(Nest*, Nest*, Nest*) /usr/lib/llvm-13/bin/../include/c++/v1/__algorithm/copy.h:44:12
#4 0x4ce7fe in Nest* std::__1::copy<Nest*, Nest*>(Nest*, Nest*, Nest*) /usr/lib/llvm-13/bin/../include/c++/v1/__algorithm/copy.h:72:13
#5 0x4ce608 in std::__1::enable_if<(__is_cpp17_forward_iterator<Nest*>::value) && (is_constructible<Nest, std::__1::iterator_traits<Nest*>::reference>::value), void>::type std::__1::vector<Nest, std::__1::allocator<Nest> >::assign<Nest*>(Nest*, Nest*) /usr/lib/llvm-13/bin/../include/c++/v1/vector:1473:23
#6 0x4ce375 in std::__1::vector<Nest, std::__1::allocator<Nest> >::operator=(std::__1::vector<Nest, std::__1::allocator<Nest> > const&) /usr/lib/llvm-13/bin/../include/c++/v1/vector:1427:9
#7 0x4cdba0 in Nest::operator=(Nest const&) /__w/ccc-2/ccc-2/lectures/l04/boom-vector/boom-vector.cpp:6:8
#8 0x4cd5e3 in main /__w/ccc-2/ccc-2/lectures/l04/boom-vector/boom-vector.cpp:12:10
#9 0x7ffff7b46082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082)
#10 0x41c31d in _start (/__w/ccc-2/ccc-2/lectures/l04/boom-vector/boom-vector.libc++.out+0x41c31d)
0x6040000000a8 is located 24 bytes inside of 48-byte region [0x604000000090,0x6040000000c0)
allocated by thread T0 here:
#0 0x4ca88d in operator new(unsigned long) (/__w/ccc-2/ccc-2/lectures/l04/boom-vector/boom-vector.libc++.out+0x4ca88d)
#1 0x4d0664 in void* std::__1::__libcpp_operator_new<unsigned long>(unsigned long) /usr/lib/llvm-13/bin/../include/c++/v1/new:235:10
#2 0x4d05ac in std::__1::__libcpp_allocate(unsigned long, unsigned long) /usr/lib/llvm-13/bin/../include/c++/v1/new:261:10
#3 0x4d04e3 in std::__1::allocator<Nest>::allocate(unsigned long) /usr/lib/llvm-13/bin/../include/c++/v1/__memory/allocator.h:108:38
#4 0x4d014c in std::__1::allocator_traits<std::__1::allocator<Nest> >::allocate(std::__1::allocator<Nest>&, unsigned long) /usr/lib/llvm-13/bin/../include/c++/v1/__memory/allocator_traits.h:262:20
#5 0x4cec6f in std::__1::vector<Nest, std::__1::allocator<Nest> >::__vallocate(unsigned long) /usr/lib/llvm-13/bin/../include/c++/v1/vector:1015:37
#6 0x4cf853 in std::__1::vector<Nest, std::__1::allocator<Nest> >::vector(std::__1::vector<Nest, std::__1::allocator<Nest> > const&) /usr/lib/llvm-13/bin/../include/c++/v1/vector:1280:9
#7 0x4cf6fc in Nest::Nest(Nest const&) /__w/ccc-2/ccc-2/lectures/l04/boom-vector/boom-vector.cpp:6:8
#8 0x4d0f4c in Nest* std::__1::construct_at<Nest, Nest const&, Nest*>(Nest*, Nest const&) /usr/lib/llvm-13/bin/../include/c++/v1/__memory/construct_at.h:37:38
#9 0x4d0f10 in void std::__1::allocator_traits<std::__1::allocator<Nest> >::construct<Nest, Nest const&, void, void>(std::__1::allocator<Nest>&, Nest*, Nest const&) /usr/lib/llvm-13/bin/../include/c++/v1/__memory/allocator_traits.h:298:9
#10 0x4d0e8b in void std::__1::__construct_range_forward<std::__1::allocator<Nest>, Nest const*, Nest*>(std::__1::allocator<Nest>&, Nest const*, Nest const*, Nest*&) /usr/lib/llvm-13/bin/../include/c++/v1/memory:750:9
#11 0x4d0c5a in std::__1::enable_if<__is_cpp17_forward_iterator<Nest const*>::value, void>::type std::__1::vector<Nest, std::__1::allocator<Nest> >::__construct_at_end<Nest const*>(Nest const*, Nest const*, unsigned long) /usr/lib/llvm-13/bin/../include/c++/v1/vector:1099:5
#12 0x4cda3f in std::__1::vector<Nest, std::__1::allocator<Nest> >::vector(std::initializer_list<Nest>) /usr/lib/llvm-13/bin/../include/c++/v1/vector:1357:9
#13 0x4cd52e in main /__w/ccc-2/ccc-2/lectures/l04/boom-vector/boom-vector.cpp:11:15
#14 0x7ffff7b46082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082)
HINT: if you don't care about these errors you may set ASAN_OPTIONS=detect_container_overflow=0.
If you suspect a false positive see also: https://github.com/google/sanitizers/wiki/AddressSanitizerContainerOverflow.
SUMMARY: AddressSanitizer: container-overflow /usr/lib/llvm-13/bin/../include/c++/v1/vector:1427:20 in std::__1::vector<Nest, std::__1::allocator<Nest> >::operator=(std::__1::vector<Nest, std::__1::allocator<Nest> > const&)
Shadow bytes around the buggy address:
0x0c087fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c087fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c087fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c087fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c087fff8000: fa fa fd fd fd fd fd fd fa fa 00 00 00 00 00 00
=>0x0c087fff8010: fa fa fc fc fc[fc]fc fc fa fa fa fa fa fa fa fa
0x0c087fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c087fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c087fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c087fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c087fff8060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==1337==ABORTING
As for libstdc++, whose implementation can be found here, as of 23 June 2022: https://github.com/gcc-mirror/gcc/blob/7adcbafe45f8001b698967defe682687b52c0007/libstdc%2B%2B-v3/include/bits/vector.tcc#L229-L256
Similar to libc++, the first thing they do is a self-check, to avoid doing any work when trying to do a self-assignment.
However, control flow breaks up after that point, depending on whether
*this
has sufficient space to store the elements in the other vector. If so,
it will also try its best to use copy-assign rather than destruct +
copy-construct (or move).
vector<_Tp, _Alloc>& vector<_Tp, _Alloc>::operator=(
const vector<_Tp, _Alloc>& __x) {
if (&__x != this) {
const size_type __xlen = __x.size();
if (__xlen > capacity()) {
// Since we have not enough space to store their elements, we need
// to reallocate. (essentially) use copy-construct and swap
} else if (size() >= __xlen) {
// we have more or the same # of elements than them
// copy-assign all their elements, then destroy excess elements
} else {
// we have less elements than them, but we have free space
// copy-assign the first few of their elements then copy-construct
// their excess elements onto our free space
}
}
return *this;
}
Once again, we see that the latter two branches don’t use copy-and-swap, so it’s susceptible to the same bug.
Let’s try it!
$ ./boom-vector.out
$ # no boom?
While it seems like there is no bug, this is simply because AddressSanitizer isn’t great at tracking object lifetime. Since the bug here is that we’re reading from an object whose lifetime has ended, but not yet deallocated, AddressSanitizer doesn’t catch it in this case.
In order to actually observe the bug, we can keep a
std::string around, and the presence of a managed heap
allocation will make sure that after destruction, the heap allocation
becomes dangling. Since use-after-frees are detected by
AddressSanitizer, we will be able to see our bug.
struct Nest {
std::vector<Nest> nests;
std::string s = "Hello world very long string";
};
std::string to create a detectable
use-after-free
$ ./boom-vector-long-string.out
=================================================================
==1337==ERROR: AddressSanitizer: heap-use-after-free on address 0x603000000130 at pc 0x000000498f67 bp 0x7fffffffdca0 sp 0x7fffffffd468
READ of size 28 at 0x603000000130 thread T0
#0 0x498f66 in __asan_memcpy (/__w/ccc-2/ccc-2/lectures/l04/boom-vector/boom-vector-long-string.out+0x498f66)
#1 0x4d3790 in std::char_traits<char>::copy(char*, char const*, unsigned long) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/char_traits.h:372:33
#2 0x4d36a1 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_S_copy(char*, char const*, unsigned long) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/basic_string.h:355:4
#3 0x4d52d4 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/basic_string.tcc:272:6
#4 0x4d50d0 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/basic_string.h:1370:8
#5 0x4d02ec in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::operator=(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/basic_string.h:700:15
#6 0x4cf415 in Nest::operator=(Nest const&) /__w/ccc-2/ccc-2/lectures/l04/boom-vector/boom-vector-long-string.cpp:7:8
#7 0x4d4c79 in Nest* std::__copy_move<false, false, std::random_access_iterator_tag>::__copy_m<Nest const*, Nest*>(Nest const*, Nest const*, Nest*) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_algobase.h:342:18
#8 0x4d46d8 in Nest* std::__copy_move_a<false, Nest const*, Nest*>(Nest const*, Nest const*, Nest*) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_algobase.h:403:14
#9 0x4d41ad in __gnu_cxx::__normal_iterator<Nest*, std::vector<Nest, std::allocator<Nest> > > std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<Nest const*, std::vector<Nest, std::allocator<Nest> > >, __gnu_cxx::__normal_iterator<Nest*, std::vector<Nest, std::allocator<Nest> > > >(__gnu_cxx::__normal_iterator<Nest const*, std::vector<Nest, std::allocator<Nest> > >, __gnu_cxx::__normal_iterator<Nest const*, std::vector<Nest, std::allocator<Nest> > >, __gnu_cxx::__normal_iterator<Nest*, std::vector<Nest, std::allocator<Nest> > >) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_algobase.h:441:3
#10 0x4d1159 in __gnu_cxx::__normal_iterator<Nest*, std::vector<Nest, std::allocator<Nest> > > std::copy<__gnu_cxx::__normal_iterator<Nest const*, std::vector<Nest, std::allocator<Nest> > >, __gnu_cxx::__normal_iterator<Nest*, std::vector<Nest, std::allocator<Nest> > > >(__gnu_cxx::__normal_iterator<Nest const*, std::vector<Nest, std::allocator<Nest> > >, __gnu_cxx::__normal_iterator<Nest const*, std::vector<Nest, std::allocator<Nest> > >, __gnu_cxx::__normal_iterator<Nest*, std::vector<Nest, std::allocator<Nest> > >) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_algobase.h:473:14
#11 0x4cfef3 in std::vector<Nest, std::allocator<Nest> >::operator=(std::vector<Nest, std::allocator<Nest> > const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/vector.tcc:238:22
#12 0x4cf400 in Nest::operator=(Nest const&) /__w/ccc-2/ccc-2/lectures/l04/boom-vector/boom-vector-long-string.cpp:7:8
#13 0x4cea85 in main /__w/ccc-2/ccc-2/lectures/l04/boom-vector/boom-vector-long-string.cpp:15:10
#14 0x7ffff7a71082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082)
#15 0x41d3fd in _start (/__w/ccc-2/ccc-2/lectures/l04/boom-vector/boom-vector-long-string.out+0x41d3fd)
0x603000000130 is located 0 bytes inside of 29-byte region [0x603000000130,0x60300000014d)
freed by thread T0 here:
#0 0x4cc1cd in operator delete(void*) (/__w/ccc-2/ccc-2/lectures/l04/boom-vector/boom-vector-long-string.out+0x4cc1cd)
#1 0x4cf7cc in __gnu_cxx::new_allocator<char>::deallocate(char*, unsigned long) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/ext/new_allocator.h:128:2
#2 0x4cf794 in std::allocator_traits<std::allocator<char> >::deallocate(std::allocator<char>&, char*, unsigned long) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/alloc_traits.h:469:13
#3 0x4cf692 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_destroy(unsigned long) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/basic_string.h:241:9
#4 0x4cf5d6 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_dispose() /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/basic_string.h:236:4
#5 0x4cf538 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/basic_string.h:662:9
#6 0x4cf0fc in Nest::~Nest() /__w/ccc-2/ccc-2/lectures/l04/boom-vector/boom-vector-long-string.cpp:7:8
#7 0x4d3864 in void std::_Destroy<Nest>(Nest*) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_construct.h:98:19
#8 0x4d3d0f in void std::_Destroy_aux<false>::__destroy<__gnu_cxx::__normal_iterator<Nest*, std::vector<Nest, std::allocator<Nest> > > >(__gnu_cxx::__normal_iterator<Nest*, std::vector<Nest, std::allocator<Nest> > >, __gnu_cxx::__normal_iterator<Nest*, std::vector<Nest, std::allocator<Nest> > >) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_construct.h:108:6
#9 0x4d3b21 in void std::_Destroy<__gnu_cxx::__normal_iterator<Nest*, std::vector<Nest, std::allocator<Nest> > > >(__gnu_cxx::__normal_iterator<Nest*, std::vector<Nest, std::allocator<Nest> > >, __gnu_cxx::__normal_iterator<Nest*, std::vector<Nest, std::allocator<Nest> > >) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_construct.h:136:7
#10 0x4d0d25 in void std::_Destroy<__gnu_cxx::__normal_iterator<Nest*, std::vector<Nest, std::allocator<Nest> > >, Nest>(__gnu_cxx::__normal_iterator<Nest*, std::vector<Nest, std::allocator<Nest> > >, __gnu_cxx::__normal_iterator<Nest*, std::vector<Nest, std::allocator<Nest> > >, std::allocator<Nest>&) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_construct.h:206:7
#11 0x4d0009 in std::vector<Nest, std::allocator<Nest> >::operator=(std::vector<Nest, std::allocator<Nest> > const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/vector.tcc:238:8
#12 0x4cf400 in Nest::operator=(Nest const&) /__w/ccc-2/ccc-2/lectures/l04/boom-vector/boom-vector-long-string.cpp:7:8
#13 0x4d4c79 in Nest* std::__copy_move<false, false, std::random_access_iterator_tag>::__copy_m<Nest const*, Nest*>(Nest const*, Nest const*, Nest*) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_algobase.h:342:18
#14 0x4d46d8 in Nest* std::__copy_move_a<false, Nest const*, Nest*>(Nest const*, Nest const*, Nest*) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_algobase.h:403:14
#15 0x4d41ad in __gnu_cxx::__normal_iterator<Nest*, std::vector<Nest, std::allocator<Nest> > > std::__copy_move_a2<false, __gnu_cxx::__normal_iterator<Nest const*, std::vector<Nest, std::allocator<Nest> > >, __gnu_cxx::__normal_iterator<Nest*, std::vector<Nest, std::allocator<Nest> > > >(__gnu_cxx::__normal_iterator<Nest const*, std::vector<Nest, std::allocator<Nest> > >, __gnu_cxx::__normal_iterator<Nest const*, std::vector<Nest, std::allocator<Nest> > >, __gnu_cxx::__normal_iterator<Nest*, std::vector<Nest, std::allocator<Nest> > >) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_algobase.h:441:3
#16 0x4d1159 in __gnu_cxx::__normal_iterator<Nest*, std::vector<Nest, std::allocator<Nest> > > std::copy<__gnu_cxx::__normal_iterator<Nest const*, std::vector<Nest, std::allocator<Nest> > >, __gnu_cxx::__normal_iterator<Nest*, std::vector<Nest, std::allocator<Nest> > > >(__gnu_cxx::__normal_iterator<Nest const*, std::vector<Nest, std::allocator<Nest> > >, __gnu_cxx::__normal_iterator<Nest const*, std::vector<Nest, std::allocator<Nest> > >, __gnu_cxx::__normal_iterator<Nest*, std::vector<Nest, std::allocator<Nest> > >) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_algobase.h:473:14
#17 0x4cfef3 in std::vector<Nest, std::allocator<Nest> >::operator=(std::vector<Nest, std::allocator<Nest> > const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/vector.tcc:238:22
#18 0x4cf400 in Nest::operator=(Nest const&) /__w/ccc-2/ccc-2/lectures/l04/boom-vector/boom-vector-long-string.cpp:7:8
#19 0x4cea85 in main /__w/ccc-2/ccc-2/lectures/l04/boom-vector/boom-vector-long-string.cpp:15:10
#20 0x7ffff7a71082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082)
previously allocated by thread T0 here:
#0 0x4cb96d in operator new(unsigned long) (/__w/ccc-2/ccc-2/lectures/l04/boom-vector/boom-vector-long-string.out+0x4cb96d)
#1 0x4d364b in __gnu_cxx::new_allocator<char>::allocate(unsigned long, void const*) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/ext/new_allocator.h:114:27
#2 0x4d35b0 in std::allocator_traits<std::allocator<char> >::allocate(std::allocator<char>&, unsigned long) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/alloc_traits.h:443:20
#3 0x4d32e6 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/basic_string.tcc:153:14
#4 0x4d2e25 in void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/basic_string.tcc:219:14
#5 0x4d2cb4 in void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct_aux<char*>(char*, char*, std::__false_type) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/basic_string.h:251:11
#6 0x4d2bb4 in void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/basic_string.h:270:4
#7 0x4d263f in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/basic_string.h:455:9
#8 0x4d2085 in Nest::Nest(Nest const&) /__w/ccc-2/ccc-2/lectures/l04/boom-vector/boom-vector-long-string.cpp:7:8
#9 0x4d1f4c in void std::_Construct<Nest, Nest const&>(Nest*, Nest const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_construct.h:75:38
#10 0x4d1d0b in Nest* std::__uninitialized_copy<false>::__uninit_copy<__gnu_cxx::__normal_iterator<Nest const*, std::vector<Nest, std::allocator<Nest> > >, Nest*>(__gnu_cxx::__normal_iterator<Nest const*, std::vector<Nest, std::allocator<Nest> > >, __gnu_cxx::__normal_iterator<Nest const*, std::vector<Nest, std::allocator<Nest> > >, Nest*) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_uninitialized.h:83:3
#11 0x4d1adb in Nest* std::uninitialized_copy<__gnu_cxx::__normal_iterator<Nest const*, std::vector<Nest, std::allocator<Nest> > >, Nest*>(__gnu_cxx::__normal_iterator<Nest const*, std::vector<Nest, std::allocator<Nest> > >, __gnu_cxx::__normal_iterator<Nest const*, std::vector<Nest, std::allocator<Nest> > >, Nest*) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_uninitialized.h:137:14
#12 0x4d17a3 in Nest* std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<Nest const*, std::vector<Nest, std::allocator<Nest> > >, Nest*, Nest>(__gnu_cxx::__normal_iterator<Nest const*, std::vector<Nest, std::allocator<Nest> > >, __gnu_cxx::__normal_iterator<Nest const*, std::vector<Nest, std::allocator<Nest> > >, Nest*, std::allocator<Nest>&) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_uninitialized.h:307:14
#13 0x4d235d in std::vector<Nest, std::allocator<Nest> >::vector(std::vector<Nest, std::allocator<Nest> > const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_vector.h:555:4
#14 0x4d2070 in Nest::Nest(Nest const&) /__w/ccc-2/ccc-2/lectures/l04/boom-vector/boom-vector-long-string.cpp:7:8
#15 0x4d1f4c in void std::_Construct<Nest, Nest const&>(Nest*, Nest const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_construct.h:75:38
#16 0x4d64ee in Nest* std::__uninitialized_copy<false>::__uninit_copy<Nest const*, Nest*>(Nest const*, Nest const*, Nest*) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_uninitialized.h:83:3
#17 0x4d64a8 in Nest* std::uninitialized_copy<Nest const*, Nest*>(Nest const*, Nest const*, Nest*) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_uninitialized.h:137:14
#18 0x4d6158 in Nest* std::__uninitialized_copy_a<Nest const*, Nest*, Nest>(Nest const*, Nest const*, Nest*, std::allocator<Nest>&) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_uninitialized.h:307:14
#19 0x4d5d39 in void std::vector<Nest, std::allocator<Nest> >::_M_range_initialize<Nest const*>(Nest const*, Nest const*, std::forward_iterator_tag) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_vector.h:1582:6
#20 0x4cf2a7 in std::vector<Nest, std::allocator<Nest> >::vector(std::initializer_list<Nest>, std::allocator<Nest> const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_vector.h:626:2
#21 0x4ce92a in main /__w/ccc-2/ccc-2/lectures/l04/boom-vector/boom-vector-long-string.cpp:14:15
#22 0x7ffff7a71082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082)
SUMMARY: AddressSanitizer: heap-use-after-free (/__w/ccc-2/ccc-2/lectures/l04/boom-vector/boom-vector-long-string.out+0x498f66) in __asan_memcpy
Shadow bytes around the buggy address:
0x0c067fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c067fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c067fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c067fff8000: fa fa fd fd fd fd fa fa fd fd fd fd fa fa fd fd
0x0c067fff8010: fd fd fa fa fd fd fd fd fa fa fd fd fd fd fa fa
=>0x0c067fff8020: fd fd fd fd fa fa[fd]fd fd fd fa fa fd fd fd fd
0x0c067fff8030: fa fa 00 00 00 05 fa fa 00 00 00 05 fa fa 00 00
0x0c067fff8040: 00 05 fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8070: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==1337==ABORTING
So, if you want to impress your friends, you can use the following code to show how easily you can crash the standard library.
#include <vector>
int main() {
struct B {
std::vector<B> b;
std::vector<int> c{0};
} b{{{{{}, {}}}, {}}};
b = b.b[0];
return 0;
}
That being said, it’s not easy to prove that this should actually be considered a bug, due to the large number of preconditions that the standard library API requires. Is our example a case of not meeting the preconditions, or is it a specification defect, or is it a standard library bug? Who knows.
At the very least, the teaching team currently believe these examples constitute either an implementation bug or a specification defect.
By the end of this (sub) section, we have written our own
SimpleVector which has some of the more commonly-used
features of std::vector. We’ve written slightly more complex
versions of the various constructors, destructors, and assignment
operators that were covered in the previous lecture, and discussed some
edge cases that can happen when writing these.
Note that the correct implementation is just called
SimpleVector, and is the last one in the file if you need to
refer to it. We will expand on this SimpleVector in future
lectures once we cover move semantics.
Let’s now look at the containers available in the standard library.
std::vector is the most common one, but there are many others
that may suit certain situations better.
The standard library containers are designed to be general-purpose but yet efficient, and much of this efficiency is due to the fact that they are all templated on the element type (and possibly on some comparator function as well, if any).
The remaining section is a quick walkthrough of the containers available in the standard library. It is meant to be read in together with the relevant C++ reference pages, and the focus is on how the containers are laid out in memory, and how that leads to the guarantees (algorithmic or otherwise) provided by each container.
Contiguous containers are those whose elements are laid out contiguously
in memory, like the simple_vector example that we have seen.
std::vector
std::vector<T>, as we have seen earlier, is the typical
growable arraylist. It is much like simple_vector, but is
designed so that push_back and pop_back are
fast.
std::vector achieves amortised
O(1) time complexity for
push_back by reserving an underlying buffer that may have
additional free space. When learning about time complexity analysis, this
multiplying factor is typically set to 2, but any constant multiple
suffices. In practice, 1.5 is often used because it is close to the golden
ratio. pop_back never reallocates the underlying buffer, so
it is always O(1).
Because of this, it needs separate size and
capacity (or equivalently, pointers to the first unused space
in the buffer and to the past-the-end element of the buffer,
respectively), as we have seen earlier.
Note that the standard does not specify the fields contained in
std::vector, but in practice the only reasonable way to
implement std::vector uses these three fields:
There are member functions to retrieve and manipulate properties of the
vector, as well as its underlying buffer. For example,
reserve reallocates the buffer (if necessary) to ensure that
there is sufficient space to store a given number of elements, and
shrink_to_fit reallocates the buffer (if necessary) to remove
any free space in the buffer.
std::vector
We have a vector and we can put elements into it. However, we might want to hold references or pointers to some elements in order to access them later. When we say that an operation is address stable, it means that the locations of elements in the container are not changed by the operation.
For example, consider the following code:
// construct and initialize a vector with 3 elems using the
// std::initializer_list constructor
std::vector<int> v{1, 2, 3};
int& x = v[1]; // take a reference to an element
v.push_back(4); // add a new element
do_stuff(x); // access that reference
push_back
While this example may seem contrived, there could be more complex
situations where we want to hand the
v[1]
reference to another function, which saves it somewhere and does something
to it in the future.
Is the call to do_stuff guaranteed to hold a reference to the
element
v[1]? Or in other words, is push_back address stable?
If no reallocation occurs, then the existing elements remain in the
original buffer, and so the operation is address stable. However, if a
reallocation occurs, then the existing elements are moved (we will cover
exactly what a “move” means in a later lecture) to a new buffer and the
original buffer is deleted. In this case, the operation is not address
stable. In other words, if
size()
< capacity()
initially then push_back is address stable.
As pop_back never reallocates the buffer,
pop_back is always address stable.
std::array
std::array<T, N> is a thin wrapper over a raw C-style
array. Its size (the number of elements in contains) is fixed and elements
are stored inline (i.e. within the object itself, without the need for an
extra heap allocation). Like any other struct/class in C++, its size must
be known at compilation time.
Heap allocations (i.e. new
and delete) are generally slow compared to other operations, as a non-trivial
algorithm is used to maintain the data structure that services heap
allocations (in other words, malloc and free are
not “free”). This may be a significant contributor to inefficiency,
especially when doing many transient heap allocations (i.e. having many
short-lived heap allocated objects).
As std::vector makes a heap allocation but
std::array does not, std::array is almost always
faster and should be preferred if the use case will allow for it,
especially if the objects are being stored for a short period of time. (If
we want to keep things for a longer period of time so that the
std::vector doesn’t need to get reallocated often, then we
are calling
new and
delete rarely,
so this would be less of a performance issue.)
An indirection is, in simple terms, dereferencing a pointer to get the
value it points to (*ptr). This also includes things like accessing an element of an array that
is being held by reference or pointer (arr[i], which is equivalent to
*(arr + i)), or accessing a field of an object passed by pointer (obj->field, which is equivalent to
(*obj).field). Although they may take several forms in code, they are really the same
thing at the assembly level — from a pointer, we figure out what it points
to, and load something from there.
When writing performant code, it is important to take note of indirections because they may unnecessarily slow down the execution of your code.
Accessing an element in a std::array performs zero
indirections, since the elements are inlined into the code that uses it.
However, accessing an element in a std::vector performs one
indirection, since it dereferences the buffer pointer.
This performance penalty for indirection occurs even if heap allocations
are not a major concern, so std::array is again superior and
should be used if possible.
std::array
As we have mentioned repeatedly, std::array does not incur
the extra layer of indirection of std::vector, and it’s
just a thin wrapper around a raw C-style array. In fact, it is a
very thin wrapper — thin enough that when compiling with
optimisations, the entire std::array basically disappears:
#include <array>
std::array<int, 2> arr { 69, 420 };
int main() {
return arr[0] + arr[1];
}
main:
mov eax, dword ptr [rip + arr+4]
add eax, dword ptr [rip + arr]
ret
std::deque
std::deque<T> is a double-ended queue. Unlike a
std::vector that only supports push_back and
pop_back, std::deque additionally supports
push_front and pop_front.
There are two general directions one can go when implementing a double-ended queue:
std::vector, but reserving some space
on both ends of the buffer — by increasing the buffer by a multiple of
the original size, we can achieve amortised constant time complexity for
push_back and push_front.
However, the standard requires std::deque to satisfy fairly
strict requirements that make it not possible to implement it in either of
those ways — elements must be accessible by index (i.e. operator[](size_t)) in constant time, but yet the container must be address stable in the
face of push_back, push_front,
pop_back, and pop_front (apart from the element
being popped of course).
One way that satisfies the requirements of std::deque is by
doing the following:
This allows access by index in constant time, and also ensures that the
elements are never moved even if a reallocation of the buffer occurs.
Hence, this is a valid implementation of std::deque.
However, this is not very efficient, because every element needs a
separate heap allocation — there are many small heap allocations, and each
allocation might end up in an unrelated memory location (which means poor
spatial locality of the memory, which is bad for cache efficiency). All
major standard libraries implement std::deque by having each
heap allocation contain multiple elements together:
The first and last inner arrays might have leading or trailing empty spaces respectively, and all middle inner arrays must be filled. When adding a new element to the front or back of the deque, if there is still space it gets placed there immediately, otherwise it allocates a new array and adds it to the outer buffer (reallocating the buffer if necessary)
This still guarantees address stability of the above four operations, but tries to reduce the number of heap allocations required. The exact number of elements put into the same allocation depends on the standard library implementation, and it often also depends on the size of an element.
As such, std::deque is non-contiguous — if you have a pointer
ptr to some element in the deque,
(ptr +
1)
may not point to the element that immediately follows it.
Container adaptors are what they sound like — they wrap another container, possibly modifying its behaviour.
std::stack
and
std::queue
std::stack<T> and std::queue<T> are
container adaptors that by default wrap a std::deque, though
it is possible to specify another container (or even a custom container
that you have written on your own, using the second template parameter).
std::stack requires the underlying container to have the
back, push_back, and
pop_back member functions, while
std::queue requires it to have the back,
front, push_back, and
pop_front member functions.
Their implementations should be fairly obvious.
std::priority_queue
A priority queue is a
binary max heap. It wraps over std::vector by default, and requires the
underlying container to implement front,
push_back, pop_back, and support random access
(roughly speaking, this means you can get an element by index using
something like
operator[], but we will formalise this a bit better in the next lecture). The
Wikipedia article on
binary heaps
has a good explanation about how it can be implemented using an array (or
in our case, std::vector).
Unlike the previous containers and container adaptors that we have seen, priority queues have a concept of an element being “bigger” or “smaller” than another one. In other words, a priority queue is a binary max heap — but what is the max function that we want to use?
Take a look at the class definition of std::priority_queue on
its C++ Reference page:
template <class T,
class Container = std::vector<T>,
class Compare = std::less<typename Container::value_type>>
class priority_queue;
(Note that class and typename in a template are
equivalent.)
While we have not explained the specifics of the template syntax, you
should be able to make a good guess about what they mean. Like all the
other containers we have seen so far, T is the element type.
The second and third template parameters have a default type, which allow
us to omit them and write just std::priority_queue<T>,
in a way that parallels default arguments in function calls. We can infer
that Container is the underlying container, as its default is
std::vector<T>. The third parameter is interesting, and
looks suspiciously like what we’re looking for — it seems to have to do
with customizing the max function of the binary max heap.
A function object is a struct or class that has a (public) function call
operator (i.e. operator()). This is an example:
struct Comp {
bool operator()(const int& a, const int& b) { return a < b; }
};
ints
We can then use it as:
Comp comp;
comp(1, 2); // returns true
comp(4, 3); // returns false
ints
It is possible for Comp to maintain state (i.e. have one or
more member fields), but in most cases there is no state required.
In the class template declaration above, we see that the default type for
Compare is
std::less<typename Container::value_type>, which in this case is std::less<T>.
std::less is just a templated version of Comp,
which can be specialised for any element type:
template <typename T>
struct less {
bool operator()(const T& a, const T& b) { return a < b; }
};
std::less
(Note: From C++14 there is a template specialisation for
std::less<void>
that allows heterogeneous comparisons, but we will ignore it for now.)
This struct, which Compare is set to by default, tells the
priority queue how to order the elements (i.e. whether a parent and child
node needs to be swapped). For example, if we wanted a min heap instead,
we could write such a comparator:
struct ReverseComp {
bool operator()(const int& a, const int& b) { return a > b; }
};
We would then declare our priority queue as:
std::priority_queue<int, std::vector<int>, ReverseComp>
Alternatively, you can use std::greater, which is the generic
version of our ReverseComp.
Since we can define any custom comparator, we can also make a priority
queue of a type that does not have
operator<
defined, or has no natural ordering (for example our
Point struct) — in this case the custom comparator template
parameter would be compulsory (otherwise there will be a compiler error).
C++ has two linked list containers — std::list (doubly-linked
list) and std::forward_list (singly-linked list). These two
containers do what one would expect — there aren’t very many ways one can
implement a linked list.
They aren’t very cache-friendly and use many allocations (one allocation per element in the list), but they allow insertion and removal anywhere in the list with constant time complexity. They are also address stable.
Associative containers map some key type to some mapped type (which may be the unit type).
Ordered associative containers keep keys ordered. The standard does not explicitly state what data structure should be used to implement them, but instead states some operations and time complexity requirements, which effectively “forces” implementations to use a particular data structure. Some of the requirements are:
These requirements effectively means that implementations need to use some kind of balanced binary search tree (GCC uses red-black trees).
There are four ordered associative containers:
std::set,std::map,std::multiset, andstd::multimap.
In std::set and std::multiset, keys do not have
an associated mapped type; while in std::map and
std::multiset, you specify the associated value type in a
separate template parameter.
For example,
std::set<int>
is a set of int, while
std::map<std::string,
double>
is a map from std::string (the key type) to
double (the
mapped type). In std::set and std::map keys must
be unique, but in std::multiset and
std::multimap there can be duplicate keys. These are
relatively minor implementation differences, and all four containers share
a similar tree structure similar to this:
The standard requires that all operations are address stable, which
effectively means that each node must be in its own heap allocation, which
is essentially a std::pair<K, V> with some additional
bookkeeping information. (We’ll see how having elements be stored as
std::pair<K, V> comes in useful later when we look at
traversing containers.)
As the ordered associative containers need some way to compare whether one
key is smaller than another one, they take in an optional additional
template parameter for the comparator function object, just like
std::priority_queue, in case we want a custom comparator
instead of
operator<.
Unordered associative containers are essentially hash tables, and they provide amortised O(1) insertion, search by key, and deletion, as well as amortised O(N) traversal of the entire container (in any arbitrary order).
Analogous to the ordered versions, there are four unordered associative containers:
std::unordered_set,std::unordered_map,std::unordered_multiset, andstd::unordered_multimap.One peculiarity of the unordered associative containers is that the standard requires them to be address stable, just like the ordered associative containers. Why is this peculiar? The typical implementation of a hash table is an array of key-value pairs (using open addressing), but that isn’t address stable (growing the underlying array will require us to move all the elements). Ensuring that the hash table is address stable introduces a significant amount of overhead — we have to maintain a linked list for every hash (i.e. every bucket), and each element is in its own linked list node.
If you take a look at the C++ Reference page of
std::unordered_map, you will notice that the class
declaration has two additional Hash and
KeyEqual template parameters that have defaults. These are
function objects that tell the container how to hash a key —
Hash is used to hash a key, and KeyEqual is used
to compare if two keys are equal (which is needed because two keys that
hash to the same hash may not be equal). Hash defaults to
std::hash (which you can specialise for your key type, if the
key type has a “natural” hash), and KeyEqual defaults to
operator==.
Abseil is a third-party general-purpose C++ library that has better replacements for some things in the C++ standard library. While not part of the standard library, it is very well known and many projects use Abseil to replace various parts of the standard library.
For ordered associative containers, Abseil provides the replacements:
abseil::btree_map,abseil::btree_set,abseil::btree_multimap, andabseil::btree_multiset,which are more efficient because they use B-trees instead of binary trees.
For unordered associative containers, Abseil has:
abseil::flat_hash_map andabseil::flat_hash_setthat stores elements directly in the array and uses open addressing, and
abseil::node_hash_map andabseil::node_hash_setfor when address stability is desired.
operator[]
All the maps come with an
operator[], so we can write something like my_map[x] for some key
x.
While this operation looks benign,
operator[]
is actually non-const — it might modify the current map.
operator[]
will add a new element to the map if it does not exist yet, and then
return a reference to it. This allows code to both assign to and read from
my_map[x]:
std::map<int, double> my_map;
my_map[100] = 3.14159; // inserting an element
my_map[150] = 2.71828; // inserting an element
std::cout << my_map[100] << std::endl; // reading an element
my_map[150] = 5.5; // replacing an element
// not just reading an element -- this also inserts
// a new element with key 123
std::cout << my_map[123] << std::endl;
operator[]
In order to add a new element to the map,
operator[]
will try to default construct the element. This means that if your type is
not default constructible, then you will get an annoying error message.
#include <string>
#include <unordered_map>
struct from_kmh_t {};
struct from_mph_t {};
struct Speed {
double kmh;
double mph;
Speed(from_kmh_t, double kmh) : kmh(kmh), mph(kmh * 0.6213712) {}
Speed(from_mph_t, double kmh) : kmh(kmh), mph(kmh * 0.6213712) {}
};
using CarMake = std::string;
using Benchmark = std::unordered_map<CarMake, Speed>;
using PairType = Benchmark::value_type;
int main() {
Benchmark top_speeds;
// These work as expected
top_speeds.insert( //
PairType{CarMake{"lamborghini urus"}, //
Speed{from_kmh_t{}, 2.0}}); //
top_speeds.insert( //
PairType{CarMake{"proton saga"}, //
Speed{from_kmh_t{}, 1000000.0}}); //
// ... but these will break!
top_speeds[CarMake{"lamborghini urus"}] = Speed{from_kmh_t{}, 2.0};
top_speeds[CarMake{"proton saga"}] = Speed{from_kmh_t{}, 1000000.0};
}
$ make main-broken.target
clang++ -g -std=c++20 -Wpedantic -Wall -Wextra -Wconversion -Wno-unused-variable -Wno-unused-but-set-variable -c -MMD -o main-broken.o main-broken.cpp
In file included from main-broken.cpp:2:
In file included from /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/unordered_map:46:
In file included from /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/hashtable.h:35:
In file included from /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/hashtable_policy.h:34:
/usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/tuple:1674:9: error: no matching constructor for initialization of 'Speed'
second(std::forward<_Args2>(std::get<_Indexes2>(__tuple2))...)
^
/usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/tuple:1661:9: note: in instantiation of function template specialization 'std::pair<const std::basic_string<char>, Speed>::pair<std::basic_string<char> &&, 0UL>' requested here
: pair(__first, __second,
^
/usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/ext/new_allocator.h:146:23: note: in instantiation of function template specialization 'std::pair<const std::basic_string<char>, Speed>::pair<std::basic_string<char> &&>' requested here
{ ::new((void *)__p) _Up(std::forward<_Args>(__args)...); }
^
/usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/alloc_traits.h:483:8: note: in instantiation of function template specialization '__gnu_cxx::new_allocator<std::__detail::_Hash_node<std::pair<const std::basic_string<char>, Speed>, true>>::construct<std::pair<const std::basic_string<char>, Speed>, const std::piecewise_construct_t &, std::tuple<std::basic_string<char> &&>, std::tuple<>>' requested here
{ __a.construct(__p, std::forward<_Args>(__args)...); }
^
/usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/hashtable_policy.h:2086:27: note: in instantiation of function template specialization 'std::allocator_traits<std::allocator<std::__detail::_Hash_node<std::pair<const std::basic_string<char>, Speed>, true>>>::construct<std::pair<const std::basic_string<char>, Speed>, const std::piecewise_construct_t &, std::tuple<std::basic_string<char> &&>, std::tuple<>>' requested here
__node_alloc_traits::construct(_M_node_allocator(),
^
/usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/hashtable_policy.h:726:15: note: in instantiation of function template specialization 'std::__detail::_Hashtable_alloc<std::allocator<std::__detail::_Hash_node<std::pair<const std::basic_string<char>, Speed>, true>>>::_M_allocate_node<const std::piecewise_construct_t &, std::tuple<std::basic_string<char> &&>, std::tuple<>>' requested here
__p = __h->_M_allocate_node(std::piecewise_construct,
^
/usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/unordered_map.h:990:16: note: in instantiation of member function 'std::__detail::_Map_base<std::basic_string<char>, std::pair<const std::basic_string<char>, Speed>, std::allocator<std::pair<const std::basic_string<char>, Speed>>, std::__detail::_Select1st, std::equal_to<std::basic_string<char>>, std::hash<std::string>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true>, true>::operator[]' requested here
{ return _M_h[std::move(__k)]; }
^
main-broken.cpp:30:13: note: in instantiation of member function 'std::unordered_map<std::basic_string<char>, Speed>::operator[]' requested here
top_speeds[CarMake{"lamborghini urus"}] = Speed{from_kmh_t{}, 2.0};
^
main-broken.cpp:7:8: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
struct Speed {
^
main-broken.cpp:7:8: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 0 were provided
main-broken.cpp:10:3: note: candidate constructor not viable: requires 2 arguments, but 0 were provided
Speed(from_kmh_t, double kmh) : kmh(kmh), mph(kmh * 0.6213712) {}
^
main-broken.cpp:11:3: note: candidate constructor not viable: requires 2 arguments, but 0 were provided
Speed(from_mph_t, double kmh) : kmh(kmh), mph(kmh * 0.6213712) {}
^
1 error generated.
make: *** [Makefile:32: main-broken.o] Error 1
If we comment out the lines that use
operator[], it compiles.
$ make main.target
clang++ -g -std=c++20 -Wpedantic -Wall -Wextra -Wconversion -Wno-unused-variable -Wno-unused-but-set-variable -c -MMD -o main.o main.cpp
clang++ -g -std=c++20 -Wpedantic -Wall -Wextra -Wconversion -Wno-unused-variable -Wno-unused-but-set-variable main.o -o main.out
clang++ -O3 -g -std=c++20 -Wpedantic -Wall -Wextra -Wconversion -Wno-unused-variable -Wno-unused-but-set-variable -c -MMD -o main.O3.o main.cpp
clang++ -O3 -g -std=c++20 -Wpedantic -Wall -Wextra -Wconversion -Wno-unused-variable -Wno-unused-but-set-variable main.O3.o -o main.O3.out
When given a raw array, there are two typical ways to traverse it:
int arr[5] = {1, 2, 3, 4, 5};
for (size_t i = 0; i != 5; ++i) {
std::cout << arr[i] << std::endl;
}
int arr[5] = {1, 2, 3, 4, 5};
int* begin = arr;
int* end = arr + 5;
for (int* ptr = begin; ptr != end; ++ptr) {
std::cout << *ptr << std::endl;
}
The first way iterates using indices, and the second way iterates using pointers. Both ways are equivalent for arrays, but when using indices, we assume that it is efficient to access an element using an index, whereas for pointers we only need to be able to access the next element from the current one.
C++ generalises this concept of iterating an array using pointers.
We want to be able to do something like this (auto
tells the compiler to infer the type — we will cover the specifics of
auto in a
future lecture):
Container container = /* stuff */;
auto begin = container.begin();
auto end = container.end();
for (auto ptr = begin; ptr != end; ++ptr) {
std::cout << *ptr << std::endl;
}
In other words, we want to design some type that
container.begin() and container.end() returns,
and this type must act like a pointer (it must at least overload
operator++
and
operator*) — this type is called an iterator. In general, we want this type to
feel as much like pointers as possible, and so if possible, we would like
to overload more than just
operator++
and
operator*
(we’ll cover this in the next lecture).
Each container usually implements its own iterator type, distinct from
other containers, usually exposed as a member type (e.g. vector<int>::iterator). This is because the for each type
operator++
and
operator*
almost always needs to do something different.
For the contiguous containers, iterators are equivalent to pointers — it
is possible to define
vector<T>::iterator
and
array<T>::iterator
to be type aliases for
T*. (In practice standard library implementations still write a separate
iterator class for them, to give better error messages and type safety.)
std::deque<T>::iterator
is slightly more complicated — it needs to hold a pointer to both the
inner and outer containers.
Think about how iterators for the other containers can be implemented. There may be multiple reasonable ways to implement some of them.
The usual way to iterate a container looks like this:
Container container = /* stuff */;
for (auto it = container.begin(); it != container.end(); ++it) {
std::cout << *it << std::endl;
}
As container.end() is usually a trivial function, the
compiler is generally clever enough to avoid needing to recompute it after
every iteration.
From C++11, this can be written more concisely in a range-based for loop:
Container container = /* stuff */;
for (const auto& x : container) {
std::cout << x << std::endl;
}
As the associative containers contain elements that are pairs, we can do this:
std::map<std::string, int> my_map = /* stuff */;
for (const std::pair<std::string, int>& elem : my_map) {
std::cout << "key: " << elem.first << ", value = " << elem.second
<< std::endl;
}
Or in C++17, with structured bindings:
std::map<std::string, int> my_map = /* stuff */;
for (const auto& [key, value] : my_map) {
std::cout << "key: " << key << ", value = " << value << std::endl;
}
Previously, we discussed about address stability, which is whether elements might be moved during an operation. Another term for this is reference (or pointer) invalidation, because references to an element become invalid when that element is moved.
As iterators are “fancy pointers”, there are similarly situations where they may be invalidated. Iterator invalidation often happens in similar situations to reference invalidation, but may not always be the case. For example, deque iterators may be invalidated when elements are added to the deque, even though no existing elements are moved.
This StackOverflow question maintains a good list of iterator and reference invalidation rules for C++03 and C++11 (which has the same rules as C++20).
Apart from being used to loop through the elements of a container, an iterator is often used as a “handle” to some element in the container. In all the standard library containers, there are many member functions that accept or return an iterator.
For example,
std::map<T>::find
searches for a key in the binary search tree, and returns an iterator to
the found element (if any):
std::map<int, int> my_map = /* stuff */;
if (auto it = my_map.find(100); it != my_map.end()) {
std::cout << "Found an element, 100 maps to " << it->second
<< std::endl;
} else {
std::cout << "Not found" << std::endl;
}
In the above example, the keys are
ints, which is a trivial type. However, if the keys were say
std::string, we could be unnecessarily constructing a
temporary string to pass to the find member function, which
will make a heap allocation. We want to eliminate that unnecessary
allocation.
Recall that std::map and the other ordered associative
containers figure out the relative order of two keys using the
Compare type parameter. The term “heterogeneous comparison”
means a comparison function (like Compare) that allows the
comparison of two distinct types — in this case, we want to compare a
std::string with something that acts like a string but
doesn’t own the buffer, for example
const char*
or std::string_view.
To enable heterogeneous comparison on the container, we need to give it
a Compare parameter that is able to perform heterogeneous
comparison — the heterogeneous comparator for
operator<
is
std::less<void>
(the void is
the default template argument, so we can also write
std::less<>), which is a specialization of
std::less. We thus write this:
std::map<std::string, int> my_map = /* stuff */;
if (auto it = my_map.find("hello"); it != my_map.end()) {
std::cout << "Found an element, \"hello\" maps to " << it->second
<< std::endl;
} else {
std::cout << "Not found" << std::endl;
}
C++17 introduced Class Template Argument Deduction (CTAD), which
allows you to write code like
std::vector xs
{1, 2, 3, 4}
and have the compiler deduce the template argument (here,
int) for you.↩︎
Note that you can also use the
class
keyword to the same effect, but for the rest of this lecture we’ll
stick to using
typename.↩︎
Plain old data types, or in C++20, trivial types.↩︎
© 24 June 2022, Ng Zhia Yang, Bernard Teo Zhi Yi, All Rights Reserved
^