Friday, April 3, 2015

C++ inheritance explained (Part II)

This is the second part of the series explaining how inheritance works in C++ under the hood.
If you haven't read the first part, I recommend at least a quick look, as it clarifies the approach I'm taking. You can find the first part here.
In this part I'll explain what is probably the most feared feature of C++: multiple inheritance.

Multiple inheritance with simple base classes

Let's start with the simple, yet famous, diamond problem:

class CommonBase
{
public:
  int m_some_int;
  void set_int(int x) { m_some_int = x; }
};
class DerivedOne : public CommonBase
{
public:
  float m_some_float;
  void foo(){}
};
class DerivedTwo : public CommonBase
{
public:
  bool m_some_bool;
  void bar() {}
};
class DerivedMultiple : public DerivedOne, public DerivedTwo
{
public:
  void baz(){}
  void set_int(int x) { DerivedOne::set_int(x); }
};

This one is tricky. Let's list only the resulting structs first:

struct CommonBase
{
  int m_some_int;
};
struct DerivedOne
{
  CommonBase _parent;
  float m_some_float;
};
struct DerivedTwo
{
  CommonBase _parent;
  bool m_some_bool;
};
struct DerivedMultiple
{
  DerivedOne _parent1;
  DerivedTwo _parent2;
};
Pay attention to the DerivedMultiple struct. To make it clear, let's expand its parents inline:

struct DerivedMultiple
{
  /* DerivedOne _parent1; */
  CommonBase _parent1_parent;
  float m_some_float;

  /* DerivedTwo _parent2; */
  CommonBase _parent2_parent;
  bool m_some_bool;
};
As you can see, DerivedMultiple contains two copies of CommonBase! Another thing to note is that the DerivedTwo part does not start at offset 0!
That immediately raises two questions:
  1. Which CommonBase is used when one is needed?
  2. How do we call DerivedTwo::bar() on a DerivedMultiple object?
The answers become clear when we translate the calling code:

/* DerivedMultiple object; */
object.foo();
object.baz();
object.bar();

results in:

/* DerivedMultiple object; */
DerivedOne_foo(&object._parent1);  /* foo() is defined in DerivedOne; _parent1 sits at offset 0, so this equals &object */
DerivedMultiple_baz(&object);
DerivedTwo_bar(&object._parent2);  /* <--- PAY ATTENTION */

As you can see, when we call a method that comes from DerivedTwo, we don't pass a pointer to our object as the first argument! Instead, we pass a pointer to the subobject where the DerivedTwo part is located!
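To watch the shifted pointer on a real compiler, here is a minimal sketch in which a method of the second base reports the `this` it received. The helper `where_two()` and the functions `offset_seen_by_second_base()` and `demo()` are hypothetical names added only for this illustration; the exact offset depends on your compiler and platform.

```cpp
#include <cassert>

class CommonBase { public: int m_some_int; };

class DerivedOne : public CommonBase {
public:
  float m_some_float;
};

class DerivedTwo : public CommonBase {
public:
  bool m_some_bool;
  // Hypothetical helper: reports the `this` pointer the method received.
  const void *where_two() const { return this; }
};

class DerivedMultiple : public DerivedOne, public DerivedTwo {};

// Distance in bytes between the whole object and the `this` seen by a
// method inherited from the second base.
long offset_seen_by_second_base() {
  DerivedMultiple object{};
  const char *whole = reinterpret_cast<const char *>(&object);
  const char *seen = static_cast<const char *>(object.where_two());
  return static_cast<long>(seen - whole);
}

void demo() {
  // The method from DerivedTwo never sees the address of the whole object.
  assert(offset_seen_by_second_base() != 0);
}
```

Printing the value returned by `offset_seen_by_second_base()` shows how far the compiler moved the pointer before making the call.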
But now we have another question: what if we call set_int() from inside bar()? How is CommonBase resolved when all we have is a pointer to somewhere inside the DerivedMultiple object?

Let's demystify it with this example:

/* DerivedMultiple object; */
DerivedOne *derived1 = &object;
DerivedTwo *derived2 = &object;
derived1->set_int(1);
derived2->set_int(2);

This code translates into the following:

/* DerivedMultiple object; */
DerivedOne *derived1 = &object._parent1;  /* same address as &object */
DerivedTwo *derived2 = &object._parent2;
CommonBase_set_int(&derived1->_parent, 1);
CommonBase_set_int(&derived2->_parent, 2);  /* but, THEY LOOK THE SAME? */
OK, so this shows us two things. First, when a pointer to DerivedMultiple is assigned to a pointer to DerivedTwo, the pointer is automatically shifted to the DerivedTwo subobject! Second, and most important, NOTHING SPECIAL IS DONE TO RESOLVE CommonBase!
Yes, that's right: the two calls access different CommonBase subobjects inside DerivedMultiple!
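
The pointer shift can be checked directly. The sketch below reuses the class names from the example; the function names (`upcast_shifts_pointer()` and friends) are hypothetical and exist only for illustration. It compares raw addresses before and after the conversions.

```cpp
#include <cassert>

class CommonBase { public: int m_some_int; };
class DerivedOne : public CommonBase { public: float m_some_float; };
class DerivedTwo : public CommonBase { public: bool m_some_bool; };
class DerivedMultiple : public DerivedOne, public DerivedTwo {};

// True when converting DerivedMultiple* to DerivedTwo* changes the address.
bool upcast_shifts_pointer() {
  DerivedMultiple object{};
  DerivedTwo *derived2 = &object;  // implicit conversion, compiler adds the offset
  return static_cast<void *>(derived2) != static_cast<void *>(&object);
}

// True when static_cast back to DerivedMultiple* undoes the shift.
bool downcast_restores_pointer() {
  DerivedMultiple object{};
  DerivedTwo *derived2 = &object;
  DerivedMultiple *back = static_cast<DerivedMultiple *>(derived2);
  return back == &object;
}

// True when the two base pointers lead to different CommonBase subobjects.
bool bases_reach_different_common_base() {
  DerivedMultiple object{};
  DerivedOne *derived1 = &object;
  DerivedTwo *derived2 = &object;
  CommonBase *via_one = derived1;
  CommonBase *via_two = derived2;
  return via_one != via_two;
}

void demo() {
  assert(upcast_shifts_pointer());
  assert(downcast_restores_pointer());
  assert(bases_reach_different_common_base());
}
```

Note that a `reinterpret_cast` or a C-style cast through `void *` would skip this adjustment, which is one reason to prefer implicit conversions and `static_cast` when moving between base and derived pointers.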

Let's illustrate it with numbers:

/* DerivedMultiple object; */
object.set_int(5);
DerivedTwo *two = &object;
two->set_int(6);
int five = object.DerivedOne::m_some_int;
int six = object.DerivedTwo::m_some_int;

Calling set_int() on the main object and through the DerivedTwo pointer sets different m_some_int fields.
Now you know that multiple inheritance is hated for a reason!
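
For anyone who wants to run the numbers, here is the same experiment as a self-contained sketch. The `BothInts` struct and the `set_through_both_paths()` and `demo()` functions are hypothetical helpers, added only so that both values can be returned and checked.

```cpp
#include <cassert>

class CommonBase {
public:
  int m_some_int;
  void set_int(int x) { m_some_int = x; }
};
class DerivedOne : public CommonBase { public: float m_some_float; };
class DerivedTwo : public CommonBase { public: bool m_some_bool; };
class DerivedMultiple : public DerivedOne, public DerivedTwo {
public:
  void set_int(int x) { DerivedOne::set_int(x); }
};

// Hypothetical helper: carries the two m_some_int copies out of the function.
struct BothInts {
  int via_one;
  int via_two;
};

BothInts set_through_both_paths() {
  DerivedMultiple object{};
  object.set_int(5);          // forwards to the CommonBase inside DerivedOne
  DerivedTwo *two = &object;  // pointer is shifted to the DerivedTwo part
  two->set_int(6);            // writes to the CommonBase inside DerivedTwo
  return BothInts{object.DerivedOne::m_some_int, object.DerivedTwo::m_some_int};
}

void demo() {
  BothInts ints = set_through_both_paths();
  assert(ints.via_one == 5);  // untouched by the second call
  assert(ints.via_two == 6);
}
```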

Multiple inheritance involving a polymorphic base class

Take this hierarchy:

class SimpleBase
{
public:
  int m_some_int;
};
class VirtualBase
{
public:
  float m_some_float;
  virtual void foo() {}
};
class Derived : public SimpleBase, public VirtualBase
{
};

The resulting structs are:

struct SimpleBase
{
  int m_some_int;
};
struct VirtualBase
{
  void *_vtable;
  float m_some_float;
};
struct Derived
{
  void *_vtable;
  SimpleBase _parent1;
  VirtualBase _parent2;
};

To make it clear, let's expand the parents inline:

struct Derived
{
  void *_vtable;
  /* SimpleBase _parent1; */
  int m_some_int;

  /* VirtualBase _parent2; */
  void *_parent2_vtable;
  float m_some_float;
};

As you can see, the only changes are related to the VTable. Keep in mind that this is a simplified model; the real layout is up to the compiler. There are a few possible scenarios:

  • if the first base class is polymorphic, its VTable pointer can be reused (no need to add such a field at the beginning)
  • there might be several VTable pointers inside a class (some of them inherited)
  • the VTable pointers can point to the same or to different places (this is entirely up to the compiler)
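
Since the layout is a compiler decision, the easiest way to learn what yours does is to measure it. The sketch below is one way to do that; `base_offset()` and `check_layout_facts()` are hypothetical helpers, not part of the original example. As far as I know, compilers that follow the Itanium C++ ABI (GCC and Clang, for instance) typically place the polymorphic base first so that its VTable pointer can be shared, which differs from the simplified struct above, so the code only asserts facts that should hold under either arrangement.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

class SimpleBase { public: int m_some_int; };
class VirtualBase {
public:
  float m_some_float;
  virtual void foo() {}
};
class Derived : public SimpleBase, public VirtualBase {};

// Byte offset of a base subobject inside a given Derived object.
template <typename Base>
std::ptrdiff_t base_offset(const Derived &object) {
  const Base *base = &object;  // implicit upcast, may shift the pointer
  return reinterpret_cast<const char *>(base) -
         reinterpret_cast<const char *>(&object);
}

void check_layout_facts() {
  Derived object{};
  // A class with virtual methods carries hidden data besides its fields.
  static_assert(sizeof(VirtualBase) > sizeof(float),
                "expected a hidden VTable pointer");
  static_assert(std::is_polymorphic<Derived>::value,
                "Derived inherits the virtual method");
  // Two non-empty bases cannot share one address, whichever comes first.
  assert(base_offset<SimpleBase>(object) != base_offset<VirtualBase>(object));
}
```

Printing `base_offset<SimpleBase>(object)`, `base_offset<VirtualBase>(object)` and `sizeof(Derived)` on your own toolchain shows which base ended up at offset 0.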

Method overrides in multiple inheritance

Things get more complicated when we override methods in a class with more than one base.
Let's take this example:

class Base
{
public:
  int m_base_int;
};
class SimpleDerived : public Base
{
public:
  int m_simple_derived_int;
};
class VirtualDerived : public Base
{
public:
  int m_virtual_derived_int;
  virtual void set_int(int x) { m_virtual_derived_int = x; }
};
class Multiple : public SimpleDerived, public VirtualDerived
{
public:
  virtual void set_int(int x) override { m_simple_derived_int = x; }
};

You already have an idea of what the resulting structs look like, so I won't paste them here. Let's execute this code:

/* Multiple object; */
object.set_int(5);
VirtualDerived *vd = &object;
vd->set_int(8);   /* <--- HOW DOES THIS ONE WORK? */

When translated to C it looks like this:

/* Multiple object; */
_get_method_address(object._vtable, set_int)(&object, 5);   /* no magic here */

VirtualDerived *vd = &object._parent2;   /* this one is familiar too */
_get_method_address(vd->_vtable, set_int)(_get_object_address_for_func(vd, set_int), 8);

Looks like I've been lying to you a bit when explaining how method overrides work :)
What you see happening here is:

  • the method address is obtained from the VTable as usual
  • because the object we have a pointer to can be part of a multiple-inheritance hierarchy, we cannot simply pass that pointer to the function: what if we have a pointer to some subobject inside, while the method expects a pointer to the actual object?
  • before the pointer is passed to the function, it goes through a compiler-generated function that consults the VTable and returns a valid address to pass (not necessarily the beginning of the real object); real compilers often implement this adjustment with small "thunk" functions stored in the VTable
  • this adjustment is applied every time an object address is passed to a virtual method, because we never know what types are derived from a given class; the tree can be a very complicated mixture of single and multiple inheritance
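
The effect of that adjustment can be observed without looking at any VTable. In the sketch below the override records the `this` it received; the field `m_seen_this` and the functions `override_gets_whole_object()` and `demo()` are hypothetical additions for illustration only. Whether `vd` itself differs numerically from `&object` depends on how your compiler orders the bases, so the check is limited to what the override sees.

```cpp
#include <cassert>

class Base { public: int m_base_int; };
class SimpleDerived : public Base { public: int m_simple_derived_int; };
class VirtualDerived : public Base {
public:
  int m_virtual_derived_int;
  virtual void set_int(int x) { m_virtual_derived_int = x; }
};
class Multiple : public SimpleDerived, public VirtualDerived {
public:
  // Hypothetical field, added only to record the `this` the override received.
  const void *m_seen_this = nullptr;
  void set_int(int x) override {
    m_simple_derived_int = x;
    m_seen_this = this;
  }
};

// True when a call through VirtualDerived* reaches the override with `this`
// pointing at the whole Multiple object, and the base version is not run.
bool override_gets_whole_object() {
  Multiple object{};             // value-initialized, so all ints start at 0
  VirtualDerived *vd = &object;  // may or may not shift the pointer
  vd->set_int(8);                // virtual call through the base pointer
  return object.m_simple_derived_int == 8 &&
         object.m_virtual_derived_int == 0 &&
         object.m_seen_this == static_cast<const void *>(&object);
}

void demo() {
  assert(override_gets_whole_object());
}
```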

Hints for safe use of multiple-inheritance

  • try to use only single inheritance and interfaces; in C++ an interface is a class that has nothing but statics and pure-virtual methods
  • the biggest problems come from classes with fields and non-virtual methods; try to make sure that non-first base classes have none
  • make non-primary base classes as trivial as possible (ideally interfaces), preferably top-level classes (not derived from anything)
  • avoid the diamond; switch to virtual inheritance as soon as you notice one
  • be very, very careful
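
As an illustration of the first hints, here is a minimal sketch of the "one real base plus interfaces" pattern. All names in it (`Printable`, `Shape`, `Triangle`, `describe_through_interface()`, `demo()`) are made up for this example. The interface has no fields and no non-virtual methods, so there is no second copy of any state to worry about.

```cpp
#include <cassert>
#include <string>

// Interface: nothing but pure-virtual methods (plus a virtual destructor).
class Printable {
public:
  virtual ~Printable() = default;
  virtual std::string describe() const = 0;
};

// Ordinary base class with state, inherited the single-inheritance way.
class Shape {
public:
  explicit Shape(int sides) : m_sides(sides) {}
  int sides() const { return m_sides; }
private:
  int m_sides;
};

// One "real" base first, interfaces after it.
class Triangle : public Shape, public Printable {
public:
  Triangle() : Shape(3) {}
  std::string describe() const override {
    return "shape with " + std::to_string(sides()) + " sides";
  }
};

// Code that only knows the interface still reaches the right object.
std::string describe_through_interface(const Printable &item) {
  return item.describe();
}

void demo() {
  Triangle triangle;
  assert(describe_through_interface(triangle) == "shape with 3 sides");
}
```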

Stay tuned for Part III, which covers another complicated aspect: virtual inheritance!