2015 m. balandžio 3 d., penktadienis

C++ inheritance explained (Part II)

This is the second part, explaining how inheritance works in C++ under the hood.
If you haven't read the first part, I recommend having at least a quick look, as it clarifies the approach I'm taking. You can find the first part here.
In this part I'll explain probably the most feared feature in C++ - multiple inheritance.

Multiple inheritance with simple base classes

Let's go for the simple yet famous diamond problem:

class CommonBase
{
public:
  int m_some_int;
  void set_int(int x) { m_some_int = x; }
};
class DerivedOne : public CommonBase
{
public:
  float m_some_float;
  void foo(){}
};
class DerivedTwo : public CommonBase
{
public:
  bool m_some_bool;
  void bar() {}
};
class DerivedMultiple : public DerivedOne, public DerivedTwo
{
public:
  void baz(){}
  void set_int(int x) { DerivedOne::set_int(x); }
};

This one is tricky. Let's list only the resulting structs first:

struct CommonBase
{
  int m_some_int;
};
struct DerivedOne
{
  CommonBase _parent;
  float m_some_float;
};
struct DerivedTwo
{
  CommonBase _parent;
  bool m_some_bool;
};
struct DerivedMultiple
{
  DerivedOne _parent1;
  DerivedTwo _parent2;
};
Pay attention to the DerivedMultiple struct. To make it clear, let's expand its parents inline:

struct DerivedMultiple
{
  /* DerivedOne _parent1; */
  CommonBase _parent1_parent;
  float m_some_float;

  /* DerivedTwo _parent2; */
  CommonBase _parent2_parent;
  bool m_some_bool;
};
As you can see, DerivedMultiple has two copies of CommonBase! Another thing to note is that the DerivedTwo subobject does not start at offset 0!
That immediately raises two questions:
  1. Which CommonBase is used when needed?
  2. How do we call DerivedTwo::bar() on a DerivedMultiple object?
The answers become clear when we translate the calling code:

/* DerivedMultiple object; */
object.foo();
object.baz();
object.bar();

results in:

/* DerivedMultiple object; */
DerivedOne_foo(&object);
DerivedMultiple_baz(&object);
DerivedTwo_bar(&object._parent2);  /* <--- PAY ATTENTION */

As you can see, when we call a method that comes from DerivedTwo, we don't pass a pointer to our object as the first argument! Instead, we pass a pointer to the place inside the object where the DerivedTwo subobject is located!
But now we have another question: what if we call a CommonBase method, like set_int(), from inside bar()? How is CommonBase resolved when we have a pointer pointing somewhere inside a DerivedMultiple object?

Let's demystify it with this example:

/* DerivedMultiple object; */
DerivedOne *derived1 = &object;
DerivedTwo *derived2 = &object;
derived1->set_int(1);
derived2->set_int(2);

This code translates into the following:

/* DerivedMultiple object; */
DerivedOne *derived1 = &object;
DerivedTwo *derived2 = &object._parent2;
CommonBase_set_int(&derived1->_parent, 1);
CommonBase_set_int(&derived2->_parent, 2);  /* but, THEY LOOK THE SAME? */
OK, so this gives us two puzzles. First, when assigning a pointer to DerivedMultiple to a pointer to DerivedTwo, the pointer is automatically shifted to the subobject part! Second, and most important, THERE IS NOTHING SPECIAL DONE TO RESOLVE CommonBase!
Yes, that's right - the two calls will access different CommonBase subobjects inside DerivedMultiple!

Let's illustrate it with numbers:

/* DerivedMultiple object; */
object.set_int(5);
DerivedTwo *two = &object;
two->set_int(6);
int five = object.DerivedOne::m_some_int;
int six = object.DerivedTwo::m_some_int;

Calling set_int() on the main object and through the DerivedTwo pointer sets different m_some_int fields.
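Translated with the same conventions, it becomes clear which copy each call touches (a sketch; DerivedMultiple_set_int stands for the function our imaginary compiler generates for DerivedMultiple::set_int):

/* DerivedMultiple object; */
DerivedMultiple_set_int(&object, 5);            /* forwards to DerivedOne's CommonBase */
DerivedTwo *two = &object._parent2;             /* pointer shifted to the subobject    */
CommonBase_set_int(&two->_parent, 6);           /* touches DerivedTwo's CommonBase     */
int five = object._parent1._parent.m_some_int;  /* 5 */
int six  = object._parent2._parent.m_some_int;  /* 6 */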
Now you know that multiple inheritance is hated for a reason!

Multiple inheritance with a polymorphic base class

Take this hierarchy:

class SimpleBase
{
public:
  int m_some_int;
};
class VirtualBase
{
public:
  float m_some_float;
  virtual void foo() {}
};
class Derived : public SimpleBase, public VirtualBase
{
};

The resulting structs are:

struct SimpleBase
{
  int m_some_int;
};
struct VirtualBase
{
  void *_vtable;
  float m_some_float;
};
struct Derived
{
  void *_vtable;
  SimpleBase _parent1;
  VirtualBase _parent2;
};

To make it clear, let's expand parents directly inside:

struct Derived
{
  void *_vtable;
  /* SimpleBase _parent1; */
  int m_some_int;

  /* VirtualBase _parent2; */
  void *_parent2_vtable;
  float m_some_float;
};

As you can see, the only changes are related to the VTable. There are a few possible scenarios (a call example follows this list):

  • if the first base class is polymorphic, its VTable pointer can be reused (no need to add such a field at the beginning)
  • there might be several VTable pointers inside a class (some can be inherited)
  • each VTable pointer can point to the same or to a different place (this is entirely up to the compiler)
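To see those VTable pointers in action, calling foo() through the VirtualBase part of a Derived object, i.e.

/* Derived object; */
VirtualBase *vb = &object;
vb->foo();

might translate into something like this (a sketch in the same notation; the exact layout is up to the compiler):

/* Derived object; */
VirtualBase *vb = &object._parent2;         /* pointer shifted to the VirtualBase subobject        */
_get_method_address(vb->_vtable, foo)(vb);  /* the usual VTable lookup; foo expects a VirtualBase* */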

Method overrides in multiple inheritance

Things get further complicated when we override methods in a class with more than one base.
Let's take this example:

class Base
{
public:
  int m_base_int;
};
class SimpleDerived : public Base
{
public:
  int m_simple_derived_int;
};
class VirtualDerived : public Base
{
public:
  int m_virtual_derived_int;
  virtual void set_int(int x) { m_virtual_derived_int = x; }
};
class Multiple : public SimpleDerived, public VirtualDerived
{
public:
  virtual void set_int(int x) override { m_simple_derived_int = x; }
};
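Following the earlier conventions, the resulting structs might look roughly like this (a sketch; the exact layout is up to the compiler):

struct Base
{
  int m_base_int;
};
struct SimpleDerived
{
  Base _parent;
  int m_simple_derived_int;
};
struct VirtualDerived
{
  void *_vtable;
  Base _parent;
  int m_virtual_derived_int;
};
struct Multiple
{
  void *_vtable;
  SimpleDerived _parent1;
  VirtualDerived _parent2;
};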

You probably already have an idea of how the resulting structs look (a sketch is above for reference). Let's execute this code:

/* Multiple object; */
object.set_int(5);
VirtualDerived *vd = &object;
vd->set_int(8);   /* <--- HOW DOES THIS ONE WORK? */

When translated to C it looks like this:

/* Multiple object; */
_get_method_address(object._vtable, set_int)(&object, 5);   /* no magic here */

VirtualDerived *vd = &object._parent2;   /* this one is familiar too */
_get_method_address(vd->_vtable, set_int)(_get_object_address_for_func(vd, set_int), 8);

Looks like I've been lying to you a bit when explaining how method overrides work :)
What you see happening here is:

  • the method address is obtained from the VTable as usual
  • because the object we have a pointer to can be involved in multiple inheritance, we cannot simply pass that pointer to the function - what if we hold a pointer to some subobject inside, while the method expects a pointer to the actual object?
  • before the pointer is passed to the function, it goes through a compiler-generated helper that looks into the VTable and returns a valid address to pass (not necessarily the beginning of the real object)
  • this adjustment is used every time an object's address is passed to a virtual method, because we never know what types are derived from a given class - the tree can be a very complicated mixture of single and multiple inheritance (a sketch of one possible implementation follows this list)
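As a rough illustration of what _get_object_address_for_func might do (just a sketch - real compilers typically solve this with per-slot offsets or small "thunk" functions stored alongside the VTable entries; VTableEntry and this_offset below are made up for the example):

#include <stddef.h>

/* hypothetical VTable slot: besides the function address it stores
   the adjustment to apply to the object pointer before the call */
struct VTableEntry
{
  void      (*func)(void *self, int x);  /* the override to call          */
  ptrdiff_t   this_offset;               /* shift applied to 'self' first */
};

/* roughly what the imaginary compiler emits for  vd->set_int(8);  */
void call_set_int(void *vd, const struct VTableEntry *entry, int x)
{
  void *adjusted = (char *)vd + entry->this_offset;  /* e.g. back to &object */
  entry->func(adjusted, x);
}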

Hints for safe use of multiple inheritance

  • try to use only single inheritance plus interfaces; in C++ an interface would be a class that has nothing but statics and pure-virtual methods
  • the biggest problems come from base classes with fields and non-virtual methods; try to ensure that no non-first base class has any
  • make non-primary base classes as trivial as possible (ideally interfaces), preferably top-level classes (not derived from anything)
  • avoid the diamond; switch to virtual inheritance as soon as you notice one
  • be very, very careful

Stay tuned for part III, which will have another complicated aspect - virtual inheritance!

2015 m. kovo 26 d., ketvirtadienis

C++ inheritance explained (Part I)

Inheritance in C++ is one of the most complex forms of inheritance there is. Understanding how it works and what hidden machinery is involved is useful (if not required) to avoid messing things up.
I'll try to explain it all in detail with examples.
Before we start, there are a few things to note:
  • Visibility (both member and inheritance) has no effect on the mechanics described here, so everything in all examples is public
  • The C++ code will be translated to C code to reveal what is done automatically by the compiler
  • The "C++ compiler" is an imaginary one, in an attempt to keep things simple and clear
  • Namespaces and name mangling are ignored for simplicity (they have no impact on inheritance)

The simple inheritance

Let's start with the most trivial example:

class SimpleBase
{
public:
  int m_some_int;
  void foo(int x) {}
};
class SimpleDerived : public SimpleBase
{
public:
  float m_some_float;
  void bar() {}
};

When compiled, the compiler turns it into something like this:

struct SimpleBase
{
  int m_some_int;
};
void SimpleBase_foo(SimpleBase *_this, int x) {}
struct SimpleDerived
{
  SimpleBase _parent;
  float m_some_float;
};
void SimpleDerived_bar(SimpleDerived *_this) {}

How it works:

  • the top-level class becomes a struct with member variables matching those of the class
  • methods become functions that take a pointer to the corresponding struct (the hidden this argument) as the first parameter, with the other parameters being those of the original method
  • inheritance places the parent struct as the first member; it sits at offset 0, so a pointer to the derived class can be converted to a pointer to the base one (this is done automatically by the compiler)
  • special methods (like constructors) and overloaded operators are also methods and are turned into similar functions
  • static methods are just plain functions; in this case the class simply serves as a namespace with some visibility-related features (a call example follows this list)
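To make the calling side concrete as well, here is how calls on a SimpleDerived object might translate (same imaginary-compiler notation; note that &object and &object._parent are the same address):

/* SimpleDerived object; */
object.bar();
object.foo(5);

becomes:

/* SimpleDerived object; */
SimpleDerived_bar(&object);
SimpleBase_foo(&object._parent, 5);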

Simple inheritance with polymorphic base class

Let's change the example so that the base class is polymorphic:

class PolyBase
{
public:
  virtual void foo() {}
  void bar() {}
  int m_some_int;
};
class SimpleDerivedFromPoly : public PolyBase
{
public:
  virtual void foo() override {}
  void bar() {}
  float m_some_float;
};

In this case the compiler turns the base class into something like this:

struct PolyBase
{
  void *_vtable;
  int m_some_int;
};
void PolyBase_foo(PolyBase *_this) {}
void PolyBase_bar(PolyBase *_this) {}

What is different from simple inheritance is that there is something called _vtable as the first member (the compiler is free to place it anywhere, but placing it first is usual).
Another thing that changes significantly is how methods are called. Let's take this C++ code:

/* PolyBase object; */
object.bar();
object.foo();

The compiler translates it to something like this:

/* PolyBase object; */
PolyBase_bar(&object);
_get_method_address(object._vtable, foo)(&object);

The difference you see here is:

  • a non-virtual method is a simple function call
  • a virtual method call involves a so-called vtable lookup: the address of method foo is found in the vtable (how exactly is up to the compiler), and the function at that address is called.
The derived class is compiled into:

struct SimpleDerivedFromPoly
{
  PolyBase _parent;
  float m_some_float;
};
void SimpleDerivedFromPoly_foo(SimpleDerivedFromPoly *_this) {}
void SimpleDerivedFromPoly_bar(SimpleDerivedFromPoly *_this) {}

Nothing special here. Let's see how the method calls look:

/* SimpleDerivedFromPoly object */
object.bar();
object.foo();

becomes:

/* SimpleDerivedFromPoly object */
SimpleDerivedFromPoly_bar(&object);
_get_method_address(object._parent._vtable, foo)(&object);

Notes:

  • the non-virtual method call resolves to the method from the derived class
  • the virtual method call is no different at all (except that _vtable is inside _parent)
  • the actual value of _vtable is different for objects of each class; that's how the correct method is found (see the sketch after this list)
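One way to picture what those different _vtable values point at (just a sketch - the real vtable layout and the casts involved are compiler-specific):

/* one shared table per class, created by the compiler */
void *PolyBase_vtable[]              = { (void*)&PolyBase_foo };
void *SimpleDerivedFromPoly_vtable[] = { (void*)&SimpleDerivedFromPoly_foo };

/* the generated constructors store the right table into _vtable:
   a PolyBase object points at PolyBase_vtable,
   a SimpleDerivedFromPoly object points at SimpleDerivedFromPoly_vtable,
   so the same _get_method_address lookup finds the correct foo */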

Simple inheritance with polymorphism added in derived class

Unlike in previous example, this time let's have simple base class and polymorphic derived:

class SimpleBase
{
public:
  int m_some_int;
  void foo() {}
};
class PolymorphicDerived : public SimpleBase
{
public:
  bool m_some_bool;
  virtual void bar() {}
};

This results in the following structs:

struct SimpleBase
{
  int m_some_int;
};
struct PolymorphicDerived
{
  void *_vtable;
  SimpleBase _parent;
  bool m_some_bool;
};

As you can see, things now get a bit more complicated, because a pointer to the VTable is prepended before the parent (the compiler could place it after the parent too, but I'm placing it this way because it will make multiple inheritance easier to understand later)! Let's see how it works!

/* PolymorphicDerived object; */
SimpleBase *base = &object;
base->foo();
object.foo();

When translated to C, it results in the following. I'll split it up to explain the individual parts.

/* PolymorphicDerived object; */
SimpleBase *base = &object._parent;
SimpleBase_foo(base);

So, as you can see, when assigning to base, the pointer is automatically shifted by the compiler to point to the parent subobject! The pointer to the base class actually points not to the object, but inside it. This enables the simple method call, just as if we had an object of SimpleBase. Calling the inherited method on the real object is also different:

SimpleBase_foo(&object._parent);

In this call it is not the object itself that is passed as the parameter, but the subobject of the relevant type.

Last thing to note is downcasting:

/* SimpleBase *base; */
PolymorphicDerived *derived = static_cast<PolymorphicDerived*>(base);
This results in something like this:

/* SimpleBase *base; */
PolymorphicDerived *derived = (PolymorphicDerived*)((char*)base - sizeof(void*));

The exact opposite must happen: the pointer must be moved back by the pointer size in order to point at the beginning of the object!
This involves the following things behind the scenes:

  • Casting a PolymorphicDerived* to void* and then to SimpleBase* will result in an invalid pointer (not pointing to the subobject part)!
  • Comparing a PolymorphicDerived* to a SimpleBase* using the == or != operator will move one of them (either one) by the pointer size before the comparison (see the sketch after this list)
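A small illustration of that comparison adjustment (a sketch):

/* PolymorphicDerived object; */
SimpleBase *base = &object;                       /* really &object._parent              */
bool same     = (base == &object);                /* true: the compiler adjusts one side */
bool raw_same = ((void*)base == (void*)&object);  /* false: the raw addresses differ!    */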

End of Part I. Stay tuned for the next part which will involve one of the scary parts in C++: multiple inheritance!

2015 m. sausio 13 d., antradienis

Basic rules of automated software testing

There are a lot of posts about what should or shouldn't be done when writing automated tests for software. Below is my list.

Tests are written for others first, only then for yourself

There are only a few cases where tests are written to check whether the code works. In most cases writing a test is not the most efficient way to check that. Instead, tests are written primarily to catch regressions when unrelated changes are made. Since it's quite easy to break something you don't know about, tests are written to prevent others from breaking something they possibly don't even consider.
As such, claims like "I don't need tests" are pretty much void if there is more than one developer on the project.

A test that hasn't failed at least once has not been proven to test anything

Should be obvious, but if the test has never been red, how do you know that it actually tests something? Maybe the test is simply always green and will not catch any bug!
Tests are just like any other code: they have to be verified. The simplest way to do this is to introduce a bug in the code and run the test to see if it fails.

When code evolves, the test suite should evolve with it

The only case where code changes do not necessarily cause test changes is refactoring. In all other cases, if the code changes but the tests don't, the new features are not covered, so the test suite degrades.
A test suite is relevant only if it is kept in sync with the code it tests.

Tests should test the smallest possible feature set

This is easier said than done. Isolating different features from one another can be difficult. Testing features is only one part; ideally it should also be easy to find what got broken when a test fails. If a test depends on multiple features at the same time, its failure shows that one of those features is broken, but does not always tell which one.
There are two solutions here:
  • Make tests depend on only one feature, so that a test failure indicates a problem with that feature
  • Order tests accordingly. If a test depends on 3 different features, but 2 of them have been thoroughly tested before it, a failure is likely caused by the third
The first of these two options is preferred.

Testing against mocks is inadequate

It is a popular suggestion among unit-test proponents to mock everything in order to achieve single-feature isolation.
There is a pitfall in doing so: mocks are never the real thing! Testing against mocks only proves that the code works with those mocks! There is almost no software in the world that is 100% compliant with the standards it supports, or even with its own documentation (assuming that software has been around for a while). Yet people for some reason think that it is possible to keep mocks 100% identical in behavior to the real thing they imitate.
The real implementations evolve over time and the mocks have to be kept in sync. Sooner or later they diverge. For this reason integration and end-to-end tests are required, in order to make sure the code works with the actual implementations.

Unit testing is inadequate

Should be obvious: end users don't care whether your tests are green or not. They care whether the software works or not. Unit tests can prove that individual components work, but they say nothing about the behavior when those components are put together.

It's not important whether code or test is written first

TDD zealots claim otherwise and they are wrong. While TDD can increase the coverage and quality of tests, it does not guarantee that!
What matters in the end is the quality of the code and the quality of the tests. When both are good, no one cares in what order they were produced.

Only testing your own code is risky

In part this is another argument against TDD...
One of the reasons for doing code review is that it is hard to spot problems in your own code. The same applies to tests - when someone other than yourself drills your code, the chances of catching bugs increase.

Only testing the correct behavior is inadequate

One common mistake in automated tests is verifying only that the code behaves correctly under correct conditions. For complete testing the opposite should also be tested: the code should report errors under incorrect conditions.
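For example, for a hypothetical parse_port() helper that is supposed to reject garbage input, a minimal "incorrect conditions" test might look like this (just a sketch, not tied to any specific test framework; parse_port is made up for the example):

#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

/* hypothetical code under test: throws on anything that is not a number */
int parse_port(const std::string &text)
{
  std::size_t pos = 0;
  int value = std::stoi(text, &pos);   /* throws std::invalid_argument on garbage */
  if (pos != text.size())
    throw std::invalid_argument("not a number: " + text);
  return value;
}

/* the test for the incorrect-input case */
int main()
{
  bool rejected = false;
  try {
    parse_port("not-a-number");
  } catch (const std::exception &) {
    rejected = true;                   /* the expected error happened */
  }
  assert(rejected && "invalid input must be rejected");
  return 0;
}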

Do you have any additional rules to add?

2014 m. rugsėjo 8 d., pirmadienis

Backward compatibility sins

Maintaining backward compatibility is one of the most important values for every software library, tool or system used by other systems via its API. However, as a system evolves, maintaining compatibility gets harder, and sometimes it's not possible to improve it in a desired way, because that would mean breaking compatibility. At those points a tough decision has to be made: maintain compatibility or break it.
The list below is not complete by any means, but it shows a few examples where I doubt that staying backward compatible was the right decision. I also add what I think the right decision was and what we can learn from the mistakes made.

  1. WinMain [dead parameter in rarely used function]

    In Windows API, the programs entry point is as follows:
    int CALLBACK WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nCmdShow);
    Note the second argument: it is always NULL!
    The idea is that this argument had a meaning in 16-bit Windows, but that meaning was completely removed in 32-bit Windows. So this parameter is in effect meaningless and is there just for backwards compatibility. While that seems to make sense at first, consider that Win16 and Win32 are not entirely compatible! Applications had to be migrated from one to the other. And an application has exactly one WinMain.
    As a consequence, what you see here is short-term backward compatibility (Win16 died quite soon after Win32 appeared) at the cost of long-term API pollution. All for something as trivial as the application entry point (which could have been handled via a preprocessor macro).

  2. WPARAM ["hungarian compatibility"]

    In Windows the signature for Window Procedure is like this:
    LRESULT WINAPI DefWindowProc(HWND hWnd, UINT Msg, WPARAM wParam, LPARAM lParam);
    and the parameter in question is wParam, its datatype to be exact.
    The point here is that "W" stands for "WORD" in both the datatype and the parameter name. This was true in Win16 (WORD = 2 bytes), but not anymore since Win32 (the WORD datatype is still 2 bytes, but WPARAM is now 4 bytes).
    There are two issues apparent here:

    • Hungarian notation is a bad idea, and one of the reasons is right in front of you (if you haven't noticed - the parameter name LIES to you)
    • Generic datatypes (like int) are redefined as something else to make it possible to change them later. That was done in this case too, except that the name encoded the original type and now LIES to us.
  3. Double.parseDouble(null) vs. Integer.parseInt(null) [bug-compatibility]

    In Java:
    Integer.parseInt(null) // throws NumberFormatException
    Double.parseDouble(null) // throws NullPointerException

    This inconsistency originates from old versions of Java and is kept for backward compatibility. It is documented behavior, so it's "a feature".
    What this actually is, is called bug-compatibility. The funniest part is that both of these methods can throw NumberFormatException, so the fix is quite simple and would hardly break anything badly. I mean, if you handle the exception properly, it will just work; otherwise you probably have a quite buggy system already, and one bug more or less doesn't make a difference...
    Most importantly, these two are very old. Double.parseDouble() dates back to Java 1.2; there is no such number for the other, but it is probably from around the same time. YOU REALLY REALLY COULD HAVE FIXED THIS BACK THEN! Instead, Sun maintained backward, sorry, bug-compatibility, just to see the bug getting harder to fix later.

  4. Java generics vs. C# generics [focus on past, not on future]

    Both languages fell for the idea of Object collections, just to find out that they had destroyed a lot of type safety in exchange for more verbose coding (a lose-lose situation, that is). How did they add generics later without breaking the languages' backward compatibility?
    Java went the hard way by turning the existing non-generic collections into generic ones. They faced three issues:

    • both non-generic and generic collections should be available (so that old code still compiles)
    • convertibility between generic and old non-generic variants (to mix old and new code)
    • behavior compatibility (non-generic accepts anything, generic is limited)
    It was easy and correct to default to Object for non-generic collections. Problems arose for generic collections that are more specific than Object. The solution was to make the generic argument syntactic sugar, only available at compile time; that is, the collection is still what it was before, with the casts auto-added by the compiler. This was done because old (existing) collections always accepted anything, so if an exception were introduced for an incompatible type, existing code would break. A non-generic collection was made convertible to any generic collection (whatever the argument is). That in turn added two new issues:
    • what happens if a non-generic collection is converted to a generic one with an incompatible argument?
    • how is type safety enforced in new code, when under the hood it is still the old non-generic collection?
    Java's creators went the easy way in both cases: they accepted ClassCastException for the first and completely forbade direct conversion of one generic collection to another (i.e. List<Integer> can't be directly cast to List<Number>).
    What went wrong here? Four issues:
    • a generic collection can be passed to old non-generic code, which is free to insert anything - that will only explode later, in the new code!
    • one generic type still can't be cast to another, even if the arguments are compatible; you have to work around that with a cast through a non-generic collection
    • generics only exist at compile time; there is no runtime type checking
    • you can't force new stuff to be generic-only; it can still be used without generic arguments, in which case Object is assumed
    What could they have done instead? Make collections aware of their generic argument and throw an exception when an incompatible object is inserted. That would accomplish the following:
    • type safety of the generic collection - it would simply never contain incompatible objects
    • casting of references would simply work, as the protection is there at runtime
    • passing a generic collection to old code would reveal bugs (an object of the wrong type inserted) or expose invalid assumptions about it ("oops, it's not a String-only collection")
    Yes, this approach could break old code. But the alternative that was chosen made all new code suck. Looking forward, new code will slowly outnumber the old, and Java as a language will have inferior generics compared to what it could have had!

    C# took a different approach here. It simply added generics as something completely new, not compatible with the old collections in any way. Not ideal, as interoperability between old and new code is troublesome. But looking forward, old code will die out. So IMO it's a better approach than that of Java.

  5. C++ compatibility with C [not quitting in time]

    So, C++ is designed to be compatible with C, that is, "a valid C program is a valid C++ program", as they say... Well, not really, for several reasons:

    • the compatibility is lost with the first new keyword introduced (*cough* class *cough*) - what used to be a valid identifier now is not
    • C++ has different linkage because of name mangling, which makes it incompatible with C. Worse, C libraries are now forced to add extern "C" markers under an #ifdef __cplusplus guard to make themselves compatible with C++
    • enums and structs have tag names in C, but these are real type names in C++
    What could they do? Well, actually they did the right thing, just for far too long. If C++'s goal was to completely replace C, it failed to do that. And it's long past time for it to become an independent language and throw some of the old C junk away (well, you could introduce some constructs to access C from C++; we have so many of them that a few more wouldn't really matter).
    What breaking ties with C would achieve:
    • string literals could become real std::string objects with their functionality (like concatenation using "+")
    • arrays could be std::array by default (being assignable is the first win)
    • a lot of the standard C library could be wrapped by C++ functions that accept C++ types (imagine printf() accepting std::string)
    • forget extern "C", you could just have something like #cinclude for C headers

Lessons that can be learned

  • A compatibility break that is almost guaranteed to have a very small impact is worth doing (WinMain)
  • If you redefine some type via typedef, make the new type more generic, so you can change it later (i.e. "an integer of size at least X")
  • Hungarian notation is a bad idea, and the full version is ten times worse
  • Bugs should be fixed! A fix that breaks something of minor importance will earn you a few rants from people who are the ones to ignore (I can't imagine a good developer complaining about a fixed bug, even if it broke something in his buggy code).
  • New code or file formats will gradually outnumber the old by a large margin, so look forwards, not backwards
  • If you fail to maintain full compatibility, use it as an opportunity to break things for a better future
  • The number of "breaks" doesn't matter; what matters is the overall pain introduced by the compatibility break. So, if you broke something important, keeping minor things compatible won't help much.
  • Bonus: it's not possible to maintain backward compatibility forever, so plan the break in advance and don't give false promises.

2014 m. liepos 14 d., pirmadienis

The lost war against duplicate code

From what I've seen so far, duplicate code is impossible to avoid in any large project. There are multiple ways duplicate code gets created, and while it is typically assumed that duplicate code is bad, this is not always the case.

Why duplicate code is bad

  • Duplicate bugs - it's obvious: if a bug is discovered in the code, the same bug exists everywhere the same code is used, so there are many places to fix instead of one
  • Hard to maintain - pretty much the same as the previous point, but broader. In particular, you not only fix bugs, but also add features, optimizations and other improvements. Worse, duplicated code diverges over time, making it harder to spot.

What "justifies" code duplication

  • Easier to maintain - while we claim the opposite, this one has some truth in it. By copying code written by someone else you are free to change it in any way you want. Changing common code is harder and often requires agreement across multiple involved parties. Bust: it only looks that way - it makes the code base larger, which in turn makes it harder to maintain.
  • More freedom to change - common code has to remain common, that is, you can't add your specific features to it. The biggest problem with this is that it's an organizational issue: if code is duplicated to have more freedom to change it, it indicates a problem with management or company culture.
  • Faster to develop - everything that requires the involvement of multiple parties takes more time to do. Bust: a short-term gain; you usually lose in the long run (unfortunately, short-term gains are what many managers care about most).

How duplicate code happens

  • Incompetence - it's sad, but there are a lot of bad developers. Many of them write code via copy-paste, and, as always, abusing copy-paste results in duplicates. This is what is often assumed when talking about duplicate code, and yes, this is what we should fight.
  • Forgot to refactor - this is trickier. It's like the first one, except that the developer is actually not bad. It's fine to use copy-paste in order to make things work; the problem is that you have to refactor at the end. Not forgetting to do that is the hardest part... There is a gray area between this and the first one. Code review might be an answer to this one.
  • Too much trouble - sometimes avoiding code duplication is more trouble than it's worth. The place for the common code might not even exist! Create a library just for a couple of functions? Don't forget that this brings the entire maintenance burden of a library with it. Also, there often is such a thing as code ownership, and shared code is owned by someone else. In short, we avoid code duplication to reduce problems, not to add new ones. When that is not the case, duplicating code can be acceptable.
  • Created naturally - it's not impossible for two developers to write almost identical code independently. In large projects with a lot of people this does happen, and it might take a while to discover that two guys from completely different teams wrote almost identical helper functions.
So, to summarize: next time, before blaming someone for incompetence, have a second thought.

2014 m. gegužės 4 d., sekmadienis

Exception handling is mostly a failure

In short: exceptions are good for system and critical errors (like out of memory). The simpler and more expected the error is, the less useful and the more troublesome an exception becomes.

Error handling is hard. Not doing it properly comes back as mysterious failures where no one can understand what went wrong. Doing it properly is a pain in the ass, mostly because it takes a lot of boring coding when the stuff already works! Really, most of us probably just code the happy path first, prove it, and then go on handling all the possible not-so-happy cases. This is generally the right thing to do - what's the point of handling the errors when you're not yet sure the solution is right?

Sinking among ifs

Escaping from that is the general idea behind exception handling. A typical example given to students looks like this:
if(open_file()) {
  if(read_file()) {
    if(process_data()) {
      show_result();
    }
    else {
      error("Failed to process data");
    }
  }
  else {
    error("Failed to read file");
  }
  close_file();
}
else {
  error("Failed to open file");
}
The calls on the happy path - open_file(), read_file(), process_data() and show_result() - are the "good code". Everything else is there for error handling. It seems very nice to write all the "good" function calls one after another and move the error handling code somewhere else - welcome, try-catch!
try {
  open_file();
  read_file();
  process_data();
  show_result();
  close_file();
} catch(Exception e) {
  // determine and error message here
}
Nice, we have separated the happy path from the error handling code; now it's easy to understand what the code does!
Real-life situations are not so nice...

There are different types of errors

  • Disasters: something that generally shouldn't happen, like a hard disk crash. Some errors are so rare and so fatal that it's pointless to try to prepare for them.
  • Fatal errors: stuff that renders the application unusable, e.g. losing the network connection is fatal for a web application.
  • Expected mistakes: the user hasn't filled in the required fields? The specified file name contains invalid characters? Such errors are predictable and applications should be ready for them.
  • Glitches: a string "15 " (trailing space) in 99% of cases is the integer number 15, dammit.
The interesting thing here is that exactly the same error can belong to a different group depending on the exact situation. A failure while writing to a file can mean that the primary hard disk has just crashed and in a few seconds the entire computer will be unusable, or it can just mean that the user has unplugged a USB stick. And who said that failing to open a file is fatal? No config - assume hard-coded defaults.

Opening file is so difficult

... So, we are opening a configuration file that is not required to exist...
FILE *file = fopen(filename, "r");
Nice, NULL means it does not exist; otherwise it's something we can read!
What's the problem? You can write it the other way around:
FileStream file = null;
if(File.Exists(filename)) {
  file = new FileStream(filename);
}
Does the same thing. Does it? Congratulations, you've just introduced a full-moon bug! Files sometimes disappear, you know, they get deleted. That can happen at any point in time, for example right between the existence check and the opening... Fatal error, crash, or... well, that file was never required to be there in the first place? So now the code becomes:
FileStream file = null;
try {
  file = new FileStream(filename);
}
catch (Exception e) {
  // ignore
}
Wonderful, what used to be one line, now is... progress.

How badly you can blow?

C once again. You call a function and you expect it to return. Is this guaranteed? No! The application might die inside, but then we don't care. longjmp() can be called, but again we don't care - unless we set it up ourselves.
Let's "upgrade" to C++. What can happen now? Yes, an exception can be thrown, and there are many types of them! Worse: new types of thrown exceptions can be added in the future!
It's considered good practice to catch only the exceptions you care about and let the others propagate up the call stack. That's fine, but what about the new types of exceptions that might be added in the future? It looks like someone didn't design for the future...

Exception safety

There is an amazing thing about exception safety that I still can't explain. C++ is a language whose standard library throws extremely rarely, yet a topic called "exception safety" is part of its books. When we come to Java and co., where exceptions are thrown here, there and everywhere, this topic is somehow forgotten...
obj.foo(x, y);
You can only guess how foo works with x and y, but there's one thing most seem to assume - all or nothing. If an exception is thrown out of foo(), you want the state of obj unchanged! A simple concept, but not so easy to get right. Throw more exceptions and enjoy more full-moon bugs.
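One common way to get that all-or-nothing behaviour in C++ is copy-and-swap; a minimal sketch (Config and parse_file are made up for the example):

#include <string>
#include <utility>

class Config
{
public:
  void parse_file(const std::string &path)
  {
    m_path = path;            /* real parsing would go here and may throw */
  }

  void load(const std::string &path)
  {
    Config tmp;               /* build the new state on the side          */
    tmp.parse_file(path);     /* may throw - *this is still untouched     */
    std::swap(*this, tmp);    /* commit only after everything succeeded   */
  }

private:
  std::string m_path;
};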

Exception specifications

This is something that pissed me off when I started learning Java. C++ has them too, but there they are optional and no one seems to use them (except for the standard library). Some even discourage them.
Looking at C#, they have thrown specifications away entirely.
Looking back at Java... ArrayIndexOutOfBoundsException, PersistenceException and multiple others are "unchecked" exceptions, so you don't need to write them all over the place. Are the two I mentioned really so "unexpected"?

Conclusions

  • Exception handling works well for critical errors. The less serious the error is, the less efficient exception handling becomes. For simple errors exceptions are more trouble than help.
  • Exceptions are designed to separate useful code from error handling code. When exception handling mechanisms start appearing inside nested code blocks, it's the first sign of exception misuse.
  • I also haven't mentioned that exceptions are expensive in terms of performance...

2014 m. kovo 31 d., pirmadienis

Darkest corners of C++

It is good to know the language you are programming in.

Placement-new array offset

It turns out that with some compilers new[] might put an integer in front of the actual array even when placement-new is used:
void *mem = new char[sizeof(A) * n];
A *arr = new(mem) A[n];
Whether mem and arr point at the same address depends on the compiler and the code. With GCC I get the same pointers, but with the Microsoft compiler, when A has a destructor, arr is mem + sizeof(int) or similar. While this mismatch might look harmless at first sight, it isn't - your array runs past the allocated memory at the end!
Solution: cast the pointer and manually loop over the array, creating each object individually via placement-new (a sketch follows).
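A sketch of that workaround, continuing the snippet above (each element is constructed individually, so no compiler-specific bookkeeping ends up in front of the array; requires <new> for placement-new):

void *mem = operator new(sizeof(A) * n);  /* raw storage, assuming A and n as above */
A *arr = static_cast<A*>(mem);
for (size_t i = 0; i != n; ++i)
  new (arr + i) A();                      /* placement-new one object at a time */
/* later: destroy in reverse order with arr[i].~A(), then operator delete(mem) */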

Pointer that changes

class Base {int x; };
class Derived : public Base {virtual void foo() {} };

Derived *d = new Derived;
Base *b = d;
Here b and d will not point to the same address. Comparing and casting them does the right thing, but if you cast them to void*, you'll see they're not equal! This is because Base is non-polymorphic (no virtual methods), while Derived is polymorphic. So, a Derived object has a pointer to the vtable at its beginning, followed by the Base subobject and then by its own additional members.
Things get even funnier when there are many classes in the hierarchy and multiple inheritance is involved.
Solution: well, don't cast pointers to objects into void*.

Return void

This code is valid:
void foo() {}
void bar() { return foo(); }
Useful when writing templates.
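A small sketch of where that comes in handy - a wrapper template that forwards the result even when the wrapped function returns void (call_logged and the logging line are made up for the example):

#include <iostream>

/* forwards whatever f() returns; works unchanged when that type is void */
template <typename F>
auto call_logged(F f) -> decltype(f())
{
  std::cout << "calling...\n";
  return f();   /* valid even if f() returns void */
}

void say_hi() { std::cout << "hi\n"; }

int main()
{
  call_logged(say_hi);                    /* void case     */
  int x = call_logged([]{ return 42; });  /* non-void case */
  return x == 42 ? 0 : 1;
}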

Pure-virtual function with implementation

A pure-virtual function means that a derived class must override it in order for objects to be created. But it does not mean that such a method cannot be implemented in the base class. The code below compiles and works:
class A
{
public:
  virtual void foo() = 0;
};

void A::foo() { std::cout << "A::foo called\n"; }

class B : public A
{
public:
  virtual void foo() override
  {
    A::foo();
    std::cout << "B::foo called\n";
  }
};
Note that it did not compile for me using GCC when I tried to provide the implementation for A::foo inline (inside the class definition).

Function-try block

This is quite a tricky feature. Function-try block basically looks like this:
void foo()
try {
  throw int();
} catch(...) {
  std::cout << "Exception caught\n";
}
However, in this form there is no particular use for it. It's just a shorter way of wrapping the entire function body in a try-catch.
The real use for this feature (where it also works differently) is in constructors. First of all, when used for a constructor, it does not really catch exceptions - it catches and rethrows them! Its real purpose is to free resources allocated in the initializer list:
class A
{
public:
  A(int x) { throw x; }
};

class B
{
  A a;
public:
  B()
    try
    : a(5)
    { } catch(...) {
      std::cout << "Exception in initializer merely-caught\n";
    }
};
Here the exception is thrown in the initializer list. There is no way to catch it in the constructor body itself. But the initializer list may be long, and some initializers may allocate resources, such as memory. To free such resources, you have to use a function-try block for your constructor and free them in the catch block. Remember that exceptions are rethrown from here.

Bitfields

When defining a struct it is possible to specify variable sizes in bits:
struct Bitfields
{
  int i:16;
  int j:8;
  bool b:1;
  char c:7;
};
The size of this structure is 4 bytes (on my machine at least). Each variable in the struct takes as many bits as specified and can hold the corresponding value range.

And, since this is about C++, there are definitely more :)