Monday, March 31, 2014

Darkest corners of C++

It is good to know the language you are programming in.

Placement-new array offset

It turns out that on some compilers new[] may store an integer in front of the actual array, even when placement-new is used:
void *mem = new char[sizeof(A) * n];
A *arr = new(mem) A[n];
Whether mem and arr point to the same address depends on the compiler and the code. With GCC I get identical pointers, but with the Microsoft compiler, when A has a destructor, arr is mem + sizeof(int) or similar. This mismatch may look harmless at first sight, but it isn't: the end of your array falls outside the allocated memory!
Solution: cast the pointer and loop over the array manually, creating each object individually via placement-new.
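The workaround can be sketched roughly as follows. Item, construct_each and destroy_each are hypothetical names introduced for illustration only, and the sketch assumes the buffer is large enough and suitably aligned for Item (memory from new char[] or operator new is).

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical element type with a non-trivial destructor.
struct Item {
    int value;
    Item() : value(42) {}
    ~Item() {}
};

// Constructs n objects one at a time, so no array bookkeeping is ever
// written into the buffer and the result always starts at mem.
Item* construct_each(void* mem, std::size_t n) {
    assert(mem != nullptr);
    Item* arr = static_cast<Item*>(mem);
    for (std::size_t i = 0; i < n; ++i)
        new (arr + i) Item();
    return arr;
}

// Objects created this way must also be destroyed one by one,
// here in reverse order of construction.
void destroy_each(Item* arr, std::size_t n) {
    for (std::size_t i = n; i > 0; --i)
        arr[i - 1].~Item();
}
```

The buffer itself is still released by whoever allocated it, after destroy_each has run.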

Pointer that changes

class Base {int x; };
class Derived : public Base {virtual void foo() {} };

Derived *d = new Derived;
Base *b = d;
Here b and d will typically not point to the same address. Comparing and casting them does the right thing, but if you cast both to void*, you'll see they're not equal! This is because Base is non-polymorphic (no virtual methods), while Derived is polymorphic. So, on common implementations, a Derived object has a pointer to the vtable at its beginning, followed by the Base sub-object and then by its own additional members.
Things get even funnier when there are many classes in the hierarchy and multiple inheritance is involved.
Solution: well, don't cast pointers to objects to void*.
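A small sketch of the effect, using struct instead of class for brevity. The helper names base_is_offset and check_round_trip are made up for this example, and whether the offset actually appears depends on the compiler's object layout.

```cpp
#include <cassert>

struct Base { int x; };                          // non-polymorphic
struct Derived : Base { virtual void foo() {} }; // polymorphic

// Reports whether the Base sub-object starts at a different address
// than the enclosing Derived object (implementation-dependent).
bool base_is_offset(Derived* d) {
    Base* b = d;  // the implicit conversion applies any needed adjustment
    return static_cast<void*>(b) != static_cast<void*>(d);
}

// Typed comparisons and casts stay consistent regardless of layout.
void check_round_trip(Derived* d) {
    Base* b = d;
    assert(b == d);                        // d is converted to Base* first
    assert(static_cast<Derived*>(b) == d); // the adjustment is undone
}
```

If a void* really is needed, converting it back to exactly the pointer type it was created from keeps things safe.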

Return void

This code is valid:
void foo() {}
void bar() { return foo(); }
This is useful when writing templates, where the return type may or may not be void.
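A minimal sketch of the template case: forward_result is a hypothetical wrapper, and bump and answer are stand-ins for arbitrary callables.

```cpp
int bump_count = 0;

void bump() { ++bump_count; }   // returns void
int answer() { return 42; }     // returns a value

// Forwards whatever f returns. Thanks to "return void" the same body
// works for both kinds of callables, with no specialization needed.
template <typename F>
auto forward_result(F f) -> decltype(f()) {
    return f();
}
```

Without this rule the wrapper would need a separate overload or specialization for callables returning void.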

Pure-virtual function with implementation

A pure-virtual function means that a derived class must override it before objects of that class can be created. But it does not mean that such a method cannot be implemented in the base class. The code below compiles and works:
#include <iostream>

class A
{
public:
  virtual void foo() = 0;
};

void A::foo() { std::cout << "A::foo called\n"; }

class B : public A
{
public:
  virtual void foo() override
  {
    A::foo();
    std::cout << "B::foo called\n";
  }
};
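Note that it did not compile for me with GCC when I tried to provide the implementation of A::foo inline. The standard does not allow a single declaration to combine = 0 with a function body, so the definition has to be written outside the class.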
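The same idea in a form that is easy to check, as a sketch only. Shape and Circle are hypothetical classes, and the static_asserts merely confirm that defining the pure-virtual function does not make the base class instantiable.

```cpp
#include <string>
#include <type_traits>

class Shape {
public:
    virtual ~Shape() {}
    virtual std::string describe() const = 0;  // pure, but defined below
};

// Out-of-class definition: a default that derived classes opt into explicitly.
std::string Shape::describe() const { return "shape"; }

class Circle : public Shape {
public:
    std::string describe() const override {
        return Shape::describe() + ":circle";  // qualified, non-virtual call
    }
};

static_assert(std::is_abstract<Shape>::value, "Shape stays abstract");
static_assert(!std::is_abstract<Circle>::value, "Circle is concrete");
```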

Function-try block

This is quite a tricky feature. A function-try block basically looks like this:
void foo()
try {
  throw int();
} catch(...) {
  std::cout << "Exception caught\n";
}
However, in this form there is no particular use for it. It's just a shorter way of wrapping the entire function body in a try-catch.
The real use for this feature (where it also works differently) is in constructors. First of all, when used on a constructor, it does not really swallow the exception: it catches it and then rethrows it automatically when the handler ends! Its purpose is to react to exceptions thrown from the initializer list:
class A
{
public:
  A(int x) { throw x; }
};

class B
{
  A a;
public:
  B()
    try
    : a(5)
    { } catch(...) {
      std::cout << "Exception in initializer merely-caught\n";
    }
};
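Here the exception is thrown in the initializer list, and there is no way to catch it inside the constructor body. But the initializer list may be long, and some initializers can acquire resources, such as memory. A function-try block on the constructor is the only place to react to such a failure: log it, translate it into another exception, or free resources that are not owned by members (for example, ones passed in as constructor arguments). Members and bases that were already fully constructed have been destroyed by the time the handler runs, so the handler must not touch them. Remember that the exception is rethrown when the handler finishes.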
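The automatic rethrow can be observed with a sketch like this one. Part, Widget, last_log and construction_throws are names invented for the example; the handler only records the message and touches no members.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical member type whose constructor always fails.
struct Part {
    explicit Part(int) { throw std::runtime_error("part failed"); }
};

std::string last_log;  // records what the handler saw

class Widget {
    Part part_;
public:
    Widget()
    try : part_(5) {
    } catch (const std::exception& e) {
        last_log = e.what();  // log only; members must not be touched here
        // falling off the end of this handler rethrows the exception
    }
};

// Returns true if the exception still escapes the Widget constructor.
bool construction_throws() {
    try {
        Widget w;
        (void)w;
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

So the caller still has to handle the exception. The handler can replace it by throwing something else, but it cannot make the construction succeed.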

Bitfields

When defining a struct it is possible to specify the sizes of its member variables in bits:
struct Bitfields
{
  int i:16;
  int j:8;
  bool b:1;
  char c:7;
};
The size of this structure is 4 bytes (on my machine at least; the exact layout of bit-fields is up to the compiler). Each variable in the struct takes as many bits as specified and can hold the corresponding range of values.
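A short sketch of the value ranges. Flags and make_flags are hypothetical; the members are explicitly unsigned because the signedness of plain bit-fields can vary between implementations, and inputs are masked so that every stored value fits its field.

```cpp
struct Flags {
    unsigned int id    : 16;  // 0 .. 65535
    unsigned int level : 8;   // 0 .. 255
    unsigned int on    : 1;   // 0 or 1
    unsigned int tag   : 7;   // 0 .. 127
};

// Packs the arguments, keeping only as many low bits as each field holds.
Flags make_flags(unsigned id, unsigned level, bool on, unsigned tag) {
    Flags f{};
    f.id    = id    & 0xFFFFu;
    f.level = level & 0xFFu;
    f.on    = on ? 1u : 0u;
    f.tag   = tag   & 0x7Fu;
    return f;
}
```

Note that the address of a bit-field member cannot be taken, so such members cannot be bound to pointers or non-const references.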

And, since this is about C++, there are definitely more :)

Sunday, March 2, 2014

On a quest for a good coding standard

All coding standards suck, except mine!

Reasons for a coding standard:

  • Readability - the primary goal of a coding standard in an organization is to make it easier for developers to understand the code. This calls for consistency, meaningful naming and good comments in the code. It has the greatest impact on newer developers, who are less familiar with the code base.
  • Code quality - it is an attempt to make bugs easier to spot and the reasoning behind decisions more obvious. Code is expected to be easy to modify or fix without introducing new issues.
Common mistakes
  • Rules, not guidelines. Rules must be followed, guidelines are less strict. Having strict rules everyone is required to follow sometimes works against the original intent of the standard itself: developers can't make code more readable, because they're required to follow the rule.
    Consider this example:
    int sign_multiplier = x >= 0 ? 1 : -1;
    int sign_multiplier = x>=0 ? 1 : -1;
    Neither is very easy to read, and both can be written in a more readable way. However, given the choice between the two, I strongly believe the second is more readable. But hey, it breaks one of the most common rules - spaces around operators!
  • Standard set in stone. Changing a standard is not necessarily bad; it depends on what, how and why you change. Developers change and programming languages evolve, so the standard should too. Otherwise the standard might forbid features that weren't even there when it was written.
  • Adopted standard. A standard should be an agreement among developers on how to write code. Just using someone else's standard can lead to a situation where some rule in it is hated by every developer. It is the same mistake when the standard is created by someone (who quite often doesn't even write code himself) and imposed on everyone.
  • No or questionable reasoning. Every rule should have a clear reason, and it's good for guidelines to have one too. For one thing, it helps to identify out-of-date items in the standard. It can also give the standard some kind of "spirit", so that guidelines are not just followed or broken. Reasoning should avoid questionable arguments. For example, what is readable for one person can be rubbish for another. If a rule/guideline was introduced by consensus or a strong majority, it is good to state that.
  • Trying to solve unrelated problems. Sometimes people try to solve problems like a compiler limitation or bug by introducing a rule in the standard. It's a bad idea, because bugs get fixed and limitations get weaker, but standards lag behind. Banning a language feature because "developers coming from other programming languages might not understand it" is an example of trying to solve a lack of training or a poor hiring problem with a coding standard, which has nothing to do with either.
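As for the sign multiplier example in the list above, one possible "more readable way" is sketched here. It assumes x is an int and moves the expression into a small named function, so the spacing question does not come up at all; this is just one option, not the only reasonable one.

```cpp
// Hypothetical helper: returns 1 for non-negative x and -1 otherwise.
int sign_multiplier(int x) {
    if (x >= 0)
        return 1;
    return -1;
}
```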
Common poor reasons for rules
  • Makes code more readable. As already mentioned, "readable" is subjective. Some people find CamelCase readable and underscores unreadable. I personally think exactly the opposite. My suggestion is to avoid any claims that something is more readable.
  • Pointing to other standards. Just because many other standards have a certain rule, it does not mean yours should have it too. And it's a completely void argument to claim the rule is good "because company X uses it" (replace X with Microsoft, Sun, Google, whatever...). Use other standards as a source of ideas, find the reasons behind their rules, but don't just blindly adopt them.
  • Claims from long ago. It's the XXI century, we have IDEs, syntax highlighting, etc. We don't need to do anything to make keywords like if, while or for more apparent, it's done for us already. Yet so many standards require a space between if and the opening parenthesis. Not that I'm against this, but the reason for it is so out of date...
  • Some numbers lie. Fewer symbols does not mean it's faster to type. Faster means seconds, not keystrokes. I never found CamelCase to be any faster to type compared to underscores.
Does it really matter?

Since I'm proposing guidelines over rules, this is the primary question to be asked for any rule. Does it really matter?
Obviously some things matter, like naming conventions, indentation or how you place braces.
Take for example the space between a keyword/function name and the opening parenthesis in C-like languages. A coding standard can require the space, forbid it or... does it really matter? Won't you be able to read the code with or without that space? Yes, consistency matters, but to what extent?

Strict numbers are an almost guaranteed failure

When a standard puts a strict limit on something and that limit is an exact number, there's a MAX+1 or MIN-1 problem:

  • A line of code cannot be longer than 80 characters? So an 80-char line is perfect, but 81 is evil?
  • An identifier must be at least 3 characters? So variables x and y are terrible names for X and Y coordinates?
So how should limits be set? Well, that's actually a good question. I think we should look at the whole thing, not at separate parts. When it comes to size, one thing tends to affect the other:
  • Longer identifiers lead to longer code lines
  • Longer lines lead to more lines (wrapping)
  • More lines lead to larger functions
  • Larger functions lead to more functions (splitting functions into smaller)
  • ... for God's sake, don't put a line limit on a class!
Now let's go through this list in the opposite direction:
  • Artificially splitting a class into several due to its size is more likely to make the code harder to understand
  • It's easier to debug a single function than it is to step through several (oops, step over instead of step into)
  • I personally find wrapped lines harder to read. Especially wrapped conditions.
  • Very long identifiers almost never make code easier to understand.
My definitions of too long:
  • An identifier is too long if people "refuse" to type it (copy-paste or code-completion only); it's horribly too long if they can't remember it exactly
  • A line is too long if you have to read it several times to "get what it does" (what, not why or how)
  • A function is too long if, while reading it through, you lose track of what it does at any point
  • A class can only do too much; it is never too long or too big
Suggestions for making a good standard
  • Start from something very abstract everyone agrees on. Code readability and consistency are good candidates. These should be guiding principles for the rest, a "spirit of the standard".
  • Avoid rules, prefer guidelines. It is good to mention that guidelines are expected to be followed and are deviated from only for a reason (when that better corresponds to the "spirit").
  • When defining rules, seek consensus. No one says it's easy, but you should at least try.
  • Rules shouldn't change. Think twice before turning anything into a rule. CamelCase vs. underscores probably requires a rule.
  • It is good to note in the standard why a rule or guideline is the way it is ("we voted, this was the clear winner").
  • Leave the standard open for future modification. However, it is good to note that "current" is a strong argument, so changes require a strong majority. You can also hold periodic standard reviews and lock the standard for modification between them, but I recommend avoiding this.
  • Don't make one standard for all languages. Just don't.
Tips and trade-offs
  • A standard will be liked by everyone only if everyone equals one! In a group of people some will always be unhappy. It's probably best when no one is entirely happy.
  • CamelCase seems to be liked by more people. But you can debate a mix. I personally quite like CamelCase for classes, but underscores for methods.
  • Indentation using spaces looks the same everywhere without any configuration. With tabs it's almost impossible to achieve that. Spaces can be enforced - just auto-replace each tab with n spaces; you can't reliably do the reverse for tabs. Personal observation: tabs in standard = mix in code base.
  • In a large group of people a standard will never be followed 100%. Live with that (but still encourage following the standard).
  • I don't recommend using tools to enforce standard compliance. Warnings in the IDE are a bad idea, because they mix with compiler warnings, and warnings only work when there are close to zero of them.