2014 m. rugsėjo 8 d., pirmadienis

Backward compatibility sins

Maintaining backward compatibility is one of the most important values for every software library, tool or system used by other systems via its API. However, as a system evolves, maintaining compatibility gets harder, and sometimes it's not possible to improve the system in a desired way, because that would mean breaking compatibility. At those points a tough decision has to be made: maintain compatibility or break it.
The list below is not complete by any means, but it shows a few examples where I doubt that staying backward compatible was the right decision. I also add what I think the right decision was and what we can learn from the mistakes made.

  1. WinMain [dead parameter in rarely used function]

    In the Windows API, the program's entry point is as follows:
    int CALLBACK WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nCmdShow);
    Note the second argument: it is always NULL!
    The idea is that this argument had a meaning in 16-bit Windows, but it was completely removed in 32-bit Windows. So this parameter is in effect meaningless and is there just for backward compatibility. While that seems to make sense at first, consider that Win16 and Win32 are not entirely compatible! Applications had to be migrated from one to the other. And an application has exactly one WinMain.
    As a consequence, what you see here is short-term backward compatibility (Win16 died quite soon after Win32 appeared) at the cost of long-term API pollution. All for something as trivial as the application entry point (which could have been solved via a preprocessor macro).

  2. WPARAM ["hungarian compatibility"]

    In Windows the signature for Window Procedure is like this:
    LRESULT WINAPI DefWindowProc(HWND hWnd, UINT Msg, WPARAM wParam, LPARAM lParam);
    and the parameter in question is wParam - its datatype, to be exact.
    The point here is that "W" stands for "WORD" in both the datatype and the parameter name. This was true in Win16 (WORD = 2 bytes), but not anymore since Win32 (the WORD datatype is still 2 bytes, but WPARAM is now 4 bytes).
    There are two issues apparent here:

    • Hungarian notation is a bad idea, and one of the reasons is right in front of you (in case you haven't noticed: the parameter name LIES to you)
    • Generic datatypes (like int) are redefined as something else to make it possible to change them later. That was done in this case too, except that the name encoded the original type and LIES to us now.
  3. Double.parseDouble(null) vs. Integer.parseInt(null) [bug-compatibility]

    In Java:
    Integer.parseInt(null) // throws NumberFormatException
    Double.parseDouble(null) // throws NullPointerException

    This inconsistency originated in old versions of Java and is kept for backward compatibility. It is documented behavior, so it's "a feature".
    What this actually is, is called bug-compatibility. The funniest part is that both of these methods can throw NumberFormatException, so a fix is quite simple and is hardly going to break anything badly. I mean, if you handle the exception properly, it will just work; otherwise you probably have a quite buggy system anyway, and one bug more or less doesn't make a difference...
    Most importantly, these two are very old. Double.parseDouble() dates back to Java 1.2; I found no such number for the other, but it's probably from around the same time. YOU REALLY REALLY COULD HAVE FIXED THIS BACK THEN! Instead, Sun maintained backward, sorry, bug-compatibility, just to watch the bug get harder to fix later.

  4. Java generics vs. C# generics [focus on past, not on future]

    Both languages fell into the idea of object collections, just to find out that they had traded away a lot of type safety for more verbose coding (a lose-lose situation, that is). How did they add generics later without breaking the languages' backward compatibility?
    Java went the hard way by turning the existing non-generic collections into generic ones. They faced three issues:

    • both non-generic and generic collections should be available (so that old code still compiles)
    • convertibility between generic and old non-generic variants (mix old and new code)
    • behavior compatibility (non-generic accepts anything, generic is limited)
    It was easy and correct to default to Object for non-generic collections. Problems arose for generic collections that are more specific than Object. The solution was to make the generic argument syntactic sugar, available at compile time only; that is, a collection is still what it was before, and the compiler auto-inserts the casts. This was done because old (existing) collections always accepted anything, so if an exception were introduced for an incompatible type, existing code would break. A non-generic collection was made convertible to any generic collection (whatever the argument). That in turn added two new issues:
    • what happens if a non-generic collection is converted to a generic one with an incompatible argument?
    • how is type safety enforced in new code, when under the hood there is the old non-generic collection?
    Java's creators went the easy way in both cases: they accepted ClassCastException for the first, and completely forbade direct conversion of one generic collection to another (i.e. List<Integer> can't be directly cast to List<Number>).
    What went wrong here? Four issues:
    • a generic collection can be passed to old non-generic code, which is free to insert anything - and that will only explode later, in new code!
    • one generic type still can't be cast to another, even if the arguments are compatible; you have to work around that by casting through a non-generic collection
    • generics only exist at compile time; no runtime type checking exists
    • you can't force new code to be generic-only; it can still be used without generic arguments, where Object is assumed
    What could they have done instead? Make collections aware of their generic argument and throw an exception when an incompatible object is inserted. That would accomplish the following:
    • type safety of the generic collection - it would simply never contain incompatible objects
    • casting of references would simply work, as the protection is there at runtime
    • passing a generic collection to old code would reveal bugs (an object of the wrong type inserted) or expose invalid assumptions about it ("oops, it's not a String-only collection")
    Yes, this approach could break old code. But the alternative that was chosen made all new code suck. Looking forward, new code will slowly outnumber old, and Java as a language will have inferior generics to what it could have had!

    C# took a different approach here. It simply added generics as something completely new, not compatible with the old collections in any way. Not ideal, as interoperability between old and new code is troublesome. But looking forward, old code will die out. So IMO it's a better approach than that of Java.

  5. C++ compatibility with C [not quitting in time]

    So, C++ is designed to be compatible with C, that is, "a valid C program is a valid C++ program", as they say... Well, not really, for several reasons:

    • the compatibility is lost with the first new keyword introduced (*cough* class *cough*) - what used to be a valid identifier now is not
    • C++ has different linkage because of name mangling, which makes it incompatible with C. Worse, C libraries are now forced to add extern "C" markers under an __cplusplus define to make themselves usable from C++
    • enums and structs have tag names in C, but these are real type names in C++
    What could they do? Well, actually they did the right thing, just for far too long. If C++'s goal was to completely replace C, it failed to do that. And it's long past time for it to become an independent language and throw some of the old C junk away (well, you can introduce some constructs to access C from C++; we have so many of them that a few more don't really matter).
    What breaking ties with C would achieve:
    • string literals can become real std::string objects with their functionality (like concatenation using "+")
    • arrays can be std::array by default (being assignable is the first win)
    • a lot of the standard C library could be wrapped by C++ functions that accept C++ types (imagine printf() accepting std::string)
    • forget extern "C", you could just have something like #cinclude for C headers
Lessons that can be learned

  • A compatibility break that is almost guaranteed to have a very small impact is worth doing (WinMain)
  • If you redefine some type via typedef, make the new type more generic, so you can change it later (i.e. "an integer whose size is at least X")
  • Hungarian notation is a bad idea, and the full form of it is ten times worse
  • Bugs should be fixed! A fix that breaks something of minor importance will earn you a few rants from people who are the ones to ignore (I can't imagine a good developer complaining about a fixed bug, even if it broke something in his buggy code).
  • New code or file formats will gradually outnumber old ones by a large margin, so look forward, not backward
  • If you fail to maintain full compatibility anyway, use it as an opportunity to break things for a better future
  • The number of "breaks" doesn't matter; what matters is the overall pain introduced by the compatibility break. So if you broke something important, keeping minor things compatible won't help much.
  • Bonus: it's not possible to maintain backward compatibility forever, so plan the break in advance and don't give false promises.

2014 m. liepos 14 d., pirmadienis

The lost war against duplicate code

From what I've seen so far, duplicate code is impossible to avoid in any large project. There are multiple ways duplicate code gets created, and while it is typically assumed that duplicate code is bad, this is not always the case.

Why duplicate code is bad

  • Duplicate bugs - it's obvious: if a bug is discovered in the code, the same bug exists everywhere that code was duplicated, so there are many places to fix instead of one
  • Hard to maintain - pretty much the same as the previous point, but broader. In particular, you not only fix bugs, but also add features, optimizations and other improvements. Worse, duplicated code diverges over time, making it harder to spot.

What "justifies" code duplication

  • Easier to maintain - while we claim the opposite, this one has some truth in it. By copying code written by someone else, you become free to change it in any way you want. Changing common code is harder and often requires agreement across multiple involved parties. Busted: it looks that way, but it makes the code base larger, which in turn makes it harder to maintain.
  • More freedom to change - common code has to remain common, that is, you can't add your specific features to it. The biggest problem here is organizational: if code is duplicated to gain more freedom to change it, that indicates a problem with management or company culture.
  • Faster to develop - everything that requires the involvement of multiple parties takes more time to do. Busted: a short-term gain; you usually lose in the long run (unfortunately, short-term gains are all many managers care about).

How duplicate code happens

  • Incompetence - it's sad, but there are a lot of bad developers. Many of them write code via copy-paste, and, as always, abusing copy-paste results in duplicates. This is what is usually assumed when talking about duplicate code, and yes, this is what we should fight.
  • Forgot to refactor - this is trickier. It's like the first case, except that the developer is actually not bad. It's fine to use copy-paste to get things working; the problem is that you have to refactor at the end. Not forgetting to do that is the hardest part... There is a gray area between this and the first case. Code review might be an answer here.
  • Too much trouble - sometimes avoiding code duplication is more trouble than it's worth. The place for the common code might not even exist! Create a library just for a couple of functions? Don't forget that this brings the entire maintenance burden of a library with it. There is also often such a thing as code ownership, and shared code is owned by someone else. In short, we avoid code duplication to reduce problems, not to add new ones. When that is not the case, duplicating code can be acceptable.
  • Created naturally - two developers might actually write almost identical code independently. In large projects with a lot of people this does happen, and it might take a while to discover that two guys from completely different teams wrote almost identical helper functions.
So, to summarize: next time, before blaming someone for incompetence, have a second thought.

2014 m. gegužės 4 d., sekmadienis

Exception handling is mostly a failure

In short: exceptions are good for system and critical errors (like out of memory). The simpler and more expected an error is, the less useful and more troublesome an exception becomes.

Error handling is hard. Not doing it properly comes back as mysterious failures where no one can understand what went wrong. Doing it properly is a pain in the ass, mostly because it takes a lot of time to do a boring lot of coding when the stuff already works! Really, most of us probably just code the happy path first, prove it works, and then go on to handle all the possible not-so-happy cases. This is generally the right thing to do - what's the point of handling the errors when you're not yet sure the solution is right?

Sinking among ifs

That's the general idea behind exception handling. A typical example given to students looks like this:
if(open_file()) {
  if(read_file()) {
    if(process_data()) {
      // success
    } else {
      error("Failed to process data");
    }
  } else {
    error("Failed to read file");
  }
} else {
  error("Failed to open file");
}
The three function calls in the conditions are the "good code"; everything else is there for error handling. It seems very nice to write all the "good" function calls one after another and move the error handling code somewhere else - welcome, try-catch!
try {
  open_file();
  read_file();
  process_data();
} catch(Exception e) {
  // determine and output an error message here
}
Nice, we have separated the happy path from the error handling code; now it's easy to understand what the code does!
The real life situations are not so nice...

There are different types of errors

  • Disasters: something that generally shouldn't happen, like a hard disk crash. Some errors are so rare and so fatal that it's pointless to try to prepare for them.
  • Fatal errors: stuff that renders the application unusable, i.e. losing the network connection is fatal for a web application.
  • Expected mistakes: the user hasn't filled in required fields? The specified file name contains invalid characters? Such errors are predictable, and applications should be ready for them.
  • Glitches: a string "15 " (trailing space) in 99% of cases is the integer number 15, dammit.
The interesting thing here is that exactly the same error can belong to a different group depending on the exact situation. A failure while writing to a file can mean that the primary hard disk has just crashed and in a few seconds the entire computer will be unusable, or it can just mean that the user has unplugged a USB stick. And who said that failure to open a file is fatal? No config file - assume hard-coded defaults.

Opening file is so difficult

... So, we are opening a configuration file that is not required to exist...
FILE *file = fopen(filename, "r");
Nice: NULL means it does not exist; otherwise it's something we can read!
What's the problem? You can write it the opposite way:
FileStream file = null;
if(File.Exists(filename)) {
  file = new FileStream(filename, FileMode.Open);
}
It does the same thing. Does it? Congratulations, you've just introduced a full-moon bug! Files sometimes disappear, you know, get deleted. That can happen at any point in time, for example right between the existence check and the opening... Fatal error? Crash? Or... well, that file was never required to be there in the first place? So now the code becomes:
FileStream file = null;
try {
  file = new FileStream(filename, FileMode.Open);
} catch (Exception e) {
  // ignore
}
Wonderful: what used to be one line now is... progress.

How badly can you blow up?

C once again. You call a function and you expect it to return. Is that guaranteed? No! The application might die inside, but we don't care. longjmp() can be called, but again we don't care - unless we set it up ourselves.
Let's "upgrade" to C++. What can happen now? Yes, an exception can be thrown, and there are many types of them! Worse: new types of thrown exceptions can be added in the future!
It's considered good practice to only catch the exceptions you care about and let the others propagate up the call stack. That's fine, but what about the new types of exceptions that might be added in the future? It looks like someone didn't design for the future...

Exception safety

There is an amazing thing about exception safety I still can't explain. C++ is a language whose standard library throws something extremely rarely, yet a topic called "exception safety" is a standard part of its books. When we come to Java and co., where exceptions are thrown here, there and everywhere, this is somehow forgotten...
obj.foo(x, y);
You can only guess how foo works with x and y, but there's one thing most people seem to assume - all or nothing. If an exception is thrown out of foo(), you want the state of obj unchanged! A simple concept, but not so easy to get right. Throw more exceptions and enjoy more full-moon bugs.

Exception specifications

This is something that pissed me off when I started learning Java. C++ has them too, but they are optional and no one seems to use them (except the standard library). Some even discourage them.
Looking at C#, they threw specifications away entirely.
Looking back at Java... ArrayIndexOutOfBoundsException, PersistenceException and multiple others are "unchecked" exceptions, so you don't need to write them all over the place. Are the two I mentioned really so "unexpected"?


Conclusions

  • Exception handling works well for critical errors. The less serious the error is, the less efficient exception handling becomes. For simple errors, exceptions are more trouble than help.
  • Exceptions are designed to separate useful code from error handling code. When exception handling machinery appears inside nested code blocks, it's a first sign of exception misuse.
  • I also haven't mentioned that exceptions are expensive in terms of performance...

2014 m. kovo 31 d., pirmadienis

Darkest corners of C++

It is good to know the language you are programming in.

Placement-new array offset

It turns out that with some compilers new[] might allocate an integer before the actual array, even when placement-new is used:
void *mem = new char[sizeof(A) * n];
A *arr = new(mem) A[n];
Whether mem and arr point at the same address depends on the compiler and the code. On GCC I get the same pointers, but with the Microsoft compiler, when A has a destructor, arr is mem + sizeof(int) or similar. While this mismatch might look harmless at first sight, it isn't - the end of your array runs outside the allocated memory!
Solution: cast the pointer and manually loop over the array, creating each object individually via placement-new.

Pointer that changes

class Base {int x; };
class Derived : public Base {virtual void foo() {} };

Derived *d = new Derived;
Base *b = d;
Here b and d will not point to the same address. Comparing and casting them does the right thing, but if you cast them both to void*, you'll see they're not equal! This is because Base is non-polymorphic (no virtual methods), while Derived is polymorphic. So a Derived object has a pointer to the vtable at its beginning, followed by the Base sub-object and then by its own additional members.
Things get even funnier when there are many classes in the hierarchy and multiple inheritance is involved.
Solution: well, don't cast pointers to objects into void*.

Return void

This code is valid:
void foo() {}
void bar() { return foo(); }
Useful when writing templates.

Pure-virtual function with implementation

Pure-virtual function means that a derived class must override it in order for objects to be created. But it does not mean that such a method cannot be implemented in the base class. The code below compiles and works:
class A {
public:
  virtual void foo() = 0;
};

void A::foo() { std::cout << "A::foo called\n"; }

class B : public A {
public:
  virtual void foo() override {
    std::cout << "B::foo called\n";
  }
};
Note that it did not compile for me using GCC when I tried to provide the implementation of A::foo inline.

Function-try block

This is quite a tricky feature. A function-try block basically looks like this:
void foo()
try {
  throw int();
} catch(...) {
  std::cout << "Exception caught\n";
}
However, in this form there is no particular use for it. It's just a shorter way of wrapping the entire function body in a try-catch.
The real use for this feature (where it also works differently) is in constructors. First of all, when used on a constructor, it does not really catch exceptions - it catches and rethrows them! Its real purpose is to free resources allocated in the initializer list:
class A {
public:
  A(int x) { throw x; }
};

class B {
  A a;
public:
  B() try
    : a(5)
  { } catch(...) {
    std::cout << "Exception in initializer merely caught\n";
  }
};
In here the exception is thrown in the initializer list. There is no way to catch it in the constructor body itself. But an initializer list may be long, and some initializers may allocate resources, like memory. To free such resources, you have to use a function-try block on the constructor and free them in the catch block. Remember that the exception is rethrown afterwards.


Bitfields

When defining a struct, it is possible to specify member sizes in bits:
struct Bitfields {
  int i:16;
  int j:8;
  bool b:1;
  char c:7;
};
The size of this structure is 4 bytes (on my machine, at least). Each member takes as many bits as specified and can hold the appropriate value range.

And, since this is about C++, there are definitely more :)

2014 m. kovo 2 d., sekmadienis

On a quest for good coding standard

All coding standards suck, except mine!

Reasons for coding standard:

  • Readability - the primary goal of a coding standard in an organization is to make it easier for developers to understand the code. This calls for consistency, meaningful naming and good comments in the code. It has a greater impact on newer developers, who are less familiar with the code base.
  • Code quality - an attempt to make it easier to spot bugs in code, as well as to make the reasoning behind decisions more obvious. Code is expected to be easy to modify or fix without introducing new issues.
Common mistakes
  • Rules, not guidelines. Rules must be followed; guidelines are less strict. Having strict rules everyone is required to follow sometimes actually works against the initial intent of the standard itself: developers can't make code more readable, because they're required to follow the rule.
  • Consider this example:
    int sign_multiplier = x >= 0 ? 1 : -1;
    int sign_multiplier = x>=0 ? 1 : -1;
    Neither is very easy to read, and this could be written in a more readable way. However, given the choice of the two, I strongly believe the second is more readable. But hey, it breaks one of the most common rules - spaces around operators!
  • Standard set in stone. Changing the standard is not necessarily bad; it depends on what, how and why you change. Developers change and programming languages evolve, so should the standard. Otherwise the standard might forbid features that weren't even there when the standard itself was written.
  • Adopted standard. A standard should be an agreement among developers on how to write code. Just taking someone else's standard can lead to a situation where some rule in it is hated by every developer. It is the same mistake as when the standard is created by someone (who quite often doesn't even write code himself) and thrown upon everyone.
  • No or questionable reasoning. Every rule should have a clear reason; it's good for guidelines to have one too. For one thing, it helps to identify out-of-date items in the standard. It can also give the standard some kind of "spirit", so that guidelines are not just followed or broken. The reasoning should avoid questionable arguments - e.g. what is readable to one person can be rubbish to another. If a rule or guideline was introduced by consensus or a strong majority, it is good to state that.
  • Trying to solve unrelated problems. Sometimes people try to work around a compiler limitation or bug by introducing a rule into the standard. It's a bad idea, because bugs get fixed and limitations get weaker, but standards lag behind. Banning a language feature because "developers coming from another programming language might not understand it" is an example of trying to solve a lack-of-training or poor-hiring problem with a coding standard, which has nothing to do with either.
Common poor reasons for rules
  • Makes code more readable. As already mentioned, "readable" is subjective. Some people find CamelCase readable and underscores unreadable; I personally think exactly the opposite. My suggestion is to avoid any claims that something is more readable.
  • Pointing to other standards. Just because many other standards have a certain rule does not mean yours should have it too. And it's a completely void argument to claim a rule is good "because company X uses it" (replace X with Microsoft, Sun, Google, whatever...). Use other standards as a source of ideas, find their reasons behind the rules, but don't just blindly adopt them.
  • Claims from long ago. It's the XXI century; we have IDEs, syntax highlighting, etc. We don't need to do anything to make keywords like if, while or for more apparent - it's done for us already. Yet so many standards require a space between if and the opening parenthesis. Not that I'm against it, but the reason for it is badly out of date...
  • Some numbers lie. Fewer symbols does not mean faster to type. Faster means seconds, not keystrokes. I never found CamelCase to be any faster to type than underscores.
Does it really matter?

Since I'm proposing guidelines over rules, this is the primary question to ask about any rule: does it really matter?
Obviously some things matter, like naming conventions, indentation or brace placement.
Take, for example, the space between a keyword/function name and the opening parenthesis in C-like languages. A coding standard can require the space, forbid it, or... does it really matter? Won't you be able to read the code with or without that space? Yes, consistency matters, but to what extent?

Strict numbers are an almost guaranteed failure

When a standard puts a strict limit on something and that limit is an exact number, there's a MAX+1 or MIN-1 problem:

  • A line of code cannot be longer than 80 characters? So an 80-character line is perfect, but 81 is evil?
  • An identifier must be at least 3 characters? So variables x and y are terrible names for the X and Y coordinates?
So how should limits be put? Well, that's actually a good question. I think we should look at the whole picture, not at separate parts. When it comes to size, one thing tends to affect the other:
  • Longer identifiers lead to longer code lines
  • Longer lines lead to more lines (wrapping)
  • More lines lead to larger functions
  • Larger functions lead to more functions (splitting functions into smaller ones)
  • ... and for god's sake, don't put a line limit on a class!
Now let's go through this list in an opposite direction:
  • Artificially splitting a class into a few classes due to sheer size is more likely to make the code harder to understand
  • It's easier to debug a single function than to step through several (oops, step over instead of step into)
  • I personally find wrapped lines harder to read, especially wrapped conditions
  • Very long identifiers almost never make code easier to understand
My definitions of too long:
  • An identifier is too long if people "refuse" to type it (copy-paste or code-completion only); it's horribly too long if they can't remember it exactly
  • A line is too long if you have to read it several times to get what it does (what, not why or how)
  • A function is too long if, while reading it, at any point you lose track of what it does
  • A class can only do too much; it is never too long or too big
Suggestions to make good standard
  • Start from something very abstract everyone agrees on. Code readability and consistency are good candidates. These should be the guiding principles for the rest, a "spirit of the standard".
  • Avoid rules, prefer guidelines. It is good to mention that guidelines are expected to be followed and are deviated from only for a reason (one that better corresponds to the "spirit").
  • When defining rules, seek consensus. No one says it's easy, but you should at least try.
  • Rules shouldn't change, so think twice before turning anything into a rule. CamelCase vs. underscores probably requires a rule.
  • It is good to note in the standard why each rule or guideline is what it is ("we voted, this was the clear winner").
  • Leave the standard open for future modification. However, it is good to note that "current" is a strong argument, so changes require a strong majority. You can also do standard reviews, between which the standard is locked for modification, but I recommend avoiding this.
  • Don't make one standard for all languages. Just don't.
Tips and trade-offs
  • The standard will be liked by everyone only if everyone equals one person! In a group of people, some will always be unhappy. It's probably best when no one is entirely happy.
  • CamelCase seems to be liked by more people, but you can debate a mix. I personally quite like CamelCase for classes, but underscores for methods.
  • Indentation using spaces looks consistent everywhere without any configuration; with tabs that's almost impossible to achieve. Spaces can also be enforced - auto-replace each tab with n spaces; the reverse can't be done reliably. A personal observation: tabs in the standard = a mix in the code base.
  • In a large group of people the standard will never be followed 100%. Live with that (but still encourage following the standard).
  • I don't recommend using tools to enforce standard compliance. Warnings in the IDE are a bad idea, because they mix with compiler warnings, and warnings only work when there are close to zero of them.

2013 m. gruodžio 17 d., antradienis

Fedora 20: recover stored passwords, in case you lost them

After upgrading to Fedora 20 (I upgraded to the RC before the final release), the passwords I had stored in GNOME Keyring (the Keys and Passwords application, also known as Seahorse) were gone. This can also affect some applications; in my case it was Empathy.
The problem is that with this release GNOME Keyring stores passwords in a different place. It used to be:
now it is

The solution is quite simple:

  • open Keys and Passwords app
  • lock login keyring
  • copy keyring files from the old location to the new
  • relogin

2013 m. lapkričio 30 d., šeštadienis

How to make good API

A good API is a large part of success. Sometimes it's nearly the only factor in why a library or solution was chosen. It's not easy to create a good API, and there is no single recipe, since different cases have different requirements. Everyone who has used many different libraries across several programming languages should already have a feeling for what a good API is. I'll try to summarize the main points here.

  1. Flexible but convenient

    Flexibility is something everyone understands; unfortunately, convenience is quite often forgotten. Some use cases are frequent, others are not. One key to success is to have a dedicated API for the most common use cases alongside the general, more flexible API.
    For example, consider a very simple Person class:
    • You need a no-argument constructor to create an empty object that will be filled later
    • You need methods to get/set the first and last name
    • You need methods to get/set a list of middle names (because some people have more than one)
    The list above makes a flexible API. Now let's go to the convenience part:
    • A constructor that takes the first and last names as arguments, because that's what most people have
    • A method to get/set the middle name, because very few people have more than one
    In some cultures most people have a middle name, so if the application is specific to such a culture, replace the two-argument constructor with a three-argument one.
    The point here is not to limit API to the basic all-cases set, but add additional APIs to make shortcuts for common use-cases.
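The Person class above can be sketched as follows. This is a minimal illustration of the principle, not a real library class: the flexible core (no-argument constructor, list of middle names) is kept, and the convenience members are layered on top of it.

```java
import java.util.ArrayList;
import java.util.List;

class Person {
    private String firstName;
    private String lastName;
    private List<String> middleNames = new ArrayList<>();

    // Flexible core: empty object, to be filled in later.
    public Person() {}

    // Convenience: the common case of first + last name only.
    public Person(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    public String getFirstName() { return firstName; }
    public void setFirstName(String firstName) { this.firstName = firstName; }
    public String getLastName() { return lastName; }
    public void setLastName(String lastName) { this.lastName = lastName; }

    // Flexible: the full list of middle names.
    public List<String> getMiddleNames() { return middleNames; }
    public void setMiddleNames(List<String> middleNames) { this.middleNames = middleNames; }

    // Convenience: most people have at most one middle name.
    public String getMiddleName() {
        return middleNames.isEmpty() ? null : middleNames.get(0);
    }
    public void setMiddleName(String middleName) {
        middleNames.clear();
        middleNames.add(middleName);
    }
}
```

Note that the convenience methods are implemented entirely in terms of the flexible ones, so they add no new state to maintain.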
  2. Flexible but not bloated
    This one is part of the first point, but it's so frequent that it deserves to be a separate one.
    Most APIs are designed in a "what-if" way; however, you have to stop in time, because there are no limits to "what-if". There's no clear recipe here, since it all depends on the exact domain. But there are a few guidelines:
    • Something that has very high impact and/or hasn't changed for a long time is unlikely to change without early warning
      • E.g. IPv6: we've been moving in this direction for so long because too much software was "hardcoded" for IPv4. Was that software wrong? No, it saved a lot by doing so. Right now you can choose either to support these two, or to make your system flexible enough to support a growing list of different formats. Is the latter worth the additional time? It's up to you to decide.
    • Convenience classes and methods are only convenient as long as it's clear what they do and what the difference between any two of them is; here are a couple of bad examples:
    • String is a simple and very flexible data type
  3. Names should be meaningful, guessable, structured, consistent and short
    Naming is one of the most important things in an API. While "meaningful" is the most emphasized quality, it's far from the only one. Although most of the time a programmer reads code, he also writes it, so being able to guess a name improves his productivity quite a lot, especially when accompanied by code completion.
    Structure is very important when an API is large. A good example of how not to structure your API is the Windows API, while GTK+ is an example of good structuring. In short: your API should have something that separates it from the rest of the world (a namespace, a package, ... or a simple prefix), and a large API should be divided into submodules, etc.
    Another thing that makes names guessable is consistency. Naming conventions should be consistent across the API, and preferably consistent with other APIs in the same field, domain, language, etc.
    Finally, names should not be longer than needed. While longer usually means clearer, there is a point beyond which length no longer adds clarity. E.g. MAX_INT is a perfectly fine name, and making it MAXIMUM_INTEGER_VALUE adds no additional value.
  4. Performance oriented, but convenient
    Some APIs are not meant to be called often; others have to think about performance. However, convenience should be maintained. The Windows API is an example of an API that sacrificed convenience for performance: what it lacks is functions to fill various structures with sane default values, and it's really annoying to set every struct member by hand. At the same time it is bloated in the number of functions; there are often several functions where one would do. E.g. FindFirstFile() and FindNextFile() could easily be just one function, and who really needs ZeroMemory() when you can use the much more powerful memset()?
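To illustrate the FindFirstFile()/FindNextFile() point: in a language with iterators, the first/next pair collapses into a single method that returns a standard iterator. The class and method names below are hypothetical, and the entries are stand-in data.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical directory-listing API: one method hands back an Iterator,
// so the "first vs. next" distinction disappears from the API surface.
class Directory {
    private final List<String> entries;

    Directory(String... entries) {
        // Stand-in for actually reading a directory from disk.
        this.entries = Arrays.asList(entries);
    }

    // One function replaces the FindFirstFile/FindNextFile pair.
    Iterator<String> findFiles() {
        return entries.iterator();
    }
}
```

The caller then uses the ordinary iteration idiom instead of remembering which of two functions to call when.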
  5. Convenient defaults
    The less the user has to specify explicitly, the better. Most of the time...
    Default values should be intuitive and meaningful. Otherwise it's better to require the value to be specified explicitly.
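A small sketch of what "intuitive defaults" can look like in practice (all names are hypothetical): every field starts at a value the caller could have guessed, and the caller only overrides what deviates from it.

```java
// Hypothetical settings object: each default is meaningful on its own,
// so most callers never touch most fields.
class HttpSettings {
    int connectTimeoutMillis = 30_000; // a common, unsurprising default
    int maxRetries = 0;                // "no retries" is intuitive; a magic
                                       // default like 7 would be worse than
                                       // requiring an explicit value
    boolean followRedirects = true;

    HttpSettings connectTimeoutMillis(int millis) {
        this.connectTimeoutMillis = millis;
        return this;
    }

    HttpSettings maxRetries(int retries) {
        this.maxRetries = retries;
        return this;
    }
}
```

Usage then reads as a statement of intent: `new HttpSettings().maxRetries(3)` says exactly what is unusual about this caller, and nothing else.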
  6. Carefully choose data formats
    In short: don't just blindly use XML.
    If you're making a web service, think about which data format is most appropriate. It might be XML, JSON or anything else. Remember that someone will have to use it, and it's for their sake.
    The same is important for configuration files. Forcing people to write XML by hand is... well, don't use XML when something simpler works just fine.
  7. Configurable, but not over-configurable
    This one applies to frameworks. Many of them allow the user to change behavior via some setting in the configuration. This is good, but there are limits. First of all, it should be clear which configurations are valid and which are not. The more settings there are, the harder it is to list all allowed combinations, not to mention that they all should work! "Everything is pluggable, extensible and configurable" is never the answer, as it will result in a huge and buggy mess. No setting is better than a non-working one.
    For complicated cases a white-box approach can be used: instead of a super-configurable black-box component, have a component composed of smaller components - when its limits are reached, the user can compose his own component, reusing parts of the original one.
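The white-box idea can be sketched like this (interfaces and names are hypothetical): the stock component is merely a composition of smaller public parts, so a user who outgrows it assembles a variant from the same parts instead of asking for one more setting.

```java
// Small, independently reusable parts.
interface Source { String read(); }
interface Parser { int parse(String raw); }

class FixedSource implements Source {
    public String read() { return "42"; } // stand-in for real I/O
}

class IntParser implements Parser {
    public int parse(String raw) { return Integer.parseInt(raw.trim()); }
}

// The "component": just a named composition of the parts above.
class Loader {
    private final Source source;
    private final Parser parser;

    // The stock composition most users want.
    Loader() { this(new FixedSource(), new IntParser()); }

    // The escape hatch: compose your own from the same parts.
    Loader(Source source, Parser parser) {
        this.source = source;
        this.parser = parser;
    }

    int load() { return parser.parse(source.read()); }
}
```

No configuration flag is needed for the unusual case; the second constructor is the whole extension mechanism.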
  8. Synchronous vs. asynchronous
    An asynchronous API is good for operations that might take long to complete. Ideally, all long-running operations should be asynchronous.
    At the same time, every asynchronous API is only good if it provides synchronous alternatives. Strange? Asynchronous is good for a UI, as it keeps the UI responsive instead of hanging it. But when you're already on a non-UI thread, you want to perform sequential actions synchronously rather than mess with asynchronous continuations. Since you can't predict where which API will be used, it's better to provide both and let the user decide, rather than push one or the other down his throat.
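A minimal sketch of "provide both" (class and method names are hypothetical): the asynchronous variant is a thin wrapper over the synchronous one, so offering both costs almost nothing.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical fetcher exposing the same operation both ways: UI code uses
// fetchAsync() to stay responsive, worker threads call fetch() directly.
class Fetcher {
    // Synchronous: blocks until the result is ready.
    String fetch(String key) {
        return "value-for-" + key; // stand-in for real, possibly slow, work
    }

    // Asynchronous: the same operation, wrapped for callers that must not block.
    CompletableFuture<String> fetchAsync(String key) {
        return CompletableFuture.supplyAsync(() -> fetch(key));
    }
}
```

A worker thread writes three sequential `fetch()` calls as three plain statements; only the UI thread pays the continuation tax, and only by choice.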
  9. There are non-English speaking people out there
    If you have a UI, it should be localizable. If you throw exceptions or otherwise report errors, the error messages should be localizable too: either provide error codes along with the messages, or localize the messages themselves.
    Finally, if you have no idea about localization, don't implement it without consulting someone who does. A good start is the GNU gettext manual, especially the section about plural forms, to get some grasp of what you're dealing with.
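The "error codes along with messages" option can be as simple as this hypothetical exception class: the stable code is what callers localize and branch on; the English text is a developer-facing fallback.

```java
// Hypothetical exception carrying a stable error code next to the
// human-readable message, so UI code can map the code to a translated
// string without parsing the English text.
class ApiException extends Exception {
    private final int errorCode;

    ApiException(int errorCode, String englishMessage) {
        super(englishMessage);
        this.errorCode = errorCode;
    }

    int getErrorCode() { return errorCode; }
}
```

The key property is that the code, not the message, is the contract: messages can be reworded freely without breaking any caller.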
  10. Backward/forward compatibility
    Ideally, every new version should be backward compatible with all previous ones, but in practice that's almost impossible. The guidelines here are:
    • Design the API from the beginning to be extended in the future. Be especially careful with boolean arguments (an enum is a good replacement). In C, using opaque structures is a good way to provide extensibility in the future; avoid reserved members/arguments, because you can end up with something like this
    • If future features are known or very apparent, prepare the API for them in advance (forward compatibility)
    • Avoid breaking backward compatibility of the API, but don't add workarounds to the API itself, because later you'll have to stay backward compatible with those too
    • Prefer big breaks to frequent ones: no one likes API breaks, but frequent breaks seem to be hated more; when you do break the API, use the chance to fix all known shortcomings
    • Be clear about your API's stability, so that users know when to expect breakage; major releases are good candidates for breaks, while minor releases should be backward compatible
    • It is also good to have alpha/beta testing, where new APIs are introduced for users to try but are not yet stable and might change in the final release; user feedback is the best way to find shortcomings
    • Be realistic: if your project survives long enough, you will eventually have to break its API
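The boolean-vs-enum advice deserves a concrete example (all names are hypothetical, and the returned strings are stand-in data): a boolean parameter permits exactly two modes forever, while an enum can grow new values without breaking a single existing caller.

```java
// An enum parameter instead of a boolean: listEntries(true) could never
// have grown a third mode; a new enum constant is a compatible extension.
enum SortOrder {
    NONE,
    ALPHABETICAL,
    BY_DATE   // added in a later version without breaking existing callers
}

class FileLister {
    // The dead end this replaces:
    // String listEntries(boolean sorted);

    String listEntries(SortOrder order) {
        // Stand-in data; a real implementation would sort directory entries.
        switch (order) {
            case ALPHABETICAL: return "a,b,c";
            case BY_DATE:      return "c,a,b";
            default:           return "b,c,a";
        }
    }
}
```

As a bonus, `listEntries(SortOrder.ALPHABETICAL)` is self-documenting at the call site, which `listEntries(true)` never was.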
  11. Convention over configuration is dangerous
    The point of convention over configuration is to save developer time. In practice, not in theory! Just because the developer writes less code/configuration does not mean he actually spends less time on it.
    When things work auto-magically, they sometimes stop working in the same magical way, and finding out why can be very time consuming.
    Besides, changing a convention is very painful, as it's less apparent (everything compiles as it did, but doesn't work as it did - happy debugging).
Bottom line

Great APIs are not designed behind closed doors by a few system architects. Constructive communication and the involvement of many parties from the beginning is the way to understand the problem.