Tuples considered harmful
(This post started out as an elaborate explanation to someone who just couldn’t wrap his head around C++’s boost::tuple. The title is probably a misnomer. It should really be “boost::tuples considered harmful outside of quick throw-away internal uses and TMP, where they were actually intended to be used.”)
The examples here use C++, which has a very broken static type system. Some of these problems are merely ill effects of that; others are inherent to tuples.
In particular, if you’re a budding C++ programmer who finally wants to try this new-agey feature from Boost called tuples, this tutorial tells you why you don’t want to go down that road.
Sin #1: Tuples make you anti-social
Let’s look at the following piece of code:
```cpp
// awesome_math_library.hpp
#include <boost/tuple/tuple.hpp>

boost::tuple<int, int> divide(int numerator, int denominator);
```
By looking at the function declaration, how would you use divide?
```cpp
int quotient, remainder;

// Possibility 1
boost::tie(quotient, remainder) = divide(42, 10);

// Possibility 2
boost::tie(remainder, quotient) = divide(42, 10);
```
How do you know which one is the correct usage? It turns out there is no way to know unless you look at the source code.
Forcing people to read your source code before they can use your function is plain wrong.
That’s probably OK for dynamic languages (whose libraries tend to come from open source folks anyway), but it will doom all hardcore C++ machos, because needing to know about the implementation breaks the OOP creed: encapsulation.
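To make the ambiguity concrete, here is a minimal sketch using std::tuple and std::tie, the C++11 standard-library counterparts of boost::tuple and boost::tie. The body of divide is an assumption (quotient first, remainder second); the point is that both call sites compile, and nothing but the implementation tells you which one is right.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical implementation (our assumption): quotient first, remainder second.
// Nothing in the signature tells the caller that.
std::tuple<int, int> divide(int numerator, int denominator)
{
    return std::make_tuple(numerator / denominator, numerator % denominator);
}

void tie_order_demo()
{
    int quotient = 0, remainder = 0;

    // Possibility 1: happens to match the implementation.
    std::tie(quotient, remainder) = divide(42, 10);
    assert(quotient == 4 && remainder == 2);

    // Possibility 2: compiles just as happily, but silently swaps the values.
    std::tie(remainder, quotient) = divide(42, 10);
    assert(quotient == 2 && remainder == 4);
}
```

Both asserts pass, which is exactly the problem: the compiler accepts either order.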
Some people think “parameter names suffer the same problem, you just need to name your method appropriately.” Here you go:
```cpp
boost::tuple<int, int> DivideFirstResultIsRemainderSecondIsQuotient(int numerator, int denominator);
```
I don’t think I need to say more. No need to thank me for the laugh.
Sin #2: Tuples lead to fragile code
Another example:
```cpp
boost::tuple<void *, int> get_cube();
```
Fair enough, the type system actually helped us deduce the sensible usage:
```cpp
void * vertices;
int num_vertices;
boost::tie(vertices, num_vertices) = get_cube();
```
So far so good. Fast-forward six months: it turns out our function also needs to return color information for the cube:
```cpp
boost::tuple<void *, int, char, char, char> GetCubeVerticesWithColorsRGBInThisOrder();
```
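All of a sudden, the original code breaks: its two-element boost::tie can’t receive a five-element tuple, and since C++ has no overloading on return types, you can’t keep both versions under one name. Using a tuple as your return type is a fast way to seal off your function from future extensions.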
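A related hazard, sketched here with std::tuple (our illustration, not part of the original example): if the new element is inserted in the middle rather than appended, code that reads fields by position may keep compiling and silently read the wrong one, because std::get<1> now returns an unsigned char that converts implicitly to int. The function names and vertex data below are hypothetical.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical static vertex data for the sketch (8 vertices, xyz each).
static const float kCubeVertices[8 * 3] = {};

// Version 1: (vertices, vertex count).
std::tuple<const float *, int> get_cube_v1()
{
    return std::make_tuple(kCubeVertices, 8);
}

// Version 2: someone slipped a color channel in at position 1.
std::tuple<const float *, unsigned char, int> get_cube_v2()
{
    return std::make_tuple(kCubeVertices, static_cast<unsigned char>(200), 8);
}

// A caller written against version 1 reads the count by position.
int vertex_count_v1() { return std::get<1>(get_cube_v1()); }

// Pointed at version 2, the same positional read still compiles:
// the unsigned char converts implicitly to int.
int vertex_count_v2() { return std::get<1>(get_cube_v2()); }

void fragile_demo()
{
    assert(vertex_count_v1() == 8);   // the vertex count
    assert(vertex_count_v2() == 200); // the red channel, not the vertex count
}
```

A named member such as num_vertices would have kept pointing at the right field no matter where the new members were declared.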
Sin #3: Tuples obfuscate your code
```cpp
#include <string>
#include <boost/tuple/tuple.hpp>

using boost::tuple;
using boost::get;

int PickUpTreasure(tuple<int, int, std::string> player,
                   tuple<char, char, int> treasure_chest)
{
    // ... after 200 lines of code, some months later you look at
    // this misery: wtf does it do?
    return get<0>(player) + get<2>(treasure_chest) * get<1>(player);
}
```
This shows how serious Sin #1 is: every user of tuples needs to go look up the source code.
It also shows another problem with tuples: PickUpTreasure()’s writer must spell out the full definition of both tuples even though he only uses a handful of the values.
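For contrast, here is a sketch of the same function with named structs. The meanings of the fields used (HP, level, money) are borrowed from the parameter names in FAQ 3 further down; the struct names, the name field, and the two char members kind and rarity are our hypothetical stand-ins. The signature no longer has to spell out every field, and the body reads as what it computes.

```cpp
#include <cassert>
#include <string>

// Hypothetical named equivalent of tuple<int, int, std::string>.
struct player
{
    int hp;
    int level;
    std::string name;
};

// Hypothetical named equivalent of tuple<char, char, int>.
struct treasure_chest
{
    char kind;    // hypothetical meaning
    char rarity;  // hypothetical meaning
    int money;
};

// Same arithmetic as get<0>(player) + get<2>(treasure_chest) * get<1>(player).
int PickUpTreasure(const player & p, const treasure_chest & chest)
{
    return p.hp + chest.money * p.level;
}

void pick_up_demo()
{
    player hero = {100, 3, "hero"};
    treasure_chest chest = {'g', 'r', 50};
    assert(PickUpTreasure(hero, chest) == 100 + 50 * 3);
}
```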
Sin #4: Tuples break type-safety
```cpp
void TranslateCoordinate(tuple<int, int> point);
```
This function is easy to abuse, and the abuse compiles without a problem:
```cpp
TranslateCoordinate(divide(30, 50));
```
What a pity.
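The root cause is that the compiler sees only structure, not meaning: a “point” and a “division result” spelled as tuple<int, int> are literally the same type. The sketch below makes that explicit with std::tuple; the point_t alias and both function bodies are hypothetical (TranslateCoordinate returns the shifted point here so the effect is visible). The static_assert passes, which is exactly the problem.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>

typedef std::tuple<int, int> point_t; // hypothetical alias for a 2D point

// Hypothetical body: quotient first, remainder second.
std::tuple<int, int> divide(int numerator, int denominator)
{
    return std::make_tuple(numerator / denominator, numerator % denominator);
}

// Hypothetical body: shifts the point by (+1, +1) and returns it.
point_t TranslateCoordinate(point_t point)
{
    return std::make_tuple(std::get<0>(point) + 1, std::get<1>(point) + 1);
}

// To the compiler, a division result *is* a point.
static_assert(std::is_same<decltype(divide(30, 50)), point_t>::value,
              "a division result and a point are the same type");

void abuse_demo()
{
    // Compiles and runs: (quotient 0, remainder 30) gets "translated" to (1, 31).
    assert(TranslateCoordinate(divide(30, 50)) == std::make_tuple(1, 31));
}
```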
So am I forbidden to return multiple values?
No. It’s called a plain old struct:
```cpp
#include <cstddef> // NULL

struct cube
{
    void * vertices;
    int num_vertices;
    unsigned char r, g, b;
};

// old code that doesn't know about colors
cube old_code()
{
    cube a = {NULL, 0}; // r, g, b are zero-initialized
    return a;
}

// new code that uses colors
cube new_code()
{
    cube a = old_code();
    a.r = 10;
    a.g = 20;
    a.b = 42;
    return a;
}

struct point
{
    int x, y;
};

struct division_result
{
    int remainder, quotient;
};

division_result divide(int numerator, int denominator);

void TranslateCoordinate(point p);
// Notice that it nicely conveys the intent of the function

// This is now illegal, as it should be:
TranslateCoordinate(divide(50, 21)); // ERROR
```
Old code can safely use the new struct as-is, without any modifications. Old code can even pass updated objects verbatim to new code that can enhance it. ’nuff said.
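Both claims, and the ERROR line in the example above, can be checked with a small self-contained sketch (C++11, for static_assert and <type_traits>). It relies on aggregate initialization: members a brace initializer doesn’t mention are zero-initialized, so old_code() keeps compiling, and keeps meaning the same thing, after the color fields are added. The struct_checks function is a hypothetical harness.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Same definitions as in the example above, repeated so this compiles on its own.
struct cube { void * vertices; int num_vertices; unsigned char r, g, b; };
struct point { int x, y; };
struct division_result { int remainder, quotient; };

cube old_code() { cube a = {NULL, 0}; return a; }

// Distinct struct types do not convert into each other, so passing a
// division_result where a point is expected is a compile error.
static_assert(!std::is_convertible<division_result, point>::value,
              "division_result must not convert to point");

void struct_checks()
{
    // Members the old initializer does not mention are zero-initialized.
    cube c = old_code();
    assert(c.vertices == NULL && c.num_vertices == 0);
    assert(c.r == 0 && c.g == 0 && c.b == 0);

    // New code can enhance an old object without the old code changing.
    c.b = 42;
    assert(c.b == 42);
}
```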
FAQs
1. Your second example sucks. I can say “using int as your return type is a fast way to seal off your function for future extensions”
True. Primitive types are usually expected to represent a very fine-grained entity that shouldn’t change. That’s an assumption, of course. The problem is that tuple<int, int> is really more like an object, and tuple<int, int, char, char, std::string> even more so. The more types you pack together in a tuple, the more fragile that tuple is.
Think about it this way: if a function that originally returns int is suddenly changed to return std::string, the whole function probably needs to be rewritten anyway, and the return type is the least of our concerns (the more important concern is behavior, obviously). However, if your return type is tuple<int, std::string, double>, it is very likely that your function does more than one thing, and you may well need to expand it to tuple<int, std::string, double, void *> some time later to expose more of the stuff you originally thought could be encapsulated (this is a whole different topic).
Of course, no one writing a function that returns more than one value can be absolutely sure he won’t want to return more things later. Think about how elegantly a struct handles that.
2. Defining a new type every time I want to stitch together a bunch of variables sucks!
It does, but that’s a fact of life; embrace it. Understand that what you think of as “a bunch of variables” may evolve into a full-blown object sooner than you think.
In a few convenient places, though, you can use tuple internally if you don’t expose tuples to the outside world (so other people don’t get confused). But how many non-trivial projects are internal? Even if your project is from the same company, it’s just good practice to write your part like an API so other people (including yourself 2 years later) can use it conveniently.
With all that in mind, we can conclude that useful scenarios for tuples are really limited.
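One such limited, internal use (our example, not the author’s) is letting std::tie do the lexicographic comparison inside a struct’s operator<. The tuples never escape the function, so callers only ever see named members. The date struct is hypothetical.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical public type: callers see named members, never a tuple.
struct date
{
    int year, month, day;
};

// The tuples exist only inside this function, as a shortcut for
// comparing year first, then month, then day.
bool operator<(const date & a, const date & b)
{
    return std::tie(a.year, a.month, a.day) < std::tie(b.year, b.month, b.day);
}

void date_demo()
{
    date d1 = {2010, 5, 20};
    date d2 = {2010, 11, 2};
    assert(d1 < d2);
    assert(!(d2 < d1));
}
```

Because the tuple is an implementation detail here, swapping it for hand-written comparisons later would not affect a single caller.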
3. Your third example is stupid. Nobody uses tuples in parameter list. Functions have a natural parameter list that supports parameter naming
Right, let’s change it:
```cpp
int PickUpTreasure(int playerHP, int playerLevel, int treasureMoney);
```
Seems better. Let’s see how our caller adapts:
```cpp
PickUpTreasure(get<0>(player), get<1>(player), get<2>(treasure));
```
Yikes!
Just for your comparison, here’s a well-formed and well-typed version using plain old OOP:
```cpp
player.PickUp(treasure);
```
Who said programming languages must be cryptic!?
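What player.PickUp(treasure) might look like behind the scenes: a minimal sketch in which the class names, the members, and the HP-plus-money-times-level formula (taken from the tuple version in Sin #3) are all assumptions for illustration.

```cpp
#include <cassert>

// Hypothetical types; the member names mirror the parameter names above.
struct treasure
{
    int money;
};

class player
{
public:
    player(int hp, int level) : hp_(hp), level_(level) {}

    // The data the computation needs travels with the object,
    // so the call site does not have to unpack anything.
    int PickUp(const treasure & t) const { return hp_ + t.money * level_; }

private:
    int hp_;
    int level_;
};

void pick_up_demo()
{
    player hero(100, 3);
    treasure chest = {50};
    assert(hero.PickUp(chest) == 250);
}
```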
4. Your last remark in the third example shows your noobness. The author doesn’t have to spell out the definition of the tuple. He could have used a simple typedef
Right, but IMO that kind of defeats the purpose of tuples. The only reason I’d use a tuple is that it’s quick and dirty. If I go through the trouble of typing:
```cpp
typedef tuple<int,int>point;
```
Maybe I should just type a few more characters and benefit from named values:
```cpp
struct point{int x, y;};
```
Oops, it turned out to be fewer characters, ironically.
5. How about out params? (Not really related to tuples)
It’s sometimes needed but should be avoided as much as possible:
```cpp
big_object * my_object;
create_big_object(&my_object); // declared as: void create_big_object(big_object ** something);

// Danger! my_object may be NULL
use_one_attribute(my_object->name);
```
In general, it’s better to return a struct by value, provided the struct holds a small number of primitive types (such as Point, Matrix, Rectangle, etc.). There is usually no performance penalty for returning a simple struct by value (see the note below), and the added benefit is that the intent becomes crystal clear. Pass-by-value is how programming languages should work, as God intended: I no longer need to worry about object lifetimes and a whole bunch of other unimportant stuff.
(For low-level machos: it’s notable that the CPU can juggle primitive types in registers* faster than it can access them from memory. But that’s a different beast altogether.)
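A minimal sketch of the difference, with a hypothetical Point type and function names: the out-parameter version makes the caller declare a variable up front and pass its address, while the by-value version states its result in the signature.

```cpp
#include <cassert>

// Hypothetical small struct of primitives.
struct Point
{
    double x, y;
};

// Out-parameter style: the result comes back through a pointer the caller
// must supply (and could pass as NULL by mistake).
void midpoint_out(const Point & a, const Point & b, Point * out)
{
    out->x = (a.x + b.x) / 2;
    out->y = (a.y + b.y) / 2;
}

// Return-by-value style: the signature says what comes back.
Point midpoint(const Point & a, const Point & b)
{
    Point m = {(a.x + b.x) / 2, (a.y + b.y) / 2};
    return m;
}

void midpoint_demo()
{
    Point a = {0, 0};
    Point b = {4, 2};

    Point m1;
    midpoint_out(a, b, &m1);

    Point m2 = midpoint(a, b);
    assert(m1.x == m2.x && m1.y == m2.y);
}
```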
If you need to return a pointer, use a smart pointer instead:
```cpp
// Clear intent -- create_big_object will give up ownership so the caller
// should take care of the object's life time
std::auto_ptr<big_object> my_object = create_big_object();
```
If you think about it, what’s an auto_ptr? It’s a struct! (A class, actually, but the two are nearly synonymous in C++.)
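A minimal factory sketch with std::unique_ptr, the C++11 successor to std::auto_ptr (auto_ptr was deprecated in C++11 and removed in C++17). The big_object type, its name member, and the "cube" value are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical type standing in for big_object.
struct big_object
{
    std::string name;
};

// The return type documents the ownership transfer: the caller owns the
// object, and it is deleted automatically when the unique_ptr goes away.
std::unique_ptr<big_object> create_big_object()
{
    std::unique_ptr<big_object> obj(new big_object());
    obj->name = "cube";
    return obj; // moved out implicitly
}

void factory_demo()
{
    std::unique_ptr<big_object> my_object = create_big_object();
    assert(my_object && my_object->name == "cube");
}
```

Like auto_ptr, unique_ptr is just a small class; unlike auto_ptr, it refuses to be copied, so ownership can only change hands through an explicit or implicit move.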
* I made that remark without giving it deep consideration. I ran a very crude test using a Point struct with two double members and drew that conclusion from it. Obviously it’s very compiler-specific. Most of the time, you’d find that the pass-by-value version is faster when the struct has 2 to 4 members and your compiler is using some sort of fastcall or the x64 calling convention. Having said that, it’s probably safer to stick with good ol’ pass-by-const-reference most of the time anyway. For return values, we have RVO, so it’s usually OK to return whole objects.