Code for Concinnity

beautiful and elegant solutions


Tuples considered harmful

(This post started out as an elaborate explanation to someone who just couldn’t wrap his head around C++’s boost::tuple. The title is probably a misnomer. It should be renamed “boost::tuples considered harmful outside of quick throw-away internal uses and TMP where they were actually intended to be used” :P )

The examples here use C++, which has a very broken static type system. Some of these problems are only the ill effects of that. Some of these are inherent to tuples.

In particular, if you’re a budding C++ programmer who finally wants to try this new-agey feature from boost called tuples, this is a tutorial to tell you why you “don’t” want to go down that road.

Sin #1: Tuples make you anti-social

Let’s look at the following piece of code:

// awesome_math_library.hpp
#include <boost/tuple/tuple.hpp>
boost::tuple<int, int> divide(int numerator, int denominator);

By looking at the function declaration, how would you use divide?

int quotient, remainder;

// Possibility 1
boost::tie(quotient, remainder) = divide(42, 10);

// Possibility 2
boost::tie(remainder, quotient) = divide(42, 10);

How do you know which one is the correct usage? It turns out there is no way to know unless you look at the source code.

Forcing people to have to read your source code before they can use it is plain wrong.

Now it’s probably OK for dynamic languages (they tend to come from open source folks), but it will doom all hardcore C++ machos, because having to know the implementation breaks the OOP creed: encapsulation.

Some people think “parameter names suffer the same problem, you just need to name your method appropriately.” Here you go:

boost::tuple<int, int> DivideFirstResultIsRemainderSecondIsQuotient(int numerator, int denominator);

I don’t think I need to say more. No need to thank me for the laugh :P
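For comparison, the standard library’s own integer division helper avoids the ordering question by returning a struct with named members. The sketch below wraps std::div from <cstdlib>; the wrapper name divide_named is made up for this illustration and is not part of any library.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical wrapper (the name is an assumption for this example).
// std::div and std::div_t are standard; div_t has members quot and rem.
std::div_t divide_named(int numerator, int denominator)
{
    assert(denominator != 0);  // division by zero is undefined
    return std::div(numerator, denominator);
}
```

A caller writes divide_named(42, 10).quot or divide_named(42, 10).rem, so there is nothing to look up in the source: the member name says which value it is.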

Sin #2: Tuples lead to fragile code

Another example:

boost::tuple<void *, int> get_cube();

Fair enough, the type system actually helped us deduce the sensible usage:

void * vertices;
int num_vertices;
boost::tie(vertices, num_vertices) = get_cube();

So far so good. Fast-forward 6 months, it turns out our method wants to also include color information in the cube:

boost::tuple<void *, int, char, char, char> GetCubeVerticesWithColorsRGBInThisOrder();

All of a sudden, the original code breaks, since C++ has no function overloading on return types. Using a tuple as your return type is a fast way to seal off your function from future extensions.

Sin #3: Tuples obfuscate your code

#include <string>
#include <boost/tuple/tuple.hpp>

using boost::tuple;
using boost::get;   // brings get<N>() into scope for the calls below

int PickUpTreasure(tuple<int, int, std::string> player, tuple<char, char, int> treasure_chest)
{
    // .. after 200 lines of code, some months later you look at
    // this misery: wtf does it do?
    return get<0>(player) + get<2>(treasure_chest) * get<1>(player);
}

This demonstrates the serious problem of Sin #1: every user of a tuple needs to go look up the source code.

It also depicts another problem with tuples: PickUpTreasure()’s writer must spell out the full definition of the tuples even though he only uses a handful of the values.
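For contrast, here is a minimal sketch of the same function with named fields. The names hp, level and money follow the parameter names used in FAQ 3 further down; name, lock_type and rarity are hypothetical placeholders, since the original tuples never say what those slots mean.

```cpp
#include <string>

// hp, level and money mirror the names from FAQ 3;
// name, lock_type and rarity are made-up placeholders.
struct Player
{
    int hp;
    int level;
    std::string name;
};

struct TreasureChest
{
    char lock_type;
    char rarity;
    int money;
};

// Same arithmetic as get<0>(player) + get<2>(treasure_chest) * get<1>(player),
// but readable without looking anything up.
int PickUpTreasure(const Player & player, const TreasureChest & chest)
{
    return player.hp + chest.money * player.level;
}
```

The function body now documents itself, and the signature only mentions two type names instead of spelling out six element types.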

Sin #4: Tuples break type-safety

void TranslateCoordinate(tuple<int, int> point);

This function can be easily abused, and it compiles without problem:

TranslateCoordinate(divide(30, 50));

What a pity.

So am I forbidden to return multiple values?

No. It’s called a plain old struct:

struct cube
{
    void * vertices;
    int num_vertices;
    unsigned char r, g, b;
};

// old code that doesn't know about colors
cube old_code()
{
    cube a = {NULL, 0};
    return a;
}

// new code that uses colors
cube new_code()
{
    cube a = old_code();
    a.r = 10;
    a.g = 20;
    a.b = 42;
    return a;
}

struct point
{
    int x, y;
};

struct division_result
{
    int remainder, quotient;
};

void TranslateCoordinate(point p);
division_result divide(int numerator, int denominator);

// Notice that the signatures nicely convey the intent of each function.
// This is now illegal, as it should be:
TranslateCoordinate(divide(50, 21));    // ERROR

Old code can safely use the new struct as-is, without any modifications. Old code can even pass updated objects verbatim to new code that can enhance it. ’nuff said.
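The type-safety difference can be checked at compile time. This is a small sketch, assuming a C++11 compiler and using std::tuple as a stand-in for boost::tuple; the typedef and function names are invented for the illustration.

```cpp
#include <tuple>
#include <type_traits>

struct point { int x, y; };
struct division_result { int remainder, quotient; };

// std::tuple stands in for boost::tuple here (an assumption for portability).
typedef std::tuple<int, int> tuple_point;
typedef std::tuple<int, int> tuple_division_result;

// Two typedefs of the same tuple are one and the same type,
// so the compiler cannot tell a point from a division result.
static_assert(std::is_same<tuple_point, tuple_division_result>::value,
              "typedefs of the same tuple are interchangeable");

// Two structs with identical members are still distinct types
// with no implicit conversion between them.
static_assert(!std::is_convertible<division_result, point>::value,
              "a division_result cannot be passed where a point is expected");

// Runtime-visible versions of the same two facts.
bool tuples_are_same_type()
{
    return std::is_same<tuple_point, tuple_division_result>::value;
}

bool structs_are_convertible()
{
    return std::is_convertible<division_result, point>::value;
}
```

The same reasoning is why a typedef alone does not rescue the tuple version: a typedef introduces a new name, not a new type.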

FAQs

1. Your second example sucks. I can say “using int as your return type is a fast way to seal off your function for future extensions”

True. Primitive types are usually expected to represent a very fine-grained entity that shouldn’t change. Of course it’s kind of an assumption. The problem with tuple is, tuple<int, int> is actually more like an object. tuple<int, int, char, char, std::string> even more so. The more types you pack together in a tuple, the more fragile that tuple is.

Think about it this way: if a function that originally returns int is suddenly changed to return std::string, the whole function probably needs to be rewritten anyway and the return type is the least of our concerns (the more important concern is behavior, obviously). However, if your return type is tuple<int, std::string, double>, it is very likely that your function does more than one thing and you may well need to expand your return type to tuple<int, std::string, double, void *> some time later so you can expose more of the stuff that you originally thought could be encapsulated (this is a whole different topic).

Of course, no one can be absolutely sure he won’t want to return more things when he writes a function that returns more than one value. Think how struct handles it elegantly.

2. Defining a new type every time I want to stitch together a bunch of variables sucks!

It does, but that’s a fact of life, embrace it. Understand that what you think is “a bunch of variables” may evolve into a full blown object sooner than you may think.

In a few convenient places, though, you can use tuple internally if you don’t expose tuples to the outside world (so other people don’t get confused). But how many non-trivial projects are internal? Even if your project is from the same company, it’s just good practice to write your part like an API so other people (including yourself 2 years later) can use it conveniently.

With those in mind, we can come to the conclusion that useful scenarios for tuples are really limited.

3. Your third example is stupid. Nobody uses tuples in parameter list. Functions have a natural parameter list that supports parameter naming

Right, let’s change it:

int PickUpTreasure(int playerHP, int playerLevel, int treasureMoney);

Seems better, let’s see how our caller adapts

PickUpTreasure(get<0>(player), get<1>(player), get<2>(treasure));

Yikes!

Just for your comparison, here’s a well-formed and well-typed version using plain old OOP:

player.PickUp(treasure);

Who said programming languages must be cryptic!?

4. Your last remark in the third example shows your noobness. The author doesn’t have to spell out the definition of the tuple. He could have used a simple typedef

Right, but that kind of defeats the purpose of tuples IMO. The fact that I want to use a tuple is because it’s quick and dirty. If I go through the trouble to type:

typedef tuple<int,int>point;

Maybe I should just type a few more characters and benefit from named values:

struct point{int x, y;};

Oops, it turned out to be fewer characters, ironically :P

5. How about out params? (Not really related to tuples)

It’s sometimes needed but should be avoided as much as possible:

void create_big_object(big_object ** out);

big_object * my_object = NULL;
create_big_object(&my_object);

// Danger! my_object may still be NULL
use_one_attribute(my_object->name);

In general, it’s better to return a struct by value, provided your struct holds a small number of primitive types (such as Point, Matrix, Rectangle, etc.). There is usually no performance penalty for returning a simple struct by value (see explanation below), and the added benefit is that the intent becomes crystal clear. Pass-by-value syntax is how programming languages should work, as God intended. I no longer need to worry about object lifetimes and a whole bunch of other unimportant stuff.

(For low-level machos: it’s faster for the CPU to juggle primitive types in registers* than to access them from memory. But that’s a whole different beast of a topic.)
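As a rough illustration of the return-by-value style, here is a sketch with a two-double Point like the one mentioned in the footnote. The function name translate and its parameters are assumptions made for the example.

```cpp
struct Point
{
    double x, y;
};

// Takes a Point by value and returns a new one by value:
// no out parameter, no pointer that might be NULL, no ownership question.
Point translate(Point p, double dx, double dy)
{
    p.x += dx;
    p.y += dy;
    return p;
}
```

The call site reads as Point moved = translate(origin, 3.0, 4.0); and the caller’s original object is left untouched.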

If you need to return a pointer, use one of the modern smart-pointer variants instead:

// Clear intent -- create_big_object will give up ownership so the caller
// should take care of the object's life time
std::auto_ptr<big_object> my_object = create_big_object();

If you think about it, what’s an auto_ptr? It’s a struct! (class actually, synonymous in C++)
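A note for readers on newer compilers: std::auto_ptr was deprecated in C++11 and removed in C++17, and std::unique_ptr expresses the same ownership transfer. The following is a minimal sketch assuming C++11; the big_object definition, the placeholder value and the helper name_of_new_object are invented for the example.

```cpp
#include <memory>
#include <string>

// Hypothetical definition; the original only shows that big_object has a name.
struct big_object
{
    std::string name;
};

// Returning std::unique_ptr makes the transfer of ownership explicit:
// the caller now owns the object and it is freed automatically.
std::unique_ptr<big_object> create_big_object()
{
    std::unique_ptr<big_object> object(new big_object());
    object->name = "example";  // placeholder value
    return object;
}

// Usage: no manual delete anywhere.
std::string name_of_new_object()
{
    std::unique_ptr<big_object> my_object = create_big_object();
    return my_object->name;  // my_object is destroyed when the function returns
}
```

A unique_ptr can still be empty, so a factory that may fail should document that, but the question of who deletes the object is settled by the return type.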

* I made that remark without really giving it deep consideration. I did a very crude test using a 2-double Point struct and drew that conclusion. Obviously it’s very compiler specific. Most of the time, you’d find that the pass-by-value version is faster when you have 2 to 4 members in the struct and your compiler is using some sort of fastcall or x64 calling convention. Having said that, it’s probably safer to stick to good ol’ pass-by-const-reference most of the time anyway. For return values, we have RVO, so it’s usually OK to return whole objects.

Published by kizzx2, on July 31st, 2010 at 2:03 am. Filed under: Interesting things

Archlinux — pure awesomeness (+ how to change Linux console font)

Today I checked out Arch Linux. It’s just that cool. It’s a minimalistic distro that focuses on, well, minimalism, simplicity and elegance.

What’s more of a pleasant surprise is that I find the Arch Linux Wiki to be probably the most comprehensive and high-quality Linux resource available on the net. Among those gems, I’ve found the way to change the font at the console. I’ve been using *nix environments for a couple of years, and this is the question that has always bugged me but that nobody seemed to talk about. There you go, several years of research ;)

Setting up Arch Linux was a pleasant learning experience in itself. The package manager pacman offers the same level of convenience as the famous apt, if not more. The nice thing about it is that Arch’s packages seem to be much more recent than those ancient rugs on Debian and Ubuntu.

At the same time, Arch doesn’t bog you down with lots of unnecessary stuff. Right out of the box, Arch has almost no packages, and you have to edit a couple of config files just to be able to download things from the repos. But it was a pleasant learning experience, largely attributable to the excellent wiki.

If you want to improve your *nix-fu or just want a plain, beautiful distro to play with, give Arch a try!

Published by kizzx2, on July 17th, 2010 at 2:57 am. Filed under: Uncategorized

Restoring elegance to CakePHP — doing multiple joins The Right Way™

In my last article about unit testing, I mentioned one way to do ad-hoc multiple joins in CakePHP rather succinctly. Here’s a recap:

function tagged($tag)
{
    $this->bindModel(array('hasOne'=>array(
        'PostsTag'=>array(
            'foreignKey'=>false,
            'conditions'=>"PostsTag.post_id = Post.id"
        ),

        'Tag'=>array(
            'foreignKey'=>false,
            'conditions'=>"PostsTag.tag_id = Tag.id"
        )
    )));

    return $this->find('all', array('conditions'=>array(
        'Tag.name'=>$tag)));
}

This is, of course, rather unintuitive. A hasOne relationship, when in fact I’m looking for something that hasAndBelongsToMany? I thought more about it.

There is a Bakery article that talks about doing ad-hoc joins. It looks more technically correct but is just too freaking verbose for my liking:

<?php
    $markers = $this->Marker->find('all', array('joins' => array(
        array(
            'table' => 'markers_tags',
            'alias' => 'MarkersTag',
            'type' => 'inner',
            'foreignKey' => false,
            'conditions'=> array('MarkersTag.marker_id = Marker.id')
        ),
        array(
            'table' => 'tags',
            'alias' => 'Tag',
            'type' => 'inner',
            'foreignKey' => false,
            'conditions'=> array(
                'Tag.id = MarkersTag.tag_id',
                'Tag.tag' => explode(' ', $this->params['url']['q'])
            )
        )
    )));
?>

Actually, a simple refactoring can make it (almost) syntactically sweet and technically more correct:

// /app/vendors/joins.php

/**
 * A new helper class to produce those join arrays just to make
 * life less miserable
 */

class Joins
{
    public static function left($model, $conditions)
    {
        return self::_makeJoin($model, $conditions, 'left');
    }

    public static function inner($model, $conditions)
    {
        return self::_makeJoin($model, $conditions, 'inner');
    }

    private static function _makeJoin($model, $conditions, $type)
    {
        return array(
            'table'=>Inflector::tableize($model),
            'alias'=>$model,
            'type'=>$type,
            'foreignKey'=>false,
            'conditions'=>$conditions
        );
    }
}

// /app/app_model.php

App::import('Vendor', 'joins');

class AppModel extends Model
{
    // ...
}

// /app/models/post.php

function tagged($tag)
{
    /*
     * Let's make use of our new class, this has become
     * a "one-liner."
     */

    return $this->find('all', array(
        'conditions'=>array('Tag.name'=>$tag),
        'joins'=>array(
            Joins::inner('PostsTag', 'PostsTag.post_id = Post.id'),
            Joins::inner('Tag', "PostsTag.tag_id = Tag.id")
        )
    ));
}

Published by kizzx2, on July 8th, 2010 at 8:30 pm. Filed under: CakePHP

Unit testing in CakePHP — the missing manual and a step-by-step tutorial

Today I’ve finally formalized a streamlined procedure to do unit testing properly in CakePHP, without all the pain. The Cookbook’s chapter on this topic covers the basics but is inconsistent and lacks a real-life feel.

The Cookbook’s coverage is really basic and doesn’t hold up to more complicated real-life cases. Cake seems to be particularly picky about its automagical (too clever for me to figure out for a long time) configuration and barks errors at me very often. That led to reluctance to write tests and sometimes to giving up altogether :p

The steps I’ve written here should lay down a very solid framework to make you bulletproof for all your future testing needs.

(Of course, this might be coming a bit late since most people would be better off looking at Cake’s successor Lithium. Anyway, here we go.)

Can’t gather up much time to make a polished article but I believe this point-form brain dump is much more effective than most documentation out there.

Testing models

First, some notes

  • Testing models is the most important thing. Of course, a proper MVC application should have fat models and thin controllers.

  • For the most part, you really should use fixtures, even if you do not use fixtures to load data. If you just use the live database, chances are Cake will mess up your data and errors will pop up, so just go ahead and use fixtures. Yes, you don’t need to load data with your fixtures if you don’t want to, but you do need to turn this feature on. Yes, this was the thing that confused me a lot.

  • CakePHP uses SimpleTest, so you would think that you’d put your setup and teardown code in setUp() and tearDown() just like everybody else? Wrong. Cake has already invaded those spaces. You’ll need to use startTest() and endTest(). This is documented in the Cookbook, but it took me quite some time to figure out since I took it for granted and didn’t RTFM.

  • I may cover controller testing later, but IMO controller testing in Cake is mostly broken, and what there is of it is quite straightforward to figure out.

Testing models — the tutorial

First we’ll set up the database, install SimpleTest etc. If you don’t know how to do this step you should read the Real Manual first.

Set up our testing application database in (My)SQL.

-- We have the title field called "name" instead of "title"
-- IMO this is idiomatic Cake because your post's title
-- will now automatically show up in Post::find('list')
CREATE TABLE `posts`
(
    `id` INT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
    `name` VARCHAR(50) NOT NULL,
    `body` TEXT
);

-- You may think that the "name" field should be the primary key.
-- Wrong! Doing so will make Cake unhappy if you ever use
-- Cake's console schema or any of the other Cake migration
-- plugins out there. The id field is the Creed.
CREATE TABLE `tags`
(
    `id` INT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
    `name` VARCHAR(50) NOT NULL UNIQUE
);

-- The id field is the Creed, even for join tables
CREATE TABLE `posts_tags`
(
    `id` INT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
    `post_id` INT UNSIGNED NOT NULL,
    `tag_id` INT UNSIGNED NOT NULL,
    UNIQUE KEY(post_id, tag_id)
);

Bake our stuff. Bake has come a long way since the earlier versions and the bake in Cake 1.3 is genuinely useful:

$ cake bake all Post
$ cake bake all Tag

Now go create some test data:

INSERT INTO `posts` (`id`, `name`) VALUES (1, "Lorem"), (2, "Ipsum"), (3, "Dolor"), (4, "Sit");
INSERT INTO `tags` (`id`, `name`) VALUES (1, "apple"), (2, "orange");
INSERT INTO `posts_tags` (`post_id`, `tag_id`) VALUES (1, 1), (1, 2), (2, 1), (3, 2);

Now open /app/models/post.php and /app/models/tag.php. Both look OK! And Cake has gone ahead and created the test cases for us at /app/tests/cases/models. Let’s try to run them:

$ cake testsuite app all
Error: Missing database table 'test_suite_posts_tags' for model 'PostsTag'

This is how broken it is! Now let’s fix it. The thing is that we haven’t baked the fixture for the join table class PostsTag. Of course, we don’t want to have to create controller, models, views just for a simple join class. Luckily we can amend it by creating all missing fixtures:

$ cake bake fixture all -records -count 999

This will create all the missing fixtures. The -records -count 999 part tells Cake to pull real data from our database as fixture. The -count part is needed because it defaults to 10. Might as well enter some very large number but YMMV. This will also update our existing fixtures for Post and Tag so that the new fixture data we’ve added to the database will be reflected in the fixtures. Let’s try to run the tests again:

$ cake testsuite app all
...
4/4  test cases complete.

Finally, we’ve got the testing architecture down. To honor TDD, let’s create (edit) our test file before we do some development. Now go to /app/tests/cases/models/post.test.php, you’ll see this line:

var $fixtures = array('app.tag', 'app.post', 'app.posts_tag');

This is unfortunate, because it means every time you add a new association to your model, you’ll have to manually edit this $fixtures array. Too bad there isn’t an automated way to do this. (Running bake test will overwrite your whole file. You’ve been warned!)

Anyway, let’s just write some tests for kicks in /app/tests/cases/models/post.test.php:

function testSanity()
{
    $this->assertTrue(1 == 1);

    // Our fixtures should be loaded with data
    $posts = $this->Post->find('all');
    $this->assertTrue(!empty($posts));
}

Run it, all is good:

$ cake testsuite app all
...
4/4 test cases complete: 2 passes.

To honor TDD, we’ll write our test first before we do any implementation. Let’s add our test function. We’re going to write a model function that will find all posts for a given tag:

function testTagged()
{
    // These are based on our testing fixture data above
    $posts = $this->Post->tagged('apple');
    $this->assertTrue(!empty($posts));
    $this->assertTrue(Set::matches('/Post[name=Lorem]', $posts));
    $this->assertTrue(Set::matches('/Post[name=Ipsum]', $posts));
    $this->assertFalse(Set::matches('/Post[name=Dolor]', $posts));

    $posts = $this->Post->tagged('orange');
    $this->assertTrue(!empty($posts));
    $this->assertTrue(Set::matches('/Post[name=Lorem]', $posts));
    $this->assertTrue(Set::matches('/Post[name=Dolor]', $posts));
    $this->assertFalse(Set::matches('/Post[name=Ipsum]', $posts));
}

Run the test and watch it fail:

$ cake testsuite app all
...

Do our implementation in /app/models/post.php

/*
 * If this line seems alien to you, don't worry.
 * Doing HABTM in CakePHP is a train wreck and a whole other
 * topic. This is probably the most elegant way to do it
 * (not necessarily the shortest and cleanest) as far as
 * I know.
 *
 * The good thing about unit test -- we can be sure
 * it works even though we don't understand it lol
 */

function tagged($tag)
{
    $this->bindModel(array('hasOne'=>array(
        'PostsTag'=>array(
            'foreignKey'=>false,
            'conditions'=>"PostsTag.post_id = Post.id"
        ),

        'Tag'=>array(
            'foreignKey'=>false,
            'conditions'=>"PostsTag.tag_id = Tag.id"
        )
    )));

    return $this->find('all', array('conditions'=>array(
        'Tag.name'=>$tag)));
}

The above trick was used to force CakePHP to do left joins for us. There is an article that talks about this technique on the nuts and bolts of cakephp blog.

Well, let’s add one more test. We want to make a function to get us the content of a post:

-- First, update our test database
UPDATE `posts` SET `body` = "Hello World!!" WHERE `name` = "Lorem";

// /app/tests/cases/models/post.test.php
// Then write our new test
function testGetContent()
{
    $post = $this->Post->findByName("Lorem");
    $this->assertTrue(!empty($post));

    $content = $this->Post->getContent($post['Post']['id']);
    $this->assertPattern('/hello world/i', $content);
}

Run it, watch it fail. And then we add our implementation:

// /app/models/post.php
function getContent($id)
{
    $post = $this->read('body', $id);
    return $post['Post']['body'];
}

Great, now run it and expect it to pass….

$ cake testsuite app all
...
4/4 test cases complete: 11 passes, 1 fails.

Wtf!? It failed? Yeah, we forgot to update our fixtures. That’s it. Every time we update our database, we need to update the fixtures. Fortunately, this is one area of Cake that is really painless:

$ cake bake fixture all -records -count 999
...
$ cake testsuite app all
...
4/4 test cases complete: 12 passes.

The fixture-updating part was a little redundant, but it’s better than manually updating the fixture both in SQL and in *_fixture.php. I suggest storing all your test fixture data in an SQL file and writing a bash or Ruby script to tear down the database and load the test SQL file for each test run. You can play around with different connection settings if you don’t want to sabotage your main tables on every test run. You can package the loading of the SQL file and the fixture baking into one script so that it can be done in one click. (Left as an exercise for the reader.)

That’s it! In retrospect, it isn’t rocket science; it’s just that I haven’t found any good, comprehensive tutorial that holds my hand from start to finish. I’ve been putting off a lot of unit test writing because I’ve always had Cake barking errors at me left and right. Now there’s no excuse :P

Published by kizzx2, on July 8th, 2010 at 12:53 am. Filed under: CakePHP