Code for Concinnity


Elegant Gradle + Android dependency management (even works for Ant-based projects!)

With Gradle, the Java world has finally started to catch up with modern dependency management. Maven the technology has always worked, but frankly, everybody who used it suffered.

The new Gradle support in Android’s build system is promising but unfortunately still has a lot of rough edges. I had a lot of trouble getting Android Studio to work smoothly, let alone converting existing Ant-based projects.

However, you don’t need to convert your project if all you want is to download .jar files using Gradle. Just create a build.gradle alongside your build.xml.
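
Something along these lines works as a minimal sketch (the dependency coordinates and the deps configuration name are illustrative assumptions, not a prescription):

// minimal build.gradle sketch -- adjust the dependencies to your project
repositories {
    mavenCentral()
}

configurations {
    // a dedicated configuration holding the jars we want copied
    deps
}

dependencies {
    // illustrative coordinates; replace with your real dependencies
    deps 'com.google.code.gson:gson:2.2.4'
    deps 'joda-time:joda-time:2.3'
}

// copies the resolved jars (including transitive dependencies) into libs/
task libs(type: Copy) {
    from configurations.deps
    into 'libs'
}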

Now you can invoke a command like

gradle libs

to get the .jar files copied into the libs directory. The rest of your Ant/Eclipse workflow then just works.


Illustrative elaboration — Why are Cassandra secondary indexes recommended for low cardinality attributes?

At the time of writing (Cassandra is evolving very fast), Cassandra’s documentation recommends using its built-in secondary index only for low cardinality attributes (i.e. attributes with few unique values).
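
For concreteness, here is roughly what that looks like in CQL (the person table and its columns are illustrative):

-- illustrative table; state has low cardinality, street_address high
CREATE TABLE person (
  id uuid PRIMARY KEY,
  gender text,
  state text,
  street_address text
);

-- the built-in secondary index being discussed
CREATE INDEX person_state_idx ON person (state);

-- served by the secondary index
SELECT * FROM person WHERE state = 'us';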

The reason isn’t immediately obvious and the documentation doesn’t explain it in detail. A quick Google search currently only yields this Jira ticket, which does in fact answer the question, but rather subtly.

This is an attempt to clarify it from my understanding.

Secondary indexes are stored locally

The main difference between the primary index and secondary indexes is distributed vs. local: the primary index is distributed, while secondary indexes are local, as mentioned in the Jira ticket above. Concretely, that means every node in the Cassandra cluster can answer the question “Which node contains the row with primary key d397bb236b2d6c3b6bc6fe36893ec1ea?” immediately.

Secondary indexes, however, are stored locally: they are implemented as local column families, so it is not guaranteed that an arbitrary node can answer the question “Which node contains the Person with state = 'us'?” immediately. To answer that question, the node needs to go out and ask every node.

An example — low cardinality scenario

Suppose we build a secondary index on gender of Person in a 10-node cluster. If you use RandomPartitioner as recommended, the data is distributed uniformly with respect to gender across all nodes. That is, in the normal case every node contains roughly 50% males and 50% females.

Now suppose I issue the query “give me 100 males”. No matter which node I connect to, that first node can answer the query without consulting other nodes (assuming each node has at least, say, 1000 males and 1000 females).

If I were to issue the query “give me all females”, the first node (the coordinator) would have to go out and ask all the other nodes. Since every node contains 50% females, every node gives a meaningful response to the coordinator: the signal-to-noise ratio is high. Contrast this with the low signal-to-noise scenario described below.

A counter example — high cardinality scenario

Now suppose we build a secondary index on street_address of Person in a 10-node cluster using RandomPartitioner.

Now if I issue the query “give me 3 people who live in 35 main st.” (could be a family), there is roughly a 10% chance that I contact the node maintaining the local index entry for “35 main st.”. If that node has 5 rows for “35 main st.”, the coordinator can answer the query by itself and be done with it.

The other 90% of the time, though, the coordinator does not maintain the index entry for “35 main st.”, so it has to go out and ask every node. Since only roughly 10% of the nodes have the answer, most nodes give the meaningless response “nope, I don’t have it”. The signal-to-noise ratio is very low, and the overhead of all that communication is high and wastes bandwidth.

Even if node A contains all the people who live in “35 main st.” (say there are 5 of them), a query “give me all people who live in 35 main st.” still forces node A to go out and ask every node, because it does not know that, globally, only 5 people live at 35 main st. In this case all the other nodes respond with “nope, I don’t have it”, giving a signal-to-noise ratio of 0%.

So the conclusion is actually what Stu Hood mentioned in the Jira ticket:

Local indexes are better for:

– Low cardinality fields
– Filtering of values in base order

Distributed indexes are better for:

– High cardinality fields
– Querying of values in index order

That’s how I understood it. Hope it helps (or doesn’t hurt, at least).


Slow batch insert with MongoDB sharding and how I debugged it

Now this title sounds fairly technical and seems to belong in a bug ticket rather than a general blog post. But I want to write about it anyway, because it took me a couple of hours to figure out and it highlights how immature MongoDB still is in general. At the end I give a solution that solves the issue, so you can still get good performance with sharding + batch inserts.

So here we go:

A couple of weeks ago I was hanging out in the #mongodb IRC channel, troubleshooting a performance issue with a guy. He had a beefy 32 GB server with 8 cores, but it was taking him 20 seconds to insert 20,000 documents as simple as this:

{ t: "2013-06-23", n: 7, a: 1, v: 0, j: "1234" }

So I wrote a quick script (included below) to try it on my MacBook Pro with an SSD, and I got results like this:

20000 documents inserted
Took 580ms
34482.75862068966 doc/s

So something must be wrong with his configuration / code, I thought, and I kept telling him to just run my code on his machine.

Turns out it’s due to MongoDB sharding failing with batch inserts

It turns out that performance dropped drastically for me too after I enabled sharding:

20000 documents inserted
Took 15701ms
1273.8042162919558 doc/s

The test setup

Here’s what I used to test:

  • 3 config servers
  • 8 shards, sharding on { _id: "hashed" }
  • All on localhost

The setup script I used creates 8 shards on localhost, roughly along the lines below. (By the way, setting up sharding is painful.)
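
This is only a sketch of the idea, not the exact script; the ports, the mongolab/* directory layout, and the shard_test database name are illustrative:

# 3 config servers
for i in 1 2 3; do
  mkdir -p mongolab/cfg$i
  mongod --configsvr --dbpath mongolab/cfg$i --port $((20000 + i)) \
         --logpath mongolab/cfg$i/mongo.log --fork
done

# 8 shards
for i in 1 2 3 4 5 6 7 8; do
  mkdir -p mongolab/sh$i
  mongod --shardsvr --dbpath mongolab/sh$i --port $((30000 + i)) \
         --logpath mongolab/sh$i/mongo.log --fork
done

# one mongos router (default port 27017) in front of the config servers
mongos --configdb localhost:20001,localhost:20002,localhost:20003 \
       --logpath mongolab/mongos.log --fork

# register the shards and shard the collection on a hashed _id
mongo <<'EOF'
for (var i = 1; i <= 8; i++) sh.addShard("localhost:" + (30000 + i));
sh.enableSharding("shard_test");
sh.shardCollection("shard_test.foos", { _id: "hashed" });
EOF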

The test script

The test script is as simple as possible — just a normal batch insert:

// Quite a lot of orchestration
var count0 = db.foos.find().count();
var t0 = Date.now();

var docs = [];

for(var i = 0; i < 20000; i++) {
  docs.push({ t: "2013-06-23", n: 7 * i, a: 1, v: 0, j: "1234" });
}

// And actually just these couple of lines are the real action
db.foos.insert(docs);
db.getLastError();

var t1 = Date.now();
var count1 = db.foos.find().count();

var took = t1 - t0;
var count = count1 - count0;
var throughput = count / took * 1000;

print(count + " documents inserted");
print("Took " + took + "ms");
print(throughput + " doc/s");

How I systematically discovered the problem

By passing the -v option to mongod and doing something like tail -f mongolab/**/*.log, I saw tons of logs like this:

==> mongolab/sh5/mongo.log <==
Sun Aug 18 10:12:23.136 [conn2] run command admin.$cmd { getLastError: 1 }
Sun Aug 18 10:12:23.136 [conn2] command admin.$cmd command: { getLastError: 1 } ntoreturn:1 keyUpdates:0  reslen:67 0ms

==> mongolab/sh6/mongo.log <==
Sun Aug 18 10:12:23.136 [conn2] run command admin.$cmd { getLastError: 1 }
Sun Aug 18 10:12:23.136 [conn2] command admin.$cmd command: { getLastError: 1 } ntoreturn:1 keyUpdates:0  reslen:67 0ms

==> mongolab/sh7/mongo.log <==
Sun Aug 18 10:12:23.137 [conn2] run command admin.$cmd { getLastError: 1 }
Sun Aug 18 10:12:23.137 [conn2] command admin.$cmd command: { getLastError: 1 } ntoreturn:1 keyUpdates:0  reslen:67 0ms

...

So mongos is splitting the batch insert into individual inserts and doing them one by one, with a getLastError() accompanying each of them!

How to prove that it’s related to batch insert

I changed my test script to do sequential inserts, and it worked out fine (note that this is still slower than a non-sharded batch insert):

20000 documents inserted
Took 1746ms
11454.75372279496 doc/s
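
For reference, the sequential variant just replaces the single batch insert with one insert per document, roughly like this:

// sequential inserts: one insert per document instead of one big batch
for (var i = 0; i < 20000; i++) {
  db.foos.insert({ t: "2013-06-23", n: 7 * i, a: 1, v: 0, j: "1234" });
}
db.getLastError();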

The moral of the story is that if you shard, you should benchmark very carefully if you rely on batch inserts.

So can I shard and still get good batch insert performance?

I figured out a way to still get good batch insert performance by using a numeric (instead of hashed) shard key:

  • sh.shardCollection("shard_test.foos", {rnd: 1})
  • Pre-chunk the collection on rnd
  • On each document, generate the rnd key: db.foos.insert({ rnd: _rand(), t: ...
  • Before inserting the documents, sort the document array so that mongos only sends N batch inserts if you have N shards

(The full code to do all this is available on GitHub, to avoid flooding this post with code snippets; a condensed sketch follows.)
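
The sketch below shows the idea; the chunk boundaries, and Math.random() standing in for the shell’s _rand(), are illustrative:

// 1. shard on a plain numeric key instead of { _id: "hashed" }
sh.shardCollection("shard_test.foos", { rnd: 1 });

// 2. pre-split into one chunk per shard (the balancer, or manual
//    sh.moveChunk calls, then spreads the chunks across the shards)
for (var i = 1; i < 8; i++) {
  sh.splitAt("shard_test.foos", { rnd: i / 8 });
}

// 3. generate rnd on each document, and 4. sort by rnd before inserting,
//    so mongos can group the batch into one insert per shard
var docs = [];
for (var i = 0; i < 20000; i++) {
  docs.push({ rnd: Math.random(), t: "2013-06-23", n: 7 * i, a: 1, v: 0, j: "1234" });
}
docs.sort(function(a, b) { return a.rnd - b.rnd; });
db.foos.insert(docs);
db.getLastError();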

So instead of letting mongos calculate and sort the hash keys before sending the inserts, I have to do all of this myself. This is fairly basic stuff, and I am frankly shocked that the problem could be solved just like that.

The last step (sorting) is also required. Apparently mongos is not smart enough to sort the batch insert to optimize its own operation.

Benchmark Summary

Time in ms (lower is better).

                             No-Shard   No-Shard (with rnd)   Shard { _id: "hashed" }   Shard { rnd: 1 }
Batch insert                      640                   740                     21038               1004
Normal (sequential) insert       1404                  1468                      1573               1790

Note that even with the rnd shard key, insertion is still slower than the non-sharded version. Granted, I ran all the shards on a single machine, but this roughly illustrates the general non-zero overhead of sharding.


Windows takes 10+ minutes to boot with black screen — Solved!

Just a quick note for anyone struggling with this. I am talking about the infamous case where, after the Windows logo, there is a black screen for 10+ minutes until Windows shows the welcome screen. The Caps Lock key is unresponsive during that time.

After struggling with it on and off for a couple of weeks, I discovered the culprit to be a freaking font file!

Failing with Windows EFS

It turned out that when I installed .otf files from my Desktop, which is EFS-encrypted, the font files copied to C:\Windows\Fonts were encrypted as well. Because the Windows system account doesn’t have my EFS key, it would stall at the black screen trying, and failing, to decrypt the files.

How I solved it

Just decrypt the files before installing them, duh!
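
For example, from the folder containing the font files (the pattern is illustrative), something like:

cipher /D *.otf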


Thunderbird hanging with 100% CPU usage — culprit discovered — the Google Contacts add-on

Several months ago I started using the Google Contacts add-on for Thunderbird, and it has been hanging periodically ever since. I just assumed it was Thunderbird and tried all the usual things: compacting folders, resetting the cache, and so on.

Today it hung again and I got frustrated, so I ran dtruss (the OS X equivalent of strace) on it and found that it seemed to be stuck in an infinite loop trying to access my Google contacts feed.

I replaced it with the gContactSync add-on and it’s working smoothly now.


Install R <= 2.13 with Homebrew

Homebrew has probably changed a lot since then, with superenv and so on. If you are stuck on an old R version and need to install it using Homebrew, you need to add depends_on :x11 to /usr/local/Library/Formula/r.rb, like so:

# /usr/local/Library/Formula/r.rb
class R < Formula
  url 'http://cran.r-project.org/src/base/R-2/R-2.13.1.tar.gz'
  homepage 'http://www.r-project.org/'
  md5 '28dd0d68ac3a0eab93fe7035565a1c30'

  depends_on 'valgrind' if valgrind?
  depends_on :x11 # Add this line

  def options
    [
      ['--with-valgrind', 'Compile an unoptimized build with support for the Valgrind debugger.']
    ]
  end


# ...

tmux-sync — Open a dozen synchronized tmux panes easily

While managing server clusters I often want to open SSH sessions to a dozen machines and have them run the same set of commands, interactively. So I wrote tmux-sync to do this:

tmux-sync my-session 'ssh host1' 'ssh host2' 'ssh host3' 'ssh host4'

The above command drops you into a tmux session with one pane per command. Your input is sent to all the panes.

The source boils down to a handful of tmux commands.
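
A minimal sketch of the idea (the actual script may differ; this just shows the tmux primitives involved):

#!/bin/sh
# tmux-sync SESSION CMD1 [CMD2 ...]
session="$1"; shift

# the first command gets the initial pane
tmux new-session -d -s "$session" "$1"; shift

# every remaining command gets its own pane
for cmd in "$@"; do
  tmux split-window -t "$session" "$cmd"
  tmux select-layout -t "$session" tiled
done

# mirror keystrokes to every pane in the window
tmux set-window-option -t "$session" synchronize-panes on
tmux attach-session -t "$session"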


A Clean Approach to Partial Object Validation with Rails + Wicked Wizard

This is cleaner than the official method, IMO, because it involves zero to minimal modification of the model object. The wizard logic clearly belongs in the controller only.

The basic ideas are:

  • Derive from the ActiveRecord model
  • Override save to make Wicked behave. You can do arbitrarily complex things there without wrangling with validation conditions.

Here’s the code:
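
A sketch of what this can look like (the class name, steps, and per-step checks are all illustrative; Wicked’s render_wizard is what drives the save):

# Hypothetical sketch -- the real model class stays untouched
class WizardPerson < Person
  attr_accessor :wizard_step

  # Wicked's render_wizard calls save on the object it is given; make it
  # succeed as long as the fields belonging to the current step are valid.
  def save(*)
    case wizard_step
    when :name    then errors.add(:name, :blank)  if name.blank?
    when :address then errors.add(:state, :blank) if state.blank?
    end
    return false if errors.any?
    super(validate: false)
  end
end

# In the controller (which includes Wicked::Wizard):
#   @person = WizardPerson.find(params[:person_id])
#   @person.wizard_step = step
#   @person.attributes = person_params
#   render_wizard @person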

There is also a variant that uses a partial flag to persist the partially built object in the database.


Performance benchmark for Windows EFS and BoxCryptor

I didn’t have time to do a very serious benchmark, but in case it is useful:

# Some SSD, unencrypted
$ dd if=/dev/zero of=zero bs=500M count=1
1+0 records in
1+0 records out
524288000 bytes (524 MB) copied, 4.42172 s, 119 MB/s

# Some SSD, with EFS
$ dd if=/dev/zero of=zero bs=500M count=1
1+0 records in
1+0 records out
524288000 bytes (524 MB) copied, 13.4115 s, 39.1 MB/s

# Some SATA HDD, unencrypted
$ dd if=/dev/zero of=zero bs=500M count=1
1+0 records in
1+0 records out
524288000 bytes (524 MB) copied, 8.64324 s, 60.7 MB/s

# Some SATA HDD, BoxCryptor
$ dd if=/dev/zero of=zero bs=500M count=1
1+0 records in
1+0 records out
524288000 bytes (524 MB) copied, 17.8249 s, 29.4 MB/s

Another run using a Mac with an SSD. CoreStorage vs. TrueCrypt vs. BoxCryptor:

# Unencrypted, SSD
$ dd if=/dev/zero of=zero bs=500m count=1
1+0 records in
1+0 records out
524288000 bytes transferred in 1.431038 secs (366 MB/sec)

# CoreStorage (AES-XTS, default), SSD
$ dd if=/dev/zero of=zero bs=500m count=1
1+0 records in
1+0 records out
524288000 bytes transferred in 3.975945 secs (131 MB/sec)

# TrueCrypt (AES-XTS, default), SSD
$ dd if=/dev/zero of=zero bs=500m count=1
1+0 records in
1+0 records out
524288000 bytes transferred in 3.984627 secs (131 MB/sec)

# BoxCryptor, SSD
$ dd if=/dev/zero of=zero bs=500m count=1
1+0 records in
1+0 records out
524288000 bytes transferred in 8.353867 secs (62 MB/sec)

EFS causing lsass.exe (Local Security Authority Process) to use 100% CPU

So I was using Windows EFS the other day to encrypt some files. It is surprisingly easy to use and beats TrueCrypt and Mac’s disk encryption in the usability department:

cipher /E /S:MyFolder

I encrypted AppData, which is a good thing to do since many applications leave traces there. When I rebooted and logged in, I got lsass at 100% CPU and a black desktop.

TL;DR If you changed your password via MMC, you need to re-import your EFS key

EFS works by protecting your EFS certificate with your login password. That implies some work needs to be done when you change your password. MMC’s “Set Password” doesn’t do that work (it specifically warns against this, but this was the first time I actually read what it says :P).

It turns out that EFS spends quite a lot of CPU cycles on the wrong-password case. Because I had encrypted AppData, lsass would just spin with the obsolete password, trying in vain to decrypt the EFS certificate.

So the way out was to remove the EFS key in certmgr.msc and then re-import it. You may also need to refresh the credential cache with:

cipher /flushcache

If it worked, you should be able to display some encrypted text files transparently:

type test.txt

Tip: Use another account and runas /user:Victim cmd to do the above.

It has been reported that the 100% CPU usage may be related to the large number of SID files in AppData\Microsoft\Protect. I suspect that is another consequence of the same cause.

(The correct way to change your password is to select Change Password from the Ctrl+Alt+Del screen. I don’t know of a command-line way to do it :/ Feel free to post in the comments.)
