Code for Concinnity

Conditionally free up PostgreSQL table space without locking or extensions

The problem

Simply DELETEing rows often does not reclaim disk space in PostgreSQL without doing a VACUUM FULL. Unfortunately that locks up the table which often means the whole app being offline for unforeseeable duration.

The quick and dirty solution


-- Instead of deleting rows you don't need, keep the rows you need -- This is often quite fast if you keep only a small percentage of data select * into request_logs_backup from request_logs where id > 76667000;

-- Truncate the table. This is usually instantaneous truncate request_logs;

-- Copy the data back insert into request_logs select * from request_logs_backup; drop table request_logs_backup;


The “usual” solutions

There are dedicated extensions like pg_repack, pg_squeeze. They require installation on the server, that’s why a quick and dirty solution is proposed here.

Why copy the data instead of renaming the table?

Apparently preserving the table structure in a new table can be tricky. The above solution is defensively safe.

Published by kizzx2, on September 6th, 2021 at 3:33 pm. Filled under: UncategorizedNo Comments

Running NEO-CLI as a daemon

Surprisingly I couldn’t find an easy way to do it, so I made a Docker image to do that in one line:

docker run --name neo-cli -d --rm -it -p 10332:10332 -v $PWD/Chain:/neo-cli/Chain kizzx2/neo-cli

Published by kizzx2, on June 23rd, 2018 at 1:08 pm. Filled under: UncategorizedNo Comments

Restarting a docker container periodically

Published by kizzx2, on June 23rd, 2018 at 1:06 pm. Filled under: UncategorizedNo Comments

`ld: symbol(s) not found for architecture x86_64` when `pip install` on Mac

For example, you would get errors like this:

"PyUnicode_CompareWithASCIIString", referenced from:
    _extract_opcodes in _levenshtein.o
    _extract_editops in _levenshtein.o
"_PyUnicode_FromUnicode", referenced from:
    _median_improve_py in _levenshtein.o
    _apply_edit_py in _levenshtein.o
    _median_common in _levenshtein.o
"_PyUnicode_InternFromString", referenced from:
    _PyInit_levenshtein in levenshtein.o
"_PyUnicode_Type", referenced from:
    _hamming_py in _levenshtein.o
    _jaro_py in _levenshtein.o
    _jaro_winkler_py in _levenshtein.o
    _median_improve_py in _levenshtein.o
    _editops_py in _levenshtein.o
    _opcodes_py in _levenshtein.o
    _apply_edit_py in _levenshtein.o
"_Py_NoneStruct", referenced from:
    _median_improve_py in _levenshtein.o
    _median_common in _levenshtein.o

And here’s how to solve it

LDFLAGS='-undefined dynamic_lookup' pip install cx_Oracle # Or your awesome package
Published by kizzx2, on July 24th, 2017 at 1:58 pm. Filled under: UncategorizedNo Comments

`luarocks install alien` on OS X with Homebrew

If you just do luarocks install alien you may get this error:

checking for ffi_closure_alloc in -lffi... no
configure: error: in `/tmp/luarocks_alien-0.7.0-1-1080/alien-0.7.0':
configure: error: cannot find new enough libffi

The solution is to specify the lib directory with CFLAGS:

brew install libffi
CFLAGS='-L/usr/local/opt/libffi/lib' luarocks install alien
Published by kizzx2, on March 20th, 2015 at 10:50 am. Filled under: UncategorizedNo Comments

Heroku + Node.js: Cannot find module ‘base64′

Some older modules still use the base64 package. Solution:

heroku config:set NODE_PATH=node_modules/base64/build/Release/

Published by kizzx2, on January 21st, 2015 at 9:35 am. Filled under: UncategorizedNo Comments

EMFILE when doing `grunt nodewebkit` — solution

ulimit -n 9999 && grunt nodewebkit

Published by kizzx2, on December 22nd, 2014 at 11:53 pm. Filled under: UncategorizedNo Comments

Run single test with Spring in Rails 4.1

In the old days, you would run a single test with:

$ ruby -I test test/functional/my_controller_test.rb -n test_my_case

This is all good and well, but is painfully slow. Then there were brilliant people who created Spork, Zeus and Spring to speed up that process.

With version 4.1 Rails adopted Spring as the official preloader. Probably due to its pure Ruby nature. Unfortunately it is the only one of the pack that does not seem to support the single test usage.

After munging around I discovered a method that can be used for any preloader. By putting a script script/test.rb:

# script/test.rb
$: << Rails.root.join('test')
options = {} do |opts|
  opts.on('-n TEST_NAME') do |n|
    options[:test_name] = n
  opts.on('-e ENVIRONMENT') do |e|
    raise"Must run in test environment") if e != 'test'
test_files = ARGV.dup
if options[:test_name]
  ARGV << "-n" << options[:test_name]
test_files.each do |f|
  Dir[f].each do |f1|
    load f1

Usage is just like good old ruby -I test:

$ bin/rails r -e test script/test.rb test/controllers/my_controler_test.rb -n test_my_case
$ bin/rails r -e test script/test.rb test/controllers/**/*.rb

If that’s too long, you can just do something like alias rt=bin/rails r -e test script/test.rb in your shell’s rc

Published by kizzx2, on May 10th, 2014 at 1:16 pm. Filled under: UncategorizedNo Comments

Elegant Gradle + Android dependency management (even works for Ant-based projects!)

With Gradle the Java world has finally started to catch up in modern dependency management methodologies. Maven the technology has always worked but frankly, everybody who used that suffered.

The new Gradle support in Android’s build system is promising but unfortunately has a lot of rough edges. I have a lot of troubles getting Android Studio to work smoothly, let alone convert existing Ant-based projects.

However, you don’t need to convert your project if all you want is to download .jar files using Gradle. Just create build.gradle along side your build.xml:

Now you can invoke a command like

gradle libs

to get the .jar files copied into libs directory. The rest of your Ant/Eclipse workflow would then just work.

Published by kizzx2, on March 2nd, 2014 at 12:26 pm. Filled under: UncategorizedNo Comments

Illustrative elaboration — Why are Cassandra secondary indexes recommended for low cardinality attributes?

At the time of writing (as Cassandra is evolving very fast), Cassandra’s documentation recommends using its built-in secondary index only for low cardinality attributes (i.e. attributes with a few unique values).

The reason isn’t immediately obvious and the documentation doesn’t explain it in details. A quick Google search only yields this Jira ticket at the moment. Which does in fact answer the question but does it rather subtlely.

This is an attempt to clarify it from my understanding.

Secondary indexes are stored locally

The main difference between the primary index and secondary indexes are distributed indexes vs. local indexes, as mentioned in the above Jira ticket. Basically, that means that every node in the Cassandra cluster can answer the question “Which node contains the row with primary key d397bb236b2d6c3b6bc6fe36893ec1ea?” immediately.

However, secondary indexes are stored locally, as they are implemented as column families, it is not guaranteed that an arbitrary node can answer the question “Which node contains the Person with state = 'us'?” immediately. To answer that question, the node needs to go out and ask all nodes that question.

An example — low cardinality scenario

Suppose we build a secondary index for gender of Person in a 10 node cluster. Suppose you use RandomPartitioner as recommended, the data is distributed uniformly vs. gender for all nodes. That is, in normal cases every node should contain 50% males and 50% females.

Now if I issue a query “give me 100 males”. No matter which node I connect to, the first node will be able to answer the query without consulting other nodes (assuming each node has at least, for example, 1000 males and 1000 females, etc.).

If I were to issue a query “give me all females”, the first node (coordinator) will have to go out and ask all other nodes. Since all nodes contain 50% females, all nodes will give meaningful responses to the coordinator. Signal to noise ratio is high. Contrast this with the low signal to noise ratio scenario described below.

A counter example — high cardinality scenario

Now suppose we build a secondary index for street_address of Person in a 10 node cluster using RandomPartitioner.

Now if I issue a query “give me 3 people who live in 35 main st.” (Could be a family) With roughly 10% chance I contact the node that maintains the local index of “35 main st.” and it has 5 rows for “35 main st.”, then the coordinator can answer the query and be done with it.

In the other 90% of the time, though, the coordinator does not maintain the index “35 main st.”, so it has to go out and ask all nodes the question. Since only roughly 10% of the nodes has the answer, most nodes will give a meaningless response of “nope, I don’t have it”. The signal to noise ratio is very low and the overhead of such communication is high and wastes bandwidth.

Even if node A contains all people who lives in “35 main st.”, which we suppose is 5. If I were to issue a query “give me all people who live in 35 main st.”, node A is still going to have to go out and ask all nodes, because it does not know that, globally, only 5 people live in 35 main st. In this case, all nodes respond with “nope, I don’t have it” giving a signal to noise ratio of 0%.

So the conclusion is actually what Stu Hood mentioned in the Jira ticket:

Local indexes are better for:

– Low cardinality fields – Filtering of values in base order

Distributed indexes are better for:

– High cardinality fields – Querying of values in index order

That’s how I understood it. Hope it helps (or doesn’t hurt, at least).

Published by kizzx2, on August 18th, 2013 at 12:47 pm. Filled under: UncategorizedNo Comments