Code for Concinnity

EFS causing lsass.exe Local Security Authority Process using 100% CPU

So I was using Windows EFS the other day to encrypt some files. This is surprisingly easy to use and beats TrueCrypt and Mac’s disk encryption in the usability department:

cipher /E /S:MyFolder

I encrypted AppData, which is a good thing to do since many applications leave traces there. When I rebooted and logged in, lsass was at 100% CPU and I got a black desktop.

TL;DR If you changed your password via MMC, you need to re-import your EFS key

The way that EFS works is by protecting your EFS certificate with your login password. That implies some work needs to be done when you change your password. MMC’s “Set Password” doesn’t do it (it warns specifically against this, but this is the first time I actually read what it says :P).

It turns out that EFS spends quite a lot of CPU cycles on the wrong-password case. Because I had encrypted AppData, lsass would just spin with the obsolete password, trying in vain to decrypt the EFS certificate.

So the way out was to remove the EFS key in certmgr.msc and then re-import it. You may need to refresh the credential cache with:

cipher /flushcache

If it worked, you should be able to display some encrypted text files transparently:

type test.txt

Tip: Use another account and runas /user:Victim cmd to do the above.

It’s been reported that the 100% CPU usage may be related to the large number of SID files in AppData\Microsoft\Protect. I suspect that is another symptom of the same root cause.

(The correct way to change your password is to select Change Password in the Ctrl+Alt+Del screen. I don’t know of a command-line way to do it :/ Feel free to post in the comments.)

Published by kizzx2 on February 3rd, 2012 at 5:15 pm. Filed under: Uncategorized

Binding C++ classes to Lua: a step by step example for beginners

Well, I lied. This isn’t step by step, but I believe this commented source code dump should be more illustrative than any tutorial I’ve read. Here we go 🙂

Published by kizzx2 on January 11th, 2012 at 10:32 pm. Filed under: Uncategorized

Premake whopping rocks! (or how Boost Build whopping sucks, or how to build Luabind for iOS proper)

Last time I wrote about how to build Luabind on OS X. Well, it turned out that was only half the battle won. Building Luabind for the iOS (ARMv7) architecture just plain doesn’t work. It seems like some people have managed to make it work, but none of what I found actually worked for me. If you are reading this, I suppose you are also tired of jumping through hoops just to compile the damn thing. Well, Premake to the rescue.

Let me tell you, this is how building stuff in C++ is meant to work:

Just create a file premake4.lua:

-- premake4.lua
solution "luabind"
    configurations { "Release" }
    project "luabind"
        language "C++"
        kind "StaticLib"
        flags { "Optimize" }
        includedirs { ".", "/usr/local/include", "/where/is/your/lua" }
        files { "src/**" }


$ premake4 xcode3
$ xcodebuild -sdk iphonesimulator -configuration Release -arch i386
$ xcodebuild -sdk iphoneos -configuration Release -arch armv7

Boost Build whopping sucks. Premake is written by one guy, yet with it this turns out to be trivial (as it should be).

Hope it helps.

Published by kizzx2 on January 10th, 2012 at 11:25 pm. Filed under: Uncategorized

Building luabind on Mac OS X

Update: Apparently it may be working now. See Socapex’s comment below.

$ # If you have used the Homebrew recipes -- they don't work
$ brew uninstall lua
$ brew uninstall bjam
$ # The current version of LuaBind doesn't compile with Clang.
$ # Fortunately the GitHub HEAD does.
$ git clone
$ # LuaBind currently does not compile with Lua 5.2
$ wget $BOOST_URL
$ wget $LUA51_URL
$ wget $LUA_URL
$ cd /where/is/boost
$ chmod a+x tools/build/v2/
$ chmod a+x tools/build/v2/engine/
$ ./ toolset=darwin
$ cp bjam /usr/local/bin
$ export BOOST_ROOT=/where/is/your/boost
$ export BOOST_BUILD_PATH=/where/is/your/boost/tools/build/v2
$ cd /where/is/lua
$ make macosx
$ # We need these because LuaBind hard-codes the non-standard directory names
$ ln -s src include
$ export LUA_PATH=/where/is/lua
$ cd /where/is/luabind
$ bjam toolset=darwin link=static

Published by kizzx2 on January 9th, 2012 at 11:15 pm. Filed under: Uncategorized

BOOTMGR is missing — totally demystified

A couple of years back I blogged about how I solved the frustrating and notorious “BOOTMGR is missing” error in Windows. Today I encountered it again, and this time it seemed more tenacious, but I still tamed it after numerous trials. Here I document exactly what went wrong, why it happened, and how to solve it.

Background information — how does Windows boot?

Here is a quick rundown. Every step below could possibly go wrong:

  1. The BIOS loads the MBR of the boot disk and hands over control to the code residing there. MS’s MBR code then loads the boot sector of the active partition. (If something is wrong here, we need bootsect /nt60 X: /mbr.)

  2. The boot sector looks for \Boot\BCD and loads boot information from there. The BCD records where BOOTMGR resides. If this step goes wrong, you will see the infamous “BOOTMGR is missing”. To fix this, we need to correct the BCD with bcdedit (see below).

  3. Finally, BOOTMGR loads winload.exe. If device or osdevice is set to the wrong partition, Windows may report that winload.exe is corrupt. Normally you fix this with bcdedit. In the rare event that winload.exe really is corrupt, you may need to do a Windows Repair.

Why did it happen?

The base case for this to happen is a multi-hard disk computer. Variations may include changing hard disks after installation, but the principle is the same.

Suppose we have a computer with 2 hard disks:

  1. HDD1 (SATA 1/IDE 1)
  2. HDD2 (SATA 2/IDE 2)

That is, HDD1 comes before HDD2 in BIOS.

Since Windows cannot determine your boot order from the BIOS, it will always configure the MBR on HDD1, configure the boot sector of the first partition of HDD1, and put Boot\BCD in the first partition of HDD1. It will always be HDD1, no matter which disk you decide to install Windows on.

Suppose you have the following boot order in BIOS:

  1. HDD2
  2. HDD1

And suppose you have installed Windows so that:

  • HDD1 — bootloader code
  • HDD2 — actual Windows installation

Then booting will fail and you will see the notorious “BOOTMGR is missing” error. Another reason you may see this is if you remove HDD1 at some point, thinking that your Windows installation is nicely contained in HDD2.

Why is Windows Repair not enough?

If you use Windows Repair, it will install the boot code on HDD1 and you are forced to adjust the BIOS to boot HDD2 through HDD1. For me, it was a purist’s wish to move everything to HDD2.

So here is exactly how I tackled it.

How to fix it

The following steps are assumed to be done in the Windows Recovery environment command line. Boot up your Windows CD and press Shift+F10 to call up the command prompt. We will assume:

  • E: to be where Windows is installed.
  • F: to be where BCD is set up.
  • X: to be the Windows installation CD

1) First, make sure you set the Windows partition as active. You can do this using various partitioning tools, or just Google how to do it. Here is an example session using diskpart. You need to replace 99 with the correct numbers:

    > diskpart
    > list disk
    > select disk 99
    > list partition
    > select partition 99
    > active
    > exit

2) Now we gotta make sure the MBR of HDD2 is good to go:

    > cd /d X:\Boot
    > bootsect /nt60 E: /mbr

3) Now migrate BOOTMGR and Boot from F: to E:

    > cd /d F:\
    > attrib -r -s -h bootmgr
    > move bootmgr E:\
    > move Boot E:\
    > cd /d E:\
    > attrib +r +s +h bootmgr

4) Now use bcdedit to fix up the BCD

    > cd /d E:\Boot
    > bcdedit /store BCD /set {bootmgr} device partition=E:
    > bcdedit /store BCD /set {default} device partition=E:
    > bcdedit /store BCD /set {default} osdevice partition=E:

5) That’s it!

How to fix it from scratch (what I actually did)

If in the process you think you have screwed up so bad that you want to start over, you don’t have to reinstall Windows:

  • It is safe to wipe out both BOOTMGR and the Boot directory. You can find BOOTMGR on the Windows install CD.

  • To recreate the Boot directory, run bootrec /rebuildbcd. Windows will insist on creating it in the first partition of HDD1. You can apply the above techniques to patch up the newly created BCD and then move it to E:. Alternatively, you can create the BCD from real scratch by using the UUIDs from bcdedit /enum all.

  • You should be able to get Windows booted with just the two files E:\BOOTMGR and E:\Boot\BCD. If that works, you will find that Windows boots with the Vista-style green bar. To fix it:

    > bcdedit /store E:\Boot\BCD /set {bootmgr} locale en-US
    > xcopy /e /h X:\Boot\en-us\ E:\Boot\en-us\

That should be it. Enjoy!

Published by kizzx2 on December 27th, 2011 at 10:23 pm. Filed under: Uncategorized

Irrlicht vs. Ogre for mobile game development

I have been evaluating several 3D frameworks for cross-platform mobile game development lately. Here are my findings. To stay objective, I am just going to state a couple of hard facts as conclusions:

  • Irrlicht is smaller than Ogre. I did an actual experiment using both. I compiled both into static libraries (required for an iPhone deployment). A “Hello World” app with just one line of irr::createDevice (to get stuff statically linked) results in a 4MB executable; the same app with one line of new Ogre::Root results in a 6.5MB executable. The Ogre one is already heavily optimized by disabling FreeImage and some other components. (For the record, Cocos3D’s “Hello World” demo app is 2.5MB, but it is not cross-platform at the moment.)

  • The Ogre-iPhone “branch” is official; the Irrlicht OGL-ES branch is not official (yet).

  • Ogre-iPhone can do shader programming (OpenGL ES 2.0). There are OpenGL ES 2.0 files in the Irrlicht OGL-ES branch but I have not played with it and don’t know how mature it is.

That’s it. The size point might not be that relevant for desktop development, but it definitely is for mobile games. I actually prefer Ogre’s “more complicated” “syntax” (a much-touted disadvantage in Irrlicht’s community), but for this one I think I’m going with Irrlicht.

Published by kizzx2 on November 20th, 2011 at 12:55 pm. Filed under: Uncategorized

Sencha Touch vs jQuery Mobile — a first look

Most people compare Sencha Touch to jQuery Mobile. My general impression from reading the Web was that Sencha is the more heavy-duty and faster of the two. My actual experience, however, reveals that they are vastly different.

I recently started using PhoneGap to do mobile development. The idea of using Web technology to write cross platform apps is just so great!

The natural thing to do with PhoneGap would be to use a framework. There are many, just to name a few that I tried:

  • xui: Lightweight and fast, just as you’d expect — jQuery on a diet. I am not sure I am unlocking its real potential, since examples/demos are non-existent at the time of writing. But it gets the job done.
  • Sencha Touch: Claims to be “the best HTML5 mobile Web app framework”. The demos are polished and attractive.
  • jQuery Mobile: Good old familiar jQuery. The demos feel a little sluggish but overall still look good. It has fewer “widgets” than Sencha.

So let me get to the point of this post: I tried Sencha first and then jQuery. I will grant that Sencha has slightly better performance, but its approach really disappointed me.

Let me quote a snippet of Sencha Touch code from the PhoneGap tutorial on its Web site. This is how a typical Sencha Touch “class” looks:

app.views.ContactDetail = Ext.extend(Ext.Panel, {
  dockedItems: [{
    xtype: 'toolbar',
    title: 'View contact',
    items: [{
      text: 'Back',
      ui: 'back',
      listeners: {
        'tap': function () {
          //  controller: app.controllers.contacts,
          //  action: 'index',
          //  animation: {type:'slide', direction:'right'}
        }
      }
    }, {
      id: 'edit',
      text: 'Edit',
      ui: 'action',
      listeners: {
        'tap': function () {
          //  controller: app.controllers.contacts,
          //  action: 'edit',
          //  id: this.record.getId()
        }
      }
    }]
  }],
  scroll: 'vertical',
  items: []
});

Whew, that’s quite a lot of stuff. Let me compare that with a selected part of a jQuery Mobile tutorial:

$('#buttonOK').click(function() {
  return false;
});

$('#form1').submit(function() {
  var err = false;
  // Hide the Main content
  // Reset the previously highlighted form elements
  // Perform form validation
  if ($(this).val() == null || $(this).val() == EMPTY) {
    err = true;
  }
  // (similar checks on the other fields also set err = true)
  // If validation fails, show Dialog content
  if (err == true) {
    return false;
  }
  // If validation passes, show Transition content
  // Submit the form
  $.post("/forms/requestProcessor.php", form1Var.serialize(), function(data) {
  });
  return false;
});

This looks so familiar. It’s just our good old jQuery. And the page was formatted with standard, vanilla HTML5.

The difference should now be apparent — jQuery Mobile really is “moving the Web development experience to mobile platforms”, whereas Sencha is “moving ExtJS to the mobile platform”.

If you look at the code, you’ll find that the typical Sencha Touch application bears no resemblance to what a Web app looks like — it feels like writing Java view classes. Unless you are already an ExtJS expert, it means learning a whole different ecosystem. Maybe it’s just me, but I find it way more difficult to adjust the layout this way than with HTML or the XIB GUI editor. Writing layouts in “plain text” feels like something from the 90’s, seriously. Every other GUI toolkit I know of has a GUI editor, and plain HTML has the advantage of being ubiquitous. I can already imagine tearing through the manual while writing GUIs in Sencha Touch — not particularly attractive.

Using jQuery Mobile, links are automatically “hijaxed” into AJAX-y links without you having to define any ‘tap’ actions whatsoever. The other good thing is that it works in a desktop browser by default, whereas Sencha Touch’s tutorial app just gives me a blank page when viewed from Firefox.

My general feeling is that if I am going to use yet another framework, I may as well go native or use Titanium, which gives me native performance. For the moment I am going to stick with jQuery Mobile.

Published by kizzx2 on October 28th, 2011 at 12:19 pm. Filed under: Uncategorized

MiKTeX + XeTeX “\char_make_active:n {“20}%” problem and the solution!

You need to update all your packages. The latest MiKTeX package manager seems rather broken. If you use Task > Update Wizard, you’ll probably be met with “The update wizard could not be found”.

The solution is to open the Update program directly (type “Update” in the Start Menu or find miktex-update.exe). I used both the normal variant and the “(Admin)” variant. A couple of updates later, it finally worked!

Published by kizzx2 on October 12th, 2011 at 9:28 pm. Filed under: Uncategorized

Illustrated Haskell

I launched a new blog, Illustrated Haskell, where I experiment with a new approach to explaining intermediate Haskell concepts, inspired by Khan Academy‘s style. I am not doing videos (yet?) but the style is intended to be narrative and easy to follow.

Check it out 🙂

Published by kizzx2 on September 24th, 2011 at 6:13 pm. Filed under: Uncategorized

In Search of Performance in Haskell

I was doing ITA Software’s “Word Numbers” problem. There are various solutions on the Web; reviewing them tells us it’s a graph/grammar problem. Today we’re going to solve it by brute force.

Now let me give a quick recap of the problem:

A "word number" is defined as

wordNumber 1 = "one"
wordNumber 2 = "onetwo"
wordNumber 3 = "onetwothree"
...
wordNumber 12 = "onetwothreefourfivesixseveneightnineteneleventwelve"
...

Find the 51-billionth letter of wordNumber Infinity. Assuming that letter is found at wordNumber x, also find the sum of 1 to x.

(As someone noted in the StackOverflow comments, I actually misinterpreted the original problem. But that doesn’t affect our purpose here.)
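To make the problem concrete, here is a minimal brute-force sketch in Haskell. It is my own illustration, not the program discussed below, but the names wordLength and solve deliberately mirror the ones that appear later in this post:

```haskell
{-# LANGUAGE BangPatterns #-}

-- English word fragments; "" for zero since it contributes no letters.
ones, teens, tens :: [String]
ones  = ["", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
teens = [ "ten", "eleven", "twelve", "thirteen", "fourteen"
        , "fifteen", "sixteen", "seventeen", "eighteen", "nineteen" ]
tens  = [ "", "ten", "twenty", "thirty", "forty"
        , "fifty", "sixty", "seventy", "eighty", "ninety" ]

-- Length of the English word for n (no spaces, no "and"),
-- e.g. wordLength 12 == length "twelve" == 6.
wordLength :: Int -> Int
wordLength n
  | n < 10      = length (ones !! n)
  | n < 20      = length (teens !! (n - 10))
  | n < 100     = length (tens !! (n `quot` 10)) + wordLength (n `rem` 10)
  | n < 1000    = wordLength (n `quot` 100) + 7 + wordLength (n `rem` 100)         -- "hundred"
  | n < 1000000 = wordLength (n `quot` 1000) + 8 + wordLength (n `rem` 1000)       -- "thousand"
  | otherwise   = wordLength (n `quot` 1000000) + 7 + wordLength (n `rem` 1000000) -- "million"

-- Walk i = 1, 2, 3, ... with two counters until the accumulated word
-- length reaches the target; returns (sum of 1..x, length so far, x).
solve :: Int -> (Int, Int, Int)
solve target = go 0 0 1
  where
    go !sumNum !sumLen i
      | sumLen' >= target = (sumNum + i, sumLen', i)
      | otherwise         = go (sumNum + i) sumLen' (i + 1)
      where
        sumLen' = sumLen + wordLength i
```

With the real target of 51000000000 this naive walk is exactly the kind of slow baseline that the rest of this post speeds up.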

A naive algorithm in C++

The idea seems simple: two counters, one for the sum of the numbers and one for the length. We go from wordNumber 1 until the length reaches 51000000000, written as a straightforward loop in C++.

OK, nothing surprising. It takes 45 seconds on my machine to execute (g++ -O3). We will use this as a baseline.

First attempt in Haskell

OK, I lied. It was not my first attempt. My initial version had space leaks which led to the program eating up 1GB RAM in several seconds. Using profiling techniques from Real World Haskell, I identified the space leaks and sprinkled Bang patterns everywhere.

So how well does that run?

Well, I don’t know.

As far as I remember, one version finished in 12 minutes, but I am not sure it is the version posted here. In short, the performance was not bearable.

Now if we were using Perl or Ruby, that might not have been too surprising. But for Haskell, it seemed something obvious was missing.

The solution

So there I went, I asked for help on StackOverflow and in Haskell-cafe.

There I got help from Bryan O’Sullivan, who posted his version. Reiner Pope said that with -fllvm, it finishes in about 1.5 minutes. (OK, he did not exactly say that. He said the Haskell version is half as fast as C++.)

But it did not work for me

So I downloaded Bryan’s code (we’ll call it RealWordNumber.hs from now on; sorry, bad pun 🙂) and off I went in excitement:

$ ghc -O2 RealWordNumber.hs
$ ./RealWordNumber
Sum: 1
Segmentation fault/access violation in generated code

Since RealWordNumber.hs uses Int, there was an integer overflow in his code. That’s also exactly the reason why I had to use Int64 in my original version.

(I was using Windows, where GHC is 32-bit only.)

So I patched his code to use Int64 and off I went. It never finished; I had to kill it.


The culprit — 64-bit integers are slow in 32-bit programs

OK, the major reason turned out to be this. To confirm, I compiled the C++ version to 32-bit using VC++ (cl /Ox WordNumber.cpp). It took 4 minutes to complete.

So I took RealWordNumber.hs to Arch Linux x64. It turned out that Int in GHC x64 is actually Int64, and the program completed in 5 minutes (ghc -O2).

Since the LLVM backend for GHC 7.0.3 seemed broken at the time of writing and I did not want to go fixing it immediately, I trusted that it would really drop to 1.5 minutes with -fllvm.

Now the fun part — breaking down performance improvements

“So,” I figured, “I just need to take my original version and put it on Arch x64.” So I tried that, and it was still taking longer than my patience allowed (8+ minutes). The following explains how I got my version into the same league as RealWordNumber.hs, step by step.

First of all, I tuned the number down to 510 million (510000000) to avoid having to wait all day doing this.

Here is a baseline using 510 million as the input:

RealWordNumber -- 1.94 seconds
K2WordNumber   -- 5.41 seconds

From here we see that if I had waited about 15 minutes, I could have seen my original version finish 😛

Now on to how I got it up to speed:

1. Change divMod to quotRem

I think I heard it on #haskell: people say that quotRem is faster than divMod. So I simply changed all divMods to quotRems. Look at what we’ve got:

K2WordNumber-1 -- 4.48 seconds

OK, that was true. A surprisingly huge speed up.
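For the curious, the two differ only in rounding direction: div/mod round toward negative infinity, while quot/rem truncate toward zero, which is what the machine division instruction does, hence the speed difference. Since every number in this program is non-negative, the two agree and the swap is safe. A small illustration of my own:

```haskell
-- On non-negative operands the two pairs agree:
positiveCase :: ((Int, Int), (Int, Int))
positiveCase = (17 `divMod` 5, 17 `quotRem` 5)        -- both are (3, 2)

-- On negative operands they differ: divMod rounds toward negative
-- infinity, quotRem truncates toward zero.
negativeCase :: ((Int, Int), (Int, Int))
negativeCase = ((-17) `divMod` 5, (-17) `quotRem` 5)  -- ((-4, 3), (-3, -2))
```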

2. Change Int64 to Int

Since GHC x64’s Int is actually 64-bit, I changed all Int64 to Int and removed the unnecessary fromIntegral calls.

K2WordNumber-2 -- 3.20 seconds

Two simple changes, and we are already getting there 😛

3. Change quotRem to separate quot and rem

quotRem returns a tuple of boxed Ints, which may need to be constructed and deconstructed on every call. Max Bolingbroke suggested that using quot and rem separately might help. Here’s the experimental result:

K2WordNumber-3 -- 3.13 seconds

It might have been an improvement, but it might also have just been experimental error. Unfortunately, it seems not much speed up was achieved.

4. Removed redundant parameters

It turned out that in my original version, several redundant parameters were being passed recursively in the function solve (the acc label, and curr). I used ghc -Wall to track them down and removed them.

K2WordNumber-4 -- 3.16 seconds

No change. Since GHC figured out enough to warn me, it probably did the right thing and optimized them for me anyway.

5. Simplified parameter passing

The original solve function looked like this:

solve :: Int64 -> (Int64, Int64, Int64) -> [Int64] -> (Int64, Int64, Int64)
solve !n !acc@(!sumNum, !sumLen, !curr) (!num:nums)
    | sumLen' >= n = (sumNum', sumLen, num)
    | otherwise = solve n (sumNum', sumLen', num) nums
  where
    sumNum' = sumNum + num
    sumLen' = sumLen + wordLength num

To quote Bryan:

You’re passing parameters in three different ways: a regular Int, a 3-tuple, and a list! Whoa.

He is right. I also noticed that the n parameter never changes and does not need to be passed along. Johan Tibell blogged that GHC could in theory create this closure for us, but I did it manually to see what happens:

solve :: Int -> (Int, Int, Int)
solve n = go 0 0 1
  where
    go !sumNum sumLen i
        | sumLen' >= n = (sumNum', sumLen, i)
        | otherwise = go sumNum' sumLen' (i+1)
      where
        sumNum' = sumNum + i
        sumLen' = sumLen + wordLength i

That resulted in:

K2WordNumber-5 -- 2.87 seconds


6. Using unboxed integer types

I replaced quot and rem with these hand-unboxed versions (they need the MagicHash extension, with I#, quotInt#, and remInt# imported from GHC.Exts):

    (I# a) // (I# b) = I# (a `quotInt#` b)
    (I# a) % (I# b) = I# (a `remInt#` b)

K2WordNumber-6 -- 2.77 seconds

Not as much as I had thought.

7. Using Data.Vector.Unboxed instead of Data.Array.Unboxed

K2WordNumber-7 -- 2.50 seconds

8. Using Data.Vector.Generic.unsafeIndex instead of the (!) operator

K2WordNumber-8 -- 2.20 seconds

Wow, this one was huge.

9. Move the length vectors inside wordLength’s closure

wordLength :: Int -> Int
wordLength i = go 0 i
  where
    go !pad !n
        | n < 10         = lenOnes `VG.unsafeIndex` n + pad
        | n < 20         = lenTeens `VG.unsafeIndex` (n-10) + pad
        | n < 100        = go (lenTens `VG.unsafeIndex` (n//10) + pad) (n%10)
        | n < 1000       = go (go (7+pad) (n//100)) (n%100)
        | n < 1000000    = go (go (8+pad) (n//1000)) (n%1000)
        | otherwise      = go (go (7+pad) (n//1000000)) (n%1000000)
    (I# a) // (I# b) = I# (a `quotInt#` b)
    (I# a) % (I# b) = I# (a `remInt#` b)
    lenOnes = VU.fromList [0,3,3,5,4,4,3,5,5,4] -- "", "one", "two", ...
    lenTens = VU.fromList [0,3,6,6,5,5,5,7,6,6]
    lenTeens = VU.fromList [3,6,6,8,8,7,7,9,8,8] -- first element is "ten" (3)

K2WordNumber-9a -- 2.20 seconds

No improvement at all… But if I add bangs before the vectors…

!lenOnes = VU.fromList [0,3,3,5,4,4,3,5,5,4] -- "", "one","two", ...
!lenTens = VU.fromList [0,3,6,6,5,5,5,7,6,6]
!lenTeens = VU.fromList [3,6,6,8,8,7,7,9,8,8] -- first element is "ten" 3

K2WordNumber-9 -- 2.00 seconds

WOW!! That was a large speed up. It turns out the reason is related to strictness: wordLength is not strict in lenOnes, lenTens, and lenTeens, so they might be garbage collected (?). Putting bangs on them allows them to stay alive. Since we index those arrays very frequently, it is better for them to be around all the time.

GHC does not allow putting bangs on “global” (top-level) variables, so I missed these in my original bang-sprinkling exercise.

OK, so now we have transformed my original sloppy Haskell program into a “well-performing” Haskell program. Since it performs the same as Bryan O’Sullivan’s (casually) tuned version, I am going to assume it’s probably about as fast as it should be (short of resorting to serious hacks).

And here is the final version of the code:

Moral of the story

  • Haskell is a very fast functional language. But for tight loops like this, always consider giving plain old C a try — it may just save you days of profiling.
  • 64-bit operations in 32-bit programs are slow (*)
  • quotRem is faster than divMod
  • Earlier I argued in #haskell that “if Integer vs Int matters for your performance, your program is probably written wrong”. Well, there are cases where it matters quite a lot. It turns out conversions are quite costly. (Though I still believe people should use Integer by default)
  • Use the wrapper-worker pattern liberally.
  • If you recurse, make sure every variable actually changes in each recursion. (For most cases this will benefit. If you only recurse several times maybe the closure’s overhead will be larger.)
  • Don’t use a list or tuple just because infinite lists look smart.
  • Data.Vector is a better “general case” container than Data.Array
  • Try to reduce the scope of “global” variables. Put them in a closure so you can bang on them.

  (*) This may actually be the suckage of Windows’ “WOW64” (32-bit emulation layer). I have not tried to prove this on 32-bit Linux GHC.

Extra: the fastest indexing data structure — functions!?

I heard on #haskell that “pattern matching in Haskell is less than O(n)”. When I heard it, I didn’t really believe it. I had always thought that N pattern matches translate into N if-then-else statements in the general case. The person said something about Church encoding, but I did not see how it could be applied in the general case to avoid N if-then-else statements.

Just for fun, I tried changing the length vectors into functions:

lenOnes 0 = 0
lenOnes 1 = 3
lenOnes 2 = 3
lenOnes 3 = 5
-- ...
lenTeens 9 = 8

K2WordNumber-10 -- 2.00 seconds

Yes, pattern matching is as fast as unsafeIndex. WOW! wren nt thornton gives a more thorough explanation of this in the comments below. It turns out GHC is really darn smart and can figure out fast code given “primitive” definitions.
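To play with this yourself, here is a self-contained sketch of mine using the same lengths as the tables above. A dense run of small-Int patterns is compiled by GHC into a single case expression, which the backend can then turn into a table jump rather than a linear chain of comparisons (you can inspect the generated Core with -ddump-simpl):

```haskell
-- Lengths of "", "one", "two", ..., "nine", written as dense
-- pattern matches on a small Int instead of a vector lookup.
lenOnes :: Int -> Int
lenOnes 0 = 0  -- ""
lenOnes 1 = 3  -- "one"
lenOnes 2 = 3  -- "two"
lenOnes 3 = 5  -- "three"
lenOnes 4 = 4  -- "four"
lenOnes 5 = 4  -- "five"
lenOnes 6 = 3  -- "six"
lenOnes 7 = 5  -- "seven"
lenOnes 8 = 5  -- "eight"
lenOnes 9 = 4  -- "nine"
lenOnes _ = error "lenOnes: out of range"
```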

Published by kizzx2 on August 10th, 2011 at 12:21 pm. Filed under: Uncategorized