Variable size integer encoding

It is a known fact of nature that most integers are small. When transferring data over the network, or to/from disk, or pretty much anywhere the IO is slow, keeping data compact is a plus. Obviously, compression is an option, but it is fairly CPU heavy. Taking advantage of the fact that, in practice, most integer values are small is an option to effectively reduce the amount of data for a modest cost in term of CPU.

Continue reading

Using Merklix tree to shard block validation

When scaling a system, there always comes a time when all of the processing cannot be done on one machine. This problem is solved by spreading the workload across several machines. This presents some challenges however, solutions are well known in the industry.

In a trustless system such as Bitcoin, peers generally do not trust each other. Currently this is solved by having all peers doing all of the validation, which obviously doesn’t scale, at all. If we are to shard the blockchain, how can a node trust that other shards are validating properly ?

Continue reading

Efficient mempool synchronization using bloom filters and entropy coding

One of the major challenge to scale bitcoin is that a lot of the work done by nodes is reactive. Nodes receive a block and start validating. This takes time and resources and is right in the hot path. The problem gets worse the bigger the block are, to the point were consensus takes more and more time to be reached. At some point, consensus cannot be reached.

In order to improve scaling capabilities of the network, it is key that nodes do as much work as possible ahead of time. When a block is received, node just have to assert that the node effectively contains what they expected, which speeds up validation significantly.

Continue reading

Using Merklix tree to checkpoint an UTXO set

In a previous article, I introduced Merklix trees as a way to cryptographically describe an unordered set or map. It is possible to produce proof that an element is contained or absent of the datastructure in o(ln(n)), and use these proof to mutate the structure, even without knowing its full content.

I will now explain how this can be used to help handling the UTXO set in blockchain technologies like Bitcoin.

Continue reading

Introducing Merklix tree as an unordered Merkle tree on steroid

Merkle tree are awesome. They allow to hash an ordered set of elements and provide a way, known as a Merkle proof, to prove that a given element is at some position in the set without having to provide the whole set. While this is a very useful tool, the fact that it is ordered can be somewhat troublesome. Inserting or removing an element in the middle of the set will require significant work and, more generally, 2 sets containing the same elements, but in a different order will yield 2 radically different trees.

Today, I’d like to introduce a new tool to the software engineer toolbox, which, as far as I know – and that probably tells more about my searching-foo than anything else, hasn’t been proposed before. The goal is to hash an unordered set in such a way that:

  • The end result is the same, no matter in what way the tree is created
  • Insertion and deletion are ln(n)
  • It is possible to produce a proof that an element is contained in the set without producing the set

Continue reading

On code review

Code review has become a standard programming practice, for open source projects, but also for many companies that practice it in house. While this practice has proven to be very valuable, many developers struggle with it. So here are some tips and tricks that will help you work more efficiently when code review is involved.

There are (at least) 2 persons involved in code review. Someone that I’ll call a contributor, who want a piece of code to be merged, and a reviewer, who try to confirm that this piece of code meets quality standards and won’t introduce too much complexity compared to the value it provides. Making the whole exercise efficient requires each party to understand and empathize with the other and act accordingly.

Continue reading

Why Can’t Senior Programmers… Program?

This post is obviously a reference to the notorious Why Can’t Programmers.. Program?. I was earlier today reading an article about unreasonable assignment for junior programmer interview. In that article the author describe how some of one of his junior friend had to program a linked list in java that is compliant with the List interface during a job interview.

The whole assignment needed to be done in 30 minutes, and required to implement at least 2 classes and 37 public methods. It seems pretty unrealistic indeed to expect this from any programmer, especially from a junior programmer.

But that isn’t what is the most surprising about this article. The author then goes to mention he tried to do the assignment pair programming with another senior developer and they couldn’t get it nailed down in 6 hours. In fact, they didn’t even started the iterator part of the problem. Damn, 2 senior developers can’t do it that much time ? What can be so hard about it ?

And so I tried

I had to see by myself what was so hard about this, so I setup myself to do the assignment and see how it goes. It was understood that I had tight time constraint, so I had to blast through the spec getting all the methods do what they are supposed to with less than a minute each to get it by the time limit.

For reference, we are implementing the List interface and the ListIterrator interface.

Hopefully, most methods are actually quite trivial, and, without constraint on the complexity, I could do heavy code reuse. For instance, the addAll method will repeatedly call the add method, which is O((N + M) * M) while doing something in O(N + M) wouldn’t have been too hard. The documentation is also nice enough to provide you with all you need, even sometime sample code for complex condition.

Needless to say, I couldn’t get it done in 30 minutes. I needed 35. I would not recommend to use the result of these 35 minutes of intense coding in production. The complexity is high, allocations too frequent but overall, it works™. If I were to use that code, which would be stupid to boot, because Java already provide a linked list implementation in its standard library, but if I were, I would surely improve the test suite, fix the algorithms that exhibit high complexity and make the code more readable.

To my shame, I skipped retainAll, by choosing to throw an UnsupportedOperationException, which is valid per spec.

What to do during the next 5 hours and 25 minutes ?

As I expected, the assignment was way too complex for the time given, but I’m fairly confident that I can achieve some quality implementation in less than 6h by now. To boot, improving complexity, then, getting the gava test suite make sure all of it passes.

Which lead to the question: How come two senior developers can’t get it done in 6 hours ? The industry I work for is full of mysteries, and this is one of them. While I agree with the author sentiment on whiteboard interview to some extent, I may have some insight on the reasons why he never passes them. Sorry buddy.

Truth be told, pair programming, test driven development, denigrating recruiting practices and ruby on rail sounds very good on a blog, but all of it is no more than playing buzzword bingo if one can’t get shit done.

A good interview question ?

Asking to implement a List is a bad interview question. Its too long, and won’t give much signal, and don’t have a reachable first goal within the 30 minutes.

Ideally you want to have a question with an relatively easy answer, but on which one can improve. For instance, a problem with a relatively easy solution of high complexity. Various heuristics can be added and/or discussed to make it perform better.

The interviewer can get signal pretty quickly because there is an easy answer and discussion can follow on how to improve. The problem is open ended, which allow good candidates to really show how far they can go. It also allow for discussion between the interviewer and interviewee, the kind of which help to asses how one works.

Implementing List is tedious, with many corner cases, certainly not doable in 30 minutes and does not allow for much discussion between the interviewer and interviewee.

Don’t do that. Unless you want your candidate to suffer.

Especially for a ruby on rail position…

Save pixels in your Gnome Shell

I love Gnome Shell, but was bothered by the amount of pixel wasted in its interface. The title bar of window is quite large, and the activity bar is always present. A maximized window still is unable to use a significant amount of pixels in the top of the screen.

This is a major pain on small screen, but even on bigger screen, MOAR pixels for my app is better. After all, the role of my desktop is to allow me to manage my apps in a convenient way. It has to be as discrete as possible.

Using a set of extensions, I was able to achieve the merging of the activity bar and the title bar for maximized window, but recently one of these extensions stopped working and the configuration of the whole stuff was quite complex and unappealing. So I created Pixel Saver, an extension that save brave pixels !

Pixel Saver

The goal is simple : merge the title bar and the activity bar when a window is maximized, in the most natural possible way.

To do so, Pixel Saver do not require any configuration, it fetches the configuration of your title bar and adapt itself accordingly. It is easier for the user, and produce a nice result. Activate the extension and it works !

The title bar is completely gone and integrated to the activity bar.

The extension integrate title bar’s buttons in the activity bar, and change the title of the application menu to reflect the title of the maximized window. Really, nothing more need to be said.

That is Awesome ! I want to install it !

Follow the guide :

git clone
cd pixel-saver
# copy to extensions directory
cp -r ~/.local/share/gnome-shell/extensions
# activate
gnome-shell-extension-tool -e

Easy ! I hope to get it soon into gnome extension’s website.

A bug report ? A pull request ? Go on the Pixel Saver github’s page !

Happy Pixel Saving !

Type-safe tagged unions in the D programming language

A tagged union

Sometime you have some data that can be of type A or B. Very common use cases include decoding various file format or network protocols, communicate with memory mapped devices, some processing that can return an A or a B depending on its result, and many more.

To resolve that issue, it is common to use a union. A union is a aggregate where all members share the same memory. It is very handy, but use it wrongly and you end up badly messing up your memory. I’ll demonstrate in this article how to make this safe in D.

Continue reading