Variable size integer encoding

It is a known fact of nature that most integers are small. When transferring data over the network, or to/from disk, or pretty much anywhere the IO is slow, keeping data compact is a plus. Obviously, compression is an option, but it is fairly CPU heavy. Taking advantage of the fact that, in practice, most integer values are small is an option to effectively reduce the amount of data for a modest cost in term of CPU.

Continue reading

Using Merklix tree to shard block validation

When scaling a system, there always comes a time when all of the processing cannot be done on one machine. This problem is solved by spreading the workload across several machines. This presents some challenges however, solutions are well known in the industry.

In a trustless system such as Bitcoin, peers generally do not trust each other. Currently this is solved by having all peers doing all of the validation, which obviously doesn’t scale, at all. If we are to shard the blockchain, how can a node trust that other shards are validating properly ?

Continue reading

On code review

Code review has become a standard programming practice, for open source projects, but also for many companies that practice it in house. While this practice has proven to be very valuable, many developers struggle with it. So here are some tips and tricks that will help you work more efficiently when code review is involved.

There are (at least) 2 persons involved in code review. Someone that I’ll call a contributor, who want a piece of code to be merged, and a reviewer, who try to confirm that this piece of code meets quality standards and won’t introduce too much complexity compared to the value it provides. Making the whole exercise efficient requires each party to understand and empathize with the other and act accordingly.

Continue reading

Why Can’t Senior Programmers… Program?

This post is obviously a reference to the notorious Why Can’t Programmers.. Program?. I was earlier today reading an article about unreasonable assignment for junior programmer interview. In that article the author describe how some of one of his junior friend had to program a linked list in java that is compliant with the List interface during a job interview.

The whole assignment needed to be done in 30 minutes, and required to implement at least 2 classes and 37 public methods. It seems pretty unrealistic indeed to expect this from any programmer, especially from a junior programmer.

But that isn’t what is the most surprising about this article. The author then goes to mention he tried to do the assignment pair programming with another senior developer and they couldn’t get it nailed down in 6 hours. In fact, they didn’t even started the iterator part of the problem. Damn, 2 senior developers can’t do it that much time ? What can be so hard about it ?

And so I tried

I had to see by myself what was so hard about this, so I setup myself to do the assignment and see how it goes. It was understood that I had tight time constraint, so I had to blast through the spec getting all the methods do what they are supposed to with less than a minute each to get it by the time limit.

For reference, we are implementing the List interface and the ListIterrator interface.

Hopefully, most methods are actually quite trivial, and, without constraint on the complexity, I could do heavy code reuse. For instance, the addAll method will repeatedly call the add method, which is O((N + M) * M) while doing something in O(N + M) wouldn’t have been too hard. The documentation is also nice enough to provide you with all you need, even sometime sample code for complex condition.

Needless to say, I couldn’t get it done in 30 minutes. I needed 35. I would not recommend to use the result of these 35 minutes of intense coding in production. The complexity is high, allocations too frequent but overall, it works™. If I were to use that code, which would be stupid to boot, because Java already provide a linked list implementation in its standard library, but if I were, I would surely improve the test suite, fix the algorithms that exhibit high complexity and make the code more readable.

To my shame, I skipped retainAll, by choosing to throw an UnsupportedOperationException, which is valid per spec.

What to do during the next 5 hours and 25 minutes ?

As I expected, the assignment was way too complex for the time given, but I’m fairly confident that I can achieve some quality implementation in less than 6h by now. To boot, improving complexity, then, getting the gava test suite make sure all of it passes.

Which lead to the question: How come two senior developers can’t get it done in 6 hours ? The industry I work for is full of mysteries, and this is one of them. While I agree with the author sentiment on whiteboard interview to some extent, I may have some insight on the reasons why he never passes them. Sorry buddy.

Truth be told, pair programming, test driven development, denigrating recruiting practices and ruby on rail sounds very good on a blog, but all of it is no more than playing buzzword bingo if one can’t get shit done.

A good interview question ?

Asking to implement a List is a bad interview question. Its too long, and won’t give much signal, and don’t have a reachable first goal within the 30 minutes.

Ideally you want to have a question with an relatively easy answer, but on which one can improve. For instance, a problem with a relatively easy solution of high complexity. Various heuristics can be added and/or discussed to make it perform better.

The interviewer can get signal pretty quickly because there is an easy answer and discussion can follow on how to improve. The problem is open ended, which allow good candidates to really show how far they can go. It also allow for discussion between the interviewer and interviewee, the kind of which help to asses how one works.

Implementing List is tedious, with many corner cases, certainly not doable in 30 minutes and does not allow for much discussion between the interviewer and interviewee.

Don’t do that. Unless you want your candidate to suffer.

Especially for a ruby on rail position…

Type-safe tagged unions in the D programming language

A tagged union

Sometime you have some data that can be of type A or B. Very common use cases include decoding various file format or network protocols, communicate with memory mapped devices, some processing that can return an A or a B depending on its result, and many more.

To resolve that issue, it is common to use a union. A union is a aggregate where all members share the same memory. It is very handy, but use it wrongly and you end up badly messing up your memory. I’ll demonstrate in this article how to make this safe in D.

Continue reading

A story about optimization, LLVM, and the SentinelInputRange

Developers want code that runs fast and users do too. However, it is easy to fall into the trap of early optimization; this story shows that even in non-trivial cases, modern compilers can do wonders. On the other hand, the myth of the “sufficiently smart compiler” is still very relevant. The best way is to look at what is actually generated for an example and that is what I’ll do in this article. Enter the SentinelInputRange.

Continue reading

Visitor pattern revisited in D

OOP is known (or should be known) as a behavioral abstraction. It nicely solves a vast variety of problems, but enforces on the programmer an abstraction that doesn’t fit every problem. The fact is, you sometime need objects as data abstractions and not behavioral abstractions.

This is commonly the case for tools that manipulate code. A parser typically returns an AST that will be processed. The processing can be compilation, code formating, code analysis, and basically any other use related to source code manipulation.

Following regular OOP principles, manipulations have to be implemented by the AST node classes. This is a problem, because it prevents any 3rd party developer tool to reuse an existing parser and AST. The known solution is to use the Visitor pattern, which allows to dispatch in our code according to the AST node’s type.

The Visitor pattern has many known problems, to the point that many consider it an antipattern. A better solution using D has to be proposed.

Continue reading

Get an exception from a segfault on linux (x86 and x86_64), using some black magic !

Linux (as other UNIXes) allow you to register handlers for signals. Here we are interested in SIGSEGV. This signal is sent to your program when you try to use a memory location you shouldn’t. Typically, when deferencing null.

Language like Java send a NullPointerException, that can be caught and you can recover from it. However, in a system language, you usually get a cryptic « segmentation fault », you cannot recover from it and cannot have any information about it outside a debugger. Let’s see how we can fix this.

As C++ and D are system languages that support exceptions, we will use this mechanism to handle SIGSEGV. I’ll do it in D in this post, but the same is doable in C++. If you understand why it work, it shouldn’t be a problem.

Continue reading

How duplication insidiously invade our code

Duplication is probably the #1 enemy of clean code. Its presence make the code harder to maintain, harder to evolve, more bug prone, harder to test and probably eat babies every morning. If almost any developer will agree on that fact, me can also notice that almost every codebase is crippled with repetitions in the code. How could we explain that ?

Continue reading