Get an exception from a segfault on Linux (x86 and x86_64), using some black magic!

Linux, like other UNIXes, allows you to register handlers for signals. Here we are interested in SIGSEGV. This signal is sent to your program when it tries to use a memory location it shouldn’t, typically when dereferencing null.

Languages like Java throw a NullPointerException, which can be caught and recovered from. In a system language, however, you usually get a cryptic “segmentation fault”: you cannot recover from it, and you cannot get any information about it outside a debugger. Let’s see how we can fix this.

As C++ and D are system languages that support exceptions, we will use this mechanism to handle SIGSEGV. I’ll do it in D in this post, but the same is doable in C++. If you understand why it works, porting it shouldn’t be a problem.

It is not that simple

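import core.sys.posix.signal;   // sigaction_t, sigaction, SIGSEGV, SA_SIGINFO

// Module constructor: runs once at program startup.
shared static this() {
    sigaction_t action;
    action.sa_sigaction = &handleSignal;    // three-argument handler
    action.sa_flags = SA_SIGINFO;           // required to use sa_sigaction
    sigaction(SIGSEGV, &action, null);
}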

With this simple sample code, we can register our handler, called handleSignal. But it isn’t as simple as that. When you get into handleSignal, you are not in a standard execution mode. Linux has stored the whole state of your application, then called your code, and will restore that state when you return. This makes it impossible to throw or to get a correct stack trace.
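
To see the registration mechanism in isolation, here is a minimal sketch that installs an SA_SIGINFO handler and triggers it with a harmless signal. It uses SIGALRM and raise instead of a real segfault, and the names onSignal, lastSignal and handlerRuns are made up for this illustration. It assumes the druntime POSIX bindings in core.sys.posix.signal.

```d
import core.stdc.signal : raise;
import core.sys.posix.signal;

// Written by the handler; __gshared makes it a plain C-style global.
__gshared int lastSignal = 0;

// Hypothetical handler: it only records which signal was delivered.
extern(C) void onSignal(int signum, siginfo_t* info, void* contextPtr) nothrow @nogc
{
    lastSignal = signum;
}

// Installs the handler for SIGALRM, raises the signal, reports whether it ran.
bool handlerRuns()
{
    sigaction_t action;
    sigemptyset(&action.sa_mask);
    action.sa_sigaction = &onSignal;
    action.sa_flags = SA_SIGINFO;
    if (sigaction(SIGALRM, &action, null) != 0)
        return false;

    raise(SIGALRM); // the handler runs before raise returns
    return lastSignal == SIGALRM;
}

void main()
{
    assert(handlerRuns());
}
```

The handler here does nothing but write to a global, which is about all that is safe to do from inside a signal handler.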

Let’s fool Linux into calling our code when returning from the handler!

Well, if we are not able to do whatever we want within the signal handler, then let’s modify the stored context, so that Linux restores something different and executes the code we want.

static REG_TYPE saved_EAX, saved_EDX;

extern(C)
void handleSignal(int signum, siginfo_t* info, void* contextPtr) {
    auto context = cast(ucontext_t*)contextPtr;
   
    // Save registers into global thread local, to allow recovery.
    saved_EAX = context.uc_mcontext.gregs[REG_EAX];
    saved_EDX = context.uc_mcontext.gregs[REG_EDX];
   
    // Hijack current context so we call our handler.
    auto eip = context.uc_mcontext.gregs[REG_EIP];
    auto addr = cast(REG_TYPE) info._sifields._sigfault.si_addr;
    context.uc_mcontext.gregs[REG_EAX] = addr;
    context.uc_mcontext.gregs[REG_EDX] = eip;
    // The register slot is an integer, so the function address must be cast.
    context.uc_mcontext.gregs[REG_EIP] = cast(REG_TYPE) &sigsegv_userspace_handler;
}

OK, so what is happening here? First of all, Linux passes us a structure of type ucontext_t, which contains system-dependent information about the context in which the segfault happened. Linux uses this context to restore the program state after our signal handler has run. So let’s modify it to call the code we want.

On x86, EIP is the register that stores the address of the instruction being executed. If we modify the value of this register, then the code located at the new address will be executed when Linux restores the program state. This is good, but not good enough.

We must also keep the old EIP value. If we don’t, we lose the information about where the segfault happened. So we first save the values of EAX and EDX into thread-local global variables, then put the address that caused the fault into EAX and the old EIP value into EDX. Now our handler has everything it needs to generate a stack trace and react according to the faulting address.
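
As a small illustration of the saved context, the following sketch only reads the instruction pointer from it, without modifying anything. It is a sketch under assumptions: Linux with glibc on x86 or x86_64, the REG_EIP / REG_RIP constants from core.sys.posix.ucontext, and a harmless SIGALRM instead of a real fault. The names REG_IP, recordIp, savedIp and interruptedIp are invented for the example.

```d
import core.stdc.signal : raise;
import core.sys.posix.signal;
import core.sys.posix.ucontext;

// Index of the instruction pointer in gregs (assumes Linux/glibc on x86 or x86_64).
version (X86_64)
    enum REG_IP = REG_RIP;
else version (X86)
    enum REG_IP = REG_EIP;
else
    static assert(0, "this sketch only covers x86 and x86_64");

__gshared size_t savedIp = 0;

extern(C) void recordIp(int signum, siginfo_t* info, void* contextPtr) nothrow @nogc
{
    auto context = cast(ucontext_t*) contextPtr;
    // This is where execution resumes once the handler returns.
    savedIp = cast(size_t) context.uc_mcontext.gregs[REG_IP];
}

// Raises SIGALRM and returns the instruction pointer found in the saved context.
size_t interruptedIp()
{
    sigaction_t action;
    sigemptyset(&action.sa_mask);
    action.sa_sigaction = &recordIp;
    action.sa_flags = SA_SIGINFO;
    sigaction(SIGALRM, &action, null);
    raise(SIGALRM);
    return savedIp;
}

void main()
{
    import core.stdc.stdio : printf;
    printf("execution resumes at %p\n", cast(void*) interruptedIp());
}
```

Writing a different value into the same slot is what redirects execution, which is the trick used by handleSignal.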

Our userspace handler cannot be a regular function, that would be too easy

Our userspace handler will not be called like a regular function. The program will start executing its instructions directly in the context of the faulting code, with every register as it was except EAX, EDX and EIP. We need to manipulate the stack to simulate a function call and save all this state before doing anything.

void sigsegv_userspace_handler() {
    asm {
        naked;
       
        push EDX;   // return address (original EIP).
        push EBP;   // old EBP
        mov EBP, ESP;
       
        pushf;      // Save flags.
        push ECX;   // ECX is a trash register and must be preserved as local variable.
       
        // Parameter address is already set as EAX.
        call sigsegv_userspace_process;
       
        // Restore register values and return.
        call restore_registers;
       
        pop ECX;
        popf;       // Restore flags.
       
        // Return
        pop EBP;
        ret;
    }
}

// The return value is stored in EAX and EDX, so this function restores the correct values for these registers.
REG_TYPE[2] restore_registers() {
    return [saved_EAX, saved_EDX];
}

The first 3 instructions are here to simulate a standard function call: the return address is pushed on the stack, then the base pointer (EBP) of the previous function, and finally the base pointer is modified to reflect the new state of the stack.

Then ECX and the flags are stored on the stack. On x86, EAX, ECX and EDX are trash registers. It means that a function isn’t required to preserve their content, so we need to save them before calling anything. Then we call sigsegv_userspace_process, a function that will turn the segfault into whatever we want.

In case our function does not throw, the assembly code after the call restores the state of the CPU and returns to the faulting instruction. This is useful in case we want to play with page protection, but I will not explain the details in this post.

Now we have a routine that makes the segfault appear just like a regular function call.

Wait a minute. And what if EIP is causing the segfault?

One case isn’t handled by our code: we can call or jump to memory that is page protected. What do we do then? First of all, unless you do it manually in assembly code, the jump case can’t occur, so let’s not handle it. After all, if you write assembly yourself, you are supposed to know what you are doing. But with function pointers, it is possible to call an invalid piece of memory.

void function() fun = null;

void main() {
    fun();
}

In this case, the CPU pushes the return address on the stack, but then nothing more happens, because it tries to execute whatever is at an illegal address in memory. Let’s handle this extra case in our sigsegv_userspace_handler.

void sigsegv_userspace_handler() {
    asm {
        naked;
       
        // Handle the stack for an invalid function call (segfault at EIP).
        push EBP;
        mov EBP, ESP;
       
        // We jump directly here if we are in a valid function call case.
        push EDX;   // return address (original EIP).
        push EBP;   // old EBP
        mov EBP, ESP;
       
        // Same code here, not repeated for brevity.
    }
}

We also need to modify our signal handler so that it jumps to the right address depending on the case.

static REG_TYPE saved_EAX, saved_EDX;

extern(C)
void handleSignal(int signum, siginfo_t* info, void* contextPtr) {
    auto context = cast(ucontext_t*)contextPtr;
   
    // Save registers into global thread local, to allow recovery.
    saved_EAX = context.uc_mcontext.gregs[REG_EAX];
    saved_EDX = context.uc_mcontext.gregs[REG_EDX];
   
    // Hijack current context so we call our handler.
    auto eip = context.uc_mcontext.gregs[REG_EIP];
    auto addr = cast(REG_TYPE) info._sifields._sigfault.si_addr;
    context.uc_mcontext.gregs[REG_EAX] = addr;
    context.uc_mcontext.gregs[REG_EDX] = eip;
    // The call fix-up prologue (push EBP; mov EBP, ESP) is 3 bytes long: skip it unless EIP itself faulted.
    auto handler = cast(REG_TYPE) &sigsegv_userspace_handler;
    context.uc_mcontext.gregs[REG_EIP] = (eip != addr) ? handler + 0x03 : handler;
}

So, if the value of EIP is the same as the segfaulting address, then we are in the case where an illegal address was called. If not, then 0x03 is added to the address of sigsegv_userspace_handler to skip the two instructions (3 bytes in total) that handle this case. When I said black magic, I meant it!

Note that no restoring is done in such a case: it is impossible anyway, and the only way out is to throw.
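
The decision described above can be isolated in a small pure function, which makes it easy to check on made-up values. This is only a sketch: selectEntryPoint and callFixupSize are hypothetical names, and the addresses in main are invented.

```d
// Number of bytes taken by "push EBP; mov EBP, ESP" on x86 (1 + 2 bytes).
enum size_t callFixupSize = 0x03;

// Hypothetical helper: where should execution resume inside the userspace handler?
size_t selectEntryPoint(size_t eip, size_t faultAddress, size_t handlerStart) pure nothrow @nogc @safe
{
    // EIP equal to the faulting address means the CPU faulted while fetching
    // code: an invalid address was called, so the fix-up prologue must run.
    if (eip == faultAddress)
        return handlerStart;
    // Otherwise a data access faulted: skip the fix-up prologue.
    return handlerStart + callFixupSize;
}

void main()
{
    enum size_t handler = 0x8048000; // made-up address for the example

    // Call through a null function pointer: EIP and the fault address are both 0.
    assert(selectEntryPoint(0, 0, handler) == handler);
    // Null data dereference from code located at 0x8049123.
    assert(selectEntryPoint(0x8049123, 0, handler) == handler + 3);
}
```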

So we finally get somewhere to handle that segfault!

Yes, and now it is easy.

void sigsegv_userspace_process(void* address) {
    // The first page is protected to detect null dereferences.
    if((cast(size_t) address) < MEMORY_RESERVED_FOR_NULL_DEFERENCE) {
        throw new NullPointerError();
    }
   
    throw new Error("SIGSEGV");
}
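
The function above relies on two names that this post does not define: MEMORY_RESERVED_FOR_NULL_DEFERENCE and NullPointerError. The sketch below fills them in with assumptions (a 4096-byte limit and a plain Error subclass) so that the classification can be tried on its own. The helper errorFor is hypothetical, and the real sigsegv.d may define these differently.

```d
// Assumption: only the first 4 KiB are treated as "null".
enum size_t MEMORY_RESERVED_FOR_NULL_DEFERENCE = 4096;

// Hypothetical definition; the post uses this class without showing it.
class NullPointerError : Error
{
    this(string file = __FILE__, size_t line = __LINE__)
    {
        super("null pointer dereference", file, line);
    }
}

// Builds the error that sigsegv_userspace_process would throw for `address`.
Error errorFor(const(void)* address)
{
    if (cast(size_t) address < MEMORY_RESERVED_FOR_NULL_DEFERENCE)
        return new NullPointerError();
    return new Error("SIGSEGV");
}

void main()
{
    // A field at a small offset inside a null struct still counts as null.
    assert(cast(NullPointerError) errorFor(cast(void*) 0x10) !is null);
    // An arbitrary address outside the first page does not.
    assert(cast(NullPointerError) errorFor(cast(void*) 0x10000) is null);
}
```

Comparing against a limit rather than against exactly 0 matters because accessing a field through a null pointer faults at a small non-zero address.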

What about x86_64? What about recovering?

You have all the data required to understand the code, so I’ll just let you read it: sigsegv.d

The x86_64 assembly code is different, due to the different architecture and calling convention. The example also shows how you can recover without throwing, by protecting a page and unprotecting it within the userspace handler.

A pull request has been opened to include this in the runtime of D: https://github.com/D-Programming-Language/druntime/pull/187

Special thanks to Vladimir Panteleev and FeepingCreature for the ideas that produced this code and blog post.
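
To give an idea of recovery without throwing, here is a simplified sketch. It is not the code from sigsegv.d: for brevity it unprotects the page directly inside the signal handler instead of going through the userspace handler, and it assumes Linux with the druntime bindings for mmap and mprotect. All names (unprotectOnFault, guardedPage, guardedSize, faultCount, recoverFromProtectedWrite) are invented for the example.

```d
import core.stdc.signal : signal, SIG_DFL;
import core.sys.posix.signal;
import core.sys.posix.sys.mman;
import core.sys.posix.unistd : sysconf, _SC_PAGESIZE;

__gshared void*  guardedPage;   // start of the protected page
__gshared size_t guardedSize;   // its length in bytes
__gshared int    faultCount;    // how many faults the handler fixed

extern(C) void unprotectOnFault(int signum, siginfo_t* info, void* contextPtr) nothrow @nogc
{
    auto addr  = cast(size_t) info._sifields._sigfault.si_addr;
    auto start = cast(size_t) guardedPage;
    if (addr >= start && addr < start + guardedSize)
    {
        // Make the page usable; the faulting instruction is retried on return.
        mprotect(guardedPage, guardedSize, PROT_READ | PROT_WRITE);
        ++faultCount;
        return;
    }
    // Not our page: restore the default action so the retry ends the program.
    signal(SIGSEGV, SIG_DFL);
}

// Writes to a PROT_NONE page and reports whether the write survived the fault.
bool recoverFromProtectedWrite()
{
    guardedSize = cast(size_t) sysconf(_SC_PAGESIZE);
    guardedPage = mmap(null, guardedSize, PROT_NONE, MAP_PRIVATE | MAP_ANON, -1, 0);
    if (guardedPage == MAP_FAILED)
        return false;

    sigaction_t action;
    sigemptyset(&action.sa_mask);
    action.sa_sigaction = &unprotectOnFault;
    action.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &action, null);

    auto p = cast(int*) guardedPage;
    *p = 42;                    // faults once, then succeeds after the handler returns
    const value = *p;

    munmap(guardedPage, guardedSize);
    return value == 42 && faultCount == 1;
}

void main()
{
    assert(recoverFromProtectedWrite());
}
```

Returning to the faulting instruction only makes sense here because the handler removed the cause of the fault; otherwise the same instruction would segfault again.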

7 thoughts on “Get an exception from a segfault on Linux (x86 and x86_64), using some black magic!”

  1. A few notes:
    – With g++, I managed to “patch over” the stack, restore the context and get back to the instruction n+1 after the segfault, but my exceptions still never went through the context restoration and remained uncaught.
    If you use g++, just pass -fnon-call-exceptions to throw your exception and stay away from this SICK black magic.

    – I see no reason why the instruction that made the segfault, at which we jump back to, wouldn’t produce a segfault again, thus generating an infinite loop of segfaults (woohoo!). Am I missing something?

    – Yes, this is picky, but handleSignal doesn’t need “extern(C)” since we use it by its address and not its name, so name mangling is not a concern, whereas sigsegv_userspace_process and restore_registers are called by name in the assembly and may need it.

    Also, thanks for sharing this dirty hack!

    • First, extern(C) and extern(D) on Linux have the same ABI. The main difference is mangling. It isn’t a problem here, both will do the job.

      Second, it may make sense to return to the instruction causing the segfault, but only if you do something in the handler to prevent this instruction from segfaulting again. It makes sense, for instance, in a concurrent GC that wants to use page protection to hook some logic of its own on some memory accesses. Most of the time, what you want is to throw. Even if it is not shown in the blog post (for brevity), I wrote sample code to test such a situation when working on this trick, and it worked fine.

      Jumping to the next instruction, ignoring the instruction that caused the segfault, is usually not what you want. The only situation where I see this being valid is when you process the instruction in the handler in software, emulator style. This is something that can be considered seriously, but it is out of the scope of this blog post.

    • According to several people I talked to, it is possible to do something similar on BSD-like systems (including OSX). But I’m really not a specialist of such platforms, so I can’t really explain how.

  2. This was helpful, thanks. I used the idea to deal with bad calls: in most cases they are caused by indirect calls such as call *(%eax), and if we modify eax and get back to the call, the call will fire again, but this time our throwing function (whose address we have put into eax) will be called. We don’t even need to modify code pages, just the indirect call register.

    The only problem is to detect whether it was an indirect call, but that’s not too hard. Another problem is that if the compiler does not expect the exception (e.g. the indirectly called function was marked throw()), it will immediately call terminate() from __cxa_throw, so no throw() specifiers should be used for methods in the application.

    Still the trick above can be used to restore even from bad calls, because in most cases bad calls are indirect ones via registers where address is calculated (like virtual methods).

  3. Hey

    I get an error saying REG_RIP is undeclared. I have included it in the code, but I still get the same error.
