Detecting Indirect Syscalls from Userland, A Naive Approach.


Our goal is to detect indirect syscalls. Indirect syscalls have taken the red teaming community by storm due to the newfound ability of EDRs to detect direct syscalls. To understand how to catch them, we must first understand a bit about them.

A direct syscall stub, for example, looks like this (have a look at SysWhispers for a practical implementation):
mov r10, rcx
mov eax, <SSN>
syscall
ret

Various researchers have used instrumentation callbacks to detect direct syscalls, as documented in these two blog posts.

An instrumentation callback is a post-syscall hook: it lets a process run a specified function just before the kernel returns execution to userland. There's a great article on CodeProject which discusses this. These detections work by identifying where the syscall instruction was executed from; if that is not NTDLL/win32u (the only 2 DLLs where syscall instructions should originate), then it is most likely a direct syscall.
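The origin check those detections rely on can be sketched in portable C. This is only an illustration under assumptions: `module_range`, `in_module`, and `syscall_origin_is_trusted` are hypothetical names, and a real callback would fill the ranges from the loaded NTDLL and win32u images rather than from constants.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one loaded module's address range. */
typedef struct {
    uintptr_t base; /* first byte of the module image */
    uintptr_t size; /* image size in bytes */
} module_range;

/* True when addr lies inside [base, base + size). */
static bool in_module(uintptr_t addr, const module_range *m)
{
    return addr >= m->base && addr - m->base < m->size;
}

/* Classify the address execution returns to after a syscall.
 * allowed[] is assumed to hold the ranges of NTDLL and win32u. */
static bool syscall_origin_is_trusted(uintptr_t return_address,
                                      const module_range *allowed,
                                      unsigned count)
{
    for (unsigned i = 0; i < count; i++) {
        if (in_module(return_address, &allowed[i]))
            return true;
    }
    return false; /* most likely a direct syscall */
}
```

A return address that falls outside every allowed range is what flags a direct syscall; an indirect syscall passes this check by construction.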

Indirect syscalls sought to resolve this by jumping straight to the syscall instruction inside NTDLL. Once we have set up our registers and stack (per the Windows calling convention), and have retrieved the SSN in some manner along with the address of the syscall instruction in the corresponding function, we just jump to that address. Here's a PoC by httpyxel, and there's a great video by KlezVirus on said topic.
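To make "retrieve the SSN and the corresponding syscall address" concrete, here is a minimal sketch that parses an unhooked x64 stub from a byte buffer. It assumes the stub begins with `mov r10, rcx; mov eax, imm32` (bytes `4C 8B D1 B8`) and that the `syscall` opcode (`0F 05`) follows shortly after; `stub_info` and `parse_stub` are hypothetical names, and a naive byte scan like this is not a disassembler.

```c
#include <stdint.h>
#include <string.h>

/* Result of parsing one unhooked Nt* stub (hypothetical helper type). */
typedef struct {
    uint32_t ssn;           /* system service number loaded into eax */
    const uint8_t *syscall; /* address of the 0F 05 syscall instruction */
} stub_info;

/* Returns 1 and fills *out when the stub starts with
 * `mov r10, rcx; mov eax, imm32` and a syscall opcode follows
 * within max_len bytes; returns 0 otherwise (e.g. a hooked stub). */
static int parse_stub(const uint8_t *stub, size_t max_len, stub_info *out)
{
    static const uint8_t prologue[] = { 0x4C, 0x8B, 0xD1, 0xB8 };
    static const uint8_t syscall_op[] = { 0x0F, 0x05 };
    const size_t imm_off = sizeof prologue;

    if (max_len < imm_off + 4 + sizeof syscall_op)
        return 0;
    if (memcmp(stub, prologue, sizeof prologue) != 0)
        return 0;

    /* The imm32 operand is stored little-endian. */
    out->ssn = (uint32_t)stub[imm_off]
             | ((uint32_t)stub[imm_off + 1] << 8)
             | ((uint32_t)stub[imm_off + 2] << 16)
             | ((uint32_t)stub[imm_off + 3] << 24);

    for (size_t i = imm_off + 4; i + sizeof syscall_op <= max_len; i++) {
        if (memcmp(stub + i, syscall_op, sizeof syscall_op) == 0) {
            out->syscall = stub + i;
            return 1;
        }
    }
    return 0;
}
```

An indirect syscall stub would load `ssn` into eax and jump to `syscall`, so the instruction executes from inside NTDLL.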

Given our previous explanation, it should be trivial to understand how indirect syscalls bypass the detection routines laid out in those blog posts. The syscall originates from the correct DLL, and because we jump STRAIGHT to the syscall instruction, we also skip any userland hooks placed at the start of the function.

Now, a couple of days ago I was going down a rabbit hole and ended up reading this blog post (outstanding) on one of the features of the BattlEye anti-cheat engine. The entire series gave me a cluster of ideas, and it's well-written, so I recommend checking it out. One of those naive ideas came from a feature of the BattlEye shellcode8kb, which is injected into the target application, registers a Vectored Exception Handler, and hooks a bunch of functions. It is not advisable for an EDR application to go around injecting shellcode into userland processes, so that option is out the window. Still, I liked the idea of selectively hooking a few functions. And when we only have to hook a limited number of functions, what constrained resource can we use...? Debug registers!

Debug registers ARE a limited resource, as we only have 4 per thread. My initial idea was drawn in paint, as are all my ideas (though the idea of injecting shellcode was later discarded):






We'd set a hardware breakpoint on the syscall (or, equivalently, the ret) instruction of the target function. An indirect syscall will jmp/call to that instruction and hit our breakpoint, at which point we single-step while our static count is != 0: once to land on the ret instruction, and once more to take that ret. Having taken the ret, we end up either in the legitimate calling function or back in the indirect syscall stub, depending on how the program invoked the call. If the call is legitimate, the return should land in one of the expected DLLs, e.g., Kernel32/Kernelbase.
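Arming one of the four debug registers comes down to a few bit operations on DR7. The following is a small sketch of just that arithmetic, written as pure functions so it can be read without the Windows headers; `dr7_arm_execute` and `dr7_disarm` are hypothetical names, and the returned value would still have to be written to the thread with SetThreadContext.

```c
#include <stdint.h>

/* Return dr7 with slot `pos` (0..3) armed as a local execute breakpoint.
 * Bits 16 + 4*pos .. 19 + 4*pos hold R/W and LEN for that slot;
 * both must be 00 for a one-byte break-on-execute. */
static uint64_t dr7_arm_execute(uint64_t dr7, unsigned pos)
{
    dr7 &= ~(0xFull << (16 + 4 * pos)); /* R/W = 00, LEN = 00 */
    dr7 |= 1ull << (2 * pos);           /* L<pos>: local enable */
    return dr7;
}

/* Return dr7 with the local enable bit of slot `pos` cleared. */
static uint64_t dr7_disarm(uint64_t dr7, unsigned pos)
{
    return dr7 & ~(1ull << (2 * pos));
}
```

For example, arming slot 1 on an empty DR7 yields 0x4, and the address to break on goes into Dr1.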

This, however, can be faked, as the ROP wizard JonasLyk pointed out, by returning through a ROP gadget such as ret (as well as jmp/call/iret) inside one of the legitimate DLLs. So once we've single-stepped syscall -> ret -> return target, we also check whether *(RIP) == 0xC3 or any of the other encodings for ret.
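The opcode check can be sketched as below. It only covers the four single-byte ret encodings (near and far, with and without an immediate), not the jmp/call/iret gadgets mentioned above, and `is_ret_opcode` and `lands_on_ret` are hypothetical names. Note the unsigned byte type: with a signed char, a value like 0xC3 would be read as negative and never compare equal.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when the byte is one of the x86-64 ret encodings. */
static bool is_ret_opcode(uint8_t opcode)
{
    switch (opcode) {
    case 0xC3: /* ret        (near) */
    case 0xC2: /* ret imm16  (near) */
    case 0xCB: /* retf       (far)  */
    case 0xCA: /* retf imm16 (far)  */
        return true;
    default:
        return false;
    }
}

/* True when the instruction at rip is a ret, i.e. the "caller" inside
 * the legitimate DLL is really just a return gadget. */
static bool lands_on_ret(const uint8_t *rip)
{
    return is_ret_opcode(*rip);
}
```

A return target inside Kernel32/Kernelbase that passes the range check but fails this one would still be treated as suspicious.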

One use case would be an EDR setting these breakpoints from its userland DLL at a regular interval against high-value functions such as NtMapViewOfSection and NtAllocateVirtualMemoryEx. This should be trivial to accomplish/port with the PoC I provided below.

A wilder idea I discussed with mez0 was to register an instrumentation callback that checks if the next instruction is a ret (to avoid hooking direct syscalls). If so, we set a debug register against it. We then single-step once, going from ret -> calling function (where we can remove the debug register), and perform all our checks. This would give us introspection over a broader range of syscalls instead of a select few!

This is an incredibly naive approach to detection, and I'm sure there are bypasses. As Sun Tzu famously said, "If you know the enemy and know yourself, you need not fear the result of a hundred battles." The mere knowledge of the existence of an endpoint security product immediately puts the defenders at a disadvantage.

This could also produce false positives if the target program calls the Nt function directly and not through one of the wrapper functions (e.g., calling NtCreateThreadEx directly and not through one of its wrappers, though this in itself is grounds for suspicion).

// detect_indirect_syscalls.c - rad9800

#include <windows.h>
#include <stdio.h>
#include "c_syscalls.h" // janoglezcampos/c_syscalls

#define SINGLE_STEP_COUNT 2

uintptr_t k32_h;
uintptr_t kb_h;
DWORD k32_s;
DWORD kb_s;

uintptr_t
find_gadget(
    _In_ const uintptr_t function,
    _In_ const BYTE* stub,
    _In_ const UINT size
)
{
    for (unsigned int i = 0; i < 25u; i++)
    {
        if (memcmp((LPVOID)(function + i), stub, size) == 0) {
            return (function + i);
        }
    }
    return 0ull;
}


BOOL
set_hardware_breakpoint(
    _In_ const DWORD tid,
    _In_ const uintptr_t address,
    _In_ const UINT pos,
    _In_ const BOOL init
)
{
    CONTEXT context = { .ContextFlags = CONTEXT_DEBUG_REGISTERS };
    HANDLE thd = NULL;
    BOOL res = FALSE;

    if (tid == GetCurrentThreadId())
    {
        thd = GetCurrentThread(); // pseudo-handle, does not need closing
    }
    else
    {
        // OpenThread returns NULL (not INVALID_HANDLE_VALUE) on failure
        thd = OpenThread(THREAD_ALL_ACCESS, FALSE, tid);
        if (thd == NULL) return FALSE;
    }

    res = GetThreadContext(thd, &context);

    if (res)
    {
        if (init)
        {
            (&context.Dr0)[pos] = address;
            context.Dr7 &= ~(3ull << (16 + 4 * pos)); // R/W = 00: break on execute
            context.Dr7 &= ~(3ull << (18 + 4 * pos)); // LEN = 00: 1 byte
            context.Dr7 |= 1ull << (2 * pos);         // local enable for this slot
        }
        else if ((&context.Dr0)[pos] == address)
        {
            context.Dr7 &= ~(1ull << (2 * pos));
            (&context.Dr0)[pos] = 0ull;
        }

        res = SetThreadContext(thd, &context);
    }

    if (tid != GetCurrentThreadId()) CloseHandle(thd);

    return res;
}

LONG WINAPI exception_handler(
    const PEXCEPTION_POINTERS ExceptionInfo
)
{
    if (ExceptionInfo->ExceptionRecord->ExceptionCode == STATUS_SINGLE_STEP)
    {
        static   unsigned short count = SINGLE_STEP_COUNT;
        if (count > 0)
        {
            ExceptionInfo->ContextRecord->EFlags |= 1 << 8; // TF
            count--;
        }
        else if (count == 0)
        {
            printf("[%u] syscall -> ret -> 0x%p\n", count, (PVOID)ExceptionInfo->ContextRecord->Rip);
            uintptr_t address = ExceptionInfo->ContextRecord->Rip;
            BOOL legit = FALSE;

            //
            // syscall -> ret -> ...
            //
            if ((address >= k32_h && address < k32_h + k32_s)
                || (address >= kb_h && address < kb_h + kb_s))
            {
                //
                // check opcode is not a ret opcode (the byte must be unsigned:
                // a signed char never compares equal to 0xC3)
                //
                const BYTE opcode = *(const BYTE*)address;
                if (opcode != 0xC3 && opcode != 0xCB
                 && opcode != 0xC2 && opcode != 0xCA)
                    legit = TRUE;

            }

            printf("\n[+] %s SYSCALL DETECTED\n\n", legit ? "LEGIT" : "INDIRECT");

            count = SINGLE_STEP_COUNT;
        }

        ExceptionInfo->ContextRecord->EFlags |= 1 << 16; // RF

        return EXCEPTION_CONTINUE_EXECUTION;
    }

    // Not a single-step exception: pass it on to the next handler
    return EXCEPTION_CONTINUE_SEARCH;
}

DWORD WINAPI test_thread(
    _In_ LPVOID lpParameter)
{
    return 0;
}

uintptr_t set_module_values(
    _In_ uintptr_t module,
    _Out_ DWORD* size)
{
    PIMAGE_NT_HEADERS nt = (PIMAGE_NT_HEADERS)(module + ((PIMAGE_DOS_HEADER)module)->e_lfanew);

    for (int i = 0; i < nt->FileHeader.NumberOfSections; i++) {
        const PIMAGE_SECTION_HEADER section =
            (PIMAGE_SECTION_HEADER)((DWORD_PTR)IMAGE_FIRST_SECTION(nt) +
                (DWORD_PTR)IMAGE_SIZEOF_SECTION_HEADER * i);

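        // Case-insensitive match on the first four bytes of the section
        // name, ".tex" (MSVC multi-character constant)
        if ((*(ULONG*)section->Name | 0x20202020) == 'xet.') {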
            module = module + section->VirtualAddress;
            *size = section->Misc.VirtualSize;
            break;
        }
    }
    return module;
}


int main()
{
    //
    // Register our exception handler
    //
    const PVOID handler = AddVectoredExceptionHandler(1, exception_handler);


    //
    // Set the global values of the DLL .text sections VA and SZ
    //
    k32_h = set_module_values((uintptr_t)GetModuleHandleA("KERNEL32.dll"), &k32_s);
    kb_h = set_module_values((uintptr_t)GetModuleHandleA("KERNELBASE.dll"), &kb_s);

    //
    // Find address to breakpoint on and set it.
    //
    const uintptr_t syscall_address1 =
        find_gadget((uintptr_t)GetProcAddress(GetModuleHandleA("NTDLL.dll"),
            "NtTestAlert"), (const BYTE*)"\x0F\x05", 2);
    set_hardware_breakpoint(GetCurrentThreadId(), syscall_address1, 1, TRUE);

    const uintptr_t syscall_address2 =
        find_gadget((uintptr_t)GetProcAddress(GetModuleHandleA("NTDLL.dll"),
            "NtCreateThreadEx"), (const BYTE*)"\x0F\x05", 2);
    set_hardware_breakpoint(GetCurrentThreadId(), syscall_address2, 2, TRUE);


    //
    // Test cases
    //
    printf("[-] Testing indirect syscall.\n");
    NTSTATUS status = Syscall(NT_TEST_ALERT);
    printf("[-] Testing legitimate syscall.\n");
    const HANDLE t = CreateThread(NULL, 0, test_thread, NULL, 0, NULL);
    if (t) {
        WaitForSingleObject(t, INFINITE); CloseHandle(t);
    }

    //
    // Disable the hardware breakpoint
    //
    set_hardware_breakpoint(GetCurrentThreadId(), syscall_address1, 1, FALSE);
    set_hardware_breakpoint(GetCurrentThreadId(), syscall_address2, 2, FALSE);

    //
    // Remove our registered VEH
    //
    if (handler != NULL) RemoveVectoredExceptionHandler(handler);
}

