Exetools

Exetools (https://forum.exetools.com/index.php)
-   General Discussion (https://forum.exetools.com/forumdisplay.php?f=2)
-   -   Invoking Win32k syscalls from kernel space (https://forum.exetools.com/showthread.php?t=20039)

DavidXanatos 12-30-2021 20:06

Invoking Win32k syscalls from kernel space
 
I have noticed that calling ntoskrnl.exe/ntdll.dll syscalls directly from the kernel works just fine, but doing the same for win32k.sys/win32u.dll syscalls fails when HVCI is enabled, with bug check 0x139 (KERNEL_SECURITY_CHECK_FAILURE) and the additional hint "Arg1: 0 - A stack-based buffer has been overrun".
This is strange: why would it work with HVCI enabled for the kernel itself but not for the Win32k syscalls? That is problem 1.

Another issue is that KiConvertToGuiThread must be triggered before the first win32k syscall, but unfortunately the kernel exports neither it nor PsConvertToGuiThread. That is problem 2.

My final goal is to provide a syscall interface in my driver that an application can call: the driver does something in kernel space, invokes the actual syscall, then does something else, before finally returning control to the user-mode application.

Problem 2 can safely be ignored, since by that point other, non-redirected win32k syscalls will already have been executed (so the thread has already been converted to a GUI thread).

But problem 1 urgently needs fixing, and I'm out of ideas as to why the new set of syscalls fails with HVCI enabled. The redirection is implemented in exactly the same way as the old, working one, and without HVCI it works just fine.

Does anyone here know what's going on and how to fix it?

IMHO the best solution would be a way to invoke the original syscall from the kernel such that it does everything, including KiConvertToGuiThread, but I'm not sure that is even possible.

Cheers
David
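The architecture described above (the driver does some work, invokes the real syscall, then does more work before returning) can be modelled in user mode as a table of services with optional pre- and post-hooks. This is only a minimal sketch: every name in it (SERVICE_ENTRY, Dispatch, Original_Query, Pre_DenyZero, Post_Count, g_table) is hypothetical, NTSTATUS is a local stand-in for the WDK type, and all services take exactly two pointer-sized arguments to sidestep the varying-arity problem that the assembly invoker later in the thread solves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t NTSTATUS;                       /* stand-in for the WDK type */
#define STATUS_SUCCESS                ((NTSTATUS)0x00000000)
#define STATUS_ACCESS_DENIED          ((NTSTATUS)0xC0000022)
#define STATUS_INVALID_SYSTEM_SERVICE ((NTSTATUS)0xC000001C)

/* Hypothetical signature shared by every service in this model. */
typedef NTSTATUS (*SERVICE2)(uintptr_t arg1, uintptr_t arg2);

typedef struct {
    SERVICE2 original;                                      /* the real syscall */
    NTSTATUS (*pre)(uintptr_t *args);                       /* may rewrite args or deny the call */
    void     (*post)(const uintptr_t *args, NTSTATUS *status); /* may inspect or adjust the result */
} SERVICE_ENTRY;

/* Hypothetical "original" service: writes arg1 * 2 to the pointer passed in arg2. */
static NTSTATUS Original_Query(uintptr_t in, uintptr_t out_ptr)
{
    *(uintptr_t *)out_ptr = in * 2;
    return STATUS_SUCCESS;
}

/* Example pre-hook: refuse requests whose first argument is zero. */
static NTSTATUS Pre_DenyZero(uintptr_t *args)
{
    return args[0] == 0 ? STATUS_ACCESS_DENIED : STATUS_SUCCESS;
}

/* Example post-hook: count calls that reached the real service. */
static unsigned g_post_calls;
static void Post_Count(const uintptr_t *args, NTSTATUS *status)
{
    (void)args;
    (void)status;
    g_post_calls++;
}

static const SERVICE_ENTRY g_table[] = {
    { Original_Query, Pre_DenyZero, Post_Count },
};

/* Pre-hook, then the real service, then post-hook. */
static NTSTATUS Dispatch(const SERVICE_ENTRY *table, size_t count, size_t index, uintptr_t args[2])
{
    assert(table != NULL && args != NULL);
    if (index >= count)
        return STATUS_INVALID_SYSTEM_SERVICE;

    const SERVICE_ENTRY *entry = &table[index];
    if (entry->pre != NULL) {
        NTSTATUS pre_status = entry->pre(args);
        if (pre_status != STATUS_SUCCESS)
            return pre_status;              /* denied: the real service is never called */
    }

    NTSTATUS status = entry->original(args[0], args[1]);

    if (entry->post != NULL)
        entry->post(args, &status);
    return status;
}
```

A denied request never reaches the real service and also skips the post-hook. In the driver, the `entry->original(...)` step is where a variable-argument invoker would be needed.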

DavidXanatos 01-01-2022 00:50

After some debugging and reading
https://www.crowdstrike.com/blog/state-of-exploit-development-part-1/
and
https://www.crowdstrike.com/blog/state-of-exploit-development-part-2/
I found the solution, and it was quite trivial: I just had to disable "Control Flow Guard" (/guard:cf) for the one file making these calls, LOL.

Of course, a better solution would be a hand-crafted trampoline instead, but well... sometimes it's efficient to be lazy.
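To illustrate why a trampoline sidesteps the problem, here is a toy C model of what Control Flow Guard conceptually adds to an indirect call: the target is checked against a set of valid call targets first, and an invalid target ends in a fast fail (a bug check in kernel mode) instead of the call. Real CFG uses a bitmap and compiler-inserted guard-check routines rather than this linear list, kernel CFG is generally only enforced when HVCI is enabled (which would fit the observation above), and all names below are hypothetical. Which win32k targets are or are not marked valid in the real kernel is not established here; the model simply assumes one registered and one unregistered target. Hand-written assembly is not instrumented by the compiler, so a trampoline performs no such check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t NTSTATUS;            /* stand-in for the WDK type */
typedef NTSTATUS (*TARGET)(void);

/* Toy set of valid indirect-call targets (real CFG: a bitmap). */
#define MAX_TARGETS 8
static TARGET g_valid_targets[MAX_TARGETS];
static size_t g_valid_count;

static void Cfg_RegisterTarget(TARGET target)
{
    assert(g_valid_count < MAX_TARGETS);
    g_valid_targets[g_valid_count++] = target;
}

static int Cfg_IsValidTarget(TARGET target)
{
    for (size_t i = 0; i < g_valid_count; i++)
        if (g_valid_targets[i] == target)
            return 1;
    return 0;
}

/* Instrumented indirect call: check first, then call.
   Returns 0 where real CFG would fast-fail (and never return). */
static int GuardedCall(TARGET target, NTSTATUS *result)
{
    if (!Cfg_IsValidTarget(target))
        return 0;
    *result = target();
    return 1;
}

/* Uninstrumented call, standing in for a hand-written asm trampoline: no check. */
static NTSTATUS TrampolineCall(TARGET target)
{
    return target();
}

/* Two hypothetical services: one registered as a valid target, one not. */
static NTSTATUS Service_Registered(void)   { return 1; }
static NTSTATUS Service_Unregistered(void) { return 2; }
```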

DavidXanatos 01-01-2022 02:53

And here is the non-lazy solution:

do_call.asm
Code:


ifndef _WIN64
.686
.model flat
endif

.code

;----------------------------------------------------------------------------

ifdef _WIN64

Sbie_InvokeSyscall_asm PROC

    mov        qword ptr [rsp+20h], r9 
    mov        qword ptr [rsp+18h], r8 
    mov        qword ptr [rsp+10h], rdx 
    mov        qword ptr [rsp+8], rcx
   
    ; note: (count & 0x0F) + 4 = 19 arguments are the absolute maximum

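    ; quick sanity check: count is a 32-bit int, so compare only edx (the upper half
    ; of rdx is undefined) and compare unsigned so a negative count is rejected too
    cmp        edx, 13h ; if count > 19
    jbe        arg_count_ok
    mov        rax, 0C000001Ch ; return STATUS_INVALID_SYSTEM_SERVICE
    ret
arg_count_ok:

    push        rsi
    push        rdi
    ; prepare enough stack for up to 19 arguments
    sub        rsp, 98h 
   
    ; save our 3 relevant arguments to spare registers
    mov        r11, r8  ; args
    mov        r10d, edx ; count (writing r10d zero-extends into r10)
    mov        rax, rcx ; func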

    ; check if we have higher arguments and if not skip
    cmp        r10, 4
    jle        copy_reg_args
    ; copy arguments 5-19
    mov        rsi, r11 ; source
    add        rsi, 20h
    mov        rdi, rsp ; destination
    add        rdi, 20h
    mov        rcx, r10 ; arg count
    sub        rcx, 4  ; skip the register passed args
    rep movsq

copy_reg_args:
    ; copy arguments 1-4
    mov        r9,  qword ptr [r11+18h]
    mov        r8,  qword ptr [r11+10h]
    mov        rdx, qword ptr [r11+08h]
    mov        rcx, qword ptr [r11+00h]

    ; call the function
    call        rax

    ; clear stack
    add        rsp, 98h 
    pop        rdi
    pop        rsi

    ret 

Sbie_InvokeSyscall_asm ENDP

else

_Sbie_InvokeSyscall_asm@12 PROC

    ; NTSTATUS __stdcall Sbie_InvokeSyscall_asm(void* func, int count, void* args);
    ; __stdcall ("@12"): the callee pops its 12 bytes of arguments, hence "ret 0Ch"

    ; quick sanity check (unsigned compare, so a negative count is rejected too)
    cmp        dword ptr [esp+04h+4h], 13h ; @count
    jbe        args_ok
    mov        eax, 0C000001Ch ; return STATUS_INVALID_SYSTEM_SERVICE
    ret        0Ch
args_ok:

    ; save non-volatile registers and set up a frame
    push        ebp 
    push        esi
    push        edi
    mov        ebp, esp 
    ; prepare enough stack for up to 19 arguments
    sub        esp, 4Ch

    ; copy arguments 1-19
    mov        esi, dword ptr [ebp+10h+8h] ; source @args
    mov        edi, esp ; destination
    mov        ecx, dword ptr [ebp+10h+4h] ; arg count @count
    rep movsd

    ; call the function
    mov        eax, dword ptr [ebp+10h+0h] ; @func
    call        eax

    ; clear stack (restoring esp from ebp works for __cdecl and __stdcall callees alike)
function_end:

    mov        esp,ebp 
    pop        edi
    pop        esi
    pop        ebp
    ret        0Ch

_Sbie_InvokeSyscall_asm@12 ENDP
PUBLIC _Sbie_InvokeSyscall_asm@12

endif

;----------------------------------------------------------------------------

end

test.cpp:
Code:

#include <stdio.h>

typedef long NTSTATUS; // user-mode test: NTSTATUS is not defined by <stdio.h>

NTSTATUS Test4(int arg1, int arg2, int arg3, int arg4)
{
    printf("arg1: %x, arg2: %x, arg3: %x, arg4: %x\r\n", arg1, arg2, arg3, arg4);
    return 0;
}

NTSTATUS Test8(int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7, int arg8)
{
    printf("arg1: %x, arg2: %x, arg3: %x, arg4: %x; arg5: %x, arg6: %x, arg7: %x, arg8: %x\r\n", arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8);
    return 0;
}

NTSTATUS Test19(int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7, int arg8, int arg9, int arg10, int arg11, int arg12, int arg13, int arg14, int arg15, int arg16, int arg17, int arg18, int arg19)
{
    return 123;
}

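// implemented in do_call.asm: undecorated C name on x64, __stdcall (_...@12) on x86
extern "C" NTSTATUS __stdcall Sbie_InvokeSyscall_asm(void* func, int count, void* args);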

int main(int argc, char *argv[])
{
#ifdef _WIN64
    __int64 stack[19];
#else
    int stack[19];
#endif
    for (int i = 0; i < 19; i++)
        stack[i] = i + 1;

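    Sbie_InvokeSyscall_asm((void*)Test4, 4, stack);

    Sbie_InvokeSyscall_asm((void*)Test8, 8, stack);

    printf("Test19: %ld\r\n", (long)Sbie_InvokeSyscall_asm((void*)Test19, 19, stack)); // 123

    // 20 arguments exceed the limit: expect c000001c (STATUS_INVALID_SYSTEM_SERVICE)
    printf("20 args: %lx\r\n", (unsigned long)Sbie_InvokeSyscall_asm((void*)Test19, 20, stack));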

    return 0;
}
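For comparison, the same dispatch can be written in portable C by casting the function pointer to a prototype with exactly `count` pointer-sized parameters. This is a user-mode sketch, not the driver's code: InvokeService, GENERIC_FN, the P<n>/V<n> macros and the Sum/Ret helpers are hypothetical, and NTSTATUS is a local stand-in. It mirrors the assembly's contract (at most 19 arguments, otherwise STATUS_INVALID_SYSTEM_SERVICE) and is essentially the C `Sbie_InvokeSyscall5` shown later in the thread, generalised to every arity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t NTSTATUS;                   /* stand-in for the WDK type */
typedef NTSTATUS (*GENERIC_FN)(void);       /* cast to the real prototype before calling */
#define STATUS_INVALID_SYSTEM_SERVICE ((NTSTATUS)0xC000001C)
#define MAX_SERVICE_ARGS 19

/* P<n>: a parameter list of n pointer-sized arguments. */
#define P1  uintptr_t
#define P2  P1, uintptr_t
#define P3  P2, uintptr_t
#define P4  P3, uintptr_t
#define P5  P4, uintptr_t
#define P6  P5, uintptr_t
#define P7  P6, uintptr_t
#define P8  P7, uintptr_t
#define P9  P8, uintptr_t
#define P10 P9, uintptr_t
#define P11 P10, uintptr_t
#define P12 P11, uintptr_t
#define P13 P12, uintptr_t
#define P14 P13, uintptr_t
#define P15 P14, uintptr_t
#define P16 P15, uintptr_t
#define P17 P16, uintptr_t
#define P18 P17, uintptr_t
#define P19 P18, uintptr_t

/* V<n>: the matching argument values a[0] .. a[n-1]. */
#define V1  a[0]
#define V2  V1, a[1]
#define V3  V2, a[2]
#define V4  V3, a[3]
#define V5  V4, a[4]
#define V6  V5, a[5]
#define V7  V6, a[6]
#define V8  V7, a[7]
#define V9  V8, a[8]
#define V10 V9, a[9]
#define V11 V10, a[10]
#define V12 V11, a[11]
#define V13 V12, a[12]
#define V14 V13, a[13]
#define V15 V14, a[14]
#define V16 V15, a[15]
#define V17 V16, a[16]
#define V18 V17, a[17]
#define V19 V18, a[18]

/* Cast to the exact n-argument prototype so the compiler emits a correct call for that arity. */
#define CALL_CASE(n) case n: return ((NTSTATUS (*)(P##n))func)(V##n);

/* Portable model of Sbie_InvokeSyscall_asm: reject count > 19, else call func with count args. */
static NTSTATUS InvokeService(GENERIC_FN func, unsigned count, const uintptr_t *a)
{
    assert(func != NULL && a != NULL);
    switch (count) {
    case 0: return func();
    CALL_CASE(1)  CALL_CASE(2)  CALL_CASE(3)  CALL_CASE(4)  CALL_CASE(5)
    CALL_CASE(6)  CALL_CASE(7)  CALL_CASE(8)  CALL_CASE(9)  CALL_CASE(10)
    CALL_CASE(11) CALL_CASE(12) CALL_CASE(13) CALL_CASE(14) CALL_CASE(15)
    CALL_CASE(16) CALL_CASE(17) CALL_CASE(18) CALL_CASE(19)
    default: return STATUS_INVALID_SYSTEM_SERVICE;
    }
}

/* Hypothetical services for trying it out. */
static NTSTATUS Ret7(void) { return 7; }
static NTSTATUS Sum4(uintptr_t a1, uintptr_t a2, uintptr_t a3, uintptr_t a4)
{
    return (NTSTATUS)(a1 + a2 + a3 + a4);
}
static NTSTATUS Sum8(uintptr_t a1, uintptr_t a2, uintptr_t a3, uintptr_t a4,
                     uintptr_t a5, uintptr_t a6, uintptr_t a7, uintptr_t a8)
{
    return (NTSTATUS)(a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8);
}
```

The trade-off is one call site per arity instead of one generic copy loop, but in exchange the compiler takes care of the shadow space, stack arguments, 16-byte alignment and unwind data.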


DavidXanatos 01-05-2022 06:33

I ran into a really strange issue: on 64-bit Windows the above code works fine when the app is 64-bit, but for some seemingly impossible reason it fails if the app is 32-bit and running under WOW64.

As I understand it, this should not be possible: under WOW64 the 32-to-64-bit translation is done in user mode and the calls into the kernel are all 64-bit, so from the kernel's viewpoint everything is 64-bit.

I have experimented with the code and narrowed the issue down to calls with 4 or fewer arguments and calls with exactly 5 arguments; with 6 or more arguments the code works fine.
For 0-4 arguments I crafted a different version that seems to work fine:

Code:

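Sbie_InvokeSyscall4_asm PROC

 ; NTSTATUS Sbie_InvokeSyscall4_asm(void* func, void* args);
 mov        r11, rdx ; args
 mov        rax, rcx ; func

 mov        r9,  qword ptr [r11+18h]
 mov        r8,  qword ptr [r11+10h]
 mov        rdx, qword ptr [r11+08h]
 mov        rcx, qword ptr [r11+00h]

 jmp        rax ; tail call: rsp is untouched, so func returns directly to our caller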

Sbie_InvokeSyscall4_asm ENDP

But for 5 arguments something is still broken.

When I use C code to implement a caller for the 5-argument case

Code:

typedef NTSTATUS (*P_SystemService05)(
    ULONG_PTR arg01, ULONG_PTR arg02, ULONG_PTR arg03, ULONG_PTR arg04,
    ULONG_PTR arg05);

_FX NTSTATUS Sbie_InvokeSyscall5(void* func, ULONG_PTR *stack)
{
    P_SystemService05 nt = (P_SystemService05)func;
    return nt(stack[0], stack[1], stack[2], stack[3], stack[4]);
}

it works just fine. But when I load the driver into IDA, copy the ASM code of this function and put it into an ASM file

Code:

Sbie_InvokeSyscall5_asm PROC

sub rsp, 38h
mov rax, [rdx+20h]
mov r10, rdx
mov r9, [rdx+18h]
mov r11, rcx
mov r8, [rdx+10h]
mov rdx, [rdx+8]
mov rcx, [r10]
mov [rsp+38h-18h], rax
call r11
add rsp, 38h
ret

Sbie_InvokeSyscall5_asm ENDP

and use that instead, 32-bit apps fail again.

And yes, I have looked at the driver with the asm version in IDA, and the function is byte-for-byte identical to the output of the C compiler. I even tried adding some int 3 before and after it so that it is aligned the same way, with no result.

The only apparent difference is that it is located in a different place in the driver image file.

The way the 32-bit apps fail is also quite peculiar: they all fail with an access violation at address 0x0...0 and an empty stack trace. Somehow the return from the syscall gets messed up, and I have no idea how.

chants 01-05-2022 14:54

Sounds like a calling-convention detail. Perhaps the stack isn't aligned to 16 bytes in the driver, or something like that. I suspect the driver code more than the app code here; WoW64 is just amplifying a preexisting bug that is perhaps extraordinarily unlikely, or impossible, to hit on native 64-bit.

The app is easy to debug anyway: you can check the stack and registers at the time of the call and the return value immediately after. The driver is much more troublesome to debug, and for that reason it's always where the bugs end up, just to inconvenience us :)

Anyway it would be interesting to know the details of this if you figure it out.
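The alignment hypothesis can be checked with a little arithmetic. On Windows x64, rsp is 8 modulo 16 at function entry (the CALL has just pushed the return address) and must be 16-byte aligned at every CALL the function makes. The hypothetical helpers below compute rsp modulo 16 at an inner CALL from the number of 8-byte PUSHes and the SUB RSP amount in a prologue, assuming the function's own caller followed the ABI.

```c
#include <assert.h>

/* Windows x64: right after CALL pushed the return address, rsp % 16 == 8. */
#define ENTRY_RSP_MOD16 8u

/* rsp % 16 at an inner CALL, given the prologue's 8-byte PUSH count and SUB RSP amount. */
static unsigned RspMod16AtCall(unsigned pushes, unsigned sub_rsp)
{
    assert(sub_rsp % 8u == 0);                      /* stack adjustments are whole slots */
    unsigned consumed = (pushes * 8u + sub_rsp) % 16u;
    /* rsp only moves down, so subtract modulo 16 (the + 16 keeps it non-negative). */
    return (ENTRY_RSP_MOD16 + 16u - consumed) % 16u;
}

static int IsAlignedAtCall(unsigned pushes, unsigned sub_rsp)
{
    return RspMod16AtCall(pushes, sub_rsp) == 0;
}
```

Both Sbie_InvokeSyscall5_asm (no pushes, sub rsp, 38h) and Sbie_InvokeSyscall_asm (two pushes, sub rsp, 98h) come out aligned, and byte-identical code is aligned identically anyway, so misalignment at the inner call cannot by itself explain why the asm and C copies behave differently.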

DavidXanatos 01-05-2022 17:42

I have created another test in which a global variable lets me toggle between the C and the asm version. This is how it looks in the disassembly:
Code:

.text:000000014001C66A                mov    eax, cs:g_test
.text:000000014001C670                mov    rcx, [rcx+18h]
.text:000000014001C674                test    eax, eax
.text:000000014001C676                jnz    Sbie_InvokeSyscall5_asm
.text:000000014001C67C                jmp    Sbie_InvokeSyscall5

The dispatch function is compiled with optimization, and since it does not need local variables, the compiler turned both calls into tail jumps.
I checked again that Sbie_InvokeSyscall5_asm and Sbie_InvokeSyscall5 are binary identical, and they are.
Still, toggling the variable breaks 32-bit apps.

At this point it's just weird: the calling convention is the same and the functions are the same, yet the results are not. WTF :/

deepzero 01-05-2022 17:54

Can you share the binaries?

DavidXanatos 01-05-2022 18:22

Sure, PDB included: https://www10.zippyshare.com... (link struck out, no longer available)
The dispatch function that calls the asm or C version is called Syscall_Invoke. It is called from a couple of places, most relevantly from Syscall_Api_Invoke.

DavidXanatos 01-05-2022 20:46

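I have found the solution: I needed to add the FRAME attribute and the .allocstack and .endprolog directives.

So in the end it was not the code bytes but the additional data that ends up elsewhere in the image: with PROC FRAME, MASM emits unwind information (.pdata/.xdata) for the function, which the C compiler generates automatically and the hand-copied asm lacked.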

Code:


; NTSTATUS Sbie_InvokeSyscall_asm(void* func, ULONG count, void* args);
Sbie_InvokeSyscall_asm PROC FRAME

    ; prolog
    push        rsi
    .pushreg    rsi ; record the non-volatile register save for the unwinder
    push        rdi
    .pushreg    rdi
    sub        rsp, 98h ; 8 * 19 - prepare enough stack for up to 19 arguments
    .allocstack 98h
    .endprolog
   
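    ; quick sanity check: count is a 32-bit ULONG, so compare only edx
    ; (the upper half of rdx is undefined), unsigned
    cmp        edx, 13h ; if count > 19
    jbe        arg_count_ok
    mov        rax, 0C000001Ch ; return STATUS_INVALID_SYSTEM_SERVICE
    jmp        func_return
arg_count_ok:

    ; save our 3 relevant arguments to spare registers
    mov        r11, r8  ; args
    mov        r10d, edx ; count (writing r10d zero-extends into r10)
    mov        rax, rcx ; func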

    ; check if we have higher arguments and if not skip
    cmp        r10, 4
    jle        copy_reg_args
    ; copy arguments 5-19
    mov        rsi, r11 ; source
    add        rsi, 20h
    mov        rdi, rsp ; destination
    add        rdi, 20h
    mov        rcx, r10 ; arg count
    sub        rcx, 4  ; skip the register passed args
    rep movsq

copy_reg_args:
    ; copy arguments 1-4
    mov        r9,  qword ptr [r11+18h]
    mov        r8,  qword ptr [r11+10h]
    mov        rdx, qword ptr [r11+08h]
    mov        rcx, qword ptr [r11+00h]

    ; call the function
    call        rax

func_return:
    ; epilog
    add        rsp, 98h 
    pop        rdi
    pop        rsi

    ret 

Sbie_InvokeSyscall_asm ENDP
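One way to see why the FRAME directives matter is a toy model of how an x64 unwinder locates a frame's return address. Microsoft's x64 exception-handling documentation states that a function with no function table entry is treated as a leaf function, with RSP directly addressing the return address. For a function that has pushed registers and allocated stack, that guess reads a slot from its local area instead. The sketch below is simplified and hypothetical (UNWIND_MODEL, ReturnAddressSlot, UnwindReturnAddress and BuildSampleFrame are illustrative names, and real unwind codes carry more detail). That some path walks or unwinds the driver's stack during the WOW64 syscall is an assumption, not something established in the thread, but a zeroed slot being taken as the return address would be consistent with the access violation at address 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: stack[] is indexed in 8-byte slots upward from the current rsp (slot 0 = [rsp]). */
typedef struct {
    int      has_unwind_info; /* does the function have a .pdata/.xdata entry? */
    unsigned pushes;          /* number of 8-byte register pushes recorded in the prolog */
    unsigned alloc_bytes;     /* size recorded for sub rsp, N */
} UNWIND_MODEL;

/* Slot the unwinder takes as the return address once rsp is past the prolog. */
static size_t ReturnAddressSlot(const UNWIND_MODEL *m)
{
    if (!m->has_unwind_info)
        return 0;                            /* no entry: treated as a leaf, return address at [rsp] */
    assert(m->alloc_bytes % 8u == 0);
    return m->alloc_bytes / 8u + m->pushes;  /* undo the allocation, then the pushes */
}

static uint64_t UnwindReturnAddress(const uint64_t *stack, const UNWIND_MODEL *m)
{
    return stack[ReturnAddressSlot(m)];
}

/* Frame of Sbie_InvokeSyscall_asm after its prolog: 98h bytes (19 slots) of local area,
   saved rdi, saved rsi, then the caller's return address. */
#define SAMPLE_FRAME_SLOTS 22
static void BuildSampleFrame(uint64_t stack[SAMPLE_FRAME_SLOTS], uint64_t return_address)
{
    for (size_t i = 0; i < 19; i++)
        stack[i] = 0;                        /* local area: assume unused slots happen to be zero */
    stack[19] = 0x1111;                      /* saved rdi (pushed last, so lower) */
    stack[20] = 0x2222;                      /* saved rsi */
    stack[21] = return_address;
}
```

With unwind information the model skips 19 slots of allocation plus 2 pushes and lands on the real return address; without it, it reads slot 0 of the local area.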


chants 01-06-2022 04:09

Yes, the calling convention on 64-bit has oddities: 16-byte stack alignment at calls, shadow space that must be reserved, and so on. With driver calls it's not surprising that these become necessary. The ABI also differs between Windows and Linux: the System V ABI used on Linux passes the first six integer arguments in registers and has no shadow space.

Quote:

https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170
Quote:

Calling convention defaults
The x64 Application Binary Interface (ABI) uses a four-register fast-call calling convention by default. Space is allocated on the call stack as a shadow store for callees to save those registers.

Alignment
Most structures are aligned to their natural alignment. The primary exceptions are the stack pointer and malloc or alloca memory, which are 16-byte aligned to aid performance. Alignment above 16 bytes must be done manually. Since 16 bytes is a common alignment size for XMM operations, this value should work for most code. For more information about structure layout and alignment, see Types and Storage. For information about the stack layout, see x64 stack usage.

The callee is responsible for dumping the register parameters into their shadow space if needed.
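The quoted rules translate into a bit of frame arithmetic, which is where the 98h in the assembly comes from. The sketch below uses hypothetical helper names and covers only the Windows x64 convention: every argument k (0-based) owns the 8-byte slot at rsp + 8*k at the CALL, the four register arguments included (that is their shadow space), the caller reserves at least those four slots even for fewer arguments, and the total must leave rsp 16-byte aligned at the CALL.

```c
#include <assert.h>
#include <stddef.h>

#define SHADOW_SLOTS 4u   /* home slots for rcx, rdx, r8, r9 */

/* Offset from rsp at the CALL of argument k (0-based); args 0-3 only get a home slot. */
static size_t ArgOffset(unsigned k)
{
    return 8u * (size_t)k;
}

/* Minimum outgoing area: one slot per argument, but never less than the shadow space. */
static size_t OutgoingAreaSize(unsigned n)
{
    return 8u * (size_t)(n < SHADOW_SLOTS ? SHADOW_SLOTS : n);
}

/* SUB RSP amount that fits n arguments and leaves rsp 16-byte aligned at the CALL,
   given 'pushes' 8-byte pushes after entry (entry rsp % 16 == 8). */
static size_t FrameSubSize(unsigned n, unsigned pushes)
{
    size_t size = OutgoingAreaSize(n);
    while ((8u + 8u * (size_t)pushes + size) % 16u != 0)
        size += 8u;                     /* pad by one slot until the CALL is aligned */
    assert(size >= OutgoingAreaSize(n));
    return size;
}
```

ArgOffset(4) == 0x20 is why the assembly copies from args + 20h to rsp + 20h: the argument array and the outgoing area line up slot for slot. FrameSubSize(19, 2) == 0x98 matches the sub rsp, 98h next to two pushes. The compiler's sub rsp, 38h in the 5-argument C version is larger than the 0x28 minimum this gives for 5 arguments, which is allowed as long as alignment is kept.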


All times are GMT +8.
