#1
Invoking Win32k syscalls from kernel space
I have noticed that calling ntoskrnl.exe/ntdll.dll syscalls directly from the kernel works just fine, but doing the same for win32k.sys/win32u.dll syscalls fails when HVCI is enabled, with bug check 139 and the additional hint: Arg1: 0 - A stack-based buffer has been overrun.
This is strange: why would it work with HVCI enabled for the kernel itself but not for the win32 stuff? That is problem 1.
Another issue is that before calling the first win32k syscall, KiConvertToGuiThread must be triggered. Unfortunately neither this nor PsConvertToGuiThread is exported by the kernel. That is problem 2.
My final goal is to provide a syscall interface in my driver that an application can call. The driver would then do something in kernel space, invoke the actual syscall, do something else, and finally return control to the user mode application.
Problem 2 can safely be ignored, as other non-redirected win32k syscalls will already have been executed at this point. But problem 1 urgently needs fixing, and I'm out of ideas why this would not work with HVCI enabled for the new set of syscalls. The redirection is implemented in exactly the same way as the old working one, and without HVCI it works just fine.
Does anyone here know what's going on and how to fix it? IMHO the best option would be a way to invoke the original syscall from the kernel such that it does everything, including KiConvertToGuiThread, but I'm not sure if that is even possible.
Cheers David
#2
After some debugging and reading
https://www.crowdstrike.com/blog/state-of-exploit-development-part-1/ and https://www.crowdstrike.com/blog/state-of-exploit-development-part-2/ I found the solution. It was quite trivial: I just had to disable "Control Flow Guard" for the one file doing these calls, LOL. Of course a better solution would be to create a hand-crafted trampoline instead, but well... sometimes it's efficient to be lazy.
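For anyone who finds disabling Control Flow Guard for a whole file too broad: MSVC also offers `__declspec(guard(nocf))`, which turns off the CFG checks for a single function. Below is a minimal sketch of that idea, assuming an MSVC build. The names `DEMO_NO_CFG`, `demo_service_fn`, `demo_target` and `Demo_InvokeIndirect` are hypothetical and not taken from the driver; on other compilers the macro expands to nothing so the code still builds.

```c
#include <stddef.h>

/* Hypothetical macro: only MSVC understands guard(nocf). */
#if defined(_MSC_VER)
#define DEMO_NO_CFG __declspec(guard(nocf))
#else
#define DEMO_NO_CFG
#endif

/* Hypothetical two-argument service signature, for illustration only. */
typedef long (*demo_service_fn)(size_t a, size_t b);

/* Stand-in for the real call target. */
long demo_target(size_t a, size_t b)
{
    return (long)(a + b);
}

/* The indirect call below is the kind of call CFG would instrument;
   with guard(nocf) MSVC omits the check for this function only. */
DEMO_NO_CFG long Demo_InvokeIndirect(demo_service_fn func, const size_t *args)
{
    return func(args[0], args[1]);
}
```

Whether this is enough under HVCI has not been tested here; the thread only confirms that disabling CFG for the file helped.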
#3
and here we have the not lazy solution:
do_call.asm
Code:
.code

;----------------------------------------------------------------------------

ifdef _WIN64

Sbie_InvokeSyscall_asm PROC

    mov     qword ptr [rsp+20h], r9
    mov     qword ptr [rsp+18h], r8
    mov     qword ptr [rsp+10h], rdx
    mov     qword ptr [rsp+8], rcx

    ; note: (count & 0x0F) + 4 = 19 arguments are the absolute maximum

    ; quick sanity check
    cmp     rdx, 13h                    ; if count > 19
    jle     arg_count_ok
    mov     rax, 0C000001Ch             ; return STATUS_INVALID_SYSTEM_SERVICE
    ret

arg_count_ok:
    push    rsi
    push    rdi

    ; prepare enough stack for up to 19 arguments
    sub     rsp, 98h

    ; save our 3 relevant arguments to spare registers
    mov     r11, r8                     ; args
    mov     r10, rdx                    ; count
    mov     rax, rcx                    ; func

    ; check if we have higher arguments and if not skip
    cmp     r10, 4
    jle     copy_reg_args

    ; copy arguments 5-19
    mov     rsi, r11                    ; source
    add     rsi, 20h
    mov     rdi, rsp                    ; destination
    add     rdi, 20h
    mov     rcx, r10                    ; arg count
    sub     rcx, 4                      ; skip the register passed args
    rep movsq

copy_reg_args:
    ; copy arguments 1-4
    mov     r9, qword ptr [r11+18h]
    mov     r8, qword ptr [r11+10h]
    mov     rdx, qword ptr [r11+08h]
    mov     rcx, qword ptr [r11+00h]

    ; call the function
    call    rax

    ; clear stack
    add     rsp, 98h
    pop     rdi
    pop     rsi
    ret

Sbie_InvokeSyscall_asm ENDP

else

_Sbie_InvokeSyscall_asm@12 PROC

    ; NTSTATUS Sbie_InvokeSyscall_asm(void* func, int count, void* args);

    ; quick sanity check
    cmp     dword ptr [esp+04h+4h], 13h ; @count
    jle     args_ok
    mov     eax, 0C000001Ch             ; return STATUS_INVALID_SYSTEM_SERVICE
    ret

args_ok:
    ; prepare enough stack for up to 19 arguments
    push    ebp
    push    esi
    push    edi
    mov     ebp, esp
    sub     esp, 4Ch

    ; copy arguments 0-19
    mov     esi, dword ptr [ebp+10h+8h] ; source @args
    mov     edi, esp                    ; destination
    mov     ecx, dword ptr [ebp+10h+4h] ; arg count @count
    rep movsd

    ; call the function
    mov     eax, dword ptr [ebp+10h+0h] ; @func
    call    eax

    ; clear stack
function_end:
    mov     esp, ebp
    pop     edi
    pop     esi
    pop     ebp
    ret

_Sbie_InvokeSyscall_asm@12 ENDP

PUBLIC _Sbie_InvokeSyscall_asm@12

endif

;----------------------------------------------------------------------------

end
Code:
#include <stdio.h>

NTSTATUS Test4(int arg1, int arg2, int arg3, int arg4)
{
    printf("arg1: %x, arg2: %x, arg3: %x, arg4: %x\r\n", arg1, arg2, arg3, arg4);
    return 0;
}

NTSTATUS Test8(int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7, int arg8)
{
    printf("arg1: %x, arg2: %x, arg3: %x, arg4: %x; arg5: %x, arg6: %x, arg7: %x, arg8: %x\r\n",
           arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8);
    return 0;
}

NTSTATUS Test19(int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7, int arg8,
                int arg9, int arg10, int arg11, int arg12, int arg13, int arg14, int arg15,
                int arg16, int arg17, int arg18, int arg19)
{
    return 123;
}

NTSTATUS Sbie_InvokeSyscall_asm(void* func, int count, void* args);

int main(int argc, char *argv[])
{
#ifdef _WIN64
    __int64 stack[19];
#else
    int stack[19];
#endif
    for (int i = 0; i < 19; i++)
        stack[i] = i + 1;

    Sbie_InvokeSyscall_asm(Test4, 4, stack);
    Sbie_InvokeSyscall_asm(Test8, 8, stack);
    Sbie_InvokeSyscall_asm(Test19, 19, stack);
    Sbie_InvokeSyscall_asm(Test19, 20, stack);

    return 0;
}
Last edited by DavidXanatos; 01-01-2022 at 05:42.
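The `sub rsp, 98h` in the x64 trampoline can be sanity-checked with a bit of arithmetic: four 8-byte slots for the register-passed arguments (the shadow space) plus one slot for every further argument. A small sketch of that calculation follows; the `demo_*` helpers and `DEMO_*` constants are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Hypothetical constants mirroring the numbers in the trampoline above. */
enum { DEMO_MAX_ARGS = 19, DEMO_REG_ARGS = 4, DEMO_SLOT_BYTES = 8 };

/* Number of qwords the "rep movsq" has to copy for a given count. */
size_t demo_stack_arg_count(size_t count)
{
    return count > DEMO_REG_ARGS ? count - DEMO_REG_ARGS : 0;
}

/* Outgoing argument area: 4 shadow slots plus the stack-passed arguments. */
size_t demo_frame_bytes(size_t count)
{
    return (DEMO_REG_ARGS + demo_stack_arg_count(count)) * DEMO_SLOT_BYTES;
}

/* Same upper bound as the "cmp rdx, 13h" sanity check. */
int demo_count_ok(size_t count)
{
    return count <= DEMO_MAX_ARGS;
}
```

For 19 arguments this gives (4 + 15) * 8 = 152 = 98h bytes, which matches the fixed allocation in the assembly, and a count of 20 is rejected as in the last test call.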
#4
I ran into a really strange issue: for some impossible reason the above code on 64-bit Windows, while working fine when the app is 64 bit, fails if it's a 32-bit app running under WOW64.
To my understanding this should not be possible, as under WOW64 the 32-to-64 translation is done in user mode and the calls into the kernel are all 64 bit, so from the kernel's viewpoint everything is 64 bit. I have monkeyed around with the code and narrowed the issue down to calls with 4 or fewer arguments and calls with 5 arguments; for 6 and more the code works fine. For 0-4 I crafted different code that seems to work fine
Code:
Sbie_InvokeSyscall4_asm PROC

    mov     r11, rdx                    ; args
    mov     rax, rcx                    ; func

    mov     r9, qword ptr [r11+18h]
    mov     r8, qword ptr [r11+10h]
    mov     rdx, qword ptr [r11+08h]
    mov     rcx, qword ptr [r11+00h]

    jmp     rax

Sbie_InvokeSyscall4_asm ENDP
When I use C code to implement a caller for the 5 args case
Code:
typedef NTSTATUS (*P_SystemService05)(
    ULONG_PTR arg01, ULONG_PTR arg02, ULONG_PTR arg03, ULONG_PTR arg04,
    ULONG_PTR arg05);

_FX NTSTATUS Sbie_InvokeSyscall5(void* func, ULONG_PTR *stack)
{
    P_SystemService05 nt = (P_SystemService05)func;
    return nt(stack[0], stack[1], stack[2], stack[3], stack[4]);
}
it works, but the equivalent asm version does not
Code:
Sbie_InvokeSyscall5_asm PROC

    sub     rsp, 38h
    mov     rax, [rdx+20h]
    mov     r10, rdx
    mov     r9, [rdx+18h]
    mov     r11, rcx
    mov     r8, [rdx+10h]
    mov     rdx, [rdx+8]
    mov     rcx, [r10]
    mov     [rsp+38h-18h], rax
    call    r11
    add     rsp, 38h
    ret

Sbie_InvokeSyscall5_asm ENDP
And yes, I have looked at the driver with the asm version in IDA, and it's byte for byte the same as the output of the C compiler. I even tried adding some int 3 before and after to have it aligned the same way, with no result. The only apparent difference is that it's located in a different place in the driver image file.
The way the 32-bit apps fail is also quite peculiar: they all fail with an access violation at location 0x0...0 and an empty stack trace. Somehow the return from the syscall gets messed up and I have no idea how.
#5
Sounds like details of calling conventions. Perhaps the stack isn't aligned to 16 bytes in the driver or something like that. I suspect the driver code more than the app code here. WoW64 is just amplifying a preexisting bug that is perhaps extraordinarily unlikely or impossible on native 64.
The app is easy to debug anyway: you can check the stack and registers at the time of the call and the return value immediately after. The driver is much more troublesome to debug. And for that reason it's always where the bugs end up, just to inconvenience us. Anyway, it would be interesting to know the details of this if you figure it out.
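The 16-byte alignment rule can be checked on paper. On x64 Windows the stack is 16-byte aligned just before a `call`, so at function entry (after the return address was pushed) rsp is 8 bytes off; every `push` and `sub rsp` in the prolog then adds to that. A small sketch of the check, with hypothetical names (`demo_aligned_at_call`, `DEMO_ENTRY_MISALIGN`):

```c
#include <stddef.h>

/* rsp mod 16 at function entry: the caller was aligned, then CALL pushed
   an 8-byte return address. */
enum { DEMO_ENTRY_MISALIGN = 8 };

/* Returns 1 if rsp is 16-byte aligned at the inner call, given the number
   of 8-byte pushes and the "sub rsp, N" amount in the prolog. */
int demo_aligned_at_call(size_t pushes, size_t sub_bytes)
{
    size_t used = DEMO_ENTRY_MISALIGN + pushes * 8 + sub_bytes;
    return used % 16 == 0;
}
```

Plugging in the numbers from the thread: two pushes plus `sub rsp, 98h` gives 8 + 16 + 152 = 176, and `sub rsp, 38h` with no pushes gives 8 + 56 = 64. Both are multiples of 16, so by this arithmetic alone the posted trampolines keep the stack aligned at the call.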
#6
I have created another test, where I can toggle between the C and the asm version with a global variable. This is how it looks in code
Code:
.text:000000014001C66A    mov     eax, cs:g_test
.text:000000014001C670    mov     rcx, [rcx+18h]
.text:000000014001C674    test    eax, eax
.text:000000014001C676    jnz     Sbie_InvokeSyscall5_asm
.text:000000014001C67C    jmp     Sbie_InvokeSyscall5
I checked again that Sbie_InvokeSyscall5_asm and Sbie_InvokeSyscall5 are binary identical, and they are. Still, toggling the variable breaks 32-bit apps. At this point it's just weird: the "calling" convention is the same and the functions are the same, yet the result is not, WTF :/
#7
Can you share the binaries?
#8
Sure: [s]https://www10.zippyshare.com...[/s] PDB included
The dispatch function that calls the asm or C versions is called Syscall_Invoke. It's called from a couple of places, but most relevantly from Syscall_Api_Invoke.
Last edited by DavidXanatos; 01-06-2022 at 17:15.
#9
I have found the solution: I needed to add the FRAME, .allocstack and .endprolog
So in the end it was some sort of alignment issue, or the compiler puts some additional relevant data somewhere else.
Code:
; NTSTATUS Sbie_InvokeSyscall_asm(void* func, ULONG count, void* args);
Sbie_InvokeSyscall_asm PROC FRAME

    ; prolog
    push    rsi
    .allocstack 8
    push    rdi
    .allocstack 8
    sub     rsp, 98h                    ; 8 * 19 - prepare enough stack for up to 19 arguments
    .allocstack 98h
    .endprolog

    ; quick sanity check
    cmp     rdx, 13h                    ; if count > 19
    jle     arg_count_ok
    mov     rax, 0C000001Ch             ; return STATUS_INVALID_SYSTEM_SERVICE
    jmp     func_return

arg_count_ok:
    ; save our 3 relevant arguments to spare registers
    mov     r11, r8                     ; args
    mov     r10, rdx                    ; count
    mov     rax, rcx                    ; func

    ; check if we have higher arguments and if not skip
    cmp     r10, 4
    jle     copy_reg_args

    ; copy arguments 5-19
    mov     rsi, r11                    ; source
    add     rsi, 20h
    mov     rdi, rsp                    ; destination
    add     rdi, 20h
    mov     rcx, r10                    ; arg count
    sub     rcx, 4                      ; skip the register passed args
    rep movsq

copy_reg_args:
    ; copy arguments 1-4
    mov     r9, qword ptr [r11+18h]
    mov     r8, qword ptr [r11+10h]
    mov     rdx, qword ptr [r11+08h]
    mov     rcx, qword ptr [r11+00h]

    ; call the function
    call    rax

func_return:
    ; epilog
    add     rsp, 98h
    pop     rdi
    pop     rsi
    ret

Sbie_InvokeSyscall_asm ENDP
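One plausible reading of why `PROC FRAME`, `.allocstack` and `.endprolog` matter (this is an assumption, the thread does not confirm the root cause): on x64 Windows a function with no unwind data is treated as a leaf function, so anything walking the stack expects its return address directly at `[rsp]`. A function that pushes registers and allocates stack needs unwind data so the walker can first undo those adjustments. The toy model below illustrates only that lookup; `demo_find_return_address` and `DEMO_FRAME_BYTES` are hypothetical names, and the "stack" is a plain array.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes the prolog above takes from the stack:
   push rsi (8) + push rdi (8) + sub rsp, 98h. */
enum { DEMO_FRAME_BYTES = 8 + 8 + 0x98 };

/* Toy unwinder step: 'stack' is indexed in 8-byte slots and 'rsp_index'
   is the slot rsp points at. With frame_bytes == 0 the function is
   treated as a leaf and the return address is read from [rsp]; otherwise
   the recorded allocation is skipped first. */
uint64_t demo_find_return_address(const uint64_t *stack, size_t rsp_index,
                                  size_t frame_bytes)
{
    return stack[rsp_index + frame_bytes / 8];
}
```

With the frame size declared, the lookup lands on the slot holding the real return address; with no frame information it reads whatever happens to sit at the top of the argument area instead.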
#10
Yes the calling convention on 64bit has oddities with 16 byte alignment on calls, shadow spaces needing to be reserved and such. With driver calls it's not surprising these become necessary. The ABI convention is slightly different between Windows and Linux as well.