Exetools  

#1  12-30-2021, 20:06  DavidXanatos
Family
 
Join Date: Jun 2018
Posts: 179
Rept. Given: 2
Rept. Rcvd 46 Times in 32 Posts
Thanks Given: 58
Thanks Rcvd at 350 Times in 116 Posts
DavidXanatos Reputation: 46
Invoking Win32k syscalls from kernel space

I have noticed that calling ntoskrnl.exe/ntdll.dll syscalls directly from the kernel works just fine, but doing the same for win32k.sys/win32u.dll syscalls fails when HVCI is enabled, with bug check 0x139 (KERNEL_SECURITY_CHECK_FAILURE) and the additional hint "Arg1: 0 - A stack-based buffer has been overrun".
This is strange: why would it work with HVCI enabled for the kernel's own syscalls but not for the Win32k ones? That is problem 1.

Another issue is that KiConvertToGuiThread must be triggered before the first Win32k syscall is made; unfortunately, neither it nor PsConvertToGuiThread is exported by the kernel. That is problem 2.

My final goal is to provide a syscall interface in my driver that an application can call: the driver would then do something in kernel space, invoke the actual syscall, do something else, and finally return control to the user-mode application.
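The flow described above (look up the requested service, do something first, call the real service, do something afterwards, return its status) can be modeled in a few lines of user-mode C. This is a hypothetical illustration, not Sandboxie's implementation: `service_entry`, `dispatch`, the pre/post hook fields and `NTSTATUS_T` are names made up here, and the variable-arity call that the rest of the thread deals with is replaced by simply handing the argument array to the service.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t NTSTATUS_T;                     /* stand-in for NTSTATUS */
#define STATUS_INVALID_SYSTEM_SERVICE_T ((NTSTATUS_T)0xC000001Cu)

/* Hypothetical service signature: arguments arrive as an array. */
typedef NTSTATUS_T (*service_fn)(const uintptr_t *args, size_t count);

typedef struct {
    service_fn invoke;                      /* the "original" service     */
    size_t     arg_count;                   /* arguments it expects       */
    void       (*pre)(uintptr_t *args);     /* optional hook, may be NULL */
    NTSTATUS_T (*post)(NTSTATUS_T status);  /* optional hook, may be NULL */
} service_entry;

/* Validate the request, run the pre-hook, call the real service,
 * run the post-hook and hand the status back to the caller. */
NTSTATUS_T dispatch(const service_entry *table, size_t table_len,
                    size_t index, uintptr_t *args, size_t count)
{
    assert(table != NULL || table_len == 0);
    if (index >= table_len || count != table[index].arg_count)
        return STATUS_INVALID_SYSTEM_SERVICE_T;

    const service_entry *e = &table[index];
    if (e->pre)
        e->pre(args);                         /* "do something" before     */
    NTSTATUS_T status = e->invoke(args, count);
    if (e->post)
        status = e->post(status);             /* "do something else" after */
    return status;
}

/* Hypothetical example service and hook for the usage below. */
static NTSTATUS_T example_sum(const uintptr_t *args, size_t count)
{
    uintptr_t s = 0;
    for (size_t i = 0; i < count; i++)
        s += args[i];
    return (NTSTATUS_T)s;
}

static void example_double_first(uintptr_t *args) { args[0] *= 2; }
```

With a one-entry table `{ example_sum, 3, example_double_first, NULL }` and arguments `{1, 2, 3}`, the pre-hook doubles the first argument and `dispatch` returns 7; a wrong index or a wrong argument count is rejected before anything is called.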

Problem 2 can safely be ignored, as other, non-redirected Win32k syscalls will already have been executed at that point, and with them the conversion to a GUI thread.

Problem 1, however, urgently needs fixing, and I'm out of ideas as to why the new set of syscalls does not work with HVCI enabled. The redirection is implemented in exactly the same way as the old, working one, and without HVCI it works just fine.

Does anyone here know what's going on and how to fix it?

IMHO the best option would be a way to actually invoke the original syscall from the kernel such that it does everything, including KiConvertToGuiThread, but I'm not sure that is even possible.

Cheers
David

#2  01-01-2022, 00:50  DavidXanatos
After some debugging and reading
https://www.crowdstrike.com/blog/state-of-exploit-development-part-1/
and
https://www.crowdstrike.com/blog/state-of-exploit-development-part-2/
I found the solution. It was quite trivial: I just had to disable Control Flow Guard (/guard:cf) for the one source file making these calls, LOL.

Of course, a better solution would be a hand-crafted trampoline instead, but well... sometimes it's efficient to be lazy.
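If disabling Control Flow Guard for a whole source file feels too broad, MSVC also accepts `__declspec(guard(nocf))` on an individual function, which tells the compiler not to insert CFG checks for the indirect calls made inside that function. A minimal sketch, assuming the five-argument, pointer-sized calling pattern used elsewhere in this thread; `invoke5_nocfg`, `service5_fn` and `example5` are hypothetical names, and whether this alone avoids the 0x139 bug check on a given HVCI system is an assumption that is not verified here.

```c
#include <assert.h>
#include <stdint.h>

/* MSVC: exempt one function's indirect calls from /guard:cf checks.
 * Other compilers don't know the attribute, so it expands to nothing. */
#if defined(_MSC_VER)
#define NO_CFG_CHECKS __declspec(guard(nocf))
#else
#define NO_CFG_CHECKS
#endif

typedef long (*service5_fn)(uintptr_t, uintptr_t, uintptr_t,
                            uintptr_t, uintptr_t);

/* Only the indirect call in this function is left uninstrumented;
 * the rest of the file keeps its CFG checks. */
NO_CFG_CHECKS long invoke5_nocfg(service5_fn fn, const uintptr_t *args)
{
    return fn(args[0], args[1], args[2], args[3], args[4]);
}

/* Hypothetical call target for the usage example. */
static long example5(uintptr_t a, uintptr_t b, uintptr_t c,
                     uintptr_t d, uintptr_t e)
{
    return (long)(a + b + c + d + e);
}
```

Calling `invoke5_nocfg(example5, v)` with `v = {1, 2, 3, 4, 5}` returns 15. The design choice is scope: the exemption is limited to the one function that performs the unusual indirect call instead of the whole translation unit.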

#3  01-01-2022, 02:53  DavidXanatos
And here is the non-lazy solution:

do_call.asm
Code:
.code

;----------------------------------------------------------------------------

ifdef _WIN64

Sbie_InvokeSyscall_asm PROC

     mov         qword ptr [rsp+20h], r9  
     mov         qword ptr [rsp+18h], r8  
     mov         qword ptr [rsp+10h], rdx  
     mov         qword ptr [rsp+8], rcx 
     
     ; note: (count & 0x0F) + 4 = 19 arguments are the absolute maximum

     ; quick sanity check
     cmp         rdx, 13h ; if count > 19
     jle         arg_count_ok
     mov         rax, 0C000001Ch ; return STATUS_INVALID_SYSTEM_SERVICE
     ret
arg_count_ok:

     push        rsi
     push        rdi
     ; prepare enough stack for up to 19 arguments
     sub         rsp, 98h  
     
     ; save our 3 relevant arguments to spare registers
     mov         r11, r8  ; args
     mov         r10, rdx ; count
     mov         rax, rcx ; func

     ; check if we have higher arguments and if not skip 
     cmp         r10, 4
     jle         copy_reg_args
     ; copy arguments 5-19
     mov         rsi, r11 ; source
     add         rsi, 20h
     mov         rdi, rsp ; destination
     add         rdi, 20h
     mov         rcx, r10 ; arg count
     sub         rcx, 4   ; skip the register passed args
     rep movsq

copy_reg_args:
     ; copy arguments 1-4
     mov         r9,  qword ptr [r11+18h]
     mov         r8,  qword ptr [r11+10h]
     mov         rdx, qword ptr [r11+08h]
     mov         rcx, qword ptr [r11+00h]

     ; call the function
     call        rax

     ; clear stack
     add         rsp, 98h  
     pop         rdi
     pop         rsi

     ret  

Sbie_InvokeSyscall_asm ENDP

else

_Sbie_InvokeSyscall_asm@12 PROC

     ; NTSTATUS Sbie_InvokeSyscall_asm(void* func, int count, void* args);

     ; quick sanity check
     cmp         dword ptr [esp+04h+4h], 13h ; @count
     jle         args_ok
     mov         eax, 0C000001Ch ; return STATUS_INVALID_SYSTEM_SERVICE
     ret         0Ch ; stdcall (@12): the callee pops its 3 arguments
args_ok:

     ; prepare enough stack for up to 19 arguments
     push        ebp  
     push        esi
     push        edi
     mov         ebp, esp  
     sub         esp, 4Ch

     ; copy arguments 1-19 (on x86 all of them are passed on the stack)
     mov         esi, dword ptr [ebp+10h+8h] ; source @args
     mov         edi, esp ; destination
     mov         ecx, dword ptr [ebp+10h+4h] ; arg count @count
     rep movsd

     ; call the function
     mov         eax, dword ptr [ebp+10h+0h] ; @func
     call        eax

     ; clear stack
function_end:

     mov         esp,ebp  
     pop         edi
     pop         esi
     pop         ebp
     ret         0Ch ; stdcall (@12): the callee pops its 3 arguments

_Sbie_InvokeSyscall_asm@12 ENDP
PUBLIC _Sbie_InvokeSyscall_asm@12

endif

;----------------------------------------------------------------------------

end
test.cpp:
Code:
#include <stdio.h>

typedef long NTSTATUS; // user-mode test: normally provided by the WDK/SDK headers

NTSTATUS Test4(int arg1, int arg2, int arg3, int arg4)
{
    printf("arg1: %x, arg2: %x, arg3: %x, arg4: %x\r\n", arg1, arg2, arg3, arg4);
    return 0;
}

NTSTATUS Test8(int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7, int arg8)
{
    printf("arg1: %x, arg2: %x, arg3: %x, arg4: %x; arg5: %x, arg6: %x, arg7: %x, arg8: %x\r\n", arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8);
    return 0;
}

NTSTATUS Test19(int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7, int arg8, int arg9, int arg10, int arg11, int arg12, int arg13, int arg14, int arg15, int arg16, int arg17, int arg18, int arg19)
{
    return 123;
}

extern "C" NTSTATUS __stdcall Sbie_InvokeSyscall_asm(void* func, int count, void* args); // __stdcall matches the x86 @12 symbol and is ignored on x64

int main(int argc, char *argv[])
{
#ifdef _WIN64
    __int64 stack[19];
#else
    int stack[19];
#endif
    for (int i = 0; i < 19; i++)
        stack[i] = i + 1;

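    Sbie_InvokeSyscall_asm((void*)Test4, 4, stack);

    Sbie_InvokeSyscall_asm((void*)Test8, 8, stack);

    printf("Test19 returned %ld\r\n", Sbie_InvokeSyscall_asm((void*)Test19, 19, stack)); // expect 123

    // 20 > 19: rejected by the sanity check with STATUS_INVALID_SYSTEM_SERVICE (0xC000001C)
    printf("20 args returned %lx\r\n", (unsigned long)Sbie_InvokeSyscall_asm((void*)Test19, 20, stack));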

    return 0;
}
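For comparison, the same contract can be written in plain C, in the style of the `Sbie_InvokeSyscall5` helper that appears later in this thread; with MSVC on x64 the compiler then generates the prologue, the stack reservation and the unwind data itself. This is a sketch limited to 0-6 arguments (the asm handles up to 19 and the pattern simply continues), assuming every argument is pointer-sized; `invoke_c`, the `svcN` typedefs and the `sum3`/`sum6` targets are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#ifndef STATUS_INVALID_SYSTEM_SERVICE
#define STATUS_INVALID_SYSTEM_SERVICE ((int32_t)0xC000001Cu)
#endif

/* One function-pointer type per argument count. */
typedef int32_t (*svc0)(void);
typedef int32_t (*svc1)(uintptr_t);
typedef int32_t (*svc2)(uintptr_t, uintptr_t);
typedef int32_t (*svc3)(uintptr_t, uintptr_t, uintptr_t);
typedef int32_t (*svc4)(uintptr_t, uintptr_t, uintptr_t, uintptr_t);
typedef int32_t (*svc5)(uintptr_t, uintptr_t, uintptr_t, uintptr_t, uintptr_t);
typedef int32_t (*svc6)(uintptr_t, uintptr_t, uintptr_t, uintptr_t, uintptr_t,
                        uintptr_t);

/* Calls func with count arguments taken from a[]. func must really have
 * count pointer-sized parameters; it is passed as a generic function
 * pointer and cast back to the matching type before the call. */
int32_t invoke_c(void (*func)(void), uint32_t count, const uintptr_t *a)
{
    switch (count) {
    case 0: return ((svc0)func)();
    case 1: return ((svc1)func)(a[0]);
    case 2: return ((svc2)func)(a[0], a[1]);
    case 3: return ((svc3)func)(a[0], a[1], a[2]);
    case 4: return ((svc4)func)(a[0], a[1], a[2], a[3]);
    case 5: return ((svc5)func)(a[0], a[1], a[2], a[3], a[4]);
    case 6: return ((svc6)func)(a[0], a[1], a[2], a[3], a[4], a[5]);
    default: return STATUS_INVALID_SYSTEM_SERVICE; /* same rejection as the asm */
    }
}

/* Hypothetical targets for the usage example. */
static int32_t sum3(uintptr_t x, uintptr_t y, uintptr_t z)
{
    return (int32_t)(x + y + z);
}

static int32_t sum6(uintptr_t p, uintptr_t q, uintptr_t r,
                    uintptr_t s, uintptr_t t, uintptr_t u)
{
    return (int32_t)(p + q + r + s + t + u);
}
```

The cost is one `case` per arity, but each call site is ordinary compiler output, so the shadow space, stack alignment and prologue metadata are handled by the toolchain rather than by hand.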

Last edited by DavidXanatos; 01-01-2022 at 05:42.

#4  01-05-2022, 06:33  DavidXanatos
I ran into a really strange issue: on 64-bit Windows, the above code works fine when the app is 64-bit, but for some seemingly impossible reason it fails when the app is 32-bit and running under WOW64.

As far as I understand, this should not be possible: under WOW64 the 32-to-64-bit translation is done in user mode, and the calls into the kernel are all 64-bit, so from the kernel's point of view everything is 64-bit.

I have monkeyed around with the code and narrowed the issue down to calls with 4 or fewer arguments and calls with exactly 5 arguments; for 6 or more, the code works fine.
For 0-4 arguments I crafted different code that seems to work fine:

Code:
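Sbie_InvokeSyscall4_asm PROC
 ; NTSTATUS Sbie_InvokeSyscall4_asm(void* func, void* args); rcx = func, rdx = args
 ; tail call via jmp: rsp and the non-volatile registers stay untouched,
 ; so the callee returns directly to our caller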

 mov         r11, rdx ; args
 mov         rax, rcx ; func

 mov         r9,  qword ptr [r11+18h]
 mov         r8,  qword ptr [r11+10h]
 mov         rdx, qword ptr [r11+08h]
 mov         rcx, qword ptr [r11+00h]

 jmp         rax

Sbie_InvokeSyscall4_asm ENDP
But for 5 arguments, something is broken somehow.

When I use C code to implement a caller for the 5-argument case,

Code:
typedef NTSTATUS (*P_SystemService05)(
ULONG_PTR arg01, ULONG_PTR arg02, ULONG_PTR arg03, ULONG_PTR arg04,
ULONG_PTR arg05);

_FX NTSTATUS Sbie_InvokeSyscall5(void* func, ULONG_PTR *stack) {

P_SystemService05 nt = (P_SystemService05)func;
return nt(stack[0], stack[1], stack[2], stack[3], stack[4]);

}
it works just fine. But when I load the driver into IDA, copy the disassembly of this function, and put it into an ASM file

Code:
Sbie_InvokeSyscall5_asm PROC

sub rsp, 38h
mov rax, [rdx+20h]
mov r10, rdx
mov r9, [rdx+18h]
mov r11, rcx
mov r8, [rdx+10h]
mov rdx, [rdx+8]
mov rcx, [r10]
mov [rsp+38h-18h], rax
call r11
add rsp, 38h
ret

Sbie_InvokeSyscall5_asm ENDP
and use that instead, 32-bit apps fail again.

And yes, I have looked at the driver with the asm version in IDA, and the function is byte-for-byte identical to the C compiler's output. I even tried adding some int 3 padding before and after it so that it is aligned the same way, with no result.

The only apparent difference is that it is located at a different place in the driver image file.

The way the 32-bit apps fail is also quite peculiar: they all fail with an access violation at address 0x0...0 and an empty stack trace. Somehow the return from the syscall gets messed up, and I have no idea how.

#5  01-05-2022, 14:54  chants
VIP
 
Join Date: Jul 2016
Posts: 725
Rept. Given: 35
Rept. Rcvd 48 Times in 30 Posts
Thanks Given: 666
Thanks Rcvd at 1,050 Times in 475 Posts
chants Reputation: 48
Sounds like details of calling conventions. Perhaps the stack isn't aligned to 16 bytes in the driver, or something like that. I suspect the driver code more than the app code here. WoW64 is just amplifying a preexisting bug that is perhaps extraordinarily unlikely or impossible on native 64-bit.

The app is easy to debug anyway: you can check the stack and registers at the time of the call and the return value immediately after. The driver is much more troublesome to debug, and for that reason it's always where the bugs end up, just to inconvenience us.

Anyway it would be interesting to know the details of this if you figure it out.

#6  01-05-2022, 17:42  DavidXanatos
I have created another test, where I can toggle between the C and the asm version with a global variable. This is how it looks in the code:
Code:
.text:000000014001C66A                 mov     eax, cs:g_test
.text:000000014001C670                 mov     rcx, [rcx+18h]
.text:000000014001C674                 test    eax, eax
.text:000000014001C676                 jnz     Sbie_InvokeSyscall5_asm
.text:000000014001C67C                 jmp     Sbie_InvokeSyscall5
The dispatch function is compiled with optimizations, and as it does not need any local variables, the compiler turned the calls into tail jumps (jnz/jmp instead of call).
I checked again whether Sbie_InvokeSyscall5_asm and Sbie_InvokeSyscall5 are binary-identical, and they are.
Still, toggling the variable breaks 32-bit apps.

At this point it's just weird: the calling convention is the same and the functions are the same, yet the result is not. WTF :/

#7  01-05-2022, 17:54  deepzero
VIP
 
Join Date: Mar 2010
Location: Germany
Posts: 300
Rept. Given: 111
Rept. Rcvd 64 Times in 42 Posts
Thanks Given: 178
Thanks Rcvd at 215 Times in 92 Posts
deepzero Reputation: 64
Can you share the binaries?

#8  01-05-2022, 18:22  DavidXanatos
Sure: [s]https://www10.zippyshare.com...[/s] PDB included
The dispatch function that calls the asm or C version is called Syscall_Invoke; it is called from a couple of places, most relevantly from Syscall_Api_Invoke.

Last edited by DavidXanatos; 01-06-2022 at 17:15.

#9  01-05-2022, 20:46  DavidXanatos
I have found the solution: I needed to add the FRAME attribute and the .allocstack and .endprolog directives.

So in the end it was some sort of alignment issue, or the compiler puts some additional data somewhere else that was relevant.

Code:
; NTSTATUS Sbie_InvokeSyscall_asm(void* func, ULONG count, void* args);
Sbie_InvokeSyscall_asm PROC FRAME

     ; prolog
     push        rsi
     .pushreg    rsi ; non-volatile register: record it as a push, not as a plain allocation
     push        rdi
     .pushreg    rdi
     sub         rsp, 98h ; 8 * 19 - prepare enough stack for up to 19 arguments
     .allocstack 98h
     .endprolog
     
     ; quick sanity check
     cmp         rdx, 13h ; if count > 19
     jle         arg_count_ok
     mov         rax, 0C000001Ch ; return STATUS_INVALID_SYSTEM_SERVICE
     jmp         func_return
arg_count_ok:

     ; save our 3 relevant arguments to spare registers
     mov         r11, r8  ; args
     mov         r10, rdx ; count
     mov         rax, rcx ; func

     ; check if we have higher arguments and if not skip 
     cmp         r10, 4
     jle         copy_reg_args
     ; copy arguments 5-19
     mov         rsi, r11 ; source
     add         rsi, 20h
     mov         rdi, rsp ; destination
     add         rdi, 20h
     mov         rcx, r10 ; arg count
     sub         rcx, 4   ; skip the register passed args
     rep movsq

copy_reg_args:
     ; copy arguments 1-4
     mov         r9,  qword ptr [r11+18h]
     mov         r8,  qword ptr [r11+10h]
     mov         rdx, qword ptr [r11+08h]
     mov         rcx, qword ptr [r11+00h]

     ; call the function
     call        rax

func_return:
     ; epilog
     add         rsp, 98h  
     pop         rdi
     pop         rsi

     ret  

Sbie_InvokeSyscall_asm ENDP
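The FRAME attribute and the prologue directives make MASM emit x64 unwind data (the .pdata/.xdata records the C compiler produces for its own non-leaf functions). That is most likely the "additional data somewhere else": two byte-identical functions can still differ in whether such records exist for them. The small C model below, with hypothetical helper names, encodes the UNWIND_CODE slots for a `push rsi; push rdi; sub rsp, N` prologue following the documented x64 unwind format. It also shows why the pushes are better described with `.pushreg rsi` / `.pushreg rdi`: a PUSH_NONVOL code records which register to restore, whereas `.allocstack 8` only records an 8-byte allocation, so an unwinder would adjust rsp but not recover rsi and rdi.

```c
#include <assert.h>
#include <stdint.h>

/* Windows x64 unwind operation codes (subset) and register numbers. */
enum { UWOP_PUSH_NONVOL = 0, UWOP_ALLOC_LARGE = 1, UWOP_ALLOC_SMALL = 2 };
enum { REG_RSI = 6, REG_RDI = 7 };

/* One 16-bit UNWIND_CODE slot: low byte = offset of the end of the
 * prologue instruction, high byte = UnwindOp (low nibble) | OpInfo (high). */
static uint16_t uw_slot(uint8_t prolog_offset, uint8_t op, uint8_t info)
{
    return (uint16_t)(prolog_offset | ((op | (info << 4)) << 8));
}

/* Hypothetical model of the codes for:
 *     push rsi          ; .pushreg rsi
 *     push rdi          ; .pushreg rdi
 *     sub  rsp, alloc   ; .allocstack alloc
 * Returns the number of slots written to out[]. */
static int build_unwind_codes(uint32_t alloc, uint16_t out[4])
{
    assert(alloc >= 8 && alloc % 8 == 0 && alloc <= 512 * 1024 - 8);
    /* push rsi / push rdi are 1 byte each; "sub rsp, imm" is 4 bytes with
     * an 8-bit immediate (alloc <= 127) and 7 bytes with a 32-bit one. */
    uint8_t end_of_sub = (uint8_t)(2 + (alloc <= 127 ? 4 : 7));
    int n = 0;

    /* Codes are stored in reverse prologue order: last instruction first. */
    if (alloc <= 128) {
        out[n++] = uw_slot(end_of_sub, UWOP_ALLOC_SMALL, (uint8_t)(alloc / 8 - 1));
    } else {
        out[n++] = uw_slot(end_of_sub, UWOP_ALLOC_LARGE, 0);
        out[n++] = (uint16_t)(alloc / 8);   /* size / 8 in the next slot */
    }
    out[n++] = uw_slot(2, UWOP_PUSH_NONVOL, REG_RDI);
    out[n++] = uw_slot(1, UWOP_PUSH_NONVOL, REG_RSI);
    return n;
}
```

For `sub rsp, 98h` this yields four slots: ALLOC_LARGE at prologue offset 9 followed by 19 (0x98 / 8), then PUSH_NONVOL rdi at offset 2 and PUSH_NONVOL rsi at offset 1.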

#10  01-06-2022, 04:09  chants
Yes, the 64-bit calling convention has oddities: 16-byte stack alignment at calls, shadow space that needs to be reserved, and so on. With driver calls it's not surprising that these become necessary. The ABI is also slightly different between Windows and Linux.

Quote:
https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170
Quote:
Calling convention defaults
The x64 Application Binary Interface (ABI) uses a four-register fast-call calling convention by default. Space is allocated on the call stack as a shadow store for callees to save those registers.

Alignment
Most structures are aligned to their natural alignment. The primary exceptions are the stack pointer and malloc or alloca memory, which are 16-byte aligned to aid performance. Alignment above 16 bytes must be done manually. Since 16 bytes is a common alignment size for XMM operations, this value should work for most code. For more information about structure layout and alignment, see Types and Storage. For information about the stack layout, see x64 stack usage.

The callee is responsible for dumping the register parameters into their shadow space if needed.
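The two rules quoted above (shadow space for the four register arguments, and a 16-byte aligned stack pointer at each call) determine how much a hand-written prologue has to reserve. The helper below is a hypothetical illustration that computes the minimum `sub rsp` for a function that pushes some non-volatile registers and then calls a function with a given number of arguments. For `push rsi; push rdi` and up to 19 arguments it gives exactly the 98h used in the asm above, and for 5 arguments with no pushes it gives 28h; the compiler-generated `Sbie_InvokeSyscall5` reserves 38h, which is also correctly aligned, since compilers may reserve more than the minimum.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal "sub rsp, N" for a Windows x64 function that pushes n_pushes
 * 8-byte registers and then calls a function with max_args arguments.
 *  - 32 bytes of shadow (home) space are always reserved for the callee;
 *  - arguments 5 and up go on the stack right above the shadow space;
 *  - rsp must be 16-byte aligned at the call. On entry it is 8 mod 16,
 *    because the caller's call instruction pushed the return address. */
static uint32_t min_stack_alloc(uint32_t max_args, uint32_t n_pushes)
{
    uint32_t alloc = 32 + (max_args > 4 ? (max_args - 4) * 8 : 0);
    /* bytes between the caller's aligned rsp and ours after the sub */
    uint32_t below_aligned = 8 + 8 * n_pushes + alloc;
    if (below_aligned % 16 != 0)
        alloc += 8;                     /* pad to restore 16-byte alignment */
    return alloc;
}
```

Note that even a call with no arguments still needs the 32-byte shadow space plus 8 bytes of padding, which is why 28h shows up so often in compiled x64 code.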

All times are GMT +8.


Always Your Best Friend: Aaron, JMI, ahmadmansoor, ZeNiX, chessgod101
( 1998 - 2024 )