#1
Windows on Arm64, x86/x64 emulation
I'm wondering how x86/x64 emulation on arm64 works. Is the image loaded into memory the original x86/x64 one, with just a transpiled arm64 shadow copy alongside it, or is the code transpiled beforehand so that the loaded image is arm64 only?
I don't have any arm hardware yet, but I am already thinking about how function hooks in x86/x64 code emulated on arm64 would, or maybe wouldn't, work. Does anyone here already have some experience and could shed some light on it?
The Following User Says Thank You to DavidXanatos For This Useful Post: niculaita (01-25-2022)
#2
I have found some information: https://blogs.blackberry.com/en/2019/09/teardown-windows-10-on-arm-x86-emulation
So it seems that what lives in memory is the x86/x64 image, and it is only transpiled piece by piece on demand. I hope a call to FlushInstructionCache will invalidate the cached translated result ... EDIT: now I just need a good device to experiment on, any suggestions?
#3
I was about to post that link, but apparently I'm late and you found it. This one also has good information: https://wbenny.github.io/2018/11/04/wow64-internals.html
There is also this fantastic CODE BLUE talk, which has juicy details:
- https://www.slideshare.net/ffri/appearances-are-deceiving-novel-offensive-techniques-in-windows-1011-on-arm-250472833
- https://www.youtube.com/watch?v=amHAot3X8cE
The Following User Says Thank You to sh3dow For This Useful Post: DavidXanatos (02-12-2022)
#4
You can try to run ARM64 Windows on QEMU.
I'm not sure about the capability of running x86/x64 apps on ARM64 Windows in that setup, though.
#5
@sh3dow thanks
@kino0924 emulating arm64 on x64, which in turn emulates x86/x64, sounds painfully slow. I'm probably better off trying to run it natively on an old Raspberry Pi.
#6
Also interesting: https://oofhours.com/2021/02/19/running-x64-on-windows-10-arm64-how-the-heck-does-that-work/
#7
So, some things I learned about x64 on arm64: it seems MSFT went all in to provide good interoperability between x64 and arm64 code.
They have introduced a new type of PE, the so-called CHPEv2 (Compiled Hybrid Portable Executable), which can contain both x64 and arm64 code. As far as I understand, in practice this is mostly arm64 code with x64 entry points. These are called ARM64EC in the VS 2019/2022 toolchain.
An executable compiled as x64 can load ARM64EC DLLs and call them normally. The x64 wrappers have some (as far as I can tell) dummy prologs, large enough to install x64 hooks. The intended use case for this is loading system libraries from system32, which on arm64 are all provided as such CHPEs, so there is no C:\Windows\SysX8664 or the like. So to say, 64 bit is 64 bit, no matter if it's ARM or AMD, and there are no separate registry paths either. It's all thoroughly mixed together, unlike SysWOW64 or SysArm32, which both get their own system directory and their own registry redirection.
An executable compiled as ARM64EC can load x64 DLLs just fine; I haven't looked into hookability in this scenario yet (TODO). The intended use case is to allow developers to port part of their application to arm64 and keep the rest x64 for the time being, as well as to provide compatibility with x64 plugins and extensions (according to MSFT's docs).
So technically the x64-on-arm thing is its own feature and should not be confused with another version or an extension of WOW. It's a separate interoperability feature, one which works quite differently. Unfortunately I don't know if MSFT gave it a name. Based on some DLL and service names it's probably called XTA, which would probably stand for "x86_x64 to ARM" or something like that. XTA is a just-in-time compiler that converts x64 or x86 code to arm64 when needed, and the arm64 code is what then gets executed. The loaded binary image in memory stays untouched x64/x86, and hooking any portion of it seems to work just fine. Of course, FlushInstructionCache is probably particularly important to ensure the XTA cache gets updated.
When running an x86 application on an arm64 machine, both WOW and XTA in fact seem to work together. While the SysWOW64 directory contains only x86 DLLs, there is another one called SyChpe32 that contains something like ARM64EC, just in 32 bit, so ARMEC (?). Unfortunately MSFT did not provide a toolchain to create such binaries ourselves. There we have the most commonly used system DLLs in such a hybrid format. So WOW takes care of the syscall translation from 32 bit to 64 bit close to the kernel, be that on arm64 or on x64, plus filesystem and registry redirection, while XTA takes care of the transition from emulated to native code, as close as possible to the loaded user code. In between lives native arm/arm64 code.
When running x64 on arm64, only XTA is active and no WOW is in place; a call to IsWow64Process2 confirms that an x64 application running on arm64 does so without WOW. Interestingly, when querying NtQuerySystemInformationEx(SystemSupportedProcessorArchitectures, ...), ARM64EC binaries give the same result as x64 binaries on arm64. Also, when debugging ARM64EC binaries you need to use the x64 debugger; there is no dedicated ARM64EC one.
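For anyone who wants to repeat the IsWow64Process2 check, here is a minimal Python/ctypes sketch. It is an illustration only, not the code used above: the helper names describe_wow64 and query_current_process are made up, the machine constants are the ones from winnt.h, and the actual API call only runs on a Windows build that exports IsWow64Process2 from kernel32. The interpretation relies on the documented behaviour that the process machine comes back as IMAGE_FILE_MACHINE_UNKNOWN when the process is not running under WOW64.

```python
import ctypes
import sys

# IMAGE_FILE_MACHINE_* values as defined in winnt.h.
IMAGE_FILE_MACHINE_UNKNOWN = 0x0000
IMAGE_FILE_MACHINE_I386 = 0x014C
IMAGE_FILE_MACHINE_ARMNT = 0x01C4
IMAGE_FILE_MACHINE_AMD64 = 0x8664
IMAGE_FILE_MACHINE_ARM64 = 0xAA64

MACHINE_NAMES = {
    IMAGE_FILE_MACHINE_I386: "x86",
    IMAGE_FILE_MACHINE_ARMNT: "ARM32",
    IMAGE_FILE_MACHINE_AMD64: "x64",
    IMAGE_FILE_MACHINE_ARM64: "ARM64",
}


def describe_wow64(process_machine, native_machine):
    """Interpret the two output values of IsWow64Process2 (hypothetical helper)."""
    native = MACHINE_NAMES.get(native_machine, hex(native_machine))
    if process_machine == IMAGE_FILE_MACHINE_UNKNOWN:
        # UNKNOWN means the process is not a WOW64 process.
        return f"not under WOW64 (native machine: {native})"
    proc = MACHINE_NAMES.get(process_machine, hex(process_machine))
    return f"{proc} process under WOW64 on {native}"


def query_current_process():
    """Call IsWow64Process2 for the current process; returns None off Windows."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetCurrentProcess.restype = ctypes.c_void_p
    kernel32.IsWow64Process2.argtypes = [
        ctypes.c_void_p,
        ctypes.POINTER(ctypes.c_ushort),
        ctypes.POINTER(ctypes.c_ushort),
    ]
    kernel32.IsWow64Process2.restype = ctypes.c_int
    process_machine = ctypes.c_ushort()
    native_machine = ctypes.c_ushort()
    ok = kernel32.IsWow64Process2(
        kernel32.GetCurrentProcess(),
        ctypes.byref(process_machine),
        ctypes.byref(native_machine),
    )
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())
    return process_machine.value, native_machine.value


if __name__ == "__main__":
    result = query_current_process()
    if result is None:
        print("IsWow64Process2 is only available on Windows")
    else:
        print(describe_wow64(*result))
```

If the observation above holds, an x64 Python interpreter on an arm64 machine should report that it is not under WOW64 with ARM64 as the native machine, while an x86 interpreter should show up as an x86 process under WOW64.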
The Following User Says Thank You to DavidXanatos For This Useful Post: niculaita (02-15-2022)
#8
DavidXanatos
Well, actually there are two kinds of CHPE2 (perhaps it would be better to say "CHPE64"): ARM64EC and ARM64X. The first, as you already said, is only for emulation under a different platform, and the name speaks for itself: EC stands for Emulation Compatible. The second, ARM64X, is a pure chameleon: it contains code for both ARM64 and x64 execution at the same time and is used in various system components (examples can be found in system32 of Windows for arm64). You can create the first kind yourself in a recent Visual Studio; for the second I have not come across a public toolset yet.
How can the same ARM64X PE serve different architectures? The key is a new type of DVRT entry (Dynamic Value Relocation Table, which can be found via IMAGE_LOAD_CONFIG_DIRECTORY): IMAGE_DYNAMIC_RELOCATION_ARM64X. Following the specified settings, the loader simply patches the mapped image for x64, namely the headers and the RVAs of the import directory, export directory, exception data table, etc. The result looks as if it were a real x64 image. And now... the patched image becomes ARM64EC.
By the way, almost everything described above can be demonstrated with my program PEAnatomist.
The Following User Says Thank You to RamMerLabs For This Useful Post: DavidXanatos (02-15-2022)
#9
Interesting, I was under the impression they were both the same: ARM64 code with provisions to
a) call x64 code and b) be callable from x64 code. So the ARM64X version contains full code for both platforms and hence would be runnable on x64 Windows as well?
#10
No, natively this is an arm64 image, and the conversion takes place only to x64, not backward.
AFAIK, the Windows x64 loader does not process the DVRT's IMAGE_DYNAMIC_RELOCATION_ARM64X entries. (The name of the definition speaks for itself, doesn't it?)
#11
So how is ARM64X different from a DLL compiled as ARM64EC?
#12
ARM64X contains both x64 and arm64 executable code (though really, portions of the arm64 code are marked as arm64ec), imports, exports, a load config (with a DVRT, which arm64ec doesn't have), etc. It has IMAGE_FILE_MACHINE_ARM64 in FILE_HEADER and can be converted by the loader to IMAGE_FILE_MACHINE_AMD64 on the fly if needed.
It is not difficult to find and download WinARM64 builds (21277 and up) to see it for yourself.
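To check the FILE_HEADER claim on a DLL pulled from such a build, something like the following Python sketch should be enough. It is a rough illustration under stated assumptions: pe_machine is a made-up helper name, it only looks at the raw file on disk (so it sees the header before any loader patching), and it does not touch the load config or the DVRT at all.

```python
import struct

# IMAGE_FILE_MACHINE_* values as defined in winnt.h.
IMAGE_FILE_MACHINE_I386 = 0x014C
IMAGE_FILE_MACHINE_AMD64 = 0x8664
IMAGE_FILE_MACHINE_ARM64 = 0xAA64


def pe_machine(data: bytes) -> int:
    """Return FILE_HEADER.Machine from the raw bytes of a PE file (hypothetical helper)."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew at offset 0x3C of the DOS header points to the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if len(data) < e_lfanew + 6:
        raise ValueError("file too short for a PE header")
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # Machine is the first 16-bit field of IMAGE_FILE_HEADER, right after the signature.
    (machine,) = struct.unpack_from("<H", data, e_lfanew + 4)
    return machine
```

Usage would be along the lines of `pe_machine(open(path, "rb").read()) == IMAGE_FILE_MACHINE_ARM64`, where `path` is whatever DLL you want to inspect. Going by the description above, an ARM64X system DLL should report ARM64 here even though x64 processes can load it.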
#13
So they have a new type of relocation that "relocates" the RVAs of imports etc. depending on arm vs x64? That's actually pretty cool; I am now only half-mad that they introduced yet another relocation type...
#14
deepzero
I think "relocates" is not quite the right word; it's better to say it redirects entirely to another table. They came up with the DVRT a few years ago to counter Spectre/Meltdown, similar to Retpoline. The original goal has almost disappeared, but now there are new goals. And IMAGE_DYNAMIC_RELOCATION_ARM64X is no longer the latest relocation type; there will be another one soon (something about hot-patching function bodies).
Last edited by RamMerLabs; 02-15-2022 at 06:38.
#15
I have been experimenting with my HackLib, doing code injection into/from Arm64 processes, and noticed something unexpected but in hindsight logical: ARM64EC processes are more x64 than arm, even if most of their compiled code is arm.
By that I mean: when I hijack the main thread before it resumes and point it to a piece of shellcode of mine, the shellcode must be x64 in order to work. The funny thing is that my shellcode loads a DLL that then hooks various functions in x64, for demo purposes for example MessageBoxW, altering its title, and all of that works as expected. The ARM64EC app, whose main and other user-written functions are ARM64 code, invoked the hooked x64 version just fine when calling MessageBoxW.
To be honest, I would have expected a shortcut here where the arm code calls system APIs directly, but no: the control flow goes from Arm64 to the x64 stub, which can be hooked, and then back to Arm64 code in the system user32.dll. That is pretty nice. And it is in a way logical that a process that is supposed to be able to load native x64 DLLs would, for ideal compatibility, have provisions that let such a DLL hook functions consistently, even if that means a few more detours are taken.
Now about cross-architecture code injection (assuming the shellcode and the injected DLL have the right architecture): code injection from an arm64 process into an x64 or arm64ec process works just fine using x64 code. What currently fails, however, is code injection from an x64 or arm64ec process into an arm64 process, and it seems to be for a quite mundane reason: NtSetContextThread, when called from an x64 process to act upon an arm64 process, returns STATUS_SET_CONTEXT_DENIED. I expect that, had the operation been performed, the rest would have succeeded.
Since arm64-to-arm64 injection using the same method works just fine, I expect this to be some sort of intentional limitation, perhaps not even a real security measure but rather a safeguard against screwing things up. So the question: has anyone here already experienced this and knows of an easy way around it, short of spawning a native arm64 helper process just to do that one function call?
The Following User Gave Reputation+1 to DavidXanatos For This Useful Post: user1 (04-12-2022)