#16
With C you can achieve only partial decompilation, because of the complexities introduced by compiler optimizations. The source code can contain many statements that are simply optimized away when it is compiled.
With C++, well, sorry, it is impossible. How on earth can you recover the source code of an STL vector or a Boost smart pointer by looking at the machine code? They are already lost in the first compilation phases and don't even make it to the backend....
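To make the point about STL containers concrete, here is a minimal sketch (the function names are made up for illustration). Both functions compute the same sum; with optimization enabled, a compiler will typically inline the template code, so the machine code for the first one is expected to look much like a plain loop over a pointer, with nothing left that says "std::vector". That is an assumption about typical compilers, not a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// High-level version: std::vector plus a standard algorithm.
int sum_with_vector(const std::vector<int>& values) {
    return std::accumulate(values.begin(), values.end(), 0);
}

// Low-level version: roughly the shape a decompiler has to work with,
// a pointer, a length and a loop.
int sum_with_pointer(const int* data, std::size_t count) {
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += data[i];
    }
    return total;
}

// Both versions produce the same result, so the behaviour alone does not
// reveal which one the programmer originally wrote.
void check_sums_agree() {
    const std::vector<int> values{1, 2, 3, 4};
    assert(sum_with_vector(values) == 10);
    assert(sum_with_pointer(values.data(), values.size()) == 10);
}
```

Calling `check_sums_agree()` from `main` exercises both versions.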
#17
Actually, I can remember true decompilers for FORTRAN created during the 70s and early 80s. Grad students would build such things during the wee hours. Each different machine had to have its very own handcrafted version. The binaries for a DEC and a CDC were very different. As I recall, aside from the lost variable names (no one commented their FORTRAN code), these programs did quite well in reproducing the original code. Of course, by comparison, FORTRAN is a relatively simple language: no classes, simple data structures, etc.
I would be surprised if such custom-made decompilers don't exist for C++. I can't imagine that some kid from M$ with plenty of time at night hasn't coded one up for VC. cheers, jsteed
#18
I think it depends on how big and how complex the program is. If the program is very big, like M$ Office, I would say it's impossible.
#19
C++ gets decompiled back to C, to be exact. All the object-oriented features, such as the class hierarchy and virtual functions, are effectively implemented using C structures, vtables and so on.
I read a study which described how to do object-oriented programming in plain C. It's not a hypothesis but a necessity on some platforms where no C++ compilers are available.
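A minimal sketch of what "objects in plain C" can look like, written with C-style structs and function pointers only. All names here (`ShapeVTable`, `Shape`, `Rect`, `Square`) are hypothetical, and real compilers lay out their vtables in their own ABI-specific ways; this only shows the general idea of a table of function pointers reached through a pointer stored in the object.

```cpp
#include <cassert>

// Hand-written "virtual table": one function pointer per virtual method.
struct ShapeVTable {
    int (*area)(const void* self);
};

// Every "object" starts with a pointer to its class's table.
struct Shape {
    const ShapeVTable* vptr;
};

// "Inheritance" by embedding the base as the first member.
struct Rect {
    Shape base;
    int width;
    int height;
};

struct Square {
    Shape base;
    int side;
};

int rect_area(const void* self) {
    const Rect* r = static_cast<const Rect*>(self);
    return r->width * r->height;
}

int square_area(const void* self) {
    const Square* s = static_cast<const Square*>(self);
    return s->side * s->side;
}

const ShapeVTable rect_vtable{rect_area};
const ShapeVTable square_vtable{square_area};

// The "virtual call": load the table pointer, pick the slot, call through it.
int shape_area(const Shape* s) {
    return s->vptr->area(s);
}

void check_manual_vtables() {
    Rect r{{&rect_vtable}, 3, 4};
    Square q{{&square_vtable}, 5};
    assert(shape_area(&r.base) == 12);
    assert(shape_area(&q.base) == 25);
}
```

A decompiler that emits C would produce something of this shape; it is up to the reader to recognize that it was once a class with virtual functions.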
__________________
Ŝħůb-Ňìĝùŕřaŧħ ₪) There are only 10 types of people in the world: Those who understand binary, and those who don't http://www.accessroot.com
#20
Why?
Certainly, it would be impossible to get the exact code, just as the programmers wrote it. The code optimizations made by the compiler make this impossible.
You may get some C/C++ code, but it would be impossible to read. Have you tried to read a simple program written by a bad programmer? I'm a student and I had to check code written by other students in their first programming course. Even if you know what the code should do, it's really hard to understand everything. I think the same would happen if you went from ASM to C++. It would be a big mess. Maybe everything made sense when it was organized into classes, with functions performing specific tasks. Now consider that the compiler makes a "few" changes, and then turns it into assembler. Taking it back to C would complicate things even more. Why would you want to get SOME (and not THE) C code from a program? I still don't see what the idea behind this is.
#21
Hmm... I think that there is some confusion about this...
Decompilation to C++ is impossible. A decompiler can rebuild only the information contained in its target. Since its target is ASSEMBLER, which lacks anything related to an HLL, it certainly cannot rebuild things that are not there, like objects. Also, consider that some C++ concepts are completely discarded after the code checking phase and are never really used by the rest of the compiler: for example, the PRIVATE/PUBLIC/PROTECTED directives are used only for access checking.
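A small sketch of the point about access specifiers, with hypothetical class names. It assumes a mainstream ABI where a class with only private `int` members has the same size and member order as a struct with the same public members; the `static_assert` on `sizeof` encodes that assumption rather than something the language guarantees.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Members are private: the compiler rejects outside access at compile time.
class Guarded {
    int x_;
    int y_;
public:
    Guarded(int x, int y) : x_(x), y_(y) {}
    int sum() const { return x_ + y_; }
};

// Same members, all public.
struct Open {
    int x;
    int y;
};

static_assert(std::is_standard_layout<Guarded>::value, "Guarded has a plain layout");
static_assert(std::is_trivially_copyable<Guarded>::value, "Guarded can be copied bytewise");
static_assert(sizeof(Guarded) == sizeof(Open), "assumed: identical size on this ABI");

// Copy the raw bytes of a Guarded into an Open. Nothing in the bytes records
// that the members were ever private.
Open peek(const Guarded& g) {
    Open raw;
    std::memcpy(&raw, &g, sizeof raw);
    return raw;
}

void check_access_is_compile_time_only() {
    const Guarded g(3, 4);
    const Open raw = peek(g);
    assert(raw.x == 3);
    assert(raw.y == 4);
    assert(g.sum() == 7);
}
```

The object in memory carries two integers and nothing else, so a decompiler has no way to tell whether they were declared private, protected or public.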
#22
It can be rewritten in C++, but not decompiled.
#23
Recently I have disassembled some programs in RISA,
and it is very easy to rewrite a program in C++ from the disassembled code. If you want to decompile, maybe you should do it in different ways on different platforms (i386, hpoa and so on).
#24
If you are actually interested in learning what C++ structures look like in assembly, Kris Kaspersky presents very clear and useful information in his book Hacker Disassembling Uncovered.
He presents lots of information about how compilers optimize code and why it would be impossible to write a program that decompiles back to C++. IDA's FLIRT signatures are a big step, recognizing the patterns of known APIs and displaying them; however, even those aren't entirely correct. IDA often misrecognizes calling conventions, so I can't imagine relying on a program to transform anything more complex than that unless it works within a reasonable degree of accuracy.
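For readers who have not seen signature matching before, here is a rough sketch of the general idea: compare the first bytes of a function against a database of known patterns, with wildcards for bytes that vary (such as call targets). This is NOT IDA's actual FLIRT format or algorithm; the `Signature` struct, the function names and the sample pattern are all invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

const int kAny = -1;  // wildcard: matches any byte

// Hypothetical signature record: a name plus a byte pattern.
struct Signature {
    std::string name;
    std::vector<int> pattern;  // values 0..255, or kAny
};

// True if the pattern matches the code bytes starting at offset.
bool matches(const Signature& sig,
             const std::vector<unsigned char>& code,
             std::size_t offset) {
    if (offset > code.size() || sig.pattern.size() > code.size() - offset) {
        return false;
    }
    for (std::size_t i = 0; i < sig.pattern.size(); ++i) {
        if (sig.pattern[i] != kAny &&
            sig.pattern[i] != static_cast<int>(code[offset + i])) {
            return false;
        }
    }
    return true;
}

// Returns the name of the first matching signature, or "" if none matches.
std::string identify(const std::vector<Signature>& db,
                     const std::vector<unsigned char>& code,
                     std::size_t offset) {
    for (const Signature& sig : db) {
        if (matches(sig, code, offset)) {
            return sig.name;
        }
    }
    return "";
}

void check_signature_lookup() {
    // push ebp; mov ebp, esp; call rel32 (the four target bytes are wildcards)
    const std::vector<Signature> db{
        {"hypothetical_lib_func", {0x55, 0x8B, 0xEC, 0xE8, kAny, kAny, kAny, kAny}}};
    const std::vector<unsigned char> code{
        0x90, 0x55, 0x8B, 0xEC, 0xE8, 0x10, 0x20, 0x30, 0x40, 0xC3};
    assert(identify(db, code, 1) == "hypothetical_lib_func");
    assert(identify(db, code, 0).empty());
}
```

Even in this toy form the weakness is visible: two different functions that start with the same bytes get the same name, which is one way a misrecognition can happen.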
#25
Exe2C
5-10 years ago I found a program called Exe2C. You can find some references to it on the Internet. It produced a C program that theoretically results in the same .exe. Of course the exact result depends on the compiler and the optimizations, but one can see the functions, the global data areas that are referenced by some functions etc.
Theoretical thoughts
If you fix some parameters (compiler, optimization flags, platform, endianness etc., exact language rules), then in my opinion it is possible to write a program that decompiles an .exe to source code that produces the same output when compiled with the fixed parameters. Unfortunately not all of these parameters can be retrieved from the .exe. Even if we somehow had this information, it would still be impossible to get the same source code (regardless of the names, of course). There is a lot of information that is not preserved even in an unoptimized compilation. Just an example:
Code:
class C
{
    int m_iX;
public:
    static int GetX_static( C *pThis ) { return pThis->m_iX; }
    int GetX() { return m_iX; }
    friend int GetX_global( C *pThis );
};
int GetX_global( C *pThis ) { return pThis->m_iX; }
There is no difference in the resulting code of these three (member) functions. Most C++ language elements have their equivalent in C, and it is impossible to differentiate the resulting assembly code (assuming there is no debug info in the .exe, but this is the usual case). There are some language elements (exceptions, virtual base classes) that cannot be directly translated to a C equivalent, so they can be recognized and rebuilt. For a long time a C++ compiler (cfront, originally written by B. Stroustrup) was just a C++ to C compiler. When new language elements were added (exceptions, templates etc.) this became impossible. A very good description of how the different C++ language elements are implemented can be found in "Inside the C++ Object Model" by Stanley B. Lippman.
It describes the internal structures for virtual inheritance and the structures used to handle member function pointers to virtual functions of virtual base classes, among other things.
Conclusions
I think it is a reasonable target to write an .exe to C decompiler, but it is almost impossible to get back any really useful C++ extras. Knowing the compiler and having debug info can help a lot. Virtual tables and virtual functions can be recognized, but there is no cue for templates and inline functions. Optimization is a general problem that occurs with all languages, because there is optimization at the language level, but there is also optimization at the assembly level, which can hide the originally visible constructs.
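The class from this post can be turned into something runnable with a couple of additions. The constructor, the `C_plain` struct and `GetX_plain` are my additions for illustration and not part of the original example; the return type of `GetX_global` is `int` here so that it compiles. This sketch only checks that all variants behave identically; whether they compile to identical instructions depends on the compiler and its flags.

```cpp
#include <cassert>

class C {
    int m_iX;
public:
    explicit C(int x) : m_iX(x) {}  // added so the object can be initialized
    static int GetX_static(C* pThis) { return pThis->m_iX; }
    int GetX() { return m_iX; }
    friend int GetX_global(C* pThis);
};

int GetX_global(C* pThis) { return pThis->m_iX; }

// The plain C-style equivalent: a struct and a free function that takes
// the object pointer explicitly, as the hidden `this` argument does.
struct C_plain {
    int m_iX;
};

int GetX_plain(C_plain* pThis) { return pThis->m_iX; }

void check_all_getters() {
    C obj(42);
    C_plain plain{42};
    assert(C::GetX_static(&obj) == 42);
    assert(obj.GetX() == 42);
    assert(GetX_global(&obj) == 42);
    assert(GetX_plain(&plain) == 42);
}
```

Each getter boils down to "take a pointer, read an int at a fixed offset", which is why a decompiler cannot tell which form the source used.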
#26
A decompiler is not an easy thing...
It requires a lot of additional, experience-based knowledge. Many symbols and much of the debug info are lost during compilation, so a decompiler has to try to recover these things. For example, this source code:
void SwapTwoNumber(int* a, int* b) {................. }
may come out of a decompiler in this form:
sub_0121(DWORD* a1, DWORD* a2) {...... }
Yes, the name SwapTwoNumber is information: you can often understand a function quickly just from its name. So a decompiler will try to recover these names, and this could be attained by AI. The above is just one easy instance... When there is time, we can discuss these techniques in detail.
#27
Inquisition IDA asm > C plugin
There are actually 2 asm>C plugins for the IDA disassembler; sometimes I combine the two to get a clearer view of the code. These are not serious decompilers, just one more look from another perspective. Decompile to C has better output than the Inquisition plugin, but it sometimes skips parts of the code that it cannot understand. So you are back at asm and IDA's representation of the code.
#28
extra info in source code
It is worth seeing the home page of The International Obfuscated C Code Contest. (hxxp://www.ioccc.org)
I would be surprised if there were ever an AI that could retrieve those sources. Just an example to give you a taste:
Code:
#include <stdio.h>
int l;int main(int o,char **O, int I){char c,*D=O[1];if(o>0){
for(l=0;D[l ];D[l ++]-=10){D [l++]-=120;D[l]-=
110;while (!main(0,O,l))D[l]
+= 20; putchar((D[l]+1032)
/20 ) ;}putchar(10);}else{
c=o+ (D[I]+82)%10-(I>l/2)*
(D[I-l+I]+72)/10-9;D[I]+=I<0?0
:!(o=main(c/10,O,I-1))*((c+999
)%10-(D[I]+92)%10);}return o;}
OK, this (and the others on the IOCCC page) are not real-life examples, but as LoveExeZ pointed out, there is substantial information in the source code that is simply impossible to get back. On the other hand, even if we get back only a small subset of this extra info, it can help a lot. If one gets back a part of the inheritance hierarchy, then it can be very useful. Polymorphic classes and virtual function calls can be recognized because they use the vptr (exact implementation details differ from compiler to compiler). The hierarchy can be reproduced from the constructors and the destructors, as they again have a certain structure (calling the ctor of the base's base, the ctor of the base etc.). Finding constructors and destructors is easy from the virtual table, and once these functions are identified, a lot of info can be recovered. Just imagine the following:
Originally:
Code:
function1()
{
    int i1, i2, i3, i4, i5;
    function2( &i1 );
    function3( &i4 );
    function4( &i1 );
    function5( &i4 );
    function6( &i4 );
    function7( &i1 );
}
Code:
function1()
{
    Class1 Object1;
    Class2 Object2;
    Object1.Member1();
    Object2.Member2();
}
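The constructor structure described in this post can be observed directly. In the sketch below (class names and the global trace string are hypothetical), each constructor first runs its base's constructor and then records which override of `id()` a virtual call reaches at that moment. The changing answer reflects the object's dynamic type being updated as construction proceeds, which compilers typically implement by storing a new vptr in each constructor.

```cpp
#include <cassert>
#include <string>

std::string g_log;  // hypothetical trace of constructor/destructor activity

struct Base {
    Base() { g_log += "Base:" + std::to_string(id()) + " "; }
    virtual ~Base() { g_log += "~Base "; }
    virtual int id() const { return 0; }
};

struct Middle : Base {
    Middle() { g_log += "Middle:" + std::to_string(id()) + " "; }
    ~Middle() override { g_log += "~Middle "; }
    int id() const override { return 1; }
};

struct Derived : Middle {
    Derived() { g_log += "Derived:" + std::to_string(id()) + " "; }
    ~Derived() override { g_log += "~Derived "; }
    int id() const override { return 2; }
};

// Creates and destroys one Derived object and returns the recorded trace.
std::string trace_lifetime() {
    g_log.clear();
    {
        Derived d;
    }
    return g_log;
}

void check_ctor_chain() {
    assert(trace_lifetime() ==
           "Base:0 Middle:1 Derived:2 ~Derived ~Middle ~Base ");
}
```

The nesting (Derived's constructor calls Middle's, which calls Base's) is the pattern that lets someone reading the disassembly rebuild a part of the hierarchy.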
#29
Decompiled code is not readable
Since there is optimization during compilation, the compiler changes the code too much.
I have tried some decompiling tools before, but the output is very difficult to read and understand. It is very badly organized.
#30
I think by its very nature compilation is a one-way process. You can reconstruct source code from a disassembled binary executable that may well closely resemble the original source code, but as Sarge very astutely mentioned, variable and function names will be mangled, comments will be lost, etc. Maybe if there were a decompiler that incorporated some kick-ass artificial intelligence that could magically analyze and emulate the personality and proclivities of the developer who wrote the code, we'd see a decompiler of the nature discussed in this thread. Barring that, you can send me the money and I'll use it to buy crack.