Exetools  

  #16  
Old 07-14-2004, 18:34
WARM3CH
 
Posts: n/a
With C you can reach only a partial decompilation, because of the complexities introduced by compiler optimizations. The source code can contain many statements that are simply optimized away when it is compiled.
With C++, well, sorry, it is impossible. How on earth can you recover the source code of an STL vector or a Boost smart pointer by looking at the machine code? They are already lost in the first compilation phases and don't even make it to the backend....
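A minimal sketch of the "optimized away" point, assuming a typical optimizing compiler. The function names are hypothetical and the exact machine code depends on the compiler and its flags. Both functions below compute the same value, so an optimizer is free to emit the same instructions for both, and nothing in the binary says which one the programmer actually wrote:

```cpp
#include <cassert>

// As written by the programmer (hypothetical): a dead store, a loop with a
// constant trip count, and a no-op addition.
int scaled_sum_original(int x) {
    int unused = x * 7;            // result is never read
    int total = 0;
    for (int i = 0; i < 4; ++i)    // always runs exactly four times
        total += x;
    (void)unused;
    return total + 0;
}

// What a decompiler could plausibly hand back after optimization.
int scaled_sum_recovered(int a1) {
    return a1 * 4;
}

// Both versions agree on every input tried here.
void check_scaled_sum() {
    for (int x = -1000; x <= 1000; ++x)
        assert(scaled_sum_original(x) == scaled_sum_recovered(x));
}
```

Calling check_scaled_sum() from main runs the comparison. The extra statements in the first version are exactly the kind of thing that cannot be recovered.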
  #17  
Old 07-15-2004, 04:56
jsteed
 
Posts: n/a
Actually, I can remember true decompilers for FORTRAN created during the 70s and early 80s. Grad students would build such things during the wee hours. Each different machine had to have its very own handcrafted version. The binaries for a DEC and a CDC were very different. As I recall, aside from the lost variable names (no one commented their FORTRAN code), these programs did quite well in reproducing the original code. Of course, by comparison, FORTRAN is a relatively simple language: no classes, simple data structures, etc.

I would be surprised if such custom-made decompilers don't exist for C++. I can't imagine that some kid from M$ with plenty of time at night hasn't coded one up for VC.

cheers, jsteed
  #18  
Old 07-16-2004, 09:39
tricky
 
Posts: n/a
I think it will depend on how big and how complex the program is. If the program is very big, like M$ Office, I would say it's impossible.
  #19  
Old 07-16-2004, 23:25
Shub-Nigurrath
VIP
 
Join Date: Mar 2004
Location: Obscure Kadath
Posts: 919
To be exact, C++ gets decompiled back to C: all the object-oriented constructs are converted to C structures, and things such as the class hierarchy and virtual functions are effectively implemented using vtables and so on.

I read a study which described how to do object-oriented programming in plain C. It's not a hypothesis but a necessity on some platforms where no C++ compiler is available.
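A rough sketch of what "objects as C structures plus vtables" can look like, written in the C-like subset of C++ (structs and function pointers only). All names are hypothetical, and real compilers lay out their vtables in their own ways, so take it as an illustration of the idea rather than of any particular compiler's output:

```cpp
#include <cassert>

/* Hand-written "class" with one virtual function. */
typedef struct Shape Shape;

typedef struct ShapeVTable {
    int (*area)(const Shape *self);   /* slot 0: the "virtual" area() */
} ShapeVTable;

struct Shape {
    const ShapeVTable *vptr;          /* the hidden pointer a C++ compiler would add */
    int w, h;
};

int rect_area(const Shape *self)     { return self->w * self->h; }
int triangle_area(const Shape *self) { return self->w * self->h / 2; }

const ShapeVTable rect_vtable     = { rect_area };
const ShapeVTable triangle_vtable = { triangle_area };

/* "Constructors": store the vptr, then initialise the fields. */
void rect_init(Shape *s, int w, int h)     { s->vptr = &rect_vtable;     s->w = w; s->h = h; }
void triangle_init(Shape *s, int w, int h) { s->vptr = &triangle_vtable; s->w = w; s->h = h; }

/* A "virtual call": load the vptr, load the slot, call indirectly. */
int shape_area(const Shape *s) { return s->vptr->area(s); }

void check_shapes(void) {
    Shape a, b;
    rect_init(&a, 3, 4);
    triangle_init(&b, 3, 4);
    assert(shape_area(&a) == 12);
    assert(shape_area(&b) == 6);
}
```

The indirect call through vptr is the pattern a disassembly of a virtual call tends to show, which is why the C view and the C++ view end up so close.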
__________________
Ŝħůb-Ňìĝùŕřaŧħ ₪)
There are only 10 types of people in the world: Those who understand binary, and those who don't
http://www.accessroot.com
  #20  
Old 07-17-2004, 03:50
hmora
 
Posts: n/a
why?

Certainly, it would be impossible to get the exact code, just as the programmers wrote it. The code optimizations made by the compiler make this impossible.

You may get some C/C++ code, but it would be impossible to read. Have you tried to read a simple program written by a bad programmer? I'm a student and I have had to check code written by other students in their first programming course. Even if you know what the code should do, it's really hard to understand everything.

I think the same would happen if you went from ASM to C++. It would be a big mess. Maybe everything made sense originally in terms of classes, with functions performing specific tasks. Now consider that the compiler makes a "few" changes and then turns it into assembler. Taking it back to C would complicate things even more.

Why would you want to get SOME (and not THE) C code from a program? I still don't see the idea behind this.
  #21  
Old 07-17-2004, 17:42
Polaris
Friend
 
Join Date: Feb 2002
Location: Invincible Cyclones Of FrostWinds
Posts: 97
Hmm... I think that there is some confusion about this...

Decompilation to C++ is impossible. The decompiler can rebuild only information contained within its target: now, since its target is ASSEMBLER, which lacks anything related to an HLL, it surely cannot rebuild things that are not included, like objects.

Also, consider that some C++ concepts are completely discarded after the code checking phase and are never really used within the rest of the compiler: for example, the PRIVATE/PUBLIC/PROTECTED specifiers are used only for access checking.
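A small sketch of the access-specifier point, with hypothetical type names. It assumes a mainstream compiler where the two types below get the same size and member order. The private data of one class can be read back through a plain struct view of the same bytes, which is roughly what a disassembler sees:

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Identical data, different access control.
struct OpenPoint {
    int x;
    int y;
};

class ClosedPoint {
    int x;
    int y;
public:
    ClosedPoint(int x_, int y_) : x(x_), y(y_) {}
    int getX() const { return x; }
    int getY() const { return y; }
};

// Access control is checked at compile time only; it adds nothing to the object.
static_assert(sizeof(OpenPoint) == sizeof(ClosedPoint),
              "same size on the compilers this sketch assumes");
static_assert(std::is_standard_layout<ClosedPoint>::value,
              "all data members share one access level");

// Copies the raw bytes of a ClosedPoint into an OpenPoint: two ints in
// declaration order, with no trace of "private".
OpenPoint raw_view(const ClosedPoint &p) {
    OpenPoint out;
    std::memcpy(&out, &p, sizeof out);
    return out;
}

void check_raw_view() {
    ClosedPoint p(3, 7);
    OpenPoint seen = raw_view(p);
    assert(seen.x == p.getX() && seen.y == p.getY());
}
```

Nothing in the object records which members were private, so a decompiler has nothing to rebuild that information from.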
  #22  
Old 07-21-2004, 08:25
fsheron
 
Posts: n/a
It may be rewritten in C++, not decompiled.
  #23  
Old 07-23-2004, 09:48
mmx133
 
Posts: n/a
Recently I have disassembled some programs in RISA; it is very easy to rewrite a program in C++ from the disassembled code. If you want to decompile, you will probably have to do it differently on each platform (i386, hpoa and so on).
  #24  
Old 07-23-2004, 13:35
Viasek
 
Posts: n/a
If you are actually interested in learning what C++ structures look like in assembly, Kris Kaspersky presents very clear and useful information in his book Hacker Disassembling Uncovered.

He presents lots of information about how compilers optimize code and why it would be impossible to write a program that decompiles back to C++. IDA's FLIRT signatures are a big step, recognizing the patterns of known APIs and displaying them; however, even those aren't entirely correct. IDA often misrecognizes calling conventions, so I can't imagine relying on a program to transform anything more complex than that unless it does so with a reasonable amount of accuracy.
  #25  
Old 08-13-2004, 00:06
mihaliczaj
 
Posts: n/a
Exe2C
5-10 years ago I found a program called Exe2C. You can find some references to it on the Internet.

It produced a C program that theoretically results in the same .exe. Of course the exact result depends on the compiler and the optimizations, but one can see the functions, the global data areas that are referenced by some functions etc.

Theoretical thoughts
If you fix some parameters (compiler, optimization flags, platform, endianness, exact language rules etc.), then in my opinion it is possible to write a program that decompiles an .exe into source code that produces the same output when compiled with those fixed parameters.

Unfortunately not all these parameters can be retrieved from the .exe.

Even if we somehow had this information, it would still be impossible to get the same source code back (leaving the names aside, of course). A lot of information is not preserved even in an unoptimized compilation.
Just an example:
class C
{
    int m_iX;
public:
    static int GetX_static( C *pThis ) { return pThis->m_iX; }  // static member, explicit pointer
    int GetX() { return m_iX; }                                 // ordinary member, implicit this
    friend int GetX_global( C *pThis );                         // friend, so it may read m_iX
};
int GetX_global( C *pThis ) { return pThis->m_iX; }             // free function

There is no difference in the resulting code for these (member) functions.
Most C++ language elements have an equivalent in C, and the resulting assembly code cannot be told apart (assuming there is no debug info in the .exe, which is the usual case).

There are some language elements (exceptions, virtual base classes) that cannot be directly translated to their C equivalent, so they can be recognized and rebuilt.

For a long time a C++ compiler (cfront, originally written by B. Stroustrup) was just a C++ to C compiler. When new language elements were added (exceptions, templates etc.), this became impossible.
A very good description of how the different C++ language elements are implemented can be found in "Inside the C++ Object Model" by Stanley B. Lippman. It describes, among other things, the internal structures for virtual inheritance and the structures used to handle member function pointers to virtual functions of virtual base classes.

Conclusions
I think it is a reasonable target to write an .exe to C decompiler, but it is almost impossible to get back any really useful C++ extras. Knowing the compiler and having debug info can help a lot.
Virtual tables and virtual functions can be recognized, but there is no cue for templates and inline functions.

Optimization is a general problem that occurs with all languages: there is optimization at the language level, but there is also optimization at the assembly level, which can hide the originally visible constructs.
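To illustrate the remark about templates and inline functions, here is a small sketch with hypothetical names. It assumes the compiler instantiates and inlines the helper, which is common with optimization enabled but not guaranteed. Once that happens, the generic helper no longer exists as a separate entity in the binary, and the most a decompiler can show is the second form:

```cpp
#include <cassert>

// Generic helper as the programmer wrote it.
template <typename T>
inline T clamp_to(T value, T low, T high) {
    return value < low ? low : (value > high ? high : value);
}

// Caller in the original source: uses the template.
int volume_from_source(int requested) {
    return clamp_to<int>(requested, 0, 100);
}

// What can reasonably be recovered: the template was instantiated for int
// and inlined, so only plain comparisons against constants remain.
int volume_as_recovered(int a1) {
    if (a1 < 0) return 0;
    if (a1 > 100) return 100;
    return a1;
}

// Both forms behave identically on every input tried here.
void check_volume() {
    for (int v = -500; v <= 500; ++v)
        assert(volume_from_source(v) == volume_as_recovered(v));
}
```

The recovered version is perfectly usable C, but the fact that a reusable template once existed is gone.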
  #26  
Old 08-13-2004, 09:52
LoveExeZ
 
Posts: n/a
Decompiling is not an easy thing...
It needs a large base of expert knowledge.
Most symbols and debug info are lost during compilation,
so a decompiler endeavors to recover these things.
For example,
source code:
void SwapTwoNumber(int* a, int* b)
{.................
}

after decompiling it may take this form:
sub_0121(DWORD* a1, DWORD* a2)
{......
}

Yes, the name SwapTwoNumber is information: you can often understand a function quickly just from its name.
So a decompiler will try to recover these names; this can be attained by AI.
The above is just one easy instance...
When there is time, we can discuss these techniques in detail.
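A filled-in sketch of the example above. The function bodies are an assumption, since the post elides them, and std::uint32_t stands in for the Windows DWORD typedef. It shows that the two forms do the same work and differ only in the information carried by names and types:

```cpp
#include <cassert>
#include <cstdint>

// Original source with meaningful names (body assumed for illustration).
void SwapTwoNumber(int* a, int* b) {
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

// Decompiler-style output: address-based function name, size-based types,
// numbered parameters and locals.
void sub_0121(std::uint32_t* a1, std::uint32_t* a2) {
    std::uint32_t v1 = *a1;
    *a1 = *a2;
    *a2 = v1;
}

void check_swaps() {
    int x = 1, y = 2;
    SwapTwoNumber(&x, &y);
    assert(x == 2 && y == 1);

    std::uint32_t u = 5, v = 9;
    sub_0121(&u, &v);
    assert(u == 9 && v == 5);
}
```

Reading sub_0121 you can still work out that it swaps two values, but the name would have told you that at a glance.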
  #27  
Old 08-14-2004, 03:34
McS2oo4
 
Posts: n/a
Inquisition IDA asm > C plugin

There are actually two asm > C plugins for IDA; sometimes I combine the two to get a clearer view of the code. These are not serious decompilers, just one more look from another perspective. Decompile to C has better output than the Inquisition plugin, but it sometimes skips parts of the code that it cannot understand. So you are back at asm and IDA's representation of the code.
  #28  
Old 08-14-2004, 04:39
mihaliczaj
 
Posts: n/a
extra info in source code

It is worth seeing the home page of The International Obfuscated C Code Contest (hxxp://www.ioccc.org).
I would be surprised if there were ever an AI that could retrieve those sources.
Just one example to give a taste of it:
Code:
#include <stdio.h>
int l;int main(int o,char **O,
int I){char c,*D=O[1];if(o>0){
for(l=0;D[l              ];D[l
++]-=10){D   [l++]-=120;D[l]-=
110;while   (!main(0,O,l))D[l]
+=   20;   putchar((D[l]+1032)
/20   )   ;}putchar(10);}else{
c=o+     (D[I]+82)%10-(I>l/2)*
(D[I-l+I]+72)/10-9;D[I]+=I<0?0
:!(o=main(c/10,O,I-1))*((c+999
)%10-(D[I]+92)%10);}return o;}
This is a square root calculator; note the shape formed by the whitespace.

OK, this (and the others on the IOCCC page) is not a real-life example, but as LoveExeZ pointed out, there is substantial information in the source code that is simply impossible to get back.

On the other hand, getting back even a small subset of this extra info can help a lot. If one gets back a part of the inheritance hierarchy, that can be very useful.
Polymorphic classes and virtual function calls can be recognized because they use the vptr (exact implementation details differ from compiler to compiler). The hierarchy can be reproduced from the constructors and the destructors, as they again have a certain structure (calling the ctor of the base's base, the ctor of the base etc.).
Finding constructors and destructors is easy from the virtual table, and once these functions are identified, a lot of information can be recovered.
Just imagine the following:

Originally:
Code:
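void function1()
{
   int i1, i2, i3, i4, i5;   // raw stack slots; &i1 and &i4 are really the addresses of two objects
   function2( &i1 );         // ctor of the first object
   function3( &i4 );         // ctor of the second object
   function4( &i1 );         // member call on the first object
   function5( &i4 );         // member call on the second object
   function6( &i4 );         // dtor of the second object (reverse order of construction)
   function7( &i1 );         // dtor of the first object
}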
Having ctor/dtor pairs identified:
Code:
void function1()
{
   Class1 Object1;      // ctor call here, dtor call implied at the closing brace
   Class2 Object2;
   Object1.Member1();
   Object2.Member2();
}
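The point about rebuilding the hierarchy from constructors and destructors can be sketched as follows. The class names are hypothetical, and the trace string only stands in for the nested ctor/dtor calls one would look for in the disassembly. Construction runs from the base's base down to the leaf, and destruction runs the other way, so the call order reveals who derives from whom:

```cpp
#include <cassert>
#include <string>

// Records the order in which constructors and destructors run.
std::string g_trace;

struct Base {
    Base()          { g_trace += "Base() "; }
    virtual ~Base() { g_trace += "~Base() "; }
};

struct Middle : Base {
    Middle()           { g_trace += "Middle() "; }
    ~Middle() override { g_trace += "~Middle() "; }
};

struct Leaf : Middle {
    Leaf()           { g_trace += "Leaf() "; }
    ~Leaf() override { g_trace += "~Leaf() "; }
};

// Builds and destroys one Leaf and returns the recorded call order.
std::string trace_leaf_lifetime() {
    g_trace.clear();
    { Leaf object; }
    return g_trace;
}

void check_trace() {
    assert(trace_leaf_lifetime() ==
           "Base() Middle() Leaf() ~Leaf() ~Middle() ~Base() ");
}
```

In a real binary the same ordering shows up as the Leaf ctor calling the Middle ctor, which in turn calls the Base ctor, unless the compiler has inlined them.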
  #29  
Old 08-14-2004, 04:43
sumeru
 
Posts: n/a
Decompiled code is not readable

Since there is optimization when compiling, the compiler changes the code too much.

I have tried some decompiling tools before, but the output is very difficult to read and understand. It is very badly organized.
  #30  
Old 08-17-2004, 04:29
br00t_4_c
 
Posts: n/a

I think by its very nature compilation is a one-way process. You can reconstruct source code from a disassembled binary executable that may well closely resemble the original source code, but as Sarge very astutely mentioned, variable and function names will be mangled, comments will be lost, etc. Maybe if there were a decompiler that incorporated some kick-ass artificial intelligence that could magically analyze and emulate the personality and proclivities of the developer who wrote the code, we'd see a decompiler of the nature discussed in this thread. Barring that, you can send me the money and I'll use it to buy crack.