Exetools  

  #1  
Old 10-23-2020, 18:55
binarylaw binarylaw is offline
Friend
 
Join Date: Jul 2019
Posts: 26
Rept. Given: 0
Rept. Rcvd 0 Times in 0 Posts
Thanks Given: 97
Thanks Rcvd at 5 Times in 4 Posts
binarylaw Reputation: 0
Is there a tool that can automatically determine data structures?

Some programs store their settings or data in a common format, like JSON, CSV, SQLite, or even a simple .INI file. But other programs store data in a proprietary structure/format, like a .DAT file, where you can open it in a hex editor and see string data scattered about, without any precise pattern or structure that's easily observable. Is there a tool that can analyze a data file like this and automatically figure out its data structure, such that you could then use the tool to inject new data into the file yourself?

For example, there's a program, Silent Installer Builder, which lets you create packaged installers with various configurable formats/options/functionality. But the v6 versions store this data in a .DAT file, which means you have to use the SiB program to edit or change any custom install packages. You can see the text inside it, but it's scattered all over the place, so it isn't possible to manually add entries yourself without using the program.

So I'm just wondering if there are any tools that could automatically analyze a file and determine that, for example, every 24 bytes a new file path begins, and each file path is allotted N bytes, whether they're all used or not, before the next entry begins.
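For concreteness, here's a minimal sketch of reading a layout like that once it's known. The 4-byte header and 260-byte path slot are made-up numbers for illustration, not SiB's actual format:

```python
import struct

# Hypothetical layout (an assumption, not SiB's real format):
# each record is a 4-byte little-endian header followed by a
# 260-byte NUL-padded file path, repeated back to back.
RECORD = struct.Struct("<I260s")

def read_records(blob: bytes):
    """Split a blob into (header, path) tuples under the assumed layout."""
    records = []
    for off in range(0, len(blob) - RECORD.size + 1, RECORD.size):
        header, raw_path = RECORD.unpack_from(blob, off)
        records.append((header, raw_path.rstrip(b"\x00").decode("ascii")))
    return records

blob = RECORD.pack(1, b"C:\\setup\\app.exe") + RECORD.pack(2, b"C:\\setup\\data.bin")
print(read_records(blob))
```

The hard part, of course, is discovering the record size and field offsets in the first place.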

My guess is nothing like this exists, but I thought I would check nonetheless.
  #2  
Old 10-24-2020, 02:34
WhoCares's Avatar
WhoCares WhoCares is offline
who cares
 
Join Date: Jan 2002
Location: Here
Posts: 368
Rept. Given: 9
Rept. Rcvd 13 Times in 11 Posts
Thanks Given: 19
Thanks Rcvd at 85 Times in 39 Posts
WhoCares Reputation: 13
Only the human brain + manual reversing (debugging + disassembly) can do it.
__________________
AKA Solomon/blowfish.
The Following 3 Users Say Thank You to WhoCares For This Useful Post:
binarylaw (10-29-2020), chants (10-24-2020), WRP (10-24-2020)
  #3  
Old 10-24-2020, 15:56
chants chants is offline
VIP
 
Join Date: Jul 2016
Posts: 576
Rept. Given: 7
Rept. Rcvd 35 Times in 21 Posts
Thanks Given: 501
Thanks Rcvd at 847 Times in 396 Posts
chants Reputation: 35
Machine learning and NNs might help, but a totally custom format like this is perhaps not common enough to get a good training dataset.

This question is really in the forensic-analysis area: with huge amounts of data, finding the signal in the noise requires determining the data format. So this will surely be researched more in the AI future.

Okay, but short of theoretical data science, for RE, WhoCares gave you the way to do it.
  #4  
Old 10-29-2020, 15:35
binarylaw binarylaw is offline
Friend
 
Join Date: Jul 2019
Posts: 26
Rept. Given: 0
Rept. Rcvd 0 Times in 0 Posts
Thanks Given: 97
Thanks Rcvd at 5 Times in 4 Posts
binarylaw Reputation: 0
OK, makes sense, thanks guys.

Quote:
Originally Posted by chants View Post
Machine learning and NNs might help but this totally custom format is not common enough perhaps to get a good training dataset.
What are "NNs"?

Quote:
Originally Posted by chants View Post
This is more of a question that is in the forensic analysis area. With huge amounts of data, finding the signal from the noise which requires determining data format. So this will be researched more in the AI future for sure.
Theoretically, wouldn't it work just to take diffs of a file between changes/saves and find the pattern(s) in each incremental slice of difference? For example, take a program saving its settings to a .dat file. Would this really be so difficult if you could get 5-6 diff snapshots of settings saves?

Not disagreeing with you, just asking from a theoretical perspective now; I'm wondering if this would actually be so difficult as to need AI.
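The diffing idea can be sketched in a few lines, assuming equal-length snapshots of a hypothetical settings file; offsets that never change across saves are likely structure, while the changing ones carry the settings:

```python
def diff_offsets(a: bytes, b: bytes):
    """Return offsets where two equal-length save files differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

def stable_regions(snapshots):
    """Offsets identical across all snapshots -- likely structure, not payload."""
    first = snapshots[0]
    changed = set()
    for snap in snapshots[1:]:
        changed.update(diff_offsets(first, snap))
    return [i for i in range(len(first)) if i not in changed]

# Three hypothetical saves of a settings file: only byte 4 (a setting) moves.
saves = [b"CFG\x01\x0a\x00", b"CFG\x01\x0b\x00", b"CFG\x01\x63\x00"]
print(stable_regions(saves))   # -> [0, 1, 2, 3, 5]
```

This locates *where* values live, but not what they mean, and it breaks down as soon as a save shifts offsets (variable-length fields, compression, etc.).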
  #5  
Old 10-29-2020, 16:06
deepzero's Avatar
deepzero deepzero is offline
VIP
 
Join Date: Mar 2010
Location: Europe
Posts: 251
Rept. Given: 102
Rept. Rcvd 60 Times in 38 Posts
Thanks Given: 111
Thanks Rcvd at 122 Times in 64 Posts
deepzero Reputation: 60
NN = Neural Networks

Quote:
Would this theoretically be so difficult
Depends entirely on the program, the encoding it uses, and what you want to achieve. Generally this is a valid approach, of course, but experience shows it will only get you so far.

AFAIK game modders and the like often reverse engineer custom file formats; maybe Google for that.
  #6  
Old 10-29-2020, 16:23
chants chants is offline
VIP
 
Join Date: Jul 2016
Posts: 576
Rept. Given: 7
Rept. Rcvd 35 Times in 21 Posts
Thanks Given: 501
Thanks Rcvd at 847 Times in 396 Posts
chants Reputation: 35
My answer assumed we are given only arbitrary custom data; in that case little can be done. (And yes, NNs are neural networks.)

Now suppose you change the given from data to a function, e.g.: chosen input -> custom file generator -> custom data corresponding to that input. Then certainly a lot of difference-comparison utilities will help. But automating this and treating it as a black box is only done when necessary: the custom file generator is, in effect, your file-format specification. The better idea is to treat it as a white box and reverse it. So your best bet is to open SiB in IDA Pro, find where it reads or writes the custom data, and reconstruct that function in higher-level code, which reveals the file format.

Treating it like a black box is usually done out of necessity, at least in the context of reversing, as opposed to, say, network security, where the function code is totally unavailable. But automating this is still basically hopeless: finding a function that maps some input to some output is incredibly complex, especially when you have that function in machine code right in front of you. Sure, difference tools might make the job faster than reversing in some contexts, but as I said, that's because you are using your own mental capabilities to quickly identify patterns.


Even the simplest cases are of course impossible to automate.

Say the input is a number, e.g. 10, and the output file contains 2 3 5 7 11 13 17 19 23. Now you try it with 11, and the number 29 is added to the file. We would expect some automation to recognize that this is the first n prime numbers and generate maximally efficient pseudocode representing the format of such data. Or perhaps it just sees that it's all text data, or that it's all increasing numbers separated by whitespace. There are many ways to look at it, and automation, except for specific cases, is still a pipe dream without AI.
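To make that example concrete: a diff between the two outputs trivially isolates the appended bytes, but says nothing about the hidden rule. The generator below is a stand-in (first n primes, numbers chosen to match the list above):

```python
def primes_file(n: int) -> bytes:
    """Hypothetical generator: the first n primes, space-separated."""
    primes, cand = [], 2
    while len(primes) < n:
        if all(cand % p for p in primes):   # trial division by smaller primes
            primes.append(cand)
        cand += 1
    return " ".join(map(str, primes)).encode()

out_a, out_b = primes_file(9), primes_file(10)
assert out_b.startswith(out_a)   # a diff finds *where* the file grew...
print(out_b[len(out_a):])        # -> b' 29' ...but not *why*
```

Every hypothesis consistent with the diff ("primes", "increasing numbers", "ASCII text") explains the data equally well, which is exactly why a human has to pick one.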

Last edited by chants; 10-29-2020 at 16:34.
  #7  
Old 10-29-2020, 16:39
DARKER DARKER is offline
VIP
 
Join Date: Jul 2004
Location: Côte d'Ivoire
Posts: 284
Rept. Given: 13
Rept. Rcvd 91 Times in 36 Posts
Thanks Given: 2
Thanks Rcvd at 135 Times in 58 Posts
DARKER Reputation: 91
You can use any binary editor with template support; 010 Editor has a nice one. You just write your own template and apply it when you need it. You can create only partial records, or a more complex template for the whole file. It supports a variety of types: integers, floats, doubles, dates, times, strings, GUIDs ...

Description:
https://www.sweetscape.com/010editor/templates.html
Introduction to Templates and Scripts (step-by-step help):
https://www.sweetscape.com/010editor/manual/IntroTempScripts.htm
You can also take inspiration from already-made templates here:
https://www.sweetscape.com/010editor/repository/templates/
Also worth mentioning is Kaitai (free & open source):
http://kaitai.io/index.html#what-is-it
Format Gallery:
http://formats.kaitai.io/
Online IDE:
https://ide.kaitai.io/

Last edited by DARKER; 10-29-2020 at 17:15.
The Following User Says Thank You to DARKER For This Useful Post:
Abaddon (10-30-2020)
  #8  
Old 10-29-2020, 19:52
Chr155Y Chr155Y is offline
Friend
 
Join Date: Jan 2019
Posts: 20
Rept. Given: 0
Rept. Rcvd 6 Times in 3 Posts
Thanks Given: 7
Thanks Rcvd at 64 Times in 22 Posts
Chr155Y Reputation: 6
Templates are good, but they can only be used when you already know the data's structure. I think the OP asked for a tool that analyzes a data file and figures out the data structure by itself.
  #9  
Old 10-29-2020, 20:54
DARKER DARKER is offline
VIP
 
Join Date: Jul 2004
Location: Côte d'Ivoire
Posts: 284
Rept. Given: 13
Rept. Rcvd 91 Times in 36 Posts
Thanks Given: 2
Thanks Rcvd at 135 Times in 58 Posts
DARKER Reputation: 91
Quote:
Originally Posted by Chr155Y View Post
Templates are good but they can be used only when you already know that data's structure. I think the OP asked for a tool to analyze a data file and figure out the data structure.
Don't be lazy, create one yourself: there is no tool that does it automatically. It's a combination of hex editor + reverse analysis of the target program + tests. You don't need to know every detail of the structure; you can create dummy fields for unknown data and identify them later. As binarylaw says, some things are known: file paths, sizes, the record size... A hex editor can highlight those places so you can focus only on the "not yet processed" data. This is the common approach for unknown structures.
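The dummy-field idea can be sketched outside 010 Editor too. Here's a hypothetical partial parser in Python where only two fields are "known" and the rest stays an opaque blob to be identified later (all offsets and sizes are assumptions for illustration):

```python
import struct

# Partial "template": known fields are named, the rest is kept as an
# opaque blob -- the same idea as dummy fields in an 010 Editor template.
def parse_entry(blob: bytes):
    record_size, = struct.unpack_from("<I", blob, 0)   # known: total record size
    path = blob[4:4 + 64].rstrip(b"\x00").decode()     # known: 64-byte path slot
    unknown = blob[68:record_size]                     # not yet identified
    return {"record_size": record_size, "path": path, "unknown": unknown}

entry = struct.pack("<I", 80) + b"C:\\app\\readme.txt".ljust(64, b"\x00") + b"\x00" * 12
print(parse_entry(entry))
```

As fields in the `unknown` blob get identified, they graduate into named slices, until the whole record is mapped.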

Some tips for creating structures in IDA. Quickly creating structures:
https://www.hex-rays.com/blog/igor-tip-of-the-week-11-quickly-creating-structures/
Creating structures with known size:
https://www.hex-rays.com/blog/igor-tip-of-the-week-12-creating-structures-with-known-size/
The Following 2 Users Say Thank You to DARKER For This Useful Post:
ian (10-31-2020), niculaita (10-30-2020)


