#1
Is there a tool that automatically can determine data structures?
Some programs store their settings or data in a common format like JSON, CSV, SQLite, or even a simple .INI file. But for programs that store data in a proprietary structure/format like a .DAT file, where you can open it in hex and see string data scattered about without any easily observable pattern or structure -- is there a tool that can analyze a data file like this and automatically figure out its data structuring, such that you could then use the tool to inject new data into the file yourself?
For example, there's a program, Silent Installer Builder, which lets you create packaged installers with various configurable formats/options/functionality. But the v6 versions store this data in a .DAT file, which means you have to use the SiB program to edit or change any custom install packages. You can see the text inside it, but it's scattered all over the place, so it's not possible to manually add entries yourself without using the program. So I'm wondering if there are any tools that could automatically analyze a file and determine, for example, that every 24 bytes a new file path begins, and that each file path is allotted N bytes, whether they're all used or not, before the next entry begins. My guess is nothing like this exists, but I thought I would check nonetheless.
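The "every N bytes a new entry begins" guess can at least be probed mechanically. Below is a toy heuristic, not a real tool: if a file really is an array of fixed-size, null-padded records, its bytes tend to repeat at a lag equal to the record size (padding aligns with padding, common path prefixes with common prefixes). The file contents here are invented for illustration and have nothing to do with SiB's actual format.

```python
# Toy heuristic: guess a fixed record size in a binary blob by byte
# autocorrelation. This only works for the simple case of fixed-size,
# null-padded records; the sample layout below is invented.

def best_stride(data: bytes, max_stride: int = 64) -> int:
    best_r, best_score = 0, -1.0
    for r in range(2, max_stride + 1):
        n = len(data) - r
        if n <= 0:
            break
        # Fraction of positions whose byte repeats r bytes later.
        score = sum(data[i] == data[i + r] for i in range(n)) / n
        if score > best_score:  # strict '>': prefer the smallest lag on ties
            best_r, best_score = r, score
    return best_r

# Fake .DAT-style payload: four paths, each padded to a 24-byte record.
records = [b"C:\\files\\a.txt", b"C:\\files\\b.txt",
           b"C:\\x\\c.dat", b"D:\\tools\\e.exe"]
blob = b"".join(p.ljust(24, b"\x00") for p in records)
print(best_stride(blob))   # 24 for this synthetic layout
```

Real proprietary files with headers, variable-length fields, or compression defeat this kind of statistics immediately, which is why the answers below point at manual reversing instead.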
#2
Only the human brain + manual reversing (debugging + disassembly) can do it.
__________________
AKA Solomon/blowfish.
#3
Machine learning and NNs might help, but a totally custom format like this is probably not common enough to build a good training dataset.
This is more a question from the forensic-analysis area: with huge amounts of data, finding the signal in the noise requires determining the data format, so this will surely see more research in an AI future. But short of theoretical data science, for RE, WhoCares gave you the way to do it.
#4
OK, makes sense, thanks guys.
Not disagreeing with you, just asking from a theoretical perspective now, wondering whether this would actually be so difficult as to need AI.
#5
NN = neural networks.
AFAIK game modders and the like often reverse engineer custom file formats; maybe Google for that.
#6
My answer assumed the given was arbitrary custom data; in that case little can be done.

Now change the given from data to a function, e.g.: chosen input -> custom file generator -> custom data corresponding to the chosen input. Then certainly a lot of difference-comparison utilities will help. The custom file generator is, in effect, your file format information, and the best idea is to treat it as a white box and reverse it. So your best bet is to open SiB in IDA Pro, find out where it reads or writes the custom data, and reconstruct that function in higher-level code, which reveals the file format.

Treating it like a black box is usually done only out of necessity, at least in the context of reversing, as opposed to, say, network security, where the function's code is totally unavailable. But automating this is still basically ridiculous: finding a function that maps some input to some output is incredibly complex, especially when you have that function in machine code right in front of you. Sure, difference tools might make the job faster than reversing in some contexts, but as said, that is because you are using your own mental capabilities to quickly identify patterns.

Automation fails even in the simplest cases. Input is a number, say 10; the output file contains 2 3 5 7 11 13 17 19 23. Try it with 11 and the number 29 is added to the file. Do we now expect some automation to recognize these as the first n prime numbers and generate maximally efficient pseudocode representing the format of such data? Or perhaps it just sees that it's all text data, or all increasing numbers separated by whitespace. There are many ways to look at it, and except for specific cases, automation is still a pipe dream without AI.

Last edited by chants; 10-29-2020 at 16:34.
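The difference-comparison approach described above can be sketched minimally: generate two outputs that differ in one chosen input and locate the byte spans that changed. The sample bytes below are invented stand-ins for two generator outputs, not real SiB data:

```python
# Hedged sketch of differential comparison: given two outputs of the same
# generator where only the chosen input differed, report which byte spans
# changed. The sample data is invented for illustration.

def diff_regions(a: bytes, b: bytes, gap: int = 4):
    """Return (offset, length) spans where a and b differ.

    Runs of equal bytes shorter than `gap` are merged into the
    surrounding differing span, so nearby changes group together.
    """
    n = min(len(a), len(b))
    diffs = [i for i in range(n) if a[i] != b[i]]
    # A length difference counts as a trailing change.
    if len(a) != len(b):
        diffs.extend(range(n, max(len(a), len(b))))
    regions = []
    for i in diffs:
        if regions and i - (regions[-1][0] + regions[-1][1]) <= gap:
            off, _ = regions[-1]
            regions[-1] = (off, i - off + 1)
        else:
            regions.append((i, 1))
    return regions

# Toy stand-in for "generator output with input 10" vs "input 11":
out_10 = b"HDR\x00\x0aprimes:2 3 5 7\x00pad"
out_11 = b"HDR\x00\x0bprimes:2 3 5 7\x00pad"
print(diff_regions(out_10, out_11))   # [(4, 1)]: the byte encoding the input
```

This locates *where* an input lands in the file; as the post says, figuring out *how* it is encoded there is the part that still needs a human (or the function's machine code in a disassembler).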
#7
You can use any binary editor with template support. 010 Editor has a nice one: you write your own template and apply it when you need it. You can describe only partial records or build a more complex template for the whole file. It supports a variety of types: integers, floats, doubles, dates, times, strings, GUIDs...
010 Editor templates:
https://www.sweetscape.com/010editor/templates.html
https://www.sweetscape.com/010editor/manual/IntroTempScripts.htm
https://www.sweetscape.com/010editor/repository/templates/

Kaitai Struct:
http://kaitai.io/index.html#what-is-it
http://formats.kaitai.io/
https://ide.kaitai.io/

Last edited by DARKER; 10-29-2020 at 17:15.
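As a rough illustration of what a binary template expresses (a typed overlay over raw bytes, readable and writable), here is the same idea in Python's struct module, using the hypothetical 24-byte path record from the first post rather than SiB's real format:

```python
# Rough illustration of what a binary template expresses: a typed overlay
# on raw bytes. The 24-byte null-padded path record is the hypothetical
# layout from the first post, not SiB's actual format.
import struct

RECORD = struct.Struct("24s")   # one fixed-size string field per record

def read_paths(blob: bytes):
    """Split blob into 24-byte records and strip the null padding."""
    return [raw.rstrip(b"\x00").decode("ascii")
            for (raw,) in RECORD.iter_unpack(blob)]

def write_paths(paths):
    """Inverse: re-pack edited entries, which is how you'd inject data."""
    return b"".join(RECORD.pack(p.encode("ascii")) for p in paths)

blob = write_paths(["C:\\a.txt", "C:\\b.txt"])
print(read_paths(blob))   # ['C:\\a.txt', 'C:\\b.txt']
```

A real 010 or Kaitai template plays the same role declaratively; the catch, as noted in the next post, is that you must already know the layout before you can write it down.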
#8
Templates are good, but they can only be used when you already know the data's structure. I think the OP asked for a tool that analyzes a data file and figures out the structure by itself.
#9
Some tips for quickly creating structures in IDA:
https://www.hex-rays.com/blog/igor-tip-of-the-week-11-quickly-creating-structures/
https://www.hex-rays.com/blog/igor-tip-of-the-week-12-creating-structures-with-known-size/