Exetools

Exetools (https://forum.exetools.com/index.php)
-   General Discussion (https://forum.exetools.com/forumdisplay.php?f=2)
-   -   Is there a tool that automatically can determine data structures? (https://forum.exetools.com/showthread.php?t=19686)

binarylaw 10-23-2020 18:55

Is there a tool that automatically can determine data structures?
 
Some programs store their settings or data in a common format, like JSON, CSV, SQLite, or even a simple .INI file. But for programs that store data in a proprietary structure/format like a .DAT file, where you can open it in hex and see string data scattered about with no easily observable pattern or structure -- is there a tool that can analyze a data file like this and automatically figure out its data structuring, such that you could then use the tool to inject new data into the file yourself?

For example, there's a program, Silent Installer Builder, which allows one to create packaged installers using various configurable formats/options/functionality. But the v6 versions store this data in a .DAT file, which means that you have to use the SiB program to edit or change any custom install packages. You can see the text inside it, but it's scattered all over the place, so it's not possible to manually add entries into it yourself without using the program.

So I'm just wondering if there are any tools that could automatically analyze a file and determine that, for example, every 24 bytes a new file path begins, and each file path is allotted N bytes, whether they're all used or not, before the next entry begins.

My guess is nothing like this exists, but I thought I would check nonetheless.
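[Editor's note: as a rough illustration of the kind of heuristic such a tool might apply, here is a minimal Python sketch. It locates runs of printable ASCII (likely paths or names) in a binary blob and takes the GCD of the gaps between them to guess a fixed record stride. The heuristic itself is an assumption for illustration, not anything SiB actually uses, and it breaks down as soon as records contain more than one string.]

```python
import re
from math import gcd
from functools import reduce

def guess_record_stride(path):
    """Guess a fixed record size by finding printable strings in a binary
    file and taking the GCD of the gaps between their start offsets."""
    data = open(path, "rb").read()
    # Offsets of runs of 4+ printable ASCII bytes (likely paths/names).
    offsets = [m.start() for m in re.finditer(rb"[\x20-\x7e]{4,}", data)]
    if len(offsets) < 2:
        return None
    gaps = [b - a for a, b in zip(offsets, offsets[1:])]
    return reduce(gcd, gaps)
```

On a file laid out as fixed 24-byte records, each starting with a path, this would report 24; on real-world formats with variable-length fields it mostly reports noise, which is the OP's problem in a nutshell.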

WhoCares 10-24-2020 02:34

Only the human brain + manual reversing (debugging + disassembly) can do it.

chants 10-24-2020 15:56

Machine learning and NNs might help, but a totally custom format like this is probably not common enough to build a good training dataset.

This question belongs more to the forensic analysis area: with huge amounts of data, finding the signal in the noise requires determining the data format first. So this will surely see more research as AI advances.

That said, short of theoretical data science, for RE WhoCares gave you the way to do it.

binarylaw 10-29-2020 15:35

OK, makes sense, thanks guys.

Quote:

Originally Posted by chants (Post 121515)
Machine learning and NNs might help but this totally custom format is not common enough perhaps to get a good training dataset.

What are "NNs"?

Quote:

Originally Posted by chants (Post 121515)
This is more of a question that is in the forensic analysis area. With huge amounts of data, finding the signal from the noise which requires determining data format. So this will be researched more in the AI future for sure.

Theoretically, would it not work just to take diffs of a file between changes/saves/whatever and find the pattern(s) in each incremental slice of difference? For example, a program saving its settings to a .dat file: would this really be so difficult if you could get 5-6 diff snapshots of settings saves?

Not disagreeing with you, just asking from a theoretical perspective now, wondering if this would actually be so difficult as to need AI.
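[Editor's note: the snapshot-diffing idea described above can be sketched in a few lines of Python. This is a minimal illustration that compares two saved snapshots byte-by-byte and reports the ranges that changed; mapping those ranges back to meaningful fields is still the manual part.]

```python
def byte_diff(path_a, path_b):
    """Return (offset, old_bytes, new_bytes) for each contiguous range
    where two file snapshots differ. Minimal sketch: assumes both
    snapshots have the same length (extra trailing bytes are ignored)."""
    a = open(path_a, "rb").read()
    b = open(path_b, "rb").read()
    ranges, start = [], None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y and start is None:
            start = i                      # a differing run begins
        elif x == y and start is not None:
            ranges.append((start, a[start:i], b[start:i]))
            start = None                   # the run ended
    if start is not None:
        ranges.append((start, a[start:], b[start:]))
    return ranges
```

Changing one setting between saves and diffing the two .dat files this way would show which offsets that setting lives at, which is exactly the incremental approach the post describes.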

deepzero 10-29-2020 16:06

NN = Neural Networks

Quote:

Would this theoretically be so difficult
Depends entirely on the program, the encoding it uses, and what you want to achieve. Generally this is a valid approach of course, but experience shows it will only get you so far.

Afaik game modders and the like often reverse engineer custom file formats, so maybe google for that.

chants 10-29-2020 16:23

My answer assumed the given was arbitrary custom data; in that case little can be done. (NNs are neural networks.)

Now if you change the given to a function instead of data, e.g. chosen input -> custom file generator -> custom data corresponding to the chosen input, then certainly a lot of difference-comparison utilities will help. But automating this and treating it as a black box is only done when necessary. The custom file generator is, in effect, your file format information, and the best idea is to treat it as a white box and reverse it. So the best bet is to open SiB in IDA Pro, find where it reads or writes the custom data, and reconstruct that function in higher-level code, which reveals the file format.

Treating it like a black box is usually done out of necessity, at least in the context of reversing, as opposed to, say, network security, where the function code is totally unavailable. But automating this is still basically ridiculous: finding a function that maps some input to some output is incredibly complex, especially when you have that function in machine code right in front of you. Sure, difference tools might make the job faster than reversing in some contexts, but as said, that is because you are using your own mental capabilities to quickly identify patterns.


Even the simplest cases are of course impossible to automate. Say the input is a number, e.g. 10, and the output file contains 2 3 5 7 11 13 17 19 23 29. Now you try it with 11 and the number 31 is appended. We would expect the automation to recognize that this is the first n prime numbers and generate maximally efficient pseudocode representing the format of such data. Or perhaps it sees that it's all text data, or all increasing numbers separated by whitespace. There are many ways to look at it, and automation, except for specific cases, is still a pipe dream without AI.

DARKER 10-29-2020 16:39

You can use any binary editor with template support. 010 Editor has a nice one: you just write your own template and apply it when you need it. You can create only partial records or a more complex template for the whole file. It supports a variety of types: integers, floats, doubles, dates, times, strings, GUIDs ...

Description:
https://www.sweetscape.com/010editor/templates.html

Introduction to Templates and Scripts (step-by-step help):
https://www.sweetscape.com/010editor/manual/IntroTempScripts.htm

And you can find inspiration in already-made templates here:
https://www.sweetscape.com/010editor/repository/templates/

Also worth mentioning is Kaitai (free & open source):
http://kaitai.io/index.html#what-is-it

Format Gallery:
http://formats.kaitai.io/

Online IDE:
https://ide.kaitai.io/

Chr155Y 10-29-2020 19:52

Templates are good, but they can only be used when you already know the data's structure. I think the OP asked for a tool that analyzes a data file and figures out the data structure automatically.

DARKER 10-29-2020 20:54

Quote:

Originally Posted by Chr155Y (Post 121548)
Templates are good, but they can only be used when you already know the data's structure. I think the OP asked for a tool that analyzes a data file and figures out the data structure automatically.

Don't be lazy and create one ;) There is no tool that does it automatically. It's a combination of hex editor + reverse analysis of the target program + tests. You don't need to know all the details of the structure: you can create dummy fields for unknown data and identify them later. As binarylaw says, there are file paths, sizes, and a record size that are known... A hex editor can highlight those places so you can focus only on the "not yet processed" data. This is the common approach for unknown structures.
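[Editor's note: the dummy-field approach above can be sketched with Python's struct module. The 24-byte layout below, with a path-offset field, a size field, and 16 unidentified bytes, is purely hypothetical; it just shows how unknown regions can be carried along as raw bytes until they are identified.]

```python
import struct
from collections import namedtuple

# Hypothetical partial layout: 24-byte records with a path-offset field,
# a size field, and 16 bytes we have not identified yet ("dummy" field).
Record = namedtuple("Record", "path_off size unknown")
REC_FMT = "<II16s"                   # little-endian: uint32, uint32, 16 raw bytes
REC_SIZE = struct.calcsize(REC_FMT)  # 24

def parse_records(blob):
    """Parse as many whole records as fit in the blob; unknown bytes are
    kept raw so they can be re-labeled once their meaning is discovered."""
    return [Record(*struct.unpack_from(REC_FMT, blob, off))
            for off in range(0, len(blob) - REC_SIZE + 1, REC_SIZE)]
```

As fields get identified, you refine REC_FMT (say, splitting the 16-byte blob into timestamps or flags) and re-run the parser, which mirrors the iterative hex editor + tests workflow described above.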

Some tips for creating structures in IDA.

Quickly creating structures:
https://www.hex-rays.com/blog/igor-tip-of-the-week-11-quickly-creating-structures/

Creating structures with known size:
https://www.hex-rays.com/blog/igor-tip-of-the-week-12-creating-structures-with-known-size/


All times are GMT +8.

Powered by vBulletin® Version 3.8.8
Copyright ©2000 - 2024, vBulletin Solutions, Inc.
Always Your Best Friend: Aaron, JMI, ahmadmansoor, ZeNiX