#1
pattern matching algorithm
Hi,
Does anyone know whether there is a tool that compares two different large files in binary mode and highlights the equalities? I have searched the net for Java source that uses a pattern matching algo to handle this, but I wasn't able to find anything. If someone is experienced in this area, please tell me ;D
OHPen
#2
WinHex may do what you are after.
mark_E
#3
As does HexWorkshop.
-shadz
#4
I don't know whether you have ever tried the resynch compare between two different large files in HexWorkshop, but I HAVE...
It isn't working. Maybe the author implemented it wrong. But HexWorkshop wasn't able to extract signatures from different packers, and that's what I want to do.
OHPen
#5
As mark_E said, WinHex does this quite well.
#6
HexCMP v1.2 does that well also...
#7
I think it's my fault,
I haven't explained well what I need exactly. OK:
2 files:
1.exe (123,456 bytes)
2.exe (654,321 bytes)
Both are packed with packer/crypter XYZ. XYZ has a unique signature 0xABCDEF12.
In 1.exe the signature can be found at offset 0x123.
In 2.exe the signature can be found at offset 0x321.
My question is: is there a tool or source code that would be able to detect that 0xABCDEF12 is a byte sequence which can be found in both files at different offsets? That's exactly what I'm searching for. I have tested all of the tools you said could do this, but THEY CAN'T. It's a problem to code such a "resynchronizing" compare algorithm, so I have to search for a tool for it or find some source (I love Java).
regards
OHPen
#8
This is an algorithm that you will find in compression code (the LZ family especially, e.g. LZ77 and LZW), where it is used for building the dictionary in an archive. The pattern finding routines in compression algos typically search for patterns in one file, but it should be easy to modify the code to search for distinct patterns across two files. Try looking at the source for zlib; its deflate code contains an LZ77-style match finder.
Writing an algorithm to do this is extremely simple, especially if you aren't concerned about speed. If it were acceptable for the code to take several minutes to execute, it would only take a very small amount of code to identify patterns in the way you describe.
If you want the code to run quickly (under 20 seconds, let's say), the process doesn't really become any more complex. It usually just involves constraining the algorithm to only search for patterns with lengths within a defined range (instead of patterns of any length). But then you potentially miss patterns outside the length range you specify, which would generally be a bad thing. You see this in compression algos as a "dictionary size" limit. Really, the larger your allowed dictionary size, the better compression you can theoretically get, but the slower the compression. Same with your situation.
Either way, just think about the process for a few minutes, and you should be able to write the algo yourself without using someone else's. It is really a simple process.
Last edited by Satyric0n; 10-08-2003 at 09:06.
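For anyone landing on this thread later, here is a minimal Java sketch of the length-constrained search described above. It is only one possible approach, not code from zlib or from any of the tools mentioned. It assumes both files fit in memory, and the names `CommonSequences`, `Match`, `find` and `minLen` are made up for illustration. It indexes every `minLen`-byte window of the first file in a hash map, slides the same window over the second file, and extends each hit forward as far as the bytes keep matching.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reports byte runs of at least minLen that occur in both inputs.
final class CommonSequences {

    /** One shared run of bytes: where it starts in each file and how long it is. */
    static final class Match {
        final int offsetA, offsetB, length;

        Match(int offsetA, int offsetB, int length) {
            this.offsetA = offsetA;
            this.offsetB = offsetB;
            this.length = length;
        }

        @Override
        public String toString() {
            return String.format("A@0x%X B@0x%X len=%d", offsetA, offsetB, length);
        }
    }

    // Hash of the minLen bytes starting at pos. Collisions are harmless because
    // every candidate is verified with a real byte-by-byte comparison later.
    private static int windowHash(byte[] data, int pos, int minLen) {
        int h = 0;
        for (int k = 0; k < minLen; k++) {
            h = 31 * h + data[pos + k];
        }
        return h;
    }

    static List<Match> find(byte[] a, byte[] b, int minLen) {
        if (minLen < 1) {
            throw new IllegalArgumentException("minLen must be >= 1");
        }
        // Index every minLen-byte window of 'a' by its hash.
        Map<Integer, List<Integer>> index = new HashMap<>();
        for (int i = 0; i + minLen <= a.length; i++) {
            index.computeIfAbsent(windowHash(a, i, minLen), k -> new ArrayList<>()).add(i);
        }
        List<Match> matches = new ArrayList<>();
        // Slide the same window over 'b' and look up candidate offsets in 'a'.
        for (int j = 0; j + minLen <= b.length; j++) {
            List<Integer> candidates = index.get(windowHash(b, j, minLen));
            if (candidates == null) {
                continue;
            }
            for (int i : candidates) {
                // If the preceding bytes also match, this is the middle of a run
                // that was already reported from its true starting point.
                if (i > 0 && j > 0 && a[i - 1] == b[j - 1]) {
                    continue;
                }
                // Extend the match forward for as long as the bytes agree.
                int len = 0;
                while (i + len < a.length && j + len < b.length && a[i + len] == b[j + len]) {
                    len++;
                }
                if (len >= minLen) {
                    matches.add(new Match(i, j, len));
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java CommonSequences <fileA> <fileB> [minLen]");
            return;
        }
        byte[] a = Files.readAllBytes(Paths.get(args[0]));
        byte[] b = Files.readAllBytes(Paths.get(args[1]));
        int minLen = args.length > 2 ? Integer.parseInt(args[2]) : 4;
        for (Match m : find(a, b, minLen)) {
            System.out.println(m);
        }
    }
}
```

With the example from post #7, something like `java CommonSequences 1.exe 2.exe 4` should list the 4-byte signature along with any other shared runs. The trade-off is the one described above: a small `minLen` finds short signatures but produces many hits and gets slow on files with long runs of identical padding bytes, while a larger `minLen` is faster but misses anything shorter.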
#9
Thanks for your help. I think you are right; I will try to write it on my own...
OHPen
#10
You might also want to try a piece of software called Beyond Compare. It can compare pretty much anything between two files and show the differences. I don't know if it will actually show the binary fields in hex, though. But it's kick-butt software nonetheless.
-Lunar