Exetools  

#1
08-19-2023, 17:07
HarrySpoofer
Segmented File Hashing Utility

This is a small command-line utility for MS Windows, written in C with Visual Studio 2019, that computes multiple MD5 hashes of a single file, one per segment. This is also known as "segmented hashing".

USAGE: SegmentedHash.exe FileToHash [NumberOfHashSegments] [OffsetRange]   (OffsetRange is hexadecimal, e.g.: 1BEEF-20000)

For example:

SegmentedHash.exe BigFile.bin 100

...calculates 100 consecutive MD5 hashes covering the entire BigFile.bin file. In other words, it divides BigFile.bin into 100 segments of (nearly) equal size and calculates an MD5 hash of each segment.

SegmentedHash.exe BigFile.bin 100 642c06f40-642f509a6

...calculates 100 consecutive MD5 hashes of the part of BigFile.bin that starts at the hexadecimal offset 642c06f40 and ends at the offset 642f509a6 (inclusive).
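
As a rough illustration of how a window of offsets can be cut into segments, here is a small portable C sketch. It is not the utility's own loop: `Segment` and `split_window` are names made up for this example, and it assumes that any leftover bytes are handed out one each to the first segments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: one segment as an inclusive offset range. */
typedef struct { uint64_t first, last; } Segment;

/* Split the inclusive window [first, last] into up to n segments and
   store them in out (which must have room for n entries).
   The first (size % n) segments are one byte longer than the others.
   Returns the number of segments actually produced. */
uint64_t split_window(uint64_t first, uint64_t last, uint64_t n, Segment *out)
{
    uint64_t size, base, extra, pos, i;

    assert(first <= last && out != NULL);

    size = last - first + 1;
    if (n == 0) n = 1;
    if (n > size) n = size;        /* never more segments than bytes */

    base = size / n;               /* minimum length of every segment */
    extra = size % n;              /* segments that get one more byte */
    pos = first;

    for (i = 0; i < n; i++) {
        uint64_t len = base + (i < extra ? 1 : 0);
        out[i].first = pos;
        out[i].last = pos + len - 1;
        pos += len;
    }
    return n;
}
```

For instance, splitting the 10-byte window 0x100-0x109 into 4 segments this way gives the ranges 0x100-0x102, 0x103-0x105, 0x106-0x107 and 0x108-0x109.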

The file offsets can also be specified in an open form, e.g.:
-1CB2 means from the beginning of the file (offset 0) up to the file offset 0x1cb2, and 1CB2- means from the file offset 0x1cb2 up to the end of the file.
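
The closed and open range forms described above can be parsed along these lines. This is only a sketch in portable C, not the code from the utility: `parse_range` is a hypothetical name, and unlike the posted source it rejects a reversed range instead of trying to reorder it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse "A-B", "-B", "A-" or "A" (hexadecimal) into an inclusive range.
   Open ends default to offset 0 and to file_size - 1, and the end is
   clamped to the last byte of the file.
   Returns 0 on success, -1 if the text is malformed or the range is
   empty after clamping. */
int parse_range(const char *text, uint64_t file_size,
                uint64_t *first, uint64_t *last)
{
    char *end;

    assert(text && first && last && file_size > 0);

    *first = 0;
    *last = file_size - 1;

    if (*text != '-') {                      /* explicit start offset */
        *first = strtoull(text, &end, 16);
        if (end == text) return -1;          /* no hex digits found */
        text = end;
    }
    if (*text == '-') {
        text++;
        if (*text != '\0') {                 /* explicit end offset */
            *last = strtoull(text, &end, 16);
            if (end == text || *end != '\0') return -1;
        }
    } else if (*text != '\0') {
        return -1;                           /* trailing garbage */
    }

    if (*last > file_size - 1) *last = file_size - 1;
    return (*first <= *last) ? 0 : -1;
}
```

With a 1 MiB file, "-1CB2" yields the range 0-0x1CB2 and "1CB2-" yields 0x1CB2-0xFFFFF.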

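The hashing algorithm can be changed by altering the line: #define ALGORITHM

Other possible hash algorithms are: CALG_SHA1, CALG_SHA_256, CALG_SHA_512, etc. Note that CALG_3DES and CALG_AES_128 are encryption algorithms rather than hash algorithms, so CryptCreateHash will not accept them, and that the SHA-2 variants need the provider type PROV_RSA_AES instead of PROV_RSA_FULL in the CryptAcquireContext call.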

Note: This utility does not write any files. It only reads the file in BUFSIZE chunks (see the source in SegmentedHash.cpp) and calculates the hashes.

Q: WHAT IS THIS USEFUL FOR?

A: Scenario: You have been downloading a 16TB file over a slow FTP connection for a week but several bytes of the file came over corrupted.
This utility allows you to detect which bytes did not transfer correctly without doing the full 16TB file compare / re-download. This is done by running the SegmentedHash utility on the FTP server AND on the FTP client machine and comparing only the hashes of that big file before and after the transfer. Once a mismatching hash is identified, you can narrow down the search to a smaller range of file offsets and find the corrupted bytes. Just several kB of hashes need to be transferred and compared to find the culprit in a huge file. Once this is done the correct bytes can be downloaded anew and used to patch the huge downloaded corrupted file.
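
The comparison step in this scenario amounts to walking two hash listings side by side. A minimal sketch in portable C, assuming each listing has already been reduced to one hex digest string per segment in the same order (`find_mismatches` is a hypothetical helper, not part of the utility):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Compare two listings of per-segment hex digests (one string per
   segment, same segment order on both machines).
   Stores the indices of mismatching segments in bad (up to max_bad
   entries) and returns the total number of mismatches found. */
size_t find_mismatches(const char *const *local, const char *const *remote,
                       size_t count, size_t *bad, size_t max_bad)
{
    size_t found = 0;
    size_t i;

    assert(local && remote && (bad || max_bad == 0));

    for (i = 0; i < count; i++) {
        if (strcmp(local[i], remote[i]) != 0) {
            if (found < max_bad)
                bad[found] = i;    /* remember which segment differs */
            found++;
        }
    }
    return found;
}
```

Each reported index corresponds to one offset range in the listing; running the utility again on both machines with that range as OffsetRange narrows the search further.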

Code:
#include <stdio.h>
#include <stdlib.h>   // malloc, free
#include <assert.h>
#include <windows.h>
#include <Wincrypt.h>

#define ALGORITHM CALG_MD5
#define BUFSIZE 4096

DWORD PrintHash(HCRYPTHASH hHash)
{
    DWORD cbData = sizeof(DWORD);
    PBYTE pbData = NULL;
    DWORD cHashSize;
    CHAR Digits[] = "0123456789abcdef";
    DWORD dwStatus = 0;

    
    // Bail out if the call fails OR the returned size is not that of a DWORD
    if (!CryptGetHashParam(hHash, HP_HASHSIZE, (PBYTE)&cHashSize, &cbData, 0) || (cbData != sizeof(cHashSize)))
        goto ErrorExit;

    pbData = (PBYTE)malloc(cHashSize);

    if ((pbData) && (CryptGetHashParam(hHash, HP_HASHVAL, pbData, &cHashSize, 0)))
    {
        for (DWORD i = 0; i < cHashSize; i++)
        {
            printf("%c%c", Digits[pbData[i] >> 4], Digits[pbData[i] & 0xf]);
        }
        printf("\n");
        goto Exit;
    }

ErrorExit:
    dwStatus = GetLastError();
    printf("ERROR: CryptGetHashParam failed: %08x\n", dwStatus);
Exit:
    if (pbData) free(pbData);
    return dwStatus;
    
}

LONGLONG mySetFilePointer(HANDLE hFile, LONGLONG distance, DWORD MoveMethod)
{
    LARGE_INTEGER dist;

    dist.QuadPart = distance;
    dist.LowPart = SetFilePointer(hFile, dist.LowPart, &dist.HighPart, MoveMethod);

    if (dist.LowPart == INVALID_SET_FILE_POINTER && GetLastError() != NO_ERROR)
    {
        dist.QuadPart = -1;
    }

    return dist.QuadPart;
}


int wmain(int argc, wchar_t* argv[])
{
    LARGE_INTEGER fsize;
    ULONGLONG NextToRead = 0;
    ULONGLONG SegFirst=0;
    ULONGLONG SegLast = 0;
    ULONGLONG SegSize;
    ULONGLONG WindowFirst = 0;
    ULONGLONG WindowLast = 0;
    ULONGLONG WindowSize;
    ULONGLONG nSegments = 1;
    ULONGLONG Remainder = 0;
    DWORD dwStatus = 0;
    BOOL bResult = FALSE;
    HCRYPTPROV hProv = 0;
    HCRYPTHASH hHash = 0;
    HANDLE hFile = NULL;
    BYTE Bufffer[BUFSIZE];
    DWORD cbRead = 0;
    wchar_t* EndPtr;
    ULONGLONG tmp;

    LPCWSTR filename = argv[1];
    // Logic to check usage goes here.

    if (argc < 2)
    {
        printf("USAGE: SegmentedHash.exe FileToHash NumberOfHashSegments OffsetRange  e.g.: 1BEEF-20000\n");
        return -1;
    }

    hFile = CreateFileW(filename, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);

    if (INVALID_HANDLE_VALUE == hFile)
    {
        dwStatus = GetLastError();
        printf("Error opening file %ls\nError: %08x\n", filename, dwStatus);
        return dwStatus;
    }

    if (!GetFileSizeEx(hFile, &fsize))
    {
        dwStatus = GetLastError();   // read the error code before CloseHandle can overwrite it
        CloseHandle(hFile);
        printf("Error obtaining file size %ls\nError: %08x\n", filename, dwStatus);
        return dwStatus;
    }
    WindowLast = fsize.QuadPart-1;

    if (argc > 2)
        nSegments = max (_wtoi(argv[2]), 1);

    if (argc > 3)
    {
        if (*argv[3] == L'-')
            WindowLast = wcstoull(argv[3] + 1, NULL, 16);  //_wcstoui64
        else
        {
            WindowFirst = wcstoull(argv[3], &EndPtr, 16);  //_wcstoui64
            if (*EndPtr == L'-')
            {
                if (*(EndPtr + 1) == L'\0')
                    WindowLast = (ULONGLONG)fsize.QuadPart - 1;
                else
                    WindowLast = wcstoull(EndPtr + 1, NULL, 16);  //_wcstoui64
            }
        }
    }

    // Swap the two offsets if they were given in reverse order
    if (WindowFirst > WindowLast)
    {
        tmp = WindowFirst;
        WindowFirst = WindowLast;
        WindowLast = tmp;
    }

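    // An empty file or a start offset past the end would make WindowSize wrap around
    if (fsize.QuadPart == 0 || WindowFirst >= (ULONGLONG)fsize.QuadPart)
    {
        printf("ERROR: the file is empty or the start offset lies beyond the end of the file\n");
        CloseHandle(hFile);
        return ERROR_INVALID_PARAMETER;
    }

    WindowLast = min(WindowLast, (ULONGLONG)fsize.QuadPart-1);
    WindowSize = WindowLast - WindowFirst + 1;
    nSegments = min(nSegments, WindowSize);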

    Remainder = WindowSize % nSegments;
    SegSize = WindowSize / nSegments + (Remainder > 0);   

    // Get handle to the crypto provider
    if (!CryptAcquireContext(&hProv, NULL, NULL, PROV_RSA_FULL, CRYPT_VERIFYCONTEXT))
    {
        dwStatus = GetLastError();
        printf("ERROR: CryptAcquireContext failed: %08x\n", dwStatus);
        CloseHandle(hFile);
        return dwStatus;
    }
 
    if (!CryptCreateHash(hProv, ALGORITHM, 0, 0, &hHash))
    {
        dwStatus = GetLastError();
        printf("ERROR: CryptCreateHash failed: %08x\n", dwStatus);
        CloseHandle(hFile);
        CryptReleaseContext(hProv, 0);
        return dwStatus;
    }

    printf("\nCalculating hashes for %llu segments of the file %ls\nfrom offset %016llx to %016llx (inclusive)\n\n", nSegments, filename, WindowFirst, WindowLast);
    printf("|       File Offset Range       |\t|           MD5 Hash           |\n");
    printf("|-------------------------------|\t|------------------------------|\n");
    

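    if (mySetFilePointer(hFile, (LONGLONG)WindowFirst, FILE_BEGIN) < 0)
    {
        dwStatus = GetLastError();
        printf("ERROR: SetFilePointer failed: %08x\n", dwStatus);
        goto Exit;
    }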
    NextToRead = WindowFirst;
    SegFirst = WindowFirst;
    SegLast = min(WindowFirst+SegSize-1, WindowLast);

    while (bResult = ReadFile(hFile, Bufffer, (DWORD)min(BUFSIZE, SegLast - NextToRead+1), &cbRead, NULL))
    {
        assert(cbRead == min(BUFSIZE, SegLast - NextToRead+1));
        assert(NextToRead + cbRead - 1 <= WindowLast);

        NextToRead += cbRead;
        
        if ( (cbRead > 0) && (!CryptHashData(hHash, Bufffer, cbRead, 0)) )
        {
            dwStatus = GetLastError();
            printf("ERROR: CryptHashData failed: %08x\n", dwStatus);
            goto Exit;
        }
      
        if ( (NextToRead > SegLast) || (cbRead == 0) )
        {
            printf("%016llX-%016llX\t", SegFirst, SegLast);
            PrintHash(hHash);

            if ((NextToRead > WindowLast) || (cbRead == 0))
                break;

            CryptDestroyHash(hHash);

            if (!CryptCreateHash(hProv, ALGORITHM, 0, 0, &hHash))
            {
                dwStatus = GetLastError();
                printf("ERROR: CryptCreateHash failed: %08x\n", dwStatus);
                goto Exit;
            }

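            // The segment just finished used up one leftover byte (if any were left);
            // once none remain, all following segments are one byte shorter.
            // This must happen before the next segment's bounds are computed.
            if (Remainder == 1)
                SegSize--;

            if (Remainder > 0)
                Remainder--;

            SegFirst = min(SegLast + 1, WindowLast);
            SegLast = min(SegFirst + SegSize - 1, WindowLast);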
        }
    }

    if (!bResult)
    {
        dwStatus = GetLastError();
        printf("ERROR: ReadFile failed: %08x\n", dwStatus);
        goto Exit;
    }

    dwStatus = 0;

Exit:
    CryptDestroyHash(hHash);
    CryptReleaseContext(hProv, 0);
    CloseHandle(hFile);

    return dwStatus;
}

Last edited by HarrySpoofer; 08-19-2023 at 17:16.
#2
08-19-2023, 17:37
sendersu
Nice piece of code!
Is it possible to make it cross-platform, so that it is available for Linux flavors as well?
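
One possible direction for a portable version, sketched with nothing but the standard C library. To stay free of dependencies it uses FNV-1a as a stand-in checksum, which is NOT a substitute for MD5 or SHA; a real port would call a proper digest from a crypto library at the same places. The names `fnv1a_update` and `checksum_segments` are made up for this example, and the caller is assumed to know the file size already.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define CHUNK 4096

/* FNV-1a (64-bit): stand-in checksum only, not a cryptographic hash. */
static uint64_t fnv1a_update(uint64_t h, const unsigned char *p, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001B3ULL;               /* FNV prime */
    }
    return h;
}

/* Read `size` bytes sequentially from fp, starting at its current
   position, and store one checksum per segment in out[0..n-1].
   Requires 1 <= n <= size. Returns 0 on success, -1 on a short read. */
int checksum_segments(FILE *fp, uint64_t size, uint64_t n, uint64_t *out)
{
    unsigned char buf[CHUNK];
    uint64_t base, extra, i;

    assert(fp && out && n >= 1 && n <= size);

    base = size / n;
    extra = size % n;                        /* first `extra` segments get +1 byte */

    for (i = 0; i < n; i++) {
        uint64_t left = base + (i < extra ? 1 : 0);
        uint64_t h = 0xCBF29CE484222325ULL;  /* FNV offset basis */

        while (left > 0) {
            size_t want = left < CHUNK ? (size_t)left : CHUNK;
            size_t got = fread(buf, 1, want, fp);
            if (got != want)
                return -1;                   /* EOF or read error */
            h = fnv1a_update(h, buf, got);
            left -= got;
        }
        out[i] = h;
    }
    return 0;
}
```

Reading strictly sequentially avoids `fseek`, whose `long` offset is too small for very large files on some platforms; supporting an offset range would need a 64-bit seek call specific to each platform.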
#3
09-04-2023, 00:23
chants
This is similar to how the torrent protocol works. Unfortunately, FTP and HTTP don't have this functionality built into the protocol, so without something like SSH access to the server this won't help much, for example when downloading from public servers.

In practice, I don't think leaving a segment of more than 2 megabytes without a hash is a good idea. Companies like Microsoft publish a single MD5 or SHA-1 hash for a multi-gigabyte ISO, for example, yet a set of hashes for 2 MB segments would still be a very small amount of data. I'm not sure why, in this day and age, this has not been solved. Bandwidth efficiency for large downloads remains an annoying issue in various contexts.
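
To put a number on "a very small amount of data", the size of a per-segment hash list can be worked out as below. This is a sketch with hypothetical helper names; it counts raw digest bytes only, without any text formatting or offsets.

```c
#include <assert.h>
#include <stdint.h>

/* Number of fixed-size segments needed to cover file_size bytes
   (the last segment may be shorter than segment_size). */
uint64_t segment_count(uint64_t file_size, uint64_t segment_size)
{
    assert(segment_size > 0);
    return file_size / segment_size + (file_size % segment_size != 0);
}

/* Total size in bytes of the raw digests for all segments,
   e.g. digest_bytes = 16 for MD5 or 20 for SHA-1. */
uint64_t hash_list_bytes(uint64_t file_size, uint64_t segment_size,
                         uint64_t digest_bytes)
{
    return segment_count(file_size, segment_size) * digest_bytes;
}
```

For a 4 GiB image with 2 MiB segments and MD5 (16 bytes per digest), this gives 2048 segments and 32768 bytes, i.e. 32 KiB of hashes.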
#4
09-16-2023, 21:14
Abdul Moeed
Quote:
Originally posted by HarrySpoofer:

Q: WHAT IS THIS USEFUL FOR?

A: Scenario: You have been downloading a 16TB file over a slow FTP connection for a week but several bytes of the file came over corrupted.
This utility allows you to detect which bytes did not transfer correctly without doing the full 16TB file compare / re-download. This is done by running the SegmentedHash utility on the FTP server AND on the FTP client machine and comparing only the hashes of that big file before and after the transfer.
This is a great concept, but unless you own or at least have access to the FTP server, how on Earth would you be able to run the utility on it?
The FTP server's owner would also have to know about this tool and run it on their side.

Or it is useful only when you are transferring files between your own servers...
#5
09-17-2023, 21:35
Rebe
That's a good point. Is there a compiled version handy?
#6
10-11-2023, 23:16
h8er
Hello, nice tool. Another good option for something like this is PAR2 / PAR3, which also support error recovery.

https://en.wikipedia.org/wiki/Parchive

Tags
file, hashing, utility


All times are GMT +8.