When two different inputs to a checksum or hash function produce identical checksums or hashes, that's called a "collision." Checksum and hash algorithms mathematically reduce data (sometimes huge amounts of it!) down to a fixed number of bits, often far fewer than the original data contains, so collisions DO occur. A 4 megabyte file is roughly 33.5 million bits long, so there are 2^33,554,432 possible files of that size, while a humble 32-bit checksum can only represent 2^32 (about 4.3 billion) distinct values; by the pigeonhole principle, it's impossible to produce a unique 32-bit checksum for every possible input file.
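As a rough illustration of why collisions are unavoidable, here is a minimal Python sketch that brute-forces two different inputs sharing the same small checksum. The 16-bit truncation of CRC32 and the numeric test strings are purely illustrative assumptions, chosen so a collision appears almost instantly:

```python
import zlib

def checksum16(data: bytes) -> int:
    # Truncate CRC32 to 16 bits so only 65,536 checksum values exist,
    # guaranteeing a collision within 65,537 distinct inputs.
    return zlib.crc32(data) & 0xFFFF

seen = {}  # maps checksum value -> first input that produced it
for i in range(1_000_000):
    data = str(i).encode()
    c = checksum16(data)
    if c in seen and seen[c] != data:
        print(f"Collision: {seen[c]!r} and {data!r} both checksum to {c:#06x}")
        break
    seen[c] = data
```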
The MD5 and SHA1 algorithms were once widely used in cryptographic applications such as digital signatures and certificates. These days they are no longer considered secure for heavyweight cryptography, because deliberate collisions can be manufactured, but they are still quite useful for ordinary checksum creation.
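For example, Python's standard hashlib module can compute MD5 and SHA1 checksums of a file. The sketch below assumes a placeholder file path and reads in chunks so large files never have to fit in memory:

```python
import hashlib

def file_checksums(path: str, chunk_size: int = 65536) -> dict:
    # Compute MD5 and SHA1 digests in a single pass over the file,
    # feeding both hash objects one chunk at a time.
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            md5.update(chunk)
            sha1.update(chunk)
    return {"md5": md5.hexdigest(), "sha1": sha1.hexdigest()}

print(file_checksums("example.bin"))  # "example.bin" is a placeholder path
```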
Algorithms like MD5, SHA1, Adler-32, and CRC32 are often used sequentially by "duplicate file finder" programs, as sketched below: rather than waste all day performing byte-by-byte comparisons on hundreds or thousands of large files, one fast checksum algorithm is run on every file that could have a duplicate. If that algorithm produces identical checksums for multiple files, the dupe-finder tries a different algorithm on just those files. If the dupe-finder exhausts all of its algorithms, it must fall back to byte-by-byte comparison to determine whether the files are truly identical.
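Here is a minimal Python sketch of that multi-stage approach, assuming a list of candidate file paths. The specific choice of CRC32 followed by MD5, the chunk size, and the helper names are illustrative assumptions, not any particular tool's design:

```python
import hashlib
import zlib
from collections import defaultdict

CHUNK = 65536  # read files in 64 KiB pieces so large files stay out of memory

def crc32_of(path):
    # Fast first-pass checksum.
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            crc = zlib.crc32(chunk, crc)
    return crc

def md5_of(path):
    # Slower second-pass checksum, far less likely to collide by accident.
    h = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            h.update(chunk)
    return h.hexdigest()

def same_bytes(a, b):
    # Final fallback: byte-by-byte comparison.
    with open(a, "rb") as fa, open(b, "rb") as fb:
        while True:
            ca, cb = fa.read(CHUNK), fb.read(CHUNK)
            if ca != cb:
                return False
            if not ca:  # both files ended at the same point
                return True

def group_by(paths, keyfunc):
    groups = defaultdict(list)
    for p in paths:
        groups[keyfunc(p)].append(p)
    # Only groups with more than one member can still contain duplicates.
    return [g for g in groups.values() if len(g) > 1]

def find_duplicates(paths):
    duplicates = []
    for crc_group in group_by(paths, crc32_of):        # stage 1: fast checksum
        for md5_group in group_by(crc_group, md5_of):  # stage 2: stronger checksum
            first = md5_group[0]
            for other in md5_group[1:]:
                if same_bytes(first, other):           # stage 3: confirm byte-by-byte
                    duplicates.append((first, other))
    return duplicates
```

The point of the staging is that the cheap checksum eliminates almost every non-duplicate before the more expensive comparisons ever run.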
Even a strong checksum algorithm will suffer a collision once in a while, when two different files produce the same checksum, but the chance that the same pair of files also collides under a second, unrelated strong algorithm is vanishingly small.
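For a back-of-the-envelope sense of scale, assuming each algorithm behaves like an independent, uniformly distributed function (real algorithms only approximate this), the combined odds for one specific pair of differing files look like this:

```python
# Illustrative estimate under the independence/uniformity assumption above.
p_crc32 = 1 / 2**32    # chance a given pair of different files collides under a 32-bit checksum
p_md5   = 1 / 2**128   # chance the same pair also collides under a 128-bit hash
print(p_crc32 * p_md5) # combined chance: 2**-160, roughly 7e-49
```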