Extended KotD #4, #5, and #6: _CRC32, _ADLER32, _MD5$
#1
So three new Keywords of the Day all at once!  Aren't you guys special, or what?!!

(It's "or what", in case anyone was wondering.  You guys aren't special at all.  It's just that all three of these are basically the same thing!)

If you guys haven't read my lesson on CHECKSUMS, kindly go do that now:  https://qb64phoenix.com/forum/showthread.php?tid=2674

I wrote that little lesson on checksums simply because that's all each of these functions is -- a different method of generating a checksum for data.

_CRC32 is a *VERY* common method used to generate checksums -- if you guys have ever opened a ZIP, RAR, or 7Z archive and looked at the individual files, you'll see that every one of them has a CRC32 value listed.  That value is the checksum the file generated BEFORE compression, and it's the same value it should generate AFTER decompression.  If the before and after values don't match, then that file is corrupted and isn't the same as what it was supposed to be originally!!
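Here's a minimal sketch of that before/after check.  It uses Python's standard `zlib` module as a stand-in for QB64PE's _CRC32 (`zlib.crc32` computes the same CRC-32 that ZIP uses), and the `pack`/`unpack` helpers are hypothetical, not how any real archiver stores its entries: record the CRC-32 of the original bytes, compress, decompress, and compare.

```python
import zlib

# Python stand-in for QB64PE's _CRC32: zlib.crc32 is the same CRC-32 used
# by ZIP archives. pack/unpack are hypothetical helpers for illustration.

def pack(data: bytes):
    """Compress data and remember the CRC-32 of the ORIGINAL bytes."""
    return zlib.compress(data), zlib.crc32(data)

def unpack(packed: bytes, expected_crc: int) -> bytes:
    """Decompress, recompute the CRC-32, and refuse corrupted data."""
    data = zlib.decompress(packed)
    if zlib.crc32(data) != expected_crc:
        raise ValueError("CRC mismatch: file is corrupted")
    return data

original = b"Hello from QB64PE! " * 100
packed, crc = pack(original)
restored = unpack(packed, crc)
print(f"CRC32 = {crc:08X}, intact = {restored == original}")
```

If the stored CRC and the recomputed one disagree, `unpack` raises instead of handing back damaged data, which is the same decision an archive tool makes when it reports a CRC error.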

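_ADLER32 is another common checksum that I've run into and used dozens of times.  PNG-format image files are compressed with zlib, and every zlib stream ends with an ADLER32 checksum of the uncompressed data, so PNG uses ADLER32 to check the integrity of its image data, much as ZIP uses CRC32.  (Each PNG chunk also carries its own CRC32.)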
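A small sketch of where that Adler-32 lives, using Python's standard `zlib` module as an assumed stand-in (a PNG decoder does the equivalent on the zlib stream stored in the image's IDAT chunks): compress some bytes, then compare the stream's 4-byte trailer with a freshly computed Adler-32.

```python
import zlib

# Python's zlib stands in for the zlib code inside a PNG decoder.
data = b"Pretend these bytes are PNG scanlines. " * 50
stream = zlib.compress(data)

# Per the zlib format (RFC 1950), the last 4 bytes of the stream are the
# Adler-32 of the UNCOMPRESSED data, most significant byte first.
trailer = int.from_bytes(stream[-4:], "big")
recomputed = zlib.adler32(data)

print(f"stored {trailer:08X}, recomputed {recomputed:08X}")
print("match:", trailer == recomputed)
```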

_MD5$ was once used as a secure checksum method, but then it was discovered that it doesn't reliably generate unique values: people can deliberately craft two different inputs that produce the same MD5 value.  (Like in my post I linked to above, where I generally discuss checksums, "ABC" might have a value of 6, while "AAD" would also have that same value of 6.)  This weakness in security has most security experts yelling and screaming, "GAH!! DON'T USE IT!!"...   and the rest of the world just kind of shrugs and goes, "Ehh...  It's good enough for my needs."   _MD5$ is *still* used as a checksum for all sorts of things, so it's still a very common method to see out in the wild world of programming.  I wouldn't use it for trying to store or transfer something like bank passwords, but as a log-in checksum for one of the games I write in QB64PE??   SURE!!  WTH not?
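The "ABC" and "AAD" example can be reproduced with a deliberately weak toy checksum that just adds up each letter's position in the alphabet.  The `letter_sum` function below is hypothetical and is not how MD5 or any real algorithm works; it only shows what a collision looks like.

```python
# Hypothetical toy checksum, for illustration only: A=1, B=2, ..., Z=26,
# added together. Expects uppercase letters A-Z.
def letter_sum(text: str) -> int:
    return sum(ord(ch) - ord("A") + 1 for ch in text)

print(letter_sum("ABC"))  # 1 + 2 + 3 = 6
print(letter_sum("AAD"))  # 1 + 1 + 4 = 6
print(letter_sum("CBA"))  # reordering collides too: 6
```

Real checksums such as CRC-32 and Adler-32 also take each byte's position into account, so swapping letters like "ABC" to "CBA" changes their result.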

All three methods are quite common and in constant use for generating checksums for various things, and honestly, that's the BASIC reason why we included them in the language for folks to use.  WE -- as in you, and me, and anyone who uses QB64PE -- are already using each of these checksums at some point in our programs.   Ever load a PNG file with _LOADIMAGE in QB64PE?  Use _Deflate or _Inflate?  HTTPS communications?  Various routines in our libraries *ALREADY* require these checksums, so it just makes sense to expose them and make them available to our user base as well.

As I mentioned before, and as the wiki mentions for each of these -- the way they work is really quite simple:

Send them a string of data, they send you back a checksum for that data.

Honestly, that's all there is to these things!!
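Outside QB64PE, Python's standard library has direct counterparts to all three keywords, which makes the "string in, checksum out" idea easy to try.  This sketch assumes Python 3 and uses `zlib.crc32`, `zlib.adler32`, and `hashlib.md5`; the QB64PE keywords may format their results differently, so check the wiki for their exact return types.

```python
import hashlib
import zlib

# Python analogues (an assumption for illustration, not QB64PE code):
#   zlib.crc32    ~ _CRC32
#   zlib.adler32  ~ _ADLER32
#   hashlib.md5   ~ _MD5$
def checksums(text: str) -> dict:
    """Send in a string, get back its CRC-32, Adler-32, and MD5 as hex text."""
    data = text.encode("utf-8")  # checksums operate on bytes, not characters
    return {
        "crc32": f"{zlib.crc32(data):08x}",
        "adler32": f"{zlib.adler32(data):08x}",
        "md5": hashlib.md5(data).hexdigest(),
    }

for name, value in checksums("abc").items():
    print(f"{name:<8} {value}")
```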

And the purpose of this checksum?  Basically just a quick data-integrity check.  That's all there is to it.

If I pass you a 3 digit string, then ask you to pass me back the _CRC32 checksum you got from that string, I'll instantly know whether you got all the data properly, or whether there was packet loss, or whether what you got was corrupted or infected by a virus on your machine....   If the checksum you generate doesn't match the one I generated before I sent it to you, then the data has been altered or corrupted in some manner!!
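Here's a small simulation of that exchange, with Python's `zlib.crc32` standing in for _CRC32.  The `send_over_flaky_link` helper is hypothetical and just fakes a network link that may flip a bit; CRC-32 is guaranteed to catch any single-bit error like this one.

```python
import zlib
from typing import Optional

def send_over_flaky_link(data: bytes, corrupt_at: Optional[int] = None) -> bytes:
    """Hypothetical transport for the demo: optionally flip one bit in transit."""
    received = bytearray(data)
    if corrupt_at is not None:
        received[corrupt_at] ^= 0x01
    return bytes(received)

message = b"Player score: 123"
sender_crc = zlib.crc32(message)  # computed BEFORE sending (stand-in for _CRC32)

clean = send_over_flaky_link(message)
damaged = send_over_flaky_link(message, corrupt_at=14)  # '1' becomes '0'

print("clean copy matches:  ", zlib.crc32(clean) == sender_crc)
print("damaged copy matches:", zlib.crc32(damaged) == sender_crc)
```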

That's what checksums basically do for us, and that's what all three of these commands do -- you send them a string, they send you a checksum that you can later use to validate that the string hasn't changed from what it was when you grabbed that checksum of it.

Simple enough.  Right?
#2
Quote:"ABC" might have a value of 6, while "AAD" would also have that same value of 6.
if one method fails, would one of the other methods succeed?
#3
(05-13-2024, 09:38 AM) Jack Wrote:
Quote:"ABC" might have a value of 6, while "AAD" would also have that same value of 6.
if one method fails, would one of the other methods succeed?

Aye, as each is a different algorithm for finding the checksum.
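Using the "ABC"/"AAD" pair quoted above, here's a quick check of that idea with Python's standard `zlib` and `hashlib` as stand-ins for the QB64PE keywords: a plain sum of the byte values can't tell the two strings apart, but all three real algorithms can.  This only demonstrates one pair; it doesn't prove the algorithms can never collide at the same time.

```python
import hashlib
import zlib

a, b = b"ABC", b"AAD"

# A plain sum of byte values collides: 65+66+67 == 65+65+68 == 198
print("byte sums:", sum(a), sum(b))

# Python stand-ins for _CRC32, _ADLER32 and _MD5$ all tell them apart.
print("crc32 differ:  ", zlib.crc32(a) != zlib.crc32(b))
print("adler32 differ:", zlib.adler32(a) != zlib.adler32(b))
print("md5 differ:    ", hashlib.md5(a).digest() != hashlib.md5(b).digest())

# Adler-32 packs two 16-bit sums. The low half (1 + byte sum, mod 65521) is
# the same for both strings; the high half (a sum of running totals) differs.
print("low halves: ", zlib.adler32(a) & 0xFFFF, zlib.adler32(b) & 0xFFFF)
print("high halves:", zlib.adler32(a) >> 16, zlib.adler32(b) >> 16)
```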

And it's not so much that they "fail".  It's that they don't always guarantee UNIQUE results.

"Applesauce" might checksum to "AXYZ97".
At the same time, "Byttermilk" might also checksum to "AXYZ97".

Now, chances are, if I send "Applesauce" to you, it's not going to corrupt to the point where it becomes "Byttermilk".

If your checksum doesn't match mine, you 100% know something is wrong.  <<-- this is what you're using checksums to check for, not the rare "oh, this might match too" issue.

Think of it like car keys.  A million cars are made each year, with maybe a thousand different keys between them.  Will your key open some car in every state?  Sure, it probably will!  But will it open your neighbor's car?  Wouldn't think so.
#4
When two different inputs to a checksum or hash function produce identical checksums or hashes, that's called a "collision."  Checksum and hash algorithms mathematically reduce data (sometimes huge amounts of it!) down to a fixed number of bits that is often much smaller than the original data, so collisions DO occur.  A 4 megabyte file could contain any one of a gazillion unique sequences of bits, while a humble 32-bit checksum can only represent about 4.3 billion (2^32) unique numbers, so it's impossible to produce a unique 32-bit checksum for every possible input file.
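The pigeonhole argument can be demonstrated quickly by shrinking the checksum.  This Python sketch uses a hypothetical `tiny_checksum` that keeps only the low 16 bits of `zlib.crc32`, so it has just 65,536 possible values, and feeding it 65,537 distinct inputs must produce a collision (typically far sooner, thanks to the birthday effect).  A full 32-bit checksum follows the same rule; it just needs more than 4.29 billion inputs to guarantee one.

```python
import zlib

def tiny_checksum(data: bytes) -> int:
    """Hypothetical 16-bit checksum for the demo: the low 16 bits of CRC-32."""
    return zlib.crc32(data) & 0xFFFF

def find_collision():
    """Checksum distinct inputs until two of them share a value.
    There are only 65,536 possible values, so 65,537 distinct inputs
    are guaranteed (pigeonhole principle) to contain a collision."""
    seen = {}
    for i in range(65_537):
        data = f"file-{i}".encode()
        value = tiny_checksum(data)
        if value in seen:
            return seen[value], data
        seen[value] = data
    raise AssertionError("unreachable: pigeonhole guarantees a collision")

first, second = find_collision()
print(first, second, f"both -> {tiny_checksum(first):04X}")
```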

The MD5 and SHA1 algorithms were once widely used in cryptography, for things like digital signatures and certificates.  These days they are no longer considered secure for heavyweight cryptography, but they are still quite useful for checksum creation.

Algorithms like MD5, SHA1, ADLER32, and CRC32 are often used in sequence by "duplicate file finder" programs.  Rather than waste all day performing byte-by-byte comparisons on hundreds or thousands of large files, one fast checksum algorithm will be used on all files which could have duplicates.  If that algorithm produces identical checksums for multiple files, the dupe-finder will try a different algorithm on those particular files.  If the dupe-finder exhausts all of its algorithms then it must fall back to byte-by-byte comparison to determine whether the files are truly identical.
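That staged approach can be sketched in a few lines of Python.  This is a hypothetical, in-memory illustration (file contents held in a dict instead of read from disk), not the code of any real duplicate finder: CRC-32 does a fast first pass, MD5 re-checks only the groups that collided, and a direct comparison of the bytes settles the rest.

```python
import hashlib
import zlib
from collections import defaultdict

def split_groups(names, key):
    """Group names by key(name) and keep only groups with 2+ members."""
    groups = defaultdict(list)
    for name in names:
        groups[key(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]

def find_duplicates(files):
    """files maps a name to its contents (bytes). Returns lists of duplicate names."""
    # Stage 1: a fast checksum over every file.
    candidates = split_groups(files, lambda n: zlib.crc32(files[n]))
    # Stage 2: a different algorithm, only for files that collided in stage 1.
    candidates = [g for c in candidates
                  for g in split_groups(c, lambda n: hashlib.md5(files[n]).digest())]
    # Stage 3: compare the actual bytes; equal dict keys means identical contents.
    return [g for c in candidates for g in split_groups(c, lambda n: files[n])]

files = {
    "a.txt": b"same contents",
    "b.txt": b"same contents",
    "c.txt": b"different contents",
}
print(find_duplicates(files))  # [['a.txt', 'b.txt']]
```

Only files that survive every stage get the expensive full comparison, which is the whole point of running the cheap checksum first.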

A strong checksum algorithm will suffer a collision once in a while, when two files produce the same checksum, but the chance of simultaneous collisions when using multiple unrelated strong methods is quite low.
#5
https://www.avira.com/en/blog/md5-the-broken-algorithm Wrote: the probability of collision of hashes even for MD5 is terribly low. That probability is lower than the number of water drops contained in all the oceans of the earth together.

However, collisions can be produced artificially; read the article for more...