2006-01-13, 18:57 | Link #1 |
Senior Member
Join Date: Jan 2004
|
CRC-32
I do not understand why so many groups hold on to this mathematically flawed method for error detection. I bring this up in groups I am in all the time (and a few have actually listened long enough to make the proper changes).
CRC-32 remainders are often appended to the end of an entire file transferred over the internet for error checking. This is supposed to give the downloader a way to check that the file was downloaded without error, by running a CRC-32 check over the complete file and comparing the result to the remainder.
The catch is that CRC-32 is also used for TCP/IP error detection at the packet level. Therefore any TCP/IP error that gets through that CRC-32 (a collision / false negative) will remain undetected when a CRC-32 check is performed on the entire file, leaving a corrupted file that goes undetected (this is a mathematical property of CRC). In essence, nearly any corrupted file caused by a packet-level CRC-32 false negative would also pass the file-level CRC-32, leading the downloader to falsely believe the file is intact. Finally, anyone who understands the basics of polynomial math can modify the file so that the CRC-32 code does not change, which again yields a corrupted file that is not detected.
Therefore these CRC-32 codes only give the unknowledgeable downloader a false sense of security, and anyone who understands the math behind them understands how useless they are for final-stage verification of downloads made over TCP/IP (the internet). The simple solution is to use a more robust hash function like MD5, which is still reasonably secure for now, or even SHA-1, or to move to something like PGP for real authentication that is nearly unbreakable. I think all groups should at least consider this, since CRC-32 is of no real value here. |
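For anyone who wants to compare the options side by side, here is a minimal sketch that computes CRC-32, MD5 and SHA-1 over a file in one streaming pass. It uses only Python's standard `zlib` and `hashlib` modules; the function name `file_checksums` and the 1 MiB chunk size are assumptions made for illustration, not something any group is known to use.

```python
import hashlib
import zlib


def file_checksums(path, chunk_size=1 << 20):
    """Return (crc32_hex, md5_hex, sha1_hex) for the file at `path`."""
    crc = 0
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in chunks so large video files never have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)  # running CRC-32
            md5.update(chunk)
            sha1.update(chunk)
    # Format the CRC as 8 uppercase hex digits, the form usually seen in filenames.
    return f"{crc & 0xFFFFFFFF:08X}", md5.hexdigest(), sha1.hexdigest()
```

Reading the file once and feeding every algorithm from the same chunk means the stronger hashes cost no extra disk reads over the CRC-32 check alone.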
2006-01-13, 19:12 | Link #2 | |
Senior Member
Join Date: Nov 2003
|
Quote:
I also forgot, these are two different things performing the CRC. You are assuming that they will make the same screwup. Last edited by bayoab; 2006-01-13 at 19:36. |
|
2006-01-13, 19:36 | Link #3 | |
Junior Member
Join Date: Jul 2004
Location: Italy
|
Quote:
It's true that it's easy to fool CRC due to its linear structure, but that requires someone intentionally corrupting the data. CRC is easy to compute and detects corruption caused by noise (i.e. not intentional), like bad RAM, xdcc doing stupid things, a bad resume, an HD going fubar, etc. Oh, and if you're using bittorrent then each piece of the file is protected by a SHA-1 hash. Anyway, today it's very rare to see corruption above the link layer... |
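The "linear structure" mentioned above can be seen directly. As a small sketch, assuming Python's `zlib.crc32` as the CRC-32 implementation: for three inputs of equal length, the CRC of their XOR equals the XOR of their CRCs. The helper names `xor_bytes` and `crc_is_affine` are made up for this example.

```python
import os
import zlib


def xor_bytes(*blocks):
    """XOR several equal-length byte strings together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def crc_is_affine(a, b, c):
    """Check crc(a ^ b ^ c) == crc(a) ^ crc(b) ^ crc(c) for equal-length inputs."""
    if not (len(a) == len(b) == len(c)):
        raise ValueError("inputs must have the same length")
    lhs = zlib.crc32(xor_bytes(a, b, c))
    rhs = zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
    return lhs == rhs


# Three random 64-byte blocks; the relation should hold for any such triple.
a, b, c = (os.urandom(64) for _ in range(3))
print(crc_is_affine(a, b, c))
```

Because the checksum of a combination is predictable from the checksums of its parts, a deliberate change can be paired with a compensating change that restores the original CRC. MD5 and SHA-1 are designed not to have this property.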
|
2006-01-13, 20:01 | Link #4 | |
Senior Member
Join Date: Jan 2004
|
Quote:
md5 collisions: Which is why something like PGP would be even more highly recommended. I am not whining, I am offering numerous solutions to a real problem. To me the only thing that I am amazed by is how much some people stick to the old ways, for no purpose and without reason whatsoever. Doing this right doesn't require much more effort, so why not do it right. |
|
2006-01-13, 21:02 | Link #5 |
Love Yourself
Join Date: Mar 2003
Location: Northeast USA
Age: 38
|
People use CRC32 over MD5 and any of the SHA hashes because it fits in a filename. Nobody wants to bundle an additional checking file, or even a text containing hash values (though I have seen this done before).
When it comes down to it, look: it's anime. It's not even supposed to be a permanently archived thing. CRC32 is a weaker data integrity check than many other algorithms, but it easily sticks onto the end of a filename and it does its job in making it easier to tell if playback errors are caused by a corrupt file or something else. We aren't dealing with files where data integrity is that important. If you want to suggest another algorithm, also look into the "economics" of file distribution and please suggest a viable means of including the hash value with the file. It could be hosted on a website, but what a bother.
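The filename convention described above can be checked mechanically. This is a rough sketch that assumes the tag is the last bracketed group of exactly 8 hex digits in the name (for example `[Group]_Show_01_[1A2B3C4D].avi`); the regex and the function names are illustrative, not a standard.

```python
import os
import re
import zlib

# Assumed convention: an 8-hex-digit CRC-32 in square brackets somewhere in the name.
CRC_TAG = re.compile(r"\[([0-9A-Fa-f]{8})\]")


def crc_from_filename(name):
    """Return the last bracketed 8-hex-digit tag in `name` as an int, or None."""
    tags = CRC_TAG.findall(name)
    return int(tags[-1], 16) if tags else None


def crc_of_file(path, chunk_size=1 << 20):
    """Compute the CRC-32 of a file without loading it all into memory."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def check_file(path):
    """True or False if the name carries a CRC tag, None if it has no tag."""
    expected = crc_from_filename(os.path.basename(path))
    if expected is None:
        return None
    return crc_of_file(path) == expected
```

Calling `check_file` on a downloaded episode gives a simple pass or fail with no separate checksum file, which is the convenience argued for above.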
|
2006-01-13, 21:03 | Link #6 | |
Senior Member
Join Date: Mar 2004
|
Quote:
It may occasionally cause problems, but not enough for most people to do anything about it. Using MD5 or something else is good and all, but can you imagine filenames for bt and stuff if they put those hashes into the filename? I mean, you'd probably be doing some screen stretching with something like [A-E]_Yakitate_Japan_51_[ab39e3e01ec79d68558e267eba10ea3d].avi And let's face it, people will stick with methods their viewers are used to. You don't see every group jumping over to mkv even though it's not that much more difficult to encode.
|
|
2006-01-13, 21:21 | Link #7 |
Mass Dictionary Lookup...
Join Date: Dec 2005
Location: A Japanese Dictionary
|
Also, for TCP at the router layer there is a CRC check during each hop (main reason for latency). For every router there is a check of the packet. Not to mention codec compression and some containers which have built-in rebuilds. For any packet to be modified, we would be talking about someone hacking a packet in transit, which is crazy: there are literally millions of packets passing over a single link between routers and servers, and no one could catch and modify a specific packet in transit, given the way TCP is designed to discard bad data and ask for a resend within its windowing of the transfer. Also, one kilobyte of data only governs either the colour of one pixel or the header of a frame; if either were corrupted nothing bad would occur, even for the frame header, as many players have built-in skipping or rebuilding functions, else we would see a higher incidence of files not playing. Also, the time and effort spent using PGP or MD5 is rather more than the average user would care about unless he/she were getting a large bundle of files in one connection, which usually means BitTorrent, which has a SHA-1 hash.
|
2006-01-13, 21:23 | Link #8 |
Excessively jovial fellow
Join Date: Dec 2005
Location: ISDB-T
Age: 37
|
It's a good thing we have long filenames, since MD5 checksums are 32 characters long, and SHA1 ones are 40...
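To put numbers on the lengths mentioned above, here is a quick sketch using Python's standard `zlib` and `hashlib`. The sample payload is arbitrary; only the digest lengths matter.

```python
import hashlib
import zlib

data = b"example payload"  # arbitrary sample input

digests = {
    "CRC-32": f"{zlib.crc32(data) & 0xFFFFFFFF:08X}",
    "MD5": hashlib.md5(data).hexdigest(),
    "SHA-1": hashlib.sha1(data).hexdigest(),
}

# Print each algorithm with the number of hex characters it would add to a filename.
for name, hexdigest in digests.items():
    print(f"{name:7} {len(hexdigest):2} hex chars  {hexdigest}")
```

The lengths come out as 8, 32 and 40 hex characters respectively, which is the whole filename argument in one table.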
Of course, one could just distribute a .md5 with the ep. But if it's on an XDCC bot, people will probably ignore it, and if it's on BT it's unnecessary. BTW, didn't you see that article where they modified a postscript file to something completely different while keeping the MD5 sum? MD5 really isn't a much better alternative (even though it's definitely more cryptographically secure than CRC32). A similar program, but for CRC32, was submitted to the International Obfuscated C Code Contest 2004. It modifies the file to match the desired checksum... http://www.de.ioccc.org/2004/omoikane.hint
|
2006-01-13, 21:37 | Link #9 | |
Senior Member
Join Date: Jan 2004
|
Quote:
If md5 or SHA-1 appended to the filename is too much of a hassle, put it into the comment field of the AVI or something like that. If you use PGP you don't need to append anything to the file; you just need to post your group's or the encoder's public key on one of the free key servers. Then the free applications out there make it completely transparent: you get a simple and reliable pass or fail without even having to enter or compare any data. I really can't say much more on this topic. I'm just trying to point out flaws and simple corrections, so take it for what it is. Groups are already using PGP internally to protect their scripts, raws, and other sensitive information from being stolen, so why not start using it for external verification too? |
|
2006-01-13, 21:43 | Link #10 | |
Excessively jovial fellow
Join Date: Dec 2005
Location: ISDB-T
Age: 37
|
Quote:
|
|
2006-01-13, 21:48 | Link #11 | |
Senior Member
Join Date: Jan 2004
|
Quote:
|
|
2006-01-13, 22:14 | Link #12 | |
Rozen Detective
Join Date: Dec 2005
Location: Germany
Age: 40
|
Quote:
|
|
2006-01-14, 18:02 | Link #13 | |
Love Yourself
Join Date: Mar 2003
Location: Northeast USA
Age: 38
|
Quote:
Once again, this is fansubbing. We're not dealing with highly confidential documents, or things that need to be stored for a very long time. We go with what serves the purpose and is most convenient. MD5 may be a better data integrity check than CRC32, but for our purposes CRC32 has worked fine and continues to work fine. We don't face algorithm attacks, and we're not interested in archiving our works for eternity. If a fansub group is willing to figure out a means to attach these longer hashes to their files or make them easily available, good on them. However, I would be against making it more of a hassle to check for corruption, even if it were a little more secure, just because it'd be a hassle. Once again, these are fansubs. I'll use SHA-512 and PAR2 files for full integrity security when I'm dealing with critical files, but for fansubs CRC32 is convenient and easy to use. There's no need to change it.
|
|
2006-01-15, 05:50 | Link #14 |
Senior Member
Join Date: Nov 2003
|
Good thing this is a torrent site, where every download comes with a nice .torrent file containing a full SHA-1 hashtree....
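As a rough illustration of the per-piece protection a .torrent gives: the original BitTorrent metainfo format stores one 20-byte SHA-1 digest per fixed-size piece. The sketch below mimics that idea on an in-memory byte string; the 256 KiB piece length and the function names are assumptions for the example, and it does not parse real .torrent files.

```python
import hashlib


def piece_hashes(data, piece_length=256 * 1024):
    """SHA-1 digest of each fixed-size piece of `data` (the last piece may be shorter)."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def find_bad_pieces(data, expected, piece_length=256 * 1024):
    """Indices of pieces whose SHA-1 does not match the expected digests."""
    actual = piece_hashes(data, piece_length)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
```

A client that finds a mismatching piece only has to fetch that piece again, which is why a separate whole-file checksum adds little for BT downloads.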
Btw: MD5 _has_ been broken, but only for the situation of creating 2 new files at once that differ in only one bit. There is no way short of brute forcing to create a new file that matches a given hash. Ok, it's not _really_ brute force, because some weaknesses reduce the effective bit count, but it's still enough work for every computer in the world to spend months on... |
2006-01-15, 18:35 | Link #16 | |
Excessively jovial fellow
Join Date: Dec 2005
Location: ISDB-T
Age: 37
|
Quote:
|
|
2006-01-16, 12:27 | Link #17 | |
Senior Member
Join Date: Jan 2004
|
Quote:
|
|
2006-01-16, 13:05 | Link #18 |
Rozen Detective
Join Date: Dec 2005
Location: Germany
Age: 40
|
I don't quite see the point. An insider with access to the "secret" files (without breaking a security system) would presumably also be able to decrypt them. (You are talking about encryption, right? Signing would do nothing.) If they are not able to decrypt them, then why do they have access to those files in the first place?
|
2006-01-16, 18:25 | Link #19 | |
Senior Member
Join Date: Jan 2004
|
Quote:
The really secure (or paranoid) groups would use a true public/private key system where only the intended recipient could decrypt the file. But this was more to protect against spoofing than actual theft, and even with this system, you need at least one person you can trust. Do any groups still use the latter today? None that I know of, and I wouldn't think so; people these days aren't as paranoid as they were back then. |
|
2006-01-16, 20:04 | Link #20 | |
Rozen Detective
Join Date: Dec 2005
Location: Germany
Age: 40
|
Quote:
|
|
|
|