AnimeSuki Forums

Old 2006-01-13, 18:57   Link #1
Access
Senior Member
 
Join Date: Jan 2004
CRC-32

I do not understand why so many groups hold on to this mathematically flawed method for error detection. I bring this up in groups I am in all the time (and a few have actually listened long enough to make the proper changes).

CRC-32 remainders are often appended to the end of an entire file transferred over the internet for error checking. This is supposed to give the downloader a way to check that the file was downloaded without error, by running a CRC-32 check over the complete file and comparing it to the remainder. The catch is that CRC-32 is also used for TCP/IP error detection at the packet level. Therefore any TCP/IP error that is able to get through this CRC-32 (collision / false negative) will remain undetected when a CRC-32 check is performed on the entire file, leading to a corrupted file that is not detected (this is a mathematical property of CRC). In essence, nearly any corrupted file caused by a packet-level false negative CRC-32 collision would also pass CRC-32, leading the downloader to falsely believe the file is proper. Finally, anyone understanding the basics of polynomial math can modify the file such that the CRC-32 code is not changed, also leading to a corrupted file that is not detected. Therefore these CRC-32 codes only give the unknowledgeable downloader a false sense of security, and anyone who understands the math behind these things understands how useless they are for final-stage verification of downloads made over TCP/IP (the internet) to begin with.

The simple solution here is to use a more robust hash function like md5, that is still reasonably secure for now, or even SHA-1, or to go to something like PGP for real authentication that is near unbreakable. I think all groups should at least consider this since CRC-32 is of no real value and only gives people a false sense of security.
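For anyone who wants to try the MD5 / SHA-1 suggestion above, here is a minimal sketch using Python's standard `hashlib`. The function name `file_digests` and the chunk size are my own choices for illustration, not something any group actually ships.

```python
import hashlib

def file_digests(path, chunk_size=1 << 20):
    """Return (md5_hex, sha1_hex) for the file at `path`.

    The file is read in chunks so a large episode never has to fit in memory.
    `chunk_size` is an arbitrary illustrative default.
    """
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()
```

The two strings it returns are 32 and 40 hex characters long, which is the practical cost compared with the 8 hex characters of a CRC-32.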
Old 2006-01-13, 19:12   Link #2
bayoab
Senior Member
 
Join Date: Nov 2003
Quote:
Originally Posted by Access
The catch is that CRC-32 is also used for TCP/IP error detection at the packet level. Therefore any TCP/IP error that is able to get through this CRC-32 (collision / false negative) will remain undetected when a CRC-32 check is performed on the entire file, leading to a corrupted file that is not detected (this is a mathematical property of CRC).

The simple solution here is to use a more robust hash function like md5, that is still reasonably secure for now, or even SHA-1, or to go to something like PGP for real authentication that is near unbreakable. I think all groups should at least consider this since CRC-32 is of no real value and only gives people a false sense of security.
Collisions have already been found for MD5. Also, from what I remember of my networking, the CRC is not just of the data, but of the whole packet. Therefore you need to create two collisions (not impossible, but very hard), and you need it to happen for the ENTIRE file (or enough of the correct parts of the file to give a collision). Just because one section has been tampered with does not guarantee the file has the same, correct CRC.

I also forgot, these are two different things performing the CRC. You are assuming that they will make the same screwup.

Last edited by bayoab; 2006-01-13 at 19:36.
Old 2006-01-13, 19:36   Link #3
Kronos
Junior Member
 
Join Date: Jul 2004
Location: Italy
Quote:
Originally Posted by Access
The catch is that CRC-32 is also used for TCP/IP error detection at the packet level.
Actually, both TCP and IP compute the one's complement of the one's complement sum of the 16-bit words they cover (IP over its header, TCP over header + data). This algorithm was supposed to be temporary and should have been replaced by something more robust (like CRC) in the early days of the Internet; this never happened though.
It's true that it's easy to fool CRC due to its linear structure, but this means that someone is intentionally corrupting the data. CRC is easy to compute and allows detection of corruption caused by noise (i.e. not intentional) like bad RAM, xdcc doing stupid things, bad resume, hd going fubar, etc.
Oh, and if you're using bittorrent then each piece of the file is protected by a SHA-1 hash.

Anyway today it's very rare to see corruption above the link layer...
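To make the TCP/IP checksum mentioned above concrete, here is a rough sketch in the style of RFC 1071, summing 16-bit words. The function name `internet_checksum` is just an illustrative choice, and the pseudo-header that real TCP adds is left out.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"              # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:               # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

Because this is a plain sum, swapping two 16-bit words leaves the result unchanged, which is one reason it is considered weaker than a CRC.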
Old 2006-01-13, 20:01   Link #4
Access
Senior Member
 
Join Date: Jan 2004
Quote:
Originally Posted by bayoab
Collisions have already been found for MD5. Also, from what I remember of my networking, the CRC is not just of the data, but all of the packet. Therefore you need to create two collisions (not impossible, but very hard). and you need it to happen for the ENTIRE file (or enough of the correct parts of the file to give
For any given CRC polynomial there are certain key data inversions that will cause a collision (i.e. not be detected as an error). CRC-32 processes the whole file sequentially. If one part of the file has one of these undetected data inversions, it doesn't matter: you could run the CRC over just that part, over the whole file, or over that file and 10 other good files all concatenated, and you just won't detect it.
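The linear structure behind this can be checked with a small experiment. The sketch below uses `zlib.crc32` and a hypothetical helper `crc_delta`; it assumes the message and the error pattern have the same length. It only shows that the change in CRC depends on the error pattern and not on the data, and says nothing about how likely an undetectable pattern is in practice.

```python
import zlib

def crc_delta(message: bytes, error: bytes) -> int:
    """XOR of the CRC-32 before and after XOR-ing `error` into `message`.

    Both arguments are assumed to have the same length.
    """
    corrupted = bytes(m ^ e for m, e in zip(message, error))
    return zlib.crc32(message) ^ zlib.crc32(corrupted)

# Two unrelated messages of equal length, one shared error pattern.
msg_a = bytes(range(64))
msg_b = b"\xff" * 64
error = bytes(20) + b"\x01\x80\x00\x42" + bytes(40)

# The CRC changes by the same amount in both cases.
same_delta = crc_delta(msg_a, error) == crc_delta(msg_b, error)
```

So an error pattern whose delta is zero slips through no matter what file it lands in.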

md5 collisions: Which is why something like PGP would be even more highly recommended.

I am not whining, I am offering numerous solutions to a real problem. To me the only thing that I am amazed by is how much some people stick to the old ways, for no purpose and without reason whatsoever. Doing this right doesn't require much more effort, so why not do it right.
Old 2006-01-13, 21:02   Link #5
Ledgem
Love Yourself
 
 
Join Date: Mar 2003
Location: Northeast USA
Age: 38
People use CRC32 over MD5 and any of the SHA hashes because it fits in a filename. Nobody wants to bundle an additional checking file, or even a text containing hash values (though I have seen this done before).

When it comes down to it, look: it's anime. It's not even supposed to be a permanently archived thing. CRC32 is a weaker data integrity check than many other algorithms, but it easily sticks onto the end of a filename and it does its job in making it easier to tell if playback errors are caused by a corrupt file or something else. We aren't dealing with files where data integrity is that important.

If you want to suggest another algorithm, also look into the "economics" of file distribution and please suggest a viable means of including the hash value with the file. It could be hosted on a website, but what a bother.
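For reference, the filename convention discussed here is easy to check mechanically. This is a minimal sketch, assuming the CRC is the last 8-hex-digit tag in square brackets in the filename; the helper names are made up for the example.

```python
import os
import re
import zlib

# Assumed convention: an 8-hex-digit tag in square brackets, e.g. [1A2B3C4D].
CRC_TAG = re.compile(r"\[([0-9A-Fa-f]{8})\]")

def crc32_of_file(path, chunk_size=1 << 20):
    """CRC-32 of a file, computed in chunks."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF

def check_filename_crc(path):
    """True/False if the tag matches or not, None if the name has no tag."""
    tags = CRC_TAG.findall(os.path.basename(path))
    if not tags:
        return None
    return int(tags[-1], 16) == crc32_of_file(path)
```

The whole check needs nothing but the file itself, which is the convenience being argued for.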
Old 2006-01-13, 21:03   Link #6
jpwong
Senior Member
 
 
Join Date: Mar 2004
Quote:
Originally Posted by Access
I am not whining, I am offering numerous solutions to a real problem. To me the only thing that I am amazed by is how much some people stick to the old ways, for no purpose and without reason whatsoever. Doing this right doesn't require much more effort, so why not do it right.
So, basically, you're saying you've been downloading tremendous amounts of anime that is corrupted yet computes the CRC fine. I have yet to download or BT a file where the file is corrupt with a correct CRC.

It may occasionally cause problems, but not enough for most people to do anything about. Using MD5 or something else is good and all, but can you imagine filenames for bt and stuff if they put those hashes into the filename? I mean, you'd probably be doing some screen stretching with something like
[A-E]_Yakitate_Japan_51_[ab39e3e01ec79d68558e267eba10ea3d].avi

And let's face it, people will stick with methods their viewers are used to. You don't see every group jumping over to mkv even though it's not that much more difficult to encode.
Old 2006-01-13, 21:21   Link #7
07ChanF
Mass Dictionary Lookup...
 
 
Join Date: Dec 2005
Location: A Japanese Dictionary
Also, for TCP at the router layer there is a CRC check at each hop (a main reason for latency): every router checks the packet. Not to mention codec compression and some containers, which have built-in rebuilds.

For any packet to be modified, we would be talking about someone hacking a packet in transit, which is crazy: there are literally millions of packets passing over a single link between routers and servers, and no one could catch and modify a packet in transit, both because TCP is designed to discard bad data and ask for a resend within its data-transfer window and because you would have to catch that specific packet.

Also, one kilobyte of data only governs either the colour of one pixel or the header of a frame; if either were corrupted, nothing bad would occur, even for the frame header, as many players have built-in skipping or rebuilding functions. Otherwise we would see a higher incidence of files not playing. Finally, the time and effort spent using PGP or MD5 is rather more than the average user would care about unless he/she were getting a large bundle of files in one connection, which usually means BitTorrent, which has a SHA-1 hash.
Old 2006-01-13, 21:23   Link #8
TheFluff
Excessively jovial fellow
 
 
Join Date: Dec 2005
Location: ISDB-T
Age: 37
It's a good thing we have long filenames, since MD5 checksums are 32 characters long, and SHA1 ones are 40...

Of course, one could just distribute a .md5 with the ep. But if it's on a XDCC bot, people probably will ignore it, and if it's on BT it's unnecessary.

BTW, didn't you see that article where they modified a PostScript file to something completely different while keeping the MD5 sum? MD5 really isn't a much better alternative (even though it's definitely more cryptographically secure than CRC32). A similar program, but for CRC32, was submitted to the International Obfuscated C Code Contest 2004. It modifies the file to match the desired checksum... http://www.de.ioccc.org/2004/omoikane.hint
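To show how little work matching a chosen CRC-32 takes, here is a rough sketch that appends four bytes to arbitrary data so the result has any desired checksum. It is not the method used by the IOCCC entry; it is just one straightforward approach (solving a 32x32 linear system over GF(2) with `zlib.crc32` as a black box), and `forge_suffix` is a hypothetical name.

```python
import zlib

def forge_suffix(data: bytes, target: int) -> bytes:
    """Return 4 bytes such that zlib.crc32(data + suffix) == target."""
    state = zlib.crc32(data)                      # running CRC after `data`
    base = zlib.crc32(b"\x00\x00\x00\x00", state)
    # Effect of each individual suffix bit on the final CRC (linear part).
    cols = [zlib.crc32((1 << i).to_bytes(4, "little"), state) ^ base
            for i in range(32)]
    # Gaussian elimination: pivot bit -> (vector, mask of suffix bits used).
    basis = {}
    for i, vec in enumerate(cols):
        mask = 1 << i
        while vec:
            pivot = vec.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = (vec, mask)
                break
            bvec, bmask = basis[pivot]
            vec ^= bvec
            mask ^= bmask
    # Express the wanted change in CRC as a combination of suffix bits.
    want = target ^ base
    bits = 0
    while want:
        bvec, bmask = basis[want.bit_length() - 1]
        want ^= bvec
        bits ^= bmask
    return bits.to_bytes(4, "little")

patched = b"any file contents" + forge_suffix(b"any file contents", 0xDEADBEEF)
```

Here `zlib.crc32(patched)` comes out as `0xDEADBEEF`. Against deliberate tampering a CRC is no protection at all, which is a separate question from whether it catches accidental corruption.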
__________________
| ffmpegsource
17:43:13 <~deculture> Also, TheFluff, you are so fucking slowpoke.jpg that people think we dropped the DVD's.
17:43:16 <~deculture> nice job, fag!

01:04:41 < Plorkyeran> it was annoying to typeset so it should be annoying to read
Old 2006-01-13, 21:37   Link #9
Access
Senior Member
 
Join Date: Jan 2004
Quote:
Originally Posted by jpwong
So, basically, you're saying you've been downloading tremendous amounts of anime that is corrupted yet computes the CRC fine. I have yet to download or BT a file where the file is corrupt with a correct CRC.
Never said that; I'd say it's safe to say few if any people have managed to get a corrupt download over BT to begin with. Most of the corrupt things people get are over DCC, in my experience at least.

If md5 or SHA-1 appended to the filename is too much of a hassle, put it into the comment field of the AVI or something like that. If you use PGP you don't need to append anything to the file; you just need to post your group's or the encoder's public key on one of the free key servers. Then the free applications out there make it completely transparent: you get a simple and reliable pass or fail without even having to enter or compare any data.

Really can't say much more on this topic, I'm just trying to point out flaws and simple corrections, take it for what it is. Groups are already using PGP internally to protect their scripts, raws, other sensitive information from being stolen, so why not start using it for external verification also.
Old 2006-01-13, 21:43   Link #10
TheFluff
Excessively jovial fellow
 
 
Join Date: Dec 2005
Location: ISDB-T
Age: 37
Quote:
Originally Posted by Access
If md5 or SHA-1 appended to the filename is too much of a hassle, put it into the comment field of the AVI or something like that.
How clever, then when the AVI file gets corrupted the checksum itself may get corrupted. A great way to create confusion.
Old 2006-01-13, 21:48   Link #11
Access
Senior Member
 
Join Date: Jan 2004
Quote:
Originally Posted by TheFluff
alternative (even though it's definitely more cryptographically secure than CRC32). A similar program, but for CRC32, was submitted to the International Obfuscated C Code Contest 2004. It modifies the file to match the desired checksum... http://www.de.ioccc.org/2004/omoikane.hint
Yeah I know about that one, I remember when some groups thought it would be 'cute' to make all the CRC32 codes for each one of their releases be the same and a moniker of the group's name. And a certain distro for April Fools' put up a bunch of fake releases modified with the proper filesize and CRC-32 codes. The algorithm has been known for quite some time, it's nothing new. Collisions for MD5 were found much more recently, at least.
Old 2006-01-13, 22:14   Link #12
Jekyll
Rozen Detective
 
 
Join Date: Dec 2005
Location: Germany
Age: 40
Quote:
Originally Posted by Access
Really can't say much more on this topic, I'm just trying to point out flaws and simple corrections, take it for what it is. Groups are already using PGP internally to protect their scripts, raws, other sensitive information from being stolen, so why not start using it for external verification also.
It's probably not so much a flaw as a usability trade-off (PGP is not simple for most users). As it is quite unlikely that anybody would deliberately try to corrupt a download (while forging a correct checksum), a checksum that "guarantees" data integrity is not needed. What is needed is error detection, and CRC32 works sufficiently well for that, as a CRC is able to detect every one- or two-bit error, all odd numbers of bit errors (when the polynomial has x + 1 as a factor), all burst errors of a length less than or equal to the order of the CRC polynomial, and all independent errors that can be represented as a polynomial of an order less than the CRC polynomial.
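Two of the guarantees listed above can be checked by brute force on a small buffer. This is only a sketch with made-up helper names: it flips every single bit, and XORs a byte-aligned 4-byte (32-bit) burst at every offset, then counts how often `zlib.crc32` fails to notice.

```python
import zlib

def single_bit_misses(data: bytes) -> int:
    """Count single-bit flips that leave the CRC-32 unchanged."""
    good = zlib.crc32(data)
    missed = 0
    for bit in range(len(data) * 8):
        corrupted = bytearray(data)
        corrupted[bit // 8] ^= 1 << (bit % 8)
        if zlib.crc32(corrupted) == good:
            missed += 1
    return missed

def burst_misses(data: bytes, burst: bytes) -> int:
    """Count byte-aligned placements of `burst` that go undetected."""
    good = zlib.crc32(data)
    missed = 0
    for offset in range(len(data) - len(burst) + 1):
        corrupted = bytearray(data)
        for j, b in enumerate(burst):
            corrupted[offset + j] ^= b
        if zlib.crc32(corrupted) == good:
            missed += 1
    return missed
```

Both counts should be zero for any input, as long as the burst is non-zero and no longer than 4 bytes. Longer or scattered errors carry no such guarantee and are only caught with high probability.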
Old 2006-01-14, 18:02   Link #13
Ledgem
Love Yourself
 
 
Join Date: Mar 2003
Location: Northeast USA
Age: 38
Quote:
Originally Posted by Access
Groups are already using PGP internally to protect their scripts, raws, other sensitive information from being stolen, so why not start using it for external verification also.
Which groups? Obviously I have not worked with every fansub group in existence, but with the share that I have, I've never seen this. Nor, for that matter, would I foresee them ever using it in the future. Constant FTP login and password changes were the extent of what groups did when they became paranoid of break-ins.

Once again, this is fansubbing. We're not dealing with highly confidential documents, or things that need to be stored for a very long time. We go with what serves the purpose and is most convenient. MD5 may be a better data integrity check than CRC32, but for our functions CRC32 has worked fine and continues to work fine. We don't face algorithm attacks, and we're not interested in archiving our works for eternity. If a fansub group is willing to figure out a means to attach these longer hashes to their files or make them easily available, good on them. However, I would be against making it more of a hassle to check for corruption - even if it were a little more secure - just because it'd be a hassle. Once again, these are fansubs. I'll use SHA-512 and PAR2 files for full integrity security when I'm dealing with critical files, but for fansubs CRC32 is convenient and easy to use. There's no need to change it.
Old 2006-01-15, 05:50   Link #14
IMSabbel
Senior Member
 
Join Date: Nov 2003
Good thing this is a torrent site, where every download comes with a nice .torrent file containing a full SHA-1 hashtree....
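As a rough illustration of what those per-piece hashes buy you, the sketch below splits data into fixed-size pieces and takes the SHA-1 of each, in the spirit of the flat list of 20-byte piece hashes in the original .torrent metainfo format (tree-based layouts are not shown). The function names and the 256 KiB default piece length are assumptions for the example.

```python
import hashlib

def piece_hashes(data: bytes, piece_length: int = 256 * 1024):
    """SHA-1 digest (20 bytes) of each fixed-size piece of `data`."""
    return [hashlib.sha1(data[i:i + piece_length]).digest()
            for i in range(0, len(data), piece_length)]

def bad_pieces(data: bytes, expected, piece_length: int = 256 * 1024):
    """Indices of the pieces whose hash does not match `expected`."""
    actual = piece_hashes(data, piece_length)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
```

A client working this way only has to fetch the mismatching pieces again instead of the whole file.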

Btw: MD5 _has_ been broken, but only for situations of creating 2 new files that only differ in one bit at once. There is no way short of brute forcing to create a new file based on a given hash. OK, it's not _really_ brute force, because some weakenings reduce the effective bit count, but it's still enough work for every computer in the world to spend months on...
Old 2006-01-15, 15:39   Link #15
subcool
Arienai Co-Founder
 
 
Join Date: Feb 2004
Location: Holland
Age: 40
yes, now let's start putting the SHA1 hash in the filenames too...

a hash collision on CRC doesn't matter for fansubs at all... only when the 2 releases are from the same group and the same episode and the same name :P
Old 2006-01-15, 18:35   Link #16
TheFluff
Excessively jovial fellow
 
 
Join Date: Dec 2005
Location: ISDB-T
Age: 37
Quote:
Originally Posted by IMSabbel
Btw: MD5 _has_ been broken, but only for situations of creating 2 new files that only differ in one bit at once. There is no way short of brute forcing to create a new file based on a given hash. OK, it's not _really_ brute force, because some weakenings reduce the effective bit count, but it's still enough work for every computer in the world to spend months on...
Not quite correct. http://www.cits.rub.de/MD5Collisions/
Old 2006-01-16, 12:27   Link #17
Access
Senior Member
 
Join Date: Jan 2004
Quote:
Originally Posted by Ledgem
Which groups? Obviously I have not worked with every fansub group in existence, but with the share that I have, I've never seen this. Nor, for that matter, would I foresee them ever using it in the future. Constant FTP login and password changes were the extent of what groups did when they became paranoid of break-ins.
It was common in some of the first-generation or original groups. It is not as common today. It's not so much the fear of external break-ins as much as the fear of a 'spy' or 'plant' inside the group who either copies the scripts before they leave or feeds them to others in secret.
Old 2006-01-16, 13:05   Link #18
Jekyll
Rozen Detective
 
 
Join Date: Dec 2005
Location: Germany
Age: 40
I don't quite see the point. An insider with access to the "secret" files (without breaking a security system) would presumably also be able to decrypt them. (You are talking about encryption, right? Signing would do nothing.) If they are not able to decrypt them, then why do they have access to those files in the first place?
Old 2006-01-16, 18:25   Link #19
Access
Senior Member
 
Join Date: Jan 2004
Quote:
Originally Posted by Jekyll
I don't quite see the point. An insider with access to the "secret" files (without breaking a security system) would presumably also be able to decrypt them. (You are talking about encryption, right? Signing would do nothing.) If they are not able to decrypt them, then why do they have access to those files in the first place?
Not everyone has access to every key. Even in the most common system, people are only given the keys they need. Unique keys are necessary to decrypt each file. But this rather common method is flawed in that if a key is leaked in secret, the file is compromised and no one may know about it.

The really secure (or paranoid) groups would use a true public/private key system where only the intended recipient could decrypt the file. But this was more to protect against spoofing than actual theft, and even with this system, you need at least one person you can trust.

Do any groups still use the latter today? None that I know of, I wouldn't think so, I mean people these days aren't as paranoid as they were back then.
Old 2006-01-16, 20:04   Link #20
Jekyll
Rozen Detective
 
 
Join Date: Dec 2005
Location: Germany
Age: 40
Quote:
Originally Posted by Access
Not everyone has access to every key. Even in the most common system, people are only given the keys they need. Unique keys are necessary to decrypt each file. But this rather common method is flawed in that if a key is leaked in secret, the file is compromised and no one may know about it.
My point was: Wouldn't some server side access control have done the job just as well? Why give somebody access to encrypted files when you can just give them no access at all?