2006-08-01, 10:39 | Link #1 |
Excessively jovial fellow
Join Date: Dec 2005
Location: ISDB-T
Age: 38
VFR for Fansub Encoders - how, why, WTF?
I think it's time that I contributed something to the community by sharing some know-how about handling variable framerate (VFR) raws.
I will assume that you are familiar with at least basic Avisynth usage and that your brain is in a fully functional state (don't read this half-asleep). Since you're dealing with anime, I will also assume that the world is NTSC (PAL VFR is possible but very uncommon) and that you're using MKV as the container. While VFR in MP4 is possible, actually creating such files is kind of a pain with the current tools (I know the theory, but I've never done it myself). I won't go into specifics of why VFR stuff exists, or how you go about converting it to CFR; there are excellent guides for that already. What I will cover is:
- The basics
- Timecodes files
- VFR raws - the good, the bad, and the ugly
- VFR and hardsubs
- VFR for non-encoders
- VFR and Adobe AfterEffects
- Tools
- Mini-dictionary of confusing acronyms and other technobabble
The basics
VFR is kinda like Zen in some ways. Or a koan, if you will. It makes no sense at all and seems like utter nonsense until, one day after much meditation, you suddenly become enlightened and realize everything you previously assumed about the relationship between frames and times was wrong. Then everything starts to make sense and you subsequently reach Nirvana. (Disclaimer: this guide won't be dealing with that last step.)
The first thing you will have to unlearn is everything you ever knew about framerates. Forget about them, they don't exist anymore, they are a lie. Instead you need to start thinking of a video file as a series of frames ordered along a timeline, each with its own timestamp (that is, a time when it is supposed to be displayed for the viewer). In a CFR (constant framerate) file, these timestamps just happen to be evenly spaced (for example, in a file with 25fps they're 40ms apart, because 1000/25 = 40). With a VFR file, however, you still have all these frames, but the timestamps are no longer evenly spaced. For example, in some places frames may be 41.7ms from each other, and in other places the space between them may be only 33.4ms. In some files you will find that certain frames may be displayed for as long as half a second or more.
The next step is realizing that the audio and the subtitles each live in their own universes, completely separated from the video. They run at their own speed along the same timeline as the video. For example, there might be a gunshot in the audio that starts at 00:03:42.250, or a subtitle line that will show up at 00:05:31.900.
Taken together, these two facts lead us to a few interesting conclusions, most importantly that if you take all the frames of a file that is VFR and play them back at a constant framerate (disregarding their timestamps), the frames will no longer sync with the events in the audio and subtitles, since they're not at the point they're supposed to be on the timeline.
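To put some numbers on the desync described above, here is a minimal Python sketch. It is not part of any tool mentioned in this guide; the clip layout and the function names are made up for illustration. It builds the real timestamps for a clip whose first part runs at 24000/1001 fps and whose second part runs at 30000/1001 fps, then compares them with what you get when the same frames are played back at a constant 24000/1001 fps.

```python
from fractions import Fraction

def cfr_timestamps(frame_count, fps):
    """Start time in milliseconds of each frame at a constant framerate."""
    return [float(Fraction(1000) * n / fps) for n in range(frame_count)]

def vfr_timestamps(sections):
    """Start time in ms of each frame, given (frame_count, fps) sections."""
    stamps = []
    now = Fraction(0)
    for frame_count, fps in sections:
        for _ in range(frame_count):
            stamps.append(float(now))
            now += Fraction(1000) / fps  # each frame lasts 1000/fps ms
    return stamps

NTSC_FILM = Fraction(24000, 1001)   # usually written as 23.976
NTSC_VIDEO = Fraction(30000, 1001)  # usually written as 29.97

# Hypothetical clip: 240 film-rate frames followed by 300 video-rate frames.
true_times = vfr_timestamps([(240, NTSC_FILM), (300, NTSC_VIDEO)])

# The same 540 frames with their timestamps thrown away, played at 23.976.
naive_times = cfr_timestamps(len(true_times), NTSC_FILM)

drift_ms = naive_times[-1] - true_times[-1]
print(f"last frame is shown {drift_ms:.0f} ms later than it should be")
```

The drift is zero during the first section and grows with every frame of the second one, while the audio and subtitles stay where they were on the timeline.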
It also follows that if you want to convert a VFR file to CFR, you will have to remove or duplicate frames to get the correct frame to display at the correct time.
This also leads us to a very very short explanation of why we have to deal with VFR in the first place here. Why not assume that everything runs at the same speed and be done with it, saving us a lot of headache? Well, unfortunately that's not how the world works. I won't go into details, but for certain reasons a lot of anime is created with some sections having motion in 23.976 frames each second, while other sections have motion in 29.97 frames each second. You can store it all in one framerate of course, but if you do you will either have to duplicate or remove frames, and that will create jerkiness in the motion. Which is undesirable since it looks bad.
If the above explanation made no sense whatsoever to you, go sit under a tree for a while and meditate on it. Enlightenment will reach you sooner or later. Enough theory, onwards to the practice!
Timecodes files
These are the most important part of VFR in MKV. Despite this, I will be brief, so pay attention. A timecodes file specifies at which timestamp a given frame should show up, and hence determines the framerate at any given time. There are two common formats, v1 and v2. Examples:
Code:
# timecode format v1
Assume 23.976000
0,2000,29.970000
3000,4000,59.940000
Code:
# timecode format v2
0.000000
40.000000
80.000000
120.000000
160.000000
(...)
v1 timecodes work by setting an assumed framerate (the "Assume xx.xxx" line at the top) and then defining ranges of frames as having other framerates. The format is
startframe,endframe,frames per second
v1 timecodes are nice, because they're a lot more readable than v2 timecodes, and because they're human-editable.
v2 timecodes, on the other hand, work by defining a timestamp (in milliseconds) for each frame in the video. The timestamp determines the frame start time, and hence the first line after the v2 format definition must always be 0 or weird things can happen. The example above shows a framerate of 25 (because 1000/25 = 40 milliseconds per frame). v2 timecodes are kind of a pain, because they require that the output has the exact same number of frames as the input did. This can occasionally be very annoying. However, there are tools to convert v2 timecodes to v1 ones. See the tools section at the bottom.
VFR raws - the good, the bad, and the ugly
There are three kinds of VFR raws that you are likely to run into:
- VFR MKV or MP4
- 120fps AVI
- WMV in .wmv
VFR MKV or MP4
The simplest way to handle these is to use ffmpegsource() (see the tools section). It's an Avisynth plugin that works much like the well-known Avisource(), except it can also spit out a timecodes file. It gives you all frames, but since Avisynth always assumes everything is CFR and doesn't understand VFR at all, it sets a bogus framerate. Use it to encode the workraw and everything else, and remember to set the timecodes parameter at least once so you get a v2 timecodes file to mux in later.
If it's XviD or DivX or something similar in MKV (streamtype V_MS/VFW/FOURCC), you can also use mkv2vfr (again, see the tools section). Fire up a commandline prompt, navigate to the directory containing the MKV raw, and type in this:
Code:
mkv2vfr "some vfr raw.mkv" "output.avi" "timecodes.txt"
Do what you usually do with the audio (extracting it with mkvextract and reencoding it, for example), encode a workraw, either from the Avisynth script (ffmpegsource) or from the AVI you just created (mkv2vfr), and give it to the rest of the crew to chew on. When they're done, encode the final version from that same AVI, fire up mkvmerge GUI, drop the video and audio in it, click the video track and apply the timecodes file, then mux.
120fps AVI
There's really no challenge whatsoever here. Use tritical's avi2tc package to get a decimated VFRaC raw containing all the frames, and a timecodes file (usage of the avi2tc package should be obvious). Encode said decimated raw, mux with timecodes. Simple. As far as I can tell, this should work with H.264 in AVI as well.
WMV in .wmv
This may be slightly tricky. Either you use ffmpegsource() to encode to a lossless AVI (since WMV reading isn't 100% guaranteed to be frame accurate) and encode the workraw and other stuff from that, or you use GDSMux to transmux to an MKV, which you then handle with mkv2vfr (see above). Brief GDSMux primer:
1) right-click the input area and hit "add source", find your .wmv and hit OK
2) right-click the audio stream, choose "encode" and pick "PCM"
3) click the output button and select where to save the MKV
4) hit start
5) wait
This will give you a VFR MKV with uncompressed PCM (WAV) audio.
VFR and hardsubs
Hardsubbing VFR stuff isn't trivial. If you try to apply subs to your assumed CFR (VFRaC) AVI raws, they'll be off by miles (since the VFRaC raw is assumed to be a constant framerate). You need to use a VFR-aware program to transform all timestamps in the subs file to fit the frame timestamps in the raw. At the moment, SSAtool and Aegisub can do this. I've not used SSAtool, but in Aegisub it's done by loading the timecodes file and the video, and then using the file -> export dialog box, with VFR transform checked.
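The idea behind such a VFR transform can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not how Aegisub or SSAtool actually implement it: the timecodes list is a hypothetical v2-style list of frame start times in milliseconds, the function names are invented here, and 25 fps is used as the assumed rate of the VFRaC raw just to keep the numbers round. Each subtitle time is mapped to the frame that is on screen at that time, and then to the time at which that same frame shows up in the VFRaC raw.

```python
from bisect import bisect_right

def frame_at(time_ms, timecodes):
    """Index of the frame on screen at time_ms, given v2-style start times."""
    return max(bisect_right(timecodes, time_ms) - 1, 0)

def to_vfrac_time(time_ms, timecodes, assumed_fps):
    """Map a real timestamp to the time of the same frame in a CFR raw."""
    return frame_at(time_ms, timecodes) * 1000.0 / assumed_fps

# Hypothetical v2 timecodes: frame 3 stays on screen for half a second.
timecodes = [0, 40, 80, 120, 620, 660]

# A sign that should appear at 630 ms real time belongs on frame 4,
# which sits at 160 ms in a raw that pretends to be 25 fps.
print(frame_at(630, timecodes), to_vfrac_time(630, timecodes, 25))
```

A real tool also has to decide how to round line end times so that a line doesn't bleed onto the next frame; that is left out of this sketch.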
Softsubs don't have this problem, since they're applied to the video when it's already proper VFR and the frames have the correct timestamps.
VFR for non-encoders
There are basically two ways to get the rest of the sub crew (excluding AFX typesetters, we'll get to that later) to work with VFR stuff. One is to, er, not get them to work with VFR stuff. I.e., you convert the VFR source to CFR for them (by duplicating and/or removing frames) and let them pretend that the show is in fact CFR. This means a bit more work for you as the encoder, since you need to fix all the TS'ing manually and make sure that the dialogue is scenetimed (if the ordinary timer scenetimed, you can use Aegisub's timing postprocessor to make sure that the scenetiming fits with the VFR - use keyframe snapping only, with a limit of 1 frame).
The other way is to get everyone to use Aegisub and timecodes files. Just make sure that they use 1.10 prerelease or later.
VFR and Adobe AfterEffects
This is where it may get complicated. AFX, just like Avisynth, is not VFR-aware and cannot be made so. If it's only typesetting, this is not a problem, since you can give your AFX'er a VFRaC raw and tell him to pretend it's CFR, which will work since he does everything on a frame-to-frame basis and ignores timestamps. However, if you want to have AFX karaoke, things become evil. You need to make sure that the entire section of the video where the karaoke shows has a constant framerate all the way through, and you need to make a special CFR workraw for only that section, or the karaoke won't sync with the audio.
Tools
- avi2tc package - for handling 120fps AVIs and converting between v1 and v2 timecodes
- Aegisub - for VFR transformation and general subbing work
- Haali Media Splitter, GDSMux and mkv2vfr
Mini-dictionary of confusing acronyms and other technobabble
Final notes
The guide is pretty terse and not very detailed at the moment. Feel free to harass me with questions.
Acknowledgements and thanks
- #darkhold and the people within, especially Haali and pengvado, for putting up with the stupid questions of us lesser mortals
- ArchMageZeratuL and jfs for Aegisub and other tools
- GizmoTech-Mobile, Myrsloik, Mentar, Nicholi and the rest of the pioneers
Last edited by TheFluff; 2007-11-21 at 19:16. Reason: updated since it was extremely outdated. |
2006-08-01, 12:24 | Link #3 |
Excessively jovial fellow
Join Date: Dec 2005
Location: ISDB-T
Age: 38
Well, there's a rather large explanation of VFR at the Avisynth.org wiki, but it's mostly targeted at people who want to encode to CFR, recommends 120fps AVI as the "most compatible hybrid option" (which, IMHO, is pretty stupid), and doesn't mention such fansub-specific problems as AFX stuff or hardsubbing.
Last edited by TheFluff; 2006-08-01 at 15:41. |
2006-08-01, 15:39 | Link #4 |
My E-Penis > Your E-Penis
Fansubber
Join Date: Feb 2006
Location: Brussels, Belgium
Age: 39
mendoi had some problems with that on himawari. if i recall correctly, the only way of having a decent CFR (avi) encode was to use 120fps. every time you tried to bring the frame count down, the video would stutter like crap. mkv is truly the only and best way for fansubbers to handle VFR raws.
2006-08-01, 17:11 | Link #7 |
Senior Member
Fansubber
Join Date: Dec 2005
Two questions. Should cleaning/processing of a raw be done before or after getting a decimated raw? Also, is there a scenario where one would prefer v2 timecodes over v1? Thanks, nice guide. I (luckily) rarely work with VFR material, so this is a rather nice dive-in.
2006-08-01, 17:21 | Link #8 |
Excessively jovial fellow
Join Date: Dec 2005
Location: ISDB-T
Age: 38
@xat: filtering should be done after getting the decimated raw, since the decimation generally requires the original container information.
As for v2 vs. v1... not really. v2 timecodes might be slightly more accurate, but it's not like anyone would care about a few milliseconds...
Last edited by TheFluff; 2006-08-01 at 17:38. |
2006-08-01, 18:50 | Link #9 |
King of Hosers
Join Date: Dec 2005
Age: 41
A misconception: timecodes are not stored as either v1 or v2. These are simply two (of three) possible ways of writing down timecodes to be used as input by mkvmerge. Timecodes are stored in the same manner whether v1 or v2 is used. Thus the output is truly only as accurate as the input, and the same accuracy can be achieved with either of the two.
Also there is a very important mkvmerge parameter which will determine the precision of the timecodes stored. Code:
--timecode-scale 1000000
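As far as I know, the value is given in nanoseconds, so 1000000 means that timestamps are stored as whole milliseconds. A rough Python sketch of what that does to a single timestamp follows. The function name is made up, and rounding to the nearest unit is an assumption; mkvmerge may round differently.

```python
def stored_timestamp_ms(timestamp_ms, timecode_scale_ns=1_000_000):
    """Timestamp in ms after being stored as a whole number of scale units."""
    ns_per_ms = 1_000_000
    # Assumed behaviour: round to the nearest multiple of the scale.
    ticks = round(timestamp_ms * ns_per_ms / timecode_scale_ns)
    return ticks * timecode_scale_ns / ns_per_ms

frame_ms = 1001 / 24  # one frame at 24000/1001 fps, about 41.708 ms

# Smaller scale values keep more of the fractional milliseconds.
for scale in (1_000_000, 100_000, 1_000):
    print(scale, stored_timestamp_ms(frame_ms, scale))
```

With the scale at 1000000 the 41.708 ms frame boundary lands on a whole millisecond, which is the "few milliseconds" kind of error mentioned earlier in the thread.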
2006-08-01, 19:26 | Link #10 | |
Translator, Producer
Join Date: Nov 2003
Location: Tokyo, Japan
Age: 44
Quote:
2006-08-01, 19:26 | Link #11 |
Excessively jovial fellow
Join Date: Dec 2005
Location: ISDB-T
Age: 38
@Nicholi: I'm very aware of that - I was speaking of what timestamps would be stored in the MKV, depending on the input timecodes file.
However, I made a rather embarrassing braino in my last post. v1 timecodes can be just as accurate as v2 ones, provided that you have enough decimal places.
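A quick way to see how much the decimal places matter is to expand v1 timecodes into per-frame timestamps and compare against the exact 24000/1001 rate. The Python below is a minimal sketch, not the converter from the avi2tc package; expand_v1 is a made-up name, and it assumes that v1 ranges are inclusive and that each listed framerate sets the duration of the frames in its range.

```python
from fractions import Fraction

def expand_v1(text, frame_count):
    """Expand v1 timecodes text into a v2-style list of start times in ms."""
    assume = None
    ranges = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the format header
        if line.lower().startswith("assume"):
            assume = Fraction(line.split()[1])
        else:
            start, end, fps = line.split(",")
            ranges.append((int(start), int(end), Fraction(fps)))
    if assume is None:
        raise ValueError("v1 timecodes need an Assume line")
    stamps, now = [], Fraction(0)
    for n in range(frame_count):
        stamps.append(float(now))
        # Use the range framerate if frame n is listed, else the assumed one.
        fps = next((f for s, e, f in ranges if s <= n <= e), assume)
        now += 1000 / fps
    return stamps

# Start time of frame 34000 (roughly 24 minutes in) at exactly 24000/1001 fps.
exact_ms = float(Fraction(34000 * 1001, 24))
short = expand_v1("# timecode format v1\nAssume 23.976\n", 34001)
longer = expand_v1("# timecode format v1\nAssume 23.976024\n", 34001)
print(abs(short[-1] - exact_ms), abs(longer[-1] - exact_ms))
```

With three decimals the last frame ends up off by a millisecond or two over a full episode; with six decimals the error is far below anything a container will store.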
2006-08-01, 21:04 | Link #12 | |
翻訳家わなびぃ
Fansubber
Quote:
At this time, I'm only using this for the 30fps ending clip of Strawberry Panic. The rest of the episode may contain some hybrid stuff, but I'm ignoring that for the time being. It's not significant enough for this type of show, and I don't feel competent to do the full-blown VFR encode/muxing. (Or maybe I'm just too lazy.)
2006-08-02, 01:52 | Link #13 | |||
Two bit encoder
Fansubber
Join Date: Jan 2006
Location: Chesterfield, UK
Age: 40
Oh lawd, is that sum longpost(tm)?
Quote:
1) Using MKV timecodes and tc2mp4 to generate NHML data and modify the STTS atom.
2) Manually creating 29.97/23.976 sections and concatenating them.
3) With ASP, setting a frame drop ratio: dropped frames are not coded, and NVOPs are placed instead, which get nuked by MP4box at muxing, modifying the STTS atom.
None of these are really hard, but the second method is a pain in the ass, and "not fully VFR".
This was my second attempt at VFR MP4, about 2 weeks ago, using tc2mp4. It turned out well. My first shot at it was with the 30MB Kamichu about a week previous to the .hack test, in which I found a problem with tc2mp4. Fortunately I reported back and it was corrected. Support for the other timecode format was added also.
http://rapidshare.de/files/25954454/..._AAC_.mp4.html
Ignore the whacky resolution, I was just proving a point to someone.
Quote:
Quote:
As for VFR, I'd recommend it to anyone if you have a relatively clean CG source. Why code identical frames when you can not code them at all? Even though they use a matter of bytes, it adds up over thousands of frames. VFR with cel, I would imagine can look quite ugly if you go around decimating stuff with slight differences (such as telecine judder or that "swaying" camera effect), which would make it appear erratic. Is it worth me writing a bit about tc2mp4, or is this solely an MKV whoring?
2006-08-02, 02:40 | Link #14 |
Banned
Join Date: Nov 2003
Location: Hamburg
Age: 54
Actually, AFX is no real problem at all - the key is just to prepare a Typeset-raw as a constant framerate avi (even though the content may actually have different framerates), which the AFX typesetter can then use to do his work.
AFX typesets framewise, meaning "from frame x to frame y". It really doesn't care about vfr at all, it only needs to adjust to the cfr avi, and that's it.
No, the tricky parts about typesetting in vfr are SSA/ASS-signs which are timecode-dependent and not frame-dependent. Here you can either use tools which support vfr (like Aegisub) or hardsub the signs the AFX way (on the lossless which was used to create the typeset-raw).
To sum it up, if you want to typeset the soft way in vfr, you may be in for some tricky parts, because then you should definitely bleedcheck. Hardsubbing the signs is generally easier, as long as you take the time to create a typeset-raw (which I strongly recommend).
2006-08-02, 04:22 | Link #15 | |
Excessively jovial fellow
Join Date: Dec 2005
Location: ISDB-T
Age: 38
@Zero1: not as long as some of yours :P
Typo fixed. Quote:
The problem is AFX karaoke, since it has to sync with the audio...
2006-08-02, 06:07 | Link #16 | |
Banned
Join Date: Nov 2003
Location: Hamburg
Age: 54
Quote:
2006-08-02, 06:30 | Link #17 |
Excessively jovial fellow
Join Date: Dec 2005
Location: ISDB-T
Age: 38
The problem enters when the music section of the video has several different zones with different fps'es. I haven't been able to figure out a way to deal with this (yet), except the ugly solution of forcing the entire music section to be CFR. Maybe it can be done by one or more AFX overlays per line...
|
2006-08-02, 07:02 | Link #18 |
Banned
Join Date: Nov 2003
Location: Hamburg
Age: 54
That's a little bit ugly when it happens - but it can still be done.
In this case you usually have parts with 29.970fps and parts with 23.976fps. I recommend rendering the ENTIRE AFX kara at 29.970 then, and _afterwards_ using decimate() on the 23.976 ranges. USUALLY the then-decimated frames are NOT too noticeable in the karaoke, unless you have very slow "gliding" effects. Still, normally the result will be just peachy.
2006-08-03, 05:11 | Link #20 |
Member
Join Date: May 2006
Location: Spain
Many thanks for the useful information! I really want to do it correctly. With the clues here, I can start to work.
The VFR thing is "not" giving me trouble because I'm simply encoding it as UGLY CFR. I do know that's a problem and I'm hoping to solve the issues. It doesn't seem easy, though. Luckily I've only encoded in MKV so far, so hopefully I won't receive any complaints if I switch to VFR.