Old 2006-08-01, 10:39   Link #1
VFR for Fansub Encoders - how, why, WTF?

I think it's time that I contributed some to the community by sharing some know-how about handling variable framerate (VFR) raws.

I will assume that you are familiar with at least basic Avisynth usage, and that your brain is in a fully functional state (don't read this half-asleep), and since you're dealing with anime I will assume that the world is NTSC (PAL VFR is possible but very uncommon). I will also assume that you're using MKV as the container. While VFR in MP4 is possible, actually creating such files is kind of a pain with the current tools (I know the theory but I've never done it myself, either).
I won't go into specifics of why VFR stuff exists, or how you go about converting it to CFR. There are excellent guides for that already. What I will cover is:
  • What scenarios you are likely to run into as an anime encoder and how to encode proper VFR material from them, and
  • VFR-related problems for the rest of the sub crew and how to handle them.
In other words, a practical guide. So, without further ado, we start at the beginning:

The basics
VFR is kinda like Zen in some ways. Or a koan, if you will. It makes no sense at all and seems like utter nonsense until you suddenly one day after much meditation become enlightened and realize everything you previously assumed about the relationship between frames and times was wrong. Then everything starts to make sense and you subsequently reach Nirvana. (Disclaimer: this guide won't be dealing with that last step.)

The first thing you will have to unlearn is everything you ever knew about framerates. Forget about them, they don't exist anymore, they are a lie. Instead you need to start thinking of a video file as a series of frames ordered along a timeline, each with its own timestamp (that is, a time when it is supposed to be displayed for the viewer). In a CFR (constant framerate) file, these timestamps just happen to be evenly spaced (for example, in a file with 25fps they're 40ms apart, because 1000/25 = 40). With a VFR file, however, you still have all these frames, but the timestamps are no longer evenly spaced. For example, in some places, frames may be 41.7ms from each other, and in other places the space between them may be only 33.4ms. In some files you will find that certain frames may be displayed for as long as half a second or more.
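To make the "frames on a timeline" idea concrete, here is a minimal Python sketch that prints the timestamps for a CFR clip and for a clip that switches from 23.976fps to 29.97fps partway through. The function names and the rounding to whole milliseconds are my own choices for illustration, not something any of the tools in this guide expose.

```python
def cfr_timestamps(num_frames, fps):
    """Timestamps (in ms) for a clip where every frame lasts 1000/fps ms."""
    return [round(i * 1000 / fps) for i in range(num_frames)]


def vfr_timestamps(sections):
    """Timestamps (in ms) for a clip built from (frame_count, fps) sections.

    Each frame starts where the previous one ended, so the spacing changes
    whenever the section framerate changes.
    """
    stamps = []
    t = 0.0
    for count, fps in sections:
        for _ in range(count):
            stamps.append(round(t))
            t += 1000 / fps
    return stamps


# 25fps: evenly spaced, 40ms apart.
cfr = cfr_timestamps(5, 25)
# Three frames at 23.976fps followed by three at 29.97fps: uneven spacing.
vfr = vfr_timestamps([(3, 24000 / 1001), (3, 30000 / 1001)])

print(cfr)  # [0, 40, 80, 120, 160]
print(vfr)  # [0, 42, 83, 125, 158, 192]
```

The second list is the whole point: same kind of frames, but the gaps go from roughly 42ms to roughly 33ms when the section changes.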

The next step is realizing that the audio and the subtitles each live in their own universes completely separated from the video. They run at their own speed along the same timeline as the video. For example there might be a gunshot in the audio that starts at 00:03:42.250, or a subtitle line that will show up at 00:05:31.900.

Taken together, these two facts lead us to a few interesting conclusions. Most importantly, if you take all the frames of a VFR file and play them back at a constant framerate (disregarding their timestamps), the frames will no longer sync with the events in the audio and subtitles, since they're no longer at the point on the timeline where they're supposed to be. It also follows that if you want to convert a VFR file to CFR, you will have to remove or duplicate frames to get the correct frame to display at the correct time.
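A rough Python sketch of what "remove or duplicate frames" means in practice: for every tick of the CFR output clock, pick whichever source frame is on screen at that moment. This is only an illustration under my own assumptions (the function name is made up, and the output simply stops at the last source timestamp); real converters are smarter about it.

```python
import bisect


def vfr_to_cfr_frame_map(timestamps_ms, out_fps):
    """Return, for each CFR output frame, the index of the source frame to show.

    A source index that appears twice is a duplicated frame; an index that
    never appears is a dropped frame.
    """
    mapping = []
    if not timestamps_ms:
        return mapping
    n = 0
    while True:
        t = n * 1000 / out_fps
        if t > timestamps_ms[-1]:
            break
        # Last source frame whose timestamp is <= t is the one on screen at t.
        mapping.append(bisect.bisect_right(timestamps_ms, t) - 1)
        n += 1
    return mapping


# Hypothetical clip: two frames at 10fps, then 20fps for the rest.
source = [0, 100, 200, 250, 300, 350, 400]

print(vfr_to_cfr_frame_map(source, 20))  # [0, 0, 1, 1, 2, 3, 4, 5, 6]
print(vfr_to_cfr_frame_map(source, 10))  # [0, 1, 2, 4, 6]
```

At 20fps the slow frames get duplicated; at 10fps frames 3 and 5 are thrown away. Either way the motion in one of the sections gets mangled.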

This also leads us to a very very short explanation of why we have to deal with VFR in the first place here. Why not assume that everything runs at the same speed and be done with it, saving us a lot of headache? Well, unfortunately that's not how the world works. I won't go into details, but for certain reasons a lot of anime is created with some sections having motion in 23.976 frames each second, while other sections have motion in 29.97 frames each second. You can store it all in one framerate of course, but if you do you will either have to duplicate or remove frames, and that will create jerkiness in the motion. Which is undesirable since it looks bad.

If the above explanation made no sense whatsoever to you, go sit under a tree for a while and meditate on it. Enlightenment will reach you sooner or later.

Enough theory, onwards to the practice!

Timecodes files
These are the most important part of VFR in MKV. Despite this, I will be brief, so pay attention.

A timecodes file specifies at which timestamp a given frame should show up, and hence determines the framerate at any given time. There are two common formats, v1 and v2. Examples:
# timecode format v1
Assume 23.976000

# timecode format v2
0
40
80
120
Note that while the # sign starts a comment line, having the first line that defines the format is required by many tools. Don't remove it.

v1 timecodes work by setting an assumed framerate (the "Assume" line at the top) and then defining ranges of frames as having other framerates. The format is
startframe,endframe,frames per second
v1 timecodes are nice, because they're a lot more readable than v2 timecodes, and because they're human-editable.
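If it helps, here is a small Python sketch of how a v1 file can be expanded into one timestamp per frame (which is essentially what a v2 file is). It is not the code of any real converter. I'm assuming frame numbers are zero-based and ranges are inclusive, and the function name and the sample file are made up for the example.

```python
def v1_to_timestamps(v1_text, num_frames):
    """Expand v1 timecodes into a list of per-frame timestamps in ms."""
    assume = None
    ranges = []  # (start_frame, end_frame, fps), end inclusive (assumption)
    for line in v1_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and the header/comment lines
        if line.lower().startswith("assume"):
            assume = float(line.split()[1])
        else:
            start, end, fps = line.split(",")
            ranges.append((int(start), int(end), float(fps)))
    if assume is None:
        raise ValueError("no Assume line found")

    # Every frame gets the assumed rate unless a range overrides it.
    fps_per_frame = [assume] * num_frames
    for start, end, fps in ranges:
        for i in range(start, min(end, num_frames - 1) + 1):
            fps_per_frame[i] = fps

    stamps = []
    t = 0.0
    for fps in fps_per_frame:
        stamps.append(round(t))
        t += 1000 / fps  # this frame's duration pushes the next one along
    return stamps


# Hypothetical file: 25fps everywhere except frames 2-3, which run at 50fps.
sample = """# timecode format v1
Assume 25
2,3,50
"""

print(v1_to_timestamps(sample, 6))  # [0, 40, 80, 100, 120, 160]
```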

v2 timecodes on the other hand work by defining a timestamp (in milliseconds) for each frame in the video. The timestamp determines the frame start time, and hence the first line after the v2 format definition must always be 0 or weird things can happen. The example above shows a framerate of 25 (because 1000/25 = 40 milliseconds per frame). v2 timecodes are kind of a pain, because they require that the output has the exact same number of frames as the input did. This can occasionally be very annoying. However, there are tools to convert v2 timecodes to v1 ones. See the tools section at the bottom.
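Since a mismatched v2 file is an easy mistake to make, a quick sanity check before muxing can save some grief. The following Python sketch checks the rules mentioned above (header line present, first timestamp 0, one timestamp per frame) plus that the timestamps keep increasing. The function name and the wording of the messages are mine; no existing tool is being quoted here.

```python
def check_v2(v2_text, expected_frames):
    """Return a list of problems found in a v2 timecodes file (empty if fine)."""
    lines = [line.strip() for line in v2_text.splitlines()]
    stamps = [float(line) for line in lines if line and not line.startswith("#")]

    problems = []
    if not lines or not lines[0].startswith("#"):
        problems.append("first line is not a format header")
    if not stamps or stamps[0] != 0:
        problems.append("first timestamp is not 0")
    if any(b <= a for a, b in zip(stamps, stamps[1:])):
        problems.append("timestamps are not strictly increasing")
    if len(stamps) != expected_frames:
        problems.append(f"{len(stamps)} timestamps for {expected_frames} frames")
    return problems


good = "# timecode format v2\n0\n40\n80\n120\n"
bad = "# timecode format v2\n40\n80\n"

print(check_v2(good, 4))  # []
print(check_v2(good, 5))  # ['4 timestamps for 5 frames']
print(check_v2(bad, 2))   # ['first timestamp is not 0']
```

You would get `expected_frames` from whatever you are about to mux, for example the frame count your Avisynth script reports.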

VFR raws - the good, the bad, and the ugly
There are three kinds of VFR raws that you are likely to run into:
  • VFR MKV or MP4: the good - Paradoxically enough, these are probably the least common VFR raws. They're fairly easy to handle - you just need ffmpegsource (or mkv2vfr) and possibly a v2-to-v1 timecodes converter.
  • 120fps AVI: the bad - By FAR the most common VFR variant. Very easy to handle, you only need the avi2tc package.
  • WMV in the .wmv container: the ugly - Somewhat more common than VFR MKV. Kinda tricky to handle. You need ffmpegsource, or alternatively you can use GDSMux (included with Haali's Media Splitter and the CCCP) and mkv2vfr or mkvtoolnix.

VFR MKV or MP4
The simplest way to handle these is to use ffmpegsource() (see the tools section). It's an Avisynth plugin that works much like the well-known Avisource(), except it can also spit out a timecodes file. It gives you all the frames, but since Avisynth always assumes everything is CFR and doesn't understand VFR at all, it sets a bogus framerate. Use it to encode the workraw and everything else, and remember to set the timecodes parameter at least once so you get a v2 timecodes file to mux in later.

If it's XviD or DivX or something similar in MKV (streamtype V_MS/VFW/FOURCC), you can also use mkv2vfr (again, see the tools section). Fire up a commandline prompt, navigate to the directory containing the MKV raw, and type in this:
mkv2vfr "some vfr raw.mkv" "output.avi" "timecodes.txt"
This will give you an AVI file that contains all the frames and has a bogus CFR framerate, plus a v1 timecodes file. Note that mkv2vfr writes a bogus "Assume" line (it's always set to 23.976000) and defines everything as sections.

Do what you usually do with the audio (extracting it with mkvextract and reencoding it for example), encode a workraw, either from the Avisynth script (ffmpegsource) or from the AVI you just created (mkv2vfr) and give it to the rest of the crew to chew on. When they're done, encode the final version from that same AVI, fire up mkvmerge GUI, drop the video and audio in it, click the video track and apply the timecodes file, then mux.
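If you would rather script the final mux than click through the GUI, something like the Python sketch below should do it. It only builds the mkvmerge command line; the file names are placeholders and the helper name is made up. The `--timecodes TID:file` option is what mkvmerge used for this as far as I know (newer MKVToolNix releases spell it `--timestamps`), so check your version's documentation before trusting it.

```python
import subprocess


def mkvmerge_command(output, video, audio, timecodes, video_track_id=0):
    """Build an mkvmerge command that applies a timecodes file to the video track.

    The --timecodes option applies to the next input file, so it must come
    right before the video file on the command line.
    """
    return [
        "mkvmerge",
        "-o", output,
        "--timecodes", f"{video_track_id}:{timecodes}",
        video,
        audio,
    ]


# Placeholder file names, purely for illustration.
cmd = mkvmerge_command("final.mkv", "final_video.avi", "audio.ogg", "timecodes.txt")
print(" ".join(cmd))

# Uncomment to actually run the mux (requires mkvmerge on your PATH):
# subprocess.run(cmd, check=True)
```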

120fps AVI
There's really no challenge whatsoever here. Use tritical's avi2tc package to get a decimated VFRaC raw containing all the frames, and a timecodes file (usage of the avi2tc package should be obvious). Encode said decimated raw, mux with timecodes. Simple. As far as I can tell, this should work with H.264 in AVI as well.
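For the curious, here is a rough Python sketch of the idea behind decimating a 120fps AVI. It is not how avi2tc is actually implemented. The assumptions are mine: the AVI is a grid of slots at 120000/1001 fps (some files may use exactly 120), most slots are null padding frames, and each real frame keeps the timestamp of the slot it sat in.

```python
def decimate_120fps(slots, base_fps=120000 / 1001):
    """Drop the padding slots and work out timestamps for the real frames.

    slots: list of booleans, True where the AVI holds a real frame and
           False where it only holds a null/padding frame.
    Returns (kept_slot_indices, timestamps_ms).
    """
    kept = [i for i, is_real in enumerate(slots) if is_real]
    stamps = [round(i * 1000 / base_fps) for i in kept]
    return kept, stamps


# Hypothetical clip: two frames of 23.976fps material (one real frame every
# 5 slots), then two frames of 29.97fps material (one real frame every 4).
slots = ([True] + [False] * 4) * 2 + ([True] + [False] * 3) * 2

kept, stamps = decimate_120fps(slots)
print(kept)    # [0, 5, 10, 14]
print(stamps)  # [0, 42, 83, 117]
```

The kept frames are what ends up in the decimated VFRaC raw, and the timestamps are what ends up in the timecodes file.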

WMV in .wmv
This may be slightly tricky. Either you use ffmpegsource() to encode to a lossless AVI (since WMV reading isn't 100% guaranteed to be frame accurate) and encode the workraw and other stuff from that, or you use GDSMux to transmux to an MKV, which you then handle with mkv2vfr (see above).
Brief GDSMux primer:
1) rightclick the input area and hit "add source", find your .wmv and hit OK
2) rightclick the audio stream, choose "encode" and pick "PCM"
3) click the output button and select where to save the mkv
4) hit start
5) wait.
This will give you a VFR MKV with uncompressed PCM (WAV) audio.

VFR and hardsubs
Hardsubbing VFR stuff isn't trivial. If you try to apply subs to your assumed CFR (VFRaC) AVI raws, they'll be off by miles (since the VFRaC raw is assumed to be a constant framerate). You need to use a VFR-aware program to transform all timestamps in the subs file to fit the frame timestamps in the raw. At the moment, SSAtool and Aegisub can do this. I've not used SSAtool, but in Aegisub it's done by loading the timecodes file and the video, and then using the file -> export dialog box, with VFR transform checked.
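What the VFR transform does can be sketched in a few lines of Python. This is my own simplified take, not Aegisub's or SSAtool's actual code: find which frames a line covers according to the real timestamps, then re-express the line's start and end as times in the VFRaC clip, where frame n sits at n/fps. The function name and the sample numbers are invented.

```python
import bisect


def vfr_transform(start_ms, end_ms, timestamps_ms, assumed_fps):
    """Map a subtitle's real start/end times onto the VFRaC clip's timeline.

    timestamps_ms: real per-frame timestamps (v2 timecodes), sorted.
    assumed_fps:   the bogus constant framerate of the VFRaC raw.
    """
    # First frame that is displayed at or after the line's start...
    first = bisect.bisect_left(timestamps_ms, start_ms)
    # ...and the first frame that is displayed at or after its end.
    past = bisect.bisect_left(timestamps_ms, end_ms)
    frame_ms = 1000 / assumed_fps
    return first * frame_ms, past * frame_ms


# Real timestamps: two frames at 10fps, then 20fps.
stamps = [0, 100, 200, 250, 300, 350, 400]

# A line shown from 200ms to 320ms covers frames 2, 3 and 4. In a VFRaC raw
# with a bogus 10fps those frames sit at 200, 300 and 400ms.
print(vfr_transform(200, 320, stamps, 10))  # (200.0, 500.0)
```

Note how the line gets longer in the transformed script: it has to, because the frames it covers take longer to play in the VFRaC raw.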

Softsubs don't have this problem since they're applied to the video when it's already proper VFR and the frames have the correct timestamps.

VFR for non-encoders
There are basically two ways to get the rest of the sub crew (excluding AFX typesetters, we'll get to that later) to work with VFR stuff.

One is to, er, not get them to work with VFR stuff. I.e., you convert the VFR source to CFR for them (by duplicating and/or removing frames) and let them pretend that the show is in fact CFR. This means a bit more work for you as the encoder, since you need to fix all the TS'ing manually, and make sure that the dialogue is scenetimed (if the ordinary timer scenetimed, you can use Aegisub's timing postprocessor to make sure that the scenetiming fits with the VFR - use keyframe snapping only with a limit of 1 frame).

The other way is to get everyone to use Aegisub and timecodes files. Just make sure that they use 1.10 prerelease or later.

VFR and Adobe AfterEffects
This is where it may get complicated. AFX, just like Avisynth, is not VFR-aware and cannot be made so. If it's only typesetting, this is not a problem since you can give your AFX'er a VFRaC raw and tell him to pretend it's CFR, which will work since he does everything on a frame-to-frame basis and ignores timestamps. However, if you want to have AFX karaoke, things become evil. You need to make sure that the entire section of the video where the karaoke shows has a constant framerate all the way through, and you need to make a special CFR workraw for only that section, or the karaoke won't sync with the audio.
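Checking whether the karaoke section really is constant framerate is easy to do from the timecodes. Below is a minimal Python sketch under my own assumptions: you have the per-frame timestamps in milliseconds (v2 style), and a 1ms wobble is allowed because the timestamps are rounded. The function name is made up.

```python
def is_constant_rate(timestamps_ms, first_frame, last_frame, tolerance_ms=1):
    """True if every frame in first_frame..last_frame has the same duration.

    The slice reaches one frame past last_frame (when it exists) so that the
    duration of last_frame itself is checked too.
    """
    section = timestamps_ms[first_frame:last_frame + 2]
    durations = [b - a for a, b in zip(section, section[1:])]
    if not durations:
        return True
    return max(durations) - min(durations) <= tolerance_ms


# Three frames at 23.976fps followed by three at 29.97fps.
stamps = [0, 42, 83, 125, 158, 192]

print(is_constant_rate(stamps, 0, 1))  # True  (all within the 23.976 part)
print(is_constant_rate(stamps, 0, 3))  # False (frame 3 is already 29.97)
print(is_constant_rate(stamps, 3, 4))  # True  (all within the 29.97 part)
```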

Tools
  • avi2tc package - for handling 120fps AVIs and converting between v1 and v2 timecodes
  • Aegisub - for VFR transformation and general subbing work
  • Haali Media Splitter, GDSMux and mkv2vfr
  • ffmpegsource - Avisynth source plugin that can also write a timecodes file

Mini-dictionary of confusing acronyms and other technobabble
  • VFR - Variable FrameRate. What this guide is dealing with.
  • CFR - Constant FrameRate
  • VFRaC - Variable FrameRate assumed Constant. A clip that is VFR, but whose frames have for some reason been stored in a container that lacks VFR support, and hence the framerate is assumed to be constant.
  • Decimation - the process of removing certain frames from a clip. Usually (but not always) reduces the framerate.
  • FPS - Frames Per Second. The framerate is usually measured in this unit.
  • Zen - a school of Mahāyāna Buddhism notable for its emphasis on practice and experiential wisdom - particularly as realized in the form of meditation known as zazen - in the attainment of awakening.
  • Koan - see

Final notes
The guide is pretty terse and not very detailed at the moment. Feel free to harass me with questions.

Acknowledgements and thanks
- #darkhold and the people within, especially Haali and pengvado for putting up with the stupid questions of us lesser mortals
- ArchMageZeratuL and jfs for Aegisub and other tools
- GizmoTech-Mobile, Myrsloik, Mentar, Nicholi and the rest of the pioneers
Last edited by TheFluff; 2007-11-21 at 19:16. Reason: updated since it was extremely outdated.