AnimeSuki Forum - VFR for Fansub Encoders

VFR for Fansub Encoders - how, why, WTF?

I think it's time that I contributed some to the community by sharing some know-how about handling variable framerate (VFR) raws.

I will assume that you are familiar with at least basic Avisynth usage, and that your brain is in a fully functional state (don't read this half-asleep), and since you're dealing with anime I will assume that the world is NTSC (PAL VFR is possible but very uncommon). I will also assume that you're using MKV as the container. While VFR in MP4 is possible, actually creating such files is kind of a pain with the current tools (I know the theory but I've never done it myself, either).
I won't go into specifics of why VFR stuff exists, or how you go about converting it to CFR. There are excellent guides for that already. What I will cover is:

What scenarios you are likely to run into as an anime encoder and how to encode proper VFR material from them, and
VFR-related problems for the rest of the sub crew and how to handle them.

In other words, a practical guide. So, without further ado, we start at the beginning:

The basics
VFR is kinda like Zen in some ways. Or a koan, if you will. It makes no sense at all and seems like utter nonsense until you suddenly one day after much meditation become enlightened and realize everything you previously assumed about the relationship between frames and times was wrong. Then everything starts to make sense and you subsequently reach Nirvana. (Disclaimer: this guide won't be dealing with that last step.)

The first thing you will have to unlearn is everything you ever knew about framerates. Forget about them, they don't exist anymore, they are a lie. Instead you need to start thinking of a video file as a series of frames ordered along a timeline, each with its own timestamp (that is, a time when it is supposed to be displayed for the viewer). In a CFR (constant framerate) file, these timestamps just happen to be evenly spaced (for example, in a file with 25fps they're 40ms apart, because 1000/25 = 40). With a VFR file, however, you still have all these frames, but the timestamps are no longer evenly spaced. For example, in some places, frames may be 41.7ms from each other, and in other places the space between them may be only 33.4ms. In some files you will find that certain frames may be displayed for as long as half a second or more.
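To make the "frames on a timeline" idea concrete, here is a minimal Python sketch. The function name `timestamps_ms` is made up for this illustration, and the exact rates 24000/1001 and 30000/1001 are assumed as the usual meaning of "23.976" and "29.97". It builds the timestamp list from per-frame durations, first for a 25fps CFR clip and then for a short VFR clip.

```python
from fractions import Fraction

def timestamps_ms(frame_durations_ms):
    """Return the display start time (ms) of each frame, given each frame's duration."""
    stamps = []
    t = Fraction(0)
    for d in frame_durations_ms:
        stamps.append(t)
        t += d
    return stamps

# CFR: every frame lasts 1000/25 = 40 ms, so the timestamps are evenly spaced.
cfr = timestamps_ms([Fraction(1000, 25)] * 5)

# VFR: three frames at 24000/1001 fps followed by two at 30000/1001 fps
# (assumed exact values for "23.976" and "29.97").
film = Fraction(1001, 24)    # about 41.7 ms per frame
video = Fraction(1001, 30)   # about 33.4 ms per frame
vfr = timestamps_ms([film] * 3 + [video] * 2)

print([float(t) for t in cfr])
print([round(float(t), 3) for t in vfr])
```

Nothing in the second list is a "framerate" any more; it's just five frames, each with its own point on the timeline.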

The next step is realizing that the audio and the subtitles each live in their own universes completely separated from the video. They run at their own speed along the same timeline as the video. For example there might be a gunshot in the audio that starts at 00:03:42.250, or a subtitle line that will show up at 00:05:31.900.

Taken together, these two facts lead us to a few interesting conclusions, most importantly that if you take all the frames of a file that is VFR and play them back at a constant framerate (disregarding their timestamps) the frames will no longer sync with the events in the audio and subtitles, since they're not at the point they're supposed to be on the timeline. It also follows that if you want to convert a VFR file to CFR, you will have to remove or duplicate frames to get the correct frame to display at the correct time.
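The "remove or duplicate frames" part can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular tool does it; the function name and the toy timestamps are invented. For every slot of the constant-rate output, it picks the source frame that would be on screen at that moment.

```python
import bisect

def vfr_to_cfr_indices(timestamps_ms, out_fps, duration_ms):
    """For each CFR output slot, return the index of the VFR frame shown at that time."""
    n_out = int(duration_ms * out_fps / 1000)
    picks = []
    for k in range(n_out):
        t = k * 1000.0 / out_fps
        # Last source frame whose timestamp is <= t.
        i = bisect.bisect_right(timestamps_ms, t) - 1
        picks.append(max(i, 0))
    return picks

# Toy VFR clip: frame start times in ms; the clip ends at 400 ms.
source = [0, 100, 200, 250, 300]

dropped = vfr_to_cfr_indices(source, out_fps=10, duration_ms=400)
duplicated = vfr_to_cfr_indices(source, out_fps=20, duration_ms=400)
print(dropped)
print(duplicated)
```

At 10fps source frame 3 never gets a slot (it is dropped), and at 20fps most frames show up twice (they are duplicated). Both cases keep sync with the audio, and both cost you smooth motion.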

This also leads us to a very very short explanation of why we have to deal with VFR in the first place here. Why not assume that everything runs at the same speed and be done with it, saving us a lot of headache? Well, unfortunately that's not how the world works. I won't go into details, but for certain reasons a lot of anime is created with some sections having motion in 23.976 frames each second, while other sections have motion in 29.97 frames each second. You can store it all in one framerate of course, but if you do you will either have to duplicate or remove frames, and that will create jerkiness in the motion. Which is undesirable since it looks bad.

If the above explanation made no sense whatsoever to you, go sit under a tree for a while and meditate on it. Enlightenment will reach you sooner or later.

Enough theory, onwards to the practice!

Timecodes files
These are the most important part of VFR in MKV. Despite this, I will be brief, so pay attention.

A timecodes file specifies at which timestamp a given frame should show up, and hence determines the framerate at any given time. There are two common formats, v1 and v2. Examples:

Code:

#timecodes format v1
Assume 23.976000
0,2000,29.970000
3000,4000,59.940000

Code:

#timecodes format v2
0.000000
40.000000
80.000000
120.000000
160.000000
(...)

Note that while the # sign starts a comment line, having the first line that defines the format is required by many tools. Don't remove it.

v1 timecodes work by setting an assumed framerate (the "Assume xx.xxx" line at the top) and then defining ranges of frames as having other framerates. The format is
startframe,endframe,frames per second
v1 timecodes are nice, because they're a lot more readable than v2 timecodes, and because they're human-editable.
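As a rough sketch of what a v1 file means, the following Python expands one into a per-frame timestamp list. It assumes the frame ranges are inclusive at both ends and that you know the total frame count; the function name `v1_to_v2` is invented here and is not the name of any tool in the avi2tc package.

```python
def v1_to_v2(v1_text, num_frames):
    """Expand v1 timecodes text into a list of per-frame timestamps in ms."""
    assume = None
    ranges = []
    for line in v1_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header/comment lines
        if line.lower().startswith("assume"):
            assume = float(line.split()[1])
        else:
            start, end, fps = line.split(",")
            ranges.append((int(start), int(end), float(fps)))
    if assume is None:
        raise ValueError("no 'Assume' line found")

    # Every frame gets the assumed rate unless a range overrides it.
    fps_per_frame = [assume] * num_frames
    for start, end, fps in ranges:
        for n in range(start, min(end, num_frames - 1) + 1):
            fps_per_frame[n] = fps

    stamps = []
    t = 0.0
    for fps in fps_per_frame:
        stamps.append(t)
        t += 1000.0 / fps
    return stamps

example_v1 = """#timecodes format v1
Assume 25
2,3,50
"""
stamps = v1_to_v2(example_v1, num_frames=6)
print(stamps)
```

Frames 2 and 3 last 20ms each instead of 40ms, so everything after them lands earlier on the timeline than it would in a plain 25fps file.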

v2 timecodes on the other hand work by defining a timestamp (in milliseconds) for each frame in the video. The timestamp determines the frame start time, and hence the first line after the v2 format definition must always be 0 or weird things can happen. The example above shows a framerate of 25 (because 1000/25 = 40 milliseconds per frame). v2 timecodes are kind of a pain, because they require that the output has the exact same amount of frames as the input did. This can occasionally be very annoying. However, there are tools to convert v2 timecodes to v1 ones. See the tools section at the bottom.
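Going from v2 back to v1 is basically run-length encoding of the frame durations. The sketch below is a simplified illustration, not the converter shipped with any of the tools mentioned here; the function name is invented, the last frame is assumed to last as long as the one before it, and the header line is copied from the v1 example above. Real-world v2 files rounded to whole milliseconds would need some tolerance when comparing rates, which this ignores.

```python
from collections import Counter

def v2_to_v1(stamps_ms, decimals=3):
    """Compress per-frame timestamps (ms) into v1 timecodes text."""
    durations = [b - a for a, b in zip(stamps_ms, stamps_ms[1:])]
    durations.append(durations[-1])  # assumption: last frame lasts as long as the previous one
    fps = [round(1000.0 / d, decimals) for d in durations]

    # Use the most common rate as the "Assume" value so fewer ranges are needed.
    assume = Counter(fps).most_common(1)[0][0]
    lines = ["#timecodes format v1", f"Assume {assume:.6f}"]

    start = 0
    for n in range(1, len(fps) + 1):
        if n == len(fps) or fps[n] != fps[start]:
            if fps[start] != assume:
                lines.append(f"{start},{n - 1},{fps[start]:.6f}")
            start = n
    return "\n".join(lines)

v1_text = v2_to_v1([0, 40, 80, 100, 120, 160, 200])
print(v1_text)
```

The nice side effect of converting is that a v1 file with a single "Assume" line and a few ranges keeps working even if the frame count changes a little at the very end.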

VFR raws - the good, the bad, and the ugly
There are three kinds of VFR raws that you are likely to run into:

VFR MKV or MP4: the good - Paradoxically enough, these are probably the least common VFR raws. They're fairly easy to handle - you just need ffmpegsource (or mkv2vfr) and possibly a v2-to-v1 timecodes converter.
120fps AVI: the bad - By FAR the most common VFR variant. Very easy to handle, you only need the avi2tc package.
WMV in the .wmv container: the ugly - Somewhat more common than VFR MKV. Kinda tricky to handle. You need ffmpegsource, or alternatively you can use GDSMux (included with Haali's Media Splitter and the CCCP) and mkv2vfr or mkvtoolnix.

VFR MKV or MP4
The simplest way to handle these is to use ffmpegsource() (see the tools section). It's an Avisynth plugin that works much like the well-known Avisource() except it can also spit out a timecodes file. It gives you all frames, but since Avisynth always assumes everything is CFR and doesn't understand VFR at all, it sets a bogus framerate. Use it to encode workraw and everything else, and remember to set the timecodes parameter at least once so you get a v2 timecodes file to mux in later.

If it's XviD or DivX or something similar in MKV (streamtype V_MS/VFW/FOURCC), you can also use mkv2vfr (again, see the tools section). Fire up a commandline prompt, navigate to the directory containing the MKV raw, and type in this:

Code:

mkv2vfr "some vfr raw.mkv" "output.avi" "timecodes.txt"

This will give you an AVI file containing all the frames, stored with a bogus CFR framerate, plus a v1 timecodes file. Note that mkv2vfr writes a bogus "Assume xx.xxx" line (it's always set to 23.976000) and defines everything as sections.

Do what you usually do with the audio (extracting it with mkvextract and reencoding it for example), encode a workraw, either from the Avisynth script (ffmpegsource) or from the AVI you just created (mkv2vfr) and give it to the rest of the crew to chew on. When they're done, encode the final version from that same AVI, fire up mkvmerge GUI, drop the video and audio in it, click the video track and apply the timecodes file, then mux.

120fps AVI
There's really no challenge whatsoever here. Use tritical's avi2tc package to get a decimated VFRaC raw containing all the frames, and a timecodes file (usage of the avi2tc package should be obvious). Encode said decimated raw, mux with timecodes. Simple. As far as I can tell, this should work with H.264 in AVI as well.
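For the curious, the principle behind it can be sketched in Python. This is NOT how avi2tc is implemented, just the idea: a "120fps" AVI is assumed here to be a grid of 120000/1001 slots per second where most slots are empty drop frames, and each real frame's timestamp is simply its slot position. How the real frames are detected is left out; the `slots` list below is made-up input.

```python
from fractions import Fraction

# Duration of one slot in ms, assuming the AVI runs at 120000/1001 fps.
SLOT_MS = Fraction(1001, 120)

def timecodes_from_slots(is_real_frame):
    """Return a timestamp (ms) for every real frame, based on its slot index."""
    return [i * SLOT_MS for i, real in enumerate(is_real_frame) if real]

# Two "film" frames (one real frame per 5 slots), then two "video" frames (one per 4 slots).
slots = [True] + [False] * 4 + [True] + [False] * 4 \
      + [True] + [False] * 3 + [True] + [False] * 3

stamps = timecodes_from_slots(slots)
print([round(float(t), 3) for t in stamps])
```

Five slots apart works out to about 41.7ms (23.976fps), four slots apart to about 33.4ms (29.97fps), which is why 120fps is used as the common grid in the first place.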

WMV in .wmv
This may be slightly tricky. Either you use ffmpegsource() to encode to a lossless AVI (since WMV reading isn't 100% guaranteed to be frame accurate) and encode workraw/other stuff from that, or you use GDSMux to transmux to an MKV, which you then handle with mkv2vfr (see above).
Brief GDSMux primer:
1) rightclick the input area and hit "add source", find your .wmv and hit OK
2) rightclick the audio stream, choose "encode" and pick "PCM"
3) click the output button and select where to save the mkv
4) hit start
5) wait.
This will give you a VFR MKV with uncompressed PCM (WAV) audio.

VFR and hardsubs
Hardsubbing VFR stuff isn't trivial. If you try to apply subs to your assumed CFR (VFRaC) AVI raws, they'll be off by miles (since the VFRaC raw is assumed to be a constant framerate). You need to use a VFR-aware program to transform all timestamps in the subs file to fit the frame timestamps in the raw. At the moment, SSAtool and Aegisub can do this. I've not used SSAtool, but in Aegisub it's done by loading the timecodes file and the video, and then using the file -> export dialog box, with VFR transform checked.
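The core of such a transform can be sketched like this in Python. It only shows the idea and makes no claim about how Aegisub or SSAtool actually handle rounding or line end times; the function name and the toy timecodes are invented. You look up which frame a subtitle time falls on using the real (v2) timestamps, then ask where that same frame sits in the VFRaC raw.

```python
import bisect

def vfr_transform_ms(t_ms, timecodes_ms, assumed_fps):
    """Map a time on the real VFR timeline to the matching time in the VFRaC raw."""
    # Frame that is on screen at t_ms according to the real timestamps.
    frame = max(bisect.bisect_right(timecodes_ms, t_ms) - 1, 0)
    # Where that frame sits when every frame is assumed to last 1/assumed_fps seconds.
    return frame * 1000.0 / assumed_fps

# Toy v2 timecodes: frames 2 and 3 are only 20 ms long, the rest 40 ms.
timecodes = [0.0, 40.0, 80.0, 100.0, 120.0, 160.0]

start_in_raw = vfr_transform_ms(100.0, timecodes, assumed_fps=25)
later_in_raw = vfr_transform_ms(125.0, timecodes, assumed_fps=25)
print(start_in_raw, later_in_raw)
```

A line that starts at 100ms in the real timeline has to start at 120ms in the VFRaC raw to land on the same frame, and the error keeps growing the further into the episode you get.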

Softsubs don't have this problem since they're applied to the video when it's already proper VFR and the frames have the correct timestamps.

VFR for non-encoders
There are basically two ways to get the rest of the sub crew (excluding AFX typesetters, we'll get to that later) to work with VFR stuff.

One is to, er, not get them to work with VFR stuff. I.e., you convert the VFR source to CFR for them (by duplicating and/or removing frames) and let them pretend that the show is in fact CFR. This means a bit more work for you as the encoder, since you need to fix all the TS'ing manually, and make sure that the dialogue is scenetimed (if the ordinary timer scenetimed, you can use Aegisub's timing postprocessor to make sure that the scenetiming fits with the VFR - use keyframe snapping only with a limit of 1 frame).

The other way is to get everyone to use Aegisub and timecodes files. Just make sure that they use 1.10 prerelease or later.

VFR and Adobe AfterEffects
This is where it may get complicated. AFX, just like Avisynth, is not VFR-aware and cannot be made so. If it's only typesetting, this is not a problem since you can give your AFX'er a VFRaC raw and tell him to pretend it's CFR, which will work since he does everything on a frame-to-frame basis and ignores timestamps. However, if you want to have AFX karaoke, things become evil. You need to make sure that the entire section of the video where the karaoke shows has a constant framerate all the way through, and you need to make a special CFR workraw for only that section, or the karaoke won't sync with the audio.

Tools
avi2tc package - for handling 120fps AVIs and converting between v1 and v2 timecodes
Aegisub - for VFR transformation and general subbing work
Haali Media Splitter, GDSMux and mkv2vfr

Mini-dictionary of confusing acronyms and other technobabble

VFR - Variable FrameRate. What this guide is dealing with.
CFR - Constant FrameRate
VFRaC - Variable FrameRate assumed Constant. A clip that is VFR, but whose frames for some reason have been stored in a container that lacks VFR support and hence the framerate is assumed to be constant.
Decimation - the process of removing certain frames from a clip. Usually (but not always) reduces the framerate.
FPS - Frames Per Second. The framerate is usually measured in this unit.
Zen - a school of Mahāyāna Buddhism notable for its emphasis on practice and experiential wisdom - particularly as realized in the form of meditation known as zazen - in the attainment of awakening.
Koan - see http://en.wikipedia.org/wiki/Koan

Final notes
The guide is pretty terse and not very detailed at the moment. Feel free to harass me with questions.

Acknowledgements and thanks
- #darkhold and the people within, especially Haali and pengvado for putting up with the stupid questions of us lesser mortals
- ArchMageZeratuL and jfs for Aegisub and other tools
- GizmoTech-Mobile, Myrsloik, Mentar, Nicholi and the rest of the pioneers

Making a complete guide for VFR at all would be rough :), you would need like a full on lecture. I think this does the job plenty well. The tools are the major issue, read their documentation and everything should be peachy.

Well, there's a rather large explanation of VFR at the Avisynth.org wiki, but it's mostly targeted at people who want to encode to CFR, recommends 120fps AVI as the "most compatible hybrid option" (which, IMHO, is pretty stupid), and doesn't mention such fansub-specific problems as AFX stuff or hardsubbing.

mendoi had some problems with that on himawari. if i recall correctly, the only way of having a decent CFR (avi) encode was to use 120fps. every time you tried to bring the frame count down, the video would stutter like crap. mkv is truly the only and best way for fansubbers to handle VFR raws.

I think we'll be exploring in to ways to handle vfr hardsubs. Creating vfr mp4 got a lot easier recently, thanks to tc2mp4.

@Sylf: Well, hardsubbing does work, the problem is AFX, really...
Noted the presence of mentioned mp4 tool, will investigate and add to guide soon'ish.

Two questions. As for cleaning/processing a raw, work is to be done before or after getting a decimated raw? Also, is there a scenario where one would prefer v2 timecodes over v1? Thanks ;) nice guide. I (luckily) rarely work with VFR material so this is a rather nice dive-in. :)

@xat: filtering should be done after getting the decimated raw, since the decimation generally requires the original container information.

As for v2 vs. v1... not really. v2 timecodes might be slightly more accurate, but it's not like anyone would care about a few milliseconds...

Misconception, timecodes are not stored as either v1 or v2. These are simply two (of three) possible ways of writing down timecodes to be used as input by mkvmerge. Timecodes are stored in the same manner whether v1 or v2 are used. Thus truly the output is only as accurate as the input, which can be achieved with either of the two.

Also there is a very important mkvmerge parameter which will determine the precision of the timecodes stored.

Code:

--timecode-scale 1000000

The above being the default, makes timecodes accurate to within 1ms.
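To see what that precision means for typical anime frame durations, here is a small Python sketch. It assumes timestamps are stored as whole multiples of the timecode scale (given in nanoseconds) and that values are rounded to the nearest tick; whether mkvmerge rounds or truncates is not something this thread states, so treat that part as a guess.

```python
def stored_timestamp_ms(t_ms, timecode_scale_ns=1_000_000):
    """Quantize a timestamp in ms to a whole number of timecode-scale ticks."""
    ticks = round(t_ms * 1_000_000 / timecode_scale_ns)
    return ticks * timecode_scale_ns / 1_000_000

# Second frame of a 23.976fps clip starts at about 41.708 ms.
default_scale = stored_timestamp_ms(41.708)
finer_scale = stored_timestamp_ms(41.708, timecode_scale_ns=100_000)
print(default_scale, finer_scale)
```

With the default scale the frame ends up at 42ms instead of 41.708ms. The error does not accumulate from frame to frame, since each timestamp is quantized on its own.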

Quote:

Originally Posted by Sylf

I think we'll be exploring in to ways to handle vfr hardsubs. Creating vfr mp4 got a lot easier recently, thanks to tc2mp4.

I was not aware of this. I guess I've been traveling too damn much this summer. Expect further releases of Sugar Sugar Rune to be true vfr mp4 if I can get it to work right :).

@Nicholi: I'm very aware of that - I was speaking of what timestamps would be stored in the MKV, depending on the input timecodes file.
However, I made a rather embarrassing braino in my last post. v1 timecodes can be just as accurate as v2 ones, provided that you have enough decimal places.

Quote:

Originally Posted by Quarkboy

I was not aware of this. I guess I've been traveling too damn much this summer. Expect further releases of Sugar Sugar Rune to be true vfr mp4 if I can get it to work right :).

I was only pointed to this tool last week or so, and experimented some with it today.

At this time, I'm only using this for 30fps ending clip for Strawberry Panic. The rest of the episode may contain some hybrid stuff, but I'm ignoring those for the time being. It's not significant enough for this type of the show, and I don't feel competent to do the full blown vfr encode/muxing. (Or maybe I'm just too lazy.)

Oh lawd, is that sum longpost(tm)?

Quote:

Originally Posted by TheFluff

While VFR in MP4 is possible, actually creating such files is kind of a pain with the current tools (I know the theory but I've never done it myself, either).

VFR MP4 can be done any of 3 ways.
1) Using MKV timecodes and tc2mp4 to generate NHML data and modify the STTS atom.
2) Manually creating 29.97/23.976 sections and concatenating them.
3) With ASP, set a frame drop ratio, and dropped frames are not coded, and instead NVOPs are placed, which get nuked by MP4box at muxing and modifies the STTS atom.

None of which are really hard, but the second method is a pain in the ass, and "not fully VFR".

This was my second attempt at VFR MP4 about 2 weeks ago using tc2mp4. It turned out well. My first shot at it was with the 30MB Kamichu about a week previous to the .hack test, in which I found a problem with tc2mp4, fortunately I reported back and it was corrected. Support for the other timecode format was added also.

http://rapidshare.de/files/25954454/..._AAC_.mp4.html

Ignore the whacky resolution, I was just proving a point to someone.

Quote:

Originally Posted by TheFluff

However, there are tools to convert v2 timecodes to v2 ones. See the tools section at the bottom.

Typo?

Quote:

Originally Posted by Sylf

I think we'll be exploring in to ways to handle vfr hardsubs. Creating vfr mp4 got a lot easier recently, thanks to tc2mp4.

It's simple, sub yer video (and make a lossless), run it through AVIsynth using dedup, encode decimated video, mux to MP4, use tc2mp4 to modify NHML data, then mux audio etc.

As for VFR, I'd recommend it to anyone if you have a relatively clean CG source. Why code identical frames when you can not code them at all? Even though they use a matter of bytes, it adds up over thousands of frames. VFR with cel, I would imagine can look quite ugly if you go around decimating stuff with slight differences (such as telecine judder or that "swaying" camera effect), which would make it appear erratic.

Is it worth me writing a bit about tc2mp4, or is this solely an MKV whoring? :p

Actually, AFX is no real problem at all - the key is just to prepare a Typeset-raw as a constant framerate avi (even though the content may actually have different framerates), which the AFX typesetter can then use to do his work.

AFX typesets framewise, meaning "from frame x to frame y". It really doesn't care about vfr at all, it only needs to adjust to the cfr avi, and that's it.

No, the tricky parts about typesetting in vfr are SSA/ASS-signs which are timecode-dependent and not frame-dependent. Here you can either use tools which support vfr (like Aegisub) or hardsub the signs the AFX way (on the lossless which was used to create the typeset-raw).

To sum it up, if you want to typeset the soft way in vfr, you may be in for some tricky parts, because then you should definitely bleedcheck. Hardsubbing the signs is generally easier, as long as you take the time to create a typeset-raw (which I strongly recommend).

@Zero1: not as long as some of yours :P
Typo fixed.

Quote:

Originally Posted by Mentar

Wasn't that what I said? :)
The problem is AFX karaoke, since it has to sync with the audio...

Quote:

Originally Posted by TheFluff

The problem is AFX karaoke, since it has to sync with the audio...

Actually, when it comes to karaoke, I consider VFR releases much EASIER to handle. Why? Because you can easily play around with the timecode file a bit until the audio synchs with the hardsubbed karaoke. Kara is a little bit late? Just alter the timecode file a bit so that some speedup is placed before the kara - whoop de doo, synchs perfectly fine, and you don't even have to reencode it (the problem with avis), a simple remux suffices ;)

The problem enters when the music section of the video has several different zones with different fps'es. I haven't been able to figure out a way to deal with this (yet), except the ugly solution of forcing the entire music section to be CFR. Maybe it can be done by one or more AFX overlays per line...

That's a little bit ugly when it happens - but it can still be done.

In this case you usually have parts with 29.970fps and parts with 23.976fps. I recommend rendering the ENTIRE AFX Kara at 29.970 then, and _afterwards_ using decimate() on the 23.976 ranges.

USUALLY the then-decimated frames are NOT too noticeable in the Karaoke, unless you have very slow "gliding" effects. Still, normally the result will be just peachy.

Hmm, since AFX is scriptable, surely you should be able to somehow handle Matroska timecode files with it? Somehow... to do something similar to the VFR transform in Aegisub.

Many thanks for the useful information! I really want to do it correctly. With the clues here, I can start to work.

The vfr thing is "not" giving me trouble because I'm simply encoding it as UGLY cbr. I do know that's a problem and I'm hoping to solve it. It doesn't seem easy though. It's lucky that so far I only encode in mkv, so I hopefully won't receive any complaints if I switch to vbr.