Should audio last (almost) as long as video?
#1
Hi guys,

This probably isn't a big deal but I'm wondering for the purposes of a very very precise sync I'm currently doing (to the level of individual audio samples). I figure if I'm being this accurate I may as well get this bit right. Apologies for the number of calculations here, but it's probably necessary to explain precisely why I'm asking what I'm asking; if you want to cut to the chase and see what the question actually is, I've highlighted it below in bold. Explanation follows from here...

I'm taking an audio stream containing 362,602,496 audio samples at 48 kHz, adding 18,018 samples to sync, and combining it with a custom video stream containing 181,393 frames of video. To compare the video length to the audio length very precisely, this number of video frames is equivalent to 181,393 / (24000/1001) x 48000 = 363,148,786 audio samples. This means that the custom video outlasts the synced audio by 363,148,786 - (362,602,496 + 18,018) = 528,272 samples, or 528,272 / 48000 = 11.005666... seconds.
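
The conversion above can be checked with a short Python sketch. It uses exact fractions so the 24000/1001 frame rate introduces no rounding error; the function and variable names are illustrative only and don't come from any tool.

```python
from fractions import Fraction

SAMPLE_RATE = 48_000                 # audio samples per second
FRAME_RATE = Fraction(24_000, 1001)  # video frames per second (23.976...)

def frames_to_samples(frames: int) -> Fraction:
    """Duration of `frames` video frames, expressed in audio samples."""
    return Fraction(frames) / FRAME_RATE * SAMPLE_RATE

video_samples = frames_to_samples(181_393)   # custom video length in samples
audio_samples = 362_602_496 + 18_018         # source audio plus sync padding
gap_samples = video_samples - audio_samples  # video left over after the audio ends
gap_seconds = gap_samples / SAMPLE_RATE

print(video_samples, audio_samples, gap_samples, float(gap_seconds))
```

At 48 kHz and 24000/1001 fps each video frame spans exactly 2,002 audio samples, which is why the video length comes out as a whole number of samples.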

The audio is from a different video file which contains 181,121 frames (equiv. 181,121 / (24000/1001) x 48000 = 362,604,242 audio samples). As it appeared there, the audio lasts almost as long as the video, such that there is audio playing during almost every video frame although not all the way through to the end of the last frame (it's a few audio frames short). More precisely, the difference is 362,604,242 - 362,602,496 = 1,746 samples, or 1,746 / 48000 = 0.036375 seconds (36.375 ms). The duration of one video frame is 1 / (24000/1001) = 0.041708333... seconds = 41.708... ms, so the audio stream runs about 5.3 ms into the last video frame before it ends, although in practice it's already fallen silent a few seconds before that.
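
The same comparison for the source file can be sketched as follows. Because the shortfall is smaller than one video frame, the audio ends partway through the final frame; the sketch works out which frame that is. All names are illustrative.

```python
from fractions import Fraction

SAMPLE_RATE = 48_000
FRAME_RATE = Fraction(24_000, 1001)

samples_per_frame = SAMPLE_RATE / FRAME_RATE  # audio samples per video frame
source_video_samples = 181_121 * samples_per_frame
source_audio_samples = 362_602_496

shortfall = source_video_samples - source_audio_samples  # video samples with no audio
shortfall_ms = float(shortfall / SAMPLE_RATE * 1000)
frame_ms = float(1000 / FRAME_RATE)

# 0-based index of the video frame during which the audio ends,
# and how many samples into that frame the audio reaches.
last_audio_frame_index = source_audio_samples // samples_per_frame
samples_into_frame = source_audio_samples - last_audio_frame_index * samples_per_frame

print(shortfall, shortfall_ms, frame_ms, last_audio_frame_index, samples_into_frame)
```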

Here's the question, then:

Is there any benefit to adding silence to the end of an audio bitstream in order to more closely match the duration of the video stream it's muxed to, for the purposes of Blu-ray Disc compliance?

If not, I'll just leave it so that the audio track fades to silence then ends shortly after that, then the video (hopefully) will continue playing for another 11 seconds after that. However, I'm wondering if this might cause some software or hardware players to freak out and possibly stop the video once the audio ends; I know some players will do this if the tracks are the other way around (with the video shorter than the audio) but I'm guessing that the video is always prioritised so it may not matter when it's this way round. If it is an issue though, I'll add silence to the end of the audio as follows...

363,148,786 - (362,602,496 + 18,018) = 528,272 samples, but I'm encoding with DTS-HD MA which contains a DTS core with 512 samples per frame and 363,148,786 is not divisible by 512 (363,148,786 / 512 = 709,274.972... audio frames) so I'd round down the audio stream to 709,274 x 512 = 363,148,288 samples. This would make the discrepancy only 363,148,786 - 363,148,288 = 498 audio samples, or in real terms, 498 / 48000 x 1000 = 10.375 ms. Like the 36.375 ms difference between the original audio and video durations on the source I took the audio from, that's less than one video frame, which may not necessarily be coincidental!
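
As a rough sketch of that padding calculation: round the video length down to a whole number of 512-sample core frames, then see how much silence has to be appended and how much video is left uncovered. The names are illustrative.

```python
SAMPLE_RATE = 48_000
CORE_FRAME = 512  # samples per DTS core frame, as stated in the post

video_samples = 363_148_786
current_audio = 362_602_496 + 18_018  # synced audio before any end padding

target_frames = video_samples // CORE_FRAME   # whole core frames that fit within the video
target_samples = target_frames * CORE_FRAME
silence_to_append = target_samples - current_audio
residual = video_samples - target_samples     # video samples still without audio
residual_ms = residual / SAMPLE_RATE * 1000

print(target_frames, target_samples, silence_to_append, residual, residual_ms)
```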

EDIT: Mostly inconsequential but just in case anybody notices and thinks I missed it... I realise that if I just leave the audio as is without adding silence to the end, it's 362,620,514 samples, which isn't divisible by 512 (362,620,514 / 512 = 708,243.19140...). There are two simple solutions to that: either encode it as is and the DTS encoder rounds it up to the nearest whole 512-sample frame, or add samples before encoding to achieve the exact same result. Either way, it'd have 708,244 x 512 = 362,620,928 samples in the audio if I don't add loads more to the end so that it more closely matches the number of video frames.
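
The rounding-up case in this edit can be sketched the same way. `round_up_to_frame` is a hypothetical helper, not part of any encoder; it only shows the arithmetic.

```python
def round_up_to_frame(samples: int, frame: int = 512) -> int:
    """Smallest multiple of `frame` that is >= `samples` (ceiling division)."""
    return -(-samples // frame) * frame

unpadded = 362_602_496 + 18_018           # synced audio, no end padding
encoded = round_up_to_frame(unpadded)     # length once the last core frame is filled
padding = encoded - unpadded              # zero samples needed to fill it

print(encoded // 512, encoded, padding)
```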
#2
I didn't have the patience to go through all your figures, but the answer to the question in boldface is that it's quite common for the audio and video to be of slightly different lengths (even on commercial discs). If I'm editing the audio anyway, I trim/pad it to match the video for the sake of neatness; otherwise, I leave it. This shouldn't cause a playback problem on a hardware player. Software players exhibit a variety of behaviors (looping the ending, freeze-framing, stopping with the shortest), but none is likely to result in a problem, as such.
#3
(2020-05-28, 10:37 PM)Chewtobacca Wrote: I didn't have the patience to go through all your figures, but the answer to the question in boldface is that it's quite common for the audio and video to be of slightly different lengths (even on commercial discs).  If I'm editing the audio anyway, I trim/pad it to match the video for the sake of neatness; otherwise, I leave it.  This shouldn't cause a playback problem on a hardware player.  Software players exhibit a variety of behaviors (looping the ending, freeze-framing, stopping with the shortest), but none is likely to result in a problem, as such.

Hahaha, yeah, I'm not surprised: too many numbers and words. The very definition of TL;DR if ever I've seen one. But I gave that detail just in case the answer would depend on the exact size of the differences involved. I do realise that there is usually a bit of a difference (maybe milliseconds, maybe a second or two), and certainly ALWAYS at least a minuscule difference (in terms of audio samples, since formats other than PCM require complete frames, which involves rounding up the sample count with zero bytes). What I'm less sure about is whether the extent of that difference matters: I'm talking about a difference of > 11 seconds, rather than just a few milliseconds.

My instinct is to make it as neat as possible, like you say: I'm syncing to the sample and encoding back to DTS-HD MA regardless so I may as well add silence to the end since it'll presumably make bugger all difference because of compression anyway. All the more reason to do that if some software players may freak out, as I suspected they might and your answer seems to confirm!

Thanks for the response, I appreciate it.
#4
The biggest delay that I've applied is (I think) ~23s, and I've had cause to apply quite a few that were > 11s. There were no issues.

Almost every time we remux a track using a delay-value, the lengths are different. It's never caused me a problem.

And I don't think software players will freak out -- just exhibit a variety of behaviors, none of which are really problematic. Unless you have something absolutely vital in the last few seconds, which the viewer absolutely must see, this is a non-issue.
#5
(2020-05-28, 11:04 PM)Chewtobacca Wrote: The biggest delay that I've applied is (I think) ~23s, and I've had cause to apply quite a few that were > 11s.  There were no issues.

Almost every time we remux a track using a delay-value, the lengths are different.  It's never caused me a problem.

And I don't think software players will freak out -- just exhibit a variety of behaviors, none of which are really problematic.  Unless you have something absolutely vital in the last few seconds, which the viewer absolutely must see, this is a non-issue.

I reckon for the sake of my curiosity I'll just encode both: one containing 362,620,514 samples (ending about 11 seconds before the video) and another containing 363,148,288 samples (ending only 498 samples, or ~10 ms before the video). I fully expect that the files will be pretty much identical size because compressing bitstream silence with DTS-HD MA is presumably very efficient anyway.

The beginning is fine anyway because I inserted silent samples there to ensure sync was bang on, it's just the end I was unsure about. As you say, no biggie really because it's just the end of the credits, but I'd still prefer to avoid any potential issues *at all*.
#6
I don't know about Blu-ray compliance, but I'm pretty sure it won't cause any sync issues with most software players. The only things that matter for that are sample rate, delay and framerate afaik, at least for as long as both video and audio exist. What happens after that I don't know, but I think MPC-HC will just stay quiet and show only the video.
#7
(2020-05-29, 12:51 AM)TomArrow Wrote: I don't know about Blu-ray compliance, but I'm pretty sure it won't cause any sync issues with most software players. The only things that matter for that are sample rate, delay and framerate afaik, at least for as long as both video and audio exist. What happens after that I don't know, but I think MPC-HC will just stay quiet and show only the video.

Here's something interesting: both new encodes are quite a bit smaller than the source 7.1 encode. Nothing was removed; I just added silence to the start of both and to the end of one (to pad it to the video length).

Source .dtshd BD file: 5.58 GB (5,998,921,228 bytes)
synced, unpadded .dtshd file: 5.10 GB (5,481,447,456 bytes)
synced & padded .dtshd file: 5.10 GB (5,483,639,296 bytes)

... What the heck is in the source file that's packing it out by an extra half GB, I wonder? Both new files were encoded correctly, as DTS-HD MA, with full 1509 kbps DTS cores. All PCM WAV files were 24-bit when they went in one end and remain 24-bit at the other.
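
One way to put those sizes on a common footing is average bitrate (bytes x 8 / duration). The sketch below assumes the sample counts worked out earlier in the thread: 362,602,496 for the source, and the 512-aligned totals 362,620,928 and 363,148,288 for the two new encodes. Those counts are assumptions about what the encoder actually wrote, and the names are illustrative.

```python
SAMPLE_RATE = 48_000

# name: (file size in bytes, assumed length in samples)
files = {
    "source": (5_998_921_228, 362_602_496),
    "synced, unpadded": (5_481_447_456, 362_620_928),
    "synced & padded": (5_483_639_296, 363_148_288),
}

def average_kbps(size_bytes: int, samples: int) -> float:
    """Average bitrate in kbps for a file of the given size and length."""
    seconds = samples / SAMPLE_RATE
    return size_bytes * 8 / seconds / 1000

for name, (size, samples) in files.items():
    print(f"{name}: {average_kbps(size, samples):.0f} kbps")

# Cost of the extra silence alone: padded minus unpadded.
extra_bytes = files["synced & padded"][0] - files["synced, unpadded"][0]
extra_samples = files["synced & padded"][1] - files["synced, unpadded"][1]
padding_kbps = average_kbps(extra_bytes, extra_samples)
print(f"added silence: {extra_bytes} bytes, {padding_kbps:.0f} kbps")
```

Comparing the bitrate of the added silence against the 1509 kbps core gives a feel for how much of each file is core versus lossless extension, though it doesn't by itself say where the missing half GB went.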

I know some software will require the .dtspbr file to mux for disc, and if it doesn't have it, it won't author. Maybe it's something to do with that. Who knows.
#8
What method did you use to decode the .dtshd track to wav? Maybe you used a bad method that only decoded the core. That would be one possible explanation.
#9
(2020-05-29, 01:26 AM)TomArrow Wrote: What method did you use to decode the .dtshd track to wav? Maybe you used a bad method that only decoded the core. That would be one possible explanation.

That seems unlikely, because I used eac3to like I always have. Specifically: eac3to sourcefile.dtshd output.wavs

I suppose I'm going to have to run a number of tests to see which bit in the chain might be discarding data. Why is nothing ever simple, eh?

It's probably easier to just do what I originally did and apply a delay in eac3to (which appears to keep the filesize large) but I'm really curious what might be happening here. It could be Audacity but I wouldn't think so since all I did with that was uniformly add silence to mono wavs then export back to 24-bit WAVs again. Bizarre.
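
One test that could narrow this down, offered as a sketch rather than a diagnosis: a WAV can be 24-bit as a container while the lowest 8 bits of every sample are zero, and such audio would be expected to compress smaller. The hypothetical helper below uses Python's standard `wave` module to check whether the lowest byte is ever used. It assumes plain PCM WAVs; files with other header types may not open, depending on the Python version.

```python
import wave

def low_byte_in_use(path: str, chunk_frames: int = 65536) -> bool:
    """Return True if any 24-bit sample in the WAV has a non-zero lowest byte."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 3:
            raise ValueError(
                f"{path}: expected 24-bit PCM, got {8 * wav.getsampwidth()}-bit"
            )
        while True:
            data = wav.readframes(chunk_frames)
            if not data:
                return False  # reached the end without finding a non-zero low byte
            # WAV PCM is little-endian, so byte 0 of each 3-byte sample
            # is the least significant one.
            if any(data[0::3]):
                return True
```

Running it on each mono WAV before and after the Audacity step would show whether that step changed anything in the low bits. A result of False on the exported files but True on the eac3to output would point at the export settings.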
#10
Is it an up-to-date version of eac3to? I think at least the older ones needed some extra library to properly decode DTS-HD MA. With newer ones idk, but I know ffmpeg decodes DTS-HD MA perfectly, at least in current versions.