71

I have a 5.1 audio track from a film where front left and front right contain music, and center contains dialogue. Playing the 5.1 track in VLC blends everything together nicely.

I'm trying to convert the 5.1 track to stereo using ffmpeg -ac 2, but the resulting stereo mix is much quieter than the 5.1 track played natively.

Adding -af "pan=stereo|c0=FL|c1=FR" gives the correct volume, but then there is no dialogue because the center channel is not included.

So maybe the solution is to mix left/center/right into stereo and throw out the back and subwoofer channels? (I'm guessing here...)

So the question is: How do I make ffmpeg downmix 5.1 to stereo the same way VLC does it, with the same strong volume in the end result?

forthrin
  • Are you sure VLC is actually playing the additional channels? Downmixing can result in normalization so that the sum of each input per output channel does not result in overload so clipping is prevented. This can make it sound quieter. – llogan Dec 14 '14 at 18:15
  • 1
    The basics: My file is 5.1. My speakers are stereo. I don't know what VLC does, but it creates a great end result in my stereo speakers from the 5.1 source data (strong volume, both music and dialogue included). ffmpeg, on the other hand, creates a "low volume" result when using `-ac 2`. So I'm asking how to make ffmpeg generate the same good result as VLC does. – forthrin Dec 16 '14 at 10:20
  • "the resulting stereo mix has a much weaker volume than playing the 5.1 track natively". This is because of how audio mixing works. You can't expect six sources of audio to sound as loud as two sources because there is simply not enough dynamic headroom. Just turn up the volume. – Arete Nov 16 '22 at 09:05
  • Here's the command line that worked for me (I cannot post it as an answer): ffmpeg -i "original_video.mp4" -ac 2 -strict -2 -map 0 -c:s copy "converted_video.mp4" – Pedro Araujo Jorge Jun 27 '23 at 09:39

13 Answers

94

The answers on this question have since become a bit of a mess, with many containing redundant information and others outright inaccuracies. This answer is an attempt to consolidate the information in these answers while doing away with their problems.

Most importantly, it's worth bearing in mind that Gregory's answer, currently the top-voted answer to this question, is no different than using the -ac 2 switch - more on this below.


Downmixing a 5.1 channel audio stream to stereo with -ac 2

FFmpeg comes with built-in capabilities for downmixing a 5.1 track to stereo, and this is also the solution that FFmpeg's own documentation recommends:

Note: ffmpeg integrates a default down-mix (and up-mix) system that should be preferred (the -ac option) over the pan filter unless you have very specific needs.

The -ac 2 switch works by mixing proportions of five of the six channels in the source stream - Back Left, Back Right, Front Left, Front Right and Front Center - into the Front Left and Front Right channels of the output stereo stream.

With this option, audio from the LFE channel (the .1 in 5.1, reserved for the subwoofer and used for deep, low-frequency effects) is discarded completely.

Unfortunately, in my tests -ac 2 produced music and dialogue levels that differed the most from the source, making it the worst-sounding downmix formula of all those I tested. You may nevertheless find that it gives a perfectly adequate downmix for your needs, in which case any other formula would be overkill.


To downmix a DTS track with -ac 2 without transcoding it (i.e. to keep its codec and extension the same):

ffmpeg -i "sourcetrack.dts" -c:a dca -ac 2 "stereotrack.dts"

As pointed out by Mephisto in his answer, if the dialogue and the music sound well-balanced among each other to you but simply lack volume, you can downmix the stream while also increasing its volume:

ffmpeg -i "sourcetrack.dts" -c:a dca -ac 2 -vol 425 "stereotrack.dts"

For the -vol switch, the integer value 256 corresponds to 100% of the source volume, and any larger value increases the overall volume of the audio stream. Note, however, that raising it too far may cause distortion or artifacts, especially in the louder sections. As noted in the comments on this answer, -vol is deprecated in favor of the volume audio filter, where -af "volume=1.66" corresponds roughly to -vol 425.
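The 256-equals-100% rule makes it straightforward to translate a -vol value into the linear multiplier that the volume audio filter expects. The following is a minimal sketch; the helper name vol_to_multiplier is hypothetical and not part of FFmpeg, and the final line only prints a command rather than running it:

```shell
#!/bin/sh
# vol_to_multiplier is a hypothetical helper, not an FFmpeg feature.
# It converts a legacy -vol integer (256 = 100%) into a linear gain
# for the volume audio filter, rounded to two decimals.
vol_to_multiplier() {
    awk -v vol="$1" 'BEGIN { printf "%.2f\n", vol / 256 }'
}

# Print (rather than run) a filter-based equivalent of the -vol 425 command.
gain=$(vol_to_multiplier 425)
echo "ffmpeg -i sourcetrack.dts -c:a dca -ac 2 -af volume=$gain stereotrack.dts"
```

With 425 as input the helper yields 1.66, which matches the multiplier suggested in the comments on this answer.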

To downmix an audio stream to stereo and transcode it to the AC3 codec, for example:

ffmpeg -i "sourcetrack.dts" -c:a ac3 -ac 2 "stereotrack.ac3"

Downmixing a 5.1 channel audio stream to stereo with a custom mix algorithm

If you want a higher-quality downmix, or you absolutely must include the LFE channel in your output, you can use FFmpeg's audio filter switch (-af) to downmix the audio using a custom mix formula.

Downmixing with the ATSC formula (Gregory's answer)

As of the time of posting this answer, the top-voted answer to this question was Gregory's, which puts the formula from the ATSC specification (see section 7.8.2, Downmixing into Two Channels) into an FFmpeg audio filter. This specification is itself directly linked to by the FFmpeg documentation on the topic, indicating it's highly likely to be the same formula that FFmpeg already implements for its -ac 2 switch. If this is true, then typing out the entire formula in Gregory's answer would be no different than using the -ac 2 switch, and therefore a waste of time.

I decided to test this for certain by re-encoding the same input audio using both -ac 2 and the -af filter from Gregory's answer (the exact commands used can be seen in the footnotes to this answer).

I then compared the sizes of the resulting output files and found that they were exactly the same size, down to the byte.

Finally, I opened both output files in Audacity and compared their waveforms to confirm they were identical.

It therefore seems fairly conclusive that the ATSC formula detailed in Gregory's answer is the same one FFmpeg already implements. Using it is redundant: it does nothing that -ac 2 doesn't, and it is a much more cumbersome command.
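As one of the comments on this answer points out, uncompressed WAV files of the same length and format always have the same size, so matching sizes alone are weak evidence. A stricter check is to compare the files byte by byte. The sketch below uses the standard cmp utility; the helper name same_bytes is hypothetical and the file names in the usage comment are the ones from the test commands:

```shell
#!/bin/sh
# same_bytes is a hypothetical helper: it succeeds only when both files
# have identical contents, not merely identical sizes.
same_bytes() {
    cmp -s "$1" "$2"
}

# Example usage (assumes both files already exist):
# same_bytes "Audio 1 (-ac 2).wav" "Audio 2 (ATSC Algorithm Downmix).wav" \
#     && echo "identical" || echo "different"
```

If the two downmixes really are the same, this check should succeed; if it fails, the files differ somewhere even though their sizes match.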

Downmixing without discarding the LFE channel (Dave_750's answer)

Of the downmix formulae included in the answers, this is the only one that appears to mix the LFE channel into the stereo output instead of discarding it entirely, and as a result it is the one that loses the least sound from the source.

The overall volume level is higher and fuller than with -ac 2, but still lower than with the Nightmode Dialogue downmix below. However, music levels are much closer to the source than with the Nightmode Dialogue downmix, and because the LFE channel is included, increasing the output volume while using this formula can produce a stream that sounds truer to the 5.1 source than any other formula I tested.

If you have the ability, I would highly recommend encoding your audio stream(s) using both this downmix formula and the Nightmode Dialogue downmix, and carefully comparing the waveforms of the two to determine which one is better.

To downmix a 5.1 track to stereo using this formula and increase its volume level to 425 (where 256 is 100% of the original source's volume level):

ffmpeg -i "sourcetrack.dts" -c dca -vol 425 -af "pan=stereo|c0=0.5*c2+0.707*c0+0.707*c4+0.5*c3|c1=0.5*c2+0.707*c1+0.707*c5+0.5*c3" "outputstereo.dts"

Downmixing with Robert Collier's Nightmode Dialogue (Shane Harrelson's answer)

The Nightmode Dialogue formula, created by Robert Collier on the Doom9 forum and sourced by Shane Harrelson in his answer, results in a far better downmix than the -ac 2 switch - instead of overly quiet dialogues, it brings them back to levels that are much closer to the source.

From Robert Collier's description of the mix:

After converting many DTS movie tracks from 5.1 to 2.0 using eac3to, I have found the default eac3to channel mappings to result in very quiet dialogues and overly loud music and action scenes. Although the eac3to channel downmix coefficients have a scientific basis, they often do not sound good in practice because of low dialogue volume. This preset is for those looking for clear dialogues with left and right channel music still being audible but more in the background.

As you can see - front center (dialogues) come in properly now and stay at the original level - while the music and explosions remain a background effect and don't overpower you. This preset solves the problem of you having to constantly fiddle with the volume knob when watching DTS 5.1 converted to 2.0 movies in order to hear dialogues. (Especially for watching movies in the night where you don't want to wake others but still want to be able to hear dialogues).

Unfortunately, the music of this downmix formula is much lower than in the 5.1 source (which was likely by design considering Collier's intention to create a "nightmode" mix) and due to complete loss of the LFE track, the overall output audio doesn't sound as full or close to source as Dave_750's formula with boosted volume.

However, if for some reason you want to avoid boosting the overall volume of the stream, then the Nightmode Dialogue would likely be your best option - though again, I would highly recommend encoding your audio stream to both and comparing the waveforms of the two carefully.

To downmix with the Nightmode Dialogue formula in FFmpeg:

ffmpeg -i "sourcetrack.dts" -c dca -af "pan=stereo|c0=c2+0.30*c0+0.30*c4|c1=c2+0.30*c1+0.30*c5" "stereotrack.dts" 

Tarc's answer

This answer simply puts the Nightmode Dialogue downmix formula from Shane Harrelson's answer into a command to convert the audio stream in an MKV container. While the command given in this answer would work fine on such an audio stream, adapting it for a standalone audio track would give the error:

Filtering and streamcopy cannot be used together

This is because the audio codec cannot be copied when downmixing - like all other changes FFmpeg makes to an output stream, a downmix requires that the track be re-encoded for the changes to be applied.

This command also included a redundant -ac 2 switch which FFmpeg would have ignored.


Test commands

To demonstrate the reliability of the tests I conducted for this answer, below are all of the commands I used to test each downmix formula.

The test command used for the -ac 2 option:

ffmpeg -i "signed16bitPCM.wav" -c pcm_s16le -ac 2 "Audio 1 (-ac 2).wav"

The test command used for Gregory's answer:

ffmpeg -i "signed16bitPCM.wav" -c pcm_s16le -af "pan=stereo|c0 < 1.0*c0 + 0.707*c2 + 0.707*c4|c1 < 1.0*c1 + 0.707*c2 + 0.707*c5" "Audio 2 (ATSC Algorithm Downmix).wav"

The test command used for Dave_750's answer:

ffmpeg -i "signed16bitPCM.wav" -c pcm_s16le -vol 425 -af "pan=stereo|c0=0.5*c2+0.707*c0+0.707*c4+0.5*c3|c1=0.5*c2+0.707*c1+0.707*c5+0.5*c3" "Audio 4 (Dave750 Downmix).wav"

The test command used for Shane Harrelson's answer:

ffmpeg -i "signed16bitPCM.wav" -c pcm_s16le -af "pan=stereo|c0=c2+0.30*c0+0.30*c4|c1=c2+0.30*c1+0.30*c5" "Audio 3 (Nightmode Dialogue Downmix).wav"
Hashim Aziz
  • 2
    Impressive insight! Thanks for taking the time to share this. Strange then, that `-ac 2` gave me an inferior result to begin with, which prompted the original posting. I will try this again and if possible, share a 5.1 excerpt which doesn't give a satisfactory result with the built-in down-mix. Also very nice to know you can down-mix without transcoding! – forthrin Mar 02 '19 at 09:11
  • 1
    @forthrin Bear in mind that encoding and transcoding are two different things. Transcoding converts from one codec/extension to another, and encoding converts to the same codec/extension. You can downmix and apply other FFmpeg effects to a stream without transcoding, but not without encoding. The `-ac 2` option gave me the most inferior result of all the downmix formulae too; I think this is just a failing of the ATSC standard's formula. – Hashim Aziz Mar 02 '19 at 20:19
  • I tried this now. It seems that `ffmpeg -i 5.1.mp4 -ac 2 2.mp4` works, but `ffplay -i 5.1.mp4 -ac 2` doesn't. – forthrin Mar 11 '19 at 14:41
  • 3
    FYI, `.wav` is totally uncompressed so *all* these downmixes will have the exact same size down to the byte, regardless. You could have complete silence and it would still be the same size if it was the same length (and sampling rate, bit depth, etc. were also identical) – NullUser Feb 11 '20 at 04:00
  • 2
    This is one of the most thorough answers I've seen on this site. +1 – forresthopkinsa Mar 03 '20 at 22:12
  • 1
    Not sure if it's source or version related, but ffmpeg detects all the 5.1 audio I've used as "5.1(side)". This means the surround channels are `SL` and `SR`, not `BL` and `BR`. If you use `BL`/`BR` it won't produce any error, but will mix in empty back channels instead of the side channels with actual audio in them. – TrentP Nov 08 '20 at 08:35
  • 1
    it turns out `-vol` is deprecated and the volume audio filter must be used instead. Not sure how that works, perhaps someone can update the answer accordingly for the Dave_750 method? – gills Sep 14 '21 at 06:48
  • actually, it looks like it's possible to utilize normalization with `filter:a`, but the process is lost on me. That'd be an even better solution if someone can figure it out. – gills Sep 14 '21 at 06:58
  • 2
    @etudes et al, volume is added directly to the audio filter as -af "volume=#.#" where #.# is the multiplier where 1.0 would be no change. The audio filter for Dave_750's answer example becomes: -af "volume=1.66,pan=stereo|FL=0.5*FC+0.707*FL+0.707*BL+0.5*LFE|FR=0.5*FC+0.707*FR+0.707*BR+0.5*LFE" – Kurt Fitzner Dec 06 '21 at 16:16
  • 1
    Can confirm I also need `SL`/`SR` and `BL`/`BR` do not work. – Tom Ellis Jan 15 '22 at 21:15
  • 1
    @Tom Ellis @TrentP Thanks to both of you. I've now updated all the commands in the answer to use channel numbers as detailed in [Minty's answer](https://superuser.com/a/1663861/323079), so they should all now mix the appropriate channels for both `5.1` and `5.1(side)` layouts. – Hashim Aziz Jan 16 '22 at 00:02
  • 3
    I don't think the proposed commands for the formulae using `pan` work. ffmpeg outputs an error such as "Can not mix named and numbered channels", which I found to mean `c4` notation can't be used together with `FL`, `LFE` and so on. – Fabio Freitas Feb 15 '22 at 20:06
  • @FabioFreitas Yes, you're right. Thanks for letting me know, I'll have to fix this when I get the chance. – Hashim Aziz Feb 15 '22 at 20:18
  • 2
    @HashimAziz all of the answers / "Test commands" will produce the error `Can not mix named and numbered channels`. As per the ffmpeg pan documentation, you cannot mix channel names (like c0, c1) with speaker names (FL, FR)! https://ffmpeg.org/ffmpeg-filters.html#pan It really looks like your c4 and c5 should be BL and BR – aoeu Mar 24 '22 at 03:25
  • 2
    I've now updated all the commands in this answer to use channel numbers instead of names in an attempt to (programmatically) fix the SL/SR and BL/BR issue, although I have a feeling that they have separate channel numbers anyway - no way to make sure because these aren't actually documented anywhere. If this still doesn't work to add in both back and side channel audio I'll revert the last few edits and just edit all the commands to perform the same operations on the SL/SR channels as the BL/BR channels. – Hashim Aziz Dec 03 '22 at 18:51
33

I found that the answer Shane provided gives too little of the other channels and too much of the center. Movies sounded off-balance on headphones, with all dialogue and not enough background music or effects.

According to ATSC standards (section 7.8, page 91), the following formula is used to downmix 5.1 to conventional stereo (as opposed to matrix):

Lo = 1.0 * L + clev * C + slev * Ls ;
Ro = 1.0 * R + clev * C + slev * Rs ;

clev and slev should be .707, according to tables 5.9 and 5.10 in the aforementioned document, assuming a center/surround mix level of 0. Those tables also provide other values that reduce the amount of center mix, which I don't find useful.

With this in mind, the following ffmpeg option produces a good balanced sound with audible dialogue. Note that specifying the audio channels is not necessary.

-af "pan=stereo|FL < 1.0*FL + 0.707*FC + 0.707*BL|FR < 1.0*FR + 0.707*FC + 0.707*BR"

A note on the use of the less-than symbol, from the pan filter documentation:

If the ‘=’ in a channel specification is replaced by ‘<’, then the gains for that specification will be renormalized so that the total is 1, thus avoiding clipping noise.
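To make the renormalization concrete, the sketch below divides each gain by the sum of all gains, which is the arithmetic the quoted documentation describes for positive gains. The helper name normalize_gains is hypothetical, and FFmpeg's internal implementation may differ in detail:

```shell
#!/bin/sh
# normalize_gains is a hypothetical helper that mimics the arithmetic
# described above: each gain is divided by the sum of all gains, so
# the results add up to 1. It assumes all gains are positive.
normalize_gains() {
    awk 'BEGIN {
        sum = 0
        for (i = 1; i < ARGC; i++) sum += ARGV[i]
        for (i = 1; i < ARGC; i++)
            printf "%s%.3f", (i > 1 ? " " : ""), ARGV[i] / sum
        printf "\n"
    }' "$@"
}

# Gains of one output channel in the filter above: 1.0, 0.707, 0.707
normalize_gains 1.0 0.707 0.707   # prints: 0.414 0.293 0.293
```

The three results add up to 1, so the mixed output cannot exceed full scale even when all three inputs peak at once. The price is that each input ends up quieter than it was in the source.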

Gregory
  • 5
    The ATSC standards you've linked here were linked to from the [FFmpeg wiki on the topic](https://trac.ffmpeg.org/wiki/AudioChannelManipulation#a5.1stereo), so it's unsurprising that the formula used here is the same one implemented by FFmpeg with its `-ac 2` switch. In other words, the only difference between using this filter and doing `-ac 2` is a lot more typing. – Hashim Aziz Feb 28 '19 at 19:19
  • 3
    @Hashim Not only typing. An answer with a thorough explanation of the underpinnings is objectively better than "type this to get that". – Sevastyan Savanyuk Jan 04 '20 at 03:57
15

Try this downmix:

-ac 2 -af "pan=stereo|FL=FC+0.30*FL+0.30*BL|FR=FC+0.30*FR+0.30*BR" 

as suggested by Robert Collier in the Doom9 forum.

Hashim Aziz
Shane Harrelson
  • 3
    What do all those options mean? If you explain them, people will be able to use your answer to solve different problems instead of just copy-pasting. – David Richerby Mar 04 '16 at 21:12
  • 2
    @DavidRicherby -ac = Audio Channels (2 for stereo), -af = Audio Filter – Cestarian Mar 23 '16 at 04:14
  • 4
    Tried this for a 5.1 movie and at least the output stereo sounded completely fine to me. Clear dialogue and nothing else seemed to be missing. Would be great if someone with VLC knowledge could share exactly what is done in the default 5.1 to 2.0 downmix there. – forthrin Jul 08 '16 at 10:28
  • 2
    @DavidRicherby: The options inside the audio filter (-af) are: FL=Front-left; BL=Back-left; FC=Front-center; FR=Front-right; BR=Back-right. The floats are linear factors to reduce (<1) or increase(>1) the volume of the multiplied channel. FL=FC+0.30*FL+0.30*BL is setting the Front-left channel to the Front-Center channel plus 30% of the Front-left and 30% of the Back-left channels. – kronenpj Jan 15 '17 at 22:13
  • 2
    FWIW: I find this mix make dialogues be way too loud compared to the music and ambient sounds. The technically more correct mix given in Tarc's answer is much more pleasing to me. So I guess you might have to try what works best for you, it depends on the situation. – jlh Feb 07 '18 at 22:12
  • It may have been edited, @jlh: but the filter settings are identical in both answers. There's no reason they should sound differently to you. – psouza4 May 06 '18 at 18:48
5

RFC 7845 Section 5.1.1.5 has coefficients for various stereo downmixes, which preserve the LFE.

$ ffmpeg -i INFILE -c:a libopus -b:a 256k -af "pan=stereo|OUTDEFS" OUTFILE

Replace INFILE and OUTFILE with the file names, and OUTDEFS with the definition below (with the newline replaced by a space):

5.1 Surround downmix to stereo

FL, FC, FR, BL, BR, LFE -> FL, FR

FL = 0.374107*FC + 0.529067*FL + 0.458186*BL + 0.264534*BR + 0.374107*LFE |
FR = 0.374107*FC + 0.529067*FR + 0.458186*BR + 0.264534*BL + 0.374107*LFE

7.1 Surround

FL, FC, FR, SL, SR, BL, BR, LFE -> FL, FR

FL = 0.274804*FC + 0.388631*FL + 0.336565*SL + 0.194316*SR + 0.336565*BL + 0.194316*BR + 0.274804*LFE |
FR = 0.274804*FC + 0.388631*FR + 0.336565*SR + 0.194316*SL + 0.336565*BR + 0.194316*BL + 0.274804*LFE

6.1 Surround

FL, FC, FR, SL, SR, BC, LFE -> FL, FR

FL = 0.321953*FC + 0.455310*FL + 0.394310*SL + 0.227655*SR + 0.278819*BC + 0.321953*LFE |
FR = 0.321953*FC + 0.455310*FR + 0.394310*SR + 0.227655*SL + 0.278819*BC + 0.321953*LFE

5.0 Surround

FL, FC, FR, BL, BR -> FL, FR

FL = 0.460186*FC + 0.650802*FL + 0.563611*BL + 0.325401*BR |
FR = 0.460186*FC + 0.650802*FR + 0.563611*BR + 0.325401*BL

Quadraphonic Channel

FL, FR, BL, BR -> FL, FR

FL = 0.422650*FL + 0.366025*BL + 0.211325*BR |
FR = 0.422650*FR + 0.366025*BR + 0.211325*BL

Linear Surround Channel

FL, FC, FR -> FL, FR

FL = 0.414214*FC + 0.585786*FL |
FR = 0.414214*FC + 0.585786*FR

The RFC explains how they chose the coefficients:

Implementations MAY use the matrices in Figures 4 through 9 to implement downmixing from multichannel files using channel mapping family 1 (Section 5.1.1.2), which are known to give acceptable results for stereo. Matrices for 3 and 4 channels are normalized so each coefficient row sums to 1 to avoid clipping. For 5 or more channels, they are normalized to 2 as a compromise between clipping and dynamic range reduction.

In these matrices the front-left and front-right channels are generally passed through directly. When a surround channel is split between both the left and right stereo channels, coefficients are chosen so their squares sum to 1, which helps preserve the perceived intensity. Rear channels are mixed more diffusely or attenuated to maintain focus on the front channels.
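The normalization described in the quote can be checked against the coefficients listed above by adding up one row at a time. A minimal sketch, where the helper name row_sum is hypothetical:

```shell
#!/bin/sh
# row_sum is a hypothetical helper that adds up the coefficients of one
# output channel and prints the total to three decimals.
row_sum() {
    awk 'BEGIN {
        sum = 0
        for (i = 1; i < ARGC; i++) sum += ARGV[i]
        printf "%.3f\n", sum
    }' "$@"
}

row_sum 0.414214 0.585786                             # 3 channels: 1.000
row_sum 0.374107 0.529067 0.458186 0.264534 0.374107  # 5.1: 2.000
```

The 3-channel row sums to 1 and the 5.1 row sums to 2, as the RFC text says. The same check is a quick way to spot a mistyped coefficient when copying these rows into a pan filter.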


According to this answer, it is also possible to use -ac 2 and -lfe_mix_level <level> to include the LFE.

$ ffmpeg -i INFILE -c:a libopus -b:a 256k -ac 2 -lfe_mix_level 1 OUTFILE
Victor
5

I've reviewed the answers here, and while Hashim Aziz made a fantastic summary, one thing is missing that I noticed only after investigating on my own. ffmpeg recognizes two 5.1 layouts (see ffmpeg -layouts): 5.1 and 5.1(side). Some recording devices and other sources use the latter, which has no BL and BR channels but SL and SR instead.

Using BL/BR in your script won't raise an error: they're just silence. I don't really see a way to detect it, but one doesn't need to. One can either add the nonexistent channels (again, one set or the other will always be empty):

Gregory's ATSC formula (ffmpeg -ac 2)  FL<1.0*FL+0.707*FC+0.707*BL+0.707*SL|FR<1.0*FR+0.707*FC+0.707*BR+0.707*SR
Robert Collier's Nightmode Dialogue    FL=FC+0.30*FL+0.30*BL+0.30*SL|FR=FC+0.30*FR+0.30*BR+0.30*SR
Dave_750                               FL=0.5*FC+0.707*FL+0.707*BL+0.707*SL+0.5*LFE|FR=0.5*FC+0.707*FR+0.707*BR+0.707*SR+0.5*LFE
RFC 7845 Section 5.1.1.5               FL=0.374107*FC+0.529067*FL+0.458186*BL+0.458186*SL+0.264534*BR+0.264534*SR+0.374107*LFE|FR=0.374107*FC+0.529067*FR+0.458186*BR+0.458186*SR+0.264534*BL+0.264534*SL+0.374107*LFE

Or just use channel numbers (which match for both 5.1 and 5.1(side)):

5.1            FL+FR+FC+LFE+BL+BR
5.1(side)      FL+FR+FC+LFE+SL+SR
               c0+c1+c2+ c3+c4+c5

I personally settled on Dave's formula (RFC was second for my uses) and using channel numbers:

ffmpeg.exe -i input51.mkv -c:s copy -c:v copy -c:a libopus -b:a 104k -af "pan=stereo|FL<0.5*c2+0.707*c0+0.707*c4+0.5*c3|FR<0.5*c2+0.707*c1+0.707*c5+0.5*c3" output20.mkv
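Although the formulas above make detection unnecessary, the layout can be read from the file if you prefer to name only the channels that actually exist. The sketch below maps a layout string to the matching surround channel names. The helper name surround_pair is hypothetical, and the ffprobe call is shown only as a comment because it needs a real input file:

```shell
#!/bin/sh
# surround_pair is a hypothetical helper: given a layout string as
# reported by ffprobe, it prints the names of the two surround channels.
surround_pair() {
    case "$1" in
        "5.1(side)") echo "SL SR" ;;
        "5.1")       echo "BL BR" ;;
        *)           echo "unknown layout: $1" >&2; return 1 ;;
    esac
}

# The layout string could be obtained like this (requires ffprobe and
# assumes the first audio stream is the one of interest):
# layout=$(ffprobe -v error -select_streams a:0 \
#     -show_entries stream=channel_layout \
#     -of default=noprint_wrappers=1:nokey=1 input51.mkv)
surround_pair "5.1(side)"   # prints: SL SR
```

Only the two 5.1 layouts discussed in this answer are handled. Any other layout is reported as unknown instead of being guessed.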
Minty
4

So, by combining @Shane Harrelson's answer with @Jordan Harris's answer to another question - with lazy mode turned on - here is what's needed to convert input_51.mkv (5.1) into output_stereo.mkv (stereo):

ffmpeg -i input_51.mkv -c:v copy \
    -ac 2 -af "pan=stereo|FL=FC+0.30*FL+0.30*BL|FR=FC+0.30*FR+0.30*BR" \
    output_stereo.mkv

The -c:v copy part means that the video stream is copied as-is rather than re-encoded. Without it, the conversion takes much longer. Repeating from the answer above for completeness: -ac 2 means two audio channels and -af specifies an audio filter.

After looking into the command a bit, I figured out that it's setting how the two stereo channels are composed; the FL (front left channel) is taken from the original FC (front center) plus 0.30*FL (30% from the front left) plus 0.30*BL (30% from the back left) and so on.
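Since the only tunable part of this filter is the 0.30 weight, it can be convenient to generate the filter string from a single parameter while experimenting. A minimal sketch, where the helper name dialogue_pan is hypothetical:

```shell
#!/bin/sh
# dialogue_pan is a hypothetical helper that builds the pan filter
# described above. Its argument is the weight applied to the front and
# back channels; the center channel is always mixed in at full level.
dialogue_pan() {
    w="$1"
    printf 'pan=stereo|FL=FC+%s*FL+%s*BL|FR=FC+%s*FR+%s*BR\n' \
        "$w" "$w" "$w" "$w"
}

dialogue_pan 0.30
# prints: pan=stereo|FL=FC+0.30*FL+0.30*BL|FR=FC+0.30*FR+0.30*BR
```

The result can be passed to ffmpeg as -af "$(dialogue_pan 0.30)". Raising the weight brings music and effects forward relative to the dialogue.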

Hashim Aziz
Tarc
4

This is an old question now, but it pointed me in the right direction and I wanted to share my result:

-af "pan=stereo|FL=0.5*FC+0.707*FL+0.707*BL+0.5*LFE|FR=0.5*FC+0.707*FR+0.707*BR+0.5*LFE"

Putting half of the FC and LFE into left and right gives a total of 1 for their effective volumes from both speakers. Using .707 * Front/Back Left/Right brings those channels down to a good level so they don't overpower the center.

Hashim Aziz
Dave_750
2

After reading this whole page and some experiments I came up with this script called "down_mix":

#!/bin/bash -x

FL="0.5*FC + 0.707*FL + 0.707*BL + 0.5*LFE"
FR="0.5*FC + 0.707*FR + 0.707*BR + 0.5*LFE"
AUDIO_FMT="libopus"
CONTAINER="mkv"

ffmpeg -i "$1" -c:v copy -c:s copy \
    -c:a $AUDIO_FMT \
    -af "pan=stereo|FL=$FL|FR=$FR" \
    "${1%.*}"_dm.$CONTAINER

    # how to test a snippet of movie
    # -ss 41:07.0 -t 4 \

Tweak the variables above to your liking. I didn't have a problem with low volume, so I left that out, but it is easily added.

Update: after experiencing a number of movies with whisper-quiet dialog combined with ear-splitting explosions I am now using this, which can fix even the dumbest mix:

# Moar center for Bond!!
FL="0.8*FC + 0.6*FL + 0.6*BL + 0.5*LFE"
FR="0.8*FC + 0.6*FR + 0.6*BR + 0.5*LFE"
Gringo Suave
1
-ac 2

With a floating-point codec (pcm_f32le, aac), the volume of the channels in the downmix is unchanged.

With an integer codec (pcm_s16le, libfdk_aac), the volume in the downmix (5.1 to 2.0 without LFE) is scaled by 1/2.5, which is -7.96 dB.
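The decibel figure follows from the usual amplitude conversion, 20 * log10(gain). A minimal sketch for reproducing it, where the helper name gain_to_db is hypothetical:

```shell
#!/bin/sh
# gain_to_db is a hypothetical helper: it converts a linear amplitude
# gain into decibels using 20 * log10(gain).
gain_to_db() {
    awk -v g="$1" 'BEGIN { printf "%.2f\n", 20 * log(g) / log(10) }'
}

gain_to_db 0.4   # 1/2.5, the scaling quoted above; prints: -7.96
```

Run in the other direction, a gain of 2.5 gives +7.96 dB, which is the boost needed to undo that attenuation, at the risk of clipping in loud passages.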

Film soundtracks are directional and do not drive all channels to maximum level at once, so reducing the downmix volume is the wrong approach; light level compression is the right one. That's what Dolby does.

Giacomo1968
0

If the -ac 2 option gives you a balanced downmix in which neither the music nor the speech is much louder than the other components, you just need to boost the volume with

-vol 512

I used 512 in the example, which doubles the volume. The rule is that 256 is equivalent to 100%.

Do not go too high with the value, and be sure to check the results in the parts of the movie with explosions or loud noise. It is very easy to introduce distortion by using too high a value.

Mephisto
0

The ffmpeg filter "-ac 2" works fine as long as your target is pcm_s16le encoded. When encoding to pcm_f32le in wav format the volume is increased by 9dB and more. Hence: Don't use the "-ac 2" filter in such cases.

0

I use this one for all downmixing to stereo: -af 'lowpass=c=LFE:f=120,pan=stereo|FL=.3FL+.21FC+.3FLC+.3SL+.3BL+.21BC+.21LFE|FR=.3FR+.21FC+.3FRC+.3SR+.3BR+.21BC+.21LFE'

Works for various channel configurations, including standard 7.1, 6.1, 5.1.

damian101
0

Old question but still interesting to me...

First, I never encountered a global volume issue. After reading the answer from @Franz-Michael Fisher I made a few tests, starting from a file with a DTS 5.1 track and transcoding it to pcm_s16le, pcm_f32le, and aac, all of them with the -ac 2 option. When playing the files with VLC and headphones, all of them sound the same as the original, except the pcm_s16le one that sounds quieter. Since I always use aac, the global volume is apparently not an issue.

Second, I sometimes face the problem of dialogue being perceived as too quiet compared to the music and sound effects. So it's indeed tempting to downmix with alternate formulas that give more weight to the center channel FC, and I did that for a while... However, it turns out that FC contains not only voices but also a large part of the music and sounds: as a consequence, overweighting FC also narrows the stereo image, which is not desirable...

I kept wondering why the dialogue is sometimes perceived as too quiet after downmixing, and I have a possible explanation: the brain is very good at isolating a voice buried in ambient noise according to the direction it comes from. That's why people with hearing aids still have difficulty following a conversation when multiple people speak at the same time: the hearing aids can restore the volume, but the directivity is lost... So, with a real 5.1 or 7.1 setup the brain is not bothered by the side/rear channels when it focuses on the dialogue, because they come from completely different directions. After a downmix this is no longer the case: what was coming from the side/rear channels is now coming from the front, making the separation task more difficult for the brain. The solution is hence to downweight the side/rear channels: instead of the ATSC formula

-af "pan=stereo|FL < 1.0*FL + 0.707*FC + 0.707*BL|FR < 1.0*FR + 0.707*FC + 0.707*BR"

I am now using

-af "pan=stereo|FL < 1.0*FL + 0.707*FC + 0.4*BL|FR < 1.0*FR + 0.707*FC + 0.4*BR"

EDIT: beware, if the input 5.1 stream is declared as "side" (as shown in ffmpeg or ffprobe), then BL and BR are zero and the surround channels are SL and SR instead. It is actually safe to write:

-af "pan=stereo|FL < 1.0*FL + 0.707*FC + 0.4*BL + 0.4*SL|FR < 1.0*FR + 0.707*FC + 0.4*BR + 0.4*SR"

PierU