ffmpeg - Normalize Audio With Varying Audio Volume

Question

I have a .MOV file whose audio I would like to normalize. However, when I run the following command, I see that the audio varies throughout the video: it switches between mono and stereo, and the max_volume level differs from section to section:

ffmpeg -i video.avi -af "volumedetect" -f null /dev/null

I would like to normalize the whole file but am not sure how to do so. The output is as follows:

ffmpeg version N-50911-g9efcfbe Copyright (c) 2000-2013 the FFmpeg developers
  built on Mar 13 2013 21:26:48 with gcc 4.7.2 (GCC)
  configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enable-libass --enable-libbluray --enable-libcaca --enable-libfreetype --enable-libgsm --enable-libilbc --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libxavs --enable-libxvid --enable-zlib
  libavutil      52. 19.100 / 52. 19.100
  libavcodec     55.  0.100 / 55.  0.100
  libavformat    55.  0.100 / 55.  0.100
  libavdevice    54.  4.100 / 54.  4.100
  libavfilter     3. 45.103 /  3. 45.103
  libswscale      2.  2.100 /  2.  2.100
  libswresample   0. 17.102 /  0. 17.102
  libpostproc    52.  2.100 / 52.  2.100
[mov,mp4,m4a,3gp,3g2,mj2 @ 020d9920] multiple edit list entries, a/v desync might occur, patch welcome
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '.\IMG_2783.mov':
  Metadata:
    major_brand     : qt
    minor_version   : 0
    compatible_brands: qt
    creation_time   : 2014-08-20 02:36:31
  Duration: 00:06:06.43, start: 0.000000, bitrate: 10206 kb/s
    Stream #0:0(und): Audio: aac (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 78 kb/s
    Metadata:
      creation_time   : 2014-08-20 02:36:32
      handler_name    : Core Media Data Handler
    Stream #0:1(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 1280x720, 10116 kb/s, 30 fps, 30 tbr, 600 tbn, 1200 tbc
    Metadata:
      creation_time   : 2014-08-20 02:36:32
      handler_name    : Core Media Data Handler
Output #0, null, to '/dev/nullclear':
  Metadata:
    major_brand     : qt
    minor_version   : 0
    compatible_brands: qt
    encoder         : Lavf55.0.100
    Stream #0:0(und): Video: rawvideo (I420 / 0x30323449), yuv420p, 1280x720, q=2-31, 200 kb/s, 90k tbn, 30 tbc
    Metadata:
      creation_time   : 2014-08-20 02:36:32
      handler_name    : Core Media Data Handler
    Stream #0:1(und): Audio: pcm_s16le, 44100 Hz, stereo, s16, 1411 kb/s
    Metadata:
      creation_time   : 2014-08-20 02:36:32
      handler_name    : Core Media Data Handler
Stream mapping:
  Stream #0:1 -> #0:0 (h264 -> rawvideo)
  Stream #0:0 -> #0:1 (aac -> pcm_s16le)
Press [q] to stop, [?] for help
[null @ 03e023a0] Encoder did not produce proper pts, making some up.
Input stream #0:0 frame changed from rate:44100 fmt:fltp ch:2 chl:stereo to rate:44100 fmt:fltp ch:1 chl:mono
[Parsed_volumedetect_0 @ 03e02ea0] n_samples: 3346432
[Parsed_volumedetect_0 @ 03e02ea0] mean_volume: -36.2 dB
[Parsed_volumedetect_0 @ 03e02ea0] max_volume: -0.0 dB
[Parsed_volumedetect_0 @ 03e02ea0] histogram_0db: 8
[Parsed_volumedetect_0 @ 03e02ea0] histogram_1db: 18
[Parsed_volumedetect_0 @ 03e02ea0] histogram_2db: 38
[Parsed_volumedetect_0 @ 03e02ea0] histogram_3db: 58
[Parsed_volumedetect_0 @ 03e02ea0] histogram_4db: 94
[Parsed_volumedetect_0 @ 03e02ea0] histogram_5db: 58
[Parsed_volumedetect_0 @ 03e02ea0] histogram_6db: 80
[Parsed_volumedetect_0 @ 03e02ea0] histogram_7db: 196
[Parsed_volumedetect_0 @ 03e02ea0] histogram_8db: 222
[Parsed_volumedetect_0 @ 03e02ea0] histogram_9db: 236
[Parsed_volumedetect_0 @ 03e02ea0] histogram_10db: 404
[Parsed_volumedetect_0 @ 03e02ea0] histogram_11db: 522
[Parsed_volumedetect_0 @ 03e02ea0] histogram_12db: 766
[Parsed_volumedetect_0 @ 03e02ea0] histogram_13db: 738
Input stream #0:0 frame changed from rate:44100 fmt:fltp ch:1 chl:mono to rate:44100 fmt:fltp ch:2 chl:stereo
[Parsed_volumedetect_0 @ 03834340] n_samples: 626688
[Parsed_volumedetect_0 @ 03834340] mean_volume: -39.6 dB
[Parsed_volumedetect_0 @ 03834340] max_volume: -15.5 dB
[Parsed_volumedetect_0 @ 03834340] histogram_15db: 17
[Parsed_volumedetect_0 @ 03834340] histogram_16db: 19
[Parsed_volumedetect_0 @ 03834340] histogram_17db: 25
[Parsed_volumedetect_0 @ 03834340] histogram_18db: 41
[Parsed_volumedetect_0 @ 03834340] histogram_19db: 88
[Parsed_volumedetect_0 @ 03834340] histogram_20db: 231
[Parsed_volumedetect_0 @ 03834340] histogram_21db: 439
Input stream #0:0 frame changed from rate:44100 fmt:fltp ch:2 chl:stereo to rate:44100 fmt:fltp ch:1 chl:mono
[Parsed_volumedetect_0 @ 08f34b00] n_samples: 192512
[Parsed_volumedetect_0 @ 08f34b00] mean_volume: -19.7 dB
[Parsed_volumedetect_0 @ 08f34b00] max_volume: 0.0 dB
[Parsed_volumedetect_0 @ 08f34b00] histogram_0db: 2048
Input stream #0:0 frame changed from rate:44100 fmt:fltp ch:1 chl:mono to rate:44100 fmt:fltp ch:2 chl:stereo
[Parsed_volumedetect_0 @ 08f34aa0] n_samples: 769024
[Parsed_volumedetect_0 @ 08f34aa0] mean_volume: -28.2 dB
[Parsed_volumedetect_0 @ 08f34aa0] max_volume: -0.1 dB
[Parsed_volumedetect_0 @ 08f34aa0] histogram_0db: 15
[Parsed_volumedetect_0 @ 08f34aa0] histogram_1db: 29
[Parsed_volumedetect_0 @ 08f34aa0] histogram_2db: 72
[Parsed_volumedetect_0 @ 08f34aa0] histogram_3db: 102
[Parsed_volumedetect_0 @ 08f34aa0] histogram_4db: 158
[Parsed_volumedetect_0 @ 08f34aa0] histogram_5db: 163
[Parsed_volumedetect_0 @ 08f34aa0] histogram_6db: 299
Input stream #0:0 frame changed from rate:44100 fmt:fltp ch:2 chl:stereo to rate:44100 fmt:fltp ch:1 chl:mono
[Parsed_volumedetect_0 @ 08f34800] n_samples: 96256
[Parsed_volumedetect_0 @ 08f34800] mean_volume: -65.0 dB
[Parsed_volumedetect_0 @ 08f34800] max_volume: -37.3 dB
[Parsed_volumedetect_0 @ 08f34800] histogram_37db: 2
[Parsed_volumedetect_0 @ 08f34800] histogram_38db: 0
[Parsed_volumedetect_0 @ 08f34800] histogram_39db: 2
[Parsed_volumedetect_0 @ 08f34800] histogram_40db: 10
[Parsed_volumedetect_0 @ 08f34800] histogram_41db: 8
[Parsed_volumedetect_0 @ 08f34800] histogram_42db: 4
[Parsed_volumedetect_0 @ 08f34800] histogram_43db: 14
[Parsed_volumedetect_0 @ 08f34800] histogram_44db: 12
[Parsed_volumedetect_0 @ 08f34800] histogram_45db: 12
[Parsed_volumedetect_0 @ 08f34800] histogram_46db: 20
[Parsed_volumedetect_0 @ 08f34800] histogram_47db: 16
Input stream #0:0 frame changed from rate:44100 fmt:fltp ch:1 chl:mono to rate:44100 fmt:fltp ch:2 chl:stereo
[Parsed_volumedetect_0 @ 08f34800] n_samples: 533504
[Parsed_volumedetect_0 @ 08f34800] mean_volume: -43.9 dB
[Parsed_volumedetect_0 @ 08f34800] max_volume: -23.4 dB
[Parsed_volumedetect_0 @ 08f34800] histogram_23db: 47
[Parsed_volumedetect_0 @ 08f34800] histogram_24db: 453
[Parsed_volumedetect_0 @ 08f34800] histogram_25db: 685
Input stream #0:0 frame changed from rate:44100 fmt:fltp ch:2 chl:stereo to rate:44100 fmt:fltp ch:1 chl:mono
[Parsed_volumedetect_0 @ 08f34980] n_samples: 98304
[Parsed_volumedetect_0 @ 08f34980] mean_volume: -16.4 dB
[Parsed_volumedetect_0 @ 08f34980] max_volume: -6.2 dB
[Parsed_volumedetect_0 @ 08f34980] histogram_6db: 28
[Parsed_volumedetect_0 @ 08f34980] histogram_7db: 290
frame=10993 fps=817 q=0.0 Lsize=N/A time=00:06:38.32 bitrate=N/A
video:687kB audio:68764kB subtitle:0 global headers:0kB muxing overhead -100.000031%
[Parsed_volumedetect_0 @ 08f34920] n_samples: 13807616
[Parsed_volumedetect_0 @ 08f34920] mean_volume: -28.3 dB
[Parsed_volumedetect_0 @ 08f34920] max_volume: 0.0 dB
[Parsed_volumedetect_0 @ 08f34920] histogram_0db: 188
[Parsed_volumedetect_0 @ 08f34920] histogram_1db: 532
[Parsed_volumedetect_0 @ 08f34920] histogram_2db: 1078
[Parsed_volumedetect_0 @ 08f34920] histogram_3db: 1416
[Parsed_volumedetect_0 @ 08f34920] histogram_4db: 1948
[Parsed_volumedetect_0 @ 08f34920] histogram_5db: 2805
[Parsed_volumedetect_0 @ 08f34920] histogram_6db: 4288
[Parsed_volumedetect_0 @ 08f34920] histogram_7db: 6068

I'm not sure if this'll help, but since the lowest value of max_volume is -37.3 dB, maybe try this: `ffmpeg -i INPUT.mov -af "volume=37.3dB" -c:v copy -c:a aac -strict -2 -b:a 160k OUTPUT.mov` — Vinayak, Sep 02 '14 at 21:08
Might be worth trying: `-af "aformat=channel_layouts=stereo,volumedetect"`. If that seems to work, then you can use `aformat` when you modify the volume (also see `tools/normalize.py`). — llogan, Sep 02 '14 at 22:34
You might need to extract the audio, cut it into homogeneous parts, normalize each part, concatenate into one audio file, then substitute in the video. I think you could do it directly on the video in one go, but the ffmpeg command for that is quite complex. — harrymc, Sep 03 '14 at 18:38
Some media players support normalizing audio volume on the fly (on Windows, for example, GOM Player plus a plugin). — crazypotato, Sep 06 '14 at 19:28
@crazypotato and some don't. That is why a conversion like this is an important one! You don't have much choice of software if you have a set top box. — TWiStErRob, Aug 22 '15 at 15:55
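
A rough way to check the gain figures discussed in the comments is to read the max_volume values out of the volumedetect log and compute how far each one sits below a target peak. The sketch below does only that arithmetic; the function name `peak_gains` and the 0 dB target are assumptions made for illustration, and the values are the ones reported in the question (seven sections, then the whole-file summary).

```python
import re

# Matches lines such as "[Parsed_volumedetect_0 @ 03834340] max_volume: -15.5 dB".
MAX_VOLUME_RE = re.compile(r"max_volume:\s*(-?\d+(?:\.\d+)?)\s*dB")


def peak_gains(log_text, target_db=0.0):
    """Return, for each reported peak, the gain in dB needed to reach target_db."""
    peaks = [float(value) for value in MAX_VOLUME_RE.findall(log_text)]
    return [round(target_db - peak, 1) for peak in peaks]


# max_volume values from the question: seven sections, then the whole file.
log = """
max_volume: -0.0 dB
max_volume: -15.5 dB
max_volume: 0.0 dB
max_volume: -0.1 dB
max_volume: -37.3 dB
max_volume: -23.4 dB
max_volume: -6.2 dB
max_volume: 0.0 dB
"""

gains = peak_gains(log)
print("per-section gains (dB):", gains[:-1])
print("whole-file gain (dB):", gains[-1])
```

Two things follow from the numbers. The whole-file peak is already at 0.0 dB, so a single peak-normalizing gain for the entire file works out to 0 dB and changes nothing. And a blanket `volume=37.3dB` would lift the sections that already peak at 0 dB far above full scale, so the gain has to differ per section.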

score 0 · Answer 1 · answered Sep 08 '14 at 00:57

Speaking as an audio guy who's done some reading: you might want to approach this by splitting the file into its different sections with something like atrim, normalising each one (using replaygain to find the peak), and stitching the sections back together once they have all had the same processing applied.
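
As a minimal sketch of that split, normalise and stitch idea, the snippet below assembles a `-filter_complex` graph that trims each section with `atrim`, resets its timestamps with `asetpts`, forces stereo with `aformat`, applies a per-section `volume` gain and joins the results with `concat`. The section boundaries are hypothetical (the question's log does not give timestamps), the helper names are made up for illustration, and the command is only printed, not run; it has not been tested against a file whose channel layout changes mid-stream.

```python
import shlex


def build_filtergraph(segments):
    """segments: ordered list of (start_s, end_s, gain_db) tuples."""
    chains = []
    labels = []
    for i, (start, end, gain_db) in enumerate(segments):
        label = f"[a{i}]"
        chains.append(
            f"[0:a]atrim=start={start}:end={end},asetpts=PTS-STARTPTS,"
            f"aformat=channel_layouts=stereo,volume={gain_db}dB{label}"
        )
        labels.append(label)
    # Join the processed sections back into a single audio stream.
    chains.append(f"{''.join(labels)}concat=n={len(segments)}:v=0:a=1[aout]")
    return ";".join(chains)


def build_command(src, dst, segments):
    """Return the ffmpeg argument list: copy the video, re-encode the audio."""
    return [
        "ffmpeg", "-i", src,
        "-filter_complex", build_filtergraph(segments),
        "-map", "0:v", "-map", "[aout]",
        "-c:v", "copy", "-c:a", "aac", "-b:a", "160k",
        dst,
    ]


# Hypothetical boundaries in seconds; the gains are taken from the question's log.
example_segments = [(0, 40, 0.0), (40, 55, 15.5), (55, 60, 0.0)]
print(shlex.join(build_command("INPUT.mov", "OUTPUT.mov", example_segments)))
```

Older builds such as the one in the question may additionally need `-strict -2` for the native AAC encoder, as in the command from the comments.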

If the dynamic range were much smaller, you could consider something like compand to pull down the loudest peaks, which would then allow the level of the whole file to be raised. compand may well be something useful to apply after normalisation anyway.
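
To illustrate why pulling down the top end frees up room for gain, here is a small sketch of a static compression curve: a piecewise-linear map from input level to output level, both in dB. This is not ffmpeg's compand implementation (which also has attack and decay behaviour); the curve points and function name are hypothetical and chosen only to make the arithmetic visible.

```python
def compress_db(level_db, points):
    """Map an input level (dB) to an output level (dB) by linear interpolation."""
    points = sorted(points)
    if level_db <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if level_db <= x1:
            t = (level_db - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    return points[-1][1]


# Hypothetical curve: levels up to -20 dB pass unchanged, 0 dB peaks come out at -10 dB.
CURVE = [(-90.0, -90.0), (-20.0, -20.0), (0.0, -10.0)]

# Headroom freed at the top of the curve can be given back as make-up gain.
makeup_db = 0.0 - compress_db(0.0, CURVE)

for peak in (0.0, -15.5, -37.3):
    out = compress_db(peak, CURVE) + makeup_db
    print(f"input peak {peak:6.1f} dB -> output peak {out:6.1f} dB")
```

With this curve the loudest peaks stay at full scale after make-up gain while the quiet sections come up by the freed headroom, which is the effect the paragraph above describes.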

score 0 · Answer 2 · edited Mar 20 '17 at 10:16

I think your question has already been answered here.

It looks like your file has two stereo audio streams though... an AAC and a WAVE stream. Pick the WAVE stream as the source for your re-encode.

answered Sep 08 '14 at 22:12 by Tim · edited by Community