Here is what I did with "original.mp4" (00:15:22 duration, ~80MB filesize):
1. Extracted the video stream from the mp4:
ffmpeg -i original.mp4 -c copy -an vid_only.mp4
2. Extracted the audio stream from the mp4:
ffmpeg -i original.mp4 -vn -acodec copy aud_only.aac
3. Merged the extracted audio and video streams to form a new mp4:
ffmpeg -i aud_only.aac -i vid_only.mp4 new_mix.mp4
I found that "new_mix.mp4" was merged correctly and had the same duration as before, but its file size had dropped to 36MB (from ~80MB). Running ffprobe on both "original.mp4" and "new_mix.mp4", I found that the difference is in the bitrate of the audio and video streams.
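For reference, the per-stream comparison can be sketched roughly as follows. The helper name show_stream_bitrates is made up for illustration, and some containers may report the bit_rate field as N/A:

```shell
# Hypothetical helper (name is illustrative): print the codec type, codec name
# and bitrate of every stream in a file, one key=value pair per line.
show_stream_bitrates() {
  # $1 = media file to inspect
  ffprobe -v error \
    -show_entries stream=codec_type,codec_name,bit_rate \
    -of default=noprint_wrappers=1 \
    "$1"
}

# Usage with the file names from this post:
# show_stream_bitrates original.mp4
# show_stream_bitrates new_mix.mp4
```

Comparing the two outputs side by side should show whether the bit_rate values differ between the original and the merged file.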
Steps #1 and #2 extracted the audio and video streams without any re-encoding. Step #3 supposedly just merged the audio and video streams, but it took quite a long time, so it probably re-encoded them. Is that right? And if so, is there a way to tell ffmpeg not to re-encode while merging, so that it runs faster and retains the original bitrate?
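A minimal sketch of what a merge without re-encoding might look like, assuming the same -c copy option used in step #1 also applies when muxing. The function name merge_copy is hypothetical; the -map options simply pick the first video stream of the first input and the first audio stream of the second input:

```shell
# Hypothetical helper (name is illustrative): mux one video file and one audio
# file into a new container, copying both streams instead of re-encoding them.
merge_copy() {
  # $1 = video-only input, $2 = audio-only input, $3 = output file
  ffmpeg -i "$1" -i "$2" \
    -map 0:v:0 -map 1:a:0 \
    -c copy \
    "$3"
}

# Usage with the file names from this post:
# merge_copy vid_only.mp4 aud_only.aac new_mix.mp4
```

With stream copy the merge should finish quickly and the stream bitrates should match the originals. On older ffmpeg builds, copying raw AAC into an mp4 may additionally need -bsf:a aac_adtstoasc; recent versions are expected to insert that filter automatically.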