
The Ultimate Guide to FFmpeg

In this guide, we will go through the hot topics of FFmpeg. But before that, we will cover some basic ground to help you understand basic media concepts and FFmpeg. Feel free to skip the parts that are already trivial for you!

Introduction to FFmpeg

FFmpeg.org's definition is the following: "FFmpeg is the leading multimedia framework, able to decode, encode, transcode, mux, demux, stream, filter and play pretty much anything that humans and machines have created. It supports the most obscure ancient formats up to the cutting edge. No matter if they were designed by some standards committee, the community or a corporation."

I think of FFmpeg as the go-to application for audio/video manipulation in an automated or scripted manner.

When you need to implement a service that manipulates video, or you simply have 300 media files that need to be converted into a different format, FFmpeg is your (nerdy) friend.

FFmpeg can do large chunks of the basic functionality of modern non-linear (NLE) video editors, e.g., Davinci Resolve Studio or Premiere Pro. However, it does not have a graphical interface in the sense that those behemoths do, and it is undeniably way less friendly.

In a typical NLE, you might do things like these:

  1. Click to import a file
  2. Drop it into the timeline
  3. Trim and cut
  4. Add an overlay image
  5. Crop that overlay
  6. Add vignette
  7. Add some color changing effects, e.g. change the hue
  8. Add an extra audio track to the mix
  9. Change the volume
  10. Add some effects, e.g. echo
  11. Export into various formats
  12. Export into a deployable video format
  13. Export the master audio in WAV

Or, to achieve the exact same thing, you could also execute this command:

ffmpeg -y \
    -ss 20 -t 60 -i bbb_sunflower_1080p_60fps_normal.mp4 \
    -i train.jpg \
    -ss 4 -i voice_recording.wav \
    -filter_complex "[0:v]hue=h=80:s=1[main] ; [1:v]crop=w=382:h=304:x=289:y=227[train] ; [main][train]overlay=x=200:y=200,vignette=PI/4[video] ; [2:a]volume=1.5,aecho=0.8:0.9:100:0.3[speech] ; [0:a][speech]amix=duration=shortest,asplit[audio1][audio2]" \
    -map '[video]' -map '[audio1]' -metadata title="Editor's cut" bbb_edited.mp4 \
    -map '[audio2]' bbb_edited_audio_only.wav

Yes, it isn't friendly at all, but it is very, very powerful once you become friends with FFmpeg.

Check out this comparison of the original and the edited one:

If you want to try this command out, get the example files (listed later in this article) and see it for yourself!

Installing FFmpeg

FFmpeg is available for most common and even uncommon platforms and architectures. Whether you are on Linux, Mac OS X or Microsoft Windows, you will be able to run or link to FFmpeg.

Installing FFmpeg is easy on most platforms! There is no installer, usually just a compressed archive that you need to get for your platform and architecture.

In the case of Linux, most distributions include a pre-built FFmpeg in their software repositories. Therefore, you can install FFmpeg from those even more quickly.
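As a quick sanity check after installing, the following sketch reports whether a tool is reachable on your PATH. The `tool_status` helper is a made-up name for illustration, and the `apt` line in the comment assumes a Debian or Ubuntu system where the package is called `ffmpeg`.

```bash
# On Debian/Ubuntu (assumption), FFmpeg can be installed from the repositories:
#   sudo apt install ffmpeg

# tool_status is a hypothetical helper, not part of FFmpeg.
# It prints where a command lives, or says that it is missing.
tool_status() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found on PATH"
    fi
}

tool_status ffmpeg
```

If the tool is found, running `ffmpeg -version` prints the version together with the compilation flags of that particular build.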

FFmpeg history

The project was started in 2000 by the awesome Fabrice Bellard. The name is a concatenation of "FF", meaning "fast-forward", and MPEG, the name of a video standards group. It has been alive and well ever since, putting out a new release about every three months.

FFmpeg supported codecs and formats

The default FFmpeg shipped with my Ubuntu Linux distribution supports about 460 codecs and 370 formats.

See it for yourself:

ffmpeg -codecs
ffmpeg -formats

Compilation of FFmpeg

Note that the supported codecs and formats (and filters, demuxers, muxers, input and output methods, etc.) are highly dependent on the so-called compilation flags.

This means that the above numbers only show that this build supports at least that many codecs and formats. There are many more that the package builders excluded for various reasons, e.g. licensing, architecture, size considerations, etc.

Since FFmpeg is open source, you can compile FFmpeg for yourself at any time.

Suppose, for example, that you care about your layer's size (and therefore the bootstrap speed) in AWS Lambda. In this case, you can compile an FFmpeg binary that only contains the mp3 encoder, for example, and nothing else.

Also, you might not want to run into licensing issues, so you may choose to leave out particular codecs/formats that would cause problems for your use case. I highly recommend checking out the "--enable-gpl", "--enable-nonfree" and "--enable-version3" compilation flags in this case.

Or you might want to have a standalone FFmpeg binary in your project (e.g. embedded, or some cloud instance) that does not depend on any operating system libraries. Then you want to make a so-called static build, which compiles all the libraries into a single binary file, and does not depend on your OS' libraries or on the runtime loading of other FFmpeg libraries. Search around for "--enable-static" in this case.
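To make the idea of a trimmed-down build more concrete, here is a minimal sketch of a configure invocation for an "mp3 encoder only" binary. It only prints the command so you can review it first. The exact set of components is an assumption on my part: it presumes an unpacked FFmpeg source tree, an installed libmp3lame, and WAV files as the only input, and a real project will likely need a different list. Fully static linking may also need extra linker flags on some systems.

```bash
# Assumption: you are inside an unpacked FFmpeg source tree and libmp3lame
# (with its development headers) is already installed.
# Start from nothing, then enable only what a WAV -> MP3 conversion needs.
CONFIGURE_FLAGS="--disable-everything \
--enable-libmp3lame --enable-encoder=libmp3lame --enable-muxer=mp3 \
--enable-decoder=pcm_s16le --enable-demuxer=wav \
--enable-protocol=file --enable-filter=aresample \
--enable-small --disable-doc \
--enable-static --disable-shared"

# Print the command for review instead of running it right away.
echo "./configure $CONFIGURE_FLAGS"

# To actually build, run the printed command and then: make
```

Running `./configure --help` in the source tree lists every available flag, which is the authoritative place to check before relying on this sketch.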

Finally, you can also find pre-built static FFmpeg builds online.

FFmpeg’s strengths

FFmpeg reads and writes most video and audio formats that matter for most of us. It is a very capable and high-performance tool for converting and manipulating these formats.

But FFmpeg can do even more!

Filtering

FFmpeg has a vast number of filters for audio and video. Therefore, video manipulation is also a key feature of FFmpeg.

Hardware acceleration

It supports many kinds of hardware acceleration! Video encoding is a very resource-intensive operation, and you might come across quite a few hardware devices or features that can speed up your process!

Most notably, if you have an NVIDIA card, you can increase your H.264 or H.265 encoding and decoding throughput several times over compared to your CPU. But other things, such as VDPAU, VAAPI, or OpenCL, can also be leveraged to boost your pipeline's throughput.

You can learn more about the supported hardware acceleration methods in FFmpeg's documentation.

Versatile input/output methods

FFmpeg is also very capable when it comes to accessing input and output data.

Just to name a few: it can use your webcam, record from your microphone, grab your screen, or capture from your Blackmagic DeckLink. FFmpeg can also download directly from a web address, open all kinds of streams, and read from a pipe, a socket, and of course, from files.

The same holds true for outputting the data. It can write to your webcam, play audio on your microphone... Just kidding :) It can output to files, streams, pipes, sockets and so on.

Running example commands

This article is full of FFmpeg commands that are working examples, so that you can test them out for yourself! But the command line interfaces of different operating systems are slightly different, so the commands in this article are meant to be executed in a Linux bash shell.

To adapt these command lines to Microsoft Windows, you might need to:

  1. Change (cd) into the directory where you extracted ffmpeg.exe. Alternatively, add that directory to the path to make it callable from anywhere.
  2. You might need to replace "ffmpeg" with "ffmpeg.exe".
  3. You will need to replace "\"-s (backslashes) at the end of the lines with "^"-s (hats).
  4. You will need to change the fontfile argument's value to something like this: fontfile=/Windows/Fonts/arial.ttf to get commands with the drawtext filter working.

macOS users will need steps #1 and #4.

Now let's have a quick overview of media concepts. These concepts will be vital for us if we want to understand the later sections of this article and FFmpeg's workings. To keep this section brief, it is a higher-level, simplified explanation of these concepts.

Audio

We'll briefly cover the following terms:

  1. Sampling rate
  2. Bitrate
  3. Channels

Sampling rate

The sampling rate shows how many times per second we measure (sample) the input signal.

The image below shows the measurement windows (quantization) as gray bars.

Why does this matter? Because it is a balancing act. If we measure the signal less often, we lose more detail (bad). But with fewer samples we also end up with less data, so the file size is smaller (good).

Here are some ballpark values:

  • 8 kHz (GSM, low quality)
  • 44.1 kHz (CD, high quality)
  • 48 kHz (very high quality)
  • 88.2 kHz (insane, usually for production only)
  • 96 kHz (insane, usually for production only)

There are no definite "right answers" here. The question is: what is "good enough" for your use case? GSM focuses on speech, and not even on quality but on understandability with the least possible amount of data. Therefore, its designers found that 8 kHz is enough for their purposes (together with quite a few more tricks).

"CD quality" aimed for high quality, so 44.1 kHz was chosen. That number has some history in it, but the main reason for aiming above 40 kHz lies in physics and in how the human ear works.

There were two very smart guys (Nyquist and Shannon) whose theorem basically says that if you want a faithful signal representation, you have to sample it at least twice as fast as its highest frequency. Human hearing generally works up to about 20 kHz, so if you want "good quality", you should aim for at least 40 kHz. And 40 kHz + some headroom + some more physics + historical reasons = 44.1 kHz! :)

As for the higher rates, those are only used when very high-quality audio editing is required.

Bitrate

Bitrate represents the amount of data per second that results from our transcoding/quantization process. If it is 1411 kbit/s, that means that for every second of audio, about 1411 kbit of output data will be produced.

Therefore, you can say that 1 minute of audio at 1411 kbit/s will require:

(1411 kbit/s / 8) * 60 s = 10582 kbyte = 10.33 Mbyte
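The same arithmetic can be sketched in the shell. The helper names `pcm_bitrate_kbit` and `size_for` are made up for this illustration, and the CD parameters (44.1 kHz, 16 bit, two channels) are the assumed source of the 1411 kbit/s figure. One Mbyte is taken as 1024 kbyte, matching the calculation above.

```bash
# pcm_bitrate_kbit is a hypothetical helper.
# Arguments: sample rate (Hz), bit depth (bits per sample), channel count.
# Raw PCM bitrate = samples per second * bits per sample * channels.
pcm_bitrate_kbit() {
    echo $(( $1 * $2 * $3 / 1000 ))
}

# size_for is a hypothetical helper.
# Arguments: bitrate (kbit/s), duration (seconds).
size_for() {
    awk -v kbit="$1" -v secs="$2" 'BEGIN {
        kbyte = kbit / 8 * secs          # 8 bits per byte
        printf "%.1f kbyte = %.2f Mbyte\n", kbyte, kbyte / 1024
    }'
}

pcm_bitrate_kbit 44100 16 2   # prints 1411
size_for 1411 60              # prints 10582.5 kbyte = 10.33 Mbyte
```

Changing the arguments shows how quickly raw audio grows, for example with a 96 kHz sample rate or six channels.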

Now, it is only this simple with raw audio data and with a few simple codecs, e.g. PCM in WAVs.

Codecs that compress hard might throw your numbers around a little, as the input data might be compressible at different rates. Variable bitrate is commonly used to save space: the encoder might output a lower bitrate if the data is "simple" and does not require high precision.

Here are some ballpark values:

  • 13 kbit/s (GSM quality)
  • 320 kbit/s (high-quality MP3)
  • 1411 kbit/s (16 bit WAV, CD quality, PCM)

Channels

Inside most audio formats, you can have more than one audio channel. This means multiple, separate audio signals can be in the same file.

Often, multiple channels have their own name:

  • If you have a single microphone, you will most probably record it into a single channel called Mono.
  • General music from the FM radio or streaming services usually has two channels in a so-called "Stereo" configuration.

With stereo, there are several ways the audio "image" can be made richer by leveraging audio panning, time and phase shifting, and much more. There is a special recording technique too, called binaural recording, which is super awesome. Listen to one with headphones, and don't be scared :)

For example, here are Big Buck Bunny's audio waveforms in Audacity:

You can see that there are two lines of waveforms and also that they are pretty similar. That is normal, as you usually hear the same thing with your two ears, but what matters is the subtle differences between the two. That's where directionality, richness, and all kinds of other effects lie.

But why stop at two? The list continues:

  • 2.1, as it is often called, means three channels: 2 for stereo and one for the LFE ("low-frequency effects", a.k.a. "bass").
  • 5.1 is similar, with five directional channels (2 front, 1 center, 2 rear) and the LFE.

So channels are just separate "recordings" or "streams" of audio signals.

Image properties

For images, there are quite a few parameters, but we'll check out only these:

  • Resolution
  • Bit-depth
  • Transparency

Resolution

An image consists of pixels, single points that each have a single color. The resolution of an image determines how many columns and rows of pixels are in an image. In other words: an image has a width and a height.

This image shows the first 10 pixels in the first row.

Here are some ballpark values for resolution:

  • "HD" or "Full HD" or "1K" or "1080p" means 1920×1080 pixels.
  • "4K" could mean a few values, but it should be about 3840×2160 pixels.
  • A regular 16 MP photo you take of your cat is about 4608×3456 pixels.
  • General social media image posts are about 1080×1080 pixels.

Bit-depth

Bit-depth represents the number of bits used for storing a single pixel's color value. This is the same balancing game: you need to decide between quality and file size.

General ballpark values for bit-depth:

Bits   Colors   Notes
1      2        Black & white
8      256      B/W or limited color palette
24     16.7m    3x8 bit for R-G-B, "True color"
30     1073m    3x10 bit for R-G-B, "Deep color"

These last two are sometimes referred to as "8 bit" or "10 bit" respectively, especially when talking about videos. That means 8 or 10 bits per single color channel.
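The "Colors" column follows directly from the bit count: n bits can represent 2^n distinct values. A small sketch, using a made-up helper name `colors_for_bits`:

```bash
# colors_for_bits is a hypothetical helper: number of distinct values
# that fit into the given number of bits (2 to the power of bits).
colors_for_bits() {
    echo $(( 1 << $1 ))
}

for bits in 1 8 24 30; do
    echo "$bits bits: $(colors_for_bits "$bits") colors"
done
```

For 24 bits this gives 16777216 (the "16.7m" in the table), and for 30 bits it gives 1073741824 (the "1073m").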

Transparency

Some image formats support an additional channel alongside the red, green, and blue components: the alpha channel. The alpha channel determines how transparent a single pixel is. It can have different bit-depths; it is usually either 1, 8 or 16 bits.

If the alpha channel is 1 bit, then the format can encode a pixel to be either transparent or non-transparent. If it is 8 or more bits, then the format can encode 256 or more steps of transparency.

Video properties

Video data is built from single images shown right after each other. This brings in most attributes of images and a few more!

So a video has a resolution, that is, its width and height.

Then the first obvious parameter of a video is the framerate, which defines how many images are shown in a second. Common values for this are 24, 25, 30, or 60.

A video file also has a codec assigned to it, which is the format describing how all those images were compressed into this video file. There are many more attributes of videos, but this is a good start.

Video codecs

Compression is super important when it comes to video because you have thousands of images to keep together. If you aren't doing it in a smart way, then the resulting video will be very, very large.

Just imagine a 2-minute video at 30 fps. That means it will have 60 s * 2 * 30 fps = 3600 frames! I have just taken a screenshot of an HD video, which was 730 kbyte in JPEG format. Now 3600 frames * 730 kbyte equals 2.5 gigabytes!

Can you imagine that? I hope not, and that's because compression brings that way, way down, to the level of tens of megabytes. These days a 2.5 gigabyte video is quite high quality and about 2 hours long. Also, don't forget that JPEG is already compressed: a single frame would be 6 Mbyte when uncompressed. That 2-minute video would be 21 gigabytes if we stored it uncompressed.
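These numbers can be re-derived in a few lines of shell. This is only a sketch of the arithmetic above; the 3 bytes per pixel assumes 8 bits each for red, green and blue, and the units are binary (1 Mbyte = 1024 kbyte).

```bash
frames=$(( 2 * 60 * 30 ))                # 2 minutes at 30 fps
jpeg_total_kbyte=$(( frames * 730 ))     # 730 kbyte per JPEG frame
raw_frame_bytes=$(( 1920 * 1080 * 3 ))   # 3 bytes per pixel (8 bit each for R, G, B)
raw_total_bytes=$(( frames * raw_frame_bytes ))

echo "frames: $frames"
# awk does the floating-point division that shell arithmetic lacks.
awk -v kb="$jpeg_total_kbyte" -v fb="$raw_frame_bytes" -v tb="$raw_total_bytes" 'BEGIN {
    printf "as JPEGs: %.1f Gbyte\n", kb / 1024 / 1024
    printf "one raw frame: %.1f Mbyte\n", fb / 1024 / 1024
    printf "all raw frames: %.1f Gbyte\n", tb / 1024 / 1024 / 1024
}'
```

The raw total comes out slightly below 21 gigabytes, because the text rounds a single frame up to 6 Mbyte before multiplying.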

Standard codecs such as H.264 and H.265 do very clever and complex operations to achieve high compression ratios with good quality.

Just think about it: most frames in a video are quite similar, containing only small differences. So if we could store only that small difference between frames, we would win big! And that's just one of the many tricks codecs do.

Codec designers also exploit the weaknesses and features of the human eye, such as the fact that we are more sensitive to light intensity changes than to color changes (say hello to YUV). They can also get away with lower quality details for parts that are moving fast, and so on.

Because why waste precious bits on things that you can't even notice?!

There are many codecs out there, with different goals in mind, although the majority focus on keeping the file size low.

  • H.264, H.265: These are the most popular ones, with the widest support in browsers, phones, players, etc. They focus on small file sizes with good quality (at the cost of being resource intensive).
  • Apple ProRes, DNxHD: These are common formats for production. They focus on quality and ease of processing and not on file size.

Audio codecs

The goal of audio codecs is the same as what we saw with the video codecs. It is just harder to demonstrate, as audio does not consist of single image frames but of audio frames/packets. An analog audio signal is of an almost infinite, or at least very high, quality if you think about it.

At the lowest level, the speed and amplitude resolution is very high. We could say "atomic", as we would need to measure and store the speed and direction of atoms. So if you want to store that exactly, it requires a super high-quality measurement, which also results in a very high bitrate data stream.

Thankfully, sound at least does not propagate at light speed, so we can save quite a lot just by that fact. (There is no need for an extreme sampling rate.) Then our hearing is very limited if we take the previous paragraph as a scale, so we win there again. We don't need most of the high precision that is there.

But still, if we take our hearing capability and want to store raw audio data at about 44.1 kHz sample rate with about 1 Mbit/s bitrate, we'd still get quite a lot of data. Check the calculations in the audio bitrate section above.

So raw audio can be compressed further, which is what many popular codecs do. They also exploit the human senses, but this time the human ear. We started with the basic fact that the human ear has a limit on the frequencies it can detect. Therefore, we can save a lot by cutting out the frequencies outside our hearing range. Unless you are a bat, you are fine between 20 Hz and 20 kHz! :)

But there are other tricks, for example, auditory masking. This means that the presence of one frequency can affect your ability to detect a different frequency. From the codec's viewpoint, it can skip encoding a few frequencies if it is smart enough to know which ones you will not notice. I'm sure there are a lot more tricks, so let me know if you know about a few more interesting ones!

Here is a list of common codecs:

  • MP3, AAC, OGG (Vorbis): These are common lossy audio formats.
  • PCM (e.g. in a WAV container), FLAC: These are lossless formats.
  • MIDI: It is a funny format. It is like a music sheet that might sound different on different players or settings. It is usually not made from real audio data, but from recording a digital keyboard or as an output from audio composing software.

Containers

Now that we have gone through the basic building blocks (the image, the video, the video codecs, and the audio codecs), we have reached the top of this iceberg: the containers.

A container is a format specification that combines all these streams into a single file format. It defines how to put all this data together, how to attach metadata (e.g. author, description, etc.), how to synchronize these streams, and sometimes a container even contains indexes to aid seeking.

So, for example, a MOV container can contain an H.264 video stream and an AAC audio stream together.

Common containers:

  • MOV
  • MP4
  • MKV
  • WebM
  • WAV (audio only)

Example Material

I will use these example materials as inputs in the following parts of this article. If you'd like to follow along, save these files for yourself!

Name                     Resource
Big Buck Bunny           http://distribution.bbb3d.renderfarming.net/video/mp4/bbb_sunflower_1080p_60fps_normal.mp4
Train                    train.jpg
Smiley                   smiley.png
Voice recording          voice_recording.wav
Big Buck Bunny's audio   ffmpeg -i bbb_sunflower_1080p_60fps_normal.mp4 -map 0:1 bbb_audio.wav

And we will make our own audio file by extracting the audio from the Big Buck Bunny movie! We'll use this file as an example, so after downloading the video file, please execute this:

ffmpeg -i bbb_sunflower_1080p_60fps_normal.mp4 -map 0:1 bbb_audio.wav

By the middle of this article, you will understand this command, but for now, just make sure to have the WAV file next to your video file so you can test out the commands later in the article.

We'll use these files in the following parts of this article. Therefore make sure to get them!

FFplay and FFprobe

FFmpeg is the name of the main binary and the project itself, but it is shipped together with two other binaries, ffplay and ffprobe.

Let's check them out quickly, right in the command line!

FFplay

FFplay is a basic video player that can be used for playing media. It's not a friendly video player, but it is a good testing ground for various things.

To execute it, simply supply a media file:

ffplay bbb_sunflower_1080p_60fps_normal.mp4

If you want to test this exact command, you'll need to get the example files.

For example, it can be used to preview filters (we'll discuss those later), but let's see an example:

ffplay -vf "drawtext=text='HELLO THERE':y=h-text_h-10:x=(w/2-text_w/2):fontsize=200:f

FFprobe

FFprobe, as its name implies, is a tool for getting information about media files.

This command:

ffprobe bbb_sunflower_1080p_60fps_normal.mp4

will return some general information about the video file:

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'bbb_sunflower_1080p_60fps_normal.mp4':
  Metadata:
[...]
    title           : Big Buck Bunny, Sunflower version
    artist          : Blender Foundation 2008, Janus Bager Kristensen 2013
[...]
  Stream #0:0[0x1](und): Video: h264 [...]
[...]
  Stream #0:1[0x2](und): Audio: mp3 [...]
[...]
  Stream #0:2[0x3](und): Audio: ac3 [...]

I have abbreviated it heavily, as we'll check this out later.

But FFprobe is way more powerful than just this!

With the following command, we can get the same listing in JSON format, which is machine-readable!

ffprobe -v error -hide_banner -print_format json -show_streams bbb_sunflower_1080p_60fps_normal.mp4

The explanation of this command is the following:

  • "-v error -hide_banner": This part hides extra output, such as headers and the default build information.
  • "-print_format json": Obviously, this causes ffprobe to output JSON.
  • "-show_streams" is the main switch that requests the stream information.

{
  "streams": [
    {
      "index": 0,
      "codec_name": "h264",
      "codec_long_name": "H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10",
      "width": 1920,
      "height": 1080,
      "bit_rate": "4001453",
      "duration": "634.533333",
      "############################": "[~50 lines removed]"
    },
    {
      "index": 1,
      "codec_name": "mp3",
      "channels": 2,
      "bit_rate": "160000",
      "############################": "[~40 lines removed]"
    },
    {
      "index": 2,
      "codec_name": "ac3",
      "channels": 6,
      "############################": "[~20 lines removed]"
    }
  ]
}

In this output, you can see three streams of data in this video file. The first (index: 0) is a video stream, which is an HD video with an H.264 codec. Then we have two audio streams: the first (index: 1) is a simple mp3 stream with stereo audio, and the second (index: 2) is an ac3 stream with 6 channels, most likely in a 5.1 configuration.

I have removed quite a lot of output for brevity, but you can get way more information out of these streams, e.g. fps for the video stream and so on.

Other than -show_streams, there are 3 more: -show_format, -show_packets and -show_frames. Unless you are really deep in the rabbit hole, you will not need the last two, but -show_format could be useful:

ffprobe -v error -hide_banner -print_format json -show_format bbb_sunflower_1080p_60fps_normal.mp4

{
  "format": {
    "filename": "bbb_sunflower_1080p_60fps_normal.mp4",
    "nb_streams": 3,
    "nb_programs": 0,
    "format_name": "mov,mp4,m4a,3gp,3g2,mj2",
    "format_long_name": "QuickTime / MOV",
    "start_time": "0.000000",
    "duration": "634.533333",
    "size": "355856562",
    "bit_rate": "4486529",
    "probe_score": 100,
    "tags": {
      "major_brand": "isom",
      "minor_version": "1",
      "compatible_brands": "isomavc1",
      "creation_time": "2013-12-16T17:59:32.000000Z",
      "title": "Big Buck Bunny, Sunflower version",
      "artist": "Blender Foundation 2008, Janus Bager Kristensen 2013",
      "comment": "Creative Commons Attribution 3.0 - http://bbb3d.renderfarming.net",
      "genre": "Animation",
      "composer": "Sacha Goedegebure"
    }
  }
}

This is an overview of "what is this file". As we see, it is a MOV file (format_name), with three streams (nb_streams), and it is 634 seconds long. Also, there are some tags where we can see the title, the artist, and other information.
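When a script needs only one or two of these values, ffprobe can be asked for specific fields instead of the full JSON. The sketch below wraps two such queries in helper functions; the names `media_duration` and `audio_channels` are made up for this example, while `-show_entries`, `-select_streams` and `-of` are regular ffprobe options.

```bash
# media_duration is a hypothetical helper: prints the container duration in seconds.
media_duration() {
    ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}

# audio_channels is a hypothetical helper: prints the channel count
# of the first audio stream (a:0) of the given file.
audio_channels() {
    ffprobe -v error -select_streams a:0 -show_entries stream=channels \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Example calls (they need ffprobe and the example video to be present):
# media_duration bbb_sunflower_1080p_60fps_normal.mp4
# audio_channels bbb_sunflower_1080p_60fps_normal.mp4
```

Going by the JSON shown above, the two calls should report the same duration (634.533333) and the stereo channel count (2) of the mp3 stream.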

FFmpeg concepts

Here is a quick intro to how FFmpeg actually works!

For those who are just joining in: please get the example assets if you want to test out the commands shown in this chapter!

FFmpeg opens the file, decodes it into memory, then encodes the in-memory frames back into packets and puts them into some container: some output file. The term "codec" is a mix of the words "coder & decoder". These are the magic parts before and after the "decoded frames".

The decoded frames are uncompressed images in memory, e.g. the most basic pixel format for video frames is called "rgb24". This just stores red, green, and blue values right after each other in 3×8 bits, or 3×1 byte, which can hold 16m colors.

The importance of this is that, other than a few exceptions, you can only manipulate or encode the decoded frames. So when we get to the different audio/video filters or transcoding, you will need the decoded frames for all that. But don't worry, FFmpeg does this automatically for you.

Inputs

As you see, and probably guessed, FFmpeg must access the input data somehow. FFmpeg knows how to handle most media files, as the awesome people who develop FFmpeg and the related libraries have made encoders and decoders available for most formats!

Don't think that this is a trivial thing. Many formats are reverse engineered, a hard task requiring brilliant people.

So although we often refer to input files, the input could come from many sources, such as the network, a hardware device and so on. We'll learn more about that later on in this article.

Many media files are containers for different streams, meaning that a single file might contain multiple streams of content.

For example, a .mov file might contain one or more streams:

  • video tracks
  • audio tracks (e.g. for different languages or audio formats such as stereo or 5.1)
  • subtitle tracks
  • thumbnails

All these are streams of data from the viewpoint of FFmpeg. Input files and their streams are numerically differentiated with a 0-based index. So, for example, 1:0 means the first (0) stream of the second (1) input file. We'll learn more about that later too!
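Zero-based indexing is easy to get wrong by one, so here is a tiny sketch that translates such a specifier into plain words. The `describe_stream_spec` helper is purely illustrative and not part of FFmpeg; it only handles the simple "file:stream" form.

```bash
# describe_stream_spec is a hypothetical helper.
# It splits "file_index:stream_index" and converts both to 1-based ordinals.
describe_stream_spec() {
    spec=$1
    file_index=${spec%%:*}     # text before the first colon
    stream_index=${spec#*:}    # text after the first colon
    echo "input file #$(( file_index + 1 )), stream #$(( stream_index + 1 ))"
}

describe_stream_spec 1:0   # prints: input file #2, stream #1
describe_stream_spec 0:1   # prints: input file #1, stream #2
```

The second call matches the `-map 0:1` used earlier to extract Big Buck Bunny's audio: it selects the second stream of the first (and only) input, which is the mp3 audio stream in the ffprobe listing.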

It is important to note that FFmpeg can open any number of input files simultaneously, and the filtering and mapping will decide what it does with them. Again, more on that later!

Streams

As we have seen in the previous section, streams are the fundamental building blocks of containers. So every input file must have at least one stream. And that's what you can list with the simple ffmpeg -i command, for example.

A stream might contain an audio format such as MP3, or a video format such as H.264.

Also, a stream, depending on the codec, might contain multiple "things". For example, an mp3 or a WAV stream might include several audio channels.

So the building block hierarchy in this case is: File → Stream → Channels.

Outputs

Of course, an output could be a local file, but it doesn't need to be. It could be a socket, a stream and so on. In the same way as with inputs, you could have multiple outputs, and the mapping determines what goes into which output file.

The output also must have some format or container. Most of the time FFmpeg can and will guess that for us, mostly from the extension, but we can specify it too.

Mapping

Mapping refers to the act of connecting input file streams with output file streams. So if you give 3 input files and 4 output files to FFmpeg, you must also define what should go where.

If you give a single input and a single output, then FFmpeg will guess it for you without any mapping being specified, but make sure you know how exactly that happens, to avoid surprises. More on all that later!

Filtering

Filtering stands for the feature of FFmpeg to modify the decoded frames (audio or video). Other applications might call them effects, but I'm sure there is a reason why FFmpeg calls them filters.

There are two kinds of filtering supported by FFmpeg, simple and complex. In this article we'll only discuss the complex filters, as they are a superset of the simple filters, and this way, we avoid confusion and redundant content.

Simple filters are a single chain of filters between a single input and output. Complex filters can have more chains of filters, with any number of inputs and outputs.

The following figure extends the previous overview image with the filtering module:

A complex filter graph is built from filter chains, which are built from filters.

So a single filter does a single thing, for example, changes the volume. This filter is quite trivial: it has a single input, changes the volume, and it has a single output.

For video, we could check out the scale filter, which is also quite straightforward: it has a single input, scales the incoming frames, and it has a single output too.

You can chain these filters, meaning that you connect the output of one to the input of the next one! So you can have a volume filter after an echo filter, for example, and this way, you'll add echo, and then you change the volume.

This way, your chain will have a single input, do several things with it, and output something at the end.
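As a rough sketch of such a chain, the command below feeds the first input's audio through FFmpeg's aecho filter and then its volume filter. The aecho parameter values, the input name and the output name are assumptions picked for illustration; tune them to taste.

```shell
# One chain, two filters: add an echo first, then halve the volume.
# The aecho values and the file names are illustrative assumptions.
chain='[0:a]aecho=0.8:0.9:1000:0.3,volume=0.5[out]'
input='bbb_audio.wav'

# Only run when ffmpeg and the input file are actually available.
if command -v ffmpeg >/dev/null 2>&1 && [ -f "$input" ]; then
  ffmpeg -y -i "$input" -filter_complex "$chain" -map '[out]' echo_then_quieter.wav
fi
```

Filters inside one chain are separated by commas and run left to right, so swapping the two names would change the volume before adding the echo.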

Now, the “complex” comes in when you have multiple chains of these filters!

But before we go there, you should also know that some single filters might have multiple inputs or outputs!

For example:

  • The overlay filter puts 2 video streams above each other and will output a single video stream.
  • The split filter splits a single video stream into 2+ video streams (by copying).

So let's discuss a complex example from a bird's eye view! I have two video files, I want to put them above each other, and I want the output in two files/sizes, 720p and 1080p.

Now, that's where complex filtering will be faithful to its name: to achieve this, you'll need several filter chains!

  • Chain 1: [input1.mp4] [input2.mp4] → overlay → split → [overlaid1] [overlaid2]
  • Chain 2: [overlaid1] → scale → [720p_output]
  • Chain 3: [overlaid2] → scale → [1080p_output]

As you see, you can connect chains, and you can connect chains to output files. There is a rule that you can only consume a chain once, and that's why we used split instead of the same input for chains 2 and 3.

The takeaway is this: with complex filter graphs (and mapping), you can:

  • build individual chains of filters
  • connect input files to filter chains
  • connect filter chains to filter chains
  • connect filter chains to output files
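The three chains from the example above could be written roughly like this on the command line. Treat it as a sketch under assumptions: the label names, the output file names and the scale arguments are invented for this illustration, and only the video is mapped.

```shell
# Three chains separated by ";":
#   1. overlay both inputs, then split the result into two copies
#   2. scale the first copy to 720 pixels high
#   3. scale the second copy to 1080 pixels high
# Label and file names are illustrative assumptions.
graph='[0:v][1:v]overlay,split=2[overlaid1][overlaid2];[overlaid1]scale=-2:720[out720p];[overlaid2]scale=-2:1080[out1080p]'

# Only run when ffmpeg and both input files are actually available.
if command -v ffmpeg >/dev/null 2>&1 && [ -f input1.mp4 ] && [ -f input2.mp4 ]; then
  ffmpeg -y -i input1.mp4 -i input2.mp4 \
    -filter_complex "$graph" \
    -map '[out720p]' output_720p.mp4 \
    -map '[out1080p]' output_1080p.mp4
fi
```

Each labelled chain output is consumed exactly once: [overlaid1] and [overlaid2] by the two scale chains, and the two scaled results by one -map each.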

FFmpeg's command line system

For those who are just joining in: please get the example assets if you want to test out the commands shown in this chapter!

FFmpeg CLI

Finally, we arrived at FFmpeg, and trust me, we'll execute it quite a lot of times! Let's see how FFmpeg's command line options are organized, as that is the first tricky part we need to understand!

FFmpeg mostly thinks about input and output files and their options, together with global options. You specify input files with the “-i” flag followed by a file name. For the output file, specify it as-is without any preceding CLI (command line interface) flag.

Specifying an input file

Let’s specify just an input file:

ffmpeg -i bbb_sunflower_1080p_60fps_normal.mp4 

The following image helps to understand the output:

  1. First, you get the “banner”, where you see the build information and lib versions. If you look closely, you'll see the compilation flags, starting with e.g. --enable-shared.
  2. Then you get the same output as we have seen with ffprobe earlier.
  3. And then you get a complaint that there is no output file specified. That's fine for now.

You can remove the banner here with “-hide_banner”, but for brevity's sake I'll not include that anymore in the commands here, and I will leave it out from the outputs too.

Now, let's get brave, and specify an output file!

Specifying an output

As I've said earlier, the output file is recognized by FFmpeg because it is just a filename. More specifically, it comes after the input specification(s), and it is not a value of any other switch.

Don't be confused for now, but yes, FFmpeg can have as many inputs and outputs as you'd like. We'll cover that in more detail soon!

This command line specifies a single output file:

ffmpeg -i bbb_sunflower_1080p_60fps_normal.mp4 audio_only.wav

Before taking a look at the output, let me congratulate you! You have just converted a video file into an audio file, by keeping just the audio content!

This is how you transcode! Of course, you'll want to specify more parameters later on.

So, here is the output:

Let's analyze it!

(1) First, we have our input metadata printing, which we have seen many times already.

(2) Then we have something called “stream mapping”. We forced FFmpeg into a decision situation, as we specified an input file with 1 video and 2 audio streams. We said we wanted an audio output (guessed from the .wav extension). But we didn't specify which audio stream we wanted, so let's see what FFmpeg decided:

  • “Stream #0:2” means “the first input file's third stream” or “input file index 0's stream with index 2.” This is the input.
  • “-> #0:0” means the first output file's first stream. This is the output.
  • Here you can learn more about how FFmpeg decides this.
  • Later on, we will manually override the mapping.
  • Summary: FFmpeg decided to convert the third stream in the input file (the ac3 5.1 audio) into the first stream of the output file.

(3) Then we have our output metadata information. This reveals what FFmpeg will output. It usually copies most of the metadata, and here you can see the container/format information too.

(4) And then we see the output summary. For example, the transcoding was 181x faster than the playback speed. Nice!

Understanding the command line order

Before going further, let's understand FFmpeg's command line arguments from a bird's eye view!

In the manual, you'll see this:

ffmpeg [global_options] {[input_file_options] -i input_url} ... {[output_file_options] output_url} ...

(Parts in […] are meant to be optional, and parts in {…} are meant to be specified 1 or more times.)

This is the general outline of how to specify inputs, outputs, input options, output options, and global options. The order matters, but it is easy to remember: global options, inputs and outputs. Also, i/o options come BEFORE the i/o specification.

Let's put these into pseudo command line options, to understand it better:

# One input, one output, nothing fancy
ffmpeg -i input1.mp4 output1.wav

# Two inputs, one output
ffmpeg -i input1.mp4 -i input2.mp4 output1.wav

# Two inputs, two outputs
ffmpeg -i input1.mp4 -i input2.mp4 output1.wav output2.mp3

# One input, one output, with options
ffmpeg [input1 options] -i input1.mp4 [output1 options] output1.wav

# Two inputs, two outputs with options
ffmpeg [input1 options] -i input1.mp4 \
       [input2 options] -i input2.mp4 \
       [output1 options] output1.wav \
       [output2 options] output2.mp3
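To see the ordering rule with real options in the placeholders, this small sketch assembles a command line from its parts and prints it rather than running it. The option values and the output name are arbitrary picks for illustration.

```shell
# Assemble a command line in the order FFmpeg expects:
# global options, then input options + input, then output options + output.
# The option values and the output name are illustrative assumptions.
global_opts='-y -hide_banner'
input_opts='-t 5'                       # applies to the NEXT -i
input='bbb_sunflower_1080p_60fps_normal.mp4'
output_opts='-map 0:1 -b:a 128k'        # applies to the NEXT output file
output='first_5_seconds.mp3'

cmd="ffmpeg $global_opts $input_opts -i $input $output_opts $output"
echo "$cmd"
```

Moving -t 5 behind the -i would turn it into an output option, which is exactly why the position of each option matters.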

As for the global options, these are the ones you might care about:

  • -hide_banner: To skip printing the banner.
  • -y: To overwrite the output even if it exists.

For example, you can run this as many times as you want:

ffmpeg -y -hide_banner -i bbb_sunflower_1080p_60fps_normal.mp4 audio_only.wav

And it will overwrite the output and be less verbose than earlier.

Without explaining the options themselves, let's just see some real-world examples with options:

And here it is with two inputs and two outputs:

Mapping files

We saw above that this command:

ffmpeg -i bbb_sunflower_1080p_60fps_normal.mp4 audio_only.wav

… will result in an audio file that contains one of the audio streams from the input video, chosen by FFmpeg. This automatic stream selection is usually handy when it is trivial. For example, when you have one stream as input and one output file, you don't need to specify any mapping manually.

But in cases where it is not so trivial, you are usually better off manually specifying what you really want to do.

The following image summarises what our current situation is:

The video stream was not matched, as the output format was an audio file (.wav). But then FFmpeg chose Stream #2, because it has more channels.

So what if we want to get the stereo track instead? That is where mapping comes in! The mapping is a parameter of the OUTPUT file. Therefore the mapping arguments should come right before our output file definition!

ffmpeg -i bbb_sunflower_1080p_60fps_normal.mp4 -map 0:1 stereo_audio_only.wav

The argument -map 0:1 means that in the output (since we specify it as an output option) we'd like to have Input #0's (the first input file) Stream #1!

Let's see the relevant parts from the output!

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'bbb_sunflower_1080p_60fps_normal.mp4':
[...]
Stream mapping:
  Stream #0:1 -> #0:0 (mp3 (mp3float) -> pcm_s16le (native))
[...]
Output #0, wav, to 'stereo_audio_only.wav':
  Metadata:
[...]
    Stream #0:0(und): [...] stereo [...]

The “Stream #0:1 -> #0:0” part means that we have successfully overridden the mapping, to get the mp3 stream (0:1) into our output! Also, the output metadata reveals that we'll get a stereo result instead of the 5.1 earlier.

Multiple outputs

You can have multiple outputs from a single input. Let's see when that might be useful!

Let's say we want to extract BOTH audio streams into two separate WAV files! It is super easy:

ffmpeg -y -i bbb_sunflower_1080p_60fps_normal.mp4 -map 0:1 stereo_audio_only.wav -map 0:2 ac3_audio_only.wav

See? I have just specified two output files with two mapping specifications! Also, I have sneaked in the “-y” to have it overwrite our previous file!

Let's check out the relevant parts of the output!

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'bbb_sunflower_1080p_60fps_normal.mp4':
[...]
Stream mapping:
  Stream #0:1 -> #0:0 (mp3 (mp3float) -> pcm_s16le (native))
  Stream #0:2 -> #1:0 (ac3 (native) -> pcm_s16le (native))
[...]
Output #0, wav, to 'stereo_audio_only.wav':
    Stream #0:0(und): [...] stereo
    [...]
Output #1, wav, to 'ac3_audio_only.wav':
    Stream #1:0(und): Audio: [...] 5.1(side)

Now the mapping reveals two lines, as we have two outputs! And indeed, you'll get two .wav files as the output: one is stereo, and one is 5.1!

There might be several other reasons why you'd want to get multiple outputs. Let's quickly check out a few!

Different formats:

ffmpeg -y -i bbb_sunflower_1080p_60fps_normal.mp4 stereo_audio_only.wav stereo_audio_only.mp3

Wow, did you catch that? We just created a WAV and an mp3 in a single command line! I've reverted to the automatic stream selection for brevity's sake.

A bit closer to real-life needs, you might want different output qualities:

ffmpeg -y -i bbb_sunflower_1080p_60fps_normal.mp4 \
  -map 0:1 -b:a 320k stereo_audio_only_high_quality.mp3 \
  -map 0:1 -b:a 64k  stereo_audio_only_low_quality.mp3

Here -b:a 320k means “bitrate of audio should be around 320 kbit/sec”. So I have asked FFmpeg to make two mp3s for me, from the stereo stream of the input.

Checking on the files, this is what we got:

 25Mb stereo_audio_only_high_quality.mp3
4,9Mb stereo_audio_only_low_quality.mp3
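These sizes can be sanity-checked with a little arithmetic: bitrate times duration, divided by 8 to get bytes. The sketch below assumes a clip length of about 634.53 seconds (00:10:34.53) and ignores container overhead; the helper name is made up for this example.

```shell
# Estimated size in MB (10^6 bytes) = bitrate in kbit/s * seconds / 8 / 1000
# estimate_mb is a hypothetical helper, not part of FFmpeg.
estimate_mb() {
  awk -v kbps="$1" -v seconds="$2" 'BEGIN { printf "%.1f\n", kbps * seconds / 8 / 1000 }'
}

duration=634.53                 # assumed clip length in seconds
estimate_mb 320 "$duration"     # high quality mp3
estimate_mb 64 "$duration"      # low quality mp3
```

The estimates land at roughly 25 MB and 5 MB, in the same ballpark as the listing above. Small differences are expected, since file listings often count in units of 1024 and the encoder does not hit the target bitrate exactly.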

Another common reason for having multiple outputs or using mapping is when we introduce filters into our pipeline, but that will be discussed later!

Now you understand the foundations of how to communicate your basic requirements to FFmpeg via its command line! Great job! Now we can dive even deeper.

Hands-on with FFmpeg

In this section, we will discover and even try out some common features of FFmpeg!

For those who are just joining in: please get the example assets if you want to test out the commands shown in this chapter!

Inputs

Let's see the common ways FFmpeg is fed with different data!

File

Of course, you have already seen that if you have a local file on your filesystem, FFmpeg is happy to read it!

ffmpeg -i bbb_sunflower_1080p_60fps_normal.mp4 -map 0:1 stereo_audio_only.wav

This command, which is exactly the same as one of our previous ones, just reads a local file. Really, that's it.

Network

Did you know that FFmpeg can open a file directly on the network?!

ffmpeg -t 5 -i http://distribution.bbb3d.renderfarming.net/video/mp4/bbb_sunflower_1080p_60fps_normal.mp4 bbb_first_5_seconds.mp4

The command above opens the file directly from the network and saves the first 5 seconds into a local file!

I wanted to spare bandwidth for the awesome guys over at renderfarming.net, so I added the duration flag: -t 5. FFmpeg doesn't even download the full video for this operation. Isn't that wonderful?!

Webcam

FFmpeg can also open your webcam!

Here is an example command for Linux:

ffmpeg -f v4l2 -framerate 25 -video_size 640x480 -t 10 -i /dev/video0 10seconds_of_webcam.webm

This would record 10 seconds of your webcam!

Accessing the webcam happens differently on different platforms. Specifying parameters is also different for each platform, so if you'd like to access your webcam with FFmpeg, please refer to the documentation.

Microphone

Let's record some audio directly from your microphone!

List microphones:

arecord -l

Start 10 seconds of recording:

ffmpeg -f alsa -i hw:0,0 -t 10 out.wav

This command was meant to work on Linux, but you can check out how to do that on Microsoft Windows or macOS.

Pipe

Finally, FFmpeg can read from a pipe, and also output to a pipe.

On Linux, you could do something like this:

cat bbb_sunflower_1080p_60fps_normal.mp4 | ffmpeg -i - -f wav pipe:1 | pv > output.wav

# Alternative, without pv:
cat bbb_sunflower_1080p_60fps_normal.mp4 | ffmpeg -i - -f wav pipe:1 > output.wav

This command uses the cat program to simply read in the video file and output it to its standard output. Then this output is piped INTO FFmpeg, through its standard input. The combination “-i -” means “read from standard input”. By the way, standard input would otherwise be your keyboard, if we didn't use any redirection here.

Then we specify the required output format for FFmpeg, with “-f wav”. This is needed because now we have no output file name, and FFmpeg will not be able to guess the format. Then we specify “pipe:1” as an output, meaning we'd like FFmpeg to write to its standard output.

From there, we pipe the data into a program called “pv”. It is just a metering tool that dumps information on the throughput (from its stdin to its stdout). Finally, we redirect pv's output into a WAV file.

You might ask why we'd want to do that, and why we talk about this. Piping can be useful if you build a complex pipeline from different programs or if you want to spare reading and writing to a local file.

For example, the node package fluent-ffmpeg can leverage this functionality by supplying input and output streams. This way, you can read from an S3 bucket and write to one directly.

But be warned, hell is awaiting you on that road. No kidding. You need to understand the limitations of this technique. For example, many formats cannot be streamed in this manner, as they need random access to the output data to write the indices at the beginning of the file after processing.
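For MP4 specifically, one commonly used workaround is to ask the muxer for a fragmented file, which can be written front to back without seeking. The sketch below uses the mov/mp4 muxer's -movflags option with frag_keyframe+empty_moov; the output file name is an assumption, and whether a fragmented MP4 suits your player or downstream tool is something to verify for your own case.

```shell
# Write MP4 to a pipe by requesting a fragmented file, which does not
# require seeking back to the start of the output.
# The output file name is an illustrative assumption.
input='bbb_sunflower_1080p_60fps_normal.mp4'
movflags='frag_keyframe+empty_moov'

# Only run when ffmpeg and the input file are actually available.
if command -v ffmpeg >/dev/null 2>&1 && [ -f "$input" ]; then
  ffmpeg -i "$input" -t 5 -movflags "$movflags" -f mp4 pipe:1 > first_5_seconds_fragmented.mp4
fi
```

As with the WAV example, -f mp4 is needed because a pipe has no file extension to guess the format from.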

Outputs

FFmpeg can output into many protocols, from local file storage and ftp to message queue protocols, all the way to streaming protocols.

For more information, check out the documentation here.

Transcoding audio with FFmpeg

In this chapter, we'll see how to transcode into audio with FFmpeg!

The general formula is:

ffmpeg -i {input audio or video file with audio} [output options] output_audio.ext

Choosing a format

FFmpeg is quite smart, and by the extension, it can decide which codec to use. If you specify “audio.wav” or “audio.mp3”, for example, FFmpeg will use the appropriate codec to do the encoding.

It guesses correctly most of the time. But if you want to specify the format manually, then the “-f” flag is your friend.

For this, you might want to consult the list of formats:

ffmpeg -formats

So, these three commands will do exactly the same, but the last two require the -f flag.

# Output codec is determined from the extension
ffmpeg -i bbb_audio.wav bbb_audio.mp3

# No extension in the filename
ffmpeg -i bbb_audio.wav -f mp3 bbb_audio

# Piped output therefore no filename, so no extension to use for guessing
ffmpeg -i bbb_audio.wav -f mp3 pipe:1 > bbb_audio

Setting the bitrate

In most cases, you want to specify the target bitrate you expect your codec to output. If you are unsure what bitrate is, please read this article's audio bitrate section.

To specify the audio bitrate, use the “-b:a” option with a corresponding value, e.g.:

  • -b:a 320k: For the mp3 codec this is considered high quality.
  • -b:a 128k: Lower quality.
  • -b:a 64k: Low quality.

For example:

ffmpeg -i bbb_audio.wav -b:a 320k bbb_audio_320k.mp3

Setting the sample rate

You may want to specify the sample rate to ensure quality or a low output file size. Half the sample rate could mean half the output file size. If you are unsure what the sample rate is, please read the “audio sample rate” section of this article.

To specify the audio sample rate, use the “-ar” option with a corresponding value, e.g.:

  • -ar 48000: For high quality.
  • -ar 44100: For CD quality (still high).
  • -ar 22050: A bit of a compromise, not recommended for music, but for speech, it might be enough.
  • -ar 8000: Low quality, e.g. if you only want “understandable” speech.

For example:

ffmpeg -i bbb_audio.wav -ar 44100 bbb_audio_44100khz.mp3

Setting the channel count

Setting the channel count can be useful, for example, if you have a stereo recording of a single person's speech. In that case, you might be content with just a mono output half the size of the original recording.

If you are unsure what an audio channel is, please read the “audio channels” section of this article.

To specify the channel count, use the “-ac” option with a corresponding value, e.g.:

  • -ac 1: For mono
  • -ac 2: For stereo
  • -ac 6: For 5.1

For example:

ffmpeg -i bbb_audio.wav -ac 1 bbb_audio_mono.mp3

Complete command line for converting audio with FFmpeg

This is how you produce a high-quality output:

# Convert wav to mp3
ffmpeg -i bbb_audio.wav -ac 2 -ar 44100 -b:a 320k bbb_audio_hqfull.mp3

# Convert wav to m4a (aac)
ffmpeg -i bbb_audio.wav -ac 2 -ar 44100 -b:a 320k bbb_audio_hqfull.m4a

# Convert wav to ogg (vorbis)
ffmpeg -i bbb_audio.wav -ac 2 -ar 44100 -b:a 320k bbb_audio_hqfull.ogg
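If you have a whole folder of files to convert, the same options can be applied in a loop. This is a minimal sketch that assumes every .wav file in the current directory should become an .mp3 next to it; the helper function name is made up for this example.

```shell
# Derive the output name by swapping the .wav extension for .mp3.
# mp3_name is a hypothetical helper, not part of FFmpeg.
mp3_name() {
  printf '%s\n' "${1%.wav}.mp3"
}

for f in *.wav; do
  [ -f "$f" ] || continue                        # glob matched nothing
  command -v ffmpeg >/dev/null 2>&1 || break     # ffmpeg not installed
  ffmpeg -y -i "$f" -ac 2 -ar 44100 -b:a 320k "$(mp3_name "$f")"
done
```

Because -y is set, existing .mp3 files with the same name are overwritten, so try it on a copy of your folder first.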

Check out this documentation about good quality audio transcoding too!

Lossless formats

If you want to convert audio into a lossless format, here are a few choices for you:

# Convert to flac (Free Lossless Audio Codec)
ffmpeg -i bbb_audio.wav -compression_level 12 bbb_audio_lossless_12.flac # Best compression, slowest
ffmpeg -i bbb_audio.wav -compression_level 5 bbb_audio_lossless_5.flac   # Default
ffmpeg -i bbb_audio.wav -compression_level 0 bbb_audio_lossless_0.flac   # Least compression, fastest

# Convert to wav
cp bbb_audio.wav bbb_audio_lossless.wav # Just kidding:)

# Convert to wav
ffmpeg -i any_audio.ext bbb_audio_lossless.wav

It's good to know that flac results in a smaller file than WAV, as WAV doesn't actually compress by default:

117M bbb_audio.wav
52M  bbb_audio_lossless_0.flac
45M  bbb_audio_lossless_5.flac
43M  bbb_audio_lossless_12.flac
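To put those sizes into perspective, the following sketch expresses each flac file as a percentage of the original WAV, using the rounded megabyte figures from the listing. The helper name is invented for this illustration.

```shell
# Size of a compressed file as a whole-number percentage of the original.
# percent_of is a hypothetical helper, not part of FFmpeg.
percent_of() {
  awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.0f\n", part / whole * 100 }'
}

wav_mb=117
for flac_mb in 52 45 43; do
  echo "$flac_mb MB flac is $(percent_of "$flac_mb" "$wav_mb")% of the $wav_mb MB wav"
done
```

So even the fastest setting more than halves the file, and going from level 5 to level 12 gains only about one more percentage point on this clip.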

WAV is generally thought of as a lossless format, but keep in mind that the WAV container can hold lossy content too. By default, FFmpeg uses the pcm_s16le format, which is 16-bit PCM and can be understood as lossless.

Learn more here and here.

Transcoding video with FFmpeg

In this chapter, we'll see how to transcode a video file into the two most common formats!

Converting to H.264

H.264 is one of the most popular video codecs. Most devices, browsers and video players understand how to play it. It is efficient in storing video content, but as with most advanced video codecs, it is a resource-intensive process to encode and decode.

A complete command line for a high-quality H.264 transcoding with high-quality AAC audio is the following:

ffmpeg -y -i bbb_sunflower_1080p_60fps_normal.mp4 -c:v libx264 -preset slow -crf 22 -profile:v main -g 250 -pix_fmt yuv420p -map 0:0 -map 0:1 -acodec aac -ar 44100 -b:a 320k bbb_transcoded_h264_HQ.mov

Make sure you understand this command and customize it to match your needs.

To help you do that, let's dissect this command!

Global options:

  • -y: Overwrite the output.

Input options:

  • -i bbb_sunflower_1080p_60fps_normal.mp4: The input file.

Output options:

-c:v libx264: Set the codec to libx264.

-preset slow: libx264 has a lot of variables that you can tune, and most of them balance the coding speed and the resulting file size. To make your life easier, there are presets with which you can easily declare what you need: small size or speed.

-crf 22: This is the constant rate factor, the main option for setting image quality. It is a number between 0-51, where 0 is lossless, and 51 is the worst quality. Generally, you want something between 17 and 28. This is the option to tune the balance between image quality and file size. Check my comparison video here.

-profile:v main -g 250 -pix_fmt yuv420p: These are advanced options, guaranteeing you a quite backward compatible result. (See this, this and this.)

-map 0:0 -map 0:1: You might not need this: these options select the correct video and audio streams. In our case, we have two audio streams, and we need the stereo one to avoid some issues with our aac stream.

-acodec aac: Select the AAC (Advanced Audio Coding) codec for the audio in the output. We need to be more specific than just -f for the format. We need to specify the audio codec here manually.

-ar 44100: Set the audio sampling rate (learn more about that in previous chapters of this article).

-b:a 320k: Set the audio bitrate (learn more about that in previous chapters of this article).

bbb_transcoded_h264_HQ.mov: The output file name. All the options since the last -i (or the last output file) are considered to be modifiers for this output.

Let's see the output:

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'bbb_sunflower_1080p_60fps_normal.mp4':
[...]
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> h264 (libx264))
  Stream #0:1 -> #0:1 (mp3 (mp3float) -> aac (native))
[...]
Output #0, mov, to 'bbb_transcoded_h264_HQ.mov':
    Stream #0:0(und): Video: h264 (libx264) (avc1 / 0x31637661), yuv420p(progressive), 1920x1080 [SAR 1:1 DAR 16:9], q=-1--1, 60 fps, 15360 tbn, 60 tbc (default)
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, 5.1(side), fltp, 320 kb/s (default)
[...]
frame=38074 fps=35 q=-1.0 Lsize= 324855kB time=00:10:34.51 bitrate=4194.1kbits/s dup=2 drop=0 speed=0.58x

From this, we understand that FFmpeg chose the mp3 stream from the input file because we told it to do so. (Remember, it has two audio streams in it, a stereo mp3 and a 5.1 ac3.) We also see that my machine could transcode at 35 fps (0.58 times the playback speed), and our settings resulted in an average bitrate of about 4200 kbit/s for the whole file (video plus audio).
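The bitrate and speed figures can be re-derived from the other numbers on the progress line. This sketch assumes that the “kB” in Lsize stands for 1024 bytes, which is the reading that reproduces the reported bitrate; the variable names are made up for this example.

```shell
# Numbers copied from the progress line of the H.264 transcode.
size_kib=324855      # Lsize, assuming "kB" means 1024 bytes here
duration_s=634.51    # time=00:10:34.51 expressed in seconds
encode_fps=35        # frames encoded per second (rounded by FFmpeg)
source_fps=60        # frame rate of the video

# overall bitrate in kbit/s = bytes * 8 / seconds / 1000
bitrate_kbps=$(awk -v s="$size_kib" -v d="$duration_s" 'BEGIN { printf "%.1f", s * 1024 * 8 / d / 1000 }')
# speed relative to playback = encoding fps / source fps
speed=$(awk -v e="$encode_fps" -v p="$source_fps" 'BEGIN { printf "%.2f", e / p }')

echo "overall bitrate: $bitrate_kbps kbit/s, speed: ${speed}x"
```

Both results match the bitrate= and speed= fields, which shows that the reported bitrate covers the entire output file, including the 320 kbit/s audio stream.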

The video bitrate is an interesting question in this mode. With the CRF option, we specify the “constant visual quality” we want. To reach a constant visual quality, the encoder works hard to guess how much it can compress certain parts of each frame, and the result of that guess defines the final average video bitrate.

If you want even better results with H.264, and you can afford some more processing time and a somewhat more complicated process, check out 2-pass encoding instead of the constant rate factor method introduced above.

To learn more about these two different rate control methods, read the awesome Understanding Rate Control Modes article. And to learn more about the intricacies of H.264 encoding, check out the H264 encoding guide.

Finally, later on, I will show you a comparison video that shows how different CRF values perform!

Converting to H.265

H.265 is the successor of H.264. According to the official FFmpeg manual, it offers 25-50% bitrate savings while retaining the same visual quality.

A complete command line for a high-quality H.265 transcoding with high-quality AAC audio is the following:

ffmpeg -y -i bbb_sunflower_1080p_60fps_normal.mp4 -c:v libx265 -preset slow -crf 27 -profile:v main -g 250 -pix_fmt yuv420p -map 0:0 -map 0:1 -acodec aac -ar 44100 -b:a 320k bbb_transcoded_h265_HQ.mov

And the result is:

...encoded 38074 frames in 3384.84s (11.25 fps), 1720.32 kb/s, Avg QP:35.29

H.265 also has multiple rate control algorithms; I used the CRF method here. If you want to use a different rate control algorithm, then you may check out the H.265 encoding guide. Also, check out the next section, where I'll reveal how different CRF values perform!

This command is almost the same as what we used in the H.264 example above, so please refer to that section to understand the arguments.

If we compare H.264 and H.265 with our commands above, taking into account this 10-minute long video on my system, these are the results:

  • H.264 is 3 times faster (35 fps vs 11 fps)
  • H.264 produces a 2 times larger file (318 MB vs 156 MB)

Comparing CRF values with H.264 and H.265

I have created a video for your convenience that shows the different CRF values in action. The selected frame had some movement on it, with the leaves in the bunny's hand. Movement is important with video codecs, as usually that's where quality losses are first visible.

This video shows how the different CRF values perform, from 0-51 with the H.264 and H.265 formats!

H.264 & H.265 CRF comparison video

(Can you guess which program I was using to make this? :))

Basic editing with FFmpeg

In this section, we'll achieve basic editing tasks by using FFmpeg only!

We'll just produce a basic mp4 with default settings in these examples to keep things simple. But to encode the result in a proper, high-quality way, please check the earlier sections where we learned how to encode into H.264 and H.265!

Trimming from the starting of the clip

It is imaginable to specify an in-point for a media file. By doing that, you in actuality slash back off the specified quantity from the starting of the input file. Therefore, FFmpeg will skip the first segment of the file and most effective transcode the the rest!

For this, you would possibly per chance per chance well presumably like the “-ss” flag! The price would possibly per chance per chance well even be specified in seconds (5 or 5.2) or as a timestamp (HOURS:MM:SS.MILLISECONDS).

To procure the outro most effective, we can also search for all the model to the stop of the video! (It is 00:10:34.fifty three or 635 seconds lengthy!)

# Get the last 4 seconds
# 635 - 4=631
ffmpeg -y -ss 631 -i bbb_sunflower_1080p_60fps_normal.mp4 last_4_seconds.mp4
# 00:10:34.53 - 4=00:10:30.53
ffmpeg -y -ss 00:10:30.53 -i bbb_sunflower_1080p_60fps_normal.mp4 last_4_seconds.mp4
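If you don’t want to subtract by hand, the seek position can be computed. This is only a minimal sketch: seek_for_tail is a made-up helper name (not an FFmpeg feature), and the duration is assumed to be known already (here the 634.53 seconds mentioned above). The FFmpeg command is printed rather than executed.

```shell
# Hypothetical helper: seek position (in seconds) that leaves the last N seconds.
# $1 = total duration in seconds, $2 = length of the tail in seconds
seek_for_tail() {
    awk -v dur="$1" -v tail="$2" 'BEGIN { printf "%.2f\n", dur - tail }'
}

start=$(seek_for_tail 634.53 4)
echo "$start"

# Printed only, so it can be inspected first; remove "echo" to run it.
echo ffmpeg -y -ss "$start" -i bbb_sunflower_1080p_60fps_normal.mp4 last_4_seconds.mp4
```

FFmpeg also documents an -sseof input option that seeks relative to the end of the file, which may let you skip the subtraction altogether.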

Seeking can be a bit tricky, so you may want to learn more about seeking here.

Trimming from the end of the clip

You can also set an out-point for an input file, therefore shortening it. There are two options for this:

  • -t: This sets the duration.
  • -to: This sets the timestamp where the input video should stop.

These two are mutually exclusive, and they do the same if no -ss is specified. The value can be specified in seconds (5 or 5.2) or as a timestamp (HOURS:MM:SS.MILLISECONDS).

Let’s experiment with them!

# "Get 30 seconds of the input."
ffmpeg -y -t 30 -i bbb_sunflower_1080p_60fps_normal.mp4 first_30_seconds.mp4
ffmpeg -y -t 00:00:30.0 -i bbb_sunflower_1080p_60fps_normal.mp4 first_30_seconds.mp4
# "Get everything until the content's 30th second."
ffmpeg -y -to 30 -i bbb_sunflower_1080p_60fps_normal.mp4 first_30_seconds.mp4
ffmpeg -y -to 00:00:30.0 -i bbb_sunflower_1080p_60fps_normal.mp4 first_30_seconds.mp4

All four commands above result in exactly the same video. (For nerds: even the md5sum is the same.)

But let’s see how they behave when we introduce seeking!

# "Seek to the 10th second and get me 30 seconds of the input."
ffmpeg -y -ss 10 -t 30 -i bbb_sunflower_1080p_60fps_normal.mp4 part_between_10_and_40.mp4
# "Seek to the 10th second and get the content until the 30th second."
ffmpeg -y -ss 10 -to 30 -i bbb_sunflower_1080p_60fps_normal.mp4 part_between_10_and_30.mp4

The first command will result in a 30-second-long video, while the second command’s result will be only 20 seconds long!
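The arithmetic behind these two commands is tiny, but it is easy to mix up. Here is a sketch of it with two made-up helpers (len_with_t and len_with_to are my names, not FFmpeg options), assuming whole seconds and input-side options as in the commands above.

```shell
# Hypothetical helpers: expected clip length in whole seconds.
# "-ss START -t DUR": the clip is simply DUR seconds long.
len_with_t() {   # $1 = -ss value, $2 = -t value
    echo "$2"
}
# "-ss START -to END": the clip stops at timestamp END, so it is END - START long.
len_with_to() {  # $1 = -ss value, $2 = -to value
    echo $(( $2 - $1 ))
}

printf '%s\n' "-ss 10 -t 30  => $(len_with_t 10 30) seconds"
printf '%s\n' "-ss 10 -to 30 => $(len_with_to 10 30) seconds"
```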

The figure below shows the difference:

Editing without reencoding

FFmpeg can do something I’m not aware of in any other popular NLE: it can edit videos without reencoding them!

The usual workflow is to decode the data frames (a/v) into memory, modify them as much as we like, and then encode them into a new video file. The problem with this is that unless you work with raw or lossless codecs, you will lose some quality in the process. Another issue with this approach is that it is computationally intensive.

For certain operations, you can configure FFmpeg to keep the data frames intact, and this way, you can avoid decoding and encoding them! This is dramatically faster than regular transcoding, usually hundreds of times faster.

The “certain operations” are those that don’t need to modify the data frames themselves. For example, you can cut and trim this way. Also, you can manipulate streams while keeping others; for example, you can replace the audio track without touching the video frames.

All this is a bit of magic, and there are caveats you need to prepare for, but it is good if you know about this, as it is often handy!

The trick lies in two options:

  • -c:v copy: The “copy” video codec
  • -c:a copy: The “copy” audio codec

Let’s see a few examples!

Remove audio while keeping the video without reencoding

ffmpeg -i bbb_sunflower_1080p_60fps_normal.mp4 -c:v copy -an copied_video_only.mp4

Here, we used the “-an” option, which removes all audio streams. I remember it as “audio no”, but that is just my mnemonic :)

Let’s see how fast it was:

frame=38072 fps=20950 q=-1.0 Lsize= 310340kB time=00:10:34.51 bitrate=4006.7kbits/s speed=349x

So it processed the whole 10 minutes of video in 2 seconds, 349x faster than playback, at 20950 fps!

Remove video while keeping the audio without reencoding

ffmpeg -i bbb_sunflower_1080p_60fps_normal.mp4 -c:a copy -vn copied_audio_only.wav

Here, we used the “-vn” option, which removes all video streams. I remember it as “video no”.

Let’s see how fast it was:

size=  24772kB time=00:10:34.14 bitrate=320.0kbits/s speed=776x 

776x faster than playback, finished in about a second, not bad!

Cut and trim without reencoding

ffmpeg -ss 10 -t 10  -i bbb_sunflower_1080p_60fps_normal.mp4 -c:a copy -c:v copy part_from_10_to_20_copied.mp4

There could be precision issues with seeking while you do this, so you may want to learn more about seeking and copying here.

Replace audio on video file without reencoding

We have removed audio and video already, but what if we want to swap them?

ffmpeg -y -i bbb_sunflower_1080p_60fps_normal.mp4 -i voice_recording.wav -map "0:v" -map "1:a" -c:v copy -c:a copy bbb_with_replaced_audio.mov

There is quite a bit going on in here, so let’s explain the parts!

First, we have two inputs (-i), meaning we’re better off manually specifying the mapping. The command would work without the “-map” options, but it would ignore our second input.

-map "0:v" -map "1:a" means: please use the first file’s (first) video stream and the second file’s (first) audio stream.

With -c:v copy -c:a copy, we require FFmpeg to copy the already encoded data packets without touching them. Therefore FFmpeg’s work is mostly just copying bytes, no decoding, no encoding.

Not surprisingly, that’s what we see in the stream mapping too:

Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #1:0 -> #0:1 (copy)
Press [q] to stop, [?] for help
frame=38072 fps=9750 q=-1.0 Lsize= 320645kB time=00:10:34.51 bitrate=4139.7kbits/s speed=162x

And since it is just copying, it was crazy fast, 162x of the playback speed, or almost 10k frames per second!

But!

Execute the exact same command, but with “bbb_with_replaced_audio.mp4” (.mp4 container instead of .mov) as an output file! You’ll get this:

Could not find tag for codec pcm_s16le in stream #1, codec not currently supported in container

The message is quite clear. You cannot have a pcm_s16le (raw WAV, say that 10 times :)) stream in an MP4 container. I’m not sure if it is FFmpeg’s or the container’s lack of support, but we have to solve this. If you run into this problem, you may consider two options:

  1. Change the container: I’ve just tried MOV, and it worked.
  2. Encode the audio: We still copy the video data, and encoding audio isn’t really that painful.

I just showed you option #1, so let’s see option #2:

ffmpeg -y -i bbb_sunflower_1080p_60fps_normal.mp4 -i voice_recording.wav -map "0:v" -map "1:a" -c:v copy -c:a aac -b:a 320k -ar 44100 bbb_with_replaced_audio_aac.mp4

This copies the video frames and encodes our WAV into a supported codec to be held in the mp4 container. You can refer back to the audio encoding section if you want to learn more about that.
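The two options can be folded into one small decision. The sketch below uses a made-up helper (audio_args_for is my name) and only encodes what we have just observed for a pcm_s16le WAV source: MOV took the stream as-is, MP4 did not. The fallback branch is my assumption. Commands are printed, not executed.

```shell
# Hypothetical helper: pick audio options for a WAV (pcm_s16le) source
# based on the output container, following the two options above.
audio_args_for() {
    case "$1" in
        *.mov) echo "-c:a copy" ;;                    # MOV accepted the PCM stream
        *.mp4) echo "-c:a aac -b:a 320k -ar 44100" ;; # MP4 rejected it, so encode
        *)     echo "-c:a aac" ;;                     # assumption: encode when unsure
    esac
}

for out in bbb_with_replaced_audio.mov bbb_with_replaced_audio_aac.mp4; do
    # Printed only; remove "echo" to run. The $(...) is unquoted on purpose
    # so that the options are split into separate words.
    echo ffmpeg -y -i bbb_sunflower_1080p_60fps_normal.mp4 -i voice_recording.wav \
        -map 0:v -map 1:a -c:v copy $(audio_args_for "$out") "$out"
done
```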

Here is the output:

Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #1:0 -> #0:1 (pcm_s16le (native) -> aac (native))
Press [q] to stop, [?] for help
...
frame=38072 fps=2176 q=-1.0 Lsize= 313058kB time=00:10:34.51 bitrate=4041.8kbits/s speed=36.3x

“Only” 36x faster than playback, 2176 fps, still not that bad!

Filtering overview

FFmpeg supports many audio and video filters. Currently, there are 116 audio and 286 video filters, but there are even more if we count the hardware accelerated ones too.

So how do we leverage them?

There are two ways to define filters, but I will explain the complex filter, as the difference is small, but it is more flexible. So there is a global option for FFmpeg called -filter_complex. With a rather peculiar syntax, you can specify all your filters and their parameters right after this option.

You can think of the process with the following image:

Basically, your filter graph can access all the inputs (-i a.mp4 -i b.mp4 -i c.mp4), and it can produce as many outputs as you like (-map might be needed).

Basic syntax

Let’s take a look at a simple, basic example:

ffmpeg -y  -t 5 -i bbb_sunflower_1080p_60fps_normal.mp4 -filter_complex "drawtext=text='HELLO THERE':y=20:x=30:fontsize=200:fontfile=/usr/share/fonts/truetype/freefont/FreeSerif.ttf" filter_complex1.mp4

Although -filter_complex is a global option, I like to put it after the inputs and before the outputs, as it is a bit easier to oversee the whole command that way. Luckily the command line parser of FFmpeg is smart enough, and it works.

The command above produces a 5-second-long video, where the text “HELLO THERE” is overlaid on the intro of Big Buck Bunny.

Let’s understand the peculiar format for specifying filters!

We’ll go bottom-up and build from there. So the most basic format is this:

FILTER_NAME=ARGUMENT1=VALUE1:ARGUMENT2=VALUE2

For example:

drawtext=text='HELLO THERE':y=20:x=30

The first thing before the first equals sign (=) is the filter’s name, which is the drawtext filter in this case. Then we have our first argument, “text”, and its value “‘HELLO THERE'”. Right after that, separated with a colon (:), comes the next argument, “y”, with a value of “20”.

You can guess what each of the text, y, x, fontsize and fontfile arguments does, as it is quite self-explanatory. But especially the first time, you will heavily rely on the filtering documentation to understand every filter and every argument.

Also, several characters are reserved, such as , : = and some others depending on your environment, so sooner or later you need to learn about escaping too.

To recap, our pipeline looks like this now:

Multiple filters in a chain

The previous command is a single filter chain that consists of a single filter only, but you can have more filters placed right after each other! It means that the output of one filter will be the input for the next! The way to do this is by separating them with a comma!

Let’s draw two boxes with the drawbox filter!

ffmpeg -y  -t 5 -i bbb_sunflower_1080p_60fps_normal.mp4 -filter_complex "  drawbox=x=10:y=10:w=100:h=100:color=red  ,  drawbox=x=200:y=200:w=100:h=100:color=blue  " filter_complex2.mp4

See? The output of the first filter is passed to the input of the second filter!
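When a chain grows, typing it as one long string gets error-prone. One way around that, sketched here with a made-up helper (join_chain is my name, not part of FFmpeg), is to keep each filter on its own line and join them with commas before handing the result to -filter_complex.

```shell
# Hypothetical helper: join all arguments with commas into one filter chain.
join_chain() {
    joined=""
    for f in "$@"; do
        if [ -z "$joined" ]; then joined="$f"; else joined="$joined,$f"; fi
    done
    printf '%s\n' "$joined"
}

chain=$(join_chain \
    "drawbox=x=10:y=10:w=100:h=100:color=red" \
    "drawbox=x=200:y=200:w=100:h=100:color=blue")

# This string is what would go inside the quotes after -filter_complex.
printf '%s\n' "$chain"
```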

Let’s visualize our pipeline again:

Input and output pads

Now, we have skipped something so far, because for simple uses FFmpeg is smart enough to do it for us. And that is the specification of a chain’s input and output pads!

Let’s draw just a single rectangle for now:

ffmpeg -y  -t 5 -i bbb_sunflower_1080p_60fps_normal.mp4 -filter_complex "drawbox=x=10:y=10:w=100:h=100:color=red" filter_complex3.mp4

FFmpeg sees that the input for our filter chain is a single video file, and the output is a single output video file. Therefore, it safely assumes that we want that single input as the input of our single filter chain, and that the single output should be the single output of our single filter chain.

That’s really nice, as, in simple cases like this, we don’t need to assign and map inputs and outputs manually! But when we get more inputs, filter chains, or outputs, it is no longer possible. Therefore, we need to understand how to assign inputs and outputs!

First of all, let’s compare the following two command lines. They produce exactly the same result, but the second one represents what FFmpeg does internally (roughly):

ffmpeg -y  -t 5 -i bbb_sunflower_1080p_60fps_normal.mp4 -filter_complex "drawbox=x=10:y=10:w=100:h=100:color=red" filter_complex3.mp4
ffmpeg -y  -t 5 -i bbb_sunflower_1080p_60fps_normal.mp4 -filter_complex "[0:v]drawbox=x=10:y=10:w=100:h=100:color=red[out_link_0]" -map "[out_link_0]" filter_complex3.mp4

Do you see the difference? Before our filter chain, an “input pad” is defined: [0:v]. The expected format between the square brackets is documented in the stream selection section of the official documentation, and this article already covered it.

But, a quick summary:

  • 0:v: This means the first video stream of the first input file.
  • 0:v:0: Means exactly the same thing but in a long form.
  • 0:0: Means the first stream of the first input file (not recommended, as it could be anything in theory. It could be a subtitle stream, a thumbnail, a video or an audio stream…)
  • 0:a: This means the first audio stream of the first input file.
  • 0:a:0: Means exactly the same thing but in a long form.
  • 0:a:1: Means the second (index #1) audio stream of the first input file.

So we can specify which input file should be connected to which input of the filter graph!

Also, something similar is going on at the end! Do you see the [out_link_0] output pad definition at the end of our filter chain?

The naming here is easier, as basically you can specify any arbitrary name here. It roughly means, “please store the output data under this name”.

And when you specify your output file, you can, or need to, map it by selecting one of your filter graph outputs! Therefore, we must add the -map "[out_link_0]" option before our output file.

This map option means this: “Please save the data stream with this name into the following output file.”

This is how you can visualize this input/output mapping:

Multiple chains

Coming from the previous sections, you are now ready to see and understand an even more complicated configuration, which has multiple input files, output files, and filter chains!

ffmpeg -y  -i train.jpg -t 5 -i bbb_sunflower_1080p_60fps_normal.mp4 -filter_complex "[0:v]drawbox=x=10:y=10:w=100:h=100:color=red[train_box] ; [1:v]drawbox=x=10:y=10:w=100:h=100:color=red[bbb_box]" -map "[train_box]" filter_complex4_train.jpg -map "[bbb_box]" filter_complex4_bbb.mp4

Let’s see the output (two files next to each other):

We had two inputs, and we got two output files, an image, and a video, with a red rectangle on them, with a single command!

Are you still here? I hope so! Let’s understand what happened in that crazy command! We have two input files:

  • -i train.jpg: A simple image file
  • -t 5 -i bbb_sunflower_1080p_60fps_normal.mp4: Our video file, but to make it quick, just the first five seconds of it

Then the first thing to note is that we have two filter chains! They are separated with a “;“.

Our first filter graph is this: [0:v]...[train_box]

  • This requests the first input file as an input
  • Draws a red box
  • Saves the output into the “train_box” output pad

Our second filter graph is this: [1:v]...[bbb_box]

  • This requests the second input file as an input
  • Draws a red box
  • Saves the output into the “bbb_box” output pad

And finally, we got two outputs, each mapping to one of the outputs of the filter graph:

  • -map "[train_box]" filter_complex4_train.jpg
  • -map "[bbb_box]" filter_complex4_bbb.mp4
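To make the structure of that long string more visible, here is a sketch that assembles it from its parts. padded_chain is a made-up helper of mine; it just wraps a chain in an input pad and an output pad, and the two chains are then joined with a semicolon, as in the command above.

```shell
# Hypothetical helper: wrap a filter chain in an input pad and an output pad.
# $1 = input pad (e.g. 0:v), $2 = filter chain, $3 = output pad name
padded_chain() {
    printf '[%s]%s[%s]\n' "$1" "$2" "$3"
}

box="drawbox=x=10:y=10:w=100:h=100:color=red"

# Two independent chains, separated with ";".
graph="$(padded_chain 0:v "$box" train_box) ; $(padded_chain 1:v "$box" bbb_box)"
printf '%s\n' "$graph"
```

The printed string should match the -filter_complex argument used in the command above, and each output pad name is what the matching -map option refers to.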

This is the same thing visually:

If you are thinking about making it even more complex and making filter graphs that combine multiple inputs into one, for example, you are on the right track! It is possible, and we will get to that!

This was the introduction to the filtering system and its syntax.

Editing video

Now let’s get to know a few filters and make some interesting stuff!

Resizing or scaling

The scale filter is a simple one, yet it is quite powerful!

ffmpeg -y  -t 5 -i bbb_sunflower_1080p_60fps_normal.mp4 -filter_complex "scale=width=600:height=-1:force_original_aspect_ratio=decrease" filter_complex5_scaled1.mp4

The arguments speak for themselves, but a few things:

  • Specifying -1 for either width or height means rescaling while keeping the aspect ratio.
  • “force_original_aspect_ratio” can be increase or decrease. It means the filter will increase or decrease the image to fit the specified bounding box while keeping the aspect ratio.
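To get a feeling for what keeping the aspect ratio means in numbers, here is a small sketch. scaled_height is a made-up helper; the rounding to an even number is my own choice (many encoders want even dimensions) and is not a description of FFmpeg’s internal rounding.

```shell
# Hypothetical helper: height that keeps the aspect ratio for a target width,
# rounded to the nearest even number (my own rounding, not FFmpeg's internals).
# $1 = target width, $2 = source width, $3 = source height
scaled_height() {
    awk -v w="$1" -v iw="$2" -v ih="$3" \
        'BEGIN { h = w * ih / iw; printf "%d\n", int(h / 2 + 0.5) * 2 }'
}

# A 1920x1080 source scaled to a width of 600 (exact ratio would be 337.5):
scaled_height 600 1920 1080
```

The scale filter documentation also describes height=-2, which asks FFmpeg itself for an aspect-preserving value that is divisible by 2.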

Adding text

We have already covered this a bit, so let’s dive deeper!

This is what we used earlier:

ffmpeg -y  -t 5 -i bbb_sunflower_1080p_60fps_normal.mp4 -filter_complex "drawtext=text='HELLO THERE':y=20:x=30:fontsize=200:fontfile=/usr/share/fonts/truetype/freefont/FreeSerif.ttf" filter_complex1.mp4

Now let’s see how to align the text!

Many filters, including drawtext, support variables in some of their arguments’ values. If you scroll down in the documentation of drawtext, you will find this:

“The parameters for x and y are expressions containing the following constants and functions: “

And after this part, you will see many variables that you can include in your x and y expressions!

Let’s see:

# Align the text to the center
ffmpeg -y  -t 5 -i bbb_sunflower_1080p_60fps_normal.mp4 -filter_complex "drawtext=text='HELLO THERE':y=h/2-text_h/2:x=w/2-text_w/2:fontsize=200:fontfile=/usr/share/fonts/truetype/freefont/FreeSerif.ttf" filter_complex6_center.mp4
# y=h/2-text_h/2 means: y position=(image height / 2) - (text height / 2)
# Align the text to the right:
ffmpeg -y  -t 5 -i bbb_sunflower_1080p_60fps_normal.mp4 -filter_complex "drawtext=text='HELLO THERE':y=30:x=w-text_w-20:fontsize=200:fontfile=/usr/share/fonts/truetype/freefont/FreeSerif.ttf" filter_complex6_right.mp4
# x=w-text_w-20 means: x position=image width - text width - 20pixel padding
# Align the text to the bottom:
ffmpeg -y  -t 5 -i bbb_sunflower_1080p_60fps_normal.mp4 -filter_complex "drawtext=text='HELLO THERE':y=h-text_h-20:x=30:fontsize=200:fontfile=/usr/share/fonts/truetype/freefont/FreeSerif.ttf" filter_complex6_bottom.mp4
# y=h-text_h-20 means: y position=image height - text height - 20pixel padding

And this is what we will get in the end:

I need to mention one good trick that might not be obvious at first. The text_h variable is a tricky one, because different text will be of a different height! E.g.: “____” and “WWW” will result in a different height.

For this reason, you do not always want to use text_h or even just a constant y=value expression, but rather, you need to align text by its baseline. So just remember to use the “ascent” variable whenever you need to align text vertically!

Check out these two examples! Each has two drawtext filters printing “_” and “_H”:

# This one uses y=200 for both, still the text isn't aligned properly!
ffmpeg -y  -t 5 -i bbb_sunflower_1080p_60fps_normal.mp4 -filter_complex "drawtext=text='_':y=200:x=30:fontsize=200:fontfile=/usr/share/fonts/truetype/freefont/FreeSerif.ttf,drawtext=text='_H':y=200:x=500:fontsize=200:fontfile=/usr/share/fonts/truetype/freefont/FreeSerif.ttf" filter_complex7_bad_text.mp4
# This one uses y=200-ascent for both and the text is aligned as expected!
ffmpeg -y  -t 5 -i bbb_sunflower_1080p_60fps_normal.mp4 -filter_complex "drawtext=text='_':y=200-ascent:x=30:fontsize=200:fontfile=/usr/share/fonts/truetype/freefont/FreeSerif.ttf,drawtext=text='_H':y=200-ascent:x=500:fontsize=200:fontfile=/usr/share/fonts/truetype/freefont/FreeSerif.ttf" filter_complex7_good_text.mp4

Now let’s compare the difference:

See? This is the difference between aligning the “top left” or the “baseline” of the text!

Adding an overlay

Overlaying is a very interesting thing to do with FFmpeg. Let’s jump right in!

Basic

ffmpeg -y  -t 5 -i bbb_sunflower_1080p_60fps_normal.mp4  -i smiley.png -filter_complex "overlay" filter_complex8_overlay1.mp4

Easy as that!

Of course, the overlay filter has a ton of options, but I wanted to show the simplest possible command line. We don’t even need to mess with input/output pads, as FFmpeg automatically understands the situation: two inputs for the overlay filter and its single output into a single output.

But just to practice, we could have done it like this:

ffmpeg -y  -t 5 -i bbb_sunflower_1080p_60fps_normal.mp4  -i smiley.png -filter_complex "[0:v][1:v]overlay[output]" -map "[output]" filter_complex8_overlay2.mp4

And this would result in the same output! Check it out, now I have specified the two inputs for the overlay: [0:v][1:v]!

Aligned

Let’s align the smiley into the center!

As we have seen with drawtext, the overlay filter’s arguments also support a few dynamic variables. We’ll use those to achieve what we want!

ffmpeg -y  -t 5 -i bbb_sunflower_1080p_60fps_normal.mp4  -i smiley.png -filter_complex "overlay=x=main_w/2-overlay_w/2:y=main_h/2-overlay_h/2" filter_complex8_overlay3.mp4
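If you want to double-check such an expression, you can evaluate it by hand. The sketch below does that with a made-up helper (overlay_center is my name) and an assumed 256x256 smiley, since the real size of smiley.png is not stated here. Note that the shell uses integer division, so odd sizes may differ slightly from what FFmpeg computes.

```shell
# Hypothetical helper: evaluate the centering expressions with plain integers.
# $1 = main_w, $2 = main_h, $3 = overlay_w, $4 = overlay_h
overlay_center() {
    echo "$(( $1 / 2 - $3 / 2 )) $(( $2 / 2 - $4 / 2 ))"
}

# A 1920x1080 video with an assumed 256x256 overlay image:
overlay_center 1920 1080 256 256    # prints "832 412" (x y)
```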

Preprocessing the input for overlay

Let’s get a bit creative!

I want to make it smaller, and I also want to blur it!

Now stop for a minute, and think about how you would do that!

Ready?

ffmpeg -y  -t 5 -i bbb_sunflower_1080p_60fps_normal.mp4  -i smiley.png -filter_complex "[1:v]scale=w=200:h=-1,gblur=sigma=3[smiley] ; [0:v][smiley]overlay=x=100:y=100" filter_complex8_overlay4.mp4

For this we needed to have two filter graphs!

The first one is this: [1:v]scale=w=200:h=-1,gblur=sigma=3[smiley]

  • Scales the input image (the smiley).
  • Then the scaled output will be blurred.
  • Then the output is saved into the output pad named “smiley”.

Then, we have our second filter graph: [0:v][smiley]overlay=x=100:y=100

  • This takes as input the first input file (the video).
  • This also takes as input the output pad named “smiley”. (We are connecting two chains this time!)
  • Then the overlay filter does its overlaying thing, and we trust FFmpeg to pair the unnamed output with the single output file we specified.

Reusing content

Let’s do one more, a very complicated one!

Let’s have the outro overlaid over the intro!

ffmpeg -y -t 5 -i bbb_sunflower_1080p_60fps_normal.mp4 -t 5 -ss 00:09:40 -i bbb_sunflower_1080p_60fps_normal.mp4  -filter_complex " [1:v]scale=w=1920/2:h=-1[outro]; [0:v][outro]overlay" filter_complex8_overlay5.mp4

We could have done it in several ways, e.g. we could use the trim filter, but to keep it simple, we just open the same file twice and seek/trim them.

  • -t 5 -i bbb_sunflower_1080p_60fps_normal.mp4: Open the video, and keep the first five seconds of it.
  • -t 5 -ss 00:09:40 -i bbb_sunflower_1080p_60fps_normal.mp4: Open the same video again, but seek to the end and keep five seconds from there.

Then we have two filter graphs again: one scales down the outro, and the second is just an overlay.

Are you excited? :) I hope these made-up examples opened up your eyes to the possibilities, and I hope you will create very creative stuff with this knowledge!

Chroma keying, green screen, blue screen

In this section, we will use chroma keying to remove the background from Big Buck Bunny’s intro, and then we will put the transparent logo over the original video, as if it were some kind of a logo overlay!

ffmpeg -y -ss 0.5 -t 2 -i bbb_sunflower_1080p_60fps_normal.mp4 -ss 10 -i bbb_sunflower_1080p_60fps_normal.mp4  -filter_complex " [0:v]chromakey=color=0xfdfdfd:similarity=0.1:blend=0.2 , scale=w=-1:h=300 , loop=loop=-1:start=0:size=120[intro] ; [1:v][intro]overlay=x=-40:y=-40" -t 10 filter_complex9.mp4

So just to recap, Big Buck Bunny’s first few seconds are like this:

And this is the result:

Also, the butterfly moves its wings continuously!

Let’s examine the command!

  • -ss 0.5 -t 2 -i bbb_sunflower_1080p_60fps_normal.mp4: We read in the intro from 0.5 to 2.5 seconds.
  • -ss 10 -i bbb_sunflower_1080p_60fps_normal.mp4: We read in the video, starting from the tenth second.

Then we have two filter graphs, the first being this:

[0:v]chromakey=color=0xfdfdfd:similarity=0.1:blend=0.2 , scale=w=-1:h=300 , loop=loop=-1:start=0:size=120[intro]

As we see, we have three filters in here!

  • chromakey: This one takes a color and some parameters as input, and outputs transparent frames. The specified color + the blended areas will be the transparent sections. In our case we replaced the white-ish (#fdfdfd) background color with transparency.
  • scale: We resize the full 1080p image into something around 300px high.
  • loop: With the loop filter, we repeat all the 2 seconds’ worth of 120 frames (60*2) over and over again, to have the butterfly move its wings continuously.
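The size=120 value is just frame rate times clip length, so it has to change if you loop a different clip. A minimal sketch with a made-up helper (loop_size is my name), assuming a constant frame rate and a whole number of seconds:

```shell
# Hypothetical helper: number of frames the loop filter has to repeat.
# $1 = frames per second, $2 = clip length in whole seconds
loop_size() {
    echo $(( $1 * $2 ))
}

# 60 fps source, 2 seconds read with "-t 2":
size=$(loop_size 60 2)
printf 'loop=loop=-1:start=0:size=%s\n' "$size"
```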

And then, finally, we have the second filter graph:

[1:v][intro]overlay=x=-40:y=-40

Nothing fancy, just an overlay of the original video and our chroma keyed intro.

What else?

You might want to check out a few more filters that I did not cover here.

Here are just a few interesting ones:

Audio manipulation

In this chapter, we are going to check out some audio manipulation techniques with FFmpeg!

First of all, let’s see our example file:

It is a voice recording, and it is intentionally… well, quite bad.

From the waveform, it is obvious that there are very different volume ranges in it. This is an example recording where each sentence was read in different strengths: “normal”, “whisper” or “powerful”, which is why you see repeating patterns of amplitude ranges on the image.

It is not really visible, but it has some noise too, and of course, it is not normalized or enhanced in any way. Yet.

Please note that there are different scenarios, requirements, and ways to enhance audio. This is a simplified way to show the outline of the process in this article. I’m not an audio engineer, although I have some experience in the area. So if you know it better, feel free to fine-tune it for yourself even more, or contact me and recommend improvements!

I’m showing an example here with a very rough input, one that you’d just reject in real life as it would be useless due to its quality. But it is an excellent example to show the different steps of the enhancing process and to see what can be done to it!

The next steps are built upon each other, and we will reach the complete command at the end!

Do not forget that these settings are specific to this voice recording. Sadly this cannot be generalized too much.

Gate

Let’s start with the gate filter!

A gate is like a switch that opens only if the signal is stronger than the threshold. So if the signal level is lower than the threshold, it cuts to complete silence. Although you might soften or delay this cut with the knee, attack and release arguments.

We’ll use this filter as a basic noise reduction method now! This helps us remove the noise between words and sentences by cutting it to silence. It doesn’t remove noise in any other way, e.g. it doesn’t touch the static on the voice itself.

Check this out!

ffmpeg -y -i voice_recording.wav -filter_complex "agate=threshold=0.01:attack=80:release=840:makeup=1:ratio=3:knee=8" gate.wav

Let’s hear it: gate.wav

And let’s see it:

As you can see, the “quiet” parts were attenuated heavily, while the above-the-threshold parts remained similar. These parts were still affected by the knee, attack, and release arguments determining how hard (knee) and quick (attack/release) the cut is.

I’ve left a rather high release timeout here to avoid sudden dips in the amplitude.
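If you are used to thinking in decibels, the threshold=0.01 value may look odd. As far as I understand the agate documentation, the threshold is a linear amplitude between 0 and 1, so it can be converted with the usual 20*log10 formula. A sketch with a made-up helper (amp_to_db is my name):

```shell
# Hypothetical helper: convert a linear amplitude (0..1] to decibels.
amp_to_db() {
    awk -v a="$1" 'BEGIN { printf "%.1f\n", 20 * log(a) / log(10) }'
}

amp_to_db 0.01    # the threshold used in the agate command above, prints -40.0
```

So the gate in this example starts to close when the signal falls below roughly -40 dB relative to full scale.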

This is where we are right now:

The quiet parts are more quiet than before, but still, the amplitude range or the dynamic range is quite high. You need to change your volume levels to hear everything and avoid blowing your speakers/brain out.

Equalization

Before fixing that, let’s do even more housekeeping. Let’s do some equalization and frequency filtering!

We’ll use these filters:

ffmpeg -y -i gate.wav  -filter_complex "highpass=f=100:width_type=q:width=0.5 , lowpass=f=10000 , anequalizer=c0 f=250 w=100 g=2 t=1|c0 f=700 w=500 g=-5 t=1|c0 f=2000 w=1000 g=2 t=1" gate_eq.wav

Let’s hear it: gate_eq.wav

This account for gradually attenuates frequencies below 100hz, as there have to not fundamental treasured shriek material in there, but it would possibly per chance per chance per chance in actuality lower the clarity of the speech.

Then we attain the the same, but for frequencies above 10 kHz. Right here is largely wanted because we now contain a range of excessive-frequency noise, so that is a workaround for those. Also, a male converse is commonly deeper than a woman’s, so it is some distance largely helpful to hearken to how low you would possibly per chance per chance well presumably assign the bar.

Then comes anequalizer, which has a crazy an grand map of atmosphere its arguments:

This: anequalizer=c0 f=250 w=100 g=2 t=1|c0 f=700 w=500 g=-5 t=1|c0 f=2000 w=1000 g=2 t=1 blueprint:

  • at 250 Hz with a width of 100 Hz, boost by 2 dB, with a Chebyshev type 1 filter on channel 0.
  • at 700 Hz with a width of 500 Hz, attenuate by 5 dB, with a Chebyshev type 1 filter on channel 0.
  • at 2000 Hz with a width of 1000 Hz, boost by 2 dB, with a Chebyshev type 1 filter on channel 0.
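
To make that argument format a little easier to read, here is a small Python sketch that splits such a string into its bands. The parse_anequalizer helper is hypothetical and written only for illustration; it is not part of FFmpeg, and it only understands the f, w, g, and t keys used above.

```python
def parse_anequalizer(spec):
    """Split an anequalizer argument string into one dict per band."""
    bands = []
    for band in spec.split("|"):        # bands are separated by "|"
        tokens = band.split()            # e.g. ["c0", "f=250", "w=100", "g=2", "t=1"]
        channel = int(tokens[0][1:])     # "c0" -> channel 0
        params = dict(tok.split("=", 1) for tok in tokens[1:])
        bands.append({
            "channel": channel,
            "freq_hz": float(params["f"]),
            "width_hz": float(params["w"]),
            "gain_db": float(params["g"]),   # positive boosts, negative attenuates
            "filter_type": int(params["t"]),
        })
    return bands


SPEC = "c0 f=250 w=100 g=2 t=1|c0 f=700 w=500 g=-5 t=1|c0 f=2000 w=1000 g=2 t=1"

for b in parse_anequalizer(SPEC):
    action = "boost" if b["gain_db"] > 0 else "attenuate"
    print(f'{b["freq_hz"]:.0f} Hz, width {b["width_hz"]:.0f} Hz: '
          f'{action} by {abs(b["gain_db"]):.0f} dB on channel {b["channel"]}')
```

The sign of g is the thing to watch: a positive value boosts the band, a negative one attenuates it.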

I agree. You have probably used a friendlier equalizer in your life than this one:)

These values are based on experimentation and common recommendations for voice. Feel free to tune them for your own needs!

Let’s compare the frequency plots before and after:

Tip: To see the frequency plot in Audacity, open a file, select all, and choose Analyze → Plot spectrum!

Compression

The compressor filter applies dynamic range compression to the incoming audio data. To put it simply, the compressor varies the attenuation based on the incoming signal level. Basically, when you watch a badly mastered movie, this is what you are doing yourself. When it is way too loud in some action scene, you reach for the remote control or mouse to lower the volume, but in the next moment you can’t hear what your heroes are saying, so you turn it back up again.

Dynamic range compression does roughly the same. You can set it up so that it attenuates the louder parts, keeping the overall volume range relatively small.

It often happens that performers on stage use a high dynamic range. Many performers will shout at one moment and then whisper in the next to increase the drama or hold the audience’s attention. If you want to avoid manually adjusting the volume in real time (while blowing out your speakers and pulling your hair out), then a compressor will save you in these situations!

This is why our example audio contains different speaking strengths, so that we can see the dramatic effect of this filter.

ffmpeg -y -i gate_eq.wav -filter_complex "acompressor=level_in=6:threshold=0.025:ratio=20:makeup=6" gate_eq_comp.wav

Let’s hear it: gate_eq_comp.wav

And let’s compare the result of this with the original waveform!

Original:

Result:

Quite dramatic, isn’t it?:)

Let’s analyze this: acompressor=level_in=6:threshold=0.025:ratio=20:makeup=6

First, level_in=6 sets the input gain. It is 1 by default, but since our example audio is extremely quiet in places, we boost the whole thing before processing.

Then threshold=0.025 defines that everything above 0.025 should be attenuated.
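
It can help to see these linear values in decibels. This is a minimal sketch, assuming the threshold and gain values are linear amplitude factors where 1.0 means full scale; the to_db helper is hypothetical and only there for the arithmetic.

```python
import math


def to_db(amplitude):
    """Convert a linear amplitude factor to decibels (1.0 -> 0 dB)."""
    return 20.0 * math.log10(amplitude)


# Values taken from the commands in this article.
for name, value in [
    ("compressor threshold", 0.025),
    ("compressor level_in", 6),
    ("first gate threshold", 0.01),
]:
    print(f"{name}: {value} is about {to_db(value):.1f} dB")
```

Under that assumption, the 0.025 threshold sits at roughly -32 dB, and level_in=6 is a boost of roughly 15.6 dB.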

Based on the image below, I’ve decided to cut at this point, as it is above most of the whispering, which means hard pops and “s” sounds get cut even in the “whisper zone”.

Then ratio=20 means a 1:20 attenuation ratio: if the level rises 20 dB above the threshold, it will be only 1 dB above the line after the attenuation. This is a very strong compression ratio; it is almost a limiter.
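
The ratio arithmetic can be sketched in a few lines of Python. This is a simplified, textbook hard-knee model and not FFmpeg’s actual implementation: it ignores attack, release, and knee. The compressed_level_db function and the -32 dB threshold are illustrative assumptions.

```python
def compressed_level_db(input_db, threshold_db, ratio):
    """Static hard-knee compressor curve, all levels in dB."""
    if input_db <= threshold_db:
        return input_db                      # below the threshold: untouched
    excess = input_db - threshold_db         # how far we are above the threshold
    return threshold_db + excess / ratio     # only 1/ratio of the excess survives


THRESHOLD_DB = -32.0   # assumed threshold, roughly 0.025 in linear terms
RATIO = 20.0

for level in (-40.0, -32.0, -22.0, -12.0):
    out = compressed_level_db(level, THRESHOLD_DB, RATIO)
    print(f"in {level:6.1f} dB -> out {out:6.1f} dB")
```

An input 20 dB above the threshold (-12 dB here) comes out just 1 dB above it, which is why such a high ratio behaves almost like a limiter.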

So far, we boosted the signal, then turned down everything that was above our “whisper line” with a fairly strong ratio, and now everything is basically at the whisper level, even the parts that are shouting.

Finally, with makeup=6 we simply bring everything back to the level where the “normal” parts were before.

Let’s take a step back now to understand why we used the gate and did the equalization before the compressor.

Generally, you want to remove unneeded parts and frequencies before compression, as the compressor will likely amplify those too! By removing most of the noise in the gaps first, we kept level_in=6 from boosting it as well. The same goes for the high- and lowpass filtering.

Changing the volume

Now, if we want to make the result a bit louder, we could increase the previous step’s makeup argument, or leverage the volume filter.

While we are at it, let’s cut off the first 4 seconds too with -ss 4.

ffmpeg -y -ss 4 -i gate_eq_comp.wav -filter_complex "volume=1.1" gate_eq_volume_comp.wav

Let’s hear it: gate_eq_volume_comp.wav

Let’s make audio gate again

Excuse me for that title:)

As I described earlier, compression can amplify the noise, so you might want to run the result through a gate again:

ffmpeg -y -i gate_eq_volume_comp.wav -filter_complex "agate=threshold=0.1:attack=50:release=50:ratio=1.5:knee=4" gate_eq_volume_comp_gate.wav

Let’s hear it: gate_eq_volume_comp_gate.wav

In this case, I’ve used a softer gate, with ratio=1.5. Because of this, I could use shorter attack and release times too: the attenuation is not that strong, so it doesn’t cause hard dips in the audio.
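
To get a feel for how much softer ratio=1.5 is than the ratio=3 of the first gate, here is a rough Python sketch. It uses the textbook downward-expander curve as an assumption about how a gate ratio behaves; it is not FFmpeg’s actual agate code, and it ignores knee, attack, release, and any limit on the maximum reduction. The gated_level_db function and the example levels are hypothetical.

```python
def gated_level_db(input_db, threshold_db, ratio):
    """Static downward-expander curve, all levels in dB."""
    if input_db >= threshold_db:
        return input_db                       # above the threshold: untouched
    shortfall = input_db - threshold_db       # negative: how far below we are
    return threshold_db + shortfall * ratio   # the shortfall is stretched by the ratio


THRESHOLD_DB = -20.0   # 0.1 in linear terms, assuming 1.0 is full scale
QUIET_DB = -30.0       # an example level 10 dB below the threshold

for ratio in (3.0, 1.5):
    out = gated_level_db(QUIET_DB, THRESHOLD_DB, ratio)
    print(f"ratio {ratio}: {QUIET_DB} dB -> {out} dB")
```

In this model, a sound 10 dB under the threshold ends up 30 dB under it with ratio=3, but only 15 dB under it with ratio=1.5, so the transitions are much less abrupt.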

Putting it all together

A single command could have done all the steps above:

ffmpeg -y -i voice_recording.wav -filter_complex "agate=threshold=0.01:attack=80:release=840:makeup=1:ratio=3:knee=8 , highpass=f=100:width_type=q:width=0.5 , lowpass=f=10000 , anequalizer=c0 f=250 w=100 g=2 t=1|c0 f=700 w=500 g=-5 t=1|c0 f=2000 w=1000 g=2 t=1 , acompressor=level_in=6:threshold=0.025:ratio=20:makeup=6 , volume=1.1 , agate=threshold=0.1:attack=50:release=50:ratio=1.5:knee=4" gate_eq_volume_comp_gate_together.wav

I just copy-pasted all the filters one after another, with a comma between them.
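
If you script this, the same chaining can be done programmatically. The following Python sketch assembles the filter list into one filtergraph string and an argument list for ffmpeg. The build_command helper is hypothetical; the filter strings and file names are the ones used in this article. It only builds and prints the command, it does not run it.

```python
import shlex

# One entry per processing step, in the order they should run.
FILTERS = [
    "agate=threshold=0.01:attack=80:release=840:makeup=1:ratio=3:knee=8",
    "highpass=f=100:width_type=q:width=0.5",
    "lowpass=f=10000",
    "anequalizer=c0 f=250 w=100 g=2 t=1|c0 f=700 w=500 g=-5 t=1|c0 f=2000 w=1000 g=2 t=1",
    "acompressor=level_in=6:threshold=0.025:ratio=20:makeup=6",
    "volume=1.1",
    "agate=threshold=0.1:attack=50:release=50:ratio=1.5:knee=4",
]


def build_command(input_path, output_path, filters):
    """Chain the filters with commas and return an ffmpeg argument list."""
    filtergraph = " , ".join(filters)
    return ["ffmpeg", "-y", "-i", input_path,
            "-filter_complex", filtergraph, output_path]


cmd = build_command("voice_recording.wav",
                    "gate_eq_volume_comp_gate_together.wav", FILTERS)
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
# To actually run it, you could pass cmd to subprocess.run(cmd, check=True).
```

Keeping the steps in a list makes it easy to reorder them or to drop one while experimenting.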

Isn’t it beautiful? Yeah, it isn’t, but it is very practical:)

For the last time, check out the difference:

It has less noise, a clearer voice, and a small volume range. Therefore it is easy on your ears!

What else?

You might want to check out a few more filters that I did not cover here.

Here are just a few interesting ones:

Documentation

For your convenience, let me list the most important pieces of documentation that might be useful for you! Most of these were already linked many times in this article.

If you got this far from top to bottom, then you are a true hero! I hope you enjoyed this, and I also hope that it inspired you to create something awesome with FFmpeg!

Please consider donating to FFmpeg. They are fantastic.
