Add Stunning AI B-Roll to Videos for Free: A Complete Tutorial

Anil Chandra Naidu Matcha
7 min read · Jul 5, 2024


If you are a content creator looking to make your videos more engaging, you know how crucial B-roll footage is. But finding the right B-roll to complement your original content can be time-consuming and frustrating. What if I told you there's a way to automate this process using AI? If you wish to learn from a video tutorial instead, here is a video guide.

In today’s guide, I’ll show you how to use AI to quickly find and incorporate the perfect B-roll, saving you time and enhancing your videos effortlessly. First, let’s understand what B-roll is.

What is B-Roll?

B-roll refers to supplementary footage that is intercut with the main footage in a video. It enhances the storytelling by providing additional context, supporting the narrative, or illustrating points being made in the primary footage (known as A-roll). B-roll can include a variety of clips, such as cutaways, transitions, or background shots, that help to create a more engaging and visually appealing video. For content creators, effective use of B-roll can significantly elevate the production quality, making videos more dynamic and captivating for the audience.

The process of identifying B-roll for a video involves either shooting it separately or manually picking relevant footage from online sources such as stock providers like Pexels. Let’s now try to automate this process.

Introducing the AI B-Roll Generator

This AI B-Roll generator is an open-source tool available on GitHub. It uses AI to automate the process of finding and integrating B-roll footage into your videos. Below is the workflow we follow to add B-roll to a video.

Workflow

  1. Generate captions for the input video
  2. Identify keywords that represent these captions (example: “AI is used to automate many of the human tasks” -> “Automation”)
  3. Fetch Pexels videos for these keywords to use as B-roll
  4. Stitch the B-roll videos together with the original video
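At a high level, the workflow above can be sketched in a few lines of Python. The helper names here are illustrative placeholders, not functions from the repository; each step is implemented with real libraries later in this guide.

```python
# High-level sketch of the b-roll pipeline; helper names are placeholders.
def transcribe(path):
    # Step 1: captions with timestamps (done with Whisper in the real code)
    return [{"start": 0.0, "end": 2.0, "text": "AI is used to automate many of the human tasks"}]

def extract_keywords(segments):
    # Step 2: one visual keyword per caption segment (done with GPT-4o in the real code)
    return {i: "automation" for i, _ in enumerate(segments)}

def fetch_stock_clip(keyword):
    # Step 3: look up a stock clip for the keyword (done with the Pexels API in the real code)
    return f"https://example.com/{keyword}.mp4"

def add_broll(video_path):
    segments = transcribe(video_path)
    keywords = extract_keywords(segments)
    # Step 4 (stitching with MoviePy) is reduced here to a plain mapping
    return {i: fetch_stock_clip(k) for i, k in keywords.items()}

print(add_broll("video.mp4"))
```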

Let’s go through these steps one by one.

Step 1: Setting Up the Environment

First, you’ll need to install the required dependencies. These include:

  1. Whisper Library: Used to generate captions for your audio.
  2. Pytube: Used to download videos from YouTube.
  3. OpenAI: Used for AI-related tasks, including keyword generation.
  4. MoviePy: Used for video editing.
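In a Colab notebook, these can all be installed in a single cell. This is a sketch: note that Whisper is published on PyPI as openai-whisper, and exact package versions may differ from what the repository pins.

```shell
pip install -q openai-whisper pytube openai moviepy requests
```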

Step 2: Generate Captions for Your Video

We start by downloading the input video using Pytube. You can either enter a YouTube URL or upload a video directly to Google Colab. Modify video_url in the code below to download any video of your choice. The video will be downloaded to a file named video.mp4.

from pytube import YouTube
import os

# Function to download a YouTube video
def download_youtube_video(url, output_path='.', filename=None):
    try:
        # Create a YouTube object
        yt = YouTube(url)

        # Get the highest resolution stream available
        video_stream = yt.streams.get_highest_resolution()

        # Download the video with the specified filename
        downloaded_file_path = video_stream.download(output_path=output_path, filename=filename)

        # If a filename is specified, make sure the file carries that name
        if filename:
            new_file_path = os.path.join(output_path, filename)
            if downloaded_file_path != new_file_path:
                os.rename(downloaded_file_path, new_file_path)
            print(f"Downloaded and renamed to: {new_file_path}")
        else:
            print(f"Downloaded: {downloaded_file_path}")

    except Exception as e:
        print(f"An error occurred: {e}")

# Example usage
if __name__ == "__main__":
    # URL of the YouTube video to be downloaded
    video_url = 'https://www.youtube.com/watch?v=8ZyShHwF_g0'

    # Output path where the video will be saved
    output_path = '.'

    # Download the video
    download_youtube_video(video_url, output_path, filename="video.mp4")

Next, we extract the audio from the video using FFmpeg and pass it to the Whisper library to generate captions using the commands below. If you are running the code in Google Colab, ffmpeg is installed by default, so there is no need to install it separately. We then load the Whisper medium model and run it on our audio to generate the captions.

!ffmpeg -i video.mp4 -ab 160k -ac 2 -ar 44100 -vn audio.wav

import whisper

# Load the model
model = whisper.load_model("medium")
result = model.transcribe("audio.wav")

Step 3: Identify Keywords

Once we have the captions, we divide them into groups of 20 sentences. We do this because we observed that with a lot of sentences, keywords are not identified for every sentence. Once the division is complete, we pass each group to the OpenAI API to generate a relevant keyword for each sentence in the group. These keywords represent the main ideas in the captions and are used to find matching B-roll footage.

segments = result["segments"]
extracted_data = [{'start': item['start'], 'end': item['end'], 'text': item['text']} for item in segments]
data = [x["text"] for x in extracted_data]

def split_array(arr, max_size=20):
    # List to store the split arrays
    result = []

    # Iterate over the array in chunks of size max_size
    for i in range(0, len(arr), max_size):
        result.append(arr[i:i + max_size])

    return result

# Split the caption texts into groups of at most 20 sentences
split_arrays = split_array(data, max_size=20)

Now each group of 20 sentences is passed to the prompt shown in the code below to identify visual keywords that can be used to fetch relevant Pexels videos. Finally, we merge the keywords from each group to get the complete B-roll data, which contains a keyword for each sentence of the input video.

from openai import OpenAI
import json
import os

OPENAI_API_KEY = "openai-api-key"  # replace with your OpenAI API key
broll_info = []
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
client = OpenAI(
    api_key=OPENAI_API_KEY,
)
for i, x in enumerate(split_arrays):
    prompt = """This is a transcript from a shorts video with 20 sublists. Each sublist represents a segment of the conversation. Your task is to identify a keyword from each sublist that can be used to search for relevant b-roll footage. B-roll footage should complement the conversation topics and should be identified such that it can give relevant results when searched in pexels api. Please provide one keyword per sublist. Never skip any sublist and always give in order i.e from 0 to 19. Need output with keyword and list index. Strictly give json\n\n**Input**\n\n""" + str(x) + """\n\n**Output format**\n\n[{"k": keyword1, "i":0},{"k":keyword2, "i":1}]"""
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        model="gpt-4o",
    )
    broll_data = chat_completion.choices[0].message.content
    print("Data", broll_data)
    try:
        broll_data = json.loads(broll_data)
    except json.JSONDecodeError:
        # The model sometimes wraps the JSON in a ```json fence; strip it and retry
        broll_data = broll_data.split('```json')[1].split('```')[0].replace('\n', '')
        broll_data = json.loads(broll_data)
    # Shift each group's local indices (0-19) to global sentence indices
    broll_data = [{"k": item["k"], "i": 20 * i + item["i"]} for item in broll_data]
    broll_info.extend(broll_data)
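The re-indexing at the end of the loop can be illustrated in isolation. The keywords here are made up; the point is only how a group's local indices (0-19) become global sentence indices.

```python
# Group i contributes global indices 20*i .. 20*i+19.
group_index = 2  # hypothetical third group of 20 sentences
group_output = [{"k": "automation", "i": 0}, {"k": "robot", "i": 1}]

reindexed = [{"k": d["k"], "i": 20 * group_index + d["i"]} for d in group_output]
print(reindexed)  # local indices 0 and 1 become global indices 40 and 41
```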

Now that we have a keyword for each sentence of the conversation, we randomly select 50% of the keywords for fetching B-roll videos, so that roughly 50% of the original video is covered with B-roll. Depending on your requirements, you can choose a higher or lower percentage. Note that the loop below calls fetch_pexels_video, which is defined in the next step, so run that code first.

import random

num_to_select = int(len(broll_info) * 0.5)
enumerated_list = list(enumerate(broll_info))
selected_with_indices = random.sample(enumerated_list, num_to_select)
selected_elements = [elem for index, elem in selected_with_indices]
selected_indices = [index for index, elem in selected_with_indices]
for x in selected_indices:
    element = broll_info[x]
    # Use the global sentence index stored in the keyword entry
    extracted_data[element["i"]]["video"] = fetch_pexels_video(element["k"])

Step 4: Fetch Relevant B-Roll Footage

With the keywords in hand, we use the Pexels API to fetch relevant videos. First, you need to create an account on Pexels, copy the Pexels API key from the documentation, and paste it into the code below. If we don’t find any videos for a keyword, we return “Invalid keyword”.

import requests

PEXELS_API_KEY = "pexels-api-key"  # replace with your Pexels API key

def fetch_pexels_video(keyword, orientation="landscape"):
    url = f"https://api.pexels.com/videos/search?query={keyword}&orientation={orientation}&size=medium"
    headers = {
        "Authorization": PEXELS_API_KEY
    }
    response = requests.get(url, headers=headers)
    data = response.json()

    if data['total_results'] > 0:
        video_info = data['videos'][0]
        video_url = video_info['video_files'][0]['link']
        thumbnail_url = video_info['image']
        return {'video': video_url, 'thumbnail': thumbnail_url}
    else:
        return "Invalid keyword"
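The response-parsing logic can be sanity-checked against a minimal mock of the Pexels search payload, without hitting the network. The values below are made up for illustration, not real API output.

```python
# Minimal mock of a Pexels /videos/search payload (illustrative values only)
sample = {
    "total_results": 1,
    "videos": [{
        "image": "https://images.pexels.com/thumb.jpg",
        "video_files": [{"link": "https://player.pexels.com/clip.mp4"}],
    }],
}

def parse_pexels_response(data):
    # Same extraction logic as fetch_pexels_video, minus the HTTP call
    if data["total_results"] > 0:
        video_info = data["videos"][0]
        return {
            "video": video_info["video_files"][0]["link"],
            "thumbnail": video_info["image"],
        }
    return "Invalid keyword"

print(parse_pexels_response(sample))
```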

Step 5: Stitching It All Together

Now that we have all the content readily available, we can stitch everything together. We iterate over all the sentences in the captions, creating clips for each sentence. If a sentence is selected for a B-roll, we incorporate the corresponding B-roll video; otherwise, we retain the original clip. The B-roll clips are resized and trimmed to match the duration of the respective sentences.

We then add the audio back to the stitched video using the concatenate_clips_with_audio function, which syncs audio and video and generates the final result.

import os
import requests
from moviepy.editor import VideoFileClip, concatenate_videoclips, concatenate_audioclips
from tempfile import TemporaryDirectory
from moviepy.video.fx.all import resize

def download_video(url, temp_dir):
    local_filename = os.path.join(temp_dir, url.split('/')[-1])
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    return local_filename

def process_broll_clip(b_roll_clip, segment_duration, original_audio, start, original_clip):
    # Loop the b-roll if it is shorter than the segment, then trim to length
    b_roll_duration = b_roll_clip.duration
    if b_roll_duration < segment_duration:
        num_loops = int(segment_duration / b_roll_duration) + 1
        b_roll_clip = concatenate_videoclips([b_roll_clip] * num_loops)
    b_roll_clip = b_roll_clip.subclip(0, segment_duration)

    # Match the b-roll resolution to the original clip
    b_roll_clip = resize(b_roll_clip, newsize=(original_clip.w, original_clip.h))

    # Set audio from the original video to the b-roll clip
    b_roll_clip = b_roll_clip.set_audio(original_audio.subclip(start, start + segment_duration))

    return b_roll_clip

def concatenate_clips_with_audio(clips):
    audio_clips = [clip.audio for clip in clips if clip.audio is not None]
    video_clips = [clip for clip in clips]

    final_video = concatenate_videoclips(video_clips, method="compose")

    if audio_clips:
        final_audio = concatenate_audioclips(audio_clips)
        final_video = final_video.set_audio(final_audio)

    return final_video

# Load the original video
original_video_path = 'video.mp4'
original_video = VideoFileClip(original_video_path)
original_audio = original_video.audio

with TemporaryDirectory() as temp_dir:
    final_clips = []

    for segment in extracted_data:
        start = segment['start']
        end = segment['end']
        segment_duration = end - start

        original_clip = original_video.subclip(start, end)

        if 'video' in segment and segment["video"] != "Invalid keyword":
            print("Segment", segment)
            b_roll_video_url = segment['video']['video']
            b_roll_video_path = download_video(b_roll_video_url, temp_dir)
            b_roll_clip = VideoFileClip(b_roll_video_path)

            b_roll_clip = process_broll_clip(b_roll_clip, segment_duration, original_audio, start, original_clip)

            final_clips.append(b_roll_clip)
        else:
            final_clips.append(original_clip)

    final_video = concatenate_clips_with_audio(final_clips)

    # Write the output while the temporary b-roll files still exist
    final_video.write_videofile('final_video_with_broll.mp4', audio_codec='aac')

Below is a demo video with AI b-roll added. Here is the original video https://github.com/Anil-matcha/AI-B-roll/blob/main/video.mp4

Conclusion

Using AI to automate the process of finding and incorporating B-roll footage can save you a significant amount of time and effort. This guide showed you how to use the AI B-Roll generator to enhance your videos effortlessly. To get access to the complete code, check out the GitHub repository linked below.

Relevant Links:
