Movie Recommendation System.pptx

Movie Recommendation
System
With Machine Learning, Cosine Similarity and TF-IDF

Informatics College Pokhara
Submitted By:
• Name: Rojan Acharya
• London Met ID: 20048713
• Group: C4
• Date: January 11, 2023
Submitted To:
• Mr. Abhinav Dahal
• Mr. Mahesh Dhungana
• Artificial Intelligence

AI Concepts Used
• Machine Learning
• Cosine Similarity
• Term Frequency-Inverse Document Frequency

Machine Learning
Machine learning (ML) is a branch of artificial intelligence concerned
with developing computer algorithms that can process enormous datasets,
find recurring patterns and correlations among many variables, and
build mathematical models that describe them.

Uses of Machine Learning
• Machine learning is widely used by e-commerce and entertainment
companies such as Amazon and Netflix to recommend products to users.
• Image recognition is one of the most common applications of machine
learning. It is used to identify objects, people, and places in
digital images.

Cosine Similarity
Cosine similarity is one of the simplest ways to measure how similar two vectors
are. Each data object in the dataset is treated as a vector. The cosine similarity
between two vectors x and y is:
cos(x, y) = (x · y) / (||x|| × ||y||)
Cosine similarity is useful because two documents that are far apart by Euclidean
distance may still point in nearly the same direction. The smaller the angle
between them, the higher the cosine similarity.
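
The formula above can be illustrated with a short, dependency-free Python sketch. The function name and the sample vectors are made up for illustration, and returning 0.0 for a zero vector is a convention assumed here, not something the slides specify.

```python
import math


def cosine_similarity(x, y):
    """Return (x . y) / (||x|| * ||y||) for two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        # Assumed convention: a zero vector is treated as similar to nothing.
        return 0.0
    return dot / (norm_x * norm_y)


# Illustrative vectors: same direction, very different lengths.
# They are far apart by Euclidean distance, yet the angle between them is zero.
short_doc = [1, 2, 0]
long_doc = [10, 20, 0]
print(cosine_similarity(short_doc, long_doc))  # very close to 1

# Orthogonal vectors share nothing, so their similarity is zero.
print(cosine_similarity([1, 0], [0, 1]))
```

The first pair shows why the measure suits documents of different lengths: only the direction of the vectors matters, not their magnitude.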

Uses of Cosine Similarity
• Measuring the similarity between pairs of documents is a good use
case for cosine similarity.
• Pose matching compares poses described by the key points of
joint locations.

TF-IDF
TF-IDF is a feature-extraction technique used in information retrieval and natural
language processing (NLP).
Term Frequency: The TF of a term is the number of times the term appears in
a document relative to the total number of terms in the document.
TF = (number of times the term appears in doc.) / (total number of terms in doc.)
Inverse Document Frequency: The IDF of a term measures how rare the term is across
the corpus: the smaller the share of documents that contain it, the higher its IDF.
Technical jargon, for example, receives a higher relevance value than words that
appear in nearly every document (e.g., a, the, and).
IDF(t) = log_e(Total number of documents / Number of documents with term t in it)
The TF-IDF score of a term in a document is the product TF × IDF.
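
As a worked example of the two formulas, the following plain-Python sketch computes TF, IDF, and their product on a tiny made-up corpus. The function names, the corpus, and the choice to return 0.0 for empty documents or unseen terms are assumptions for illustration. Library implementations such as scikit-learn's TfidfVectorizer use a smoothed IDF and normalize the result, so their numbers will differ.

```python
import math


def term_frequency(term, doc_tokens):
    """TF = occurrences of the term / total number of terms in the document."""
    if not doc_tokens:
        return 0.0  # assumed convention for an empty document
    return doc_tokens.count(term) / len(doc_tokens)


def inverse_document_frequency(term, corpus):
    """IDF = log_e(total documents / documents containing the term)."""
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        return 0.0  # assumed convention for a term that never occurs
    return math.log(len(corpus) / containing)


def tf_idf(term, doc_tokens, corpus):
    """TF-IDF score of a term in one document of the corpus."""
    return term_frequency(term, doc_tokens) * inverse_document_frequency(term, corpus)


# Hypothetical three-document corpus, already tokenized.
corpus = [
    "the hero saves the city".split(),
    "the detective solves the case".split(),
    "the hero and the detective".split(),
]

# "the" occurs in every document, so its IDF is log(1) = 0.
print(tf_idf("the", corpus[0], corpus))

# "city" occurs in only one document, so it gets a positive score.
print(tf_idf("city", corpus[0], corpus))
```

The common word scores zero even though it is the most frequent term in the document, while the rare word is weighted up.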

Uses of TF-IDF
• TF-IDF was created for document search, to return the results most
relevant to a query. Suppose someone uses a search engine to look for
James. The results are presented in order of relevance: the sports
articles in which the term James receives the highest TF-IDF score
are listed first.

Movie Recommender Engine Using
Collaborative Filtering

Movie Recommendation algorithm based
on improved k-clique

Most people today watch movies, but after finishing one they are often unsure what to
watch next. What if a system could understand you and recommend movies based on your
interests? Recommendation systems exist to do exactly that.
Customers frequently look at product recommendations based on their most recent
browsing. Customer satisfaction is the most crucial factor, and recommendation systems
have been helping with it for years.
Recommender systems provide user-specific recommendations and help users make informed
choices during online transactions. They increase sales, improve the web browsing and
shopping experience, and help retain customers.

Explanation of the solution and developed
application

Solution
Python was used as the development language.
The system uses NumPy's routines for processing arrays, pandas' fast,
flexible, and expressive data structures for working with relational
data, and the cross-platform Matplotlib package for data visualization.
It can recommend movies with a similar genre or title.

Working of the System
Collecting data is the first and most crucial stage in building a
recommendation engine. The system uses implicit data (web search history,
clicks, search logs, and viewing history).
Once collected, the data must be stored. Its volume grows dramatically
over time, which calls for substantial, scalable storage. Several storage
options are available, depending on the type of data gathered.

Working of the System
Next, the data must be analyzed in depth before it can be put to use.
It can be processed as it is produced or handled in batches at regular
intervals. Filtering comes last: in content-based filtering, various
matrices, mathematical algorithms, and rules are applied to the data.
The recommendations are the product of this filtering.
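
The content-based filtering step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the catalogue, the titles, and the function names are hypothetical, and raw genre counts stand in for the TF-IDF weights used by the actual system.

```python
import math
from collections import Counter

# Hypothetical catalogue mapping each title to its genres.
CATALOGUE = {
    "Space Quest": "adventure sci-fi",
    "Robot Dawn": "sci-fi thriller",
    "Laugh Lines": "comedy romance",
    "Star Trail": "adventure sci-fi family",
}


def to_vector(text):
    """Turn a genre string into a bag-of-words count vector."""
    return Counter(text.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def recommend(title, catalogue, top_n=2):
    """Return up to top_n other titles, most similar genres first."""
    target = to_vector(catalogue[title])
    scores = [
        (other, cosine(target, to_vector(genres)))
        for other, genres in catalogue.items()
        if other != title  # never recommend the movie itself
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    # Drop movies that share no genre at all with the target.
    return [name for name, score in scores[:top_n] if score > 0]


print(recommend("Space Quest", CATALOGUE))  # ['Star Trail', 'Robot Dawn']
```

Excluding the queried title by name, rather than by its position in the ranking, avoids dropping a different movie when several entries tie for the top score.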

Achieved Results
[Screenshots: recommendations based on genre; recommendations based on title]

How does it solve real-world problems?

A movie recommendation system helps solve real-world problems. It helps customers find
the best movies for their tastes and helps the company generate revenue. Most users
spend a lot of time searching for movies in their preferred genre; a recommendation
system saves them that time. Many excellent films are underappreciated; a
recommendation system can bring them to the attention of users who are likely to
enjoy them.

# Imports: arrays, tabular data, plotting, image loading, and text features
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from PIL import Image
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Load the banner image
mov_img = Image.open("movies.jpg")

# Load only the two columns the recommender needs
movies = pd.read_csv('movies.csv', sep=',', encoding='latin-1',
                     usecols=['title', 'genres'])
movies.head()

# Split the pipe-separated genres, then store each list as a string
# so that the vectorizer can tokenize it
movies['genres'] = movies['genres'].str.split('|')
movies['genres'] = movies['genres'].fillna("").astype('str')

# --- Recommendations based on genre ---
# min_df=1 (the default) keeps every term; the original min_df=0 is
# rejected by some scikit-learn versions
tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 3),
                     min_df=1, stop_words='english')
tfidf_matrix = tf.fit_transform(movies['genres'])
tfidf_matrix.shape

# Pairwise cosine similarity between the genre vectors of all movies
genre_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
genre_sim[:5, :5]

# Map each title to its row index
titles = movies['title']
indices = pd.Series(movies.index, index=movies['title'])


def genre_recommendations(title):
    """Return titles whose genres are most similar to the given title."""
    idx = indices[title]
    sim_scores = list(enumerate(genre_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # Keep positions 2-14 of the ranking, as in the original slides
    # (position 0 is normally the movie itself)
    sim_scores = sim_scores[2:15]
    movie_indices = [i[0] for i in sim_scores]
    return titles.iloc[movie_indices]


# --- Recommendations based on title ---
# Only 2- and 3-word phrases from the title are used as features
tf = TfidfVectorizer(analyzer='word', ngram_range=(2, 3),
                     min_df=1, stop_words='english')
tfidf_matrix = tf.fit_transform(movies['title'])
tfidf_matrix.shape

# Stored under its own name so genre_recommendations keeps its matrix
title_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
title_sim[:5, :5]


def title_recommendations(title):
    """Return titles whose wording is most similar to the given title."""
    idx = indices[title]
    sim_scores = list(enumerate(title_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[2:15]
    movie_indices = [i[0] for i in sim_scores]
    return titles.iloc[movie_indices]


# The argument must match a title in movies.csv exactly
title_recommendations('Dark Knight').head(40)

Diagrammatic representation of the
solution

Flowchart and Transition Diagram
[Figures: flowchart; transition diagram]
