Movie Recommendation
System
With Machine Learning, Cosine Similarity and TF-IDF
Informatics College Pokhara
Submitted By:
• Name: Rojan Acharya
• London Met ID: 20048713
• Group: C4
• Date: January 11, 2023
Submitted To:
• Mr. Abhinav Dahal
• Mr. Mahesh Dhungana
• Artificial Intelligence
AI Concepts Used
• Machine Learning
• Cosine Similarity
• Term Frequency-Inverse Document Frequency
Machine Learning
Machine learning (ML) is a branch of artificial intelligence concerned
with developing computer algorithms that can process enormous
datasets, find recurring patterns and correlations among numerous
variables, and build mathematical models that describe them.
Uses of Machine Learning
• Machine learning is widely used by
e-commerce and entertainment
companies such as Amazon and
Netflix to recommend products to
users.
• Image recognition is one of the most
common applications of machine
learning. It is used to identify objects,
people, and places in digital images.
Cosine Similarity
Cosine similarity is one of the simplest measures of how similar two vectors are.
Each data object in a dataset is treated as a vector. The cosine similarity between
two vectors x and y is:
cos(x, y) = (x · y) / (||x|| * ||y||)
Cosine similarity is useful because two documents that are far apart by Euclidean
distance (for example, because one is much longer than the other) may still point in
nearly the same direction. The smaller the angle, the higher the cosine similarity.
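The formula above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function name cosine and the three term-count vectors are hypothetical and only illustrate that a long and a short document with the same proportions are far apart by Euclidean distance yet have a cosine similarity of about 1.

```python
import numpy as np

def cosine(x, y):
    """Cosine of the angle between vectors x and y: (x . y) / (||x|| * ||y||)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

# Hypothetical term-count vectors for three documents
short_doc = [1, 2, 0]
long_doc = [10, 20, 0]   # same proportions as short_doc, ten times longer
other_doc = [0, 1, 5]    # mostly different terms

# Euclidean distance between the short and the long document is large ...
euclid = float(np.linalg.norm(np.subtract(long_doc, short_doc)))

# ... but they point in the same direction, so the cosine is close to 1
same_direction = cosine(short_doc, long_doc)
different_direction = cosine(short_doc, other_doc)

print(euclid, same_direction, different_direction)
```

The function assumes neither vector is all zeros; a zero vector would cause a division by zero.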
Uses of Cosine Similarity
• Measuring the similarity between
pairs of documents is a good use
case for cosine similarity.
• Pose matching compares poses
described by the key points of
joint locations.
TF-IDF
TF-IDF is a feature-extraction technique used in information retrieval and natural
language processing (NLP).
Term Frequency: The TF of a term is the number of times the term appears in
a document relative to the total number of terms in the document.
TF = (number of times the term appears in doc.) / (total number of terms in doc.)
Inverse Document Frequency: The IDF of a term reflects how rare the term is across
the corpus: the smaller the fraction of documents that contain it, the higher its IDF.
Technical jargon, for example, receives a greater relevance value than words that
appear in almost every document (e.g., a, the, and).
IDF(t) = log_e(Total number of documents / Number of documents with term t in it)
The TF-IDF score of a term in a document is the product TF * IDF.
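The two formulas above can be worked through by hand. The following is a minimal sketch in plain Python; the three-sentence corpus and the function names tf, idf, and tf_idf are hypothetical. It shows that a word found in every document scores zero, while a rarer word scores higher.

```python
import math

# Hypothetical corpus: each document is a list of words
docs = [
    "the hero saves the city".split(),
    "the villain attacks the city".split(),
    "the hero and the villain fight".split(),
]

def tf(term, doc):
    """Occurrences of the term divided by the number of terms in the document."""
    return doc.count(term) / len(doc)

def idf(term, docs):
    """log_e(total documents / documents containing the term).
    Assumes the term occurs in at least one document."""
    containing = sum(1 for d in docs if term in d)
    return math.log(len(docs) / containing)

def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

# "the" appears in all three documents, so its IDF is log_e(3/3) = 0
score_the = tf_idf("the", docs[0], docs)

# "saves" appears in one document only, so its IDF is log_e(3/1)
score_saves = tf_idf("saves", docs[0], docs)

print(score_the, score_saves)
```

Library implementations often differ in detail. scikit-learn's TfidfVectorizer, for instance, applies a smoothed IDF and normalizes each document vector by default, so its scores will not match these hand-computed values exactly.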
Uses of TF-IDF
• TF-IDF was created for document
search, to return the results most
pertinent to a query. Suppose
someone searches a search engine
for "James". The results are presented
in order of relevance: the sports
articles in which the term "James"
receives a higher TF-IDF score are
listed higher.
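The ranking described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the behaviour of any particular search engine: the article names, their texts, and the helper tf_idf are all hypothetical.

```python
import math

# Hypothetical articles, each reduced to a list of lowercase words
articles = {
    "basketball recap": "james scores forty as james leads the comeback".split(),
    "transfer news": "the club signs a new striker".split(),
    "film review": "james bond returns in a long spy thriller with many chases".split(),
}
corpus = list(articles.values())

def tf_idf(term, words, corpus):
    """TF * IDF of a term in one document; 0.0 if no document contains the term."""
    df = sum(1 for doc in corpus if term in doc)
    if df == 0:
        return 0.0
    tf = words.count(term) / len(words)
    return tf * math.log(len(corpus) / df)

# Rank article names by the TF-IDF score of the query term, best first
query = "james"
ranked = sorted(articles,
                key=lambda name: tf_idf(query, articles[name], corpus),
                reverse=True)
print(ranked)
```

The article that mentions the query term most densely is ranked first, and an article that never mentions it scores zero and is ranked last.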
Research Evidence
Movie Recommender Engine Using
Collaborative Filtering
Movie Recommendation algorithm based
on improved k-clique
Reason for selection of topic
Most people watch movies today, but many watch one and then feel unsure about what
to watch next. What if there was a system that could understand you and recommend
movies based on your interests? Recommendation systems exist to help with exactly this.
Customers frequently look at the product recommendations generated from their most
recent browsing. Customer satisfaction is the most crucial factor, and recommendation
systems have been helping with it for years.
Recommender systems provide user-specific recommendations and help users make
informed choices during online transactions. They increase sales, improve the browsing
and shopping experience, and help retain customers.
Explanation of the solution and developed
application
Solution
Python was used as the development language.
The solution makes use of NumPy's routines for processing arrays,
Pandas' fast, flexible, and expressive data structures for working
with relational data, and the cross-platform Matplotlib package for
data visualization. The application can suggest movies with a similar
genre or title.
Working of the System
Collecting data is the first and most crucial stage in the creation of a
recommendation engine. The system uses implicit data (web search history,
clicks, search log, and viewed history).
Data must be stored after it has been collected. The volume of data will
increase dramatically over time. This necessitates the availability of
substantial, scalable storage. A variety of storage options are available
depending on the sort of data you gather.
Working of the System
The data must next be examined and evaluated in depth before it can
be put to use. Data may be processed as it is produced or handled at
regular intervals. The filtering step comes last. In content-based
filtering, various matrices, mathematical algorithms, and rules are
applied to the data. The recommendations are the product of this
filtering.
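The filtering step can be illustrated end to end on a tiny in-memory catalogue. This is a minimal sketch, assuming pandas and scikit-learn are installed; the four movies, their genres, and the function recommend are hypothetical stand-ins for the real dataset and application.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical catalogue standing in for the real movie data
toy = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "genres": ["Action Adventure", "Action Thriller",
               "Comedy Romance", "Action Adventure Thriller"],
})

# 1. Represent each movie as a TF-IDF vector of its genre words
vectors = TfidfVectorizer().fit_transform(toy["genres"])

# 2. Compare every movie with every other movie
similarity = cosine_similarity(vectors)

# 3. Filter: rank the other movies by similarity to the chosen one
def recommend(title, k=2):
    idx = toy.index[toy["title"] == title][0]
    scores = pd.Series(similarity[idx], index=toy["title"]).drop(title)
    return scores.sort_values(ascending=False).head(k).index.tolist()

top = recommend("Movie A")
print(top)
```

Movie D shares both of Movie A's genres and Movie B shares one, so they are returned in that order; Movie C shares none and has a similarity of zero.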
Achieved Results
[Screenshots: recommendations based on genre; recommendations based on title]
How does it solve real-world problems?
A movie recommendation system helps with real-world problems. It helps customers find
the best movies for their tastes and enables the firm to generate revenue. Most users
spend a lot of time searching for movies in their preferred genre; a recommendation
system saves them that time. Many excellent films are underappreciated; a
recommendation system can bring these films to the attention of users who are likely
to enjoy them.
Pseudo Code of the solution
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from PIL import Image
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Banner image for the application
mov_img = Image.open("movies.jpg")

# Load only the two columns the recommender uses
movies = pd.read_csv('movies.csv', sep=',', encoding='latin-1',
                     usecols=['title', 'genres'])
movies.head()

# Split the '|'-separated genres, then convert back to text so that
# TfidfVectorizer can tokenize each row into genre words
movies['genres'] = movies['genres'].str.split('|')
movies['genres'] = movies['genres'].fillna("").astype('str')

titles = movies['title']
# Map each title to its row number
indices = pd.Series(movies.index, index=movies['title'])

# --- Recommendations based on genre ---
# min_df=1 keeps every term, as min_df=0 did; newer scikit-learn
# versions reject an integer min_df of 0
tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 3),
                     min_df=1, stop_words='english')
genre_matrix = tf.fit_transform(movies['genres'])
genre_matrix.shape
genre_sim = cosine_similarity(genre_matrix, genre_matrix)
genre_sim[:5, :5]

def genre_recommendations(title):
    idx = indices[title]
    # Pair every row number with its similarity to the chosen movie
    similar_scores = list(enumerate(genre_sim[idx]))
    # Most similar first
    similar_scores = sorted(similar_scores, key=lambda x: x[1], reverse=True)
    # Keep positions 2 to 14 of the sorted list
    similar_scores = similar_scores[2:15]
    movie_idx = [i[0] for i in similar_scores]
    return titles.iloc[movie_idx]

# --- Recommendations based on title ---
# Word pairs and triples from the title text
tf = TfidfVectorizer(analyzer='word', ngram_range=(2, 3),
                     min_df=1, stop_words='english')
title_matrix = tf.fit_transform(movies['title'])
title_matrix.shape
title_sim = cosine_similarity(title_matrix, title_matrix)
title_sim[:5, :5]

def title_recommendations(title):
    idx = indices[title]
    similar_scores = list(enumerate(title_sim[idx]))
    similar_scores = sorted(similar_scores, key=lambda x: x[1], reverse=True)
    similar_scores = similar_scores[2:15]
    movie_idx = [i[0] for i in similar_scores]
    return titles.iloc[movie_idx]

# The argument must match a value in the title column of movies.csv exactly
title_recommendations('Dark Knight').head(40)
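Both functions above drop the leading entries of the sorted list by position. When several movies have identical vectors (for example, the same set of genres), their similarity to the queried movie is tied at 1.0, so the queried movie is not guaranteed to sit at the top of the list. The following is a minimal sketch of one alternative, assuming NumPy; the helper top_k_excluding_self and the sample similarity row are hypothetical and are not part of the original solution.

```python
import numpy as np

def top_k_excluding_self(sim_row, idx, k):
    """Row numbers of the k highest scores in sim_row, never returning idx itself."""
    # Sort by descending score; a stable sort keeps tied entries in index order
    order = np.argsort(-np.asarray(sim_row, dtype=float), kind="stable")
    return [int(i) for i in order if i != idx][:k]

# Hypothetical similarity row for movie 2; movies 0 and 2 are tied at 1.0
row = [1.0, 0.3, 1.0, 0.8]
picked = top_k_excluding_self(row, idx=2, k=2)
print(picked)
```

Filtering by row number rather than by position keeps the queried movie out of its own recommendations even when other movies tie with it.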
Diagrammatic representation of the
solution
Flowchart and Transition Diagram
[Figures: flowchart; transition diagram]
Thank You!
