SlideShare a Scribd company logo
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 02 | Feb 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1622
Twitter Sentiment Analysis using Machine Learning
Sourav Chavan1, Vishal Navale2, Sharad Waghmare3, Uday Chavan4, Prof. Nilesh Ghode5
1,2,3,4Department of Electronics & Telecommunication, Atharva College of Engineering, Mumbai
5Prof. Nilesh Ghode, Department of Electronics & Telecommunication, Atharva College of Engineering, Mumbai
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract - The advent of social media has seen a drastic
change in how content is made and shared on the web. This
has shifted the main target of selling and advertisingagencies
from traditional methods to digital marketing. One thing that
most of those agencies require is that the analysis of the
content on such social media websites like Twitter. Twitter
acts as a platform, for various sorts of users to share their
views and sentiment on various sorts of topics in 140
characters or less. We have developed a tool that is able to
extract tweets pertaining to a topic and analyse them to
calculate their polarity i.e. positive, negative or neutral.
Key Words: Sentiment, Machine Learning, Logistic
Regression, Sci-Kit Learn.
1. INTRODUCTION
The Emergence of social media has given web users a
venue for expressing and sharing their thoughts and
opinion on all quite topic and events. Twitter, with nearly
600 million users and over 250 million messages per day
has become a gold mine for organization to watch their
reputation and makes by extracting and analyzing the
sentiments of the tweets posted by the people about
them. To analyze the sentiments of the tweet the
Sentiment Analysis came into the picture. Sentimental
Analysis is that the method of computationally identifying
and categorizing opinions from piece of text and determine
whether the writer's attitude towards a selected topicorthe
merchandise, is positive negative or neutral. Here we are
extracting the tweets based on the real time. Sentiment
Analysis refers to the utilization tongue processing, text
analysis and linguistics to systematically identify, extract,
quantify, and study affective states and subjective
information
REAL-TIME DATA EXTRACTION
Data extraction may be a process that involves
retrieval of knowledge from various sources. Frequently,
companies extract data so as to process it further,
migrate the info to a knowledge repository (such as a
knowledge warehouse or a knowledge lake) or to further
analyze it. It’s common to rework the info as a
neighborhood of this process
BLOCK DIAGRAM
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 02 | Feb 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1623
STORY GENERATION & VISUALIZATION
Exploring and visualizing data, regardless of whether its
text or the other data, is an important step in gaining
insights. Before we start exploration, we must think and
ask questions associated with the info in hand. A few
probable questions are as follows:
1. What are the most common words in the entire
dataset?
2. What are the most common words in the dataset for
negative and positive tweets, respectively?
3. How many hashtags are there in a tweet?
4. Which trends are associated with my dataset?
5. Which trends are associated with either of the
sentiments? Are they compatible with the sentiments?
CREATE CORPUS:
One of the primary things required for tongue processing
(NLP) tasks may be a corpus. In linguistics and NLP,
corpus (literally Latin for body) refers to a set of texts.
Such collections could also be formed of one language of
texts or can span multiple languages -- there are
numerous reasons that multilingual corpora (the plural
of corpus) may be useful. Corpora can
also contain themed texts (historical, Biblical, etc.).
Corpora are generally solely used for statistical linguistic
analysis and hypothesis testing
CLEANING AND PRE-PROCESSING
The created dataset could have any redundant information
or any unwanted garbage value which could cause Machine
Learning algorithm to behave abruptly. So, this dataset is
pre-processed to remove such unnecessary data.
EXTRACTING FEATURES FROM CLEANED TWEETS
To analyze a preprocessed data, it must be converted into
features. Depending upon the usage, text features are
often constructed using assorted techniques – Bag-of-
Words, TF-IDF, and word Embeddings.
Fig.2 Word cloud indicating frequency and importance of
words
MODEL BUILDING
We are now through with all the pre-modeling stages
required to urge the info within the proper form and
shape. Now we will be building predictive models on the
dataset using the two feature sets — Bag-of-Words and
TF-IDF.
We will use logistic regression to create the models. It
predicts the probability of occurrence of an occasion by
fitting data to a logit function
The following equation is used in Logistic Regression:
PERFORM SENTIMENT ANALYSIS
This dataset is then used to train the Machine Learning
algorithm. Algorithm used for this project is Logistic
Regression. In statistics, LogisticRegressionmodel isusedto
model the probability of a certainclassoreventexistingsuch
as pass/fail, win/lose, alive/dead or healthy/sick inour case
it is whether text is racist/sexist or not.
Fig.3 Training an algorithm
VALIDATE MODEL
Before plotting the results, the Machine Learning model
should be validated using validation dataset. In machine
learning, model validation is mentioned because
the process where a trained model is evaluated with a
testing data set. The testing model is evaluated with a
testing data set. The testing data set may be a separate
portion of an equivalent data set from which the training
set springs. The main purpose of using the testing data
set is to check the generalization ability of a trained
model. Model validation is administered after model
training. Together with model training, model validation
aims to seek out an optimal model with the
simplest performance
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 02 | Feb 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1624
PLOT THE RESULT
The idea of building machine learning models works on a
constructive feedback principle. You build a model, get
feedback from metrics, make improvements and continue
until you achieve a desirable accuracy. Evaluation metrics
explain the performance of a model. An important aspect
of evaluation metrics is their capability to discriminate
among model results
LITERATURE SURVEY
Sentiment analysis is currently one among the
favored topic in research field. There are various
works happening during this area for various languages
not studied so far like Arabic, Hindi, Thai etc. There are
various open source libraries available for
various languages like python, R etc. which makes the
work easy to research the text and process it. It are
often used for various purposes like in reviewing movies,
products of a companies, about companies, feeling or
emotions of citizens for a rustic. The most popular thanks
to get this information on social media and analyze it. To
make it into something meaningful sense, the classifier
techniques must be used.
The data must be in readable format, in English. The
classifiers are wont to tokenize of classify the info. The
Superwised learning technique is employed with machine
learning approach to detect sentiments and analyze the
emotions of the remainder of the text. Un-Superwised
learning is linguistic approach during which text is first
tokenized into tokens and added with tags to guage the
emotions of the text.
How to get many data to evaluate:
1) Social sites
a) Facebook.com
b) Twitter.com
c) LinkedIn.com
d) LinkedIn.com
2) News websites and comments
3) Movie reviewing sites
4) Products selling sites
i) Flipkart
ii) Snapdeal
5) Blogs etc.
6) Techniques used presently are:
a) Machin Learning
i) Logistic Regression Text Structure:
1. An array of sents/sentences
2. Each sent is again tokenized called tokens
3. Each word or token is padded with 2 other tags in
dictionary format. These added tags make each token to
be recognized as verbs, nouns, adjectives, adverbs etc. to
verify if that token is polar word or not.
4. Separate datasets are there so that each token can be
matched with words present in the datasets.
First, collection of data could also be a priority.Useful data is
what's required before analyzing the data. Sentiment
analysis is performed on the info which is a few product
or review and user wants to understand about if it's good
or not. Sentiments can have various sorts of polarity or
emotions about something.
Summarizing the opinions is additionally one among the
good concern for today's researchers. summarizing the
emotions doesn't affect subset of text or its one a part
of text to be printed. It is printing the info with a
particular sense in fewer number of words and it also
contains the topic of the text.
3. CONCLUSIONS
In conclusion the tool developed by us will be a simple
showcase of a system which will have a number of
applications in the near future.
With the shift of advertising and marketing from print to
digital and social media, sentiment analysis will have a
huge role in deciding how to push products to the
consumers and how to interact with them and twitter are
going to be one among the most platforms for users to
take advantage of this untapped market
 APPLICATIONS AND FUTURE SCOPE
1. Feedback on Pilot Releases And Beta Versions:
When a corporation releases a replacement product
or service, it's released as a pilot or beta version. The
monitoring of public feedback at this stage is
extremely crucial. So, text mining from social media
platforms and review sections greatly helps
accelerate this process.
2. Employee Feedback:
Sentimental analysis also can be wont to receive
feedback from the workers of the corporate and
analyze their emotions and attitude towards their
job. And to work out whether or not they are satisfied
with their job or not.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 02 | Feb 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1625
3. Better Services:
Text mining can provide a filter about, which service
of the corporate is getting more negative feedback.
This will help the corporate to understand, what are the
issues arising thereupon particular service.
And supported this information the corporate can
rectify these problems.
 RESULT AND DISCUSSIONS
In conclusion the tool developed by us will be a simple
showcase of a system which will have a number of
applications in the near future.
With the shift of advertising and marketing from print to
digital and social media, sentiment analysis will have a
huge role in deciding how to push products to the
consumers and how to interact with them and twitter are
going to be one among the most platforms for users to
take advantage of this untapped market
REFERENCES
1. “Sentiment Analysis of Twitter Data” by El_Rahman,
AlOtaibi and AlShehri (IEEE 2019 )
2. “Sentiment Analysis of Polarity in Product Reviews In
Social Media” by Marium Nafees, Hafsa Dar, Ikram Ullah
Lali, Salman Tiwana (IEEE 2018 )
3. Chen, Y., & Zhang, Z. (2018). Research on text sentiment
analysis supported on CNNs and SVM. 2018 13th IEEE
Conference on Industrial Electronics and Applications
(ICIEA).
4. “Investigating sentiment analysis using machine
learning approach” , Proceedings of the International
Conference on Intelligent Sustainable Systems (ICISS
2017) IEEEXploreCompliant- PartNumber:CFP17M19-
ART, ISBN:978-1-5386-1959-9
5. https://pandas.pydata.org/pandas-docs/stable/
https://www.nltk.org/
6. https://docs.python.org/
7. https://scikit-learn.org/
8. https://machinelearningmastery.com/logistic-
regression-for-machine-learning/
9. https://www.coursera.org/learn/machine-learning/
10. https://elitedatascience.com/feature-engineering
11. https://www.analyticsvidhya.com/blog/2018/07/hand
s-on-sentiment-analysis-dataset-python/
12. https://towardsdatascience.com/sentiment-analysis-
concept-analysis-and-applications-6c94d6f58c17
13. https://machinelearningmastery.com/classification-
accuracy-is-not-enough-more-performance-measures-
you-can-use/
14. https://machinelearningmastery.com/statistics-for-
machine-learning-mini-course/

More Related Content

IRJET - Twitter Sentiment Analysis using Machine Learning

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 02 | Feb 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1622 Twitter Sentiment Analysis using Machine Learning Sourav Chavan1, Vishal Navale2, Sharad Waghmare3, Uday Chavan4, Prof. Nilesh Ghode5 1,2,3,4Department of Electronics & Telecommunication, Atharva College of Engineering, Mumbai 5Prof. Nilesh Ghode, Department of Electronics & Telecommunication, Atharva College of Engineering, Mumbai ---------------------------------------------------------------------***---------------------------------------------------------------------- Abstract - The advent of social media has seen a drastic change in how content is made and shared on the web. This has shifted the main target of selling and advertisingagencies from traditional methods to digital marketing. One thing that most of those agencies require is that the analysis of the content on such social media websites like Twitter. Twitter acts as a platform, for various sorts of users to share their views and sentiment on various sorts of topics in 140 characters or less. We have developed a tool that is able to extract tweets pertaining to a topic and analyse them to calculate their polarity i.e. positive, negative or neutral. Key Words: Sentiment, Machine Learning, Logistic Regression, Sci-Kit Learn. 1. INTRODUCTION The Emergence of social media has given web users a venue for expressing and sharing their thoughts and opinion on all quite topic and events. Twitter, with nearly 600 million users and over 250 million messages per day has become a gold mine for organization to watch their reputation and makes by extracting and analyzing the sentiments of the tweets posted by the people about them. To analyze the sentiments of the tweet the Sentiment Analysis came into the picture. Sentimental Analysis is that the method of computationally identifying and categorizing opinions from piece of text and determine whether the writer's attitude towards a selected topicorthe merchandise, is positive negative or neutral. Here we are extracting the tweets based on the real time. Sentiment Analysis refers to the utilization tongue processing, text analysis and linguistics to systematically identify, extract, quantify, and study affective states and subjective information REAL-TIME DATA EXTRACTION Data extraction may be a process that involves retrieval of knowledge from various sources. Frequently, companies extract data so as to process it further, migrate the info to a knowledge repository (such as a knowledge warehouse or a knowledge lake) or to further analyze it. It’s common to rework the info as a neighborhood of this process BLOCK DIAGRAM
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 02 | Feb 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1623 STORY GENERATION & VISUALIZATION Exploring and visualizing data, regardless of whether its text or the other data, is an important step in gaining insights. Before we start exploration, we must think and ask questions associated with the info in hand. A few probable questions are as follows: 1. What are the most common words in the entire dataset? 2. What are the most common words in the dataset for negative and positive tweets, respectively? 3. How many hashtags are there in a tweet? 4. Which trends are associated with my dataset? 5. Which trends are associated with either of the sentiments? Are they compatible with the sentiments? CREATE CORPUS: One of the primary things required for tongue processing (NLP) tasks may be a corpus. In linguistics and NLP, corpus (literally Latin for body) refers to a set of texts. Such collections could also be formed of one language of texts or can span multiple languages -- there are numerous reasons that multilingual corpora (the plural of corpus) may be useful. Corpora can also contain themed texts (historical, Biblical, etc.). Corpora are generally solely used for statistical linguistic analysis and hypothesis testing CLEANING AND PRE-PROCESSING The created dataset could have any redundant information or any unwanted garbage value which could cause Machine Learning algorithm to behave abruptly. So, this dataset is pre-processed to remove such unnecessary data. EXTRACTING FEATURES FROM CLEANED TWEETS To analyze a preprocessed data, it must be converted into features. Depending upon the usage, text features are often constructed using assorted techniques – Bag-of- Words, TF-IDF, and word Embeddings. Fig.2 Word cloud indicating frequency and importance of words MODEL BUILDING We are now through with all the pre-modeling stages required to urge the info within the proper form and shape. Now we will be building predictive models on the dataset using the two feature sets — Bag-of-Words and TF-IDF. We will use logistic regression to create the models. It predicts the probability of occurrence of an occasion by fitting data to a logit function The following equation is used in Logistic Regression: PERFORM SENTIMENT ANALYSIS This dataset is then used to train the Machine Learning algorithm. Algorithm used for this project is Logistic Regression. In statistics, LogisticRegressionmodel isusedto model the probability of a certainclassoreventexistingsuch as pass/fail, win/lose, alive/dead or healthy/sick inour case it is whether text is racist/sexist or not. Fig.3 Training an algorithm VALIDATE MODEL Before plotting the results, the Machine Learning model should be validated using validation dataset. In machine learning, model validation is mentioned because the process where a trained model is evaluated with a testing data set. The testing model is evaluated with a testing data set. The testing data set may be a separate portion of an equivalent data set from which the training set springs. The main purpose of using the testing data set is to check the generalization ability of a trained model. Model validation is administered after model training. Together with model training, model validation aims to seek out an optimal model with the simplest performance
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 02 | Feb 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1624 PLOT THE RESULT The idea of building machine learning models works on a constructive feedback principle. You build a model, get feedback from metrics, make improvements and continue until you achieve a desirable accuracy. Evaluation metrics explain the performance of a model. An important aspect of evaluation metrics is their capability to discriminate among model results LITERATURE SURVEY Sentiment analysis is currently one among the favored topic in research field. There are various works happening during this area for various languages not studied so far like Arabic, Hindi, Thai etc. There are various open source libraries available for various languages like python, R etc. which makes the work easy to research the text and process it. It are often used for various purposes like in reviewing movies, products of a companies, about companies, feeling or emotions of citizens for a rustic. The most popular thanks to get this information on social media and analyze it. To make it into something meaningful sense, the classifier techniques must be used. The data must be in readable format, in English. The classifiers are wont to tokenize of classify the info. The Superwised learning technique is employed with machine learning approach to detect sentiments and analyze the emotions of the remainder of the text. Un-Superwised learning is linguistic approach during which text is first tokenized into tokens and added with tags to guage the emotions of the text. How to get many data to evaluate: 1) Social sites a) Facebook.com b) Twitter.com c) LinkedIn.com d) LinkedIn.com 2) News websites and comments 3) Movie reviewing sites 4) Products selling sites i) Flipkart ii) Snapdeal 5) Blogs etc. 6) Techniques used presently are: a) Machin Learning i) Logistic Regression Text Structure: 1. An array of sents/sentences 2. Each sent is again tokenized called tokens 3. Each word or token is padded with 2 other tags in dictionary format. These added tags make each token to be recognized as verbs, nouns, adjectives, adverbs etc. to verify if that token is polar word or not. 4. Separate datasets are there so that each token can be matched with words present in the datasets. First, collection of data could also be a priority.Useful data is what's required before analyzing the data. Sentiment analysis is performed on the info which is a few product or review and user wants to understand about if it's good or not. Sentiments can have various sorts of polarity or emotions about something. Summarizing the opinions is additionally one among the good concern for today's researchers. summarizing the emotions doesn't affect subset of text or its one a part of text to be printed. It is printing the info with a particular sense in fewer number of words and it also contains the topic of the text. 3. CONCLUSIONS In conclusion the tool developed by us will be a simple showcase of a system which will have a number of applications in the near future. With the shift of advertising and marketing from print to digital and social media, sentiment analysis will have a huge role in deciding how to push products to the consumers and how to interact with them and twitter are going to be one among the most platforms for users to take advantage of this untapped market  APPLICATIONS AND FUTURE SCOPE 1. Feedback on Pilot Releases And Beta Versions: When a corporation releases a replacement product or service, it's released as a pilot or beta version. The monitoring of public feedback at this stage is extremely crucial. So, text mining from social media platforms and review sections greatly helps accelerate this process. 2. Employee Feedback: Sentimental analysis also can be wont to receive feedback from the workers of the corporate and analyze their emotions and attitude towards their job. And to work out whether or not they are satisfied with their job or not.
  • 4. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 02 | Feb 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1625 3. Better Services: Text mining can provide a filter about, which service of the corporate is getting more negative feedback. This will help the corporate to understand, what are the issues arising thereupon particular service. And supported this information the corporate can rectify these problems.  RESULT AND DISCUSSIONS In conclusion the tool developed by us will be a simple showcase of a system which will have a number of applications in the near future. With the shift of advertising and marketing from print to digital and social media, sentiment analysis will have a huge role in deciding how to push products to the consumers and how to interact with them and twitter are going to be one among the most platforms for users to take advantage of this untapped market REFERENCES 1. “Sentiment Analysis of Twitter Data” by El_Rahman, AlOtaibi and AlShehri (IEEE 2019 ) 2. “Sentiment Analysis of Polarity in Product Reviews In Social Media” by Marium Nafees, Hafsa Dar, Ikram Ullah Lali, Salman Tiwana (IEEE 2018 ) 3. Chen, Y., & Zhang, Z. (2018). Research on text sentiment analysis supported on CNNs and SVM. 2018 13th IEEE Conference on Industrial Electronics and Applications (ICIEA). 4. “Investigating sentiment analysis using machine learning approach” , Proceedings of the International Conference on Intelligent Sustainable Systems (ICISS 2017) IEEEXploreCompliant- PartNumber:CFP17M19- ART, ISBN:978-1-5386-1959-9 5. https://pandas.pydata.org/pandas-docs/stable/ https://www.nltk.org/ 6. https://docs.python.org/ 7. https://scikit-learn.org/ 8. https://machinelearningmastery.com/logistic- regression-for-machine-learning/ 9. https://www.coursera.org/learn/machine-learning/ 10. https://elitedatascience.com/feature-engineering 11. https://www.analyticsvidhya.com/blog/2018/07/hand s-on-sentiment-analysis-dataset-python/ 12. https://towardsdatascience.com/sentiment-analysis- concept-analysis-and-applications-6c94d6f58c17 13. https://machinelearningmastery.com/classification- accuracy-is-not-enough-more-performance-measures- you-can-use/ 14. https://machinelearningmastery.com/statistics-for- machine-learning-mini-course/