Presented By:
Lipika Sharma
Interaction
Hello
How are you?
I am great; thanks for asking.
How was your day?
Chatbots
Do you remember the last chatbot you interacted with?
https://www.pandorabots.com/mitsuku/
chatterbot-corpus/chatterbot_corpus/data/english at master · gunthercox/chatterbot-corpus · GitHub
What is NLP?
Natural language processing (NLP) sits at the intersection of AI, Computer Science,
and Linguistics. NLP is about making computers as capable as human beings at
understanding natural language in forms such as text and speech.
It comprises two major functions: human-to-machine translation (understanding
what a person says) and machine-to-human translation (producing language a person
can understand).
Applications of NLP
•Email filters. Email filters are one of the earliest and
most basic applications of NLP online. ...
•Smart assistants. ...
•Search results. ...
•Predictive text. ...
•Language translation. ...
•Digital phone calls. ...
•Data analysis. ...
•Text analytics.
Steps towards NLP
Data Preprocessing
• Tokenization
• Stop Words Removal
• Stemming
• Lemmatization
Modelling Techniques
• Bag of Words
• TF-IDF
• Word Embeddings
• Sentiment Analysis
Tool Used - Python
Python is a high-level, interpreted, general-purpose
programming language.
Its design philosophy emphasizes code readability with the use
of significant indentation.
Python Libraries
• NumPy
• Pandas
• Matplotlib
• Seaborn
• NLTK
Art to read the data
Data preprocessing is a data mining
technique used to transform raw data
into a useful and efficient format.
Demo -
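As a rough illustration of what "transforming raw data" can mean for text, the sketch below lowercases a string, strips punctuation, and collapses whitespace. The function name clean_text and the sample string are assumptions made for this example, not part of the original demo.

```python
import re
import string


def clean_text(raw: str) -> str:
    """Lowercase the text, drop ASCII punctuation, and collapse whitespace."""
    text = raw.lower()
    # Remove ASCII punctuation characters such as ! ? , .
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Replace any run of whitespace (spaces, tabs, newlines) with one space.
    return re.sub(r"\s+", " ", text).strip()


print(clean_text("  Hello!!  How ARE you?\n"))
```

Which cleaning steps are appropriate depends on the task; punctuation, for example, can carry signal in sentiment analysis.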
Tokenization –
Tokenization is the process of splitting raw text into smaller
units called tokens, such as sentences, words, or subwords.
It should not be confused with data-security tokenization, in
which sensitive values such as PANs are replaced by surrogate
tokens; in NLP, tokens are simply the pieces of text that every
later step works on.
•Sentence tokenization splits a text into sentences.
•Word tokenization splits a sentence into words and punctuation
marks.
•Tokenization is deterministic: tokenizing the same text with the
same rules yields the same tokens.
•Tokenized text can be searched by tokenizing the query terms
and searching for those.
Demo
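In the NLP sense, tokenization means splitting text into sentences and words. A minimal sketch using only the standard library follows; the regular expressions are simplifying assumptions and will mishandle abbreviations such as "Dr.", which is why a real demo would more likely use a library tokenizer such as NLTK's.

```python
import re


def sentence_tokenize(text: str) -> list[str]:
    """Split after ., ! or ? when followed by whitespace (a naive rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def word_tokenize(sentence: str) -> list[str]:
    """Return runs of word characters and single punctuation marks as tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


text = "Hello. How are you? I am great; thanks for asking."
for sentence in sentence_tokenize(text):
    print(word_tokenize(sentence))
```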
Stemming –
Stemming is the process of reducing a word to its word
stem by stripping affixes (suffixes and prefixes). The stem
is not necessarily a valid dictionary word, unlike a lemma.
Advantages of Stemming
• Stemming is a useful "normalization" technique for words.
• Stemming is used in information retrieval systems like search engines.
• It is used to determine domain vocabularies in domain analysis.
• Stemming is fast because it only chops off word endings by rule.
Fun Fact -
• Google Search adopted word stemming in 2003.
Previously a search for “fish” would not have returned
“fishing” or “fishes”.
Demo
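To make the "chopping" idea concrete, here is a toy suffix-stripping stemmer. It is a deliberately naive sketch: the suffix list and the minimum stem length are assumptions chosen for illustration, and established algorithms such as the Porter stemmer (available in NLTK) apply many more rules and can give different results.

```python
# Suffixes are checked in this order; the first match is removed.
SUFFIXES = ("ing", "es", "ed", "s")


def naive_stem(word: str) -> str:
    """Strip one common suffix, keeping a stem of at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for w in ["fish", "fishing", "fishes", "caring", "studies"]:
    print(w, "->", naive_stem(w))
```

The three "fish" forms collapse to one stem, while "caring" and "studies" show how blind chopping can produce stems that are wrong words or not words at all.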
Lemmatization –
Lemmatization is a text normalization technique used
in Natural Language Processing (NLP). Essentially,
lemmatization reduces any inflected form of a word to
its base dictionary form, called the lemma.
Difference
Stemming removes the last few characters from a word
by rule, often leading to incorrect meanings and
spellings.
Lemmatization considers the context and converts the
word to its meaningful base form, which is called the lemma.
Stemming vs Lemmatization
Stemming
• Stemming removes the last few characters
from a word by rule, often leading to
incorrect meanings and spellings.
• For instance, a naive stemmer that strips
‘ing’ from ‘Caring’ would return ‘Car’.
• Stemming is used on large datasets
where performance is an issue.
• It is faster to process.
Lemmatization
• Lemmatization considers the
context and converts the word to
its meaningful base form, which is
called the lemma.
• For instance, lemmatizing the word
‘Caring’ would return ‘Care’.
• Lemmatization is computationally
expensive since it involves look-up
tables and morphological analysis.
• It is slower.
Demo
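The look-up idea behind lemmatization can be sketched with a tiny dictionary keyed by word and part of speech. The table entries and the function name lemmatize are hypothetical and hand-written for this example; real lemmatizers, such as NLTK's WordNet-based one, rely on far larger lexical resources.

```python
# Tiny hand-written lemma table keyed by (word, part of speech).
LEMMA_TABLE = {
    ("caring", "verb"): "care",
    ("better", "adj"): "good",
    ("mice", "noun"): "mouse",
    ("was", "verb"): "be",
}


def lemmatize(word: str, pos: str) -> str:
    """Look up the lemma; fall back to the lowercased word if unknown."""
    key = (word.lower(), pos)
    return LEMMA_TABLE.get(key, word.lower())


print(lemmatize("Caring", "verb"))
print(lemmatize("better", "adj"))
print(lemmatize("table", "noun"))
```

Passing the part of speech is what lets a lemmatizer use context: the same surface form can map to different lemmas depending on its role in the sentence.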
Stop Words–
Stop words are a set of commonly used words in a language.
Examples of stop words in English are “a”, “the”, “is”, and “are”.
In Text Mining and Natural Language Processing (NLP), stop words
are commonly removed because they occur so often that they
carry very little useful information.
Stop Words Example
Sample Text with Stop Words → Sample Text without Stop Words
Aarush Coaching Classes – A stem learning place for kids → Aarush Coaching Classes, Stem, Learning, Place, kids
Can Listening be exhausting? → Listening, Exhausting
I like Teaching, so I teach → Like, Teaching, Teach
Demo
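Stop word removal amounts to filtering tokens against a fixed word list. The sketch below uses a small hand-picked list, which is an assumption for illustration; libraries such as NLTK ship much longer lists per language.

```python
import re

# A deliberately small stop word list chosen for this example.
STOP_WORDS = {"a", "an", "the", "is", "are", "be", "can", "for", "i", "so"}


def remove_stop_words(text: str) -> list[str]:
    """Tokenize on letters only and drop tokens found in STOP_WORDS."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return [t for t in tokens if t.lower() not in STOP_WORDS]


print(remove_stop_words("Can Listening be exhausting?"))
print(remove_stop_words("I like Teaching, so I teach"))
```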
Modelling Techniques in NLP
Bag of Words
TF-IDF
Word Embeddings
Sentiment Analysis
Bag of Words
A bag-of-words is a representation of text that
describes the occurrence of words within a
document. It involves two things: a vocabulary of
known words, and a measure of the presence of
those known words.
The Bag-of-words model is an
orderless document representation —
only the counts of words matter. For
instance, in the above example "John
likes to watch movies. Mary likes
movies too", the bag-of-words
representation will not reveal that the
verb "likes" always follows a person's
name in this text.
Bag of Words - Example
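The John and Mary sentences can be turned into a bag of words with a simple counter. This is a minimal sketch; the tokenizing regular expression and the alphabetical vocabulary order are assumptions made for the example.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count how often each lowercased word occurs, ignoring word order."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


doc = "John likes to watch movies. Mary likes movies too"
bow = bag_of_words(doc)

# A fixed vocabulary order turns the counts into a numeric vector.
vocabulary = sorted(bow)
vector = [bow[word] for word in vocabulary]

print(vocabulary)
print(vector)
```

Reordering the sentences produces exactly the same counts, which is what "orderless" means in practice.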
TF-IDF
TF-IDF, short for term frequency–inverse
document frequency, is a numerical statistic
intended to reflect how important a word is to
a document in a collection or corpus.
TF-IDF Explanation
• TF-IDF is the product of two values, TF and IDF.
• TF is the frequency of a term divided by the total number of
terms in the document.
• IDF is obtained by dividing the total number of
documents by the number of documents containing the
term and then taking the logarithm of that quotient.
Formula
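Written out from the TF and IDF definitions in the explanation, the formula can be stated as follows. The symbols are naming assumptions for this sketch: f_{t,d} is the count of term t in document d, N is the total number of documents, and n_t is the number of documents containing t.

```latex
\mathrm{tf}(t, d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}, \qquad
\mathrm{idf}(t) = \log\frac{N}{n_t}, \qquad
\text{tf-idf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t)
```

Library implementations often add smoothing terms to the IDF, so their numbers may differ slightly from this plain form.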
Steps
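One way to break the computation into steps, following the TF and IDF definitions in the explanation, is sketched below. The two-document corpus is invented for illustration, and returning 0.0 for a term that appears in no document is an assumption of this sketch rather than part of the definition.

```python
import math
from collections import Counter


def tf(term, doc_tokens):
    """Step 1: term count divided by the number of terms in the document."""
    return Counter(doc_tokens)[term] / len(doc_tokens)


def idf(term, corpus):
    """Step 2: log of (number of documents / documents containing the term)."""
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        return 0.0  # assumption: unseen terms get zero weight
    return math.log(len(corpus) / containing)


def tf_idf(term, doc_tokens, corpus):
    """Step 3: multiply the two values."""
    return tf(term, doc_tokens) * idf(term, corpus)


# Hypothetical, already tokenized corpus.
corpus = [
    ["john", "likes", "movies"],
    ["mary", "likes", "books"],
]

print(tf_idf("likes", corpus[0], corpus))
print(tf_idf("movies", corpus[0], corpus))
```

A word that appears in every document, like "likes" here, gets an IDF of zero, so it contributes nothing no matter how frequent it is.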
That's it 😃! The text is now ready to feed into a machine learning
algorithm.
Word Embeddings
A word embedding is a learned representation for text
where words that have the same meaning have a similar
representation.
Word Embeddings Types
• Word2vec
• GloVe
• fastText
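"Similar representation" is usually measured with cosine similarity between word vectors. The sketch below uses made-up three-dimensional vectors purely for illustration; real embeddings from Word2vec, GloVe, or fastText are learned from data and typically have hundreds of dimensions.

```python
import math

# Hypothetical 3-dimensional vectors, invented for illustration only.
EMBEDDINGS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["queen"]))
print(cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["apple"]))
```

With these toy values, "king" scores closer to "queen" than to "apple", mirroring how related words end up near each other in a learned embedding space.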
Sentiment Analysis
Sentiment analysis, also referred to as opinion mining, is an approach to
natural language processing (NLP) that identifies the emotional tone
behind a body of text.
“I really like the new design of your website!” → Positive
“The new design is awful!” → Negative
Machine Learning Algorithm
Reference: https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
Sentiment Analysis - Demo
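As a much simpler stand-in for a trained machine learning model, the sketch below scores text against two small word lists. This lexicon approach, the word lists, and the function name sentiment are assumptions for illustration only; it is not the model-based method a demo on the IMDB reviews dataset would use.

```python
import re

# Tiny hand-picked sentiment lexicons, for illustration only.
POSITIVE = {"like", "love", "great", "good"}
NEGATIVE = {"awful", "bad", "terrible", "hate"}


def sentiment(text: str) -> str:
    """Label text by counting positive words minus negative words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


print(sentiment("I really like the new design of your website!"))
print(sentiment("The new design is awful!"))
```

A word-counting rule like this cannot handle negation ("not good") or sarcasm, which is one reason trained classifiers are preferred on real review data.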
Components of NLP
Difference
Phases of NLP
Advantages of NLP
• Less costly than employing human staff
• Provides quicker customer service response times
• Easy to implement
Adieu in NLP Style
Connect with me:
https://github.com/lipika-tech
https://www.youtube.com/c/aarushcoachingclasses
