NLP
Definition
• Natural Language Processing, or NLP, is the sub-field of AI that is
focused on enabling computers to understand and process human
languages.
• NLP is a subfield of Linguistics, Computer Science, Information
Engineering, and Artificial Intelligence concerned with the
interactions between computers and human (natural) languages, in
particular with how to program computers to process and analyse large
amounts of natural language data.
• NLP is used to analyse text, allowing machines to understand how
humans speak. This human-computer interaction enables real-world
applications like automatic text summarization, sentiment
analysis, topic extraction, named entity recognition, parts-of-speech
tagging, relationship extraction, stemming, and more.
• NLP is commonly used for text mining, machine translation,
and automated question answering.
Applications of Natural Language Processing
• Automatic Summarization: Information overload is a real problem
when we need to access a specific, important piece of information
from a huge knowledge base.
• Automatic summarization is relevant not only for summarizing the
meaning of documents and information, but also to understand the
emotional meanings within the information, such as in collecting data
from social media.
• Automatic summarization is especially relevant when used to provide
an overview of a news item or blog post, while avoiding redundancy
from multiple sources and maximizing the diversity of content
obtained.
Sentiment Analysis
The goal of sentiment analysis is to identify sentiment among several
posts or even in the same post where emotion is not always explicitly
expressed. Companies use Natural Language Processing applications,
such as sentiment analysis, to identify opinions and sentiment online to
help them understand what customers think about their products and
services (e.g., “I love the new iPhone” and, a few lines later, “But
sometimes it doesn’t work well”, where the person is still talking about
the iPhone) and overall indicators of their reputation.
Beyond determining simple polarity, sentiment analysis understands
sentiment in context to help better understand what’s behind an
expressed opinion, which can be extremely relevant in understanding
and driving purchasing decisions.
Text classification
• Text classification makes it possible to assign predefined categories to
a document and organize it to help you find the information you need
or simplify some activities. For example, an application of text
categorization is spam filtering in email.
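As a rough illustration of how such a spam filter could be built, here is a minimal sketch in Python. The library choice (scikit-learn) and the tiny example messages are assumptions for illustration, not part of the original material.

```python
# A toy spam filter: bag-of-words features + Naive Bayes.
# scikit-learn and the made-up dataset are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = ["win a free prize now", "meeting at noon tomorrow",
            "free offer claim your prize", "lunch with the team today"]
labels = ["spam", "ham", "spam", "ham"]

vectorizer = CountVectorizer()
features = vectorizer.fit_transform(messages)   # word-count features

model = MultinomialNB()
model.fit(features, labels)

# classify a new message; with this toy data it is likely labelled spam
print(model.predict(vectorizer.transform(["claim a free prize"])))
```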
Virtual Assistants:
• Nowadays, Google Assistant, Cortana, Siri, Alexa, etc. have become an
integral part of our lives. Not only can we talk to them, but they can
also make our lives easier.
• By accessing our data, they can help us keep notes of our tasks,
make calls for us, send messages, and a lot more.
• With the help of speech recognition, these assistants can not only
detect our speech but can also make sense of it. According to
recent research, many more advancements are expected in this field
in the near future.
Project Cycle: Cognitive Behavioural Therapy
(CBT)
The Scenario
• The world is competitive nowadays. People face competition in even the
tiniest tasks and are expected to give their best at every point in time.
When people are unable to meet these expectations, they get stressed and
could even go into depression.
• We hear of many cases where people become depressed due to reasons
like peer pressure, studies, family issues, relationships, etc., and
eventually fall into habits that are bad for them as well as for others.
• To address this, cognitive behavioural therapy (CBT) is considered
one of the best methods for managing stress, as it is easy to
administer and also gives good results.
• This therapy includes understanding the behaviour and mindset of a
person in their normal life. With the help of CBT, therapists help people
overcome their stress and live a happy life.
Problem Scoping
• CBT is a technique used by most therapists to help patients manage
stress and depression. However, it has been observed that people do
not willingly seek the help of a psychiatrist and try to avoid such
interactions as much as possible. Thus, there is a need to bridge the
gap between a person who needs help and the psychiatrist. Let us
look at the various factors around this problem through the 4Ws problem
canvas.
[Figures: the 4Ws problem canvas (Who, What, Where, Why) applied to this problem]
Data Acquisition
• To understand the sentiments of people, we need to collect their
conversational data so the machine can interpret the words that they
use and understand their meaning.
• Such data can be collected through various means:
1. Surveys
2. Onsite observations
3. Databases available on the internet
4. Interviews, etc.
Data Exploration
• Once the textual data has been collected, it needs to be processed
and cleaned so that a simpler version can be sent to the machine.
Thus, the text is normalised through various steps and reduced to a
minimal vocabulary, since the machine does not require grammatically
correct statements but only the essence of the text.
Text Normalisation
• In Text Normalisation, we go through several steps to normalise the
text to a lower level. Before we begin, we need to understand that in
this section we will be working on a collection of written text, that
is, text from multiple documents. The whole textual data from all the
documents taken together is known as the corpus. Not only will we go
through all the steps of Text Normalisation, we will also work them
out on a corpus.
We first need to simplify the text so that it becomes possible to
understand. Text Normalisation helps in cleaning up the textual data
in such a way that its complexity comes down to a level lower than
that of the actual data.
SENTENCE SEGMENTATION
• Under sentence segmentation, the whole corpus is divided into
sentences. Each sentence is treated as a separate piece of data, so
the whole corpus gets reduced to sentences.
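To make this concrete, here is a minimal sketch of sentence segmentation in Python. Using the NLTK library is an assumption for illustration; any sentence tokenizer would do.

```python
# A minimal sentence-segmentation sketch using NLTK (an assumed library choice).
import nltk
nltk.download("punkt", quiet=True)      # tokenizer models (downloaded once)
nltk.download("punkt_tab", quiet=True)  # needed by newer NLTK versions

from nltk.tokenize import sent_tokenize

corpus = ("You want to see the dreams with closed eyes and achieve them? "
          "They'll remain dreams, look for AIMs and your eyes have to stay "
          "open for a change to be seen.")

for number, sentence in enumerate(sent_tokenize(corpus), start=1):
    print(number, sentence)
# 1 You want to see the dreams with closed eyes and achieve them?
# 2 They'll remain dreams, look for AIMs and your eyes have to stay open ...
```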
Tokenisation
• After segmenting the sentences, each sentence is further divided into
tokens. A token is any word, number, or special character occurring in a
sentence. Under tokenisation, every word, number, and special character
is considered separately, and each of them becomes a separate token.
• In this step, the tokens which are not necessary are removed from the
token list. What are the possible words we might not require?
• Stopwords are words which occur very frequently in the corpus but do
not add any value to it. Humans use grammar to make their sentences
meaningful for the other person to understand, but grammatical words
do not add any essence to the information which is to be transmitted
through the statement; hence, they come under stopwords. Some examples
of stopwords are: a, an, and, the, is, to, of.
These words occur the most in any given corpus but say very little or
nothing about its context or meaning. Hence, to make it easier for the
computer to focus on meaningful terms, these words are removed.
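A minimal sketch of tokenisation followed by stopword removal, again assuming NLTK. Note that NLTK's built-in stopword list is larger than the small list used in the worked example below, so a few extra words get removed.

```python
# Tokenise a sentence, then drop stopwords (NLTK is an assumed choice).
import nltk
nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

sentence = "You want to see the dreams with closed eyes and achieve them?"
tokens = word_tokenize(sentence)   # every word and symbol becomes a token
print(tokens)
# ['You', 'want', 'to', 'see', 'the', 'dreams', 'with', 'closed', 'eyes',
#  'and', 'achieve', 'them', '?']

stop_words = set(stopwords.words("english"))
filtered = [t for t in tokens if t.lower() not in stop_words and t.isalpha()]
print(filtered)
# ['want', 'see', 'dreams', 'closed', 'eyes', 'achieve']
# NLTK also treats 'you', 'with' and 'them' as stopwords, so its output is
# slightly smaller than in the worked example that follows.
```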
Sentence Segmentation
“You want to see the dreams with closed eyes and
achieve them? They’ll remain dreams, look for
AIMs and your eyes have to stay open for a
change to be seen.”
After Sentence Segmentation
1. You want to see the dreams with closed eyes and achieve them?
2. They’ll remain dreams, look for AIMs and your eyes have to stay open for a change to be seen.
Tokenisation of sentence 1:
You want to see the dreams with closed eyes and achieve them?
Tokens: You | want | to | see | the | dreams | with | closed | eyes | and | achieve | them | ?
Removing stopwords from sentence 1:
You want to see the dreams with closed eyes and achieve them?
The removed words would be: to, the, and, ?
The outcome would be:
You want see dreams with closed eyes achieve them
Tokens after stopword removal:
You | want | see | dreams | with | closed | eyes | achieve | them
STEMMING
• In this step, the remaining words are reduced to their root words. In
other words, stemming is the process in which the affixes of words
are removed and the words are converted to their base form.
• Stemming algorithms work by cutting off the end or the beginning of
the word, taking into account a list of common prefixes and suffixes
that can be found in an inflected word.
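A minimal stemming sketch using NLTK's PorterStemmer (one common stemming algorithm; the original does not name a specific one):

```python
# Reduce words to their stems with the Porter algorithm (an assumed choice).
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["dreams", "closed", "eyes", "studies", "happily"]:
    print(word, "->", stemmer.stem(word))
# dreams  -> dream
# closed  -> close
# eyes    -> eye
# studies -> studi    (stems are not always dictionary words)
# happily -> happili
```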
Before stemming:
you | want | see | dreams | with | closed | eyes | achieve | them
After stemming:
you | want | see | dream | with | close | eye | achieve | them
Lemmatization is an organised, step-by-step procedure for obtaining
the root form of a word. It makes use of morphological analysis (word
structure and grammatical relations). Alongside this, it requires
detailed dictionaries which the algorithm can look through to form the
lemma.
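The difference shows up clearly in code. A minimal sketch contrasting the two, assuming NLTK's WordNetLemmatizer (with WordNet acting as the detailed dictionary mentioned above):

```python
# Stemming chops affixes; lemmatisation looks the word up in a dictionary.
import nltk
nltk.download("wordnet", quiet=True)  # the dictionary the lemmatizer searches

from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

print(stemmer.stem("studies"))                   # studi (not a real word)
print(lemmatizer.lemmatize("studies", pos="n"))  # study (a proper lemma)
print(stemmer.stem("mice"))                      # mice  (irregular, unchanged)
print(lemmatizer.lemmatize("mice", pos="n"))     # mouse (found in the dictionary)
```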
With this we have normalised our text to tokens which are the simplest
form of words present in the corpus.
Normalisation of the given text:
Sentence Segmentation:
Original text: Raj and Vijay are best friends. They play together with
other friends. Raj likes to play football but Vijay prefers to play
online games. Raj wants to become a footballer. Vijay wants to become
an online gamer.
Normalisation of the given text:
Sentence Segmentation:
1. Raj and Vijay are best friends.
2. They play together with other friends.
3. Raj likes to play football but Vijay prefers to play online games.
4. Raj wants to become a footballer.
5. Vijay wants to become an online gamer.
Removing stopwords and special characters:
In this step, the tokens which are not necessary are removed from the token list.
So, the words and, are, to, a, an, and the punctuation marks will be removed.
Converting text to a common case:
After stopword removal, we convert the whole text into a uniform case, preferably
lower case.
1. raj vijay best friends
2. they play together with other friends
3. raj likes play football but vijay prefers play online games
4. raj wants become footballer
5. vijay wants become online gamer
Stemming:
In this step, the remaining words are reduced to their root form; for
example, friends stems to friend (the affix -s is removed).
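The whole pipeline can be strung together in a few lines. The sketch below assumes NLTK; because NLTK's stopword list differs from the small list used above, its exact output will differ slightly from the worked example.

```python
# Full normalisation of the Raj and Vijay text: segmentation, tokenisation,
# lower-casing, stopword removal, and stemming (NLTK is an assumed choice).
import nltk
nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

text = ("Raj and Vijay are best friends. They play together with other "
        "friends. Raj likes to play football but Vijay prefers to play "
        "online games. Raj wants to become a footballer. Vijay wants to "
        "become an online gamer.")

stop_words = set(stopwords.words("english"))
stemmer = PorterStemmer()

for sentence in sent_tokenize(text):                     # 1. segmentation
    tokens = word_tokenize(sentence.lower())             # 2. tokenise + lower case
    tokens = [t for t in tokens if t.isalpha()]          # 3. drop punctuation
    tokens = [t for t in tokens if t not in stop_words]  # 4. drop stopwords
    tokens = [stemmer.stem(t) for t in tokens]           # 5. stem
    print(tokens)
```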
1. With this, we have normalised our text to tokens, which are the simplest form of
words present in the corpus. Now it is time to convert the tokens into numbers. For
this, we use the Bag of Words algorithm.
2. Calling this algorithm a “bag” of words signifies that the sequence of sentences or
tokens does not matter here; all we need are the unique words and their frequency.
3. Bag of Words is a Natural Language Processing model which helps in extracting
features out of text which can be used in machine learning algorithms. In bag of
words, we get the occurrences of each word and the vocabulary for the corpus.
[Figure: a normalised corpus (left) and the unique words with their frequencies returned by the bag of words algorithm (right)]
• Let us assume that the text on the left in the figure is the normalised corpus
obtained after going through all the steps of text processing. Now, as we put
this text into the bag of words algorithm, the algorithm returns the unique
words in the corpus and their occurrences in it.
• On the right, it shows a list of words appearing in the corpus, and the
numbers next to them show how many times each word has occurred in the
text body.
• Thus, we can say that the bag of words gives us two things:
1. A vocabulary of words for the corpus
2. The frequency of these words (the number of times each has occurred in the
whole corpus)
The step-by-step approach to implement the bag of words algorithm:
1. Text Normalisation: collect data and pre-process it.
2. Create Dictionary: make a list of all the unique words occurring in the corpus (the vocabulary).
3. Create document vectors: for each document in the corpus, count how many times each
word from the unique list has occurred.
4. Create document vectors for all the documents.
Step 2: Create Dictionary. Go through all the documents and create a dictionary, i.e.,
list down all the words which occur in the three documents.
Dictionary: the list of unique words from all three documents.
Note that even though some words are repeated in different documents, they are all
written just once: while creating the dictionary, we create a list of unique words.
Step 3: Create document vector
In this step, the vocabulary is written in the top row.
Now, for each word in the document, if it matches with the vocabulary, put a 1 under it.
If the same word appears again, increment the previous value by 1.
And if the word does not occur in that document, put a 0 under it.
Since the first document contains the words aman, and, anil, are, stressed, all
these words get a value of 1 and the rest of the words get a value of 0.
Step 4: Repeat for all documents (the same exercise has to be done for all the documents).
Hence, the table becomes:
In this table, the header row contains the vocabulary of the corpus and three rows correspond
to three different documents. Take a look at this table and analyse the positioning of 0s and 1s in
it. Finally, this gives us the document vector table for our corpus.
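Following the four steps above, the whole algorithm fits in a few lines of plain Python. Document 1 below is the one named in the text; documents 2 and 3 are hypothetical stand-ins for the ones shown on the original slides.

```python
# Bag of Words by hand: build the vocabulary, then one vector per document.
documents = [
    "aman and anil are stressed",   # document 1, as given in the text
    "aman went to a therapist",     # hypothetical document 2
    "another therapist and aman",   # hypothetical document 3
]

# Step 2: create the dictionary of unique words (the vocabulary)
vocabulary = []
for doc in documents:
    for word in doc.split():
        if word not in vocabulary:
            vocabulary.append(word)
print(vocabulary)

# Steps 3 and 4: create a document vector for every document
for doc in documents:
    words = doc.split()
    print([words.count(term) for term in vocabulary])
# first row: [1, 1, 1, 1, 1, 0, 0, 0, 0, 0] -> aman, and, anil, are, stressed
```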
•"Jumped" stems to "jump"
•"Running" stems to "run"
•"Swimming" stems to "swim"
2. Stemming of Nouns:
• "Cats" stems to "cat"
• "Houses" stems to "house"
• "Apples" stems to "appl"
3. Stemming of Adjectives:
• "Faster" stems to "fast"
• "Brightest" stems to "bright"
• "Happier" stems to "happi"
4. Stemming of Adverbs:
• "Quickly" stems to "quick"
• "Badly" stems to "bad"
5. Stemming of Suffixes:
• "Unhappiness" stems to "unhappi"
• "Friendly" stems to "friend"
Stemming of words ending with y:
• "Study" stems to "studi"
• "Happy" stems to "happi"
• "Copy" stems to "copi"
• "Journey" stems to "journei"
6. Stemming of Irregular Words (note that stemming may not handle irregular words well):
• "Mice" stems to "mice" (ideally, it should be "mouse")
• "Men" stems to "men" (ideally, it should be "man")
Here is a list of common English suffixes:
1. -s or -es: plural marker (e.g., cats, dogs)
2. -ed: past tense marker (e.g., walked, played)
3. -ing: present participle or gerund marker (e.g., running, swimming)
4. -er: comparative form (e.g., faster, smarter)
5. -est: superlative form (e.g., fastest, smartest)
6. -ly: adverb marker (e.g., quickly, happily)
7. -ful: full of, characterized by (e.g., beautiful, helpful, hopeful)
8. -less: without (e.g., fearless, powerless)
9. -ment: state or quality (e.g., government, excitement)
10. -tion or -sion: action or process (e.g., celebration, decision)
11. -able or -ible: capable of, fit for (e.g., comfortable, invisible)
12. -ity or -ty: state or quality (e.g., authenticity, responsibility)
13. -ize or -ise: forms a verb, convert into (e.g., organize, realize, vaporize)
14. -al: relating to, pertaining to (e.g., cultural, natural)
15. -ish: having the quality of (e.g., childish, selfish)
16. -ous: full of, characterized by (e.g., dangerous, famous)
17. -ance or -ence: state or quality (e.g., importance, existence)
18. -ology: the study of (e.g., biology, psychology)
In natural language processing (NLP), stemming is typically not applied to adverbs like "very",
because stemming is meant to reduce words to their base or root forms, and a word like "very"
has no inflectional variations. Adverbs such as "very" are not subjected to stemming because
they have no prefixes or suffixes that need to be removed to find a base form.