CSA4006: Natural Language Processing
Winter Semester 2024-25
SCOPE
VIT Bhopal University, Sehore
Slot:
Class No.:
Instructor Contacts
● Instructor:
Dr. Amit Kumar
AI Researcher- Generative AI and NLP
Assistant Professor
Office: C-526
amitkumar@[Link]
Module-1 and Some parts of Module-3
Natural Language Processing
❏ Natural Language Processing (NLP) is a subfield of artificial
intelligence (AI) focused on enabling machines to understand,
interpret, and generate human language.
❏ It involves both computational linguistics and machine learning
techniques to process and analyze large amounts of natural
language data.
Key components of NLP
❏ Text Preprocessing: This involves cleaning and preparing text data, which may
include tokenization (breaking down text into words or sentences), stemming
(stripping affixes to reduce words to a root form), and lemmatization (mapping
the inflected forms of a word to its dictionary form, or lemma).
Example: Break a sentence into individual words or tokens.
Task: Given the sentence, "Natural language processing is fun," break it down
into a list of words (tokens).
Output: ["Natural", "language", "processing", "is", "fun"]
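The tokenization task above can be sketched with Python's built-in re module. This is a simplification: real tokenizers (for example those in NLTK or spaCy) handle contractions, hyphens, and punctuation more carefully. The function name tokenize is chosen only for this illustration.

```python
import re

def tokenize(sentence):
    # \w+ grabs each run of letters, digits, or underscores; punctuation is dropped.
    return re.findall(r"\w+", sentence)

tokens = tokenize("Natural language processing is fun")
print(tokens)  # ['Natural', 'language', 'processing', 'is', 'fun']
```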
❏ Syntax and Parsing: NLP systems analyze the structure of sentences, including
parts of speech (e.g., noun, verb, adjective) and grammatical relationships.
Parsing helps understand sentence structure and how different parts of a
sentence relate to each other.
Example: Identify parts of speech in a sentence, such as nouns, verbs, adjectives, etc.
Task: For the sentence "The dog runs fast," identify the POS tags.
Output: [('The', 'DT'), ('dog', 'NN'), ('runs', 'VBZ'), ('fast', 'RB')]
● DT = determiner
● NN = noun, singular
● VBZ = verb, 3rd person singular present
● RB = adverb
❏ Named Entity Recognition (NER): NER involves identifying proper names, such
as names of people, organizations, locations, dates, and other entities within a
text.
Example: Identify named entities in a text, such as names of people, organizations,
locations, etc.
Task: Extract named entities from the sentence "Barack Obama was born in Hawaii."
Output:
● Person: Barack Obama
● Location: Hawaii
❏ Sentiment Analysis: This technique assesses the emotional tone behind a body
of text, classifying it as positive, negative, or neutral. It is often used in social
media monitoring, reviews, and customer feedback analysis.
● Example: Determine the sentiment (positive, negative, or neutral) of a given text.
● Task: Given the sentence, "I love this phone! It's amazing," classify the sentiment.
● Output: Positive
Why it's useful: Sentiment analysis is widely used for customer reviews, social media
monitoring, and understanding public opinion.
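As a rough illustration of the sentiment task, the sketch below counts words from two tiny hand-made word lists. The lists and the function name classify_sentiment are assumptions made up for this example; practical systems learn from labelled data rather than relying on a fixed lexicon.

```python
import re

# Hypothetical mini-lexicons, chosen only for this illustration.
POSITIVE = {"love", "amazing", "great", "good"}
NEGATIVE = {"hate", "terrible", "bad", "awful"}

def classify_sentiment(text):
    words = re.findall(r"[a-z']+", text.lower())
    # Score = number of positive words minus number of negative words.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"

label = classify_sentiment("I love this phone! It's amazing")
print(label)  # Positive
```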
❏ Machine Translation: NLP models can automatically translate text from one
language to another, leveraging techniques such as statistical methods and
neural networks.
● Example: Translate a sentence from one language to another.
● Task: Translate the sentence "सभी को नमस्कार" (Hindi) to English.
● Output: Hello everyone
Why it's useful: Machine translation is a powerful tool for breaking down language barriers
in communication and is the foundation of tools like Google Translate.
❏ Question Answering (QA): NLP systems can interpret and respond to questions
posed in natural language, often by extracting answers from a corpus of
documents.
Example 1: Extractive Question Answering
In this type of QA, the answer is directly taken from the provided context.
Context:
"Albert Einstein was a theoretical physicist who developed the theory of relativity. He was born in Ulm, Germany,
in 1879 and is regarded as one of the most influential scientists of the 20th century."
Question:
"Where was Albert Einstein born?"
Answer:
"Ulm, Germany"
Explanation: The system identifies the location "Ulm, Germany" in the context and returns it as the
answer.
Example 2: Generative Question Answering
In generative QA, the answer is generated based on understanding the question and the context, rather than being directly
extracted.
Context:
"Leonardo da Vinci was a Renaissance artist, inventor, and polymath, known for masterpieces like the 'Mona Lisa' and 'The Last Supper.'
He also made contributions to various fields such as engineering, anatomy, and architecture."
Question:
"What was Leonardo da Vinci famous for?"
Answer:
"Leonardo da Vinci was famous for his works of art like the 'Mona Lisa' and 'The Last Supper.'"
Explanation: The system generates a response based on a broader understanding of the context, rather than simply extracting
a phrase from the text.
❏ Text Generation: This includes generating human-like text based on a given
prompt. Modern models like GPT (Generative Pretrained Transformer) use deep
learning to produce coherent and contextually appropriate responses.
Example: Generate new text based on a given prompt.
● Task: Given the prompt "Once upon a time," generate the next part of the story.
● Output: Once upon a time, there was a small village surrounded by beautiful
mountains.
Why it's useful: Text generation is used in applications like story writing, chatbots, and
creative writing tools.
❏ Speech Recognition: NLP is also used in speech-to-text systems, which convert
spoken language into written text. This technology underpins virtual assistants
like Siri, Google Assistant, and Alexa.
❏ Language Modeling: Language models predict the likelihood of a sequence of
words occurring in a given context, allowing for more accurate speech and text
generation.
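One simple way to make the language modeling idea concrete is a bigram model, which estimates the probability of a word given the previous word from counts. The toy corpus and the helper name bigram_prob below are assumptions for illustration; real language models are trained on far larger data and usually add smoothing.

```python
from collections import Counter

corpus = "i love ice cream and i love cake".split()  # hypothetical toy corpus

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))  # adjacent word pairs

def bigram_prob(prev, word):
    # Maximum-likelihood estimate: count(prev, word) / count(prev)
    return bigrams[(prev, word)] / unigrams[prev]

print(bigram_prob("i", "love"))    # 1.0
print(bigram_prob("love", "ice"))  # 0.5
```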
❏ Word Frequency Analysis:
● Example: Analyze the frequency of words in a document.
● Task: Count the frequency of words in the sentence "I love ice cream and I eat ice cream often."
● Output:
○ "I": 2
○ "love": 1
○ "ice": 2
○ "cream": 2
○ "and": 1
○ "eat": 1
○ "often": 1
Why it's useful: Word frequency analysis helps in identifying important terms and can be useful in
document clustering or topic modeling.
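The word counts above can be reproduced with collections.Counter from the standard library. This sketch keeps the original capitalization and strips punctuation with a simple regular expression, which is an assumption about how the slide counted the words.

```python
from collections import Counter
import re

sentence = "I love ice cream and I eat ice cream often."
words = re.findall(r"\w+", sentence)  # drop punctuation, keep case
freq = Counter(words)

for word, count in freq.items():
    print(word, count)
```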
❏ Word Embeddings
● Example: Represent words as vectors (numerical values) in high-dimensional
space.
● Task: Use a pre-trained word embedding (like Word2Vec or GloVe) to find
similar words to "king."
● Output: Words like queen, prince, monarch, etc., which have similar meanings.
Why it's useful: Word embeddings capture semantic meaning and relationships
between words, enabling applications like recommendation systems and question
answering.
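To show how "similar words" are found, the sketch below compares vectors with cosine similarity. The three-dimensional vectors are made up for illustration; real embeddings such as Word2Vec or GloVe have hundreds of dimensions learned from large corpora and would be loaded from a pre-trained file.

```python
import math

# Made-up 3-dimensional vectors, for illustration only.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    # Cosine of the angle between u and v: dot product over product of lengths.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word):
    # Return the other word whose vector is closest in direction.
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine(vectors[word], vectors[w]))

print(most_similar("king"))  # queen
```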
Tools and Libraries for Beginners
❏ NLTK (Natural Language Toolkit): A popular Python library for working with
human language data.
❏ spaCy: An NLP library that's efficient and easy to use for tasks like POS tagging,
NER, and dependency parsing.
❏ TextBlob: A simpler NLP library good for beginners, with built-in functions for
sentiment analysis, translation, and part-of-speech tagging.
❏ Hugging Face Transformers: A more advanced library with pre-trained models
for tasks like text generation and machine translation.
Regular Expressions
Regular expressions are used everywhere
❏ Part of every text processing task
❏ Not a general NLP solution (for that we use large NLP systems
we will see in later lectures)
❏ But very useful as part of those systems (e.g., for
pre-processing or text formatting)
❏ Necessary for data analysis of text data
❏ A widely used tool in industry and academics
Regular expressions
A formal language for specifying text strings
How can we search for mentions of these cute animals in text?
❏ woodchuck
❏ woodchucks
❏ Woodchuck
❏ Woodchucks
❏ Groundhog
❏ groundhogs
Regular Expressions: Disjunctions
Letters inside square brackets []
Ranges using the dash [A-Z]
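A short sketch of bracket disjunctions and ranges using Python's re module. The sample text is invented for illustration.

```python
import re

text = "Woodchucks and a woodchuck met; Room 7 is at level B2."

either_case = re.findall(r"[wW]oodchuck", text)  # w or W, then "oodchuck"
digits = re.findall(r"[0-9]", text)              # any single digit
capitals = re.findall(r"[A-Z]", text)            # any single uppercase letter

print(either_case)  # ['Woodchuck', 'woodchuck']
print(digits)       # ['7', '2']
print(capitals)     # ['W', 'R', 'B']
```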
Regular Expressions: Negation in Disjunction
❏ Caret as first character in [ ] negates the list
❏ Note: Caret means negation only when it's first in []
❏ Special characters (., *, +, ?) lose their special meaning inside []
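The three rules about [ ] can be checked directly in Python. The sample strings are invented for illustration.

```python
import re

# Caret first: match anything that is NOT an uppercase letter.
not_upper = re.findall(r"[^A-Z]", "Hi Bob")

# Caret not first: it is an ordinary character, so this matches 'e' or '^'.
e_or_caret = re.findall(r"[e^]", "here ^")

# Inside [], the characters . * + ? are literal.
literal_specials = re.findall(r"[.*+?]", "a.b*c+d?")

print(not_upper)         # ['i', ' ', 'o', 'b']
print(e_or_caret)        # ['e', 'e', '^']
print(literal_specials)  # ['.', '*', '+', '?']
```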
Regular Expressions: Convenient aliases
Regular Expressions: More Disjunction
❏ Groundhog is another name for woodchuck!
❏ The pipe symbol | for disjunction
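A sketch of the pipe, combined with brackets so that both capitalizations of both animal names are found. The sample sentence is invented for illustration.

```python
import re

text = "A groundhog is a woodchuck. Woodchucks dig; Groundhog Day is in February."

# Either spelling of groundhog OR either spelling of woodchuck.
pattern = r"[gG]roundhog|[wW]oodchuck"
animals = re.findall(pattern, text)

print(animals)  # ['groundhog', 'woodchuck', 'Woodchuck', 'Groundhog']
```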
Wildcards, optionality, repetition: . ? * +
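A brief sketch of the four operators named in the heading: ? makes the previous character optional, * allows zero or more repetitions, + requires one or more, and . matches any single character. The sample strings are invented for illustration.

```python
import re

optional = re.findall(r"woodchucks?", "woodchuck woodchucks")  # optional s
star = re.findall(r"oo*h!", "oh! ooh! oooh!")                  # one o, then zero or more o
plus = re.findall(r"o+h!", "oh! ooh! h!")                      # one or more o
dot = re.findall(r"beg.n", "begin begun beg3n")                # any character between g and n

print(optional)  # ['woodchuck', 'woodchucks']
print(star)      # ['oh!', 'ooh!', 'oooh!']
print(plus)      # ['oh!', 'ooh!']
print(dot)       # ['begin', 'begun', 'beg3n']
```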
Regular Expressions: Anchors ^ $
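A small sketch of the two anchors: ^ ties the match to the start of the string and $ to the end. The sample strings are invented for illustration.

```python
import re

starts_capital = re.findall(r"^[A-Z]", "Palo Alto")       # capital letter at the start only
starts_nonletter = re.findall(r"^[^A-Za-z]", "1 Hello")   # first ^ is an anchor, second negates
ends_period = re.findall(r"\.$", "The end.")              # literal period at the end
ends_any = re.findall(r".$", "The end?")                  # any character at the end

print(starts_capital)    # ['P']
print(starts_nonletter)  # ['1']
print(ends_period)       # ['.']
print(ends_any)          # ['?']
```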
A note about Python regular expressions
❏ Regex and Python both use backslash "\" for special characters.
You must type extra backslashes!
❏ "\\d+" to search for 1 or more digits
❏ "\n" in Python means the "newline" character, not a "backslash"
followed by an "n". Need "\\n" for two characters.
❏ Instead: use Python's raw string notation for regex:
❏ r"[tT]he"
❏ r"\d+" matches one or more digits
❏ instead of "\\d+"
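The difference between ordinary and raw strings can be checked by looking at string lengths. This is a small sketch with an invented sample string.

```python
import re

print(len("\n"))    # 1: a single newline character
print(len("\\n"))   # 2: a backslash followed by 'n'
print(len(r"\n"))   # 2: raw string, the same two characters

# The escaped and the raw spelling give the same pattern string.
same_pattern = ("\\d+" == r"\d+")
numbers = re.findall(r"\d+", "room 12, floor 3")

print(same_pattern)  # True
print(numbers)       # ['12', '3']
```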
The iterative process of writing regexes
Find me all instances of the word “the” in a text.
the
Misses capitalized examples
[tT]he
Incorrectly returns other or Theology
\W[tT]he\W
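The three attempts can be run against a sample text to see each kind of error. The text is invented for illustration. The last pattern, using the word-boundary sequence \b, is not from the slides; it is shown as one possible alternative because \W needs an actual character on each side and therefore misses "The" at the very start of the text.

```python
import re

text = "The other one saw the cat. Theology is not the point."

v1 = re.findall(r"the", text)          # misses "The", also hits "other"
v2 = re.findall(r"[tT]he", text)       # finds "The" but also "other" and "Theology"
v3 = re.findall(r"\W[tT]he\W", text)   # whole words only, but needs a character on both sides
v4 = re.findall(r"\b[tT]he\b", text)   # alternative: word boundaries instead of \W

print(v1)  # ['the', 'the', 'the']
print(v2)  # ['The', 'the', 'the', 'The', 'the']
print(v3)  # [' the ', ' the ']
print(v4)  # ['The', 'the', 'the']
```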
False positives and false negatives
The process we just went through was based on
fixing two kinds of errors:
1. False negatives: not matching things that we should have matched (The)
2. False positives: matching strings that we should not have matched
(there, then, other)
Characterizing work on NLP
In NLP we are always dealing with these kinds of
errors.
Reducing the error rate for an application often
involves two antagonistic efforts:
❏ Increasing coverage (or recall) (minimizing false
negatives).
❏ Increasing accuracy (or precision) (minimizing false
positives)
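Precision and recall can be computed from counts of true positives, false positives, and false negatives: precision = TP / (TP + FP) and recall = TP / (TP + FN). The counts below are hypothetical, chosen to resemble the search for "the" with two different patterns.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

# Hypothetical counts for a permissive pattern: 3 real hits,
# 2 spurious hits, nothing missed.
permissive = precision_recall(3, 2, 0)

# Hypothetical counts for a strict pattern: 2 real hits,
# no spurious hits, 1 real occurrence missed.
strict = precision_recall(2, 0, 1)

print(permissive)  # (0.6, 1.0)
print(strict)
```

The permissive pattern has perfect recall but lower precision, and the strict one the reverse, which is the trade-off described above.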
Regular expressions play a surprisingly large
role
Widely used in both academics and industry
1. Part of most text processing tasks, even for big
neural language model pipelines
◦ including text formatting and pre-processing
2. Very useful for data analysis of any text data
{m,n} – Braces
❏ Braces match the preceding regex repeated from m to n times,
both inclusive. Write them without a space: in Python,
a{2, 4} is treated as literal text, not as a repetition.
❏ For example:
❏ a{2,4} matches in the strings aaab, baaaac, and gaad,
but not in strings like abc or bc, because they contain
only one a or no a at all.
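The brace examples can be checked with re.findall. The last two lines show what happens in Python when a space is left inside the braces: the pattern then only matches the literal text.

```python
import re

results = {s: re.findall(r"a{2,4}", s) for s in ["aaab", "baaaac", "gaad", "abc", "bc"]}
for s, found in results.items():
    print(s, found)
# aaab ['aaa'], baaaac ['aaaa'], gaad ['aa'], abc [], bc []

# With a space, {2, 4} is not a quantifier; the pattern is plain text.
with_space = re.search(r"a{2, 4}", "aaab")
literal = re.search(r"a{2, 4}", "the text a{2, 4} itself")
print(with_space)        # None
print(literal.group())   # a{2, 4}
```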
(<regex>) – Group
❏ The group symbol is used to group sub-patterns.
❏ For example:
❏ (a|b)cd matches acd or bcd, so it finds a match in
strings like acd, abcd, gacd, etc.
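A short sketch of the group example. group() returns the whole match and group(1) returns what the parenthesized part captured; the string "cd" is added here as a case that does not match.

```python
import re

matches = {}
for s in ["acd", "abcd", "gacd", "cd"]:
    m = re.search(r"(a|b)cd", s)
    # Store (whole match, captured group) or None when nothing matches.
    matches[s] = (m.group(), m.group(1)) if m else None
    print(s, "->", matches[s])
# acd -> ('acd', 'a'), abcd -> ('bcd', 'b'), gacd -> ('acd', 'a'), cd -> None
```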
Special Sequences
❏ A special sequence is a backslash followed by a
character. Some, such as \d, \w, and \s, match a whole
class of characters; others, such as \A, \b, and \Z, do
not match a character at all but specify the location in
the search string where the match must occur.
❏ They make it easier to write commonly used
patterns.
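A few of Python's special sequences in action, as a sketch on an invented sample string. \A and \Z anchor to the start and end of the string, \b and \B test for a word boundary or its absence, and \d and \s match digits and whitespace.

```python
import re

s = "The year 2024 ends"

at_start = re.findall(r"\AThe", s)    # "The" only at the start of the string
word_start = re.findall(r"\bend", s)  # "end" at the beginning of a word
inside = re.findall(r"\Bnd", s)       # "nd" not at the beginning of a word
digits = re.findall(r"\d", s)         # single digits
spaces = re.findall(r"\s", s)         # whitespace characters
at_end = re.findall(r"ends\Z", s)     # "ends" only at the end of the string

print(at_start, word_start, inside, at_end)  # ['The'] ['end'] ['nd'] ['ends']
print(digits)                                # ['2', '0', '2', '4']
print(len(spaces))                           # 3
```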
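List of special sequences
❏ \A matches only at the start of the string
❏ \b matches at a word boundary (the start or end of a word)
❏ \B matches where there is no word boundary
❏ \d matches any decimal digit; \D matches any non-digit
❏ \s matches any whitespace character; \S matches any non-whitespace character
❏ \w matches any word character (letter, digit, or underscore); \W matches any non-word character
❏ \Z matches only at the end of the string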
RegEx Functions
❏ re module contains many functions that help us
to search a string for a match.
re.findall()
❏ Return all non-overlapping matches of pattern in string, as a list of strings.
❏ The string is scanned left-to-right, and matches are returned in the order found.
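import re

# Sample text containing two runs of digits
string = """Hello my Number is 123456789 and
my friend's number is 987654321"""
regex = r'\d+'  # raw string: one or more digits

match = re.findall(regex, string)
print(match)  # ['123456789', '987654321']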
re.compile()
❏ Regular expressions are compiled into pattern objects, which have
methods for various operations such as searching for pattern matches
or performing string substitutions.
import re
# [a-e] matches any single lowercase letter from a to e
p = re.compile('[a-e]')
print(p.findall("Aye, said Mr. Gibenson Stark"))  # ['e', 'a', 'd', 'b', 'e', 'a']
❏ The backslash '\' is a metacharacter with a very
important role: it introduces special sequences and
escapes other metacharacters.
❏ To match a literal backslash, without its special
meaning as a metacharacter, use '\\' (r'\\' as a Python
raw string).
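❏ The set [\s,.] matches any whitespace character, ',' or '.'.
import re
# \d matches a single digit
p = re.compile(r'\d')
print(p.findall("I went to him at 11 A.M. on 4th July 1886"))  # ['1', '1', '4', '1', '8', '8', '6']
# \d+ matches a run of one or more digits
p = re.compile(r'\d+')
print(p.findall("I went to him at 11 A.M. on 4th July 1886"))  # ['11', '4', '1886']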
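import re
# \w matches one word character (letter, digit, or underscore)
p = re.compile(r'\w')
print(p.findall("He said * in some_lang."))
# \w+ matches whole runs of word characters
p = re.compile(r'\w+')
print(p.findall("I went to him at 11 A.M., he "
                "said *** in some_language."))
# \W matches any single non-word character
p = re.compile(r'\W')
print(p.findall("he said *** in some_language."))
# ab* matches 'a' followed by zero or more 'b'
p = re.compile('ab*')
print(p.findall("ababbaabbb"))  # ['ab', 'abb', 'a', 'abbb']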
re.split()
❏ Splits the string at each occurrence of a character or
pattern and returns the pieces as a list; any text
remaining after the last match becomes the final
element of the list.
❏ re.split(pattern, string, maxsplit=0, flags=0)
❏ pattern is the regular expression to split on.
❏ string is the text in which the pattern is searched for and in
which the splitting occurs.
❏ maxsplit defaults to 0, which means no limit. If a nonzero value is
provided, at most that many splits occur. If maxsplit = 1,
the string is split only once, resulting in a list of length 2.
❏ flags is optional and can help to shorten code. For example, with
flags = re.IGNORECASE the split ignores the difference between
lowercase and uppercase.
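from re import split
# \W+ splits on runs of non-word characters (spaces, commas, apostrophes, colons)
print(split(r'\W+', 'Words, words , Words'))           # ['Words', 'words', 'Words']
print(split(r'\W+', "Word's words Words"))             # ['Word', 's', 'words', 'Words']
print(split(r'\W+', 'On 12th Jan 2016, at 11:02 AM'))  # ['On', '12th', 'Jan', '2016', 'at', '11', '02', 'AM']
# \d+ splits on runs of digits instead
print(split(r'\d+', 'On 12th Jan 2016, at 11:02 AM'))  # ['On ', 'th Jan ', ', at ', ':', ' AM']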
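import re
# maxsplit=1: split only at the first run of digits
print(re.split(r'\d+', 'On 12th Jan 2016, at 11:02 AM', maxsplit=1))
# Case-insensitive: letters a-f and A-F both act as separators
print(re.split('[a-f]+', 'Aey, Boy oh boy, come here',
               flags=re.IGNORECASE))
# Case-sensitive: only lowercase a-f act as separators
print(re.split('[a-f]+', 'Aey, Boy oh boy, come here'))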
re.sub()
❏ "sub" is short for substitute: the regular expression
pattern is searched for in the given string (3rd
parameter), and each match found is replaced by repl
(2nd parameter). count limits how many replacements are
made; the default 0 replaces every match.
❏ re.sub(pattern, repl, string, count=0, flags=0)
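import re
# Case-insensitive: both 'ub' and 'Ub' are replaced
print(re.sub('ub', '~*', 'Subject has Uber booked already',
             flags=re.IGNORECASE))  # S~*ject has ~*er booked already
# Case-sensitive: only lowercase 'ub' is replaced
print(re.sub('ub', '~*', 'Subject has Uber booked already'))  # S~*ject has Uber booked already
# count=1: only the first match is replaced
print(re.sub('ub', '~*', 'Subject has Uber booked already',
             count=1, flags=re.IGNORECASE))  # S~*ject has Uber booked already
# Replace the word 'and' (any case, surrounded by whitespace) with ' & '
print(re.sub(r'\sAND\s', ' & ', 'Baked Beans And Spam',
             flags=re.IGNORECASE))  # Baked Beans & Spam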
re.search()
❏ This method returns None if the pattern doesn't
match; otherwise it returns a re.Match object that
contains information about the matching part of the string.
❏ This method stops after the first match, so it is better
suited to testing a regular expression than to
extracting data.
❏ Example: Searching for an occurrence of the pattern
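import re
# Group 1 captures a word (the month), group 2 a number (the day)
regex = r"([a-zA-Z]+) (\d+)"
match = re.search(regex, "I was born on June 24")
if match is not None:
    # start() and end() give the span of the whole match (14 and 21 here)
    print("Match at index %s, %s" % (match.start(), match.end()))
    print("Full match: %s" % (match.group(0)))  # June 24
    print("Month: %s" % (match.group(1)))       # June
    print("Day: %s" % (match.group(2)))         # 24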
Match Object
❏ A Match object contains all the information about the
search and its result. If no match is found, None is
returned instead.
1. Getting the string and the regex
The match.re attribute returns the regular expression passed,
and the match.string attribute returns the string passed.
import re
s = "Welcome to Class"
res = re.search(r"\bC", s)  # 'C' at the start of a word
print(res.re)      # re.compile('\\bC')
print(res.string)  # Welcome to Class
Getting index of matched object
❏ start() method returns the starting index of the matched
substring
❏ end() method returns the index just past the last character of
the matched substring
❏ span() method returns a tuple containing the starting and the
ending index of the matched substring
import re
s = "Welcome to Class"
res = re.search(r"\bCla", s)  # 'Cla' at the start of a word
print(res.start())  # 11
print(res.end())    # 14
Getting matched substring
❏ group() method returns the part of the string that the pattern
matched.
import re
s = "Welcome to Class"
res = re.search(r"\D{2} t", s)  # two non-digits, a space, then 't'
print(res.group())  # me t
Alan Turing (1912-1954)
(A pioneer of automata theory)
■ Father of Modern Computer Science
■ English mathematician
■ Studied abstract machines called
Turing machines even before computers
existed
■ Heard of the Turing test?
Languages & Grammars
■ Languages: “A language is a
collection of sentences of finite
length all constructed from a
finite alphabet of symbols”
■ Grammars: “A grammar can be
regarded as a device that
enumerates the sentences of a
language” - nothing more,
nothing less
■ N. Chomsky, Information and
Control, Vol 2, 1959
Image source: Nowak et al. Nature, vol 417, 2002
Alphabet
An alphabet is a finite, non-empty set of
symbols
■ We use the symbol ∑ (sigma) to denote an
alphabet
■ Examples:
■ Binary: ∑ = {0,1}
■ All lower case letters: ∑ = {a,b,c,..z}
■ Alphanumeric: ∑ = {a-z, A-Z, 0-9}
■ DNA molecule letters: ∑ = {a,c,g,t}
Strings
A string or word is a finite sequence of symbols chosen
from ∑
■ Empty string is ε (or “epsilon”)
■ Length of a string w, denoted by “|w|”, is equal to the
number of (non- ε) characters in the string
■ E.g., x = 010100 |x| = 6
■ x = 01 ε 0 ε 1 ε 00 ε |x| = ?
■ xy = concatenation of two strings x and y
Powers of an alphabet
Let ∑ be an alphabet.
■ ∑^k = the set of all strings of length k over ∑
■ ∑* = ∑^0 ∪ ∑^1 ∪ ∑^2 ∪ …
■ ∑+ = ∑^1 ∪ ∑^2 ∪ ∑^3 ∪ …
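The powers of an alphabet can be enumerated for small k with itertools.product. This is a sketch for the binary alphabet; the helper name power is chosen only for this illustration. The zeroth power contains just the empty string ε, and the k-th power of a binary alphabet has 2^k strings.

```python
from itertools import product

def power(alphabet, k):
    # All strings of length k over the alphabet; for k = 0 this is [""] (just ε).
    return ["".join(p) for p in product(sorted(alphabet), repeat=k)]

sigma = {"0", "1"}
print(power(sigma, 0))       # ['']
print(power(sigma, 2))       # ['00', '01', '10', '11']
print(len(power(sigma, 3)))  # 8
```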
Languages
L is said to be a language over alphabet ∑ only if L ⊆ ∑*
🡺 this is because ∑* is the set of all strings (of every possible length, including 0) over
the given alphabet ∑
Examples:
1. Let L be the language of all strings consisting of n 0’s followed by n 1’s:
L = {ε, 01, 0011, 000111,…}
2. Let L be the language of all strings with an equal number of 0’s and 1’s:
L = {ε, 01, 10, 0011, 1100, 0101, 1010, 1001,…}
Definition: Ø denotes the Empty language
■ Let L = {ε}; Is L=Ø?
NO
The Membership Problem
Given a string w ∈ ∑* and a language L over ∑, decide
whether or not w ∈ L.
Example:
Let w = 100011
Q) Is w ∈ the language of strings with equal
number of 0s and 1s?
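For simple languages the membership problem can be decided by a short program. The sketch below gives one membership test for each of the two example languages defined earlier (equal numbers of 0s and 1s, and n 0s followed by n 1s); the function names are chosen only for this illustration.

```python
def equal_zeros_ones(w):
    # Member of the language of strings with as many 0s as 1s.
    return w.count("0") == w.count("1")

def zeros_then_ones(w):
    # Member of the language of n 0s followed by n 1s (n may be 0).
    n = len(w) // 2
    return w == "0" * n + "1" * n

w = "100011"
print(equal_zeros_ones(w))        # True: three 0s and three 1s
print(zeros_then_ones(w))         # False: the 0s do not all come first
print(zeros_then_ones("000111"))  # True
```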
Finite Automata : Examples
■ On/Off switch
■ Modeling recognition of the word “then”
Finite Automaton (FA)
■ Informally, a state diagram that comprehensively captures all
possible states and transitions that a machine can take while
responding to a stream or sequence of input symbols
■ Recognizer for “Regular Languages”
■ Deterministic Finite Automata (DFA)
■ The machine can exist in only one state at any given time
■ Non-deterministic Finite Automata (NFA)
■ The machine can exist in multiple states at the same time
Deterministic Finite Automata - Definition
■ A Deterministic Finite Automaton (DFA) consists of:
■ Q ==> a finite set of states
■ ∑ ==> a finite set of input symbols (alphabet)
■ q0 ==> a start state
■ F ==> set of accepting states
■ δ ==> a transition function, a mapping δ: Q × ∑ → Q
■ A DFA is defined by the 5-tuple:
■ (Q, ∑, q0, F, δ)
What does a DFA do on reading an input string?
■ Input: a word w in ∑*
■ Question: Is w acceptable by the DFA?
■ Steps:
■ Start at the “start state” q0
■ For every input symbol in the sequence w do
■ Compute the next state from the current state, given the
current input symbol in w and the transition function
■ If after all symbols in w are consumed, the current state is one
of the accepting states (F) then accept w;
■ Otherwise, reject w.
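The steps above can be sketched as a short loop over the input, with the transition function stored as a dictionary. The example DFA, which accepts binary strings ending in 1, and the names run_dfa and delta are assumptions made for this illustration.

```python
def run_dfa(delta, start, accepting, w):
    state = start                        # start at the start state q0
    for symbol in w:                     # consume one input symbol at a time
        state = delta[(state, symbol)]   # next state from current state and symbol
    return state in accepting            # accept iff we end in an accepting state

# Hypothetical example DFA: accepts binary strings that end in 1.
delta = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q0", ("q1", "1"): "q1",
}

print(run_dfa(delta, "q0", {"q1"}, "0101"))  # True
print(run_dfa(delta, "q0", {"q1"}, "0110"))  # False
print(run_dfa(delta, "q0", {"q1"}, ""))      # False
```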
Regular Languages
■ Let L(A) be a language recognized by a DFA A.
■ Then L(A) is called a “Regular Language”.
Example #1
■ Build a DFA for the following language:
■ L = {w | w is a binary string that contains 01 as a substring}
■ Steps for building a DFA to recognize L:
■ ∑ = {0,1}
■ Decide on the states: Q
■ Designate start state and final state(s)
■ δ: Decide on the transitions:
■ “Final” states == same as “accepting states”
■ Other states == same as “non-accepting states”
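One possible DFA for Example #1 uses three states. The state names and their meanings are a choice made for this sketch: q0 means no 0 has been seen yet, q1 means the last symbol read was 0 and 01 has not appeared yet, and q2 (the only accepting state) means 01 has been seen.

```python
# Transition table for a DFA over {0, 1} that accepts strings containing "01".
DELTA = {
    ("q0", "0"): "q1", ("q0", "1"): "q0",
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q2", ("q2", "1"): "q2",  # once 01 is seen, stay accepting
}

def contains_01(w):
    state = "q0"
    for symbol in w:
        state = DELTA[(state, symbol)]
    return state == "q2"

for w in ["", "0", "10", "1100", "0011", "111000"]:
    print(repr(w), contains_01(w))
```

Only "0011" is accepted in this run, which agrees with Python's own substring test "01" in w.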
Context-Free Languages (CFL)
❏ A language class larger than the class of regular languages
❏ Supports natural, recursive notation called “context-free grammar”
❏ Applications:
❏ Parse trees, compilers
❏ XML
An Example
❏ A palindrome is a word that reads the same forwards and backwards
❏ E.g., madam, redivider, malayalam, 010010010
❏ The language of palindromes is a CFL, because it supports recursive
substitution (in the form of a Context-Free Grammar (CFG))
❏ We can construct a “grammar” for binary palindromes like this:
❏ A ==> ε
❏ A ==> 0
❏ A ==> 1
❏ A ==> 0A0
❏ A ==> 1A1
How does the CFG for palindromes work?
❏ An input string belongs to the language (i.e., accepted) iff it can be
generated by the CFG
❏ Example: w=01110
❏ G can generate w as follows:
❏ A => 0A0
=> 01A10
=> 01110
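The derivation can be mirrored by a recursive function that applies the productions in reverse: strip a matching outer pair (A ==> 0A0 or A ==> 1A1) until one of the base cases (A ==> ε, 0, or 1) is reached. The function name derives is chosen only for this sketch.

```python
def derives(w):
    # Base productions: A ==> ε | 0 | 1
    if w in ("", "0", "1"):
        return True
    # Recursive productions: A ==> 0A0 | 1A1
    if len(w) >= 2 and w[0] == w[-1] and w[0] in "01":
        return derives(w[1:-1])
    return False

print(derives("01110"))  # True: A => 0A0 => 01A10 => 01110
print(derives("0110"))   # True
print(derives("011"))    # False
```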
Context-Free Grammar: Definition
A context-free grammar G=(V,T,P,S), where:
❏ V: set of variables or non-terminals
❏ T: set of terminals (= alphabet U {ε})
❏ P: set of productions, each of which is of the form
❏ V ==> 𝜶1 | 𝜶2 | …
❏ Where each 𝜶i is an arbitrary string of variables and terminals
❏ S ==> start variable
Example #2
❏ Language of balanced parentheses
❏ e.g., ()(((())))((()))….
❏ CFG?
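One grammar commonly given for balanced parentheses is S ==> (S)S | ε. It is offered here as a possible answer, not necessarily the one intended in the lecture. The sketch below is a small recursive-descent recognizer for that grammar; the names parse_S and balanced are chosen for illustration.

```python
def parse_S(s, i=0):
    # Try S ==> ( S ) S starting at position i.
    if i < len(s) and s[i] == "(":
        j = parse_S(s, i + 1)              # inner S
        if j is None or j >= len(s) or s[j] != ")":
            return None                    # missing closing parenthesis
        return parse_S(s, j + 1)           # trailing S
    # Otherwise use S ==> ε, which consumes nothing.
    return i

def balanced(s):
    # The string is in the language iff S derives all of it.
    return parse_S(s) == len(s)

print(balanced("()(((())))((()))"))  # True
print(balanced("(()"))               # False
print(balanced(")("))                # False
```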
Example #3
❏ A grammar for L = {0^m 1^n | m ≥ n}
❏ CFG?
CFG conventions
String membership
Simple Expressions…
Generalization of derivation
Context-Free Language
Ambiguity in NLP
❏ Ambiguity in Natural Language Processing (NLP) happens because
human language can have multiple meanings.
❏ Computers can struggle to work out exactly what we mean:
unlike humans, who can use intuition and background
knowledge to infer meaning, computers rely on precise algorithms
and statistical patterns.
Ambiguity
The sentence "The chicken is ready to eat" is ambiguous because it
can be interpreted in two different ways:
1. The chicken is cooked and ready to be eaten.
2. The chicken is hungry and ready to eat food.
This dual meaning arises from the structure of the sentence, which
does not clarify the subject's role (the eater or the one being eaten).
Resolving such ambiguities is essential for accurate NLP applications
like chatbots, translation, and sentiment analysis.
Types of Ambiguity in NLP
❏ The meaning of an ambiguous expression often depends on the
situation, prior knowledge, or surrounding words.
❏ For example: He is cool.
❏ This could mean he is calm under pressure or he is fashionable
depending on the context.
Lexical Ambiguity
❏ Lexical ambiguity occurs when a single word has multiple
meanings, making it unclear which meaning is intended in a
particular context. This is a common challenge in language.
❏ For example, the word "bat" can have two different meanings.
It could refer to a flying mammal, like the kind you might see
at night. Alternatively, "bat" could also refer to a piece of
sports equipment used in games like baseball or cricket.
❏ For computers, determining the correct meaning of such a word
requires looking at the surrounding context to decide which
interpretation makes sense.
Syntactic Ambiguity
❏ Syntactic ambiguity occurs when the structure or grammar of a
sentence allows for more than one interpretation. This happens
because the sentence can be understood in different ways
depending on how it is put together.
❏ For example, take the sentence, “The boy kicked the ball in his
jeans.” This sentence can be interpreted in two different ways:
one possibility is that the boy was wearing jeans and he kicked
the ball while he was wearing them. Another possibility is that
the ball was inside the boy’s jeans, and he kicked the ball out
of his jeans.
❏ A computer or NLP system must carefully analyze the structure to
figure out which interpretation is correct, based on the context.
Semantic Ambiguity
❏ Semantic ambiguity occurs when a sentence has more than one
possible meaning because of how the words are combined. This
type of ambiguity makes it unclear what the sentence is truly trying
to say.
❏ For example, take the sentence, “Visiting relatives can be
annoying.” This sentence could be understood in two different
ways. One meaning could be that relatives who are visiting
you are annoying, implying that the relatives themselves cause
annoyance. Another meaning could be that the act of visiting
relatives is what is annoying, suggesting that the experience of
going to see relatives is unpleasant.
❏ The confusion comes from how the words "visiting relatives" can
be interpreted: is it about the relatives who are visiting, or is it
about the action of visiting? In cases like this, semantic ambiguity
makes it hard to immediately understand the exact meaning of the
sentence, and the context is needed to clarify it.
Pragmatic Ambiguity
❏ Pragmatic ambiguity occurs when the meaning of a sentence
depends on the speaker’s intent, tone, or the situation in which
it is said. This type of ambiguity is common in everyday
conversations, and it can be tricky for computers to understand
because it often requires knowing the broader context.
❏ For example, consider the sentence, “Can you open the
window?” In one situation, it could be understood as a literal
question asking if the person is physically able to open the
window. However, in another context, it could be a polite
request, where the speaker is asking the listener to open the
window, even though they’re not directly giving an order.
❏ The meaning changes based on the tone of voice or social
context, which is something that is difficult for NLP systems to
capture without understanding the surrounding situation.
Referential Ambiguity
❏ Referential ambiguity occurs when a pronoun (like "he," "she," "it,"
or "they") or a phrase is unclear about what or who it is referring
to. This type of ambiguity happens when the sentence doesn’t
provide enough information to determine which person, object, or
idea the pronoun is referring to.
❏ For example, consider the sentence, “Alice told Jane that she
would win the prize.” In this case, it’s unclear whether the pronoun
"she" refers to Alice or Jane. Both could be possible interpretations,
and without further context, we can’t be sure. If the sentence was
about a competition, "she" could be referring to Alice, meaning Alice
is telling Jane that she would win the prize. However, it could also
mean that Alice is telling Jane that Jane would win the prize.
Ellipsis Ambiguity
❏ Ellipsis ambiguity happens when part of a sentence is left out,
making it unclear what the missing information is. This often
occurs in everyday conversation or writing when people try to be
brief and omit words that are understood from the context.
For example, consider the sentence, "John likes apples, and
Mary does too." The word "does" is a shortened form of "likes
apples," but it’s not explicitly stated. This creates two possible
interpretations:
❏ Mary likes apples just like John, meaning both John and Mary enjoy
apples.
❏ Mary likes something else (not apples), and the sentence is leaving
out the specific thing she likes.
The ambiguity arises because it's unclear from the sentence whether
"does" refers to liking apples or something else.
Addressing Ambiguity in NLP
To address ambiguity in NLP, several methods are used to accurately
interpret language.
❏ Contextual analysis is one of the key approaches, where
surrounding words and context help determine the correct
meaning of a word or phrase.
❏ Word sense disambiguation (WSD) resolves lexical ambiguity by
using context to identify which meaning of a word is being used.
❏ Parsing and syntactic analysis help resolve syntactic ambiguity
by breaking down sentence structures to understand different
grammatical interpretations.
❏ Coreference resolution is used to clarify what pronouns or
phrases refer to, solving referential ambiguity.
❏ Discourse and pragmatic modeling help capture speaker intent
and the social context, which resolves pragmatic ambiguity.
❏ Machine learning and deep learning models, such as BERT and
GPT, learn language patterns from large datasets, which helps in
resolving ambiguity.
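As a toy illustration of word sense disambiguation, the sketch below picks the sense of "bat" whose description shares the most words with the sentence, in the spirit of the simplified Lesk approach. The two sense descriptions and the function name disambiguate are assumptions made up for this example; real systems use dictionaries such as WordNet or learned models.

```python
# Hypothetical sense descriptions for the word "bat".
SENSES = {
    "animal": "flying mammal that is active at night",
    "sports": "equipment used to hit the ball in baseball or cricket",
}

def disambiguate(sentence):
    context = set(sentence.lower().split())

    def overlap(sense):
        # Number of words shared by the sentence and the sense description.
        return len(context & set(SENSES[sense].split()))

    return max(SENSES, key=overlap)

print(disambiguate("He swung the bat and hit the ball"))    # sports
print(disambiguate("A bat flew out of the cave at night"))  # animal
```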