NLP Course Overview and Key Concepts

The document outlines the course CSA4006: Natural Language Processing at VIT Bhopal University, detailing key components of NLP such as text preprocessing, syntax and parsing, named entity recognition, sentiment analysis, machine translation, and more. It also introduces tools and libraries for beginners in NLP, including NLTK, spaCy, and Hugging Face Transformers, and discusses the role of regular expressions in text processing. The document serves as a comprehensive guide for understanding the fundamentals and applications of NLP.

Uploaded by

Shiv Mehrotra

CSA4006: Natural Language Processing
Winter Semester 2024-25
SCOPE
VIT Bhopal University, Sehore
Slot:
Class No.:
Instructor Contacts
● Instructor:
Dr. Amit Kumar
AI Researcher- Generative AI and NLP
Assistant Professor

Office: C-526
amitkumar@[Link]

Module-1 and Some parts of Module-3

Natural Language Processing
❏ Natural Language Processing (NLP) is a subfield of artificial
intelligence (AI) focused on enabling machines to understand,
interpret, and generate human language.

❏ It draws on both computational linguistics and machine learning
techniques to process and analyze large amounts of natural
language data.
Key components of NLP
❏ Text Preprocessing: This involves cleaning and preparing text data, which may
include tokenization (breaking text into words or sentences), stemming
(stripping affixes to reduce words to a crude root form), and lemmatization
(mapping the inflected forms of a word to its dictionary form, or lemma).

Example: Break a sentence into individual words or tokens.

Task: Given the sentence, "Natural language processing is fun," break it down
into a list of words (tokens).
Output: ["Natural", "language", "processing", "is", "fun"]
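The tokenization task above can be sketched with Python's built-in re module. This is a minimal illustration, assuming that a token is simply a run of word characters; the function name tokenize is hypothetical, and real tokenizers (for example those in NLTK or spaCy) handle punctuation, contractions, and other cases more carefully.

```python
import re

def tokenize(text):
    """Split text into word tokens, dropping punctuation."""
    # \w+ matches runs of letters, digits, and underscores
    return re.findall(r"\w+", text)

tokens = tokenize("Natural language processing is fun")
print(tokens)  # ['Natural', 'language', 'processing', 'is', 'fun']
```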
❏ Syntax and Parsing: NLP systems analyze the structure of sentences, including
parts of speech (e.g., noun, verb, adjective) and grammatical relationships.
Parsing helps understand sentence structure and how different parts of a
sentence relate to each other.
Example: Identify parts of speech in a sentence, such as nouns, verbs, adjectives, etc.
Task: For the sentence "The dog runs fast," identify the POS tags.
Output: [('The', 'DT'), ('dog', 'NN'), ('runs', 'VBZ'), ('fast', 'RB')]

● DT = determiner
● NN = noun, singular
● VBZ = verb, 3rd person singular present
● RB = adverb
❏ Named Entity Recognition (NER): NER involves identifying proper names, such
as names of people, organizations, locations, dates, and other entities within a
text.

Example: Identify named entities in a text, such as names of people, organizations,
locations, etc.
Task: Extract named entities from the sentence "Barack Obama was born in Hawaii."
Output:
● Person: Barack Obama
● Location: Hawaii
❏ Sentiment Analysis: This technique assesses the emotional tone behind a body
of text, classifying it as positive, negative, or neutral. It is often used in social
media monitoring, reviews, and customer feedback analysis.

● Example: Determine the sentiment (positive, negative, or neutral) of a given text.

● Task: Given the sentence, "I love this phone! It's amazing," classify the sentiment.
● Output: Positive

Why it's useful: Sentiment analysis is widely used for customer reviews, social media
monitoring, and understanding public opinion.
❏ Machine Translation: NLP models can automatically translate text from one
language to another, leveraging techniques such as statistical methods and
neural networks.

● Example: Translate a sentence from one language to another.

● Task: Translate the sentence "सभी को नमस्कार" (Hindi) to English.
● Output: Hello everyone

Why it's useful: Machine translation is a powerful tool for breaking down language barriers
in communication and is the foundation of tools like Google Translate.
❏ Question Answering (QA): NLP systems can interpret and respond to questions
posed in natural language, often by extracting answers from a corpus of
documents.
Example 1: Extractive Question Answering

In this type of QA, the answer is directly taken from the provided context.

Context:
"Albert Einstein was a theoretical physicist who developed the theory of relativity. He was born in Ulm, Germany,
in 1879 and is regarded as one of the most influential scientists of the 20th century."

Question:
"Where was Albert Einstein born?"

Answer:
"Ulm, Germany"

Explanation: The system identifies the location "Ulm, Germany" in the context and returns it as the
answer.
Example 2: Generative Question Answering

In generative QA, the answer is generated based on understanding the question and the context, rather than being directly
extracted.

Context:
"Leonardo da Vinci was a Renaissance artist, inventor, and polymath, known for masterpieces like the 'Mona Lisa' and 'The Last Supper.'
He also made contributions to various fields such as engineering, anatomy, and architecture."

Question:
"What was Leonardo da Vinci famous for?"

Answer:
"Leonardo da Vinci was famous for his works of art like the 'Mona Lisa' and 'The Last Supper.'"

Explanation: The system generates a response based on a broader understanding of the context, rather than simply extracting
a phrase from the text.
❏ Text Generation: This includes generating human-like text based on a given
prompt. Modern models like GPT (Generative Pretrained Transformer) use deep
learning to produce coherent and contextually appropriate responses.

Example: Generate new text based on a given prompt.

● Task: Given the prompt "Once upon a time," generate the next part of the story.
● Output: Once upon a time, there was a small village surrounded by beautiful
mountains.
Why it's useful: Text generation is used in applications like story writing, chatbots, and
creative writing tools.
❏ Speech Recognition: NLP is also used in speech-to-text systems, which convert
spoken language into written text. This technology underpins virtual assistants
like Siri, Google Assistant, and Alexa.

❏ Language Modeling: Language models predict the likelihood of a sequence of
words occurring in a given context, allowing for more accurate speech and text
generation.
❏ Word Frequency Analysis:
● Example: Analyze the frequency of words in a document.
● Task: Count the frequency of words in the sentence "I love ice cream and I eat ice cream often."
● Output:
○ "I": 2
○ "love": 1
○ "ice": 2
○ "cream": 2
○ "and": 1
○ "eat": 1
○ "often": 1

Why it's useful: Word frequency analysis helps in identifying important terms and can be useful in
document clustering or topic modeling.
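The word-count task above can be sketched with collections.Counter from the standard library. This is a minimal sketch that assumes whitespace splitting after stripping the final period; it does no case folding or other normalization.

```python
from collections import Counter

sentence = "I love ice cream and I eat ice cream often."
# Strip the trailing period and split on whitespace (a simplifying assumption)
words = sentence.rstrip(".").split()
freq = Counter(words)
print(freq["I"], freq["ice"], freq["cream"], freq["often"])  # 2 2 2 1
```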
❏ Word Embeddings
● Example: Represent words as vectors (numerical values) in high-dimensional
space.
● Task: Use a pre-trained word embedding (like Word2Vec or GloVe) to find
similar words to "king."
● Output: Words like queen, prince, monarch, etc., which have similar meanings.

Why it's useful: Word embeddings capture semantic meaning and relationships
between words, enabling applications like recommendation systems and question
answering.
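The idea of finding similar words can be illustrated with cosine similarity between vectors. The sketch below uses tiny hypothetical 3-dimensional vectors invented purely for illustration; they are not taken from Word2Vec or GloVe, whose vectors typically have hundreds of dimensions. The helper names cosine and most_similar are also assumptions.

```python
import math

# Hypothetical toy vectors, invented for illustration only
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word):
    """Return the other word whose vector is closest to the given word's."""
    others = [w for w in vectors if w != word]
    return max(others, key=lambda w: cosine(vectors[word], vectors[w]))

print(most_similar("king"))  # queen
```

With real pre-trained embeddings the same comparison is run over the whole vocabulary rather than three words.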
Tools and Libraries for Beginners
❏ NLTK (Natural Language Toolkit): A popular Python library for working with
human language data.

❏ spaCy: An NLP library that's efficient and easy to use for tasks like POS tagging,
NER, and dependency parsing.

❏ TextBlob: A simpler NLP library good for beginners, with built-in functions for
sentiment analysis, translation, and part-of-speech tagging.

❏ Hugging Face Transformers: A more advanced library with pre-trained models
for tasks like text generation and machine translation.
Regular Expressions
Regular expressions are used everywhere
❏ Part of every text processing task
❏ Not a general NLP solution (for that we use large NLP systems
we will see in later lectures)
❏ But very useful as part of those systems (e.g., for
pre-processing or text formatting)
❏ Necessary for data analysis of text data
❏ A widely used tool in industry and academia
Regular expressions
A formal language for specifying text strings
How can we search for mentions of these cute animals in text?
❏ woodchuck
❏ woodchucks
❏ Woodchuck
❏ Woodchucks
❏ Groundhog
❏ groundhogs
Regular Expressions: Disjunctions
Letters inside square brackets []

Ranges using the dash [A-Z]

Regular Expressions: Negation in Disjunction
❏ A caret as the first character in [ ] negates the list
❏ Note: the caret means negation only when it is first in [ ]
❏ Special characters (., *, +, ?) lose their special meaning inside [ ]
Regular Expressions: Convenient aliases
Regular Expressions: More Disjunction
❏ Groundhog is another name for woodchuck!
❏ The pipe symbol | for disjunction
Wildcards, optionality, repetition: . ? * +
Regular Expressions: Anchors ^ $
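The operators named on the slides above (brackets, ranges, negation, the pipe, ?, and the anchors ^ and $) can be tried out with Python's re module. The sample sentence and variable names below are made up for illustration and are not from the slides.

```python
import re

text = "Woodchucks and a groundhog met a woodchuck at 9 am."

# Brackets: upper- or lower-case first letter
brackets = re.findall(r"[wW]oodchuck", text)
print(brackets)   # ['Woodchuck', 'woodchuck']
# ? makes the preceding 's' optional
optional = re.findall(r"[wW]oodchucks?", text)
print(optional)   # ['Woodchucks', 'woodchuck']
# Pipe: either word
either = re.findall(r"groundhog|woodchuck", text)
print(either)     # ['groundhog', 'woodchuck']
# Range [0-9], and negation with a leading caret
digits = re.findall(r"[0-9]", text)
others = re.findall(r"[^a-zA-Z .]", text)
print(digits, others)  # ['9'] ['9']
# Anchors: ^ is the start of the string, $ is the end
starts = bool(re.search(r"^Woodchucks", text))
ends = bool(re.search(r"am\.$", text))
print(starts, ends)    # True True
```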
A note about Python regular expressions
❏ Regex and Python both use the backslash "\" for special characters, so
in an ordinary string you must type extra backslashes!
❏ "\\d+" to search for 1 or more digits
❏ "\n" in Python means the newline character, not a backslash
followed by an "n". You need "\\n" for the two characters.
❏ Instead: use Python's raw string notation for regex:
❏ r"[tT]he"
❏ r"\d+" matches one or more digits, instead of "\\d+"
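A short check of the raw-string note above. The sample text is made up for illustration.

```python
import re

# An ordinary string needs a doubled backslash; a raw string does not
same = ("\\d+" == r"\d+")
print(same)  # True

numbers = re.findall(r"\d+", "Room 526, slot 12")
print(numbers)  # ['526', '12']

# "\n" is one character (newline); r"\n" is two (backslash, then n)
lengths = (len("\n"), len(r"\n"))
print(lengths)  # (1, 2)
```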
The iterative process of writing regexes
Find me all instances of the word “the” in a text.

the
Misses capitalized examples

[tT]he
Incorrectly returns other or Theology

\W[tT]he\W
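The three attempts above can be compared on a small sample sentence (made up for illustration). The last pattern, using the word-boundary sequence \b, is not on the slide; it is included as one possible further refinement, because \W needs an actual character on each side and so cannot match at the very start of the text.

```python
import re

text = "The other day the dog saw them at the theology hall."

first = re.findall(r"the", text)
print(len(first))   # 5: misses 'The', fires inside 'other', 'them', 'theology'

second = re.findall(r"[tT]he", text)
print(len(second))  # 6: finds 'The' but still fires inside other words

third = re.findall(r"\W[tT]he\W", text)
print(third)        # [' the ', ' the ']: sentence-initial 'The' is missed

# Assumption beyond the slide: \b matches at word boundaries without
# consuming a character, so it also works at the start of the text
fourth = re.findall(r"\b[tT]he\b", text)
print(fourth)       # ['The', 'the', 'the']
```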
False positives and false negatives
The process we just went through was based on
fixing two kinds of errors:
1. Not matching things that we should have matched (The):
false negatives
2. Matching strings that we should not have matched
(there, then, other): false positives
Characterizing work on NLP
In NLP we are always dealing with these kinds of
errors.
Reducing the error rate for an application often
involves two antagonistic efforts:
❏ Increasing coverage (or recall) (minimizing false
negatives).
❏ Increasing accuracy (or precision) (minimizing false
positives)
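Precision and recall can be computed directly from true positives, false positives, and false negatives. The sketch below scores regex patterns for the word "the" against hand-labeled match positions in a made-up sentence; the function name precision_recall and the sample data are assumptions for illustration.

```python
import re

text = "The other day the dog saw them at the theology hall."
# Hand-labeled start offsets of the standalone word "the" (any case)
gold = {0, 14, 34}

def precision_recall(pattern, text, gold):
    """Score a pattern's match start offsets against the gold offsets."""
    predicted = {m.start() for m in re.finditer(pattern, text)}
    tp = len(predicted & gold)                  # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

print(precision_recall(r"the", text, gold))         # precision 0.4, recall 2/3
print(precision_recall(r"[tT]he", text, gold))      # (0.5, 1.0)
print(precision_recall(r"\b[tT]he\b", text, gold))  # (1.0, 1.0)
```

Adding [tT] raises recall by removing a false negative, while the word boundaries raise precision by removing false positives.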
Regular expressions play a surprisingly large role
Widely used in both academia and industry
1. Part of most text processing tasks, even for big
neural language model pipelines
◦ including text formatting and pre-processing
2. Very useful for data analysis of any text data
{m,n} – Braces
❏ Braces match the preceding regex repeated from m to n times,
both inclusive.
❏ For example –
❏ a{2,4} will match in the strings aaab, baaaac, gaad,
but will not match in strings like abc or bc, because
there is only one a or no a in those cases.
❏ Note: write the quantifier without a space; in Python,
a{2, 4} is treated as literal text rather than as a repetition.
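The brace example can be checked with re.search. This is a small sketch; the dictionary name found is only for illustration.

```python
import re

pattern = re.compile(r"a{2,4}")  # between 2 and 4 consecutive 'a' characters

found = {}
for s in ["aaab", "baaaac", "gaad", "abc", "bc"]:
    m = pattern.search(s)
    found[s] = m.group() if m else None

print(found)
# {'aaab': 'aaa', 'baaaac': 'aaaa', 'gaad': 'aa', 'abc': None, 'bc': None}
```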
(<regex>) – Group
❏ The group symbol is used to group sub-patterns.

❏ For example –
❏ (a|b)cd will match in strings like acd, abcd,
gacd, etc.
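A brief sketch of the group example. Note that the pattern matches somewhere inside abcd and gacd (via search), not the whole string.

```python
import re

pattern = re.compile(r"(a|b)cd")

m = pattern.search("gacd")
whole, inner = m.group(0), m.group(1)
print(whole, inner)  # acd a   (group 1 holds what the parentheses matched)

partial = pattern.search("abcd").group(0)
print(partial)       # bcd     (the match starts at 'b', not at 'a')

full = pattern.fullmatch("abcd")
print(full)          # None    (the entire string is not a|b followed by cd)
```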
Special Sequences
❏ Special sequences are backslash escapes that stand for a
whole class of characters (such as \d for any digit) or for a
position in the string where the match must occur (such as \b
for a word boundary), rather than for one literal character.
❏ They make it easier to write commonly used
patterns.
List of special sequences

RegEx Functions
❏ re module contains many functions that help us
to search a string for a match.

re.findall()
❏ Return all non-overlapping matches of pattern in string, as a list of strings.
❏ The string is scanned left-to-right, and matches are returned in the order found.
import re

# Text to search
string = """Hello my Number is 123456789 and
my friend's number is 987654321"""
# Raw string: one or more digits
regex = r'\d+'

match = re.findall(regex, string)

print(match)  # ['123456789', '987654321']
re.compile()
❏ Regular expressions are compiled into pattern objects, which have
methods for various operations such as searching for pattern matches
or performing string substitutions.
import re

# Character class [a-e]: any single lowercase letter from a to e
p = re.compile('[a-e]')

print(p.findall("Aye, said Mr. Gibenson Stark"))  # ['e', 'a', 'd', 'b', 'e', 'a']
❏ The backslash metacharacter ‘\’ has a very important
role, as it signals various special sequences.
❏ To use a backslash without its special
meaning as a metacharacter, write ‘\\’.
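❏ The set class [\s,.] will match any whitespace character, ‘,’, or ‘.’.

import re

# \d matches a single digit
p = re.compile(r'\d')
print(p.findall("I went to him at 11 A.M. on 4th July 1886"))
# ['1', '1', '4', '1', '8', '8', '6']

# \d+ matches a run of one or more digits
p = re.compile(r'\d+')
print(p.findall("I went to him at 11 A.M. on 4th July 1886"))
# ['11', '4', '1886']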
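import re

# \w matches a single word character (letter, digit, or underscore)
p = re.compile(r'\w')
print(p.findall("He said * in some_lang."))

# \w+ matches a run of word characters
p = re.compile(r'\w+')
print(p.findall("I went to him at 11 A.M., he "
                "said *** in some_language."))

# \W matches a single non-word character
p = re.compile(r'\W')
print(p.findall("he said *** in some_language."))

# ab* matches 'a' followed by zero or more 'b'
p = re.compile('ab*')
print(p.findall("ababbaabbb"))  # ['ab', 'abb', 'a', 'abbb']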
re.split()
❏ Splits the string at each occurrence of a character or a
pattern and returns the pieces as a list; once maxsplit splits
have been made, the remainder of the string is returned as
the final element of the list.

❏ re.split(pattern, string, maxsplit=0, flags=0)
❏ The first parameter, pattern, denotes the regular expression.
❏ string is the given string in which the pattern is searched for and in
which splitting occurs.
❏ maxsplit, if not provided, is taken to be zero (‘0’), which means no limit;
if a nonzero value is provided, then at most that many splits occur. If
maxsplit = 1, the string is split once only, resulting in a list of length 2.
❏ The flags are optional and can help to shorten code, e.g.
flags = re.IGNORECASE makes the split ignore case (lowercase vs.
uppercase).
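from re import split

# \W+ splits on runs of non-word characters
print(split(r'\W+', 'Words, words , Words'))
# ['Words', 'words', 'Words']
print(split(r'\W+', "Word's words Words"))
# ['Word', 's', 'words', 'Words']
print(split(r'\W+', 'On 12th Jan 2016, at 11:02 AM'))
# \d+ splits on runs of digits
print(split(r'\d+', 'On 12th Jan 2016, at 11:02 AM'))
# ['On ', 'th Jan ', ', at ', ':', ' AM']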
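import re

# maxsplit=1: split only at the first run of digits
print(re.split(r'\d+', 'On 12th Jan 2016, at 11:02 AM', maxsplit=1))
# ['On ', 'th Jan 2016, at 11:02 AM']
# With IGNORECASE, [a-f]+ also splits on the letters A-F
print(re.split('[a-f]+', 'Aey, Boy oh boy, come here',
               flags=re.IGNORECASE))
# ['', 'y, ', 'oy oh ', 'oy, ', 'om', ' h', 'r', '']
print(re.split('[a-f]+', 'Aey, Boy oh boy, come here'))
# ['A', 'y, Boy oh ', 'oy, ', 'om', ' h', 'r', '']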
re.sub()
❏ The ‘sub’ in the function name stands for substitute: the
regular expression pattern is searched for in the given
string (3rd parameter), and each match found is replaced
by repl (2nd parameter). count limits the number of
replacements made; the default, 0, replaces every match.

❏ re.sub(pattern, repl, string, count=0, flags=0)
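import re

# Case-insensitive: replaces 'ub' in 'Subject' and 'Ub' in 'Uber'
print(re.sub('ub', '~*', 'Subject has Uber booked already',
             flags=re.IGNORECASE))
# S~*ject has ~*er booked already
# Case-sensitive: only the lowercase 'ub' in 'Subject' is replaced
print(re.sub('ub', '~*', 'Subject has Uber booked already'))
# S~*ject has Uber booked already
# count=1: replace the first match only
print(re.sub('ub', '~*', 'Subject has Uber booked already',
             count=1, flags=re.IGNORECASE))
# S~*ject has Uber booked already
print(re.sub(r'\sAND\s', ' & ', 'Baked Beans And Spam',
             flags=re.IGNORECASE))
# Baked Beans & Spam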
re.search()
❏ This method either returns None (if the pattern doesn’t
match) or a re.Match object that contains information about
the matching part of the string.
❏ This method stops after the first match, so it is best
suited for testing a regular expression rather than for
extracting data.
❏ Example: searching for an occurrence of the pattern
import re

# Group 1: a word (the month); group 2: digits (the day)
regex = r"([a-zA-Z]+) (\d+)"

match = re.search(regex, "I was born on June 24")

if match is not None:
    # start() and end() give the span of the whole match
    print("Match at index %s, %s" % (match.start(), match.end()))
    print("Full match: %s" % (match.group(0)))
    print("Month: %s" % (match.group(1)))
    print("Day: %s" % (match.group(2)))
Match Object
❏ A Match object contains all the information about the
search and the result; if no match is found, None is
returned instead.

1. Getting the string and the regex

The match.re attribute returns the regular expression passed,
and the match.string attribute returns the string passed.
import re

s = "Welcome to Class"
# \bC: a capital C at a word boundary
res = re.search(r"\bC", s)

print(res.re)      # re.compile('\\bC')
print(res.string)  # Welcome to Class
Getting index of matched object
❏ start() method returns the starting index of the matched
substring
❏ end() method returns the ending index of the matched substring
(one past its last character)
❏ span() method returns a tuple containing the starting and the
ending index of the matched substring
import re

s = "Welcome to Class"

res = re.search(r"\bCla", s)

print(res.start())  # 11
print(res.end())    # 14
Getting matched substring
❏ group() method returns the part of the string that the pattern
matched.

import re

s = "Welcome to Class"
# \D{2} t: two non-digit characters, a space, then 't'
res = re.search(r"\D{2} t", s)
print(res.group())  # me t
Alan Turing (1912-1954)
(A pioneer of automata theory)
■ Father of Modern Computer Science
■ English mathematician
■ Studied abstract machines called
Turing machines even before computers
existed
■ Heard of the Turing test?
Languages & Grammars
■ Languages: “A language is a collection of sentences of finite
length all constructed from a finite alphabet of symbols”

■ Grammars: “A grammar can be regarded as a device that
enumerates the sentences of a language” - nothing more,
nothing less
■ N. Chomsky, Information and Control, Vol 2, 1959

Image source: Nowak et al. Nature, vol 417, 2002

Alphabet
An alphabet is a finite, non-empty set of
symbols
■ We use the symbol ∑ (sigma) to denote an
alphabet
■ Examples:
■ Binary: ∑ = {0,1}
■ All lower case letters: ∑ = {a,b,c,..z}
■ Alphanumeric: ∑ = {a-z, A-Z, 0-9}
■ DNA molecule letters: ∑ = {a,c,g,t}
Strings
A string or word is a finite sequence of symbols chosen
from ∑
■ Empty string is ε (or “epsilon”)

■ Length of a string w, denoted by “|w|”, is equal to the
number of (non-ε) characters in the string
■ E.g., x = 010100, |x| = 6
■ x = 01 ε 0 ε 1 ε 00 ε, |x| = ?

■ xy = concatenation of two strings x and y

Powers of an alphabet
Let ∑ be an alphabet.

■ ∑^k = the set of all strings of length k over ∑

■ ∑* = ∑^0 ∪ ∑^1 ∪ ∑^2 ∪ …

■ ∑+ = ∑^1 ∪ ∑^2 ∪ ∑^3 ∪ …
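The powers of an alphabet can be enumerated with itertools.product. This is a small sketch for the binary alphabet; the function name power is hypothetical. Note that ∑^0 contains exactly one string, the empty string ε.

```python
from itertools import product

def power(alphabet, k):
    """Return all strings of length k over the alphabet, in sorted order."""
    return ["".join(p) for p in product(sorted(alphabet), repeat=k)]

sigma = {"0", "1"}
print(power(sigma, 0))  # [''], i.e. only the empty string
print(power(sigma, 1))  # ['0', '1']
print(power(sigma, 2))  # ['00', '01', '10', '11']
```

∑* is infinite, so a program can only enumerate it up to some chosen length.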
Languages
L is said to be a language over alphabet ∑ only if L ⊆ ∑*
🡺 this is because ∑* is the set of all strings (of all possible lengths, including 0) over
the given alphabet ∑
Examples:
1. Let L be the language of all strings consisting of n 0’s followed by n 1’s:
L = {ε, 01, 0011, 000111,…}
2. Let L be the language of all strings with an equal number of 0’s and 1’s:
L = {ε, 01, 10, 0011, 1100, 0101, 1010, 1001,…}

Definition: Ø denotes the Empty language

■ Let L = {ε}; Is L=Ø?

NO: {ε} contains one string (the empty string), whereas Ø contains no strings at all
The Membership Problem
Given a string w ∈ ∑* and a language L over ∑, decide
whether or not w ∈ L.

Example:
Let w = 100011
Q) Is w ∈ the language of strings with equal
number of 0s and 1s?
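For this particular language the membership question can be decided by counting symbols. A minimal sketch follows; the function name in_equal_language is hypothetical.

```python
def in_equal_language(w):
    """Decide whether the binary string w has equally many 0s and 1s."""
    if set(w) - {"0", "1"}:
        raise ValueError("w must be a string over the alphabet {0, 1}")
    return w.count("0") == w.count("1")

answer = in_equal_language("100011")
print(answer)  # True: w contains three 0s and three 1s
```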
Finite Automata : Examples
■ On/Off switch

■ Modeling recognition of the word “then”

Finite Automaton (FA)
■ Informally, a state diagram that comprehensively captures all
possible states and transitions that a machine can take while
responding to a stream or sequence of input symbols
■ Recognizer for “Regular Languages”

■ Deterministic Finite Automata (DFA)

■ The machine can exist in only one state at any given time
■ Non-deterministic Finite Automata (NFA)
■ The machine can exist in multiple states at the same time
Deterministic Finite Automata - Definition
■ A Deterministic Finite Automaton (DFA) consists of:
■ Q ==> a finite set of states
■ ∑ ==> a finite set of input symbols (alphabet)
■ q0 ==> a start state
■ F ==> set of accepting states
■ δ ==> a transition function, which is a mapping from Q x ∑ to Q
■ A DFA is defined by the 5-tuple:
■ {Q, ∑, q0, F, δ}
What does a DFA do on reading an input string?
■ Input: a word w in ∑*
■ Question: Is w acceptable by the DFA?
■ Steps:
■ Start at the “start state” q0
■ For every input symbol in the sequence w do
■ Compute the next state from the current state, given the
current input symbol in w and the transition function
■ If after all symbols in w are consumed, the current state is one
of the accepting states (F) then accept w;
■ Otherwise, reject w.
Regular Languages
■ Let L(A) be a language recognized by a DFA A.
■ Then L(A) is called a “Regular Language”.
Example #1
■ Build a DFA for the following language:
■ L = {w | w is a binary string that contains 01 as a substring}
■ Steps for building a DFA to recognize L:
■ ∑ = {0,1}
■ Decide on the states: Q
■ Designate start state and final state(s)
■ δ: Decide on the transitions:
■ “Final” states == same as “accepting states”
■ Other states == same as “non-accepting states”
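The acceptance procedure described earlier (start at q0, follow δ for each input symbol, accept if the final state is in F) and the DFA for Example #1 can be sketched together. The state names and the transition table below are one possible design, assumed for illustration because the slide's diagram is not available: q0 means no progress yet, q1 means the last symbol read was 0, and q2 means 01 has been seen.

```python
def run_dfa(delta, start, accepting, w):
    """Simulate a DFA on the string w and report whether it accepts."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]  # exactly one next state per symbol
    return state in accepting

# Assumed transition function for L = {w | w contains 01 as a substring}
delta = {
    ("q0", "0"): "q1", ("q0", "1"): "q0",
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q2", ("q2", "1"): "q2",  # q2 is a trap accepting state
}

print(run_dfa(delta, "q0", {"q2"}, "1101"))  # True
print(run_dfa(delta, "q0", {"q2"}, "1110"))  # False
```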
Context-Free Languages (CFL)
❏ A language class larger than the class of regular languages
❏ Supports natural, recursive notation called “context-free grammar”
❏ Applications:
❏ Parse trees, compilers
❏ XML
An Example
❏ A palindrome is a word that reads identically from both ends
❏ E.g., madam, redivider, malayalam, 010010010
❏ The language of palindromes is not regular, but it is a CFL, because it
supports recursive substitution (in the form of a Context Free Grammar (CFG))
❏ This is because we can construct a “grammar” like this:
❏ A ==> ε
❏ A ==> 0
❏ A ==> 1
❏ A ==> 0A0
❏ A ==> 1A1
How does the CFG for palindromes work?
❏ An input string belongs to the language (i.e., accepted) iff it can be
generated by the CFG

❏ Example: w=01110
❏ G can generate w as follows:

❏ A => 0A0
=> 01A10
=> 01110
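The derivation above can be mirrored by a recursive check that applies the five productions in reverse. This is a sketch of a recognizer for this one grammar only, not a general CFG parser; the function name derives is hypothetical.

```python
def derives(w):
    """Return True if A can generate w with A -> ε | 0 | 1 | 0A0 | 1A1."""
    if w in ("", "0", "1"):        # A -> ε, A -> 0, A -> 1
        return True
    if len(w) >= 2 and w[0] == w[-1] and w[0] in "01":
        return derives(w[1:-1])    # A -> 0A0 or A -> 1A1
    return False

print(derives("01110"))  # True, matching A => 0A0 => 01A10 => 01110
print(derives("011"))    # False
```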
Context-Free Grammar: Definition
A context-free grammar G=(V,T,P,S), where:
❏ V: set of variables or non-terminals
❏ T: set of terminals (= alphabet U {ε})
❏ P: set of productions, each of which is of the form
❏ V ==> 𝜶1 | 𝜶2 | …
❏ Where each 𝜶i is an arbitrary string of variables and terminals
❏ S ==> start variable
Example #2
❏ Language of balanced parentheses
❏ e.g., ()(((())))((()))….
❏ CFG?
Example #3
❏ A grammar for L = {0^m 1^n | m ≥ n}

❏ CFG?
CFG conventions
String membership
Simple Expressions…
Generalization of derivation
Context-Free Language
Ambiguity in NLP
❏ Ambiguity in Natural Language Processing (NLP) happens because
human language can have multiple meanings.

❏ Computers sometimes struggle to understand exactly what we
mean: unlike humans, who can use intuition and background
knowledge to infer meaning, computers rely on precise algorithms
and statistical patterns.
Ambiguity
The sentence "The chicken is ready to eat" is ambiguous because it
can be interpreted in two different ways:
1. The chicken is cooked and ready to be eaten.
2. The chicken is hungry and ready to eat food.

This dual meaning arises from the structure of the sentence, which
does not clarify the subject's role (the eater or the one being eaten).
Resolving such ambiguities is essential for accurate NLP applications
like chatbots, translation, and sentiment analysis.
Types of Ambiguity in NLP
❏ The meaning of an ambiguous expression often depends on the
situation, prior knowledge, or surrounding words.
❏ For example: He is cool.
❏ This could mean he is calm under pressure or he is fashionable
depending on the context.
Lexical Ambiguity
❏ Lexical ambiguity occurs when a single word has multiple
meanings, making it unclear which meaning is intended in a
particular context. This is a common challenge in language.

❏ For example, the word "bat" can have two different meanings.
It could refer to a flying mammal, like the kind you might see
at night. Alternatively, "bat" could also refer to a piece of
sports equipment used in games like baseball or cricket.
❏ For computers, determining the correct meaning of such a word
requires looking at the surrounding context to decide which
interpretation makes sense.
Syntactic Ambiguity
❏ Syntactic ambiguity occurs when the structure or grammar of a
sentence allows for more than one interpretation. This happens
because the sentence can be understood in different ways
depending on how it is put together.
❏ For example, take the sentence, “The boy kicked the ball in his
jeans.” This sentence can be interpreted in two different ways:
one possibility is that the boy was wearing jeans and he kicked
the ball while he was wearing them. Another possibility is that
the ball was inside the boy’s jeans, and he kicked the ball out
of his jeans.
❏ A computer or NLP system must carefully analyze the structure to
figure out which interpretation is correct, based on the context.
Semantic Ambiguity
❏ Semantic ambiguity occurs when a sentence has more than one
possible meaning because of how the words are combined. This
type of ambiguity makes it unclear what the sentence is truly trying
to say.
❏ For example, take the sentence, “Visiting relatives can be
annoying.” This sentence could be understood in two different
ways. One meaning could be that relatives who are visiting
you are annoying, implying that the relatives themselves cause
annoyance. Another meaning could be that the act of visiting
relatives is what is annoying, suggesting that the experience of
going to see relatives is unpleasant.
❏ The confusion comes from how the words "visiting relatives" can
be interpreted: is it about the relatives who are visiting, or is it
about the action of visiting? In cases like this, semantic ambiguity
makes it hard to immediately understand the exact meaning of the
sentence, and the context is needed to clarify it.
Pragmatic Ambiguity
❏ Pragmatic ambiguity occurs when the meaning of a sentence
depends on the speaker’s intent, tone, or the situation in which
it is said. This type of ambiguity is common in everyday
conversations, and it can be tricky for computers to understand
because it often requires knowing the broader context.
❏ For example, consider the sentence, “Can you open the
window?” In one situation, it could be understood as a literal
question asking if the person is physically able to open the
window. However, in another context, it could be a polite
request, where the speaker is asking the listener to open the
window, even though they’re not directly giving an order.
❏ The meaning changes based on the tone of voice or social
context, which is something that is difficult for NLP systems to
capture without understanding the surrounding situation.
Referential Ambiguity
❏ Referential ambiguity occurs when a pronoun (like "he," "she," "it,"
or "they") or a phrase is unclear about what or who it is referring
to. This type of ambiguity happens when the sentence doesn’t
provide enough information to determine which person, object, or
idea the pronoun is referring to.
❏ For example, consider the sentence, “Alice told Jane that she
would win the prize.” In this case, it’s unclear whether the pronoun
"she" refers to Alice or Jane. Both could be possible interpretations,
and without further context, we can’t be sure. If the sentence was
about a competition, "she" could be referring to Alice, meaning Alice
is telling Jane that she would win the prize. However, it could also
mean that Alice is telling Jane that Jane would win the prize.
Ellipsis Ambiguity
❏ Ellipsis ambiguity happens when part of a sentence is left out,
making it unclear what the missing information is. This often
occurs in everyday conversation or writing when people try to be
brief and omit words that are understood from the context.
❏ For example, consider the sentence, "John likes apples, and
Mary does too." The word "does" is a shortened form of "likes
apples," but it’s not explicitly stated. This creates two possible
interpretations:
❏ Mary likes apples just like John, meaning both John and Mary enjoy
apples.
❏ Mary likes something else (not apples), and the sentence is leaving
out the specific thing she likes.

The ambiguity arises because it's unclear from the sentence whether
"does" refers to liking apples or something else.
Addressing Ambiguity in NLP
To address ambiguity in NLP, several methods are used to accurately
interpret language.
❏ Contextual analysis is one of the key approaches, where
surrounding words and context help determine the correct
meaning of a word or phrase.
❏ Word sense disambiguation (WSD) resolves lexical ambiguity by
using context to identify which meaning of a word is being used.
❏ Parsing and syntactic analysis help resolve syntactic ambiguity
by breaking down sentence structures to understand different
grammatical interpretations.
❏ Coreference resolution is used to clarify what pronouns or
phrases refer to, solving referential ambiguity.
❏ Discourse and pragmatic modeling help capture speaker intent
and the social context, which resolves pragmatic ambiguity.
❏ Machine learning and deep learning techniques, like BERT and
GPT, leverage large datasets to learn language patterns, aiding in
resolving ambiguity.

90 pages
Understanding NLP and Large Language Models
No ratings yet
Understanding NLP and Large Language Models
8 pages
Logical Representations in Semantics
No ratings yet
Logical Representations in Semantics
24 pages
Natural Language Processing Overview
No ratings yet
Natural Language Processing Overview
93 pages
Understanding POS Tagging in NLP
No ratings yet
Understanding POS Tagging in NLP
43 pages
Machine Translation Overview and Challenges
No ratings yet
Machine Translation Overview and Challenges
22 pages
Synchronous Context-Free Grammars Explained
No ratings yet
Synchronous Context-Free Grammars Explained
10 pages
IRCTC Train Booking Confirmation Details
No ratings yet
IRCTC Train Booking Confirmation Details
1 page
Pre-Colonial Filipino Music and Literature
No ratings yet
Pre-Colonial Filipino Music and Literature
9 pages
Where the Little Birds Go: A Novel
No ratings yet
Where the Little Birds Go: A Novel
209 pages
Dorian Iten's Smooth Shading Techniques
No ratings yet
Dorian Iten's Smooth Shading Techniques
8 pages
Materiality in EU Sustainability Reporting
No ratings yet
Materiality in EU Sustainability Reporting
29 pages
Data Modeling Made Simple with Embarcadero ER Studio Data Architect Adapting to Agile Data Modeling in a Big Data World 2nd Edition Steve Hoberman ebook testbank solutions long form chapters
100% (5)
Data Modeling Made Simple with Embarcadero ER Studio Data Architect Adapting to Agile Data Modeling in a Big Data World 2nd Edition Steve Hoberman ebook testbank solutions long form chapters
87 pages
Susan Rodgers: Administrative Professional Resume
No ratings yet
Susan Rodgers: Administrative Professional Resume
4 pages
Cadence Silicon Encounter Guide
No ratings yet
Cadence Silicon Encounter Guide
19 pages
BITSAT 2023 Logical Reasoning Question Bank
No ratings yet
BITSAT 2023 Logical Reasoning Question Bank
37 pages
Time Value of Money Explained
100% (2)
Time Value of Money Explained
28 pages
IELTS Listening Midterm Test 2 Guide
No ratings yet
IELTS Listening Midterm Test 2 Guide
4 pages
Air Pollution and Plant Survey in Antoniadis Garden
No ratings yet
Air Pollution and Plant Survey in Antoniadis Garden
19 pages
Agriculture Sector: MSP and Inflation Insights
No ratings yet
Agriculture Sector: MSP and Inflation Insights
38 pages
Inverter and UPS Repair Course Overview
No ratings yet
Inverter and UPS Repair Course Overview
7 pages
Labor Cost Accounting Overview
No ratings yet
Labor Cost Accounting Overview
18 pages
GST Guide for Jewelers in India
100% (1)
GST Guide for Jewelers in India
11 pages
05 Hardware Identification
No ratings yet
05 Hardware Identification
17 pages
Effective Negotiation Skills for Success
No ratings yet
Effective Negotiation Skills for Success
3 pages
Copywriting to $10K Monthly Guide
No ratings yet
Copywriting to $10K Monthly Guide
3 pages
Optimizing Thyrocare's Supply Chain TAT
No ratings yet
Optimizing Thyrocare's Supply Chain TAT
7 pages
Power BI Overview and Bar Graph Guide
No ratings yet
Power BI Overview and Bar Graph Guide
33 pages
Manufacturing Process Audit Checklist
100% (10)
Manufacturing Process Audit Checklist
6 pages
Ayala Land Inc: Real Estate & Sustainability Insights
No ratings yet
Ayala Land Inc: Real Estate & Sustainability Insights
38 pages
B.E. VIII Semester Curriculum 2022-2023
No ratings yet
B.E. VIII Semester Curriculum 2022-2023
21 pages
Color Codes and Emotional Symbolism
No ratings yet
Color Codes and Emotional Symbolism
111 pages
Ciro Alegría Bazán: Life & Works
No ratings yet
Ciro Alegría Bazán: Life & Works
14 pages
Royal Heirs in Imperial Germany: The Future of Monarchy in Nineteenth-Century Bavaria, Saxony and Württemberg 1st Edition Frank Lorenz Müller (Auth.) Ebook Collectors Digital Edition
100% (5)
Royal Heirs in Imperial Germany: The Future of Monarchy in Nineteenth-Century Bavaria, Saxony and Württemberg 1st Edition Frank Lorenz Müller (Auth.) Ebook Collectors Digital Edition
55 pages
Centrifugal Assembly for Ingenio El Molino
No ratings yet
Centrifugal Assembly for Ingenio El Molino
3 pages
Sigil List and Traits Overview
No ratings yet
Sigil List and Traits Overview
8 pages
December 2024 Exam Results & Revaluation Info
No ratings yet
December 2024 Exam Results & Revaluation Info
1 page
