Machine Translation and Sentiment Analysis


CHAPTER-6

Applications
MRS. Priyanka Bhoir
Machine Translation
Brief history
• War-time use of computers in code breaking
• Warren Weaver’s memorandum 1949
• Big investment by US Government (mostly on
Russian-English)
• Early promise of FAHQT (fully automatic high-quality translation)

What we need is:
Machine Translation (MT)
Definition:
“The translation process can be described simply as:
1. Decoding the meaning of the source text, and
2. Re-encoding this meaning in the target language.”
Decoding
• How to go from the T-matrix and A-matrix to a word alignment?
• There are several approaches…
A Translation Matrix

        Rob   Cat   is    Dog
Rob     1     0     0     0
Gato    0     1     0     0
es      0     0     .5    0
esta    0     0     .5    0
Perro   0     0     0     1
Building the Translation Matrix: Starting from alignments

• Find the sentence alignment
• If a word in the source aligns with a word in the target, then increment the corresponding entry of the translation matrix.
• Normalize the translation matrix
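
The counting and normalizing steps above can be sketched in Python as follows. This is a minimal illustration, not the presenter's implementation: the function name `build_translation_matrix`, the input format (word lists plus index pairs) and the two toy sentence pairs are all assumptions made for the example. Normalization is done per target word, which matches the matrix shown earlier, where "es" and "esta" each receive .5 of the mass for "is".

```python
from collections import defaultdict


def build_translation_matrix(aligned_sentences):
    """Count aligned word pairs, then normalize per target word.

    aligned_sentences: list of (source_words, target_words, links), where
    links is a list of (source_index, target_index) pairs (assumed format).
    """
    counts = defaultdict(lambda: defaultdict(float))
    for source, target, links in aligned_sentences:
        for i, j in links:
            # Increment the matrix cell for this aligned word pair.
            counts[target[j]][source[i]] += 1.0

    # Normalize so the entries for each target word sum to 1.
    matrix = {}
    for t_word, row in counts.items():
        total = sum(row.values())
        matrix[t_word] = {s_word: c / total for s_word, c in row.items()}
    return matrix


# Hypothetical toy corpus with hand-made alignments.
corpus = [
    (["Rob", "es"], ["Rob", "is"], [(0, 0), (1, 1)]),
    (["Gato", "esta"], ["Cat", "is"], [(0, 0), (1, 1)]),
]
t_matrix = build_translation_matrix(corpus)
print(t_matrix["is"])
```

With this toy corpus, "is" is aligned once with "es" and once with "esta", so each gets probability 0.5.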
Can’t find alignments

• Most sentences in the Hansards corpus are 60 words long. There are many that can be over 100.
• 100^100 possible alignments
Iterative Convergence
• Use the Expectation Maximization (EM) algorithm
• Creates the translation matrix

        Rob   Is    Tall  boy
Rob     .66   .33   .25   .25
es      .30   .62   .25   .25
alto    .02   .05   .5    0
nino    .02   .05   0     .5
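
When word alignments are not available, the matrix can be estimated iteratively. The sketch below assumes a simple word-translation model trained with Expectation Maximization (the same idea as IBM Model 1, which the slides do not name). The function name `train_em` and the two sentence pairs are illustrative assumptions, and sentence-length weighting is left out.

```python
from collections import defaultdict


def train_em(pairs, iterations=10):
    """Estimate t[(s, w)], read as P(source word s | target word w).

    pairs: list of (source_words, target_words) without any alignment.
    """
    source_vocab = {s for src, _ in pairs for s in src}
    uniform = 1.0 / len(source_vocab)
    t = defaultdict(lambda: uniform)  # start from a uniform table

    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: spread each source word over the target words of its
        # sentence, in proportion to the current probabilities.
        for src, tgt in pairs:
            for s in src:
                norm = sum(t[(s, w)] for w in tgt)
                for w in tgt:
                    frac = t[(s, w)] / norm
                    count[(s, w)] += frac
                    total[w] += frac
        # M-step: renormalize the fractional counts per target word.
        t = defaultdict(
            float, {(s, w): c / total[w] for (s, w), c in count.items()}
        )
    return t


# Toy parallel sentences (Hindi source, English target), unaligned.
pairs = [
    (["Rob", "ladka", "hai"], ["Rob", "is", "a", "boy"]),
    (["Eric", "lamba", "hai"], ["Eric", "is", "tall"]),
]
table = train_em(pairs)
print(table[("hai", "is")], table[("ladka", "is")])
```

Because "hai" co-occurs with "is" in both sentences, its probability given "is" ends up higher than that of words seen with "is" only once.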


Counting
• Rob is a boy.              Rob ladka hai.
• Eric is tall.              Eric lamba hai.
• Rima will go to school.    Rima school jayegi.
  …                          …
Base counts on co-occurrence, weighting based on sentence length.
Distorting the Sentence
• Word order changes between languages
• How is a sentence with 2 words distorted?
• How is a sentence with 3 words distorted?
• How is a sentence with …

To keep track of this information we use…

• A quadruply nested default dictionary
• This could be a problem if there are more than 100 words in a sentence.
• 100 x 100 x 100 x 100 = too big for RAM and takes too much time
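
A quadruply nested default dictionary can be sketched as below. The slides do not say what the four keys are; this example assumes they are source length, target length, source position and target position. The helper names `nested_dict` and `count_positions` are hypothetical.

```python
from collections import defaultdict


def nested_dict(depth):
    """Build a default dictionary nested `depth` levels deep."""
    if depth == 1:
        return defaultdict(float)
    return defaultdict(lambda: nested_dict(depth - 1))


# Assumed key order: [source_len][target_len][source_pos][target_pos]
distortion = nested_dict(4)


def count_positions(table, src_len, tgt_len, links):
    """Record which source position moved to which target position."""
    for i, j in links:
        table[src_len][tgt_len][i][j] += 1.0


# A 2-word sentence whose word order is swapped in translation.
count_positions(distortion, 2, 2, [(0, 1), (1, 0)])
print(distortion[2][2][0][1])

# Size of a dense table for sentences of up to 100 words.
dense_cells = 100 ** 4
print(dense_cells)
```

A nested dictionary only stores the combinations actually observed, but in the worst case it still approaches the 100 x 100 x 100 x 100 cells mentioned above.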
Viterbi
• If only doing alignment, much smaller memory and time requirements.
• Returns optimal path.
• T-Matrix probabilities function as the “emission” matrix
• A-Matrix probabilities are concerned with the positioning of words
Greedy Hill Climbing
• Best first search
• 2-step look ahead to avoid getting stuck in most probable local maxima
(Knight & Koehn, What’s New in Statistical Machine Translation, 2003)
Beam Search
• Optimization of Best First Search with heuristics and a “beam” of choices
• Exponential tradeoff when increasing the “beam” width
(Knight & Koehn, What’s New in Statistical Machine Translation, 2003)
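
The idea of keeping only a "beam" of the best partial translations can be illustrated with a small sketch. It assumes a simplified decoder that translates word by word without reordering and scores hypotheses with translation probabilities plus a bigram language model. The candidate lists, the `TOY_BIGRAMS` table and all function names are made up for the example.

```python
import math


def beam_search(candidates, bigram_logprob, beam_width=2):
    """candidates: one {target_word: probability} dict per source position."""
    beam = [(0.0, ["<s>"])]  # (log score, words chosen so far)
    for options in candidates:
        expanded = []
        for score, words in beam:
            for word, prob in options.items():
                new_score = (
                    score + math.log(prob) + bigram_logprob(words[-1], word)
                )
                expanded.append((new_score, words + [word]))
        # Prune: keep only the `beam_width` best partial translations.
        expanded.sort(key=lambda item: item[0], reverse=True)
        beam = expanded[:beam_width]
    best_score, best_words = beam[0]
    return best_words[1:], best_score


# Hypothetical bigram language model; unseen pairs get a small probability.
TOY_BIGRAMS = {("the", "house"): 0.5, ("a", "home"): 0.3}


def toy_bigram_logprob(previous, word):
    return math.log(TOY_BIGRAMS.get((previous, word), 0.01))


candidates = [{"the": 0.6, "a": 0.4}, {"house": 0.5, "home": 0.5}]
words, score = beam_search(candidates, toy_bigram_logprob, beam_width=2)
print(words)
```

A wider beam explores more hypotheses per step at a higher cost. A beam of width 1 reduces to greedy search.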
Other Decoding Methods
• Finite State Transducer
• Mapping between languages based on a finite automaton
• Parsing
• String to Tree Model
(Knight & Koehn, What’s New in Statistical Machine Translation, 2003)
Ideally

[Link]
5
Expected Accuracy
• 70% overall
• Language performance
SENTIMENT ANALYSIS
WHAT IS SENTIMENT ANALYSIS?
• Sentiment analysis is the process of using natural language
processing, text analysis, and statistics to analyze
customer sentiment.
• Sentiment Analysis is also known as Opinion Mining.
• Sentiment Analysis is the domain of understanding these
emotions with software, and it’s a must-understand for
developers and business leaders in a modern workplace.
EXAMPLE
User’s Opinions:
Sameer: It’s a great movie (Positive statement)
Neha: Nah!! I didn’t like it at all (Negative statement)
Mayur: The new iOS7 is awesome..!!! (Positive statement)
NEED OF SENTIMENT ANALYSIS
• Rapid growth of available text on the internet
• To make decisions
• Web 2.0
APPLICATIONS
• Businesses and Organizations:
  • Brand analysis
  • New product perception
  • Product and service benchmarking
  • Businesses spend a huge amount of money to find consumer sentiments and opinions.
• Individuals: interested in others' opinions when
  • Purchasing a product or using a service
  • Finding opinions on political topics, movies, etc.
APPROACH
• NLP
  • Uses semantics to understand the language.
  • Uses SentiWordNet
• Machine Learning
  • Doesn’t have to understand the meaning.
  • Uses classifiers such as Naïve Bayes, SVM, etc.
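
The two approaches can be contrasted with a small pure-Python sketch. Everything here is illustrative: `TOY_LEXICON` is a hypothetical stand-in for SentiWordNet scores, the negation handling is deliberately naive, and the Naïve Bayes classifier is written from scratch on four made-up training examples rather than with a real library.

```python
import math
from collections import Counter

# Approach 1: lexicon lookup (toy stand-in for SentiWordNet scores).
TOY_LEXICON = {"great": 1.0, "awesome": 1.0, "like": 0.5, "bad": -1.0}
NEGATIONS = {"not", "didn't", "never"}


def tokenize(text):
    return [w.strip('.,!?"').lower() for w in text.split() if w.strip('.,!?"')]


def lexicon_sentiment(text):
    score, negate = 0.0, False
    for word in tokenize(text):
        if word in NEGATIONS:
            negate = True  # flip the next sentiment-bearing word
            continue
        value = TOY_LEXICON.get(word, 0.0)
        if value != 0.0:
            score += -value if negate else value
            negate = False
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


# Approach 2: Naive Bayes learned from labelled examples.
def train_naive_bayes(examples):
    word_counts = {"positive": Counter(), "negative": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["positive"]) | set(word_counts["negative"])
    return word_counts, label_counts, vocab


def classify(model, text):
    word_counts, label_counts, vocab = model
    best_label, best_score = None, -math.inf
    for label in label_counts:
        total = sum(word_counts[label].values())
        score = math.log(label_counts[label] / sum(label_counts.values()))
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero the score.
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


print(lexicon_sentiment("It's a great movie"))
print(lexicon_sentiment("Nah!! I didn't like it at all"))

model = train_naive_bayes([
    ("great movie", "positive"), ("awesome phone", "positive"),
    ("bad movie", "negative"), ("terrible service", "negative"),
])
print(classify(model, "awesome movie"))
```

The lexicon approach needs no training data but depends on the quality of the word scores. The classifier needs labelled examples but never looks up what a word means.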
IMPLEMENTATION
ADVANTAGES
• A lower cost than traditional methods of getting
customer insight.
• A faster way of getting insight from customer data.
• The ability to act on customer suggestions.
• Identifies an organisation's Strengths, Weaknesses,
Opportunities & Threats (SWOT Analysis) .
• As 80% of all data in a business consists of words, the
Sentiment Engine is an essential tool for making sense of it all.
• More accurate and insightful customer perceptions and
feedback.
CONCLUSION
• We have seen that Sentiment Analysis can be used for
analyzing opinions in blogs, articles, product reviews, social
media websites and movie-review websites, where a third person
narrates their views. We also studied the NLP and Machine Learning
approaches to Sentiment Analysis. We have seen that it is easier
to implement Sentiment Analysis via the SentiWordNet approach
than via the classifier approach. We have seen that sentiment
analysis has many applications and is an important field of
study. Sentiment analysis has strong commercial interest
because companies want to know how their products are
being perceived, and prospective consumers want to
know what existing users think.
Sentiment Analysis
Jignesh Vipul Vaidya, 155
Index
Introduction: What is Sentiment Analysis?
Sentiments are feelings, opinions, emotions, likes/dislikes, good/bad.
Sentiment Analysis is a Natural Language Processing and Information
Extraction task that aims to obtain the writer’s feelings expressed in positive
or negative comments, questions and requests, by analyzing a large
number of documents.
Sentiment Analysis is a study of human behavior in which we extract user
opinion and emotion from plain text. Sentiment Analysis is also known as
Opinion Mining.
Introduction: What is Sentiment Analysis?
It is the task of identifying whether the opinion expressed in a text
is positive or negative.

Automatically extracting opinions, emotions and sentiments in text.

Language-independent technology that understands the meaning of the
text. It identifies the opinion or attitude that a person has towards a
topic or an object.
Example
Applications
● Businesses and Organizations:
○ Brand analysis
○ New product perception
○ Product and service benchmarking

Businesses spend a huge amount of money to find consumer sentiments and opinions.

● Social Media
○ Finding general opinion about recent hot topics in town
Applications
● Individuals:
○ Interested in others' opinions when…
■ Purchasing a product or using a service
■ Finding opinions on political topics, movies, etc.
● Ad Placements:
○ Placing ads in the user-generated content
○ Place an ad when one praises a product.
○ Place an ad from a competitor if one criticizes a product.
Approach
● NLP
○ Uses semantics to understand the language
○ Uses SentiWordNet
● Machine Learning
○ Doesn’t have to understand the meaning
○ Uses classifiers such as Naïve Bayes, SVM, etc.
○ Machine learning is a branch of artificial intelligence concerned with
the construction and study of systems that can learn from data.
○ Various datasets are available on the Internet, such as Twitter datasets, movie review
datasets, etc.
Libraries available
Open-source Python libraries available are

● scikit-learn
● spaCy
● NLTK
● VADER (Valence Aware Dictionary and sEntiment Reasoner), available in NLTK
● pandas, used alongside scikit-learn for data handling
Implementation
Conclusion
Sentiment Analysis can be used for analyzing opinions in anything from product
reviews to automated interviews. It’s easy to implement. It has strong
commercial interest because companies want to know how their
products are being perceived, and prospective consumers want to
know what existing users think.
Thank You
Text Summarization
Narayan Pandit
Introduction
The basic definition of text summarization in
NLP is the process of summarizing the
information in large texts for quicker consumption.
Introduction
• When you open news sites, do you just start reading every news article? Probably not.
We typically glance at the short news summary and then read more details if interested.

• Short, informative summaries of the news are now everywhere: magazines, news
aggregator apps, research sites, etc.

• It is possible to create the summaries automatically as the news comes in from
various sources around the world.

Introduction
• The method of extracting these summaries from the original huge text without losing
vital information is called Text Summarization. It is essential for the summary to be
fluent and continuous and to convey the significant information.

• In fact, Google News, the Inshorts app and various other news aggregator apps take
advantage of text summarization algorithms.
Types of Text Summarization

Text summarization methods can be grouped into two main categories:
Extractive and Abstractive methods.
Types of Text Summarization
• Extractive Text Summarization
It is the traditional method, developed first. The main objective
is to identify the significant sentences of the text and add
them to the summary. Note that the summary
obtained contains exact sentences from the original text.
• Abstractive Text Summarization
It is a more advanced method, and new advancements come
out frequently. The approach is to identify the important sections,
interpret the context and reproduce it in a new way. This ensures
that the core information is conveyed through the shortest text possible.
Note that here, the sentences in the summary are generated, not just
extracted from the original text.
Various approaches for Text Summarization
Summarization: Three Stages
• Content Selection: Choose sentences to extract from the document
• Information Ordering: Choose an order to place them in the summary
• Sentence Realization: Clean up the sentences
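
The three stages can be illustrated with a minimal extractive summarizer. This is a sketch under simple assumptions, not a method from the slides: sentences are scored by average word frequency, ordered as in the source text, and cleaned only by collapsing whitespace. The `STOPWORDS` set and the sample text are made up for the example.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "it", "that"}


def content_words(text):
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, sentences_count=2):
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        return sum(freq[w] for w in tokens) / max(len(tokens), 1)

    # 1. Content selection: pick the highest-scoring sentences.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = ranked[:sentences_count]
    # 2. Information ordering: keep the original document order.
    chosen.sort()
    # 3. Sentence realization: here, only whitespace cleanup.
    return " ".join(re.sub(r"\s+", " ", sentences[i]) for i in chosen)


sample = ("Junk food is high in sugar. Junk food is high in fat. "
          "Vegetables are healthy.")
summary = summarize(sample, sentences_count=2)
print(summary)
```

Real systems use stronger scoring for content selection, and realization may also compress or simplify the selected sentences.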


Text Summarization with Sumy
• The Sumy library provides several algorithms to implement text
summarization. Just import your desired algorithm rather than having to code
it on your own.
• First, import the library through the command below.

• You can access the different summarizers available through the
[Link] module.
LexRank
• A sentence which is similar to many other sentences of the text has a
high probability of being important. The approach of LexRank is that
a particular sentence is recommended by other similar sentences and
hence is ranked higher.

• Step-by-step demonstration of how to summarize the text below:


• original_text='Junk foods taste good that’s why it is mostly liked by everyone of any age group especially kids and school going children. They
generally ask for the junk food daily because they have been trend so by their parents from the childhood. They never have been discussed by their
parents about the harmful effects of junk foods over health. According to the research by scientists, it has been found that junk foods have negative
effects on the health in many ways. They are generally fried food found in the market in the packets. They become high in calories, high in cholesterol,
low in healthy nutrients, high in sodium mineral, high in sugar, starch, unhealthy fat, lack of protein and lack of dietary fibers. Processed and junk foods
are the means of rapid and unhealthy weight gain and negatively impact the whole body throughout the life. It makes able a person to gain excessive
weight which is called as obesity. Junk foods tastes good and looks good however do not fulfil the healthy calorie requirement of the body. Some of the
foods like french fries, fried foods, pizza, burgers, candy, soft drinks, baked goods, ice cream, cookies, etc are the example of high-sugar and high-fat
containing foods. It is found according to the Centres for Disease Control and Prevention that Kids and children eating junk food are more prone to the
type-2 diabetes. In type-2 diabetes our body become unable to regulate blood sugar level. Risk of getting this disease is increasing as one become more
obese or overweight. It increases the risk of kidney failure. Eating junk food daily lead us to the nutritional deficiencies in the body because it is lack of
essential nutrients, vitamins, iron, minerals and dietary fibers. It increases risk of cardiovascular diseases because it is rich in saturated fat, sodium and
bad cholesterol. High sodium and bad cholesterol diet increases blood pressure and overloads the heart functioning. One who like junk food develop
more risk to put on extra weight and become fatter and unhealthier. Junk foods contain high level carbohydrate which spike blood sugar level and make
person more lethargic, sleepy and less active and alert. Reflexes and senses of the people eating this food become dull day by day thus they live more
sedentary life. Junk foods are the source of constipation and other disease like diabetes, heart ailments, clogged arteries, heart attack, strokes, etc
because of being poor in nutrition. Junk food is the easiest way to gain unhealthy weight. The amount of fats and sugar in the food makes you gain
weight rapidly. However, this is not a healthy weight. It is more of fats and cholesterol which will have a harmful impact on your health. Junk food is
also one of the main reasons for the increase in obesity [Link] food only looks and tastes good, other than that, it has no positive points. The
amount of calorie your body requires to stay fit is not fulfilled by this food. For instance, foods like French fries, burgers, candy, and cookies, all have
high amounts of sugar and fats. Therefore, this can result in long-term illnesses like diabetes and high blood pressure. This may also result in kidney
failure. Above all, you can get various nutritional deficiencies when you don’t consume the essential nutrients, vitamins, minerals and more. You
become prone to cardiovascular diseases due to the consumption of bad cholesterol and fat plus sodium. In other words, all this interferes with the
functioning of your heart. Furthermore, junk food contains a higher level of carbohydrates. It will instantly spike your blood sugar levels. This will
result in lethargy, inactiveness, and sleepiness. A person reflex becomes dull overtime and they lead an inactive life. To make things worse, junk food
also clogs your arteries and increases the risk of a heart attack. Therefore, it must be avoided at the first instance to save your life from becoming
[Link] main problem with junk food is that people don’t realize its ill effects now. When the time comes, it is too late. Most importantly, the issue
is that it does not impact you instantly. It works on your overtime; you will face the consequences sooner or later. Thus, it is better to stop [Link] can
avoid junk food by encouraging your children from an early age to eat green vegetables. Their taste buds must be developed as such that they find
healthy food tasty. Moreover, try to mix things up. Do not serve the same green vegetable daily in the same style. Incorporate different types of healthy
food in their diet following different recipes. This will help them to try foods at home rather than being attracted to junk [Link], do not deprive
them completely of it as that will not help. Children will find one way or the other to have it. Make sure you give them junk food in limited quantities
and at healthy periods of time. '
• Next, import PlaintextParser. Here, we have an article stored as a string, hence
we use it. When using website sources etc., there are other parsers
available. Along with the parser, you have to import Tokenizer for segmenting
the raw text into tokens.

• You can access the summarizers available through [Link].
Here, I have imported the LexRankSummarizer.
• As the text source here is a string, you need to use the PlaintextParser.from_string() function to
initialize the parser. You can specify the language used as input to the Tokenizer.
Syntax: PlaintextParser.from_string(string, tokenizer)

• Next, create a summarizer model lex_rank_summarizer to fit your text. The syntax
is: lex_rank_summarizer(document, sentences_count).
• You can decide the number of sentences you want in the summary through
the parameter sentences_count.
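
The code for these steps did not survive in the slides, so the following is a sketch of how they might fit together. It assumes the `sumy` package is installed along with the NLTK tokenizer data that its `Tokenizer` relies on. The wrapper name `summarize_with_lexrank` is hypothetical. The imports sit inside the function only so that the sketch can be loaded on a machine without sumy.

```python
def summarize_with_lexrank(text, sentences_count=3, language="english"):
    """Return the LexRank summary of `text` as a list of sentence strings."""
    # Imported lazily; requires `pip install sumy` (an assumption about the setup).
    from sumy.nlp.tokenizers import Tokenizer
    from sumy.parsers.plaintext import PlaintextParser
    from sumy.summarizers.lex_rank import LexRankSummarizer

    # The text source is a plain string, so use PlaintextParser.from_string.
    parser = PlaintextParser.from_string(text, Tokenizer(language))

    # Create the summarizer and ask for the desired number of sentences.
    lex_rank_summarizer = LexRankSummarizer()
    summary = lex_rank_summarizer(parser.document, sentences_count)

    return [str(sentence) for sentence in summary]
```

Calling `summarize_with_lexrank(original_text, 3)` on the junk-food article would return three sentences selected from it. Which sentences are chosen depends on the installed sumy version and tokenizer.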

• Final Text:
It is found according to the Centres for Disease Control and Prevention that Kids and children eating
junk food are more prone to the type-2 [Link] more of fats and cholesterol which will have a
harmful impact on your [Link] will find one way or the other to have it.
Advantages of Text Summarization
• Summarization reduces reading time
• When researching documents, summaries make the selection process easier
• Summarization improves the effectiveness of indexing
• Summarization algorithms are less biased than human summarizers
• Personalized summaries are useful in question-answering systems as they provide
personalized information
• Using automatic summarization systems enables commercial abstract
services to increase the number of text documents they can process
Conclusion
Applying text summarization reduces reading time, accelerates the
process of researching for information, and increases the amount of
information that can fit in an area. Text summarization methods
are broadly classified into abstractive and extractive summarization.
NAMED ENTITY RECOGNITION
Presented by

Aditya Vijay Rana

94 - B

BE Comps
INTRODUCTION
Why do NE Recognition?
• Key part of Information Extraction system
• Robust handling of proper names essential for
many applications
• Pre-processing for different classification levels
• Information filtering
• Information linking
WHAT IS NER?
• Sub-domain under NLP (Natural Language Processing).
• A part of IE (Information Extraction).
• Automatic identification and counting of occurrences of named entities
in a collection of information.
• NER involves identification of proper names in texts, and classification
into a set of predefined categories of interest.
• Three universally accepted categories: person, location and organisation
• Other common tasks: recognition of date/time expressions,
measures (percent, money, weight etc.), email addresses etc.
• Other domain-specific entities: names of drugs, medical conditions,
names of ships, bibliographic references etc.
NER Definition
• Named entity recognition (NER), also known as entity identification
(EI) and entity extraction, is the task of locating and classifying atomic
elements in text into predefined categories such as the names of persons,
organizations, locations, expressions of times, quantities, monetary values,
percentages, etc.
E.g.:
John sold 5 companies in 2002.
<ENAMEX TYPE="PERSON">John</ENAMEX> sold <NUMEX TYPE="QUANTITY">5</NUMEX> companies in <TIMEX TYPE="DATE">2002</TIMEX>.
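
To make the tagged output concrete, here is a toy rule-based tagger that reproduces the markup for the example sentence. It is only a sketch. The `KNOWN_PERSONS` gazetteer and the regular expressions are assumptions for illustration, and real NER systems rely on much richer rules or on trained models.

```python
import re

# Hypothetical gazetteer of person names.
KNOWN_PERSONS = {"John"}

# Order matters: a four-digit year is tried before a plain number.
PATTERN = re.compile(
    r"(?P<DATE>\b(?:19|20)\d{2}\b)"
    r"|(?P<QUANTITY>\b\d+\b)"
    r"|(?P<WORD>\b[A-Z][a-z]+\b)"
)


def tag_entities(text):
    def replace(match):
        token = match.group(0)
        if match.lastgroup == "DATE":
            return f'<TIMEX TYPE="DATE">{token}</TIMEX>'
        if match.lastgroup == "QUANTITY":
            return f'<NUMEX TYPE="QUANTITY">{token}</NUMEX>'
        if token in KNOWN_PERSONS:
            return f'<ENAMEX TYPE="PERSON">{token}</ENAMEX>'
        return token  # capitalized word that is not a known name

    return PATTERN.sub(replace, text)


tagged = tag_entities("John sold 5 companies in 2002.")
print(tagged)
```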
ENTITY?
o Word or phrase that identifies one item from a set of items that
have similar attributes
o Semantic elements that carry a meaning

Named entities with their labels are recognized as follows:

• ENAMEX: Person (Tim Cook), Organization (Apple, Flint
Center), Location (Cupertino)
• TIMEX: Date, Time
• NUMEX: Money, Percentage, Quantity

o Named entities are either dependent on the proper-names tagging or on the Part
Of Speech (POS) tagging.
NE Types
The named entity hierarchy is divided into three
major classes: Entity Name, Time and Numerical expressions.

NE TYPES: ENAMEX, NUMEX, TIMEX
Entity Types
Entity Name Types
❑ Persons are entities limited to humans. A person may be a
single individual or a group. Individual refers to the name of each
individual person. Group refers to a set of individuals.

❑ Location entities are limited to geographical entities such as
names of countries, cities, continents
and landmasses, bodies of water, and geological formations.

❑ Organization entities are limited to corporations, agencies, and
other groups of people defined by an established organizational
structure.
Entity Name Types
❑ Facilities are limited to buildings and other permanent
man-made structures and real estate improvements like
hospitals, airports, colleges, libraries etc.

❑ A locomotive entity is a physical device primarily designed to
move an object from one location to another, by carrying,
pulling, or pushing the transported object.

❑ Artifact entities are objects or things, produced or shaped by
human craft, such as tools, weapons/ammunition, art paintings,
clothes, ornaments, medicines.
Numerical Expressions

NUMEX: DISTANCE, QUANTITY, MONEY, COUNT
Numerical Expressions
Distance refers to distance measures such as kilometers,
centimeters, meters, acres, feet etc.
Example: 10 cm., twenty feet, 15 hectares
Money specifies the different currency values such as rupee,
euro, dinar, dollar etc.
Example: Rs. 1000, 250 Euro, $160
Count denotes the number (or count) of items/articles/things
etc.
Example: 5 subjects, 12 students, 20 books
Quantity measurements like liters, tons, grams, volts etc.
come under this category.
Example: 20 litres, 22 kg, 50g, 100 volts
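
The four NUMEX sub-types can be told apart with a few regular expressions. This sketch is an illustration only. The unit lists are assumptions that follow the slide's grouping (including acres and hectares under distance), the name `classify_numex` is hypothetical, and spelled-out numbers such as "twenty feet" are not handled.

```python
import re

# Patterns are tried in order; COUNT is the fallback "number + noun" case.
NUMEX_PATTERNS = [
    ("MONEY", re.compile(
        r"^(?:Rs\.?|\$)\s*\d+(?:\.\d+)?$"
        r"|^\d+(?:\.\d+)?\s*(?:rupees?|euros?|dollars?|dinars?)$", re.I)),
    ("DISTANCE", re.compile(
        r"^\d+(?:\.\d+)?\s*(?:km|cm|m|kilometers?|centimeters?|meters?"
        r"|feet|acres?|hectares?)\.?$", re.I)),
    ("QUANTITY", re.compile(
        r"^\d+(?:\.\d+)?\s*(?:litres?|liters?|kg|g|grams?|tons?|volts?)$", re.I)),
    ("COUNT", re.compile(r"^\d+\s+[a-z]+$", re.I)),
]


def classify_numex(expression):
    """Return the NUMEX sub-type of a numeric expression, or None."""
    text = expression.strip()
    for label, pattern in NUMEX_PATTERNS:
        if pattern.match(text):
            return label
    return None


for example in ["Rs. 1000", "250 Euro", "10 cm.", "12 students", "50g"]:
    print(example, "->", classify_numex(example))
```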
Time Expressions

TIMEX: TIME, MONTH, DATE, YEAR, DAY, PERIOD, SPECIAL DAY

Temporal Expressions
Temporal expressions are entities that refer to time, date, year, month and day.
Time: These refer to expressions of time, in their different forms.
This also includes hours, minutes and seconds.
Example
5 o’clock in the morning
9.30 a.m.
Evening 6.30 p.m.
Date: This refers to expressions of date, such as 13/12/2001, in
different forms. This also includes month, date and year.
Example
August 15 1947
1956
September 11
Temporal Expressions
Day: These are expressions which convey days in a year. They can also
include days occurring weekly/fortnightly/monthly/quarterly/biennially etc.
Example
− Sunday
− Tomorrow
− Today
− Yesterday
Special Day: refers to special days in a year
Example
− Gandhi Jayanthi
− Rama Navami
Temporal Expressions
Period: refers to expressions which express duration of time,
time periods or time intervals.
Example
− 17th century
− 10 minutes
− 10 a.m. to 12 p.m.
− One year
APPLICATIONS OF NER
• PARSING AND MACHINE TRANSLATION
• PROVIDES QUICK OPERATION
• USED IN BIO-MEDICAL SECTORS
• PRIMARILY USED FOR JOURNALS AND ARTICLES
• NOW EXTENDED TO WEB BLOGS, TWITTER, FACEBOOK ETC.
Thank you
