Types of Query Languages Explained

The document discusses various types of query languages used in data and information retrieval, emphasizing the differences between pattern-based and keyword-based querying. It covers the characteristics of keyword queries, including single-word, context, and Boolean queries, as well as the importance of ranking and retrieval units. Additionally, it explores pattern matching techniques and regular expressions for enhanced data retrieval capabilities.

Query Languages

Berlin Chen 2005

Reference:
1. Modern Information Retrieval, chapter 4
The Kinds of Queries
• Data retrieval
– Pattern-based querying

– Retrieve docs that contain (or exactly match) the objects that
satisfy the conditions clearly specified in the query

– A single erroneous object implies failure!

• Information retrieval
– Keyword-based querying
– Retrieve relevant docs in response to the query
(the formulation of a user information need)

– Allow the answer to be ranked

IR – Berlin Chen 2
The Kinds of Queries

• On-line databases or CD-ROM archives

– High level software packages should be viewed as query
languages
– Named “protocols”

Different query languages are formulated and used in different
situations, depending on
- The underlying retrieval models (ranking algorithms)
- The content (semantics) and structure (syntax) of the text

Models: Boolean, vector-space, HMM, …

Formulations/word-treating machineries: stop-word list,
stemming, query expansion, …
The Retrieval Units

• The retrieval unit: the basic element which can be

retrieved as an answer to a query
– A set of such basic elements with ranking information

• The retrieval unit can be a file, a doc, a Web page, a
paragraph, a passage, or some other structural unit

• Simply referred to as “docs”

Keyword-based Querying

• Keywords
– Words that can be used for retrieval by a query
– A small set of words extracted from the docs
• Preprocessing is needed

• Characteristics of keyword-based queries

– A query is composed of keywords, and the docs containing those
keywords are searched for
– Intuitive, easy to express, and allows for fast ranking
– A query can be a single keyword, multiple keywords (basic
queries), or a more complex combination of operations involving
several keywords
• AND, OR, BUT, …
Keyword-based Querying (cont.)

• Single-word queries
– Query: The elementary query is a word

– Docs: The docs are long sequences of words

– What is a word in English ?

• A word is a sequence of letters surrounded by separators
• Some characters are not letters but do not split a word, e.g.
the hyphen in ‘on-line’
• Words possess semantic/conceptual information

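A minimal Python sketch of the word definition above. It assumes that letters are ASCII and that a single hyphen between letters does not split a word; the name `tokenize` and the regular expression are illustrative choices, not something the slides specify.

```python
import re

# Assumption: a word is a run of ASCII letters, optionally joined by
# single hyphens (so 'on-line' stays one word). Anything else separates.
WORD_RE = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")

def tokenize(text):
    """Return the lowercased words of `text` in order of appearance."""
    return [w.lower() for w in WORD_RE.findall(text)]

print(tokenize("On-line databases, or CD-ROM archives."))
# ['on-line', 'databases', 'or', 'cd-rom', 'archives']
```

A real system would also have to decide how to treat digits, apostrophes, and non-English letters; those choices are left out here.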
Keyword-based Querying (cont.)
• Single-word queries (cont.)
– The use of word statistics for IR ranking (similarity between
a query and a doc)
• Word occurrences inside texts
– Term frequency (tf): number of times a word occurs in a doc
– Inverse document frequency (idf): decreases with the number of
docs in which a word appears, so rarer words weigh more

– Word positions in the docs (see next slide)

• May be required, e.g., by an interface that highlights each
occurrence of a specific word
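The word statistics above can be sketched with a small positional inverted index. This is only an illustration: whitespace tokenization, the helper names, and the `log(N / df)` form of idf (one common variant among several) are assumptions, not taken from the slides.

```python
import math
from collections import defaultdict

def build_index(docs):
    """Map each word to {doc_id: [positions]} (a positional inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(pos)
    return index

def tf(index, word, doc_id):
    """Term frequency: number of times `word` occurs in the doc."""
    return len(index.get(word, {}).get(doc_id, []))

def idf(index, word, n_docs):
    """Inverse document frequency, here log(N / df)."""
    df = len(index.get(word, {}))
    return math.log(n_docs / df) if df else 0.0

docs = {
    "d1": "query languages for text retrieval",
    "d2": "text retrieval and text ranking",
    "d3": "boolean query syntax",
}
index = build_index(docs)
print(tf(index, "text", "d2"))                   # 2
print(index["text"]["d2"])                       # [0, 3]
print(round(idf(index, "text", len(docs)), 3))   # 0.405
```

The stored positions are what a highlighting interface, or a phrase or proximity query, would need.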
Keyword-based Querying (cont.)

[Figure: word positions in the docs]
Keyword-based Querying (cont.)
• Context queries
– Complement single-word queries with the ability to search for words
in a given context, i.e., near other words

– Words appearing near each other may signal a higher
likelihood of relevance than if they appear apart

– E.g., phrases, or words that are proximal in the text
Keyword-based Querying (cont.)
• Context queries (cont.)
– Two types of queries
• Phrase
– A sequence of single-word queries
Q: “enhance” and “retrieval”
D: “…enhance the retrieval….”
– Features:
1. Separators in the text or query may not be the same
2. Uninteresting words are not considered
– Not all systems implement it!
• Proximity
– A relaxed version of the phrase query
– A sequence of single words (or phrases) is given
together with a maximum allowed distance between
them
– E.g., two keywords occur within four words
Q: “enhance” and “retrieval”
D: “…enhance the power of retrieval…”
– Features:
1. May not consider word ordering
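The two kinds of context query can be sketched as follows. This is a hedged illustration: the stop-word list, the punctuation handling, and the function names are assumptions, and distance is counted here in word positions over the full text.

```python
STOP_WORDS = {"the", "of", "a", "an"}   # assumed list of "uninteresting" words
PUNCT = ".,;:!?\"'…"

def words_of(text):
    """Lowercase, split on whitespace, strip surrounding punctuation."""
    cleaned = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in cleaned if w]

def phrase_match(doc, phrase):
    """True if the phrase words occur consecutively, ignoring stop words."""
    d = [w for w in words_of(doc) if w not in STOP_WORDS]
    p = [w for w in words_of(phrase) if w not in STOP_WORDS]
    return any(d[i:i + len(p)] == p for i in range(len(d) - len(p) + 1))

def proximity_match(doc, w1, w2, max_dist, ordered=False):
    """True if w1 and w2 occur within max_dist word positions of each other.
    With ordered=True, w1 must come before w2."""
    words = words_of(doc)
    pos1 = [i for i, w in enumerate(words) if w == w1]
    pos2 = [i for i, w in enumerate(words) if w == w2]
    for i in pos1:
        for j in pos2:
            gap = j - i
            if 0 < gap <= max_dist:
                return True
            if not ordered and 0 < -gap <= max_dist:
                return True
    return False

print(phrase_match("…enhance the retrieval…", "enhance retrieval"))   # True
print(proximity_match("enhance the power of retrieval",
                      "enhance", "retrieval", 4))                     # True
```

The phrase query on "enhance the power of retrieval" fails, while the proximity query with a distance of four succeeds, which is the relaxation the slide describes.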
Keyword-based Querying (cont.)

• Context queries (cont.)

– Ranking
• Phrases: analogous to single words

• Proximity queries: ranked the same way if physical proximity is not
used as a parameter in ranking
– Proximity then acts just as a hard limiter
– But physical proximity has semantic value!

How to do better ranking?
Keyword-based Querying (cont.)

• Boolean Queries
– Have a syntax composed of atoms (basic queries) that
retrieve docs, and of Boolean operators which work on their
operands (sets of docs)

A query syntax tree: translation AND (syntax OR syntactic)
Leaves: basic queries
Internal nodes: operators
Keyword-based Querying (cont.)
• Boolean Queries (cont.)
– Commonly used operators (e1 and e2 are basic queries)
• OR, e.g. (e1 OR e2)
– Select all docs which satisfy e1 or e2. Duplicates are
eliminated
• AND, e.g. (e1 AND e2)
– Select all docs which satisfy both e1 and e2
• BUT, e.g. (e1 BUT e2)
– Select all docs which satisfy e1 but not e2
– Can use the inverted file to filter out undesired docs

Example:
e1 = {d3, d7, d10}
e2 = {d4, d7, d8}
e1 OR e2 = {d3, d4, d7, d8, d10}
e1 AND e2 = {d7}
e1 BUT e2 = {d3, d10}

No partial matching between a doc and a query

No ranking of retrieved docs is provided!
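A Boolean query can be evaluated bottom-up over its syntax tree with plain set operations. The sketch below assumes a tree encoded as nested tuples and an inverted index held in a dictionary; both encodings, and the name `evaluate`, are illustrative.

```python
def evaluate(node, index):
    """Evaluate a query syntax tree against an inverted index.
    Leaves are keyword strings; internal nodes are (op, left, right)."""
    if isinstance(node, str):
        return set(index.get(node, ()))
    op, left, right = node
    a, b = evaluate(left, index), evaluate(right, index)
    if op == "OR":
        return a | b      # union, duplicates eliminated
    if op == "AND":
        return a & b      # docs satisfying both operands
    if op == "BUT":
        return a - b      # docs satisfying a but not b
    raise ValueError(f"unknown operator: {op}")

# Illustrative postings for two basic queries e1 and e2.
index = {"e1": {"d3", "d7", "d10"}, "e2": {"d4", "d7", "d8"}}
print(evaluate(("AND", "e1", "e2"), index))           # {'d7'}
print(sorted(evaluate(("BUT", "e1", "e2"), index)))   # ['d10', 'd3']
```

The result is an unordered set, which reflects the point above: a doc either matches or it does not, and no ranking comes out of the evaluation.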
Keyword-based Querying (cont.)

• Boolean Queries (cont.)

– A relaxed version: a “fuzzy Boolean” set of operators
• The meaning of AND and OR can be relaxed
– all: the AND operator
– one: the OR operator (at least one)
– some: retrieve elements appearing in more
operands (docs) than the OR requires

• Docs are ranked higher when they have a larger number of
elements in common with the query

– Naïve users have trouble with Boolean queries
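One way to picture the relaxed operators is a single function with a threshold on how many operands a doc must satisfy. The parameter `min_match` and the tie-breaking rule are assumptions made for this sketch.

```python
def fuzzy_boolean(operands, min_match=1):
    """Rank docs by how many operands (sets of docs) contain them.
    min_match == len(operands) behaves like AND ("all"),
    min_match == 1 behaves like OR ("one"),
    and values in between give "some"."""
    counts = {}
    for docs in operands:
        for d in docs:
            counts[d] = counts.get(d, 0) + 1
    hits = [(d, c) for d, c in counts.items() if c >= min_match]
    # More operands matched means a higher rank; ties broken by doc id.
    return sorted(hits, key=lambda dc: (-dc[1], dc[0]))

operands = [{"d1", "d2"}, {"d2", "d3"}, {"d2", "d3", "d4"}]
print(fuzzy_boolean(operands, min_match=2))   # [('d2', 3), ('d3', 2)]
```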
Keyword-based Querying (cont.)
• Natural language
– Push the fuzzy Boolean model even further
• The distinction between AND and OR is completely blurred

– A query can be an enumeration of words and/or context queries

– Typically, a query is treated as a bag of words (ignoring the
context) for the vector space model
• Term-weighting, relevance feedback, etc.

– All the documents matching a portion of the user query are
retrieved
• Docs matching more parts of the query are assigned a higher
ranking

– Negation can also be handled by penalizing the ranking score
• E.g. some words are not desired
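A bag-of-words ranking with a penalty for undesired words might look like the sketch below. The scoring rule (one point per wanted word present, minus a fixed penalty per unwanted word present) is an assumed toy scheme, not the term weighting of the vector space model.

```python
def score(doc_words, wanted, unwanted=(), penalty=1.0):
    """Wanted words present in the doc, minus a penalty per unwanted word."""
    present = set(doc_words)
    gain = sum(1 for w in set(wanted) if w in present)
    loss = sum(penalty for w in set(unwanted) if w in present)
    return gain - loss

def rank(docs, wanted, unwanted=(), penalty=1.0):
    """Ids of docs matching at least one wanted word, best score first."""
    scored = []
    for doc_id, text in docs.items():
        words = text.lower().split()
        if any(w in words for w in wanted):
            scored.append((score(words, wanted, unwanted, penalty), doc_id))
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in ordered]

docs = {"a": "jaguar car top speed", "b": "jaguar habitat"}
print(rank(docs, ["jaguar", "speed"]))                                  # ['a', 'b']
print(rank(docs, ["jaguar", "speed"], unwanted=["car"], penalty=2.0))   # ['b', 'a']
```

Doc "a" is still retrieved in the second call; the undesired word only pushes it down the ranking, which is how this differs from the strict BUT operator.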
Pattern Matching

• Pattern matching: allows the retrieval of docs based on
some patterns
– A pattern is a set of syntactic features that must occur in a text
segment
• Segments satisfying the pattern specifications are said to
“match the pattern”
• E.g. the prefix of a word
– A kind of data retrieval

• Pattern matching (data retrieval) can be viewed as an
enhanced tool for information retrieval
– Requires more sophisticated data structures and algorithms to
retrieve efficiently
Pattern Matching (cont.)

• Types of patterns
– Words: most basic patterns

– Prefixes: a string forming the beginning of a text word
• E.g. ‘comput’: ‘computer’, ‘computation’,…

– Suffixes: a string forming the end of a text word
• E.g. ‘ters’: ‘computers’, ‘testers’, ‘painters’,…

– Substrings: a string within a text word
• E.g. ‘tal’: ‘coastal’, ‘talk’, ‘metallic’, …

– Ranges: a pair of strings matching any words lying between them
in lexicographic order
• E.g. between ‘held’ and ‘hold’: ‘hoax’ and ‘hissing’,…
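The pattern types above can be tried against a small sorted vocabulary. The vocabulary, the function names, and the choice to treat range endpoints as inclusive are assumptions of this sketch; a real system would use an index structure rather than a linear scan.

```python
from bisect import bisect_left, bisect_right

VOCAB = sorted(["coastal", "computation", "computer", "computers", "held",
                "hissing", "hoax", "hold", "metallic", "painters", "talk",
                "testers"])

def prefix(p):
    """Words beginning with p."""
    return [w for w in VOCAB if w.startswith(p)]

def suffix(s):
    """Words ending with s."""
    return [w for w in VOCAB if w.endswith(s)]

def substring(s):
    """Words containing s anywhere."""
    return [w for w in VOCAB if s in w]

def word_range(lo, hi):
    """Words w with lo <= w <= hi in lexicographic order (binary search)."""
    return VOCAB[bisect_left(VOCAB, lo):bisect_right(VOCAB, hi)]

print(prefix("comput"))            # ['computation', 'computer', 'computers']
print(word_range("held", "hold"))  # ['held', 'hissing', 'hoax', 'hold']
```

Prefixes and ranges map naturally onto a sorted vocabulary, while suffixes and substrings need a scan or an extra structure, which is one reason pattern matching calls for more sophisticated data structures.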
Pattern Matching (cont.)
– Allowing errors: a word together with an error threshold
• Useful when the query or doc contains typos or misspellings

• Retrieve all text words which are ‘similar’ to the given word

• Edit (or Levenshtein) distance: the minimum number of
character insertions, deletions, and replacements needed
to make two strings equal
– E.g. ‘flower’ and ‘flo wer’ (edit distance 1: one inserted space)

• Maximum allowed edit distance: the query specifies the
maximum number of allowed errors for a word to match the
pattern
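A compact sketch of matching with an error threshold, assuming unit cost for every insertion, deletion, and replacement. The names `edit_distance` and `similar_words` are illustrative, and scanning the whole vocabulary is the simplest possible strategy rather than an efficient one.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit costs, keeping one previous row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # replace, or match at no cost
            ))
        prev = curr
    return prev[-1]

def similar_words(query, vocabulary, max_errors):
    """Vocabulary words within max_errors edits of the query word."""
    return [w for w in vocabulary if edit_distance(query, w) <= max_errors]

print(edit_distance("flower", "flo wer"))   # 1
print(similar_words("retrieval", ["retreival", "retrieve", "ranking"], 2))
# ['retreival', 'retrieve']
```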
Pattern Matching (cont.)
• String Alignment: Using Dynamic Programming

[Figure: dynamic-programming alignment grid. Horizontal axis: doc string (test), positions 0 … n. Vertical axis: query string (reference), positions 0 … m. Cell (i,j) is reached from (i-1,j) by an insertion, from (i,j-1) by a deletion, or from (i-1,j-1) along the diagonal; the alignment path runs from (0,0) to (n,m).]
Pattern Matching (cont.)
• String Alignment: Using Dynamic Programming

Step 1: Initialization:
G[0][0] = 0;
for i = 1,..., n { //test
    G[i][0] = G[i-1][0] + 1;
    B[i][0] = 1; //Insertion (Horizontal Direction)
}
for j = 1,..., m { //reference
    G[0][j] = G[0][j-1] + 1;
    B[0][j] = 2; //Deletion (Vertical Direction)
}

Step 2: Iteration:
for i = 1,..., n { //test
    for j = 1,..., m { //reference
        G[i][j] = min( G[i-1][j] + 1,      //Insertion
                       G[i][j-1] + 1,      //Deletion
                       G[i-1][j-1] + 1,    //Substitution, if LR[j] != LT[i]
                       G[i-1][j-1] );      //Match, if LR[j] = LT[i]
        B[i][j] = 1; //Insertion (Horizontal Direction)
                  2; //Deletion (Vertical Direction)
                  3; //Substitution (Diagonal Direction)
                  4; //Match (Diagonal Direction)
    } //for j, reference
} //for i, test

Step 3: Measure and Backtrace:
String Error Rate = 100% × G[n][m] / m
String Accuracy Rate = 100% − String Error Rate
Optimal backtrace path = (B[n][m] → ..... → B[0][0])
if B[i][j] = 1 print "LT[i]"; //Insertion, then go left
else if B[i][j] = 2 print "LR[j]"; //Deletion, then go down
else print "LR[j] LT[i]"; //Hit/Match or Substitution, then go down diagonally

Note: the penalties for substitution, deletion and insertion errors are all set to be 1 here
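The three steps can be put together in a short Python sketch. It keeps the slide's names `G` and `B` and the unit penalties, but the function name `align`, the output format, and the tie-breaking between equally cheap paths are assumptions; when several optimal alignments exist, only one is returned.

```python
def align(reference, test):
    """Align `test` against `reference` with unit costs.
    Returns (distance, ops); each op is (kind, ref_symbol, test_symbol)
    with kind in {'Hit', 'Sub', 'Del', 'Ins'} and None for a missing side."""
    n, m = len(test), len(reference)
    G = [[0] * (m + 1) for _ in range(n + 1)]   # accumulated cost
    B = [[0] * (m + 1) for _ in range(n + 1)]   # backpointers 1..4
    for i in range(1, n + 1):
        G[i][0], B[i][0] = i, 1                 # leading insertions
    for j in range(1, m + 1):
        G[0][j], B[0][j] = j, 2                 # leading deletions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = test[i - 1] == reference[j - 1]
            candidates = [
                (G[i - 1][j] + 1, 1),                                   # insertion
                (G[i][j - 1] + 1, 2),                                   # deletion
                (G[i - 1][j - 1] + (0 if same else 1), 4 if same else 3),
            ]
            G[i][j], B[i][j] = min(candidates)
    ops, i, j = [], n, m
    while i > 0 or j > 0:                       # backtrace to (0, 0)
        b = B[i][j]
        if b == 1:
            ops.append(("Ins", None, test[i - 1]))
            i -= 1
        elif b == 2:
            ops.append(("Del", reference[j - 1], None))
            j -= 1
        else:
            kind = "Hit" if b == 4 else "Sub"
            ops.append((kind, reference[j - 1], test[i - 1]))
            i -= 1
            j -= 1
    ops.reverse()
    return G[n][m], ops

distance, ops = align("ACBCC", "BAAC")
print(distance, 100 * distance / len("ACBCC"))   # 4 80.0
```

For the reference A C B C C and the test B A A C, the distance is 4, so the error rate is 80%.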
Pattern Matching (cont.)
• String Alignment: Using Dynamic Programming

[Figure: alignment grid for Correct: A C B C C (vertical axis, j) and Test: B A A C (horizontal axis, i). Each cell holds the counts (Ins,Del,Sub,Hit) accumulated along the best path to that cell, from (0,0,0,0) at the origin. Three alignments are shown, each with WER = 80%, among them:
Ins B, Hit A, Del C, Sub B, Hit C, Del C
Ins B, Hit A, Sub C, Del B, Hit C, Del C]

Note: the penalties for substitution, deletion and insertion errors are
all set to be 1 here
Pattern Matching (cont.)
– Regular Expressions
• General patterns are built up from simple strings and several
operations

• union: if e1 and e2 are regular expressions, then (e1 | e2) matches
what e1 or e2 matches

• concatenation: if e1 and e2 are regular expressions, the
occurrences of (e1 e2) are formed by the occurrences of e1
immediately followed by those of e2

• repetition (Kleene closure): if e is a regular expression, then (e*)
matches a sequence of zero or more contiguous occurrences of e

• Example:
– ‘pro (blem | tein) (s | ε) (0 | 1 | 2)*’ matches words
‘problem2’, ‘proteins’, etc.
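The example pattern can be written with Python's `re` module. The translation below is one reasonable reading: the optional `(s | ε)` becomes `s?` and `(0 | 1 | 2)*` becomes the character class `[0-2]*`; the helper name `matches` is illustrative.

```python
import re

# union -> '|', concatenation -> juxtaposition, Kleene closure -> '*'
PATTERN = re.compile(r"pro(blem|tein)s?[0-2]*")

def matches(word):
    """True if the whole word matches the pattern."""
    return PATTERN.fullmatch(word) is not None

for w in ["problem2", "proteins", "problems012", "protein3", "pro"]:
    print(w, matches(w))
```

`fullmatch` is used so that the pattern has to cover the entire word, in line with matching text words rather than arbitrary substrings.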
Pattern Matching (cont.)

– Extended Patterns
• Subsets of the regular expressions expressed with a simpler
syntax
• System can convert extended patterns into regular expressions,
or search them with specific algorithms
• E.g.: classes of characters:

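As a small illustration of converting a simpler pattern syntax into a regular expression, Python's standard `fnmatch.translate` does this for shell-style wildcards (`*`, `?`, and character classes such as `[ae]`). Treating shell wildcards as the "extended pattern" syntax is an assumption made for this sketch; the slides do not name a specific syntax.

```python
import fnmatch
import re

def compile_extended(pattern):
    """Convert a shell-style pattern into a compiled regular expression."""
    return re.compile(fnmatch.translate(pattern))

rx = compile_extended("comput[ae]*")
for w in ["computer", "computation", "computing"]:
    print(w, rx.match(w) is not None)
```

The class `[ae]` accepts 'computer' and 'computation' but rejects 'computing', and the user never has to write the underlying regular expression.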
Structural Queries

• Docs are allowed to be queried with respect to both their
text content and structural constraints
– Text content: words, phrases, or patterns
– Structural constraints: containment, proximity, or other
restrictions on the structural elements (e.g., chapters, sections,
etc.)
• Standardization of languages used to represent structured
text, e.g., HTML…

Mixing contents and structures in queries: built on top of basic queries

[Figure: a query on text content goes through the text retrieval model (e.g., the Boolean model) to give a set of retrieved documents; the structural constraints of the structural query are then applied to give the final set of retrieved documents.]
Structural Queries (cont.)

• Three main (text) structures discussed here, from simple to complex

– Form-like fixed structure
– Hierarchical structure
– Hypertext structure

What structure may a text have?
What can be queried about that structure? (the query model)
How to rank docs?
Form-like Fixed Structure (cont.)
• Docs have a fixed set of fields, much like a filled form
– Each field has some text inside
– Some fields are not present in all docs
– Text has to be classified into a field
– Fields are not allowed to nest or overlap
– A given pattern can only be associated with a specified field
(so the text hierarchy cannot be represented)

– E.g., a mail archive (sender, receiver, date, subject, body, …)
• Search for the mail sent to a given person with “football” in
the subject field

• Compared with relational database systems
– Different fields with different data types: more rigid!
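The mail example can be sketched with each doc as a flat dictionary of fields. The field names follow the slide; the sample mails, the function names, and the use of case-insensitive substring matching are assumptions.

```python
def field_matches(mail, field, pattern):
    """True if the mail has `field` and its text contains `pattern`."""
    return field in mail and pattern.lower() in mail[field].lower()

def search_mail(archive, **field_patterns):
    """Mails for which every given field matches its own pattern."""
    return [mail for mail in archive
            if all(field_matches(mail, f, p) for f, p in field_patterns.items())]

archive = [
    {"sender": "ann", "receiver": "bob", "subject": "Football on Sunday", "body": "..."},
    {"sender": "bob", "receiver": "carol", "subject": "football tickets", "body": "..."},
    {"sender": "dave", "receiver": "bob", "body": "football? (no subject field)"},
]
hits = search_mail(archive, receiver="bob", subject="football")
print([m["sender"] for m in hits])   # ['ann']
```

Each pattern is tied to exactly one field, and a mail that lacks the field simply cannot match it, which mirrors the flat, non-nesting structure described above.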
Hypertext Structure (cont.)
• A hypertext is a directed graph where
– Nodes hold some text (content)
– The links represent connections (structural connectivity)
between nodes or between positions inside the nodes

• Retrieval from a hypertext began as a merely
navigational activity
– Manually traverse the hypertext nodes following links to search for
what one wanted

– It’s still difficult to query the hypertext based on its structure

• An interesting proposal to combine browsing and
searching on the web: WebGlimpse
– Allows classical navigation plus the ability to search by content in
the neighborhood of the current node
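The idea of searching by content in the neighborhood of the current node can be sketched as a bounded breadth-first traversal over the link graph. This is not WebGlimpse's actual implementation; the graph encoding, the hop limit `max_hops`, and the function names are assumptions.

```python
from collections import deque

def neighborhood(links, start, max_hops):
    """Nodes reachable from `start` by following at most max_hops links."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue                      # do not expand beyond the limit
        for nxt in links.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return seen

def search_near(links, texts, start, keyword, max_hops=2):
    """Nodes near `start` whose text contains `keyword`."""
    near = neighborhood(links, start, max_hops)
    return sorted(n for n in near if keyword.lower() in texts.get(n, "").lower())

links = {"A": ["B", "C"], "B": ["D"], "C": [], "D": ["E"], "E": []}
texts = {"A": "home page", "B": "query languages", "C": "Boolean query",
         "D": "ranking", "E": "query protocols"}
print(search_near(links, texts, "A", "query", max_hops=1))   # ['B', 'C']
```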
Hierarchical Structure (cont.)

• An intermediate structuring model which lies between
form-like fixed structure and hypertext structure
• Represents a recursive decomposition of the text and is a
natural model for many text collections
– E.g., books, articles, legal documents,…

[Figure: a parsed query used to retrieve the figure]
Issues of Hierarchical Structure

• Static or dynamic structure

– Static: one or more explicit hierarchies can be queried, e.g., by
ancestry
– Dynamic: not really a hierarchy, the required elements are built
on the fly
• Implemented over a normal text index

• Restrictions on the structure

– The text or the answers may have restrictions on nesting
and/or overlapping for efficiency reasons

– In other cases, the query language is restricted to avoid
restricting the structure

The more powerful the model, the less efficiently it can be implemented
Issues of Hierarchical Structure (cont.)

• Integration with text

– Effective integration of queries on text content with queries on
text structure

– From the perspectives of classical IR models
and structural models, respectively
• Classical model: primary -> text, secondary -> structure
• Structural model: primary -> structure, secondary -> text

• Query language

– Some features for queries on structure include selection of
areas that
• Contain (or not) other areas
• Are contained (or not) in other areas
• Follow (or are followed by) other areas
• Are close to other areas

– Also includes set manipulation
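The structural selections listed above can be sketched by modelling each area as a span of word positions. The `Area` type, the inclusive endpoints, and the predicate names are assumptions of this illustration, not a query language from the slides.

```python
from typing import NamedTuple

class Area(NamedTuple):
    name: str
    start: int   # first word position covered (inclusive)
    end: int     # last word position covered (inclusive)

def contains(a, b):
    """True if area a strictly contains area b."""
    same_span = (a.start, a.end) == (b.start, b.end)
    return a.start <= b.start and b.end <= a.end and not same_span

def follows(a, b):
    """True if area a starts after area b ends."""
    return a.start > b.end

def close_to(a, b, max_gap):
    """True if two non-overlapping areas are at most max_gap positions apart."""
    gap = max(a.start - b.end, b.start - a.end)
    return 0 < gap <= max_gap

def select(areas, predicate, others):
    """Areas for which predicate(area, other) holds for some other area."""
    return [a for a in areas if any(predicate(a, o) for o in others)]

chapter = Area("chapter", 0, 99)
sec1, sec2 = Area("section 1", 0, 39), Area("section 2", 40, 99)
figure = Area("figure", 50, 55)
print([a.name for a in select([sec1, sec2], contains, [figure])])   # ['section 2']
```

Because `select` returns ordinary lists of areas, the set manipulation mentioned above (union, intersection, difference of answers) can be layered on with Python sets.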
Query Protocols
• The query languages used automatically by software
applications to query text databases
– Standards for querying CD-ROMs
– Or, intermediate languages to query library systems

• Important query protocols

– Z39.50
• For bibliographical information systems
• A protocol covering not only the query language but also the
client-server connection
– WAIS (Wide Area Information Service)
• A network publishing protocol
• For querying databases through the Internet
Query Protocols (cont.)

• CD-ROM publishing protocols

– Provide “disk interchangeability”: flexibility in data
communication between primary information providers and end
users

– Some example protocols

• CCL (Common Command Language)
• CD-RDx (Compact Disk Read only Data exchange)
• SFQL (Structured Full-text Query Language)
Trends and Research Issues

• Types of queries and how they are structured

IR – Berlin Chen 34
