07/14/16
1. XML
2. Basic XML Concepts
3. Defining XML Data Formats
4. Visualization
XML
XML???
X: eXtensible
M: Markup
L: Language
XML is not
A replacement for HTML
(but HTML can be generated from XML)
A presentation format
(but XML can be converted into one)
A programming language
(but it can be used with almost any language)
A network transfer protocol
(but XML may be transferred over a network)
A database
(but XML may be stored into a database)
But then what is it?
XML is a meta markup language
for text documents / textual data
XML allows you to define languages
(applications) to represent text
documents / textual data
XML by Example
<article>
<author>Gerhard Weikum</author>
<title>The Web in 10 Years</title>
</article>
Easy to understand for human users
Very expressive (semantics along with the data)
Well structured, easy to read and write from programs
This looks nice, but
XML by Example
this is XML, too:
<t108>
<x87>Gerhard Weikum</x87>
<g10>The Web in 10 Years</g10>
</t108>
Hard to understand for human users
Not expressive (no semantics along with the data)
Well structured, easy to read and write from programs
XML by Example
and what about this XML document:
<data>
ch37fhgks73j5mv9d63h5mgfkds8d984lgnsmcns983
</data>
Impossible to understand for human users
Not expressive (no semantics along with the data)
Unstructured, read and write only with special programs
The actual benefit of using XML highly depends
on the design of the application.
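The three example documents can be compared from a program's point of view. The sketch below assumes Python's standard `xml.etree.ElementTree` module (the slides do not name a library) and shows that all three documents parse equally well; only the tag names decide how much a reader or a program can learn from them.

```python
import xml.etree.ElementTree as ET

# The three example documents from the slides, as strings.
DOCS = {
    "expressive": "<article><author>Gerhard Weikum</author>"
                  "<title>The Web in 10 Years</title></article>",
    "cryptic": "<t108><x87>Gerhard Weikum</x87>"
               "<g10>The Web in 10 Years</g10></t108>",
    "opaque": "<data>ch37fhgks73j5mv9d63h5mgfkds8d984lgnsmcns983</data>",
}

def child_tags(xml_text):
    """Return the tag names of the root element's direct children."""
    return [child.tag for child in ET.fromstring(xml_text)]

for label, text in DOCS.items():
    # Every document parses; the tags carry more or less meaning.
    print(label, child_tags(text))
```

The parser treats `author` and `x87` alike; the semantics live entirely in the naming chosen by the application designer.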
Possible Advantages of Using XML
Truly Portable Data
Easily readable by human users
Very expressive (semantics near data)
Very flexible and customizable (no finite tag set)
Easy to use from programs (libs available)
Easy to convert into other representations
(XML transformation languages)
Many additional standards and tools
Widely used and supported
App. Scenario 1: Content Mgt.
[Figure: a database with XML documents feeds converters (XML2HTML, XML2WML, XML2PDF) that deliver the content to clients.]
App. Scenario 2: Data Exchange
[Figure: a buyer and a supplier exchange an order as XML (BMECat, ebXML, RosettaNet, BizTalk, ...). On each side an XML adapter connects a legacy system (e.g., SAP R/2 on one side, Cobol on the other).]
App. Scenario 3: XML for Metadata
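<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description rdf:about="[Link]">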
<dc:title>A Framework for</dc:title>
<dc:creator>Ralf Schenkel</dc:creator>
<dc:description>While there are...</dc:description>
<dc:publisher>Saarland University</dc:publisher>
<dc:subject>XML Indexing</dc:subject>
<dc:rights>Copyright ...</dc:rights>
<dc:type>Electronic Document</dc:type>
<dc:format>text/pdf</dc:format>
<dc:language>en</dc:language>
</rdf:Description>
</rdf:RDF>
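Reading such metadata from a program mostly comes down to handling the two namespaces. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree`; the `rdf:about` value is a placeholder because the link is elided in the slide, and the record is shortened to three fields.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RDF and Dublin Core.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Shortened record; the rdf:about value is a placeholder, not the original link.
RECORD = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/placeholder">
    <dc:creator>Ralf Schenkel</dc:creator>
    <dc:subject>XML Indexing</dc:subject>
    <dc:language>en</dc:language>
  </rdf:Description>
</rdf:RDF>"""

def dublin_core(xml_text):
    """Collect the Dublin Core fields of the first rdf:Description as a dict."""
    root = ET.fromstring(xml_text)
    description = root.find("rdf:Description", NS)
    prefix = "{" + NS["dc"] + "}"  # ElementTree stores tags as {uri}localname
    return {el.tag[len(prefix):]: el.text
            for el in description if el.tag.startswith(prefix)}

print(dublin_core(RECORD))
```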
App. Scenario 4: Document Markup
<?xml version="1.0" ?>
<!DOCTYPE Book SYSTEM "[Link]">
<Book Author="Anonymous">
<Title>Sample Book</Title>
<Chapter id="1">
This is chapter 1. It is not very long or
interesting.
</Chapter>
<Chapter id="2">
This is chapter 2. Although it is longer than
chapter 1,
it is not any more interesting.
</Chapter>
</Book>
App. Scenario 4: Document Markup
Document Markup adds structural and semantic
information to documents, e.g.
Sections, Subsections, Theorems,
Cross References
Literature Citations
Index Entries
Named Entities
This allows queries like
Which articles cite Weikum's XML paper from 2001?
Which articles talk about (the named entity) Weikum?
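Once citations and named entities are marked up, both queries become simple tree lookups. The sketch below is only illustrative: the element names `cite` and `entity`, the attribute `ref`, and the key `weikum2001` are hypothetical and do not come from the slides. It relies on the XPath subset supported by Python's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <cite ref="..."/> for citations, <entity> for named entities.
ARTICLES = """<articles>
  <article id="a1">
    <title>First</title>
    <text>As <entity type="person">Weikum</entity> argues <cite ref="weikum2001"/> ...</text>
  </article>
  <article id="a2">
    <title>Second</title>
    <text>No citations here.</text>
  </article>
</articles>"""

def articles_citing(root, ref):
    """Ids of articles that contain a <cite> element pointing at ref."""
    return [a.get("id") for a in root.findall("article")
            if a.find(f".//cite[@ref='{ref}']") is not None]

def articles_mentioning(root, name):
    """Ids of articles that mark up the given named entity."""
    return [a.get("id") for a in root.findall("article")
            if any(e.text == name for e in a.iter("entity"))]

root = ET.fromstring(ARTICLES)
print(articles_citing(root, "weikum2001"))
print(articles_mentioning(root, "Weikum"))
```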
XML for Beginners
Part 2: Basic XML Concepts
2.1 XML Standards by the W3C
2.2 XML Documents
2.1 XML Standards: an Overview
XML Core Working Group:
XML 1.0 (Feb 1998), 1.1 (candidate for recommendation)
XML Namespaces (Jan 1999)
XML Inclusion (candidate for recommendation)
XSLT Working Group:
XSL Transformations 1.0 (Nov 1999), 2.0 planned
XPath 1.0 (Nov 1999), 2.0 planned
eXtensible Stylesheet Language XSL(-FO) 1.0 (Oct 2001)
XML Linking Working Group:
XLink 1.0 (Jun 2001)
XPointer 1.0 (March 2003, 3 substandards)
XQuery 1.0 (Nov 2002) plus many substandards
XML Schema 1.0 (May 2001)
2.2 XML Documents
What's in an XML document?
Elements
Attributes
plus some other details
<?xml version="1.0" encoding="utf-8"?>
A Simple XML Document
<article>
<author>Shivang</author>
<title>XML BASICS</title>
<text>
<abstract>In order to evolve...</abstract>
<section number="1" title="Introduction">
The <index>Web</index> provides the universal...
</section>
</text>
</article>
A Simple XML Document
Freely definable tags:
<article>
<author>Gerhard Weikum</author>
<title>The Web in Ten Years</title>
<text>
<abstract>In order to evolve...</abstract>
<section number="1" title="Introduction">
The <index>Web</index> provides the universal...
</section>
</text>
</article>
A Simple XML Document
Start Tag
<article>
<author>Gerhard Weikum</author>
<title>The Web in Ten Years</title>
<text>
<abstract>In order to evolve...</abstract>
<section number="1" title="Introduction">
The <index>Web</index> provides the universal...
</section>
</text>
</article>
End Tag
Element: everything from the start tag through the end tag
Content of the Element: the subelements and/or text between the two tags
A Simple XML Document
<article>
<author>Shivang Popat</author>
<title>XML BASICS</title>
<text>
<abstract>In order to evolve...</abstract>
<section number="1" title="Introduction">
The <index>Web</index> provides the universal...
</section>
</text>
</article>
Attributes with
name and value
Elements in XML Documents
(Freely definable) tags: article, title, author
with start tag: <article> etc.
and end tag: </article> etc.
Elements: <article> ... </article>
Elements have a name (article) and a content (...)
Elements may be nested.
Elements may be empty: <this_is_empty/>
Element content is typically parsed character data (PCDATA),
i.e., strings with special characters, and/or nested elements (mixed
content if both).
Each XML document has exactly one root element and forms a
tree.
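These notions map directly onto what a parser hands to a program. The following sketch assumes Python's standard `xml.etree.ElementTree`; the empty element `this_is_empty` is appended to the sample document purely for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<article>
  <author>Gerhard Weikum</author>
  <title>The Web in Ten Years</title>
  <text>
    <section number="1" title="Introduction">The <index>Web</index> provides the universal...</section>
  </text>
  <this_is_empty/>
</article>"""

root = ET.fromstring(DOC)             # exactly one root element
section = root.find("text/section")   # nested elements are reached by path
index = section.find("index")
empty = root.find("this_is_empty")

print(root.tag)                       # name of the root element
# Mixed content: text before the subelement, the subelement, text after it.
print(repr(section.text), repr(index.text), repr(index.tail))
# An empty element has neither text nor children.
print(empty.text, len(empty))
```

In ElementTree, text that follows a subelement is stored in that subelement's `tail`, which is how mixed content keeps its order.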
Elements vs. Attributes
Elements may have attributes (in the start tag) that have a name and
a value, e.g. <section number="1">.
What is the difference between elements and attributes?
Only one attribute with a given name per element (but an
arbitrary number of subelements)
Attributes have no structure, simply strings (while elements can
have subelements)
As a rule of thumb:
Content into elements
Metadata into attributes
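The difference shows up in how a parser exposes the two. This is a minimal sketch, assuming Python's standard `xml.etree.ElementTree`; the second `index` subelement is added for illustration.

```python
import xml.etree.ElementTree as ET

section = ET.fromstring(
    '<section number="1" title="Introduction">'
    "<index>Web</index><index>XML</index>"
    "</section>"
)

# Attributes: one flat string value per name.
attributes = dict(section.attrib)

# Subelements: any number with the same name, each a full element of its own.
index_terms = [el.text for el in section.findall("index")]

print(attributes)
print(index_terms)
```

Note that `number` arrives as the string `"1"`, not as an integer: attributes are plain strings unless a schema language adds types on top.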
XML Documents as Ordered Trees
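[Figure: the sample document drawn as an ordered tree. The root article has the children author ("Shivang"), title ("XML BASICS"), and text. text has the children abstract ("In order ...") and section. section carries the attributes number="1" and title="..." and has the ordered children "The", index ("Web"), and "provides ...".]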
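The tree view can be reproduced with a short recursive walk. The sketch below assumes Python's standard `xml.etree.ElementTree` and visits the elements in document order, which is the order of the tree's children.

```python
import xml.etree.ElementTree as ET

DOC = """<article>
  <author>Shivang</author>
  <title>XML BASICS</title>
  <text>
    <abstract>In order to evolve...</abstract>
    <section number="1" title="Introduction">The <index>Web</index> provides the universal...</section>
  </text>
</article>"""

def outline(element, depth=0):
    """Return the element names, indented by depth, in document order."""
    lines = ["  " * depth + element.tag]
    for child in element:  # children are iterated in document order
        lines.extend(outline(child, depth + 1))
    return lines

print("\n".join(outline(ET.fromstring(DOC))))
```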
More on XML Syntax
Some special characters must be escaped using entities:
< as &lt;
& as &amp;
(will be converted back when reading the XML doc)
Some other characters may be escaped, too:
> as &gt;
" as &quot;
' as &apos;
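Programs rarely escape by hand. As a minimal sketch, Python's standard `xml.sax.saxutils.escape` replaces `&`, `<`, and `>`, and accepts a dictionary of extra replacements for the quote characters; parsing the result converts the entities back.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "if a < b && c > d"
escaped = escape(raw)  # escapes &, < and >

# Quotes are only replaced when asked for explicitly.
quoted = escape('say "hi"', {'"': "&quot;"})

# The parser converts the entities back when reading the document.
round_trip = ET.fromstring("<code>" + escaped + "</code>").text

print(escaped)
print(quoted)
print(round_trip == raw)
```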
Well-Formed XML Documents
A well-formed document must adhere to, among others, the
following rules:
Every start tag has a matching end tag.
Elements may nest, but must not overlap.
There must be exactly one root element.
Attribute values must be quoted.
An element may not have two attributes with the same
name.
Comments and processing instructions may not appear
inside tags.
No unescaped < or & signs may occur inside character
data.
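Each of these rules can be tried against a parser. The sketch below assumes Python's standard `xml.etree.ElementTree`, which rejects any document that is not well-formed by raising `ParseError`; the sample snippets are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a ParseError."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

# One violating sample per rule, plus one correct document.
SAMPLES = {
    "correct": '<a x="1"><b>text &amp; more</b></a>',
    "missing end tag": "<a><b></b>",
    "overlapping elements": "<a><b></a></b>",
    "two root elements": "<a/><b/>",
    "unquoted attribute value": "<a x=1/>",
    "duplicate attribute": '<a x="1" x="2"/>',
    "unescaped ampersand": "<a>x & y</a>",
}

for rule, text in SAMPLES.items():
    print(f"{rule}: {is_well_formed(text)}")
```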
Only well-formed documents can be processed by XML parsers.
XML for Beginners
Part 3: Defining XML Data Formats
3.1 Document Type Definitions
3.2 XML Schema
3.1 Document Type Definitions
Sometimes XML is too flexible:
Most programs can only process a subset of all possible
XML applications
For exchanging data, the format (i.e., elements,
attributes and their semantics) must be fixed
Document Type Definitions (DTD) for establishing the
vocabulary for one XML application (in some sense
comparable to schemas in databases)
A document is valid with respect to a DTD if it conforms
to the rules specified in that DTD.
Most XML parsers can be configured to validate.
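Types of DTD
There are two types of DTD:
Internal DTD: declared inside the document itself, within the DOCTYPE declaration
External DTD: stored in a separate file and referenced from the DOCTYPE declaration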
DTD Example
<?xml version="1.0"?>
<page>
<title>Hello friend</title>
<content>Here is some content :)</content>
<comment>Written by Shivang Popat</comment>
</page>
Element Declarations in DTDs
One element declaration for each element type:
<!ELEMENT element_name content_specification>
where content_specification can be
(#PCDATA)      parsed character data
(child)        one child element
(c1,...,cn)    a sequence of child elements c1 ... cn
(c1|...|cn)    one of the elements c1 ... cn
For each component c, possible counts can be specified:
c     exactly one such element
c+    one or more
c*    zero or more
c?    zero or one
Plus arbitrary combinations using parentheses:
<!ELEMENT f ((a|b)*,c+,(d|e))*>
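A content specification of this kind is a regular expression over child element names, so it can be illustrated with an ordinary regex engine. The sketch below is not how a validating parser is implemented; it is a hand-written translation in Python that handles element names, `,`, `|`, `+`, `*`, `?`, and parentheses, and ignores `#PCDATA`, `EMPTY`, and `ANY`. The function names are made up for this example.

```python
import re

# An element name as used in the content models of this example.
NAME = re.compile(r"[A-Za-z_][\w.-]*")

def content_model_to_regex(model):
    """Translate a DTD content model into a regex over 'name ' tokens."""
    compact = re.sub(r"\s+", "", model)
    # Each name becomes a group matching the name followed by one space.
    pattern = NAME.sub(lambda m: "(?:" + re.escape(m.group(0)) + " )", compact)
    # A comma means sequence, which is plain concatenation in a regex.
    return re.compile(pattern.replace(",", ""))

def children_match(model, child_names):
    """Check whether a sequence of child element names fits the model."""
    sequence = "".join(name + " " for name in child_names)
    return content_model_to_regex(model).fullmatch(sequence) is not None

MODEL = "((a|b)*,c+,(d|e))*"
print(children_match(MODEL, ["a", "b", "c", "c", "d"]))
print(children_match(MODEL, ["a", "d"]))
```

For the declaration of `f`, the child sequence a, b, c, c, d fits, while a, d does not because at least one c is required before d or e.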
More on Element Declarations
Elements with mixed content:
<!ELEMENT text (#PCDATA|index|cite|glossary)*>
Elements with empty content:
<!ELEMENT image EMPTY>
Elements with arbitrary content (not suitable for
production-level DTDs):
<!ELEMENT thesis ANY>
Attribute Declarations in DTDs
Attributes are declared per element:
<!ATTLIST section number CDATA #REQUIRED
title CDATA #REQUIRED>
declares two required attributes for element section.
Each declaration lists, in order: element name (section), attribute name (number, title), attribute type (CDATA), attribute default (#REQUIRED).
Possible attribute defaults:
#REQUIRED         the attribute is required in each element instance
#IMPLIED          the attribute is optional
#FIXED default    the attribute always has this default value
default           the attribute has this default value if it is
                  omitted from the element instance
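The effect of a default value can be observed even with a non-validating parser. The sketch below assumes Python's standard `xml.parsers.expat`, which reads the internal DTD and reports defaulted attributes along with the specified ones; the attributes `status` and `note` and the `report` document are hypothetical. Because expat does not validate, it would not complain about a missing `#REQUIRED` attribute.

```python
from xml.parsers import expat

# Internal DTD with one required, one defaulted and one optional attribute.
DOC = """<!DOCTYPE report [
  <!ELEMENT report (section*)>
  <!ELEMENT section (#PCDATA)>
  <!ATTLIST section number CDATA #REQUIRED
                    status CDATA "draft"
                    note   CDATA #IMPLIED>
]>
<report><section number="1">Intro</section><section number="2" status="final">End</section></report>"""

def attributes_of(xml_text, element_name):
    """Return the attribute dict of every start tag with the given name."""
    found = []
    parser = expat.ParserCreate()

    def on_start(name, attrs):
        if name == element_name:
            found.append(dict(attrs))

    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return found

for attrs in attributes_of(DOC, "section"):
    print(attrs)
```

The first section never mentions `status`, yet the parser reports it with the declared default; the `#IMPLIED` attribute `note` simply stays absent.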
Attribute Types in DTDs
CDATA          string data
(A1|...|An)    enumeration of all possible values of the
               attribute (each is an XML name token)
ID             unique XML name to identify the element
IDREF          refers to the ID attribute of some other element
               (intra-document link)
IDREFS         list of IDREF, separated by white space
plus some more
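The ID and IDREF constraints can be sketched by hand. The example below is not a DTD validator: it assumes, purely for illustration, that the attribute `id` is declared as ID and the attribute `target` as IDREF, and it uses Python's standard `xml.etree.ElementTree` to check uniqueness and to find references that point nowhere.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: 'id' plays the role of an ID attribute,
# 'target' the role of an IDREF attribute.
DOC = """<paper>
  <section id="intro">See <ref target="method"/>.</section>
  <section id="method">Back to <ref target="intro"/> and <ref target="missing"/>.</section>
</paper>"""

def dangling_references(xml_text, id_attr="id", idref_attr="target"):
    """Return IDREF values that match no ID; raise if an ID is repeated."""
    root = ET.fromstring(xml_text)
    ids = [el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None]
    if len(ids) != len(set(ids)):
        raise ValueError("ID values must be unique within a document")
    known = set(ids)
    return [el.get(idref_attr) for el in root.iter()
            if el.get(idref_attr) is not None and el.get(idref_attr) not in known]

print(dangling_references(DOC))
```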
Flaws of DTDs
No support for basic data types like integers, doubles,
dates, times, ...
No type derivation
Can't express unordered contents conveniently
=> XML Schema
3.2 XML Schema Basics
XML Schema is an XML application
Provides simple types (string, integer, dateTime,
duration, language, ...)
Allows defining possible values for elements
Allows defining types derived from existing types
Allows defining complex types
Allows posing constraints on the occurrence of elements
Allows forcing uniqueness and foreign keys
XML for Beginners
Part 4: Visualization
4.1 XSLT (Extensible Stylesheet Language Transformations)
XSLT essentials and goals
XSLT is a transformation language for XML. That
means that, using XSLT, you can generate almost any other kind of
document from an XML document. For example, you
could turn XML data exported from a database into
graphics.
XSLT is a W3C XML language (the usual XML well-formedness criteria apply)
XSLT can translate XML into almost anything, e.g.:
well-formed HTML (closed tags)
any XML, e.g. your own or other XML languages like SVG, X3D
non-XML, e.g. RTF (this is a bit more complicated)
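To make the idea of a transformation concrete, here is the kind of mapping an XSLT stylesheet would describe declaratively, written by hand instead. This is not XSLT: it is a sketch in Python using the standard `xml.etree.ElementTree`, and the target HTML layout (a heading plus a byline) is an assumption chosen for the example.

```python
import xml.etree.ElementTree as ET

ARTICLE = """<article>
  <author>Gerhard Weikum</author>
  <title>The Web in 10 Years</title>
</article>"""

def article_to_html(xml_text):
    """Build a small HTML page from an <article> document."""
    article = ET.fromstring(xml_text)
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    # Map <title> to a heading and <author> to a byline paragraph.
    ET.SubElement(body, "h1").text = article.findtext("title")
    ET.SubElement(body, "p").text = "by " + article.findtext("author")
    return ET.tostring(html, encoding="unicode", method="html")

print(article_to_html(ARTICLE))
```

An XSLT stylesheet expresses the same mapping as templates that match `article`, `title`, and `author`, which keeps the transformation rules separate from program code.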
XSLT Elements
The <xsl:template> Element
The <xsl:value-of> Element
The <xsl:for-each> Element
The <xsl:sort> Element
The <xsl:if> Element
The <xsl:choose> Element
The <xsl:apply-templates> Element
A complete XSLT example
Summary and Outlook
You should give one, I won't.