XML (Extensible Markup Language)
17.1 Background
A format for data exchange and manipulation
Easy to learn.
Self-describing: markup describes the structure and type of data.
Simple format from management perspective: parsing, transformations, portability, human readable
It is a standard (standardization was a major benefit for SQL as well): this brings a flood of tool support
17.2 Syntax
Tags on data elements, identifying their meaning
<COURSES>
<course>
<number> 670 </number>
<quarter> Spring </quarter>
<days> TR </days>
</course>
<course>
<number> 680 </number>
<quarter> Autumn </quarter>
<days> MWF </days>
</course>
</COURSES>
Attributes may be added to the tags
<COURSES>
<course title="data bases" year="2001">
<number > 670 </number>
<quarter> Spring </quarter>
<days> TR </days>
</course>
<course title="data structures" year="2000">
<number> 680 </number>
<quarter> Autumn </quarter>
<days> MWF </days>
</course>
</COURSES>
Cross references may be used to express associations.
<COURSES>
<course ref="sp">
<number> 670 </number>
<days> TR </days>
</course>
<course ref="au">
<number> 680 </number>
<days> MWF </days>
</course>
<quarter id="sp"> Spring </quarter>
<quarter id="au"> Autumn </quarter>
</COURSES>
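The association expressed by ref and id can be followed programmatically. The following is a minimal sketch, assuming the standard JAXP DOM and XPath APIs; the class name CrossRef and the method quarterOf are hypothetical names chosen for illustration. Without a DTD the parser does not know that id has type ID, so the lookup uses an XPath attribute test.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CrossRef {
  // The cross-reference example from the notes, without whitespace around values.
  static final String XML =
      "<COURSES>"
    + "<course ref=\"sp\"><number>670</number><days>TR</days></course>"
    + "<course ref=\"au\"><number>680</number><days>MWF</days></course>"
    + "<quarter id=\"sp\">Spring</quarter>"
    + "<quarter id=\"au\">Autumn</quarter>"
    + "</COURSES>";

  // Returns the text of the quarter referenced by the course with the given number.
  static String quarterOf(String number) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(XML)));
      XPath xpath = XPathFactory.newInstance().newXPath();
      // Step 1: read the ref attribute of the matching course.
      String ref = xpath.evaluate("/COURSES/course[number='" + number + "']/@ref", doc);
      // Step 2: find the quarter whose id equals that ref.
      return xpath.evaluate("/COURSES/quarter[@id='" + ref + "']", doc);
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(quarterOf("670")); // prints Spring
    System.out.println(quarterOf("680")); // prints Autumn
  }
}
```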
An underlying tree data model
Allows interleaving of elements with character data (document-centric instead of data-centric).
<COURSES>
Two courses
<course ref="sp">
Here is a cse <number> 670 </number> course
offered on <days> TR </days> .
</course>
and
<course ref="au">
We also have a cse <number> 680 </number> course
offered on <days> MWF </days> .
</course>
and two quarters
<quarter id="sp"> Spring </quarter>
and
<quarter id="au"> Autumn </quarter> .
</COURSES>
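The tree model with interleaved character data can be inspected through the DOM. The sketch below, assuming the standard JAXP DocumentBuilder, lists the children of a mixed-content element; the class name MixedContent and the method childKinds are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class MixedContent {
  // Describes each child of the root: "text" for character data, the tag name otherwise.
  static String childKinds(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      NodeList children = doc.getDocumentElement().getChildNodes();
      StringBuilder kinds = new StringBuilder();
      for (int i = 0; i < children.getLength(); i++) {
        Node child = children.item(i);
        if (i > 0) kinds.append(' ');
        kinds.append(child.getNodeType() == Node.TEXT_NODE ? "text" : child.getNodeName());
      }
      return kinds.toString();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String xml = "<course ref=\"sp\">Here is a cse <number>670</number> course"
               + " offered on <days>TR</days>.</course>";
    System.out.println(childKinds(xml)); // prints: text number text days text
  }
}
```

In a data-centric document the same call would report only element names (plus whitespace-only text nodes, if the document is indented).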
17.3 Relations and XML
Simple
table
col1  col2  col3
a1    b1    c1
a2    b2    c2
<table>
<row> <col1> a1 </col1> <col2> b1 </col2> <col3> c1 </col3> </row>
<row> <col1> a2 </col1> <col2> b2 </col2> <col3> c2 </col3> </row>
</table>
Applicable to small subclass of XML structures.
Used mainly by intermediate servers between databases and users.
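The mapping from a relation to XML is mechanical: one row element per tuple and one child element per column. A minimal sketch of such an intermediate step follows; the class name RelationToXml and its methods are hypothetical, and the column names are assumed to be valid XML names.

```java
import java.util.Arrays;
import java.util.List;

class RelationToXml {
  // Escapes the characters that cannot appear literally in element content.
  static String escape(String s) {
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
  }

  // Emits one <row> per tuple and one element per column, named after the column.
  static String toXml(String table, List<String> columns, List<List<String>> rows) {
    StringBuilder out = new StringBuilder("<" + table + ">\n");
    for (List<String> row : rows) {
      out.append("<row>");
      for (int i = 0; i < columns.size(); i++) {
        String col = columns.get(i);
        out.append(" <").append(col).append("> ").append(escape(row.get(i)))
           .append(" </").append(col).append(">");
      }
      out.append(" </row>\n");
    }
    return out.append("</").append(table).append(">").toString();
  }

  public static void main(String[] args) {
    // Reproduces the table example from the notes.
    System.out.println(toXml("table",
        Arrays.asList("col1", "col2", "col3"),
        Arrays.asList(Arrays.asList("a1", "b1", "c1"),
                      Arrays.asList("a2", "b2", "c2"))));
  }
}
```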
17.4 Well Formed Files
Start and end tags required: <tag>...</tag>
Empty tags may use short cuts: <tag/>
Element attributes must be in quotes: <tag attribute="...">...</tag> (or '...')
Proper nesting of tags
Root element
Entity codes (numeric character references; http://www.unicode.org/charts/):
&#x634; (Arabic sheen ش)
&#x3B1; (Greek alpha α)
&#x5D0; (Hebrew aleph א)
&#x2286; (subset of or equal to ⊆)
Entity names:
&lt; for < (&#x3C;)
&gt; for > (&#x3E;)
&quot; for " (&#x22;)
&amp; for & (&#x26;)
&apos; for ' (&#x27;)
comments : <!--...-->
Processing instructions : <?....?>
CDATA : <![CDATA[...]]>
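The rules above can be checked with any non-validating parser, since a violation is a fatal error. The following sketch assumes the standard JAXP DocumentBuilder; the class name WellFormed and the method isWellFormed are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormed {
  static boolean isWellFormed(String xml) {
    try {
      DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      // DefaultHandler rethrows fatal errors and keeps diagnostics off stderr.
      builder.setErrorHandler(new DefaultHandler());
      builder.parse(new InputSource(new StringReader(xml)));
      return true;
    } catch (SAXException e) {
      return false; // a fatal error means a well-formedness rule was violated
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(isWellFormed("<a><b/></a>"));            // true
    System.out.println(isWellFormed("<a><b></a>"));             // false: improper nesting
    System.out.println(isWellFormed("<a x=1/>"));               // false: unquoted attribute
    System.out.println(isWellFormed("<a/><b/>"));               // false: two root elements
    System.out.println(isWellFormed("<a>&lt; &#x3B1;</a>"));    // true: entity and code
    System.out.println(isWellFormed("<a><![CDATA[ < ]]></a>")); // true: CDATA section
  }
}
```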
17.5 Rendering Semantics
17.6 Recommended Standards
http://www.w3.org :
XHTML: html, head, title, body, br, div, em, p, pre, span, strong, a, dl, dt, dd, ol, ul, li, img, ...
<html>
<head>
<title>A document</title>
</head>
<body>
<p>...Some standard and <br/>
<em>emphasized</em>text...</p>
<p>Course options:</p>
<ul>
<li>CSE 670</li>
<li>CSE 680</li>
</ul>
</body>
</html>
MathML: math, mi, mn, mo, mtext, mspace, mrow, mfrac, msqrt, mroot, mfenced, msub, msup, msubsup,
munder, mover, munderover, mmultiscripts, mtable, mtr, mlabeledtr, mtd, ....
<math>
<mrow>
<mo stretchy="true"> ( </mo>
<mfrac>
<mi> a </mi> <mi> b </mi>
</mfrac>
<mo stretchy="true"> ) </mo>
</mrow>
</math>
SVG: rect, circle, ellipse, line, polyline, polygon, ....
<svg>
<rect x="10" y="10" width="30" height="30" fill="blue"/>
<circle cx="600" cy="200" r="100"
fill="red" stroke="blue" stroke-width="10" />
<defs>
<rect id="MyRect" x="0" y="0" width="60" height="10"/>
</defs>
<use xlink:href="#MyRect"
transform="translate(20,2.5) rotate(10)" />
</svg>
17.7 Name Spaces
A mechanism for separating the XML vocabularies
Tag and attribute names: namespace prefix + local name, separated by a colon
Non-prefixed tag names are assumed to carry the default namespace, if one is declared; non-prefixed attribute names carry no namespace.
A namespace is declared through an xmlns attribute
An xmlns attribute associates a prefix (or, for plain xmlns, the default namespace) with a uniform resource identifier (URI).
<html xmlns ="...xhtml..."
xmlns:m="...mathml..."
xmlns:s="...svg..">
....
<p>
<m:math>...</m:math>
<s:svg>...</s:svg>
</p>
...
</html>
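How a parser resolves prefixes can be seen with a namespace-aware DOM. This is a minimal sketch, assuming the standard JAXP APIs; the class name Namespaces, its methods, and the urn:example:... URIs are placeholders invented for illustration, not the URIs of any real vocabulary.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class Namespaces {
  // Placeholder URIs; real vocabularies publish their own.
  static final String XML =
      "<doc xmlns=\"urn:example:doc\" xmlns:m=\"urn:example:math\" title=\"t\">"
    + "<p><m:math/></p></doc>";

  static Document parse() {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true); // namespace processing is off by default in JAXP
      return factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  // Namespace URI of the first element with the given local name.
  static String elementNamespace(String localName) {
    Element e = (Element) parse().getElementsByTagNameNS("*", localName).item(0);
    return e.getNamespaceURI();
  }

  // Namespace URI of an attribute on the root element (null when it has none).
  static String attributeNamespace(String name) {
    return parse().getDocumentElement().getAttributeNode(name).getNamespaceURI();
  }

  public static void main(String[] args) {
    System.out.println(elementNamespace("p"));       // urn:example:doc (default namespace)
    System.out.println(elementNamespace("math"));    // urn:example:math (prefix m)
    System.out.println(attributeNamespace("title")); // null (unprefixed attribute)
  }
}
```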
17.8 Document Type Definitions (DTDs)
DTDs serve as grammars that XML documents must adhere to.
Valid documents must conform to a DTD (besides being well formed).
<!DOCTYPE COURSES SYSTEM "file.dtd">
<COURSES>
<course ref="sp">
<number> 670 </number>
<days> TR </days>
</course>
<course ref="au">
<number> 680 </number>
<days> MWF </days>
</course>
<quarter id="sp"> Spring </quarter>
<quarter id="au"> Autumn </quarter>
</COURSES>
file.dtd
<!ELEMENT COURSES (course | quarter)* >
<!ELEMENT course (number,days) >
<!ELEMENT quarter (#PCDATA)>
<!ELEMENT number (#PCDATA)>
<!ELEMENT days (#PCDATA)>
<!ATTLIST course ref IDREF #REQUIRED
name CDATA #IMPLIED >
<!ATTLIST quarter id ID #REQUIRED>
CDATA stands for character data.
PCDATA stands for parsed character data.
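The ID and IDREF declarations are enforced only by a validating parser. The sketch below checks a dangling reference. As an assumption for the sake of a self-contained example, the DTD is embedded as an internal subset rather than kept in file.dtd; the class name DtdCheck and its methods are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DtdCheck {
  // Same declarations as file.dtd, written as an internal subset (an assumption).
  static final String DOCTYPE =
      "<!DOCTYPE COURSES [\n"
    + "<!ELEMENT COURSES (course | quarter)* >\n"
    + "<!ELEMENT course (number,days) >\n"
    + "<!ELEMENT quarter (#PCDATA)>\n"
    + "<!ELEMENT number (#PCDATA)>\n"
    + "<!ELEMENT days (#PCDATA)>\n"
    + "<!ATTLIST course ref IDREF #REQUIRED name CDATA #IMPLIED >\n"
    + "<!ATTLIST quarter id ID #REQUIRED>\n"
    + "]>\n";

  static boolean isValid(String body) {
    try {
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setValidating(true);
      // Validity errors are recoverable by default; turn them into exceptions.
      DefaultHandler strict = new DefaultHandler() {
        @Override public void error(SAXParseException e) throws SAXException { throw e; }
      };
      factory.newSAXParser().parse(new InputSource(new StringReader(DOCTYPE + body)), strict);
      return true;
    } catch (SAXException e) {
      return false;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  // One course referring to the given quarter id; only the quarter "sp" exists.
  static String courses(String ref) {
    return "<COURSES><course ref=\"" + ref + "\"><number>670</number><days>TR</days></course>"
         + "<quarter id=\"sp\">Spring</quarter></COURSES>";
  }

  public static void main(String[] args) {
    System.out.println(isValid(courses("sp"))); // true
    System.out.println(isValid(courses("wi"))); // false: no element has id "wi"
  }
}
```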
17.9 A Java DTD-Based Validator
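<.. Dtd.java ..>
import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
class Dtd {
  public static void main(String[] args) throws Exception {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    // Validate against the DTD only when an extra argument precedes the file name.
    factory.setValidating( args.length > 1 );
    SAXParser saxParser = factory.newSAXParser();
    XMLReader xmlReader = saxParser.getXMLReader();
    // Without an error handler, validity errors are not raised as exceptions.
    xmlReader.setErrorHandler( new DefaultHandler() {
      public void error(SAXParseException e) throws SAXParseException { throw e; }
    });
    // The last argument is the XML file; toURI() replaces the deprecated toURL().
    xmlReader.parse( new File( args[args.length-1] ).toURI().toString() );
  } }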
-_-_-
javac Dtd.java
This command compiles the program.
java Dtd xml-filename
This command checks whether the given file is well formed.
java Dtd validate xml-filename
This command checks whether the given file validates against the specified DTD.
<.. course.xml ..>
<!DOCTYPE COURSES SYSTEM "file.dtd">
<COURSES>
<course ref="sp">
<number>670</number>
<days>TR</days>
</course>
<course ref="au">
<number>680</number>
<days>MWF</days>
</course>
<quarter id="sp">Spring</quarter>
<quarter id="au">Autumn</quarter>
</COURSES>
-_-_-
<.. Dtd.dtd ..>
<!ELEMENT COURSES (course | quarter)* >
<!ELEMENT course (number,days) >
<!ELEMENT quarter (#PCDATA)>
<!ELEMENT number (#PCDATA)>
<!ELEMENT days (#PCDATA)>
<!ATTLIST course ref IDREF #REQUIRED
name CDATA #IMPLIED >
<!ATTLIST quarter id ID #REQUIRED>
-_-_-
17.10 XML Schema (XML Schema Definition, XSD)
A proposed alternative to DTDs.
Supports data types, including definition of complex data types from simpler types
Expressed in XML (can be manipulated by standard XML tools).
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" >
<xs:element name="student">
<xs:complexType>
<xs:sequence>
<xs:element name="fname" type="xs:string"/>
<xs:element name="ssn" type="xs:integer"/>
<xs:element name="bdate" type="xs:date"/>
</xs:sequence>
<xs:attribute name="status" type="xs:string"/>
</xs:complexType>
</xs:element>
</xs:schema>
<?xml version="1.0"?>
<student
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
status="junior" >
<fname>Joe</fname>
<ssn>123456789</ssn>
<bdate>1982-11-24</bdate>
</student>
<!ELEMENT student (fname, ssn, bdate) >
<!ELEMENT fname (#PCDATA) >
<!ELEMENT ssn (#PCDATA) >
<!ELEMENT bdate (#PCDATA) >
<!ATTLIST student status CDATA #IMPLIED >
Elements and Attributes
Definitions through named types
<element name="..." type="..."/>
<attribute name="..." type="..."/>
Through anonymous types
<element name="...">
...type definition...
</element>
<attribute name="...">
...type definition...
</attribute>
Default values
<xs:element name="color" type="xs:string" default="red"/>
Fixed values
<xs:attribute name="color" type="xs:string" fixed="red"/>
A simple element contains no elements or attributes
Simple Types
Built-in (primitive and derived): string, integer, decimal, boolean, date, time, ID, IDREF, ...
ID and IDREF support key and referential integrity constraints.
Type constructors :
<simpleType name="...">
...
</simpleType>
Union constructor: data can be of any of the specified sub-types (heterogeneous type)
<simpleType name="...">
<union memberTypes="integer date" />
</simpleType>
List constructor: multi-valued data
<simpleType name="...">
<list itemType="integer" />
</simpleType>
Derived types via restriction on basic types
<simpleType name="course">
<restriction base="string">
<pattern value="CSE[1-9][0-9]{2}"/>
</restriction>
</simpleType>
pattern
<restriction base="string">
<pattern value="me|you"/>
</restriction>
<pattern value="[A-Za-z][02468]"/>
<pattern value="[0-9]{4}"/>
<pattern value="(0|1)+"/>
<pattern value="(0|1)*"/>
enumeration Domain of values
<restriction base="string">
<enumeration value="square"/>
<enumeration value="round"/>
</restriction>
length minLength maxLength number of characters or list items
<restriction base="string">
<minLength value="2"/>
<maxLength value="4"/>
</restriction>
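whiteSpace Values: preserve, replace (each tab, line feed, and carriage return becomes a space), collapse (as replace, then runs of spaces are merged and leading and trailing spaces removed).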
<restriction base="string">
<whiteSpace value="preserve"/>
</restriction>
minInclusive maxInclusive minExclusive maxExclusive Range of values
<restriction base="integer">
<minInclusive value="0"/>
<maxInclusive value="17"/>
</restriction>
totalDigits fractionDigits number of all/decimal digits
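The effect of facets can be tried out by validating small documents against an in-memory schema. The sketch below assumes the standard javax.xml.validation API and uses the pattern and range restrictions shown above; the class name Facets and the element name age are hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class Facets {
  // Two global elements, each with an anonymous restricted simple type.
  static final String XSD =
      "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
    + "<xs:element name='course'><xs:simpleType>"
    + "<xs:restriction base='xs:string'>"
    + "<xs:pattern value='CSE[1-9][0-9]{2}'/>"
    + "</xs:restriction></xs:simpleType></xs:element>"
    + "<xs:element name='age'><xs:simpleType>"
    + "<xs:restriction base='xs:integer'>"
    + "<xs:minInclusive value='0'/><xs:maxInclusive value='17'/>"
    + "</xs:restriction></xs:simpleType></xs:element>"
    + "</xs:schema>";

  static boolean isValid(String xml) {
    try {
      Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
          .newSchema(new StreamSource(new StringReader(XSD)));
      // validate() throws a SAXException on the first validity error.
      schema.newValidator().validate(new StreamSource(new StringReader(xml)));
      return true;
    } catch (SAXException e) {
      return false;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(isValid("<course>CSE670</course>")); // true
    System.out.println(isValid("<course>CSE070</course>")); // false: first digit must be 1-9
    System.out.println(isValid("<age>17</age>"));           // true
    System.out.println(isValid("<age>18</age>"));           // false: above maxInclusive
  }
}
```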
Complex Types
Type constructors :
<complexType name="...">
...
</complexType>
For document model (text around elements):
<complexType mixed="true">
...
</complexType>
Type compositors: combining into groups
all all the children, in any order
<all>
<element name="name" type="string"/>
<element name="ssn" type="integer"/>
</all>
choice exactly one child
sequence all the children, in the given order
Occurrence Indicators
minOccurs maxOccurs
<sequence>
<element name="name" type="string" maxOccurs="7" />
<element name="ssn" type="integer"/>
</sequence>
maxOccurs can be 'unbounded'; both indicators default to 1.
Type extension
<extension base="ns:myType">
<sequence>
<element name="date" type="date"/>
<element name="address" type="string"/>
</sequence>
</extension>
Type restriction
<restriction base="integer">
<attribute name="status" type="string"/>
</restriction>
When a complex type is restricted, all the declarations must be repeated (with possible restrictions, e.g., maxOccurs="...").
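The difference between the sequence and all compositors shows up when children arrive out of order. A minimal sketch, assuming the standard javax.xml.validation API; the class name Compositors and the element names ordered and unordered are hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class Compositors {
  // 'ordered' uses sequence (name may repeat up to 7 times); 'unordered' uses all.
  static final String XSD =
      "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
    + "<xs:element name='ordered'><xs:complexType><xs:sequence>"
    + "<xs:element name='name' type='xs:string' maxOccurs='7'/>"
    + "<xs:element name='ssn' type='xs:integer'/>"
    + "</xs:sequence></xs:complexType></xs:element>"
    + "<xs:element name='unordered'><xs:complexType><xs:all>"
    + "<xs:element name='name' type='xs:string'/>"
    + "<xs:element name='ssn' type='xs:integer'/>"
    + "</xs:all></xs:complexType></xs:element>"
    + "</xs:schema>";

  static boolean isValid(String xml) {
    try {
      Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
          .newSchema(new StreamSource(new StringReader(XSD)));
      schema.newValidator().validate(new StreamSource(new StringReader(xml)));
      return true;
    } catch (SAXException e) {
      return false;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // Repeated name, then ssn: allowed by maxOccurs="7".
    System.out.println(isValid("<ordered><name>Ron</name><name>R.</name><ssn>1</ssn></ordered>")); // true
    // ssn before name: rejected by sequence, accepted by all.
    System.out.println(isValid("<ordered><ssn>1</ssn><name>Ron</name></ordered>"));     // false
    System.out.println(isValid("<unordered><ssn>1</ssn><name>Ron</name></unordered>")); // true
    // Every child of all is required unless minOccurs="0".
    System.out.println(isValid("<unordered><name>Ron</name></unordered>"));             // false
  }
}
```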
Example
XML Code from a Database
students
name  ssn
Ron   123-45-6789
Nora  987-65-4321
courses
title       code
data bases  670
NULL        680
classes
code  ssn
670   123-45-6789
680   987-65-4321
<?xml version="1.0"?>
<dataBase
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>
<students>
<student> <name>Ron</name>
<ssn>123-45-6789</ssn> </student>
<student> <name>Nora</name>
<ssn>987-65-4321</ssn> </student>
</students>
<courses>
<course> <title>data bases</title>
<code>670</code> </course>
<course> <title xsi:nil="true" />
<code>680</code> </course>
</courses>
<classes>
<class> <code>670</code>
<ssn>123-45-6789</ssn> </class>
<class> <code>680</code>
<ssn>987-65-4321</ssn> </class>
</classes>
</dataBase>
From Relations to XML Sub-Schemas
<students>
<student> <name>Ron</name>
<ssn>123-45-6789</ssn> </student>
<student> <name>Nora</name>
<ssn>987-65-4321</ssn> </student>
</students>
<xs:element name="students">
<xs:complexType>
<xs:choice>
<xs:element name="student" minOccurs="0" maxOccurs="unbounded">
<xs:complexType>
<xs:all>
<xs:element name="name" type="nameType"/>
<xs:element name="ssn" type="ssnType"/>
</xs:all>
</xs:complexType>
</xs:element>
</xs:choice>
</xs:complexType>
</xs:element>
<classes>
<class> <code>670</code>
<ssn>123-45-6789</ssn> </class>
<class> <code>680</code>
<ssn>987-65-4321</ssn> </class>
</classes>
<xs:element name="classes">
<xs:complexType>
<xs:choice>
<xs:element name="class" minOccurs="0" maxOccurs="unbounded">
<xs:complexType>
<xs:all>
<xs:element name="code" type="codeType"/>
<xs:element name="ssn" type="ssnType"/>
</xs:all>
</xs:complexType>
</xs:element>
</xs:choice>
</xs:complexType>
</xs:element>
XML Schemas for the Database
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="dataBase">
<xs:complexType>
<xs:all>
<xs:element name="students"> ... </xs:element>
<xs:element name="courses"> ... </xs:element>
<xs:element name="classes"> ... </xs:element>
</xs:all>
</xs:complexType>
</xs:element>
<xs:simpleType name="codeType">
<xs:restriction base="xs:string">
<xs:pattern value="[1-9][0-9][0-9]"/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="ssnType">
<xs:restriction base="xs:string">
<xs:pattern value="[0-9]{3}-[0-9]{2}-[0-9]{4}"/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="nameType">
<xs:restriction base="xs:string"/>
</xs:simpleType>
</xs:schema>
Primary Key and Unique Declarations
<xs:key name="studentsKey">
<xs:selector xpath="./students/student"/>
<xs:field xpath="ssn"/>
</xs:key>
A ‘key’ stands for a primary key
The declaration of a key may include more than one field
A 'unique' is declared like a key, but its fields may be absent (NULL values are allowed)
Foreign Key Declaration
<xs:keyref name="toStudent" refer="studentsKey">
<xs:selector xpath="./classes/class"/>
<xs:field xpath="ssn"/>
</xs:keyref>
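The key and keyref declarations can be exercised together. The sketch below is a simplified version of the database schema (the courses table and the named simple types are left out, and sequence is used throughout, all as assumptions for brevity), with both constraints attached to the dataBase element. The class name Keys and its helper db are hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class Keys {
  static final String XSD =
      "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
    + "<xs:element name='dataBase'>"
    + "<xs:complexType><xs:sequence>"
    + "<xs:element name='students'><xs:complexType><xs:sequence>"
    + "<xs:element name='student' minOccurs='0' maxOccurs='unbounded'>"
    + "<xs:complexType><xs:sequence>"
    + "<xs:element name='name' type='xs:string'/>"
    + "<xs:element name='ssn' type='xs:string'/>"
    + "</xs:sequence></xs:complexType></xs:element>"
    + "</xs:sequence></xs:complexType></xs:element>"
    + "<xs:element name='classes'><xs:complexType><xs:sequence>"
    + "<xs:element name='class' minOccurs='0' maxOccurs='unbounded'>"
    + "<xs:complexType><xs:sequence>"
    + "<xs:element name='code' type='xs:string'/>"
    + "<xs:element name='ssn' type='xs:string'/>"
    + "</xs:sequence></xs:complexType></xs:element>"
    + "</xs:sequence></xs:complexType></xs:element>"
    + "</xs:sequence></xs:complexType>"
    // Primary key on students and foreign key from classes, as in the notes.
    + "<xs:key name='studentsKey'>"
    + "<xs:selector xpath='./students/student'/><xs:field xpath='ssn'/>"
    + "</xs:key>"
    + "<xs:keyref name='toStudent' refer='studentsKey'>"
    + "<xs:selector xpath='./classes/class'/><xs:field xpath='ssn'/>"
    + "</xs:keyref>"
    + "</xs:element></xs:schema>";

  static boolean isValid(String xml) {
    try {
      Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
          .newSchema(new StreamSource(new StringReader(XSD)));
      schema.newValidator().validate(new StreamSource(new StringReader(xml)));
      return true;
    } catch (SAXException e) {
      return false;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  // Builds a small dataBase document: one student per ssn, and one class row.
  static String db(String[] studentSsns, String classSsn) {
    StringBuilder xml = new StringBuilder("<dataBase><students>");
    for (String ssn : studentSsns)
      xml.append("<student><name>n</name><ssn>").append(ssn).append("</ssn></student>");
    return xml.append("</students><classes><class><code>670</code><ssn>")
              .append(classSsn).append("</ssn></class></classes></dataBase>").toString();
  }

  public static void main(String[] args) {
    String ron = "123-45-6789", nora = "987-65-4321";
    System.out.println(isValid(db(new String[] {ron, nora}, ron))); // true
    System.out.println(isValid(db(new String[] {ron, ron}, ron)));  // false: duplicate key
    System.out.println(isValid(db(new String[] {ron}, nora)));      // false: dangling keyref
  }
}
```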
17.11 A Java XMLSchema-Based Validator
<.. Xsd.java ..>
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
class Xsd {
  public static void main(String[] args) {
    try {
      // args[0] is the schema file, args[1] the XML file to validate.
      SchemaFactory factory = SchemaFactory.newInstance(
          XMLConstants.W3C_XML_SCHEMA_NS_URI );
      Schema schema = factory.newSchema(new StreamSource( new File(args[0]) ));
      Validator validator = schema.newValidator();
      // validate() throws a SAXException on the first validity error.
      validator.validate(new StreamSource( new File(args[1]) ));
    } catch (Exception e) {
      System.err.println("--- Error --- " + e);
    } } }
-_-_-
javac Xsd.java
This command compiles the program.
java Xsd schema-filename xml-filename
This command checks whether the given file validates against the specified Schema.
<.. student.xml ..>
<?xml version="1.0"?>
<student xmlns:xsi=
"http://www.w3.org/2001/XMLSchema-instance"
status="junior"
>
<fname>Joe</fname>
<ssn>123456789</ssn>
<bdate>1982-11-24</bdate>
</student>
-_-_-
<.. student.xsd ..>
<?xml version="1.0"?>
<xs:schema xmlns:xs=
"http://www.w3.org/2001/XMLSchema">
<xs:element name="student">
<xs:complexType>
<xs:sequence>
<xs:element name="fname"
type="xs:string"/>
<xs:element name="ssn"
type="xs:integer"/>
<xs:element name="bdate"
type="xs:date"/>
</xs:sequence>
<xs:attribute name="status"
type="xs:string"/>
</xs:complexType>
</xs:element>
</xs:schema>
-_-_-
17.12 XML Path (XPath)
A simple query language
Navigation Space (The Data Model)
Addresses parts of an XML document.
Uses a model of a tree of nodes: elements, attributes, text.
<COURSES>
<course attr="database">
<number > 670 </number>
<quarter> Spring </quarter>
<days> TR </days>
<!--a comment-->
</course>
<course> ... </course>
<course> ... </course>
</COURSES>
Navigation Directions
Directions of tree traversal:
self
child, descendant, descendant-or-self, attribute
parent, ancestor, ancestor-or-self
following-sibling, preceding-sibling
/ (root)
Path and Query Directives
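The response to a query is a set of nodes.
child::course  selects the 'course' element children of the context node
descendant::days  selects the 'days' element descendants
attribute::attr  selects the 'attr' attribute
child::*  selects all element children
attribute::*  selects all attributes
child::text()  selects the text children
child::node()  selects all children (elements, text, comments, processing instructions)
child::comment()  selects the comment children
child::course/descendant::days  selects the 'days' descendants of the 'course' children
child::course[position()=1], child::course[1]  selects the first 'course' child (other tests: =last(), =last()-1, >1)
child::course[@attr="database"]  selects the 'course' children whose attr is "database" (single quotes also work: 'database')
child::course[ position()=2 ][ @attr="database" ]  selects the second 'course' child, provided its attr is "database"
self::*[ child::course and ... or ...]  selects the context node if the boolean condition holds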
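The directives above can be tried against a small in-memory document. This is a minimal sketch, assuming the standard javax.xml.xpath API, with the COURSES root element as the context node; the class name Axes and the methods count and text are hypothetical, and the second course is filled in with illustrative values.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class Axes {
  static final String XML =
      "<COURSES>"
    + "<course attr=\"database\"><number>670</number><quarter>Spring</quarter>"
    + "<days>TR</days><!--a comment--></course>"
    + "<course><number>680</number><quarter>Autumn</quarter><days>MWF</days></course>"
    + "</COURSES>";

  // The context node for every query is the COURSES root element.
  static Element context() throws Exception {
    return DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(XML))).getDocumentElement();
  }

  // Number of nodes selected by the expression.
  static int count(String expression) {
    try {
      XPath xpath = XPathFactory.newInstance().newXPath();
      NodeList nodes = (NodeList) xpath.evaluate(expression, context(), XPathConstants.NODESET);
      return nodes.getLength();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  // String value of the first node selected by the expression.
  static String text(String expression) {
    try {
      return XPathFactory.newInstance().newXPath().evaluate(expression, context());
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(count("child::course"));                                 // 2
    System.out.println(count("descendant::days"));                              // 2
    System.out.println(count("child::course[1]/child::node()"));                // 4
    System.out.println(count("child::course[1]/child::comment()"));             // 1
    System.out.println(count("child::course[position()=2][@attr='database']")); // 0
    System.out.println(text("child::course[1]/attribute::attr"));               // database
    System.out.println(text("child::course[last()]/child::number"));            // 680
  }
}
```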
Sample Program
<.. Path.java ..>
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
class Path {
public static void main(String[] args) {
String expression;
try {
XPath xpath = XPathFactory.newInstance().newXPath();
InputSource document = new InputSource("course.xml");
expression = "COURSES/course";
NodeList nameNodes = (NodeList) xpath.evaluate(expression,
document, XPathConstants.NODESET);
System.out.println("Number of courses: " + nameNodes.getLength());
expression = "COURSES/course[1]/days";
String days = (String) xpath.evaluate(expression, document,
XPathConstants.STRING);
System.out.println("Days: " + days);
} catch (Exception e) {
System.err.println("--- Error --- " + e);
} } }
-_-_-
17.13 Query Languages
XSLT
<xsl:template match="example">
<div style="border: solid red">
<xsl:apply-templates/>
</div>
</xsl:template>
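A template like the one above can be applied from Java through the standard javax.xml.transform API. The following is a minimal sketch; the class name Xslt, the method transform, and the sample input are hypothetical, and the template is wrapped in a stylesheet element so that it forms a complete XSLT 1.0 document.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class Xslt {
  // The template from the notes inside a minimal stylesheet.
  static final String STYLESHEET =
      "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
    + "<xsl:template match='example'>"
    + "<div style='border: solid red'><xsl:apply-templates/></div>"
    + "</xsl:template>"
    + "</xsl:stylesheet>";

  static String transform(String xml) {
    try {
      Transformer transformer = TransformerFactory.newInstance()
          .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
      transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
      StringWriter out = new StringWriter();
      transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
      return out.toString();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // The built-in rules copy the text; the template wraps it in a div.
    System.out.println(transform("<example>some text</example>"));
  }
}
```

The output should be a div element with the red border style enclosing the text of the example element.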
XQuery
for ...
let ...
where ...
order by ...
return ...
for $pn in distinct-values(doc("catalog.xml")//partno)
let $i := doc("catalog.xml")//item[partno = $pn]
where count($i) >= 3
order by $pn
return
<well-supplied-item>
{$pn}
<avgprice> {avg($i/price)} </avgprice>
</well-supplied-item>
17.14 Sample Problems
Give an XML document representing the relation ...
Give an XML document obeying the DTD ...
Give an XML document obeying the XML schema ...
Give a DTD appropriate for the XML document ...
Give a DTD appropriate for the XML schema ...
Give an XML schema appropriate for the XML document ...
Give an XML schema appropriate for the DTD ...
Reference: Ch. 27 (5th ed) and 26 (4th ed) in textbook.