To validate an XML document, we need an XML parser.
·
An XML parser is a utility tool that checks whether an XML
document is well formed and whether it is valid.
Examples of XML parsers:
1.
SAX (Simple API for XML)
2.
DOM (Document Object Model)
3.
JDOM (Java Document Object Model)
4.
dom4j (Document Object Model for Java)
Note: an XML parser can also be used for processing the XML
document.
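As a small illustration of the well-formedness check, here is a minimal sketch in Python rather than in the Java libraries listed above. It uses the standard library's xml.etree.ElementTree, which checks well-formedness only and does not validate against a DTD or XSD. The helper name is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses without a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # The parser rejects anything that breaks the XML syntax rules.
        return False


print(is_well_formed("<message>Hello</message>"))  # properly closed element
print(is_well_formed("<message>Hello<message>"))   # closing tag is missing
```

A document that passes this check is well formed, but it may still be invalid with respect to its DTD or XSD.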
1.
What is an XML parser? What are its functions?
·
An XML parser is an API that enables XML
applications to work with XML documents.
·
An XML parser performs the following tasks: (1)
reading the XML document, (2) checking its well-formedness, (3) verifying its validity, and (4) making
the XML data available to the XML application.
1.
The XML application instantiates the parser and
specifies the XML file to parse (i.e. it passes the name of the XML file
to the parser).
2.
The parser reads the specified XML document and
verifies its well-formedness.
3.
From the XML document, the parser gets the information
about the DTD or XSD. The parser verifies the correctness of the DTD or XSD.
4.
The parser reads the metadata specified in the DTD
or XSD file into memory.
5.
Based on the metadata read into memory, the XML
parser verifies the validity of the XML document.
6.
The parser makes the XML content (data) available to the
XML application either in the form of a
tree structure or in the form of events.
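The last step, handing data to the application as a tree or as events, can be sketched with Python's standard library: xml.dom.minidom builds a tree, while xml.sax reports events. This is only an illustration under assumptions. The sample document and the EventRecorder class are made up, and neither parser validates against a DTD or XSD here.

```python
import xml.sax
from xml.dom import minidom

# Made-up sample document, used only for this illustration.
XML_TEXT = "<note><to>Reader</to><body>Hello</body></note>"

# Tree form: the whole document is loaded into memory as a DOM tree,
# and the application navigates it after parsing has finished.
dom = minidom.parseString(XML_TEXT)
to_element = dom.getElementsByTagName("to")[0]
tree_value = to_element.firstChild.data


# Event form: the parser calls the handler while it reads the document.
class EventRecorder(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        # Text may arrive in one or more chunks.
        self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


recorder = EventRecorder()
xml.sax.parseString(XML_TEXT.encode("utf-8"), recorder)

print(tree_value)
print(recorder.events)
```

The tree form is convenient when the application needs random access to the document. The event form keeps memory use low because nothing is stored unless the handler stores it.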
Valid XML Document:
·
A validating XML parser makes the data stored in an XML document
available to the XML application only if the XML document is valid.
·
There are two approaches to developing valid
XML: (i) using a DTD, (ii) using an XSD.
What is a DTD?
·
DTD stands for Document Type Definition.
·
A DTD is a text file with the .dtd extension.
·
If the XML file holds the data, its corresponding DTD
holds the metadata.
A DTD specifies the legal building blocks of an XML document,
i.e. the XML vocabulary is specified in a DTD.
What are the constituents of a DTD file?
·
From the DTD point of view, all XML documents are made up
of the following building blocks:
1.
Elements
2.
Attributes
3.
Entities
4.
PCDATA
5.
CDATA
ELEMENTS:
ATTRIBUTES:
ENTITIES:
Some characters have a special meaning in XML, like
the less-than sign (<), which marks the start of an XML tag. Most of you know
the HTML entity "&nbsp;". This "non-breaking space" entity is used in HTML to
insert an extra space in a document. Entities are expanded when a document is
parsed by an XML parser.
The following entities are predefined in XML:
ENTITY REFERENCE | CHARACTER
&lt; | <
&gt; | >
&amp; | &
&quot; | "
&apos; | '
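To see entity expansion in practice, here is a short sketch using Python's standard xml.etree.ElementTree (an illustrative choice, not the only option). It parses text that uses the five predefined entities and then shows that the HTML entity &nbsp; is rejected, because XML does not predefine it. The variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# All five predefined entities appear in the element text.
element = ET.fromstring(
    "<message>a &lt; b &amp;&amp; c &gt; d, &quot;x&quot; &apos;y&apos;</message>"
)
# After parsing, the application sees the expanded characters.
print(element.text)

# &nbsp; is an HTML entity. Without a DTD that declares it,
# an XML parser reports it as an undefined entity.
try:
    ET.fromstring("<message>one&nbsp;two</message>")
    nbsp_accepted = True
except ET.ParseError:
    nbsp_accepted = False
print(nbsp_accepted)
```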
PCDATA:
XML parsers normally parse all the text in an XML document.
When an XML element is parsed, the text between the XML tags is also parsed:
Ex: <message>This text is also parsed</message>
The parser does this because XML elements can contain other
elements, as in this example, where the <name> element contains two other
elements (first and last):
<name><first>Bill</first><last>Gates</last></name>
And the parser will break it up into sub-elements like this:
<name>
<first>Bill</first>
<last>Gates</last>
</name>
Parsed Character Data (PCDATA) is the term for text data that will be parsed by
the XML parser.
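The <name> example can be run through a parser to confirm that the text between the tags is parsed into sub-elements. This is a minimal sketch with Python's standard xml.etree.ElementTree. The variable names are chosen for the illustration.

```python
import xml.etree.ElementTree as ET

# The same one-line document as in the example above.
name = ET.fromstring("<name><first>Bill</first><last>Gates</last></name>")

# Iterating over an element yields its child elements in document order.
children = [(child.tag, child.text) for child in name]
print(children)
```

Each tuple pairs a sub-element's tag with its parsed character data.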
CDATA: (Unparsed) Character Data
·
The term CDATA is used for text data that should
not be parsed by the XML parser.
·
Characters like "<" and "&" are illegal
in XML element content unless they are escaped.
·
"<" will generate an error because the parser
interprets it as the start of a new element.
·
"&" will generate an error because the
parser interprets it as the start of an entity reference.
·
Some text, like JavaScript code, contains a lot
of "<" or "&" characters. To avoid errors, script code can be defined as
CDATA.
·
Everything inside a CDATA section is ignored by
the parser, i.e. it is passed through as plain text rather than read as markup.
A CDATA section starts with "<![CDATA[" and
ends with "]]>":
<script>
<![CDATA[
function matchwo(a,b)
{
if (a < b && a < 0)
{
return 1;
}
else
{
return 0;
}
}
]]>
</script>
·
In the example above, everything inside the
CDATA section is ignored by the parser.
Note: CDATA sections cannot contain the string "]]>". Nested CDATA sections are not
allowed.
The "]]>" that marks the end of the CDATA section cannot contain spaces or line breaks.
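The effect of a CDATA section can be checked with a short sketch, again using Python's standard xml.etree.ElementTree as an illustrative parser. The same script text is parsed once inside a CDATA section and once without it. SCRIPT_BODY and the other names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Script text full of "<" and "&" characters.
SCRIPT_BODY = "if (a < b && a < 0) { return 1; } else { return 0; }"

# Inside CDATA the text reaches the application unchanged.
wrapped = ET.fromstring("<script><![CDATA[" + SCRIPT_BODY + "]]></script>")
print(wrapped.text == SCRIPT_BODY)

# Without CDATA the parser treats "<" and "&" as markup and fails.
try:
    ET.fromstring("<script>" + SCRIPT_BODY + "</script>")
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False
print(raw_accepted)
```

The alternative to CDATA is to escape each special character with its entity reference, which quickly becomes hard to read for script code.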
·
In a DTD, XML elements are declared with an element
declaration. An element declaration has the following syntax:
<!ELEMENT element-name category>
or
<!ELEMENT element-name (element-content)>