Tuesday 13 August 2013

XML Parser

To validate an XML document, we need an XML parser.
·         An XML parser is a utility that checks whether an XML document is well formed and whether it is valid.
Examples of XML Parser:
1.       SAX (Simple API for XML)
2.       DOM (Document Object Model)
3.       JDOM (Java Document Object Model)
4.       dom4j (Document Object Model for Java)
Note: An XML parser can also be used to process an XML document, not only to check it.
What is an XML parser? What are its functions?
·         An XML parser is an API that enables XML applications to work with XML documents.
·         An XML parser does the following: (1) reads the XML document, (2) checks that it is well formed, (3) verifies its validity, and (4) makes the XML data available to the XML application.
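The well-formedness check in step (2) can be tried out with a short sketch. It assumes Python and its standard-library minidom parser, which checks well-formedness only and does not validate against a DTD or XSD. The helper name is_well_formed is made up for this illustration.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def is_well_formed(xml_text):
    """Return True if xml_text parses without a well-formedness error."""
    try:
        xml.dom.minidom.parseString(xml_text)
        return True
    except ExpatError:
        # The underlying expat parser reports broken markup this way.
        return False


good = "<message>Hello</message>"
bad = "<message>Hello<message>"  # the closing tag is missing its slash

print(is_well_formed(good))
print(is_well_formed(bad))
```

The first call reports a well-formed document and the second does not, because the second <message> is read as a new opening tag that is never closed.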


The parser works with the XML application in the following sequence:
1.       The XML application instantiates the parser and specifies the XML file to parse (i.e. it passes the name of the XML file to the parser).
2.       The parser reads the specified XML document and verifies that it is well formed.
3.       From the XML document, the parser gets the information about the DTD or XSD. The parser verifies the correctness of the DTD or XSD.
4.       The parser reads the metadata specified in the DTD or XSD file into memory.
5.       Based on the metadata read into memory, the parser verifies the validity of the XML document.
6.       The parser makes the XML content (data) available to the XML application either in the form of a tree structure or in the form of events.
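The two delivery styles in the last step (tree or events) might look like this in practice. This is only a sketch using Python's standard-library minidom (tree style) and xml.sax (event style). The <note> document and the TagRecorder class are invented for the example, and no DTD or XSD validation takes place here.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical sample document, not taken from the post.
XML_TEXT = "<note><to>Ann</to><body>Hi</body></note>"

# Tree style (DOM): the whole document is loaded into memory as a tree
# that the application can walk in any order.
tree = xml.dom.minidom.parseString(XML_TEXT)
to_element = tree.getElementsByTagName("to")[0]
tree_result = to_element.firstChild.data


# Event style (SAX): the parser calls the handler as it meets each tag.
class TagRecorder(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append("start " + name)

    def endElement(self, name):
        self.events.append("end " + name)


recorder = TagRecorder()
xml.sax.parseString(XML_TEXT.encode("utf-8"), recorder)

print(tree_result)
print(recorder.events)
```

The tree style keeps the whole document in memory, while the event style only passes each tag to the application once, in document order.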
Valid XML Document:
·         A validating XML parser makes the data stored in an XML document available to the XML application only if the document is valid.
·         There are two approaches to developing valid XML: (i) using a DTD, (ii) using an XSD.
What is DTD?
·         DTD stands for Document Type Definition.
·         A DTD is usually a text file with a .dtd extension.
·         If the XML file holds the data, its corresponding DTD holds the metadata.
A DTD specifies the legal building blocks of an XML document, i.e. the XML vocabulary is specified in a DTD.
What are the constituents of a DTD file?
·         From the DTD point of view, all XML documents are made up of the following building blocks:
1.       Elements
2.       Attributes
3.       Entities
4.       PCDATA
5.       CDATA
ELEMENTS:
ATTRIBUTES:
ENTITIES:
Some characters have a special meaning in XML, like the less-than sign (<), which marks the start of an XML tag. Most of you know the HTML entity "&nbsp;". This "non-breaking space" entity is used in HTML to insert an extra space in a document. Entities are expanded when a document is parsed by an XML parser.
The following entities are predefined in XML:
ENTITY REFERENCE    CHARACTER
&lt;                <
&gt;                >
&amp;               &
&quot;              "
&apos;              '
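To see the expansion happen, here is a small sketch assuming Python's standard-library ElementTree parser and the escape() helper from xml.sax.saxutils. The <note> element and its text are made up for the illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# All five predefined entity references appear in the source text.
source = (
    "<note>a &lt; b &amp;&amp; c &gt; d, "
    "&quot;quoted&quot;, it&apos;s</note>"
)

# The parser expands them into plain characters.
parsed_text = ET.fromstring(source).text
print(parsed_text)

# Going the other way: escape() rewrites &, < and > as entity references
# so that the text can be placed safely inside an element.
escaped = escape("a < b && c > d")
print(escaped)
```

The application only ever sees the expanded characters. The entity references exist only in the XML source text.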

PCDATA:
XML parsers normally parse all the text in an XML document. When an XML element is parsed, the text between the XML tags is also parsed:
Ex: <message>This text is also parsed</message>
The parser does this because XML elements can contain other elements, as in this example, where the <name> element contains two other elements (first and last):
<name><first>Bill</first><last>Gates</last></name>
And the parser will break it up into sub-elements like this:
<name>
  <first>Bill</first>
  <last>Gates</last>
</name>
Parsed Character Data (PCDATA) is the term for text data that will be parsed by the XML parser.
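The same two examples can be fed to a real parser to watch PCDATA being parsed. This sketch assumes Python's standard-library ElementTree. The <b> tag in the second document is an addition made for the illustration, to show markup inside text being picked up.

```python
import xml.etree.ElementTree as ET

# The <name> example: the parser splits the content into child elements.
name = ET.fromstring("<name><first>Bill</first><last>Gates</last></name>")
children = [(child.tag, child.text) for child in name]
print(children)

# Text between tags is parsed too, so markup inside it becomes an element.
message = ET.fromstring("<message>This text is <b>also</b> parsed</message>")
before = message.text          # text before the <b> element
inner = message[0].text        # text inside <b>
after = message[0].tail        # text after </b>
print(repr(before), repr(inner), repr(after))
```

Because the content of <message> is parsed, the <b> tag is not kept as literal characters. It shows up as a child element with its own text.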

CDATA - (Unparsed) Character Data
·         The term CDATA is used for text data that should not be parsed by the XML parser.
·         Characters like "<" and "&" are illegal as literal text inside XML elements.
·         "<" will generate an error because the parser interprets it as the start of a new element.
·         "&" will generate an error because the parser interprets it as the start of a character entity.
·         Some text, like JavaScript code, contains a lot of "<" or "&" characters. To avoid errors, script code can be defined as CDATA.
·         Everything inside a CDATA section is ignored by the parser, i.e. it is not interpreted as markup.
A CDATA section starts with "<![CDATA[" and ends with "]]>":
<script>
<![CDATA[
function matchwo(a, b)
{
  if (a < b && a < 0)
  {
    return 1;
  }
  else
  {
    return 0;
  }
}
]]>
</script>
·         In the example above, everything inside the CDATA section is ignored by the parser.
Note: A CDATA section cannot contain the string "]]>". Nested CDATA sections are not allowed.
The "]]>" that marks the end of the CDATA section cannot contain spaces or line breaks.
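The effect of a CDATA section can be checked with a short sketch, assuming Python's standard-library minidom parser. The one-line SCRIPT string is a condensed stand-in for the script body, not the exact code from the example.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError

# Condensed stand-in for the script body (an assumption for this sketch).
SCRIPT = "if (a < b && a < 0) { return 1; }"

# Without CDATA, the raw "<" and "&" make the document ill-formed.
try:
    xml.dom.minidom.parseString("<script>" + SCRIPT + "</script>")
    raw_ok = True
except ExpatError:
    raw_ok = False

# Wrapped in a CDATA section, the same text passes through untouched.
doc = xml.dom.minidom.parseString(
    "<script><![CDATA[" + SCRIPT + "]]></script>"
)
cdata_node = doc.documentElement.firstChild
cdata_text = cdata_node.data
is_cdata = cdata_node.nodeType == cdata_node.CDATA_SECTION_NODE

print(raw_ok, is_cdata, cdata_text == SCRIPT)
```

The parser rejects the unwrapped script but hands back the CDATA content character for character, which is why script code is a typical candidate for CDATA.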

·         In a DTD, elements are declared with an element declaration. An element declaration has the following syntax: <!ELEMENT element-name category> or <!ELEMENT element-name (element-content)>
