Tuesday 13 August 2013

VALID XML DOCUMENT

·         An XML document is said to be valid if it satisfies the XML document preparation (well-formedness) rules, the element naming rules, and the rules defined in its DTD or XML Schema.
·         Every valid XML document is a well-formed XML document.
·         The reverse is not always true: a well-formed document may still violate its DTD or schema.
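As a rough illustration of the difference, the sketch below uses Python's standard xml.etree.ElementTree module, which checks well-formedness only and does not validate against a DTD or schema. The helper name is_well_formed and the two sample documents are made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML (no DTD/schema check)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


good = "<note><to>Ann</to><body>Hello</body></note>"
bad = "<note><to>Ann</body></note>"  # mismatched closing tag

print(is_well_formed(good))
print(is_well_formed(bad))
```

A document like the first one could still be invalid if its DTD required, say, an additional element; checking that needs a validating parser, which the Python standard library does not provide.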

Valid XML Document:
·         A validating XML parser makes the data stored in an XML document available to the XML application only if the document is valid.
·         There are two approaches to developing valid XML: (i) using a DTD, (ii) using an XML Schema (XSD).
What is a DTD?
·         DTD stands for Document Type Definition.
·         A DTD is usually a text file with a .dtd extension; it can also be declared inside the XML document itself.
·         If the XML file holds the data, its corresponding DTD holds the metadata.
A DTD specifies the legal building blocks of an XML document, i.e. the XML vocabulary is specified in a DTD.
What are the constituents of a DTD file?
·         From a DTD point of view, all XML documents are made up of the following building blocks:
1.       Elements
2.       Attributes
3.       Entities
4.       PCDATA
5.       CDATA
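To make the building blocks concrete, here is a small sketch of a document that carries its DTD inline (between "[" and "]" in the DOCTYPE). The element names note, to and body, the lang attribute, and the author entity are all hypothetical. Python's xml.etree.ElementTree reads these declarations but does not enforce them; it does expand the internally declared entity.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with an internal DTD subset.
doc = """<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ATTLIST note lang CDATA #REQUIRED>
  <!ENTITY author "Bill Gates">
]>
<note lang="en">
  <to>Ann</to>
  <body>Written by &author;</body>
</note>"""

root = ET.fromstring(doc)

# Elements and attributes come back as nodes and a dictionary.
print(root.tag, root.attrib)

# The entity reference &author; has been replaced by its declared value.
print(root.find("body").text)
```

Here ELEMENT declares elements, ATTLIST declares an attribute whose value is CDATA, ENTITY declares an entity, and (#PCDATA) marks element content that is parsed character data.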
ELEMENTS:
ATTRIBUTES:
ENTITIES:
Some characters have a special meaning in XML, like the less-than sign (<), which marks the start of an XML tag. Most of you know the HTML entity "&nbsp;". This "non-breaking space" entity is used in HTML to insert an extra space in a document. Entities are expanded when a document is parsed by an XML parser.
The following entities are predefined in XML:
ENTITY REFERENCE    CHARACTER
&lt;                <
&gt;                >
&amp;               &
&quot;              "
&apos;              '
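The round trip between special characters and their entity references can be sketched with Python's standard library. The sample string and the variable names are made up for this example; escape() from xml.sax.saxutils handles &, < and > by default, and the extra dictionary adds the two quote characters.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'if (a < b && a > 0) print "yes"'

# escape() replaces &, < and > by default; the extra mapping covers the quotes.
quotes = {'"': "&quot;", "'": "&apos;"}
escaped = escape(raw, quotes)
print(escaped)

# When the document is parsed, the references are expanded back to characters.
element = ET.fromstring("<message>" + escaped + "</message>")
print(element.text == raw)
```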

PCDATA:
XML parsers normally parse all the text in an XML document. When an XML element is parsed, the text between the XML tags is also parsed:
Ex: <message>This text is also parsed</message>
The parser does this because XML elements can contain other elements, as in this example, where the <name> element contains two other elements (first and last):
<name><first>Bill</first><last>Gates</last></name>
And the parser will break it up into sub-elements like this:
<name>
 <first>Bill</first>
 <last>Gates</last>
</name>
Parsed Character Data (PCDATA) is the term used for text data that will be parsed by the XML parser.
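The same breakdown can be observed from code. This is a minimal sketch using Python's xml.etree.ElementTree on the <name> example; the variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<name><first>Bill</first><last>Gates</last></name>")

# Each child element becomes its own node carrying its own parsed text.
for child in root:
    print(child.tag, "->", child.text)
```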

CDATA - (Unparsed) Character Data
·         The term CDATA is used for text data that should not be parsed by the XML parser.
·         Characters like "<" and "&" are illegal in the text of XML elements unless they are escaped.
·         "<" will generate an error because the parser interprets it as the start of a new element.
·         "&" will generate an error because the parser interprets it as the start of an entity reference.
·         Some text, like JavaScript code, contains a lot of "<" or "&" characters. To avoid errors, script code can be defined as CDATA.
·         Everything inside a CDATA section is ignored by the parser as markup and passed through as plain text.
A CDATA section starts with "<![CDATA[" and ends with "]]>":
<script>
<![CDATA[
function matchwo(a, b)
{
if (a < b && a < 0)
{
return 1;
}
else
{
return 0;
}
}
]]>
</script>
·         In the example above, everything inside the CDATA section is ignored by the parser as markup and passed through as plain text.
Note: A CDATA section cannot contain the string "]]>", so nested CDATA sections are not allowed.
The "]]>" that marks the end of the CDATA section cannot contain spaces or line breaks.
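The effect of a CDATA section can be checked with a short sketch using Python's standard parsers. The script text and variable names are made up for this example. ElementTree returns the CDATA content as ordinary text, while minidom keeps it as a separate CDATA node.

```python
import xml.etree.ElementTree as ET
from xml.dom import Node, minidom

code = "if (a < b && a < 0) { return 1; }"

# Without CDATA, the raw "<" and "&" make the document ill-formed.
try:
    ET.fromstring("<script>" + code + "</script>")
    plain_ok = True
except ET.ParseError:
    plain_ok = False

# Inside a CDATA section the same characters are accepted as text.
wrapped = "<script><![CDATA[" + code + "]]></script>"
text = ET.fromstring(wrapped).text

# minidom exposes the section as a node of type CDATA_SECTION_NODE.
node = minidom.parseString(wrapped).documentElement.firstChild

print(plain_ok)
print(text == code)
print(node.nodeType == Node.CDATA_SECTION_NODE)
```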

·         In a DTD, elements are declared with an element declaration. An element declaration has the following syntax:
<!ELEMENT element-name category>
or
<!ELEMENT element-name (element-content)>
