XML DTD (DOCUMENT TYPE DEFINITION)
What is DTD?
· DTD stands for Document Type Definition.
· An external DTD is a text file with a .dtd extension. A DTD can also be written inside the XML document itself (an internal DTD).
· If an XML file holds the data, its corresponding DTD holds the metadata.
A DTD specifies the legal building blocks of an XML document, i.e. the XML vocabulary is specified in a DTD.
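To make the "DTD = vocabulary" idea concrete, here is a minimal sketch using Python's standard `xml.parsers.expat` module. It assumes an internal DTD, and the helper `undeclared_elements` is hypothetical. It only lists element names that the document uses but the DTD does not declare. Expat is a non-validating parser, so this is not real DTD validation.

```python
import xml.parsers.expat

# Example document (made up for this sketch): the internal DTD declares the
# vocabulary, and the elements after it hold the data.
DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to,from,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><to>A</to><from>B</from><body>Hi</body><date>today</date></note>"""


def undeclared_elements(xml_text):
    """Hypothetical helper: element names used in the data but not declared in the DTD."""
    declared, used = set(), []
    parser = xml.parsers.expat.ParserCreate()
    # Called once for every <!ELEMENT ...> declaration in the DTD.
    parser.ElementDeclHandler = lambda name, model: declared.add(name)
    # Called for every start tag in the document body.
    parser.StartElementHandler = lambda name, attrs: used.append(name)
    parser.Parse(xml_text, True)
    return [name for name in used if name not in declared]


print(undeclared_elements(DOC))  # ['date']
```

A validating parser would reject `<date>` outright. This sketch only shows that the DTD, not the data, defines which names are legal.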
What are the constituents of a DTD file?
· From a DTD point of view, all XML documents are made up of the following building blocks:
1. Elements: this is the most important building block, used to create tags. A tag can contain text, other elements, or nothing at all (an empty element).
2. Attributes: attributes are used to provide additional information about an XML element.
· Attributes must be specified in the start tag.
· Attributes always come in name-value pairs (name="value").
· Attribute values must be enclosed in either single quotes or double quotes.
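The element and attribute rules above can be checked with Python's `xml.etree.ElementTree`. This is a minimal sketch. The tag and attribute names (`person`, `first`, `photo`, `gender`, `src`) are made up for illustration. It shows an element containing other elements, one containing text, an empty element, attributes quoted with either quote style, and an unquoted attribute being rejected.

```python
import xml.etree.ElementTree as ET

# Illustrative document (names are hypothetical):
#   <person> contains other elements, <first> contains text,
#   <photo/> is empty, and attribute values use both quote styles.
DOC = """<person gender="female" id='p1'>
  <first>Anna</first>
  <photo src="anna.jpg"/>
</person>"""

root = ET.fromstring(DOC)
print(root.tag, root.attrib)          # person {'gender': 'female', 'id': 'p1'}
print([child.tag for child in root])  # ['first', 'photo']
print(root.find("first").text)        # Anna

photo = root.find("photo")
print(photo.text, photo.attrib)       # None {'src': 'anna.jpg'}

# An attribute value without quotes is not well-formed XML.
try:
    ET.fromstring("<person gender=female/>")
except ET.ParseError:
    print("ParseError: attribute values must be quoted")
```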
Some characters have a special meaning in XML, like the less-than sign (<), which defines the start of an XML tag. Most of you know the HTML entity "&nbsp;". This "no-breaking-space" entity is used in HTML to insert an extra space in a document. Entities are expanded when a document is parsed by an XML parser.
The following entities are predefined in XML:
· &lt; represents < (less than)
· &gt; represents > (greater than)
· &amp; represents & (ampersand)
· &apos; represents ' (apostrophe)
· &quot; represents " (quotation mark)
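Here is a minimal sketch of entity expansion, assuming Python's `xml.etree.ElementTree`. The parser replaces the five predefined entities, and it also expands a custom entity declared with `<!ENTITY ...>` in an internal DTD. The entity name `writer` and its value are only examples.

```python
import xml.etree.ElementTree as ET

# The five predefined entities are expanded by the parser.
text = ET.fromstring(
    "<p>a &lt; b &amp;&amp; c &gt; d &quot;q&quot; &apos;s&apos;</p>"
).text
print(text)  # a < b && c > d "q" 's'

# A custom entity declared in an internal DTD ('writer' is a made-up name).
DOC = """<!DOCTYPE note [
  <!ENTITY writer "the editor">
]>
<note>Written by &writer;.</note>"""
print(ET.fromstring(DOC).text)  # Written by the editor.
```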
XML parsers normally parse all the text in an XML document. When an XML element is parsed, the text between the XML tags is also parsed:
Ex: <message>This text is also parsed</message>
The parser does this because XML elements can contain other elements. For example, when a <name> element contains two other elements (<first> and <last>), the parser breaks it up into those two sub-elements.
Parsed Character Data (PCDATA) is the term used for text data that will be parsed by the XML parser.
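The parsing behaviour described above can be observed with Python's `xml.etree.ElementTree`. This is a minimal sketch, and the sample values inside `<first>` and `<last>` are made up. Nested tags in parsed text become sub-elements, and entity references in PCDATA are expanded.

```python
import xml.etree.ElementTree as ET

# Nested tags inside parsed text become sub-elements.
name = ET.fromstring("<name><first>Ada</first><last>Lovelace</last></name>")
print([(child.tag, child.text) for child in name])
# [('first', 'Ada'), ('last', 'Lovelace')]

# Entity references inside PCDATA are expanded while parsing.
message = ET.fromstring("<message>This text is also parsed &amp; expanded</message>")
print(message.text)  # This text is also parsed & expanded
```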
CDATA - (Unparsed) Character Data
· The term CDATA is used for text data that should not be parsed by the XML parser.
· Characters like “<” and “&” are illegal in XML elements.
· “<” will generate an error because the parser interprets it as the start of a new element
· “&” will generate an error because the parser interprets it as the start of a character entity.
· Everything inside a CDATA section is ignored by the parser.
A CDATA section starts with “<![CDATA[” and ends with “]]>”.
· Inside a CDATA section, characters such as “<” and “&” can therefore be written directly, without entity references such as &lt; and &amp;.
Note: A CDATA section cannot contain the string “]]>”. Nested CDATA sections are not allowed.
The “]]>” that marks the end of the CDATA section cannot contain spaces or line breaks.
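Here is a minimal sketch of the CDATA rules, assuming Python's `xml.etree.ElementTree`. The `<script>` element and its content are only an example. Inside CDATA, “<” and “&” come back as plain text. The same characters written directly in element text make the document not well-formed.

```python
import xml.etree.ElementTree as ET

# Inside a CDATA section, "<" and "&" are plain characters, not markup.
script = ET.fromstring(
    "<script><![CDATA[if (a < b && c > 0) { return 1; }]]></script>"
)
print(script.text)  # if (a < b && c > 0) { return 1; }

# Written directly in element text, the same characters are an error.
try:
    ET.fromstring("<script>if (a < b && c > 0) { return 1; }</script>")
except ET.ParseError:
    print("ParseError: '<' and '&' must be escaped outside CDATA")
```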
· In a DTD, XML elements are declared with an element declaration. An element declaration has the following syntax:
<!ELEMENT element-name category>
or
<!ELEMENT element-name (element-content)>
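To see how a parser reads element declarations, this sketch uses the `ElementDeclHandler` callback of Python's `xml.parsers.expat`. For each `<!ELEMENT ...>`, expat reports the content model as a `(type, quantifier, name, children)` tuple. The helper names `to_dtd` and `element_declarations` are hypothetical. They simply turn that tuple back into DTD syntax. Expat does not validate the document against these declarations.

```python
import xml.parsers.expat
from xml.parsers.expat import model

# Quantifier suffixes used in DTD content models.
SUFFIX = {
    model.XML_CQUANT_NONE: "",
    model.XML_CQUANT_OPT: "?",
    model.XML_CQUANT_REP: "*",
    model.XML_CQUANT_PLUS: "+",
}


def to_dtd(content):
    """Hypothetical helper: rebuild DTD syntax from expat's content-model tuple."""
    ctype, quant, name, children = content
    if ctype == model.XML_CTYPE_EMPTY:
        return "EMPTY"
    if ctype == model.XML_CTYPE_ANY:
        return "ANY"
    if ctype == model.XML_CTYPE_NAME:
        return name + SUFFIX[quant]
    if ctype == model.XML_CTYPE_MIXED:
        names = ["#PCDATA"] + [child[2] for child in children]
        return "(" + "|".join(names) + ")" + SUFFIX[quant]
    # Remaining cases: a sequence (a,b) or a choice (a|b).
    sep = "," if ctype == model.XML_CTYPE_SEQ else "|"
    return "(" + sep.join(to_dtd(child) for child in children) + ")" + SUFFIX[quant]


def element_declarations(xml_text):
    """Hypothetical helper: map each declared element name to its content model."""
    decls = {}

    def on_element_decl(name, content):
        decls[name] = to_dtd(content)

    parser = xml.parsers.expat.ParserCreate()
    parser.ElementDeclHandler = on_element_decl
    parser.Parse(xml_text, True)
    return decls


DOC = """<!DOCTYPE note [
  <!ELEMENT note (to,from,heading?,body+)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT br EMPTY>
]>
<note/>"""
print(element_declarations(DOC))
# {'note': '(to,from,heading?,body+)', 'to': '(#PCDATA)', 'br': 'EMPTY'}
```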
3. Entities: these building blocks represent special characters.
4. PCDATA (Parsed Character Data): this data will be parsed by the parser, and any entities in it will be expanded.
5. CDATA (Character Data): this data will not be parsed by the parser, so any entities in it will not be expanded.