COIN78 - XML Lesson 4: Document Type Definition


Putting it Together

This is where the power of XML, as a method of self-describing and validating data, becomes more obvious. In building DTDs, and later schemas, you will create a "schema" for the data and data types in your project files. Creating and validating with DTDs is, more or less, straightforward. We will use a couple of different validating sites to determine whether your files are "valid". While XSD has been approved by the W3C, many authors use different approaches to creating these schemas. DTDs are still used in business and technology as standards for ensuring that XML documents are created consistently. This section normally takes more than a week to cover. Online students are encouraged to keep up with the reading and to have a good DTD to submit by week four (please see the assignment one project progress description, which includes a CSS file for rendering).

From the point of view of a DTD, all XML documents (and HTML documents) are made up of the following building blocks.

  1. Your first DTD: DTDs describe the document to human readers and to parsers alike. A DTD is attached to a document through the document type declaration, "DOCTYPE". The Document Type Declaration is often confused with the Document Type Definition. The declaration is the <!DOCTYPE ...> statement itself; it contains or points to the markup declarations that make up the Document Type Definition (DTD). The name given in the DOCTYPE must match the root element of the XML document. In our case, our DOCTYPE is "address_book" and the root tag is <address_book>. The parser needs to know that the DTD matches the XML document, and it does that by matching the root element.

    <?xml version="1.0"?>
    <!-- The first step is to name and define our documents-->
    <!DOCTYPE address_book [
    
    
    
    ]>
    <address_book> </address_book>
        

  2. Declaring Elements: After declaring the document type, we need to tell the parser which elements are in our documents. The first element in our document is the root element itself, address_book. For now we state that the address_book element contains parsed character data, #PCDATA. We will later add elements to the DTD that are enclosed within address_book. Please note that you need to view the source of this example in order to fully view the DTD (the DTD is "internal"; it is part of the XML document).

    <?xml version="1.0"?>  
    <!-- The first step is to name and define our documents-->  
    <!DOCTYPE address_book [  
        <!ELEMENT address_book (#PCDATA)>  
    ]>  
    <address_book> </address_book>
        

  3. Adding Elements: We will now add more elements to our document. Our address_book will contain a child element called "record", which will eventually contain the name, address, and contact. Each one of those elements will have child elements, and some of those will have attributes. In this step record is still declared as parsed character data. Please note that you need to view the source of this example in order to fully view the DTD.

    <?xml version="1.0"?>  
    <!-- The first step is to name and define our documents-->  
    <!DOCTYPE address_book [
      <!ELEMENT address_book (record+)>
      <!ELEMENT record (#PCDATA)>  
    ]>  
    <address_book>     
        <record> </record>  
    </address_book>
        

  4. More Elements to come: An element may have more than one child element. When there are several child elements, you must state how they may occur. There are two ways to do this. A comma (,) between one child element name and the next indicates that the child elements must occur in the order given. A vertical line (|) means that either one or the other child element will occur. Note that the vertical line (|) does not imply order; rather, it is a choice of using one or the other child element. In our example we now give the record element, a child of the root element address_book, children of its own. Please look inside the element declaration for the record element and observe how we list the names of the elements that record will accept. There are three child elements, each separated by a comma (,). This means that the child elements must appear in the order in which they are listed. In our example "name" is first, "address" is second, and "contact" is last. Please note that you need to view the source of this example in order to fully view the DTD.

    <?xml version="1.0"?>  
    <!-- The first step is to name and define our documents-->  
    <!DOCTYPE address_book [
      <!ELEMENT address_book (record+)>
      <!ELEMENT record (name, address, contact)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT address (#PCDATA)>
      <!ELEMENT contact (#PCDATA)>  
    ]>    
    <address_book>
       <record>
            <name> </name>
            <address> </address>
            <contact> </contact>
       </record>  
    </address_book>
          
          

  5. Element Choices: This is the same as the example above, except that we use the "|" delimiter to give a choice between the address and contact child elements. In the example, we have chosen to use address instead of contact in the XML document. Please note that you need to view the source of this example in order to fully view the DTD.

    <?xml version="1.0"?>  
    <!-- The first step is to name and define our documents-->  
    <!DOCTYPE address_book [
      <!ELEMENT address_book (record+)>
      <!ELEMENT record (name,  (address | contact))>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT address (#PCDATA)>
      <!ELEMENT contact (#PCDATA)>  
    ]>    
    <address_book>
       <record>
           <name> </name>
           <address> </address>
       </record>  
    </address_book>
    

? + *    Element Content Model

  1. ?, OPTIONAL: Among the information that you would want to give to human readers as well as parsers is how often an element can appear. One of the useful pieces of information we would like to convey is whether an element is optional. To signify this we use the "?" character. In our DTD, following the element name "comments" with "?" expresses that the comments element is optional. Please note that you need to view the source of this example in order to fully view the DTD. Notice that the comments element has not been included in the XML document.

    <!ELEMENT record (name, address, contact, comments?)>

  2. +, REQUIRED AND MORE: The "+" sign indicates that the element is used one or more times. In this case a "record" in the address_book is required, and there might be more than one record. Please note that you need to view the source of this example in order to fully view the DTD.

    <!ELEMENT address_book (record+)>

  3. *, ZERO AND MORE: The "*" sign indicates that the element is used zero or more times. The "*" is rather similar to "?", with one exception: "?" indicates zero or one occurrence, whereas "*" indicates zero or more occurrences. In this case we expect zero or more "comments". Please note that you need to view the source of this example in order to fully view the DTD.

    <!ELEMENT record (name, address, contact, comments*)>
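
The three occurrence indicators can be combined in one DTD. The following is a minimal sketch, not the course's address_book.dtd: it assumes, purely for illustration, that nick_name is optional inside name and that a record may carry any number of comments elements.

```xml
<?xml version="1.0"?>
<!-- Illustrative sketch only: the occurrence rules chosen here are assumptions -->
<!DOCTYPE address_book [
  <!-- + : one or more record elements are required -->
  <!ELEMENT address_book (record+)>
  <!-- * : zero or more comments elements may follow name -->
  <!ELEMENT record (name, comments*)>
  <!-- ? : nick_name may appear once or not at all -->
  <!ELEMENT name (first_name, last_name, nick_name?)>
  <!ELEMENT first_name (#PCDATA)>
  <!ELEMENT last_name (#PCDATA)>
  <!ELEMENT nick_name (#PCDATA)>
  <!ELEMENT comments (#PCDATA)>
]>
<address_book>
  <!-- nick_name omitted, two comments elements -->
  <record>
    <name>
      <first_name>first name goes here</first_name>
      <last_name>last name goes here</last_name>
    </name>
    <comments>first comment goes here</comments>
    <comments>second comment goes here</comments>
  </record>
  <!-- nick_name present, no comments at all -->
  <record>
    <name>
      <first_name>first name goes here</first_name>
      <last_name>last name goes here</last_name>
      <nick_name>nick name goes here</nick_name>
    </name>
  </record>
</address_book>
```

Removing both record elements would make the document invalid, because record+ demands at least one.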

Child Elements

  1. "record": Our address_book will need to have a series of records. We use the plus (+) with record so that there will be at least one record. The record element will have four child elements, name, address, contact, and comments, in that sequence. Please note that you need to view the source of this and the three examples below in order to fully view the DTD.

    <?xml version="1.0"?>  
    <!-- The first step is to name and define our documents-->  
    <!DOCTYPE address_book [
      <!ELEMENT address_book (record+)>
      <!ELEMENT record (name, address, contact, comments)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT address (#PCDATA)>
      <!ELEMENT contact (#PCDATA)>
      <!ELEMENT comments (#PCDATA)>  
    ]>    
    <address_book>
       <record>
          <name> </name>
          <address> </address>
          <contact> </contact>
          <comments> </comments>
       </record>  
    </address_book>
        

  2. Children of name: The element "name" will contain several elements. For now each "name" will have a "first_name", "middle_name", "last_name" and "nick_name". Using "?" next to "nick_name" in the content model makes it optional, so a document may leave it out.

    <?xml version="1.0"?>  
    <!-- The first step is to name and define our documents-->  
    <!DOCTYPE address_book [
      <!ELEMENT address_book (record+)>
      <!ELEMENT record (name, address, contact, comments)>
      <!ELEMENT name (first_name, middle_name, last_name, nick_name?)>
      <!ELEMENT first_name (#PCDATA)>  
      <!ELEMENT middle_name (#PCDATA)>
      <!ELEMENT last_name (#PCDATA)>  
      <!ELEMENT nick_name (#PCDATA)>
      <!ELEMENT address (#PCDATA)>  
      <!ELEMENT contact (#PCDATA)>
      <!ELEMENT comments (#PCDATA)>  
    ]>    
    <address_book>
        <record>
           <name>
               <first_name>first name goes here</first_name>
               <middle_name>middle name goes here</middle_name>
               <last_name>last name goes here</last_name>
               <!-- Notice, no nickname needed since it is optional -->
           </name>
           <address> </address>
           <contact> </contact>
           <comments> </comments>
        </record>  
    </address_book>
        

  3. Children of address: The element "address" will also contain several elements. These include "street_address", "street_address_detail", "city", "state", and "zipcode". For now each address will have a "street_address_detail", which can be used for an apartment number or PO box.

    <?xml version="1.0"?>  
    <!-- The first step is to name and define our documents-->  
    <!DOCTYPE address_book [
      <!ELEMENT address_book (record+)>
      <!ELEMENT record (name, address, contact, comments)>
      <!ELEMENT name (first_name, middle_name, last_name, nick_name?)>
      <!ELEMENT first_name (#PCDATA)>
      <!ELEMENT middle_name (#PCDATA)>
      <!ELEMENT last_name (#PCDATA)>
      <!ELEMENT nick_name (#PCDATA)>
      <!ELEMENT address (street_address, street_address_detail, 
                         city, state, zipcode)>
      <!ELEMENT street_address (#PCDATA)>
      <!ELEMENT street_address_detail (#PCDATA)>
      <!ELEMENT city (#PCDATA)>
      <!ELEMENT state (#PCDATA)>
      <!ELEMENT zipcode (#PCDATA)>
      <!ELEMENT contact (#PCDATA)>
      <!ELEMENT comments (#PCDATA)>
     ]>    
    <address_book>
        <record>
            <name>
               <first_name>first name goes here</first_name>
               <middle_name>middle name goes here</middle_name>
               <last_name>last name goes here</last_name>
               <nick_name>nick name goes here</nick_name>
            </name>
            <address>
                <street_address>street address goes here</street_address>
                <street_address_detail>
                    apartment number goes here
                </street_address_detail>
                <city>city goes here</city>
                <state>state goes here</state>
                <zipcode>zipcode goes here</zipcode>
            </address>
            <contact> </contact>
            <comments> </comments>
        </record>
    </address_book>
        

  4. Children of contact: The element "contact" will also contain several elements. These include "home_phone", "work_phone", "cell_phone", "fax_number", and "email_address". Please note that in a DTD you might add "pager" to the content model, but most of us have thrown ours away :).

    <?xml version="1.0"?>  
    <!-- The first step is to name and define our documents-->  
    <!DOCTYPE address_book [
      <!ELEMENT address_book (record+)>
      <!ELEMENT record (name, address, contact, comments)>
      <!ELEMENT name (first_name, middle_name, last_name, nick_name?)>
      <!ELEMENT first_name (#PCDATA)>
      <!ELEMENT middle_name (#PCDATA)>
      <!ELEMENT last_name (#PCDATA)>
      <!ELEMENT nick_name (#PCDATA)>
      <!ELEMENT address (street_address, street_address_detail, 
                         city, state, zipcode)>
      <!ELEMENT street_address (#PCDATA)>
      <!ELEMENT street_address_detail (#PCDATA)>
      <!ELEMENT city (#PCDATA)>
      <!ELEMENT state (#PCDATA)>
      <!ELEMENT zipcode (#PCDATA)>
      <!ELEMENT contact (home_phone, work_phone, cell_phone, 
                         fax_number, email_address)>
      <!ELEMENT home_phone (#PCDATA)>  
      <!ELEMENT work_phone (#PCDATA)>
      <!ELEMENT cell_phone (#PCDATA)>
      <!ELEMENT fax_number (#PCDATA)>
      <!ELEMENT email_address (#PCDATA)>
      <!ELEMENT comments (#PCDATA)>  
    ]>    
    <address_book>
      <record>
        <name>
          <first_name>first name goes here</first_name>
          <middle_name>middle name goes here</middle_name> 
          <last_name>last name goes here</last_name>
          <nick_name>nick name goes here</nick_name>
        </name>
        <address>
           <street_address>street address goes here</street_address>
           <street_address_detail>
               apartment number goes here
           </street_address_detail>
           <city>city goes here</city>
           <state>state goes here</state>
           <zipcode>zipcode goes here</zipcode>
        </address>
        <contact>
           <home_phone>home phone goes here</home_phone>
           <work_phone>work phone goes here</work_phone>
           <cell_phone>cell phone goes here</cell_phone>
           <fax_number>fax number goes here</fax_number>
           <email_address>email address goes here</email_address>
        </contact>
        <comments> </comments>
      </record>
    </address_book>

  5. Children of comments: The element "comments" will contain just one child element, "misc_comments", but here is where you might add a birthday, directions to their house, or other important information.

    <?xml version="1.0"?>  
    <!-- The first step is to name and define our documents-->  
    <!DOCTYPE address_book [
      <!ELEMENT address_book (record+)>
      <!ELEMENT record (name, address, contact, comments)>
      <!ELEMENT name (first_name, middle_name, last_name, nick_name?)>
      <!ELEMENT first_name (#PCDATA)>
      <!ELEMENT middle_name (#PCDATA)>
      <!ELEMENT last_name (#PCDATA)>
      <!ELEMENT nick_name (#PCDATA)>
      <!ELEMENT address (street_address, street_address_detail, 
                         city, state, zipcode)>
      <!ELEMENT street_address (#PCDATA)>
      <!ELEMENT street_address_detail (#PCDATA)>
      <!ELEMENT city (#PCDATA)>
      <!ELEMENT state (#PCDATA)>
      <!ELEMENT zipcode (#PCDATA)>
      <!ELEMENT contact (home_phone, work_phone, cell_phone, 
                         fax_number, email_address)>
      <!ELEMENT home_phone (#PCDATA)>
      <!ELEMENT work_phone (#PCDATA)>
      <!ELEMENT cell_phone (#PCDATA)>
      <!ELEMENT fax_number (#PCDATA)>
      <!ELEMENT email_address (#PCDATA)>
      <!ELEMENT comments (misc_comments)>
      <!ELEMENT misc_comments (#PCDATA)>  
    ]>    
    <address_book>
      <record>
        <name>
          <first_name>first name goes here</first_name>
          <middle_name>middle name goes here</middle_name>
          <last_name>last name goes here</last_name>
          <nick_name>nick name goes here</nick_name>
        </name>
        <address>
          <street_address>street address goes here</street_address>
          <street_address_detail>
              apartment number goes here
          </street_address_detail>
          <city>city goes here</city>
          <state>state goes here</state>
          <zipcode>zipcode goes here</zipcode>
        </address>
        <contact>
          <home_phone>home phone goes here</home_phone>
          <work_phone>work phone goes here</work_phone>
          <cell_phone>cell phone goes here</cell_phone>
          <fax_number>fax number goes here</fax_number>
          <email_address>email address goes here</email_address>
        </contact>
        <comments>
          <misc_comments>comments go here</misc_comments>
        </comments>
      </record>
    </address_book>
        


  6. External DTDs: You have begun to see that the DTD can grow into a lengthy document. This, and more importantly the ability to share one DTD among many documents, is the reason for externalizing the DTD of your document. Please note that you need to view the source of this example in order to fully view the DTD, as well as viewing the address_book.dtd document.
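
As a rough sketch of what externalizing looks like, the element declarations move into a separate file and the XML document points to that file with the SYSTEM keyword. The file name address_book.dtd comes from this lesson; the abbreviated declarations below are an assumption that stands in for the full set shown earlier, and the relative path assumes both files sit in the same directory. The external file might contain:

```xml
<!-- address_book.dtd: declarations only, with no DOCTYPE wrapper -->
<!ELEMENT address_book (record+)>
<!ELEMENT record (name, address, contact)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT contact (#PCDATA)>
```

The XML document then refers to it instead of carrying an internal subset:

```xml
<?xml version="1.0"?>
<!-- SYSTEM gives the location of the external DTD, here a relative path -->
<!DOCTYPE address_book SYSTEM "address_book.dtd">
<address_book>
  <record>
    <name>name goes here</name>
    <address>address goes here</address>
    <contact>contact goes here</contact>
  </record>
</address_book>
```

Any number of documents can name the same file in their DOCTYPE, which is what makes a shared DTD practical.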

Attributes

  1. record attribute: Attributes are usually used for counters or for describing the data of the element to which they belong. They are data about data, or metadata. In the address_book.dtd we added an ID attribute to the "record" element to be used as a counter and to uniquely identify each record, as is often done in databases. Since the ID has nothing to do with the data itself, we use an attribute. Other good uses for attributes are units, currency, size, and dimensions. In our example we declare the attribute's type as "CDATA", which means its value is plain character data (any string), and we mark it as mandatory with "#REQUIRED". Note that "ID" here is only the attribute's name; because its type is CDATA, the parser does not check that the values are unique. An <!ATTLIST> declaration applies only to the element it names, so an attribute used on other elements must be declared for each of them. Please note that you need to view the source of this example in order to fully view the DTD.

    <?xml version="1.0"?>  
    <!-- The first step is to name and define our documents-->  
    <!DOCTYPE address_book [
      <!ELEMENT address_book (record+)>
      <!ELEMENT record (name, address, contact, comments)>
      <!ATTLIST record ID CDATA #REQUIRED>
    
    .
    .
    .
    
          
    <address_book>
        <record ID="1">
           <name>  
    .
    .
    .
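
A complete, self-contained sketch of attribute declarations may help here. The ID attribute on record follows the fragment above; the phone element and its type attribute are hypothetical additions, included only to show an enumerated attribute with a default value next to a required one.

```xml
<?xml version="1.0"?>
<!DOCTYPE address_book [
  <!ELEMENT address_book (record+)>
  <!ELEMENT record (name, phone*)>
  <!-- ID is only an attribute name here; CDATA accepts any string -->
  <!ATTLIST record ID CDATA #REQUIRED>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!-- hypothetical attribute: one of three values, "home" when omitted -->
  <!ATTLIST phone type (home | work | cell) "home">
]>
<address_book>
  <record ID="1">
    <name>name goes here</name>
    <phone>home phone goes here</phone>
    <phone type="cell">cell phone goes here</phone>
  </record>
</address_book>
```

A record without an ID attribute, or a phone whose type is not one of the three listed values, would fail validation against this sketch.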
          

Entities

  1. Copyright: You are familiar with built-in entities like "&lt;" for "<". In XML you can create your own entities as well. In this example we declared an entity named "copyright" and then used it right below the "address_book" tag. The end result is that the parser substitutes © for each &copyright; reference. Please note that you need to view the source of this example in order to fully view the DTD with the entity declaration for ©.

    	
          .
          .
          .
          <!ENTITY copyright "©">  
    ]>  
    <address_book>
       <copyright>&copyright; 2003</copyright>
       .
       .
       .  
        

  2. Entities for shortcuts: I put together a small file that shows how entities can be used for inserting text (my name from a set of initials, &rdc;) and for inserting an entire paragraph (rdc.txt from &rdc_txt;). See the file entity ex1.xml and rdc.txt. This can be a useful trick.
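
Both kinds of shortcut can be sketched in a few lines. The entity names rdc and rdc_txt and the file name rdc.txt are taken from the description above; the root element note, its children, the replacement text, and the assumption that rdc.txt sits next to the XML file and contains only plain text are illustrative.

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (author, body)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!-- internal entity: the replacement text is written in the DTD itself -->
  <!ENTITY rdc "full name goes here">
  <!-- external entity: the replacement text is read from a separate file -->
  <!ENTITY rdc_txt SYSTEM "rdc.txt">
]>
<note>
  <author>&rdc;</author>
  <body>&rdc_txt;</body>
</note>
```

Some parsers, including those built into many browsers, do not fetch external entities, so the second reference may come out empty there; a validating parser is a more reliable way to check it.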

Mixed Text

  1. One of the more clever uses of XML is to provide "meta information at a granular level" to unstructured text. This addresses the major problem of adding contextual information to a text after it has been written, or the general problem of getting text information future-proofed. Take a look at story.xml, story_dtd.xml, and story.dtd, and (looking ahead to XML Schema) at story_xsd.xml and story.xsd. Pay particular attention to the use of elements to mark up text within a text block. This is also known as 'mixed' but not in the sense that we use in the 'mixed' model.

    The following statement says that "para" may contain parsed character data with or without fact and/or name elements, in any order. The asterisk says that there may be zero or more of these:

    	
        <!ELEMENT para (#PCDATA | fact | name)*>
        

    The xml document, below, uses the story.dtd and shows an example of each possible combination that "para" may use. Line breaks have been added for readability.

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE story SYSTEM "story.dtd">
    <story>
      <chapter id="1">
        <para id="1">Some words will go here and a 
          <fact>will appear here</fact> and then my initials. 
            Some more words will go here and a <fact>will appear here</fact> and then my 
            initials again <name>RDC</name>.
        </para>
        <para id="2">Some words will go here and a 
          <fact>will appear here</fact> and then my initials. 
          <name>RDC</name>Some more words will go here and a 
          <fact>will appear here</fact> and then my initials again <name>RDC</name>.
        </para>
        <para id="3">
        </para>
        <para id="4">And only text is in this element. So all the other elements are 
          optional
        </para>
      </chapter>
    </story>
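
For readers without story.dtd at hand, the following is a guess at the smallest set of declarations that would accept the document above. It is not the actual story.dtd: every declaration other than the para content model quoted earlier is an assumption, including the use of CDATA for the id attributes.

```xml
<!-- Hypothetical stand-in for story.dtd, not the course file -->
<!ELEMENT story (chapter+)>
<!ELEMENT chapter (para+)>
<!ATTLIST chapter id CDATA #REQUIRED>
<!-- mixed content: #PCDATA comes first, choices use |, and the group ends with * -->
<!ELEMENT para (#PCDATA | fact | name)*>
<!ATTLIST para id CDATA #REQUIRED>
<!ELEMENT fact (#PCDATA)>
<!ELEMENT name (#PCDATA)>
```

CDATA is used for id in this sketch because values such as "1" begin with a digit and so could not be used with the ID attribute type, which requires a valid XML name.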
          

Assertion Networks

  1. You can create an 'assertion network' by linking attributes using the ID, IDREF, and IDREFS attribute types in DTDs. Look closely at recipe.xml, recipe_dtd.xml, and recipe.dtd. These files link the ingredients and steps in the recipe.xml file, and the DTD lets a validating parser check the assertion network. I will develop this idea more, but wanted to post these files for your use and awareness of this powerful feature.

    An attribute of type ID must have a value that is unique within the document (i.e., the value can occur only once as an ID). An attribute of type IDREF must have a value that matches the value of some ID attribute somewhere in the document. An attribute of type IDREFS holds a whitespace-separated list of such values, each of which must match an ID, so one attribute can point to more than one element.

    For a very complicated model, take a look at the files I developed for an assertion network in a course outline of record. The files are labeled course_outline.xml, course_outline_dtd.xml, and course_outline.dtd. These follow the same pattern as the recipe model, but the IDREFS type is used.

    <!--DTD generated by XML Spy v4.4 U (http://www.xmlspy.com)-->
    <!ELEMENT course_outline (outcomes, activities, assessments, resources, assignments, exams, portfolio)>
    <!ATTLIST course_outline
    id CDATA #REQUIRED
    >
    <!ELEMENT outcomes (outcome+)>
    <!ELEMENT resource (#PCDATA)>
    <!ATTLIST resource
    number ID #REQUIRED
    activity IDREF #REQUIRED
    >
    <!ELEMENT outcome (#PCDATA)>
    <!ATTLIST outcome
    number ID #REQUIRED
    activity IDREFS #REQUIRED
    assessment IDREFS #REQUIRED
    resource IDREFS #REQUIRED
    >
    <!ELEMENT activities (activity+)>
    <!ELEMENT activity (#PCDATA)>
    <!ATTLIST activity
    number ID #REQUIRED
    resource IDREF #REQUIRED
    >
    <!ELEMENT assessments (assessment+)>
    <!ELEMENT assessment (#PCDATA)>
    <!ATTLIST assessment
    number ID #REQUIRED
    type (assignment | exam | portfolio) #REQUIRED
    outcome IDREF #REQUIRED
    >
    <!ELEMENT resources (resource+)>
    <!ELEMENT assignments (assignment+)>
    <!ELEMENT assignment (#PCDATA)>
    <!ATTLIST assignment
    number ID #REQUIRED
    resource IDREF #REQUIRED
    outcome IDREF #REQUIRED
    >
    <!ELEMENT exams (exam)>
    <!ELEMENT exam (#PCDATA)>
    <!ATTLIST exam
    number ID #REQUIRED
    outcome IDREF #REQUIRED
    >
    <!ELEMENT portfolio (link)>
    <!ATTLIST portfolio
    url CDATA #REQUIRED
    outcome IDREFS #REQUIRED
    >
    <!ELEMENT link (#PCDATA)>


    Look at this DTD tutorial for a more detailed description of the ID, IDREF, and IDREFS.

    The following is the Assertion Network XML file that uses the DTD shown above:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE course_outline SYSTEM "course_outline.dtd">
    <course_outline id="CID_123">
      <outcomes>
        <outcome number="O1" activity="ACT1" assessment="AS1" resource="R1"> Outcome number one </outcome>
        <outcome number="O2" activity="ACT2" assessment="AS2" resource="R2"> Outcome number two </outcome>
        <outcome number="O3" activity="ACT3" assessment="AS3" resource="R3"> Outcome number three </outcome>
        <outcome number="O4" activity="ACT4" assessment="AS4" resource="R4"> Outcome number four </outcome>
        <outcome number="O5" activity="ACT5" assessment="AS5" resource="R5"> Outcome number five </outcome>
      </outcomes>
      <activities>
        <activity number="ACT1" resource="R1">Activity one</activity>
        <activity number="ACT2" resource="R2">Activity two</activity>
        <activity number="ACT3" resource="R3">Activity three</activity>
        <activity number="ACT4" resource="R4">Activity four</activity>
        <activity number="ACT5" resource="R5">Activity five</activity>
      </activities>
      <assessments>
        <assessment number="AS1" type="assignment" outcome="O1">Assessment one</assessment>
        <assessment number="AS2" type="assignment" outcome="O2">Assessment two</assessment>
        <assessment number="AS3" type="assignment" outcome="O3">Assessment three</assessment>
        <assessment number="AS4" type="exam" outcome="O4">Assessment four</assessment>
        <assessment number="AS5" type="portfolio" outcome="O5">Assessment five</assessment>
      </assessments>
      <resources>
        <resource number="R1" activity="ACT1">Resource one</resource>
        <resource number="R2" activity="ACT2">Resource two</resource>
        <resource number="R3" activity="ACT3">Resource three</resource>
        <resource number="R4" activity="ACT4">Resource four</resource>
        <resource number="R5" activity="ACT5">Resource five</resource>
      </resources>
      <assignments>
        <assignment number="ASN1" resource="R1" outcome="O1">Assignment One</assignment>
        <assignment number="ASN2" resource="R2" outcome="O2">Assignment One</assignment>
        <assignment number="ASN3" resource="R3" outcome="O3">Assignment One</assignment>
      </assignments>
      <exams>
        <exam number="E1" outcome="O4">Exam one</exam>
      </exams>
      <portfolio url="http://www.theospi.org/portfolio" outcome="O1 O2 O3 O4 O5">
        <link>http://www.theospi.org/portfolio</link>
      </portfolio>
    </course_outline>
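
The linking rules are easier to see in a small document. The sketch below is loosely modelled on the idea of a recipe whose steps point at ingredients; it is not the recipe.dtd mentioned above, and all element and attribute names in it are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE recipe [
  <!ELEMENT recipe (ingredient+, step+)>
  <!ELEMENT ingredient (#PCDATA)>
  <!-- ID: each value must be unique in the whole document -->
  <!ATTLIST ingredient number ID #REQUIRED>
  <!ELEMENT step (#PCDATA)>
  <!-- IDREFS: a space-separated list, each entry must match some ID value -->
  <!ATTLIST step uses IDREFS #REQUIRED>
]>
<recipe>
  <ingredient number="I1">flour</ingredient>
  <ingredient number="I2">water</ingredient>
  <step uses="I1 I2">Mix the flour and the water.</step>
  <step uses="I2">Add more water if the dough is dry.</step>
</recipe>
```

Changing the second step to uses="I9" would leave the document well-formed but no longer valid, since no ID attribute has the value I9.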

 

Example Files

Tutorials

Step through the w3schools DTD tutorial: http://www.w3schools.com/DTD/

 

Links to XML Related Sites

  1. XML.COM
  2. WDVL XML tutorial
  3. Sun Java XML Introduction
  4. IBM'S XML Website
  5. Google Directory on XML
