
Why XHTML?
The World Wide Web Consortium (W3C) is a voluntary organization responsible for setting HTML standards. Recent efforts have focused on defining XML (eXtensible Markup Language) as a universal markup language that replaces older languages and provides standards for creating future languages for special markup needs. For instance, versions of XML have been designed for:
XML is a highly structured language with precise rules for its use. These rules are encompassed within a Document Type Definition (DTD), that is, a set of coding criteria against which Web pages can be validated for conformity to XML standards.
A problem is that not all Web browsers can recognize XML and there are millions of current Web pages written under older HTML standards. So, as a transition between conventional HTML coding and future XML coding, the W3C developed XHTML, eXtensible HyperText Markup Language, which is a family of current and future document types and modules that reproduce, subset, and extend HTML 4. XHTML family document types are XML based, and ultimately are designed to work in conjunction with XML-based user agents.
XHTML 1.0 (our main focus) is the first document type in the XHTML family. It is a reformulation of the three HTML 4 document types as applications of XML 1.0. It is intended to be used as a language for content that is both XML-conforming and, if some simple guidelines are followed, operates in HTML 4 conforming user agents. Developers who migrate their content to XHTML 1.0 will realize the following benefits:
XHTML Rules
Because XHTML is an XML application, certain practices that were perfectly legal in SGML-based HTML 4 must be changed. XHTML must be well-formed, meaning that every element must either have a closing tag or be written in the special empty-element form, and all elements must nest properly. Below is a list of rules that you must follow so that your document is XHTML compliant:
<script type="text/javascript">
// <![CDATA[
... unescaped script content ...
// ]]>
</script>
<style type="text/css">
/* <![CDATA[ */ ... unescaped CSS content ... /* ]]> */
</style>
An alternative is to use external script and style documents in the head section.
<link rel="stylesheet" href="screen.css" type="text/css" />
<script type="text/javascript" src="site.js"></script>
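The well-formedness requirement described above can be checked mechanically with any XML parser. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree module; the helper name is_well_formed is a hypothetical name chosen for illustration, and the check covers well-formedness only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the markup parses as well-formed XML.

    Hypothetical helper for illustration: it checks well-formedness only
    and does not validate the markup against an XHTML DTD.
    """
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


# Empty elements use the special closed form, so this parses.
closed_br = is_well_formed("<p>Hello<br /></p>")

# HTML 4 style: <br> is never closed, so an XML parser rejects it.
open_br = is_well_formed("<p>Hello<br></p>")

# Improper nesting: <b> is closed before the <i> it contains.
bad_nesting = is_well_formed("<p><b><i>text</b></i></p>")

# Attribute values must be quoted in XML.
unquoted_attr = is_well_formed("<td rowspan=3>cell</td>")
```

Only the first fragment is accepted. A validator such as the W3C Markup Validation Service goes further and also checks the document against the DTD named in its DOCTYPE.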
XHTML Document Structure
XHTML documents have a simple, common structure that forms the basis for designing all Web pages. The <html>, <head>, <title>, and <body> elements must be present. XHTML files that are served as text/html normally use the .htm or .html file extension; the .xhtml extension also exists and is commonly associated with the application/xhtml+xml media type. A browser decides whether to treat a file as HTML or XHTML from the media type it is served with, not from the file extension. The DOCTYPE declaration identifies the DTD against which the file is validated.
The basic structure of the XHTML document is shown below:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>A Web Page</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body>
.
page content goes here
.
</body>
</html>
XML Declaration:
Often an XHTML document will begin with the XML declaration:
<?xml version="1.0" encoding="UTF-8"?>
which declares the XML version and character encoding in use. The values shown are the defaults, so the declaration may be omitted. If the document instead uses XML 1.1 or another character encoding, a declaration is necessary. For a very detailed description of character encoding, see section C.9 of the W3C XHTML 1.0 Recommendation.
Document Type Declaration (DOCTYPE):
All XHTML documents must have a DOCTYPE declaration placed before the root element. It is not a part of the XHTML document itself and is not an XHTML element; rather, it names the Document Type Definition (DTD) against which the XHTML document is validated. A DOCTYPE declares to the browser the DTD to which the document conforms. What does a DTD do?
We will focus on XHTML 1.0, which has three DTDs: Strict, Transitional, and Frameset.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
Media Type Usage:
XHTML is an XML format. This means that strictly speaking it should be sent with an XML-related media type (application/xhtml+xml, application/xml, or text/xml). However, XHTML 1.0 was carefully designed so that it would also work on legacy HTML user agents (browsers, and other programs and systems that read those documents) as well. If you follow some simple guidelines, you can get many XHTML 1.0 documents to work in legacy browsers. However, legacy browsers only understand the media type text/html, so you have to use that media type if you send XHTML 1.0 documents to them. But be well aware, sending XHTML documents to browsers as text/html means that those browsers see the documents as HTML documents, not XHTML documents.
<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />
The browsers that accept the media type application/xhtml+xml include all Mozilla-based browsers, such as Mozilla, Netscape 5 and higher, Galeon and Firefox, as well as Opera, Amaya, Camino, Chimera, DocZilla, iCab, Safari, and all browsers on mobile phones that accept WAP2. In fact, any modern browser. Most accept XHTML documents as application/xml as well.
Notice that Internet Explorer is NOT one of the browsers that recognizes the media type application/xhtml+xml. Rather than rendering application/xhtml+xml content, a dialog box invites the user to save the content to disk instead. Both Internet Explorer 7 (released in 2006) and Internet Explorer 8 Release Candidate 1 (released in January 2009) exhibit this behavior, and it is unclear whether this will be resolved in a future release. As long as this remains the case, most web developers avoid using XHTML that isn’t HTML-compatible, so advantages of XML such as namespaces, faster parsing and smaller-footprint browsers do not benefit the user. In this case the media type used should be text/html.
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
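One common way to act on the media type advice above is to look at the Accept header the browser sends and fall back to text/html when application/xhtml+xml is not listed. The following is a minimal sketch in Python under that assumption; the function name choose_media_type and the sample header strings are hypothetical, and a production server would normally rely on its framework's own content negotiation.

```python
XHTML_TYPE = "application/xhtml+xml"
HTML_TYPE = "text/html"


def choose_media_type(accept_header):
    """Pick the media type for an XHTML 1.0 page from an Accept header.

    Hypothetical helper: returns application/xhtml+xml only when the
    header lists it explicitly with a non-zero quality value, and falls
    back to text/html for everything else (including bare */*).
    """
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        if parts[0].lower() != XHTML_TYPE:
            continue
        quality = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            return XHTML_TYPE
    return HTML_TYPE


# Illustrative header from a browser that lists the XHTML media type.
lists_xhtml = choose_media_type(
    "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
)

# Illustrative header from a browser that does not list it.
no_xhtml = choose_media_type("image/gif, image/jpeg, */*")
```

Treating a bare */* as "HTML only" is a deliberately conservative choice: a browser that does not name application/xhtml+xml may offer to save the file instead of rendering it.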
Namespaces:
XML namespaces provide a simple method for qualifying element and attribute names used in Extensible Markup Language documents by associating them with namespaces identified by URI references. The xmlns attribute on the <html> element specifies the XML namespace for a document and is required in XHTML documents. A namespace is declared using the reserved XML attribute xmlns, the value of which must be an Internationalized Resource Identifier (IRI), usually a Uniform Resource Identifier (URI) reference. Using a URI (such as "http://www.w3.org/1999/xhtml") to identify a namespace, rather than a simple string (such as "xhtml"), reduces the possibility of different namespaces using duplicate identifiers.
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
Notice that the language is defined using two different attributes, xml:lang and lang. The xml:lang attribute is the standard way to identify language information in XML, but the browser only recognizes the lang attribute if the page is served as text/html. On the other hand, when processing the document as XML, the xml:lang will be the most useful. Since XHTML 1.0 may be used in both an HTML and XML context, you should use both attributes.
XHTML may use other namespaces. The following example shows the way in which XHTML 1.0 could be used in conjunction with the MathML Recommendation:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>A Math Example</title>
</head>
<body>
<p>The following is MathML markup:</p>
<math xmlns="http://www.w3.org/1998/Math/MathML">
<apply> <log/>
<logbase>
<cn> 3 </cn>
</logbase>
<ci> x </ci>
</apply>
</math>
</body>
</html>
The following example shows the way in which XHTML 1.0 markup could be incorporated into another XML namespace:
<?xml version="1.0" encoding="UTF-8"?>
<!-- initially, the default namespace is "books" -->
<book xmlns='urn:loc.gov:books'
xmlns:isbn='urn:ISBN:0-395-36341-6' xml:lang="en" lang="en">
<title>Cheaper by the Dozen</title>
<isbn:number>1568491379</isbn:number>
<notes>
<!-- make HTML the default namespace for a hypertext commentary -->
<p xmlns='http://www.w3.org/1999/xhtml'>
This is also available <a href="http://www.w3.org/">online</a>.
</p>
</notes>
</book>
For a detailed description of XML namespaces read the W3C Recommendation.
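To see what "qualifying" a name means in practice, it can help to look at how a namespace-aware parser reports element names. The following is a rough sketch in Python using the standard library's xml.etree.ElementTree, applied to a condensed version of the XHTML and MathML example; the helper split_tag is a hypothetical name introduced for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # bound to the xml: prefix

DOC = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><title>A Math Example</title></head>
<body>
<p>The following is MathML markup:</p>
<math xmlns="http://www.w3.org/1998/Math/MathML">
<apply> <log/> <logbase><cn> 3 </cn></logbase> <ci> x </ci> </apply>
</math>
</body>
</html>"""


def split_tag(tag):
    """Split ElementTree's '{namespace-uri}local-name' form into a pair."""
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return None, tag


root = ET.fromstring(DOC)

# Every element carries the namespace that was in scope where it appeared.
qualified_names = [split_tag(element.tag) for element in root.iter()]

# The <math> element and its children belong to the MathML namespace.
math = root.find(".//{%s}math" % MATHML_NS)

# xml:lang is a namespaced attribute; lang is a plain XHTML attribute.
xml_lang = root.get("{%s}lang" % XML_NS)
plain_lang = root.get("lang")
```

The parser never sees a bare "html" or "math": each name is paired with the URI declared by the nearest xmlns attribute, which is how identically named elements from different vocabularies stay distinct.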
Metadata:
Metadata is information about data. Metadata is text, voice, or image that describes what the audience wants or needs to see or experience. The audience could be a person, group, or software program. Metadata is important because it aids in clarifying and finding the actual data. It never stands alone because it is always associated with the data it describes. In the computer world, this means that whatever is being described must in some way itself be addressable (i.e., retrievable by some means, such as by identifier or location).
Metadata helps to bridge the semantic gap. By telling a computer how data items are related and how these relations can be evaluated automatically, it becomes possible to process even more complex filter and search operations. For example, if a search engine understands that "Van Gogh" was a "Dutch painter", it can answer a search query on "Dutch painters" with a link to a web page about Vincent Van Gogh, although the exact words "Dutch painters" never occur on that page. This approach, called knowledge representation, is of special interest to the semantic web and artificial intelligence.
XHTML has been augmented to allow Dublin Core metadata, a popular standard developed by the library community, to be incorporated in Web pages in a way that is compatible with today's Web browsers. The same generalized mechanism lets other popular schemas be used in a similar fashion.
Metadata is placed in the head of the document using <meta> tags. The meta tag describing the character set is always present.
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
Other types of metadata may be added as well: administrative metadata, Dublin Core metadata, and instructions to search engine robots. The following is an example of how Dublin Core metadata is used:
<meta name="DC.title" content="Graduate Studies Prospectus" />
<meta name="DC.subject" xml:lang="en-GB" content="Oxford University Graduate Studies
Prospectus, for entry in 2006-7. Courses, research areas, facilities, accommodation,
student funding, how to apply and links to department information." />
<meta name="DC.Date.modified" content="date last modified" />
For a detailed description of how to create and use Dublin Core metadata, visit the Dublin Core Metadata Initiative site.
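Because the metadata sits in ordinary <meta> elements, software can read it back with any XML parser. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree; the function name named_metadata and the sample page, including its date value, are hypothetical and exist only to illustrate the idea.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample page; the date is a made-up placeholder value.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Graduate Studies Prospectus</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="DC.title" content="Graduate Studies Prospectus" />
<meta name="DC.Date.modified" content="2006-01-15" />
</head>
<body><p>Page content goes here.</p></body>
</html>"""


def named_metadata(xhtml_text):
    """Collect name/content pairs from the <meta> elements of an XHTML page.

    Hypothetical helper: <meta> elements without a name attribute, such as
    the http-equiv character set declaration, are skipped.
    """
    root = ET.fromstring(xhtml_text)
    result = {}
    for meta in root.iter("{%s}meta" % XHTML_NS):
        name = meta.get("name")
        if name is not None:
            result[name] = meta.get("content", "")
    return result


metadata = named_metadata(PAGE)
dublin_core = {key: value for key, value in metadata.items() if key.startswith("DC.")}
```

This only works when the page is well-formed XML, which is one practical benefit of following the XHTML rules.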
XHTML Deconstructed
I created an example XHTML page following all of the rules and validated the markup using the W3C Markup Validation Service. Remember to do this when you think you have completed your page. Here is what it looks like rendered in Firefox:

At the very top of the page, preceding the html tag, the code contains the XML declaration and the DOCTYPE for the Strict DTD. The attributes in the html tag set the xmlns namespace and the language definitions. Inside the head tag the media type is set to application/xhtml+xml and the character encoding is set to UTF-8. The title tag is present as well. There is embedded CSS, so the CDATA marker is surrounded by comments so that user agents that don't understand it will ignore it.

Moving past the CSS, notice how the CDATA section is closed, again surrounded by comment notation. After that the code looks the same as well-formed HTML code.

Try the example to see how it renders with your browser.
XHTML 1.1
The XHTML 1.1 recommendation defines a new XHTML document type based on the module framework and modules defined in Modularization of XHTML. It serves as the basis for future extended XHTML 'family' document types, providing a consistent, forward-looking document type cleanly separated from the deprecated, legacy functionality of HTML 4 that was brought forward into the XHTML 1.0 document types. This document type is essentially a reformulation of XHTML 1.0 Strict using XHTML Modules. This means that many facilities available in other XHTML Family document types (e.g., XHTML Frames) are not available in this document type.
XHTML Modularization is not aimed at the regular users of XHTML, but at designers of XHTML-based languages. It had been observed that companies and groups had the tendency to design their own versions of HTML and XHTML that were often not interoperable at basic levels. XHTML Modularization splits XHTML into a number of modules that can be individually selected when defining a new language. In this way any XHTML-based language that uses tables is guaranteed to use the same definition of tables, and not some divergent version. Modularization also makes it clear where it is OK to add new elements, and where it is not.
With the advent of the XHTML modules defined in Modularization of XHTML, the W3C has removed support for deprecated elements and attributes from the XHTML family. These elements and attributes were largely presentation oriented functionality that is better handled via style sheets or client-specific default behavior.
The single most significant change in XHTML 1.1 is the uncoupling of data from presentation. Formatting is no longer embedded with data and can only be achieved by referencing Cascading Style Sheets (CSS). This leaves data available for easy parsing and reuse by a wide range of new non-desktop products and accessibility applications. XHTML is backwards compatible where it matters most - at the level of information. Some older browsers may not support CSS and so presentation of Web pages may not be exactly as intended, but the information on the Web page is still available. XHTML 1.1 represents a departure from both HTML 4 and XHTML 1.0 - many features were deprecated. In general, the underlying philosophy of XHTML 1.1 was to define a markup language that is rich in structural functionality, but that relies upon style sheets for presentation.