Document validation

Any document that Sidewinder is told is XHTML will go through validation. The schemata used have been built using the techniques described in XHTML Modularization 1.1, which provides a set of XHTML modules for re-use when building XML languages.

The schemata used by Sidewinder actually build on the core XHTML modules by adding XForms, SVG, MathML, RDFa and XLink. (The MathML schemas are currently disabled.)

Since the schemata may well contain errors, please use the forums both to check for updates between releases and to report any errors you discover.

Screenshot of Sidewinder running in standalone mode, showing validation errors.

Enhancing the modularisation schemas to support RDFa

To enable us to validate RDFa we need to add some features to the XHTML Modularization 1.1 schemata.

Note that the mark-up below is taken from slightly older schemata than those used in XHTML Modularization 1.1; this will be updated shortly.

Adding the elements

RDFa requires two elements, meta and link. The main ways in which they differ from standard XHTML are:

  • meta takes a property attribute instead of name;
  • both meta and link can be nested inside themselves and each other;
  • both meta and link can take inline text;
  • both meta and link can appear anywhere within a document.
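As an illustration, here is a fragment that should validate once the changes described in this section are in place. It is only a sketch: the dc and foaf prefixes, the example.org URI and the text values are hypothetical, chosen to show property on meta, inline text, meta nested inside link, and both elements appearing inside a paragraph rather than in head.

  <p xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:foaf="http://xmlns.com/foaf/0.1/">
    <!-- meta uses property rather than name, and carries inline text -->
    <meta property="dc:title">Document validation</meta>
    <!-- link appears inside a paragraph and contains a nested meta -->
    <link rel="dc:creator" href="http://example.org/people#jane">
      <meta property="foaf:name">Jane Doe</meta>
    </link>
  </p>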

So many changes were needed that a redefinition of the existing elements would have left almost nothing of the originals intact; new modules were therefore created for each. Once that was done, however, it became clear that the two modules were almost exact copies of each other, so they have been merged into a single new module called xhtml11-rdfa-1.xsd.

This means that in xhtml11-modules-1.xsd the references to xhtml-meta-1.xsd and xhtml-link-1.xsd have been removed, and replaced with this single module.
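In schema terms the change is small. The fragment below is a sketch that assumes the modules file pulls each module in with xs:include; the exact form and schemaLocation values in Sidewinder's copy may differ.

  <!-- Removed: the standard Metainformation and Link modules -->
  <!-- <xs:include schemaLocation="xhtml-meta-1.xsd"/> -->
  <!-- <xs:include schemaLocation="xhtml-link-1.xsd"/> -->

  <!-- Added: the merged module that defines both meta and link -->
  <xs:include schemaLocation="xhtml11-rdfa-1.xsd"/>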

All of the attributes needed for RDFa were moved out to a separate module and placed in the 'common attribute' collection (see the next section). All that meta and link then need to do is use this collection, exactly as other elements do (all the other attributes have been left intact for now):

  <xs:attributeGroup name="meta.attlist">
    <xs:attributeGroup ref="Common.attrib"/>
    <xs:attribute name="name" type="xs:NMTOKEN"/>
    <xs:attribute name="scheme" type="xh11d:CDATA"/>
  </xs:attributeGroup>

  <xs:attributeGroup name="link.attlist">
    <xs:attributeGroup ref="Common.attrib"/>
    <xs:attribute name="media" type="xh11d:MediaDesc"/>
  </xs:attributeGroup>

(At some point scheme may be deprecated, since property is a CURIE, which means that the namespace prefix plays the role of a scheme.)

Defining the elements themselves then becomes a case of adding a common content model:

  <xs:group name="rdfa.content">
    <xs:choice>
      <xs:element ref="meta" />
      <xs:element ref="link" />
    </xs:choice>
  </xs:group>

And then making use of it in both elements:

  <xs:complexType name="meta.type" mixed="true">
    <xs:group ref="rdfa.content" minOccurs="0"
      maxOccurs="unbounded" />
    <xs:attributeGroup ref="meta.attlist"/>
  </xs:complexType>

  <xs:complexType name="link.type" mixed="true">
    <xs:group ref="rdfa.content" minOccurs="0"
      maxOccurs="unbounded" />
    <xs:attributeGroup ref="link.attlist"/>
  </xs:complexType>

  <xs:element name="link" type="link.type"/>
  <xs:element name="meta" type="meta.type"/>

Note that defining each element in terms of the other fulfills the requirement that they can be nested to any depth. And by adding mixed="true" we also allow inline text in both elements, although its meaning differs: inline text in meta sets the value of content, whilst inline text in link is link text, just as it is in a.
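For example, going by the description above, the first two meta elements below should carry the same value, one through inline text and the other through the content attribute, while the text inside link is simply link text. The values, the href and the assumption that dc is declared on an ancestor element are all hypothetical:

  <!-- value supplied as inline text -->
  <meta property="dc:creator">Jane Doe</meta>

  <!-- the same value supplied via the content attribute -->
  <meta property="dc:creator" content="Jane Doe" />

  <!-- inline text in link is link text, as it would be in a -->
  <link rel="next" href="chapter2.html">Chapter 2</link>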

Finally, to allow these two elements to be used anywhere in the XHTML document, we need to redefine the Misc.extra group in xhtml.xsd:

  <xs:group name="Misc.extra">
    <xs:choice>
      <xs:group ref="xh11:Misc.extra" />
      <xs:element ref="meta"/>
      <xs:element ref="link"/>
    </xs:choice>
  </xs:group>
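Because Misc.extra already exists, a definition like this has to sit inside an xs:redefine element. The sketch below assumes that Misc.extra is defined in xhtml11-model-1.xsd and that both the default namespace and the xh11 prefix are bound to the XHTML namespace; check both assumptions against the schemata you are using. XML Schema also requires a redefined group to refer to itself exactly once, with minOccurs and maxOccurs both 1, which the single xh11:Misc.extra reference satisfies.

  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
             xmlns="http://www.w3.org/1999/xhtml"
             xmlns:xh11="http://www.w3.org/1999/xhtml"
             targetNamespace="http://www.w3.org/1999/xhtml">

    <!-- hypothetical location of the module that defines Misc.extra -->
    <xs:redefine schemaLocation="xhtml11-model-1.xsd">
      <xs:group name="Misc.extra">
        <xs:choice>
          <!-- the original content of Misc.extra (self-reference, used once) -->
          <xs:group ref="xh11:Misc.extra" />
          <xs:element ref="meta"/>
          <xs:element ref="link"/>
        </xs:choice>
      </xs:group>
    </xs:redefine>

  </xs:schema>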

Adding the attributes

To incorporate the attributes, a module called rdfa-10-rules.xsd was created, containing the following attribute group:

  <xs:attributeGroup name="rdfa.attlist">
    <xs:attribute name="about" type="CurieOrURI"/>
    <xs:attribute name="content" type="xs:string" />
    <xs:attribute name="datatype" type="Curie"/>
    <xs:attribute name="href" type="CurieOrURI"/>
    <xs:attribute name="property" type="Curie"/>
    <xs:attribute name="rel" type="CurieOrLinkTypeList"/>
    <xs:attribute name="rev" type="CurieOrLinkTypeList"/>
  </xs:attributeGroup>

All of the datatypes listed here are defined in the same module and are discussed in the next section. (They will, however, be broken out into a separate 'types' document.)

As you saw in the previous section, some of the attributes that XHTML uses are not here; the more XHTML-specific attributes have been kept back in the XHTML driver file.

To make use of these RDFa attributes we simply need to redefine Common.extra:

  <xs:attributeGroup name="Common.extra">
    <xs:attributeGroup ref="Common.extra"/>
    <xs:attributeGroup ref="rdfa.attlist" />
    <xs:attribute name="charset" type="xh11d:Charset"/>
    <xs:attribute name="hreftype" type="xh11d:ContentType"/>
    <xs:attribute name="hreflang" type="xh11d:LanguageCode"/>
    <xs:attribute name="http-equiv" type="xs:NMTOKEN"/>
  </xs:attributeGroup>

Common.extra is an empty definition in XHTML M12N, but it is part of Common.attrib, which is applied to every element.

Note that hreftype is used here instead of type, since a type attribute in this common collection would conflict with the type attribute that object, script and style already declare. This is the name proposed in XHTML 2.0.
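Because Common.attrib is applied to every element, the RDFa attributes and hreftype should then be accepted on ordinary XHTML elements as well as on meta and link. A hypothetical fragment (the prefix, URIs and values are illustrative only):

  <div xmlns:dc="http://purl.org/dc/elements/1.1/"
       about="http://example.org/report">
    <!-- property and content on an ordinary span -->
    <span property="dc:date" content="2008-01-15">15th January</span>
    <!-- hreftype rather than type describes the linked resource -->
    <link rel="alternate" href="http://example.org/report.pdf"
          hreftype="application/pdf" />
  </div>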

Adding the datatypes

As we saw in the previous section, the RDFa attributes rely on a set of datatypes, the most important being the CURIE.

CURIE

Although this may change, for now a CURIE is defined as follows:

  <xs:simpleType name="Curie">
    <xs:restriction base="xs:string">
      <xs:pattern value="[\i-[:]][\c-[:]]*:.+" />
    </xs:restriction>
  </xs:simpleType>

It's pretty straightforward, but to give a quick overview:

  • \i is the set of characters that can start an XML name, which is any letter, an '_' or a ':';
  • [\i-[:]] means any character that is in the set \i but is not a colon, since the colon is what separates the prefix from the rest of the CURIE;

So the first character of our CURIE must be a letter or an underscore, i.e., not a number, colon, equals sign, etc. After that we can have any XML name character, which can now include numbers:

  • \c is the set of XML name characters that can appear after the first character;
  • [\c-[:]] is the same set but without a colon;
  • [\c-[:]]* means match as many as we can (zero or more);

The prefix part has been made quite strict because it must match the namespace prefix rules. But after the colon we can be looser, hence:

  • .+ means one or more of any character.
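Putting the pieces together, here is how a few property values would fare against the Curie pattern. The prefixes are hypothetical, and the comment on each line gives the reason:

  <!-- Accepted by the Curie pattern -->
  <meta property="dc:title" content="x" />   <!-- letter prefix, colon, reference -->
  <meta property="_a1:b" content="x" />      <!-- '_' may start the prefix; digits may follow -->
  <meta property="ex:a:b" content="x" />     <!-- .+ after the first colon also matches ':' -->

  <!-- Rejected by the Curie pattern -->
  <meta property="title" content="x" />      <!-- no colon -->
  <meta property="1dc:title" content="x" />  <!-- prefix starts with a digit -->
  <meta property=":title" content="x" />     <!-- empty prefix -->
  <meta property="dc:" content="x" />        <!-- nothing after the colon -->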

We could require the part after the colon to be a valid URI, and this does actually work. But since pretty much anything matches the URI expression, there didn't seem a lot of point. However, so that we have a record of it, the part after the colon can be constrained to a URI with the following expression (line breaks added only for layout):

  [\i-[:]][\c-[:]]*
  :
  (([^:/\?#]+):)?(//([^/\?#]*))?([^\?#]*)(\?([^#]*))?(#(.*))?
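For the record, this is how that expression might be placed in a type definition. The type name CurieWithURI is hypothetical, and the pattern has to be written on a single line, because a line break inside an attribute value is normalised to a space, which would then become part of the pattern. Since every part of the URI expression is optional, it even accepts an empty string after the colon, which .+ does not.

  <xs:simpleType name="CurieWithURI">
    <xs:restriction base="xs:string">
      <!-- prefix, colon, then the generic URI expression, all on one line -->
      <xs:pattern value="[\i-[:]][\c-[:]]*:(([^:/\?#]+):)?(//([^/\?#]*))?([^\?#]*)(\?([^#]*))?(#(.*))?" />
    </xs:restriction>
  </xs:simpleType>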

Escaped CURIEs

An escaped CURIE is used in places where a bare CURIE would be ambiguous (for example, where a URI is also allowed, since ab:cd could be read as either), and is currently written by placing square brackets around the CURIE. The definition is as follows:

  <xs:simpleType name="CurieEsc">
    <xs:restriction base="xs:string">
      <xs:pattern value="\[[\i-[:]][\c-[:]]*:.+\]" />
    </xs:restriction>
  </xs:simpleType>

Note that an XML Schema regular expression has no way to refer to another type's pattern, hence the repetition of the CURIE pattern here.
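To make the pattern concrete, here are a few hypothetical values (the prefixes are illustrative only) and whether CurieEsc on its own would accept them:

  <!-- Accepted by CurieEsc -->
  [dc:creator]
  [_x1:y]

  <!-- Rejected by CurieEsc -->
  dc:creator     <!-- no brackets: this is a plain CURIE -->
  [dc:creator    <!-- no closing bracket -->
  [:creator]     <!-- empty prefix -->
  [dc:]          <!-- nothing between the colon and the bracket -->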

Escaped CURIE or URI

Once we have our escaped CURIE we can use it to create a type that can contain both a URI and an escaped CURIE:

  <xs:simpleType name="CurieOrURI">
    <xs:union memberTypes="xs:anyURI CurieEsc" />
  </xs:simpleType>

This type is used for about and href.
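So about and href can take either form. In the hypothetical fragment below (the ex prefix, URIs and values are illustrative), the link uses a plain URI while the meta identifies its subject with an escaped CURIE, the brackets marking ex:jane as a CURIE rather than a URI whose scheme happens to be ex:

  <p xmlns:ex="http://example.org/people#">
    <!-- href as a plain URI (matches xs:anyURI) -->
    <link rel="next" href="http://example.org/chapter2" />
    <!-- about as an escaped CURIE (matches CurieEsc) -->
    <meta about="[ex:jane]" property="ex:name">Jane Doe</meta>
  </p>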

CURIE or Link Type List

The rel and rev attributes are a little tricky, since they can contain CURIEs as well as tokens from a standard list; this is made slightly more complicated by the fact that multiple values are acceptable.

The first step is to create a 'CURIE or link type' definition, which allows either a CURIE or one of a restricted range of values:

  <xs:simpleType name="CurieOrLinkType">
    <xs:union memberTypes="Curie">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="alternate"/>
          <xs:enumeration value="forward"/>
          <xs:enumeration value="start"/>
          <xs:enumeration value="next" />
          <!-- ... -->
          <xs:enumeration value="profile" />
          <xs:enumeration value="role" />
          <xs:enumeration value="cite" />
        </xs:restriction>
      </xs:simpleType>
    </xs:union>
  </xs:simpleType>

Once we have this it's a simple matter to create a list of 'CURIE or link types':

  <xs:simpleType name="CurieOrLinkTypeList">
    <xs:list itemType="CurieOrLinkType" />
  </xs:simpleType>

This definition would make the following valid:

  <link rel="subsection ab:foreward prev" href="[cd:ef]" />

and the following invalid:

  <link rel="blah ab:foreward" href="[cd:ef]" />

Rationalising @href

One of the problems with the schemas created for XHTML Modularization 1.1 is that attributes that play the same role in different places are not always rationalised. One example is the href attribute on anchors and image maps; another is the type attribute on object, script and style.

In order to allow href to appear anywhere in the document, we need to remove it from the places where it is already declared. We do this by creating replacements for xhtml-hypertext-1.xsd and xhtml-csismap-1.xsd in which every attribute that has now been moved into the Common.attrib group is dropped.
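For example, the attribute list for a in a replacement xhtml-hypertext-1.xsd might look like the sketch below. It assumes the original list declared href, charset, type, hreflang, rel, rev, accesskey and tabindex, and it keeps type because Common.attrib uses hreftype instead; both points are assumptions rather than a copy of Sidewinder's file. Declaring an attribute both here and in Common.attrib would give the element two attributes with the same name, which makes the schema invalid.

  <xs:attributeGroup name="a.attlist">
    <xs:attributeGroup ref="Common.attrib"/>
    <!-- href, charset, hreflang, rel and rev are dropped: they now
         arrive through Common.attrib -->
    <xs:attribute name="type" type="xh11d:ContentType"/>
    <xs:attribute name="accesskey" type="xh11d:Character"/>
    <xs:attribute name="tabindex" type="xh11d:Number"/>
  </xs:attributeGroup>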