The need for a generic solution

What we need therefore is a generic solution to putting metadata into our documents, so that a processor can decipher it without having to know in advance that author and geo.position, for example, apply to different things.

For a number of years the general approach has been that metadata and documents just don't mix--you use RDF/XML for complicated metadata, and you put the actual content in your XHTML documents. But as we've seen with examples like location information, people want to put information about themselves, their companies, and their conferences, right into their documents, avoiding the need to publish separate documents. If we're to allow this, but at the same time avoid relying on the implicit knowledge that we discussed, then a key requirement will be to prise apart when we're talking about 'the document' and when we're talking about some other 'thing'.

The first part of what is needed is to bring some order to URI-space and ensure that if a URI is used to represent a document then it isn't used to represent some other 'thing' like a conference or a car, and likewise, if a URI is used to represent a conference or a car, it is never used to represent a document. The second half of the equation is to use a syntax that is richer than the simple meta element embedding provided by HTML (that we have seen in our previous examples).

The solution to the first problem is the notion of information resources, and to the second is RDFa. Let's look at these two points in turn.

Information resources

A key concept to understand in this discussion is the notion of an information resource...but first let's get to grips with resources.

A resource is pretty much anything that we want to talk about: you, me, XTech 2006, John Lennon, the 322 bus route, the city of Paris, meeting room 2 (and its projector), a Zanussi washing-machine, and so on. A resource is anything that we may usefully make statements about, and in turn share those statements with each other.

An information resource is a very specific subset of this enormous list of things, and it is those things that you interact with on the web. Why do we need to pull these resources out and make them special, as opposed to, say, car resources or planet resources? The answer actually goes to the heart of the discussion we have just had in the earlier sections.

Let's assume that we all want to share data about the XTech 2006 conference; we might want to say "get this flight to XTech 2006", or "tickets for XTech 2006 cost this much", or "Steven Pemberton is giving a tutorial on XForms and XHTML 2 at XTech 2006". RDF and the idea of URIs give us a mechanism for identifying the resource 'XTech 2006' (go on, get with the programme...call it a resource...you won't regret it!) in such a way that we can all know that we're talking about the same thing; that mechanism is for us all to agree on a URI that represents this conference and nothing else.

Once we have agreed on this URI, the information you have carefully put together about hotels close to the conference venue could be mixed with my data about flights from London, and with someone else's data about the topics that will be discussed, to build a package of useful data.
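To make the 'mixing' concrete, here is a minimal sketch in Python. It assumes each of us publishes our data as simple (subject, property, value) triples; the URI, the ex: property names and the values are all made up for illustration and are not anything the conference organisers actually chose.

```python
# Hypothetical URI standing in for whatever the organisers agree on.
XTECH = "http://example.org/id/xtech2006"

# Three independent sources, each a set of (subject, property, value) triples.
# The ex: properties and the values are invented for this sketch.
hotels = {(XTECH, "ex:nearbyHotel", "Hotel A")}
flights = {(XTECH, "ex:flightFrom", "London")}
topics = {(XTECH, "ex:topic", "RDFa")}


def merge(*sources):
    """Union the triple sets; statements with the same subject line up."""
    merged = set()
    for source in sources:
        merged |= source
    return merged


def statements_about(triples, subject):
    """Return the sorted (property, value) pairs stated about one subject."""
    return sorted((p, o) for s, p, o in triples if s == subject)


package = merge(hotels, flights, topics)
for prop, value in statements_about(package, XTECH):
    print(prop, value)
```

Nothing clever is going on: the merge only works because all three sources used exactly the same string for the subject, which is the whole point of agreeing on the URI first.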

So how do we choose this all-important URI? It's probably best to let the conference organisers decide and then leverage the DNS architecture--they could choose a URI in a domain that they own and control. If they do this, then they would most likely decide to use the conference home page:

  <http://xtech06.usefulinc.com/>

That's all well and good, but let's work through what happens when the conference organisers add metadata to this web-page.

Let's say they add something simple like a modification date; the problem now is that we're squarely back in our 'implied knowledge' problem from the first section, since we now need to know whether the modification date applies to the conference agenda or just to the web-page that has been modified.

A similar problem arises if we want to have a unique URI to identify the conference sessions so that we can make statements about them--perhaps we want to suggest sessions to avoid, or we want to give them stars afterwards to indicate how good they were. My session on RDFa is identified by this URI:

  <http://xtech06.usefulinc.com/schedule/detail/58>

but if you opened the document from that URL and found that there was a dc:creator tag, what exactly would it indicate? Would it tell you the name of the page author, or me, the session author? Tricky...

The W3C's Technical Architecture Group (TAG) got round this problem by saying that the URI that identifies my session in RDF-space must not be the URI I just quoted, since that is a web-page. Trying to use it to represent both resources (the session and the web-page) causes lots of problems because you only have one URI, and with one URI you can only unambiguously talk about one thing. To put it another way, the session and the web-page describing the session are distinct--there are two resources, not one.

It's important to be clear on the implications of this. No-one is saying that the URI for the session can't be within the address-space of the web-site, so it's fine for my session to be identified as something like this:

  <http://xtech06.usefulinc.com/data/session/58>

The crucial thing about making this separation is that if we now talk about this resource (by using this URI) and make statements like who created it, where it is located, when it was last modified, how long it is, what its subject matter is, and so on, we know without any doubt that we are talking about the session, and not the page about the session (which is still located at http://xtech06.usefulinc.com/schedule/detail/58).
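A small Python sketch shows what the separation buys us. The two URIs are the ones quoted above; the page author's name, the date and the choice of dcterms:modified are assumptions added purely for illustration.

```python
SESSION = "http://xtech06.usefulinc.com/data/session/58"   # the session itself
PAGE = "http://xtech06.usefulinc.com/schedule/detail/58"   # the page about it

# (subject, property, value) statements; two subjects, so no ambiguity.
statements = [
    (SESSION, "dc:creator", "Mark Birbeck"),
    (PAGE, "dc:creator", "Conference webmaster"),   # hypothetical page author
    (PAGE, "dcterms:modified", "2006-05-01"),       # hypothetical date
]


def values(subject, prop):
    """Return every value stated for one property of one subject."""
    return [o for s, p, o in statements if s == subject and p == prop]


print(values(SESSION, "dc:creator"))
print(values(PAGE, "dc:creator"))
```

With a single URI both dc:creator values would land on the same subject and we would be back to guessing which one was meant.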

We're inching towards a solution, but now we have two further problems: how do we get distinct URIs for our two resources, and how do we put metadata in our document so that it's clear when we're talking about the session and when we're talking about the web-page?

RDFa

We're now ready to look at how RDFa solves this problem. First we'll look at a simple way that RDFa can be used to unambiguously mark-up our metadata, before looking at using nested statements to make the mark-up more compact, and inline metadata to allow the document's content to be reused.

Simple mark-up

Let's return to our simple example from an earlier section:

  <head>
    <meta name="author" content="Mark Birbeck" />
    <meta name="geo.position" content="43.95;4.833333" />
  </head>

Recall that we said that with 'implicit knowledge' we could interpret this as the following set of triples:

  <http://internet-apps.blogspot.com> dc:creator
    [
      foaf:name "Mark Birbeck";
      geo:lat "43.95";
      geo:long "4.833333"
    ] .

(Remember that I mapped the properties to existing vocabularies that seemed appropriate.)

Our next step is to look at how we can mark this up unambiguously; i.e., in such a way that we can dispense with the 'implicit' knowledge.

The RDFa mark-up for this actually mirrors the N3 quite closely (assuming that this is in a document stored at http://internet-apps.blogspot.com):

  <head>
    <link rel="dc:creator" href="#author" />
    <meta about="#author" property="foaf:name" content="Mark Birbeck" />
    <meta about="#author" property="geo:lat" content="43.95" />
    <meta about="#author" property="geo:long" content="4.833333" />
  </head>

This is a generic solution because statements in the head section that carry no about attribute, i.e., this pattern:

  <meta property="p" content="o" />

are always statements about the document itself, whilst statements with an about attribute, i.e., this pattern:

  <meta about="s" property="p" content="o" />

always relate to the item designated by the URI in the about attribute.
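The rule is mechanical enough to sketch in a few lines of Python. This is only an illustration of the two patterns above, not an RDFa processor: it assumes the head is well-formed XML, it looks at flat meta elements only, and the function name and the dc:title line are made up for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

DOCUMENT = "http://internet-apps.blogspot.com"

# The dc:title statement is invented for illustration; the author line is
# taken from the example above.
HEAD = """
<head>
  <meta property="dc:title" content="Internet Applications" />
  <meta about="#author" property="foaf:name" content="Mark Birbeck" />
</head>
"""


def head_triples(head_xml, document_uri):
    """Return (subject, property, object) for each flat meta statement."""
    triples = []
    for meta in ET.fromstring(head_xml.strip()).iter("meta"):
        prop = meta.get("property")
        if prop is None:
            continue
        # No about attribute: the subject is the document itself.
        # Otherwise the about value is resolved against the document URI.
        subject = urljoin(document_uri, meta.get("about", ""))
        triples.append((subject, prop, meta.get("content")))
    return triples


for triple in head_triples(HEAD, DOCUMENT):
    print(triple)
```

Notice that the code never inspects the property names; it decides what each statement is about from the about attribute alone, which is exactly what lets us drop the 'implicit' knowledge.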

Our mark-up has two sets of statements; the first concerns the document itself and is a reference to the 'thing' that created it:

  <link rel="dc:creator" href="#author" />

The second lot of statements describe this 'thing' that created the document, and tell us its name and where it is:

  <meta about="#author" property="foaf:name" content="Mark Birbeck" />
  <meta about="#author" property="geo:lat" content="43.95" />
  <meta about="#author" property="geo:long" content="4.833333" />

This separation of the two lots of statements means that even when we don't know anything about what the data 'means' we can still tell which of our resource pairs it's about; we can tell if it's about the web-page or the web-page's author, the web-page or the conference, the web-page or the conference session, the web-page or the car, the web-page or the planet, the web-page or the Zanussi washing-machine, and so on.

Nesting statements

In RDFa, meta elements can be nested. The normal purpose of this is to make statements about the statement that contains them, such as indicating that the category 'fishing' was added by Jane or that the 4 star rating was given by John:

  <head>
    <meta property="category" content="fishing">
      <meta property="dc:creator" content="Jane Doe" />
    </meta>
    <meta property="rating" content="4">
      <meta property="dc:creator" content="John Doe" />
    </meta>
  </head>

But another use of nested statements is to make a lot of statements about the same resource, and this is achieved by adding an about attribute to the top-level statement. Our mark-up could now look like this:

  <head>
    <link rel="dc:creator" href="#author" />
    <meta about="#author">
      <meta property="foaf:name" content="Mark Birbeck" />
      <meta property="geo:lat" content="43.95" />
      <meta property="geo:long" content="4.833333" />
    </meta>
  </head>
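To see why the nested form carries the same information, here is a rough Python sketch in which an about attribute sets the subject for everything inside it. It is an assumption-laden illustration, not a conforming parser: it only handles the 'group statements under one about' use, and it does not attempt the statements-about-statements case described at the start of this section. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

DOCUMENT = "http://internet-apps.blogspot.com"

HEAD = """
<head>
  <link rel="dc:creator" href="#author" />
  <meta about="#author">
    <meta property="foaf:name" content="Mark Birbeck" />
    <meta property="geo:lat" content="43.95" />
    <meta property="geo:long" content="4.833333" />
  </meta>
</head>
"""


def nested_triples(head_xml, document_uri):
    """Return triples, letting children inherit the nearest about value."""
    triples = []

    def walk(element, subject):
        # An about attribute changes the subject for this element and below.
        about = element.get("about")
        if about is not None:
            subject = urljoin(document_uri, about)
        if element.tag == "meta" and element.get("property"):
            triples.append(
                (subject, element.get("property"), element.get("content"))
            )
        if element.tag == "link" and element.get("rel"):
            target = urljoin(document_uri, element.get("href", ""))
            triples.append((subject, element.get("rel"), target))
        for child in element:
            walk(child, subject)

    # With no about in scope, the subject is the document itself.
    walk(ET.fromstring(head_xml.strip()), document_uri)
    return triples


for triple in nested_triples(HEAD, DOCUMENT):
    print(triple)
```

The three inner statements come out with the same subject they would have had if about="#author" had been repeated on each of them.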

In this small example there's not a lot to choose between this and the first syntax where the about attribute was repeated on each statement. But in a larger example the nested approach is probably easier to manage, as we can see here, where information about the author has been extended to include much of what might be in a person's FoaF file:

  <html
   xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
   xmlns:foaf="http://xmlns.com/foaf/0.1/"
   xmlns:dc="http://purl.org/dc/elements/1.1/"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  >
    <head>
      <link rel="dc:creator" href="#author" />
      <meta about="#author">
        <link rel="rdf:type" href="[foaf:Person]" />
        <meta property="foaf:name" content="Mark Birbeck" />
        <meta property="foaf:givenname" content="Mark" />
        <meta property="foaf:family_name" content="Birbeck" />
        <meta property="foaf:mbox_sha1sum"
          content="b8b2922a0d39cd7c7db0b4f65124b4dd2a79fa24" />
        <link rel="foaf:homepage" href="" />
        <link rel="foaf:workplaceHomepage" href="http://www.formsPlayer.com/" />
        <meta property="geo:long" content="4.833333" />
        <meta property="geo:lat" content="43.95" />
      </meta>
    </head>
    .
    .
    .
  </html>

DRY...don't repeat yourself

Software development is built on 're-use', but writing web-pages is one place that we've generally not had the benefit of re-use mechanisms. However, RDFa allows all of the metadata that we just saw in the FoaF example to be obtained from features that would normally be found in a web-page anyway.

For example, if my blog already has a link to my work web-site:

  <body>
    I am currently working on an XForms processor called
    <a href="http://www.formsPlayer.com/">formsPlayer</a>.
  </body>

I could re-use this information and so replace the mark-up for foaf:workplaceHomepage in the head of the document with this:

  <body about="#author">
    I am currently working on an XForms processor called
    <a rel="foaf:workplaceHomepage"
      href="http://www.formsPlayer.com/">formsPlayer</a>.
  </body>
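As a rough check that the inline version yields the same statement, this Python sketch pulls a triple out of an ordinary anchor, taking the subject from the enclosing about attribute. It assumes the body is well-formed XML and only looks at a elements that carry a rel attribute; the function name is made up for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

DOCUMENT = "http://internet-apps.blogspot.com"

BODY = """
<body about="#author">
  I am currently working on an XForms processor called
  <a rel="foaf:workplaceHomepage"
    href="http://www.formsPlayer.com/">formsPlayer</a>.
</body>
"""


def anchor_triples(body_xml, document_uri):
    """Return (subject, rel, href) for each anchor that has a rel attribute."""
    triples = []

    def walk(element, subject):
        # The nearest enclosing about attribute supplies the subject.
        about = element.get("about")
        if about is not None:
            subject = urljoin(document_uri, about)
        if element.tag == "a" and element.get("rel"):
            triples.append((subject, element.get("rel"), element.get("href")))
        for child in element:
            walk(child, subject)

    walk(ET.fromstring(body_xml.strip()), document_uri)
    return triples


print(anchor_triples(BODY, DOCUMENT))
```

The link that readers click on and the statement that a processor extracts are now one and the same piece of mark-up, so there is nothing to keep in sync.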

A full example that uses all of these techniques might look like this:

  <html
   xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
   xmlns:foaf="http://xmlns.com/foaf/0.1/"
   xmlns:dc="http://purl.org/dc/elements/1.1/"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  >
    <head>
      <link rel="dc:creator" href="#author" />
      <meta about="#author">
        <link rel="rdf:type" href="[foaf:Person]" />
        <meta property="foaf:givenname" content="Mark" />
        <meta property="foaf:family_name" content="Birbeck" />
        <meta property="foaf:mbox_sha1sum"
          content="b8b2922a0d39cd7c7db0b4f65124b4dd2a79fa24" />
        <link rel="foaf:homepage" href="" />
        <meta property="geo:long" content="4.833333" />
        <meta property="geo:lat" content="43.95" />
      </meta>
    </head>
    <body about="#author">
      Hi, my name is Mark Birbeck,
      and I am currently working on an XForms processor called
      <a rel="foaf:workplaceHomepage"
        href="http://www.formsPlayer.com/">formsPlayer</a>.
      Don't bother emailing me though, because I'm on holiday
      in Avignon. (I wish...)
    </body>
  </html>