Understanding the XForms dependency-engine

A key component of an XForms processor is the dependency-engine. The idea is pretty straightforward, and will be familiar to anyone who has used a spreadsheet; if some item has its value set by a calculated expression that contains references to other items, then when any of those items change, the first item must be recalculated. It's not necessary to understand the dependency-engine when programming XForms, but having some familiarity with how it works may help when structuring forms.

To illustrate its use, let's take an instance that has two values which are summed to produce a third, and if the sum is greater than 10, the value is said to be invalid:

<xf:instance>
  <instanceData xmlns="">
    <a>4</a>
    <b>5</b>
    <c />
  </instanceData>
</xf:instance>

<xf:bind nodeset="c" calculate="../a + ../b" constraint=". &lt;= 10" />

Our calculate instruction says simply that the value of c is the result of summing a and b, which means that every time either a or b changes we want the calculation to be carried out again. Similarly, since the constraint is based on the value of c itself, then if it changes we need to perform the validity check again. This set of dependencies--the first saying that the value of c is dependent on the values of a and b, and the second saying that the validity of c is dependent on the value of c--is easy to create by parsing the XPath expressions in the bind statements. And once created, processing can be quite fast, since it is now possible to only perform calculations that are required by changes in the data.

Rebuild

By working out the dependencies between data items, it is possible to reduce the number of calculations to only those that are implied by some data change. However, if the structure of the data changes, then the whole set of dependencies needs to be recalculated. This process is called a rebuild and involves reevaluating all XPath expressions used in a particular model.

To see why this is necessary, imagine a slightly more complex model than the one we had above, where we have rows in an invoice, and each row has a 'total price' that is calculated by multiplying the number of items purchased by the individual item's selling price:

<xf:instance>
  <invoice xmlns="">
    <item sku="1223">
      <units>2</units>
      <price>6.99</price>
      <total />
    </item>
    <item sku="776">
      <units>5</units>
      <price>12.99</price>
      <total />
    </item>
  </invoice>
</xf:instance>

<xf:bind nodeset="item/total" calculate="../units * ../price" />

Since an XPath expression like "x/y" selects every y element that is a child of an x element, this single bind statement causes the XForms processor to create dependencies across all of the total elements; it's as if we had authored the following statements by hand:

<xf:bind nodeset="item[1]/total[1]" calculate="../units * ../price" />
<xf:bind nodeset="item[2]/total[1]" calculate="../units * ../price" />

But there is a big advantage with using the "item/total" technique, which is that if the data is updated in such a way that the number of rows changes, then the processor will automatically create a new set of dependencies. For example, if the user clicks some button to create a new item in the invoice, a new calculation will be automatically added for the new total node, and the effect would be the same as if we had explicitly written the following:

<xf:bind nodeset="item[1]/total[1]" calculate="../units * ../price" />
<xf:bind nodeset="item[2]/total[1]" calculate="../units * ../price" />
<xf:bind nodeset="item[3]/total[1]" calculate="../units * ../price" />

The process responsible for adding this calculation is rebuild.
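
The effect of rebuild on the invoice can be imitated outside XForms. The following Python sketch is only an illustration, not how any particular processor is implemented. It uses the standard library's ElementTree, the helper names rebuild and recalculate are invented for this example, and the calculate expression is hard-coded rather than parsed. It expands the single "item/total" bind into one calculation per matching node, then does so again after a third row has been added.

```python
import xml.etree.ElementTree as ET

INVOICE = """
<invoice>
  <item sku="1223"><units>2</units><price>6.99</price><total /></item>
  <item sku="776"><units>5</units><price>12.99</price><total /></item>
</invoice>
"""


def rebuild(root, nodeset):
    """Return one (target, inputs) entry per node matched by the bind's nodeset.

    Hypothetical helper: a real processor evaluates full XPath; here the
    expression "../units * ../price" is hard-coded as two sibling lookups.
    """
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    calculations = []
    for total in root.findall(nodeset):
        item = parent_of[total]
        calculations.append((total, (item.find("units"), item.find("price"))))
    return calculations


def recalculate(calculations):
    """Run every calculation: total = units * price, shown to two decimals."""
    for total, (units, price) in calculations:
        total.text = f"{float(units.text) * float(price.text):.2f}"


root = ET.fromstring(INVOICE)
calculations = rebuild(root, "item/total")
recalculate(calculations)
totals_before = [t.text for t in root.findall("item/total")]

# Structural change: a third row is inserted, so the old dependencies are stale.
new_item = ET.SubElement(root, "item", sku="9001")  # this sku is made up
ET.SubElement(new_item, "units").text = "3"
ET.SubElement(new_item, "price").text = "1.50"
ET.SubElement(new_item, "total")

calculations = rebuild(root, "item/total")  # three entries instead of two
recalculate(calculations)
totals_after = [t.text for t in root.findall("item/total")]
```

Without the second call to rebuild, the new total element would have no calculation attached to it and would stay empty.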

When do rebuilds happen?

It is only necessary to reconstruct the dependencies when the structure of the instance data in a model changes. This happens when nodes are added (using insert) or removed (using delete), or when some instance data is entirely replaced (using submission with @replace="instance", or reset). In all of these situations the rebuild is automatic.

Improving performance

Although rebuild happens automatically, once you know what will trigger it, you can look at ways of reducing its impact on performance.

Since the process of rebuilding applies to an entire model, it is a good idea to avoid putting unrelated instances into the same model, unless there are no bind statements (and so nothing for rebuild to do). For example, if in our invoice we had an additional instance that is used to retrieve and update customer details, it would be a good idea to place this in a separate model. Otherwise, every time the user requests customer information from the server, the dependency engine would parse all of the XPath expressions for the invoice, updating the dependencies.

An example

Let's return to a variation on the simpler instance that we saw before, this time with two calculated values:

<xf:instance>
  <instanceData xmlns="">
    <a>10</a>
    <b>10</b>
    <c />
    <d />
  </instanceData>
</xf:instance>

<xf:bind nodeset="c" calculate="../a * ../b" constraint=". &lt;= 100" />
<xf:bind nodeset="d" calculate="../a + ../b" constraint=". &lt;= 20" />

This gives us four calculations in our dependency graph:

Target | Type       | Expression
c[1]   | calculate  | a[1] * b[1]
c[1]   | constraint | c[1] <= 100
d[1]   | calculate  | a[1] + b[1]
d[1]   | constraint | d[1] <= 20

as well as the following dependencies:

Node | Dependents
a[1] | c[1], d[1]
b[1] | c[1], d[1]
c[1] | c[1]'s constraint
d[1] | d[1]'s constraint

We've used the '[]' syntax to convey the idea that if our instance data had more nodes--just as we saw in the invoice example, previously--then these XPath expressions would yield more dependencies.

The meaning of the table is that if any node in the left column changes, the nodes in the dependency list on the right will need recalculating. So if node a changes, then both c and d need to be recalculated; c would be set to a * b and d would be set to a + b. But note further down the table that when c changes, its constraint needs to be recalculated, and similarly, a change in d requires a recalculation of its constraint. A structure like this, where items are linked together, is often called a directed graph.
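
How the dependents table follows from the calculations table can be shown with a short Python sketch. It assumes, purely for illustration, that the references in an expression can be picked out with a regular expression; a real processor would use its XPath parser instead. The names CALCULATIONS, references and build_dependents are made up for this example.

```python
import re
from collections import defaultdict

# The four calculations from the first table: (target, type, expression).
CALCULATIONS = [
    ("c", "calculate", "../a * ../b"),
    ("c", "constraint", ". <= 100"),
    ("d", "calculate", "../a + ../b"),
    ("d", "constraint", ". <= 20"),
]


def references(target, expression):
    """Names of the nodes an expression reads; a leading '.' is the target."""
    names = set(re.findall(r"\.\./(\w+)", expression))
    # A single leading dot (not "..") refers to the bound node itself.
    if re.match(r"\s*\.(?!\.)", expression):
        names.add(target)
    return names


def build_dependents(calculations):
    """Map each node to the calculations that must rerun when it changes."""
    dependents = defaultdict(list)
    for target, kind, expression in calculations:
        for name in sorted(references(target, expression)):
            dependents[name].append((target, kind))
    return dict(dependents)


dependents = build_dependents(CALCULATIONS)
for node in sorted(dependents):
    print(node, "->", dependents[node])
```

In the result, a and b each map to the calculate entries for c and d, while c and d each map only to their own constraint, which matches the second table.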

Master Dependency Directed Graph

We've said that you don't need to understand the dependency-engine in order to use it, so you certainly don't need to understand this next step; but if you are inclined towards the nuts and bolts, you'll no doubt find this feature of the XForms architecture interesting.

The nodes, dependents and calculations in our two tables above are actually treated as one item, which goes under the name master dependency directed graph. The dependency directed graph part of the name is due, as we saw above, to the fact that each node contains a list of references to other nodes that depend on it, and this chain of dependencies takes the form of a directed graph.

The master part of the name is due to the fact that this graph contains all of the relevant information from a model--all of the possible calculations and all dependencies between them--as worked out during the rebuild phase.

The master dependency directed graph (MDDG) for our simple form, then, looks like this:

Target | Type       | Expression  | Dependents
a[1]   | node       |             | c[1], d[1]
b[1]   | node       |             | c[1], d[1]
c[1]   | calculate  | a[1] * b[1] | c[1]'s constraint
c[1]   | constraint | c[1] <= 100 |
d[1]   | calculate  | a[1] + b[1] | d[1]'s constraint
d[1]   | constraint | d[1] <= 20  |

With this 'master' graph, given information about one or more nodes that have changed, the dependency engine can work out which calculations would need to be performed. This process of performing calculations based on the relationships between nodes is called recalculation, and we'll look at that next.

Recalculate

We've seen how the rebuild phase recreates all of the dependencies for a particular model; once this list of dependencies is available, the recalculate phase can make use of the list to carry out its calculations.

Recall that we had this combination of instance and bind statements:

<xf:instance>
  <instanceData xmlns="">
    <a>10</a>
    <b>10</b>
    <c />
    <d />
  </instanceData>
</xf:instance>

<xf:bind nodeset="c" calculate="../a * ../b" constraint=". &lt;= 100" />
<xf:bind nodeset="d" calculate="../a + ../b" constraint=". &lt;= 20" />

and that after the rebuild phase, we'd have the following MDDG:

Target | Type       | Expression  | Dependents
a[1]   | node       |             | c[1], d[1]
b[1]   | node       |             | c[1], d[1]
c[1]   | calculate  | a[1] * b[1] | c[1]'s constraint
c[1]   | constraint | c[1] <= 100 |
d[1]   | calculate  | a[1] + b[1] | d[1]'s constraint
d[1]   | constraint | d[1] <= 20  |

We've already said that the whole purpose of having this 'master' graph is to ensure that the XForms processor doesn't carry out unnecessary calculations, so the next question is how it is used.

The Pertinent Dependency Subgraph

If you got the hang of the master graph, which contains details of all calculations and dependencies in a model, then the idea of a smaller graph that contains only the relevant parts of the master will be quite easy to follow. For example, if a changes, we can see that its dependents are c and d, and so we create a smaller graph:

Target | Type      | Expression  | Dependents
c[1]   | calculate | a[1] * b[1] | c[1]'s constraint
d[1]   | calculate | a[1] + b[1] | d[1]'s constraint

To create this list, the dependency-engine needs to know what data has changed. If a node is changed by the user (via a form control) or by xf:setvalue, then a reference to the node is recorded in a change list. This list of nodes is then used during the recalculate phase to create a pertinent dependency subgraph, which contains only those calculations from the 'master' graph that need to be performed.

To continue our example, note that the two calculations that were just added to the subgraph also have dependents:

Target | Type      | Expression  | Dependents
c[1]   | calculate | a[1] * b[1] | c[1]'s constraint
d[1]   | calculate | a[1] + b[1] | d[1]'s constraint

Having a dependent means that after c and d are recalculated there are further calculations that need to take place, and they too must be added to the subgraph:

Target | Type       | Expression  | Dependents
c[1]   | calculate  | a[1] * b[1] | c[1]'s constraint
d[1]   | calculate  | a[1] + b[1] | d[1]'s constraint
c[1]   | constraint | c[1] <= 100 |
d[1]   | constraint | d[1] <= 20  |

Once this list has been created, all recalculations in the list can be performed. The pertinent dependency subgraph (PDS) is then discarded, since it will be recreated the next time recalculate occurs.
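
The steps above can be sketched in Python. This is a simplified illustration under stated assumptions, not the algorithm of any specific processor: the master graph is a plain dictionary, vertex names such as "c.calculate" are invented here, and the standard library's graphlib module (Python 3.9 or later) is used to order the subgraph so that each calculation runs after the ones it depends on.

```python
from graphlib import TopologicalSorter

# Master graph: vertex -> vertices that depend on it (taken from the table).
MASTER = {
    "a": ["c.calculate", "d.calculate"],
    "b": ["c.calculate", "d.calculate"],
    "c.calculate": ["c.constraint"],
    "d.calculate": ["d.constraint"],
    "c.constraint": [],
    "d.constraint": [],
}


def run(vertex, values, valid):
    """Perform one calculation against the current instance values."""
    if vertex == "c.calculate":
        values["c"] = values["a"] * values["b"]
    elif vertex == "d.calculate":
        values["d"] = values["a"] + values["b"]
    elif vertex == "c.constraint":
        valid["c"] = values["c"] <= 100
    elif vertex == "d.constraint":
        valid["d"] = values["d"] <= 20


def pertinent_subgraph(master, change_list):
    """Collect every calculation reachable from the changed nodes."""
    subgraph = {}
    pending = [dep for node in change_list for dep in master[node]]
    while pending:
        vertex = pending.pop()
        if vertex not in subgraph:
            subgraph[vertex] = master[vertex]
            pending.extend(master[vertex])
    return subgraph


def recalculate(master, change_list, values, valid):
    """Build the subgraph, run it in dependency order, then let it go."""
    subgraph = pertinent_subgraph(master, change_list)
    # TopologicalSorter expects vertex -> predecessors, so invert the edges.
    predecessors = {vertex: set() for vertex in subgraph}
    for vertex, deps in subgraph.items():
        for dep in deps:
            predecessors[dep].add(vertex)
    order = list(TopologicalSorter(predecessors).static_order())
    for vertex in order:
        run(vertex, values, valid)
    return order  # the subgraph is a local and is discarded on return


values = {"a": 10, "b": 10, "c": None, "d": None}
valid = {}
values["a"] = 11  # the user edits a, so a goes on the change list
order = recalculate(MASTER, ["a"], values, valid)
```

After the call, c is 110 and d is 21, and both constraints evaluate to false. The ordering step matters because a constraint evaluated before its calculate would test a stale value.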

Recalculations based on changes in other models

Often there will be situations where calculations in one model are dependent on data in another. For example, let's say we have a model that contains a list of actions that our user can perform:

<xf:model id="mdl-actions">
  <xf:instance id="inst-actions">
    <actions xmlns="">
      <action id="add">Add</action>
      <action id="delete">Delete</action>
    </actions>
  </xf:instance>
</xf:model>

We could now use this data to show a list of triggers to the user:

<xf:repeat nodeset="action">
  <xf:trigger>
    <xf:label ref="." />
  </xf:trigger>
</xf:repeat>

Let's extend this so that each action contains a value that indicates whether the user can perform that action or not, and we set the relevance of the action based on that value:

<xf:model id="mdl-actions">
  <xf:instance id="inst-actions">
    <actions xmlns="">
      <action id="add" allowed="1">Add</action>
      <action id="delete" allowed="1">Delete</action>
    </actions>
  </xf:instance>
  <xf:bind nodeset="action" relevant="boolean-from-string(@allowed)" />
</xf:model>

Now all we need to do to show or hide the trigger is bind it to the action node (so that it is affected by the node's relevance):

<xf:repeat nodeset="action">
  <xf:trigger ref=".">
    <xf:label ref="." />
  </xf:trigger>
</xf:repeat>

and then use setvalue to control whether a user can perform the action or not. For example, if the last item in a list is deleted, we might indicate that the 'delete action' is not available:

<xf:setvalue ref="action[@id='delete']/@allowed">0</xf:setvalue>

This is a very useful construct, but let's extend it now so that other data plays a role in determining what actions a user can perform. Let's say we have another model that provides some information about a user that has logged in to some system. This information would be provided dynamically, but to make it easier to follow, we've shown it here in an instance:

<xf:model id="mdl-login">
  <xf:instance id="inst-login">
    <user xmlns="">
      <name>John Doe</name>
      <role>admin</role>
    </user>
  </xf:instance>
</xf:model>

Now, let's add a bind statement to our previous instance that allows administrators to use delete:

<xf:model id="mdl-actions">
  <!-- the inst-actions instance is unchanged from the listing above -->
  <xf:bind nodeset="action[@id='delete']/@allowed" relevant="globalInstance('inst-login')/role[. = 'admin']" />
  <xf:bind nodeset="action" relevant="boolean-from-string(@allowed)" />
</xf:model>

NOTE: We've used the formsPlayer globalInstance function, which allows references to be made to instances in different models. It's a fairly straightforward function, and no dependencies are created.

This should have the effect of setting the value of @allowed on the 'delete action' element based on whether our form user is an administrator or not. However, the key question here is: when exactly is this calculation performed?

Recall from the recalculation phase that only those items that are in the change list will cause dependents to be recalculated. In other words, there is no 'recalculate all', except after a rebuild. In our example above, changes to any of the allowed attributes will cause a recalculation of the relevance of the corresponding action element, but changes to the value of globalInstance('inst-login')/role will not.
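
The reason the change goes unnoticed can be illustrated with a toy Python sketch. The Model class, its methods and the node names are all invented for this example and do not correspond to any real processor API; the only assumption carried over from the text is that each model keeps its own change list and its own dependencies.

```python
class Model:
    """Toy model: a dependents map plus a change list (illustrative only)."""

    def __init__(self, dependents):
        self.dependents = dependents  # node -> calculations to rerun
        self.change_list = []         # nodes changed since last recalculate

    def set_value(self, node):
        """Record that a node in this model has changed."""
        self.change_list.append(node)

    def recalculate(self):
        """Return the calculations triggered by this model's own changes."""
        triggered = [
            calc
            for node in self.change_list
            for calc in self.dependents.get(node, [])
        ]
        self.change_list.clear()
        return triggered


# The actions model only holds dependencies between its own nodes; the
# globalInstance() reference to the login model's role created none.
actions = Model({"delete/@allowed": ["relevance of delete action"]})
login = Model({})

login.set_value("role")  # the user's role changes in the login model
after_login_change = actions.recalculate()

actions.set_value("delete/@allowed")  # a node inside the actions model changes
after_local_change = actions.recalculate()
```

Here after_login_change is empty, because the role node never appears in the change list of the actions model, whereas after_local_change contains the relevance calculation.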

The obvious solution is to perform a rebuild on the 'actions model' whenever a rebuild is performed on the 'login model'; for example, after the submission in the login model has completed:

<xf:model id="mdl-login">
  <xf:instance id="inst-login">
    <dummy xmlns="" />
  </xf:instance>
  <xf:submission action="..." method="..."
    replace="instance" instance="inst-login"
  >
    <xf:rebuild ev:event="xforms-submit-done" model="mdl-actions" />
  </xf:submission>
</xf:model>

However, there is an easier way, which is to move the relevance rule that we added to the actions model into the login model; this means that the calculation will be performed in the normal course of events, at the right time, whenever anything changes in the login model. Our finished mark-up is as follows:

<xf:model id="mdl-login">
  <xf:instance id="inst-login">
    <dummy xmlns="" />
  </xf:instance>
  <xf:submission action="..." method="..."
    replace="instance" instance="inst-login"
  />
  <xf:bind nodeset="globalInstance('inst-actions')/action[@id='delete']/@allowed" relevant="instance('inst-login')/role[. = 'admin']" />
</xf:model>

<xf:model id="mdl-actions">
  <xf:instance id="inst-actions">
    <actions xmlns="">
      <action id="add" allowed="1">Add</action>
      <action id="delete" allowed="1">Delete</action>
    </actions>
  </xf:instance>
  <xf:bind nodeset="action" relevant="boolean-from-string(@allowed)" />
</xf:model>

What's interesting about this approach is that the 'actions model' remains encapsulated; it deals only with whether an action can be performed or not, but has no 'knowledge' of the conditions that determine that. And from a programmer's perspective we're using bind literally to mean 'please add this calculation to the master dependency directed graph for this model'.