Oracle Fusion Middleware SOA,AIA and BPM: Comparing XSLT and XQuery

Abstract

XSLT 2.0 and XQuery 1.0 have been developed by two Working Groups in close collaboration, and there is a high degree of overlap in the functionality of the two languages. They share many common concepts, such as the underlying data model, and they both include the whole of XPath 2.0 as a sublanguage, together with its extensive repertoire of data types and the associated function library.
The two languages focus on different needs, and to some extent these needs exist in different user communities. This makes it understandable that many developers have only looked seriously at one of the two languages and have dismissed the other as irrelevant. However, while there are many tasks that both languages can do well, there are some where one of the languages is a far better choice than the other. Any serious XML professional should therefore have a grasp of the capabilities of both, and should know how to choose between them objectively. And the fact that the concepts are so similar actually means that when you have learnt one, the other comes easily.
Those who have looked a little at both languages have in some cases found it hard to decide which one is more suited to their particular requirements. There have been many confusing and conflicting statements made by competing vendors, or by users of one of the two languages, who usually seem to have an unwavering conviction that the grass is greener on their own side of the fence. Sometimes the debate starts to look like a religious war.
This paper will attempt an objective side-by-side comparison of the two languages: not just from the point of view of technical features, but also looking at usability, vendor support, performance, portability, and other decision factors. Is it true, for example, that XQuery is better for data and XSLT is better for documents? Is one or the other language easier to learn depending on your computing background? As well as trying to answer these questions, the paper will also illustrate how the two languages can interoperate, so that each can be used for the parts of an application where it is most appropriate.

How does one compare two languages? The purpose of the comparison is to help potential users decide which language is most appropriate to their needs. For this purpose, a raw comparison of features (language A uses XML syntax, language B does not) is not immensely useful: how does this help anyone decide?
I've tried to structure the comparison under three headings.
Firstly, the feature analysis, but with the emphasis being on the implications of the differences for users. This forms the bulk of the paper.
Secondly, user perceptions: what is it about the two languages that makes people like one and not the other? I haven't tried to be scientific about this; most of what I have to say is anecdotal. But I hope that readers will find it useful all the same, if only as a basis for a more objective assessment.
Thirdly, factors that have more to do with implementations of the languages than with the languages themselves. The point here is that people designing implementations of the two languages have certain ideas about how the languages are likely to be used, and their implementations will tend to be optimized for some kinds of applications at the expense of others.
I could have included a fourth heading: future directions. To choose a technology, you not only need to understand where it is today, you need to have some kind of idea where it is going in the future. However, my crystal ball has never been very reliable and seems to become less so the more it is used. So I'll avoid speculation.

What do the languages have in common?

The first thing to understand about XSLT and XQuery is that they have more similarities than differences. I emphasize this because most of the paper will be talking about the differences. If the languages weren't so similar, it wouldn't be necessary to compare them so carefully.
(We should always remember when we spend time evaluating two technologies, that the more similar they are, the harder it is to make a decision; yet at the same time, the more similar they are, the less difference it is likely to make which one we choose.)
Here are some of the things that XSLT 2.0 and XQuery 1.0 have in common:
Both languages share the same data model and type system. They both take XML as their input and produce XML as their output, and they model XML in the same way. Both can handle untyped documents (those with no schema) as well as schema-validated documents, and in the case of schema-validated documents, they exploit the type information in essentially the same way. One consequence of the fact that the languages share the same data model is that it's likely that applications written in XSLT will be able to interoperate very easily with those written in XQuery.
Both languages are declarative expression-based languages free of side-effects. This means that many of the concepts to be learnt by new users are the same in both cases, and it also means that when you have learnt one of the languages, you have mastered many of the concepts needed to use the other.
The languages share XPath as a common subset. This means that many expressions can be written in exactly the same way in both XSLT 2.0 and XQuery 1.0: same syntax, same semantics. In addition, the two languages share the same library of built-in functions.
Even beyond the domain of XPath, there are many constructs in each language that have a direct parallel in the other, in many cases using similar syntax. The facilities for creating new elements and attributes, while not identical, have more similarities than differences. Similarly, there are syntactic differences in the way user-written functions are declared in XSLT and XQuery, but no real difference in functional capability. Both languages also allow global and local variables to be declared, using different syntax but quite similar semantics.

Syntax Differences

The most obvious difference in the syntactic style of the two languages is that an XSLT 2.0 stylesheet is an XML document, while an XQuery query is not. The XQuery syntax mimics XML in many ways, and many queries encountered in practice are in fact well-formed XML documents, but that's more by accident than by design. It's possible in XQuery to write things that XML does not allow, for example (note the nested quotes):
    <a href="{concat("file:///", $filename)}"/>
and conversely, there are things that XML allows that you can't do in XQuery: for example, you can't replace the "<" operator by an "&lt;" entity reference in an expression such as person[salary < 10000].
What are the practical implications of this difference between XSLT and XQuery?
It is often remarked that the use of XML syntax makes XSLT very verbose. The greater expressiveness of XPath 2.0, and the introduction of user-written functions, makes this less true in XSLT 2.0 than was the case in 1.0, but it remains a valid observation.
XQuery is fully composable: any expression can be nested inside any other. By contrast, XSLT is a two-language system: XPath expressions can be nested inside XSLT instructions, but not vice versa. You can get around this in XSLT 2.0 using function calls (there was also a limited workaround in XSLT 1.0 using variables) but it still means that you need to write a function call when you want to make a callback from XPath into XSLT, for example when computing a sort key.
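As a rough sketch of the function-call workaround just described, the stylesheet below wraps some XSLT instructions in a user-written function and calls it from the XPath expression of a sort key. The function name f:sort-key, its namespace, and the person, surname and name elements are all assumptions made up for illustration. The logic is simple enough that XPath alone could express it; it stands in for any computation that needs XSLT instructions.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/functions"
    exclude-result-prefixes="xs f">

  <!-- Hypothetical function: XSLT instructions packaged so that
       they can be called from inside an XPath expression -->
  <xsl:function name="f:sort-key" as="xs:string">
    <xsl:param name="person" as="element(person)"/>
    <xsl:choose>
      <xsl:when test="$person/surname">
        <xsl:sequence select="upper-case($person/surname[1])"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:sequence select="upper-case($person/name[1])"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>

  <xsl:template match="people">
    <people>
      <xsl:for-each select="person">
        <!-- The callback from XPath into XSLT happens here -->
        <xsl:sort select="f:sort-key(.)"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </people>
  </xsl:template>
</xsl:stylesheet>
```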
The fact that an XSLT stylesheet is a well-formed XML document has a number of advantages, however:
Stylesheets can be used as the input or output of a transformation (this is surprisingly common in practice)
XSLT can be embedded in other XML-based languages, and can in turn have other XML-based languages embedded within it. For example, this enables XSLT to support embedded schemas (a schema embedded within a stylesheet) in a way that XQuery cannot. Similarly, XSLT can be easily embedded in pipeline processing languages such as Orbeon's XPL.
Because XSLT is XML, rather than merely mimicking XML, the same parser technology can be reused, the whole range of XML techniques can be used when writing stylesheets (for example, use of external entities and CDATA sections) and there are no surprises in store for a user who knows the rules of XML.
Theoretically, another benefit is that it enables a stylesheet to be developed by first creating an XHTML mockup of the required output page, and then replacing elements in the XHTML selectively by instructions that compute those parts of the output that depend on the input. In practice, I haven't seen very much evidence that stylesheet developers work this way, and it doesn't necessarily produce good XSLT code.
Another benefit I have seen from using XML syntax is that it makes the grammar of XSLT much more easily extensible than that of XQuery. Because it tries to make do without any reserved words, and because it mixes a number of different syntactic styles, the grammar of XQuery is a delicate creature. Adding new features like a full-text search capability requires very careful analysis to ensure that no grammatical ambiguities are introduced. By contrast, it's very easy to extend XSLT with new instructions or new attributes, without any risk of ambiguities or backwards incompatibilities. This means that it's quite possible for such extensions to be implemented by vendors (or even by third parties) as well as by the XSL Working Group itself.
It is by no means always true that an XSLT stylesheet (whether 1.0 or 2.0) is longer than the equivalent in XQuery. Consider the simple task: create a copy of a document that is identical to the original except that all NOTE attributes are omitted. Here is an XSLT stylesheet that does the job. It's a simple variation on the standard identity template that forms part of every XSLT developer's repertoire:
    <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="*">
      <xsl:copy>
        <xsl:copy-of select="@* except @NOTE"/>
        <xsl:apply-templates/>
      </xsl:copy>
    </xsl:template>
    </xsl:stylesheet>
In XQuery, lacking an apply-templates instruction and built-in template rules, the recursive descent has to be programmed by hand:
    xquery version "1.0";
    declare function local:copy($node as element()) {
      element {node-name($node)} {
        ($node/@* except $node/@NOTE,
        for $c in $node/child::node()
        return typeswitch($c)
          case $e as element() return local:copy($e)
          case $t as text() return $t
          case $m as comment() return $m
          default return $c)  (: processing instructions :)
      }
    };
    local:copy(/*)
That's 9 lines for XSLT, 13 for XQuery. More significant than the difference in length, however, is that the XQuery solution is wrong; it loses unused namespace declarations. There appears to be no way in XQuery 1.0 to copy an element along with all its in-scope namespaces unless the contents of the element are also copied exactly.
Apart from the use of true XML versus merely XML-like syntax, the main difference between the two languages at the syntactic level is probably the way that modularity is handled. XSLT has an include/import mechanism whose detailed semantics discourage independent compilation of modules (an imported module doesn't even need to be a valid standalone stylesheet, for example it can contain references to global variables declared by its caller). By contrast, the import module feature in XQuery is designed to allow library modules to be compiled independently of each other and linked at run-time. On the other hand, the XSLT approach, with its use of import precedence to select the templates and functions that are invoked at run-time, provides a mechanism for polymorphism (and thus, customization of stylesheets) that is entirely missing from XQuery.
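The customization that import precedence allows might look something like the following sketch. The file names base.xsl and custom.xsl and the element names are hypothetical. The two stylesheets are shown together for convenience but would live in separate files.

```xml
<!-- base.xsl (hypothetical): the general-purpose rules -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="title">
    <h1><xsl:apply-templates/></h1>
  </xsl:template>
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>
</xsl:stylesheet>

<!-- custom.xsl (hypothetical): imports base.xsl and overrides one rule -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:import href="base.xsl"/>
  <!-- Higher import precedence: this rule is chosen for title elements,
       while para elements still use the rule from base.xsl -->
  <xsl:template match="title">
    <h1 class="custom"><xsl:apply-templates/></h1>
  </xsl:template>
</xsl:stylesheet>
```

Running custom.xsl picks the overriding rule at run-time without base.xsl needing to know that it has been customized, which is the sense in which the mechanism behaves like polymorphism.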
Do these differences in syntactic style make a real difference to the effectiveness of the languages? I suspect the answer is, not as much as you might expect. Both styles have advantages and disadvantages, and if you discount the psychological factors, all of the disadvantages have practical workarounds.

Differences in Functionality

In this section I will try to answer the question, what functionality is there in XSLT 2.0 that has no direct equivalent in XQuery, and vice versa?
In nearly all cases, the absence of certain functionality from one of the languages can be worked around. However, the workarounds may be very inconvenient, so these differences can have a real impact on the cost, or even the viability, of writing certain applications.
I'll structure this by looking at the facilities present in one language that are absent from the other, and discuss the workarounds available.

XQuery facilities missing from XSLT

FLWOR expressions

The most obvious XQuery feature with no direct analogue in XSLT is the FLWOR expression.
In fact, with one small exception, nearly every FLWOR expression can be translated fairly mechanically into equivalent XSLT constructs. The mapping is shown in the table below:

Parts of a FLWOR expression, and their XSLT equivalents
Construct	XSLT equivalent
`for $var in SEQ`	`<xsl:for-each select="SEQ"> <xsl:variable name="var" select="."/>`
`let $var := SEQ`	`<xsl:variable name="var" select="SEQ"/>`
`where CONDITION`	`<xsl:if test="CONDITION">`
`order by $X/VALUE`	`<xsl:sort select="VALUE"/>`

Table 1

Thus nearly every FLWOR expression has a direct equivalent in XSLT. For example, to take a query from the XMark benchmark:
    for $b in doc("auction.xml")/site/regions//item
    let $k := $b/name
    order by $k
    return <item name="{$k}">{ $b/location }</item>
is equivalent to the XSLT code:
    <xsl:for-each select="doc('auction.xml')/site/regions//item">
      <xsl:sort select="name"/>
      <item name="{name}">
        <!-- copy-of, because { $b/location } copies the element itself -->
        <xsl:copy-of select="location"/>
      </item>
    </xsl:for-each>
In the above example, the variables that appear in the FLWOR expression have been eliminated entirely from the XSLT version, since everything can be handled using the implicit variable "." (the context item) that's available within <xsl:for-each>.
Here's another more complex example from the XMark benchmark. The query is:
    for $p in doc("auction.xml")/site/people/person
    let $a :=
      for $t in doc("auction.xml")/site/closed_auctions/closed_auction
      let $n :=
        for $t2 in doc("auction.xml")/site/regions/europe/item
        where $t/itemref/@item = $t2/@id
        return $t2
      where $p/@id = $t/buyer/@person
      return <item> {$n/name} </item>
    return <person name="{$p/name}">{ $a }</person>
and here is the XSLT 2.0 equivalent:
    <xsl:for-each select="doc('auction.xml')/site/people/person">
      <xsl:variable name="p" select="."/>
      <xsl:variable name="a" as="element(item)*">
        <xsl:for-each select="doc('auction.xml')/site/closed_auctions/closed_auction">
          <xsl:variable name="t" select="."/>
          <xsl:variable name="n"
              select="doc('auction.xml')/site/regions/europe/item
                      [$t/itemref/@item = @id]"/>
          <xsl:if test="$p/@id = $t/buyer/@person">
            <item><xsl:copy-of select="$n/name"/></item>
          </xsl:if>
        </xsl:for-each>
      </xsl:variable>
      <person name="{$p/name}">
        <xsl:copy-of select="$a"/>
      </person>
    </xsl:for-each>
The translation here isn't purely mechanical, but it shows that it can be done: I've avoided declaring extra variables in XSLT where they aren't needed. The simplification of the evaluation of $n by replacing a FLWOR expression with a simple path expression could have been done equally well in the XQuery original. (There seems to be a tendency among XQuery programmers, especially those from a SQL background, to use complex FLWOR expressions in cases where a simple path expression with a predicate would suffice.)
There is one feature of FLWOR expressions that doesn't have any direct equivalent in XSLT. In fact, I believe it's the only feature of XQuery that has no functional equivalent in XSLT. This is the ability in the order by clause to refer to variables from any of the containing for clauses. This means that a FLWOR expression containing more than one for clause is not directly equivalent to a set of nested for loops. I have to say that I have found no application that requires this feature. The feature greatly complicates the semantic definition of FLWOR expressions, by requiring the introduction of a concept of tuples that doesn't appear in the user-visible data model. My feeling is that this is an alien import from the relational world. In the XML world, tuples are not needed because element nodes provide quite enough structuring capability.
The fact that FLWOR expressions can in nearly every case (and in all practical cases that I've come across) be rewritten using equivalent XSLT constructs doesn't mean that the distinction is purely cosmetic. There's no doubt that FLWOR expressions make it easier to write complex queries involving multiple joins across different sets of nodes (loosely, "tables"). It could also be argued that the XSLT for-each construct makes it easier to navigate the implicit hierarchic relationships in the XML structure: by using a singular context item to identify position, rather than multiple range variables, it is optimizing for hierarchic navigation rather than joins. It can thus be seen as a usability trade-off, perhaps an example of how XSLT is biased towards document-oriented XML while XQuery is biased towards data-oriented XML.
Does the difference matter as far as optimizability is concerned? I'm prepared to be proved wrong, but I don't think it does. I think an optimizer can recognize and optimize the join constructs in the XSLT version of the code just as easily as it can in the XQuery version. XSLT optimizability starts to suffer when you make extensive use of template rules, because they are so dynamic in their nature; but if you confine yourself to the subset of the language that maps directly to XQuery, then it's just as optimizable. (Which is not to say I would recommend doing this: there is more to life than performance.)
For users with a SQL background, and for users doing data-intensive tasks, the FLWOR expression is certainly more concise and more expressive than its XSLT expansion. But I think this amounts to a psychological difference, not one that has measurable benefits in the cost of developing applications.

Other XQuery Facilities

Having looked at FLWOR, are there any other XQuery facilities that have no direct equivalent in XSLT?
Static typing is one. Static typing (which I prefer to call pessimistic static typing) requires a query to be written in a way that ensures that all type errors can be detected at compile time. For example, under static typing it's an error to write book[author eq 'Kay'] if the schema allows a book to have more than one author. The construct is permitted, however, if books can have no author, because the eq operator doesn't make it a type error if one of the operands is an empty sequence.
Static typing is optional for XQuery implementations, and it remains to be seen how many implementations will offer it. I've steered clear of it in Saxon, because I find that for many applications - those where the schema is reasonably liberal - it causes the compiler to produce more spurious errors than real coding errors. I've preferred to implement "optimistic static typing" instead: the compiler gives errors only for constructs that are bound to fail at run-time (like adding an integer to a string), not for constructs that might succeed or might fail.
The relationship of XSLT to static typing has been left somewhat ambiguous. It's possible in theory for an XSLT implementor to provide static type checking, but the specification doesn't attempt to define all the detailed inference rules that they would need to apply.
For these reasons, I think the question of static typing should be considered more as a difference between implementations, rather than a difference between the XSLT and XQuery languages as such.
Another XQuery capability that, as I have already mentioned, is not available in XSLT is independent compilation of modules. Again this is to some extent a product feature rather than a language feature. However, it's certainly true that the XQuery module design makes this much easier to achieve than the XSLT design.
XQuery has a typeswitch construct that has no XSLT equivalent. However, it can be emulated with an xsl:choose instruction and a series of instance of tests; or more probably, using xsl:apply-templates. In fact, typeswitch can be regarded as a poor man's substitute for XSLT's template rules, as the example earlier in this paper demonstrates.
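A minimal sketch of the xsl:choose emulation follows. The function f:describe and its namespace are invented for the example, and the items tested in the root template are arbitrary.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/functions"
    exclude-result-prefixes="xs f">

  <!-- Hypothetical function: one xsl:when per typeswitch case,
       xsl:otherwise playing the role of the default clause -->
  <xsl:function name="f:describe" as="xs:string">
    <xsl:param name="item" as="item()"/>
    <xsl:choose>
      <xsl:when test="$item instance of element()">
        <xsl:sequence select="concat('element ', name($item))"/>
      </xsl:when>
      <xsl:when test="$item instance of text()">
        <xsl:sequence select="'text node'"/>
      </xsl:when>
      <xsl:when test="$item instance of xs:integer">
        <xsl:sequence select="'integer'"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:sequence select="'something else'"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>

  <xsl:template match="/">
    <kinds>
      <!-- The document element, an integer and a string -->
      <xsl:for-each select="(*, 42, 'abc')">
        <kind><xsl:value-of select="f:describe(.)"/></kind>
      </xsl:for-each>
    </kinds>
  </xsl:template>
</xsl:stylesheet>
```

When the items being tested are nodes, template rules with match patterns would usually be the more natural way to express the same dispatch.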
One interesting difference between XSLT and XQuery is that the result of an XSLT transformation is always a set of one or more XML documents. By contrast, an XQuery query can produce a result of any type allowed by the data model: for example, the result of the query count(//person) is a number. In XSLT, the result of such an expression needs to be wrapped in an XML document. This can make XQuery more convenient to integrate into host languages in the way that SQL is often integrated. However, it's not really a substantive difference, since the XML wrapper that's needed in XSLT is easily stripped off by the application.
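For illustration, the XSLT counterpart of the query count(//person) could be as small as this. The wrapper element name result is an arbitrary choice.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <!-- The number is delivered inside a wrapper element,
         which the calling application can strip off again -->
    <result><xsl:value-of select="count(//person)"/></result>
  </xsl:template>
</xsl:stylesheet>
```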
And that, I think, is that. As far as I can tell, every other construct in the XQuery language has a direct equivalent in XSLT 2.0. Now let's look at some of the things in XSLT 2.0 that are not present in XQuery.

XSLT 2.0 facilities missing from XQuery

I will try to cover each of these quite briefly.

Template Rules

I'll take this first because it is the most obvious XSLT feature that has no direct equivalent in XQuery.
Template rules, and the xsl:apply-templates instruction, are the distinguishing characteristic of most XSLT stylesheets. You don't have to use this style of programming, but most of the experts recommend it, especially when handling document-oriented as distinct from data-oriented XML input. The benefit of writing your code this way is that it's much easier to make your code resilient to variations in the structure of input documents: a template rule says what should be done with a <loc> element, for example, independently of where that element is found in the source document.
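A small sketch of this style is shown below. Apart from loc, the element names and the href attribute are assumptions chosen for the example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Fires for every loc element, whatever its parent or depth -->
  <xsl:template match="loc">
    <a href="{@href}"><xsl:apply-templates/></a>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <xsl:template match="note">
    <div class="note"><xsl:apply-templates/></div>
  </xsl:template>
</xsl:stylesheet>
```

Nothing in the stylesheet says whether a loc appears inside a para, a note, or both. If the input structure changes, the rules still fire wherever the elements turn up.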
Template rules make life really difficult for an optimizer. The very flexibility, the independence of the stylesheet structure from the schema structure, makes it almost impossible for an XSLT compiler to know the circumstances in which particular template rules will be activated, or the conditions that will apply when they fire. Optimization is important when dealing with a few megabytes of data, but when the volumes reach gigabytes and terabytes, you can't survive without it: and that's why XQuery offers no equivalent feature.
What this seems to tell us is that XSLT is designed primarily for handling modest volumes of document-oriented (semi-structured) XML, while XQuery is designed primarily for handling vast volumes of data-oriented (predictably-structured) XML. Neither language is exclusively confined to that territory, but that's where they are most at home.

Formatting Capabilities

XSLT 2.0 includes facilities to format numbers, dates, and times. For example, the number 1127 can be converted to the string 1.127,00 while the date 2005-04-18 can be displayed as Montag: Achtzehnte April.
The inclusion of such facilities in XSLT reflects its original primary role in rendering XML data for human consumption. XQuery has no corresponding facilities. This suggests a general rule: if the primary purpose of the application is to render data for display to human readers, XSLT might be a better choice than XQuery.
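The following sketch shows roughly how such formatting can be requested. The decimal format name european and the date picture are choices made for this example. Language support in format-date() is implementation-dependent, so the exact German wording produced will vary between processors.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Named decimal format with continental European separators -->
  <xsl:decimal-format name="european"
      decimal-separator="," grouping-separator="."/>

  <xsl:template match="/">
    <formatted>
      <!-- The picture uses the separators declared above,
           so 1127 is rendered as 1.127,00 -->
      <amount>
        <xsl:value-of select="format-number(1127, '#.##0,00', 'european')"/>
      </amount>
      <!-- Day name, day number, month name and year, in German
           if the processor supports the language code 'de' -->
      <date>
        <xsl:value-of select="format-date(xs:date('2005-04-18'),
                              '[FNn], [D]. [MNn] [Y]', 'de', (), ())"/>
      </date>
    </formatted>
  </xsl:template>
</xsl:stylesheet>
```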

Regular Expression Handling

Both XSLT and XQuery have access to the standard functions matches(), replace(), and tokenize(), which between them provide a great deal of string-handling functionality using regular expressions. The matches() function tests whether a string matches a regular expression; the replace() function replaces substrings that match a regular expression with a replacement string; and the tokenize() function splits a string into substrings by detecting separators that match a regular expression.
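The three functions might be used along these lines. The strings and regular expressions are invented for the example, and the same XPath expressions could appear unchanged in a query.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <examples>
      <!-- true: the string has the shape of an ISO date -->
      <matches>
        <xsl:value-of select="matches('2005-04-18', '^\d{4}-\d{2}-\d{2}$')"/>
      </matches>
      <!-- Reorders the captured groups, giving 18/04/2005 -->
      <replace>
        <xsl:value-of select="replace('2005-04-18', '(\d+)-(\d+)-(\d+)', '$3/$2/$1')"/>
      </replace>
      <!-- Splits on a comma and optional whitespace, giving red|green|blue -->
      <tokenize>
        <xsl:value-of select="tokenize('red, green,blue', ',\s*')" separator="|"/>
      </tokenize>
    </examples>
  </xsl:template>
</xsl:stylesheet>
```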
In addition to these facilities XSLT 2.0 offers an instruction xsl:analyze-string that has no equivalent in XQuery. Unlike any of the functions listed in the previous paragraph, this allows generalized processing of each matched or non-matched portion of an input string, where the processing can include, for example, creation of new element nodes. The following code, for example, replaces the text see [Kay, 93] with see <citation><author>Kay</author><year>93</year></citation>.
    <xsl:analyze-string select="$input" regex="\[(.*),\s*(.*)\]">
      <xsl:matching-substring>
        <citation>
          <author><xsl:value-of select="regex-group(1)"/></author>
          <year><xsl:value-of select="regex-group(2)"/></year>
        </citation>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
The only way of achieving this transformation using XQuery 1.0 is to write some fairly convoluted recursive functions.
The xsl:analyze-string instruction is often particularly useful in up-conversion applications, where the task is to generate XML markup by recognizing textual patterns in an input file. As such it is often used in conjunction with XSLT's unparsed-text() function, which reads a non-XML input file. Again, this has no XQuery equivalent, though it could easily be added as a vendor extension function.
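An up-conversion of this kind might be sketched as follows. The sketch assumes a hypothetical text file people.txt, located relative to the stylesheet, with one "name, age" pair per line. The file name, the parameter input-uri and the output vocabulary are all invented for the example. The transformation is started at the named template main, since there is no XML source document.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input: plain text, one "name, age" pair per line -->
  <xsl:param name="input-uri" select="'people.txt'"/>

  <xsl:template name="main">
    <people>
      <!-- Read the file, split it into lines, skip blank lines -->
      <xsl:for-each select="tokenize(unparsed-text($input-uri), '\r?\n')
                            [normalize-space()]">
        <xsl:analyze-string select="." regex="^\s*(.+?)\s*,\s*(\d+)\s*$">
          <xsl:matching-substring>
            <person age="{regex-group(2)}">
              <xsl:value-of select="regex-group(1)"/>
            </person>
          </xsl:matching-substring>
          <!-- Lines that do not fit the pattern are kept, marked as such -->
          <xsl:non-matching-substring>
            <unparsed><xsl:value-of select="."/></unparsed>
          </xsl:non-matching-substring>
        </xsl:analyze-string>
      </xsl:for-each>
    </people>
  </xsl:template>
</xsl:stylesheet>
```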

Grouping

Grouping is one of the more difficult tasks to accomplish with XSLT 1.0, but this will become history with the adoption of XSLT 2.0 and its xsl:for-each-group instruction.
Grouping can be defined as identifying implicitly-related sets of nodes in a source tree and making their relationship explicit by adding a hierarchic level to the markup. Often the implicit relationship is that the nodes in a group share a common value for a grouping key: for example, a group might consist of all employees based at the same location, or all employees in the same age bracket. Such examples of grouping are similar to the applications of the GROUP BY construct in SQL, except that SQL does not deal in hierarchies, so its GROUP BY cannot actually create a hierarchic level; all it can do is compute some aggregate over the members of the group, such as their average salary.
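Value-based grouping of the employees-by-location kind could be written roughly as below. The employees, employee, location and salary element names are assumptions, as is the shape of the output.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="employees">
    <locations>
      <xsl:for-each-group select="employee" group-by="location">
        <xsl:sort select="current-grouping-key()"/>
        <!-- The new hierarchic level, plus an aggregate of the kind
             that SQL's GROUP BY would compute -->
        <location name="{current-grouping-key()}"
                  average-salary="{avg(current-group()/salary)}">
          <xsl:copy-of select="current-group()"/>
        </location>
      </xsl:for-each-group>
    </locations>
  </xsl:template>
</xsl:stylesheet>
```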
Another difference between XML grouping and relational grouping is that in XML, order is significant, and the implicit relationship used to identify a group often takes order into account. A common example is from XHTML, where there is an implicit relationship between an h2 element and the p and other sibling elements that follow it, up to the next h2. The xsl:for-each-group instruction handles this situation with a group-starting-with option. Positional groups can also be identified by matching the last element in the group, and positional and value-based grouping can be combined with the group-adjacent feature, which recognizes a group as comprising a set of items that (a) are adjacent, and (b) share a common value for a grouping key.
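The h2 case could be handled along these lines. For brevity the sketch assumes the input elements are in no namespace; real XHTML input would need an xpath-default-namespace attribute or prefixed names. The section wrapper element is an arbitrary choice.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="body">
    <body>
      <!-- Each h2 starts a new group; the group runs up to the next h2 -->
      <xsl:for-each-group select="*" group-starting-with="h2">
        <section>
          <xsl:copy-of select="current-group()"/>
        </section>
      </xsl:for-each-group>
    </body>
  </xsl:template>
</xsl:stylesheet>
```

Any elements that precede the first h2 form a group of their own, so a stylesheet used in practice may want to treat that first group separately.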
The classic workaround for the absence of a grouping capability in XSLT 1.0 is the technique known as Muenchian grouping, after Steve Muench of Oracle who invented it. It relies heavily on keys - another feature that is missing from XQuery. All is not lost for XQuery users, however, because there is a new function distinct-values(). Like most other grouping techniques, Muenchian grouping essentially involves two nested loops: an outer loop which identifies and iterates over the distinct values of the grouping key, and an inner loop that locates and processes every item from the input that has that value as its grouping key. In XQuery 1.0, the outer loop can be implemented using the distinct-values() function, and the inner loop using an XPath filter expression (or a FLWOR expression if you prefer) that selects the items having that value. The challenge for an implementation is to detect this coding pattern and implement it with better than O(n²) performance. (Muenchian grouping will generally be O(n log n), and the same can be expected of XSLT 2.0's xsl:for-each-group).
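For reference, the two-loop Muenchian pattern looks roughly like this in XSLT 1.0. The element names are assumptions, and each employee is assumed to have exactly one location.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Index the employees by their grouping key -->
  <xsl:key name="by-location" match="employee" use="location"/>

  <xsl:template match="employees">
    <locations>
      <!-- Outer loop: the first employee found for each distinct location -->
      <xsl:for-each select="employee[generate-id() =
                            generate-id(key('by-location', location)[1])]">
        <location name="{location}">
          <!-- Inner loop: every employee sharing that location -->
          <xsl:for-each select="key('by-location', location)">
            <xsl:copy-of select="."/>
          </xsl:for-each>
        </location>
      </xsl:for-each>
    </locations>
  </xsl:template>
</xsl:stylesheet>
```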
For positional grouping problems, XQuery 1.0 users will have to fall back on the same techniques used by XSLT 1.0: basically a recursive head-tail traversal of the input sequence, passing whatever parameters are needed to recognize the pattern that defines the grouping criterion. (Users unlucky enough to find that their XQuery implementation has no support for the following-sibling axis, which is optional according to the XQuery specification, will really have a challenge on their hands.)
Grouping arises in two main ways. Firstly, it arises in up-conversion applications where the true structure of the data is not explicit in the markup. Converting XHTML to a structure with a nested section hierarchy can be seen as an example of this. Secondly, grouping arises in reporting applications, where the groups are not really intrinsic to the structure of the data, but rather a way of presenting the data for human consumption. For both up-conversion and reporting applications, XSLT 2.0 appears to offer many advantages over XQuery 1.0, of which grouping is just one example.

Keys

XSLT 1.0 users will be aware that one of the most powerful mechanisms for tuning the performance of stylesheets is by means of keys. The xsl:key declaration allows you, in effect, to declare an index, while the key() function allows you to select nodes by using the index.
Keys can give dramatic performance benefits: I once improved the execution time of a stylesheet running over a 40Mbyte input file from 90 minutes to 45 seconds, and the improvement was all due to keys.
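As a sketch of how a key is declared and used, the stylesheet below resolves the item references in the auction data from the earlier examples through an index rather than by searching the document each time. The key name item-by-id and the output element names are invented for the example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- In effect an index on item elements, keyed on their id attribute -->
  <xsl:key name="item-by-id" match="item" use="@id"/>

  <xsl:template match="/">
    <sales>
      <xsl:apply-templates select="site/closed_auctions/closed_auction"/>
    </sales>
  </xsl:template>

  <xsl:template match="closed_auction">
    <!-- key() looks the item up through the index -->
    <sold>
      <xsl:value-of select="key('item-by-id', itemref/@item)/name"/>
    </sold>
  </xsl:template>
</xsl:stylesheet>
```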
There is no equivalent facility in the XQuery language. However, there may well be equivalent facilities in the better XQuery implementations. The absence of this feature reflects the fact that XQuery is rooted in database thinking. In a database system, indexes are persistent: they are a property of the data, not a property of an individual query. Any XML database with XQuery support is likely to offer some kind of physical database design tool that allows you to define indexes on your data. Another part of the database tradition, at least since SQL appeared on the scene, is that indexes should be exploited automatically when the optimizer finds a query predicate that can take advantage of them; it shouldn't be the job of the query author to say when indexes should and should not be used.
This rather leaves open the question as to what non-database XQuery implementations will do. What if that stylesheet of mine, with a 40Mb input file, had been a query? The answer is that it depends on the vendor. Saxon (whose XQuery implementation operates on in-memory documents only) has provided a vendor extension similar in capability to XSLT's keys, and other products may do the same. However, the resulting queries will not be portable across implementations.

User Perceptions

In the previous section I attempted an objective analysis of the functionality of the XSLT and XQuery languages, with some attempt to analyze the impact of the differences on the cost and viability of projects.
In practice, technologies are often chosen on less objective criteria, and it would be a mistake to dismiss the softer factors as irrelevant. In practice, users don't choose one language over another because they have measured scientifically that it has a shorter learning curve, but they may well choose it because they have heard anecdotally that this is the case, or because they have been persuaded of this by a product salesperson, or because they decide that this is likely to be the case after an hour or a day struggling with the concepts. Perceptions in this area matter as much as reality.
XSLT has been around for six years or so, while XQuery is still very new. It's therefore quite difficult to compare the way people react to the two languages. Even for XSLT, it's very hard to get a coherent picture. There seems to be some polarization: people either love it or hate it. By and large it's probably true that the people who love it use it, and those who hate it don't. This might seem obvious, but there are plenty of technologies that people use despite disliking them intensely, because they feel they have no choice.
What is it about XSLT that some people don't like? I think there are probably three main areas where one encounters resistance. Firstly, people are put off by the verbosity of the XML-based syntax. Secondly, people struggle with the fact that XSLT is a non-procedural language: they find it difficult to learn to express their transformations in a declarative way rather than in the form of a procedural algorithm. Thirdly, some people find the concept of rule-based programming difficult to master. There are some other conceptual hurdles in XSLT as in any language, of course (namespaces and whitespace handling come to mind), but I think these three are probably the biggest ones.
XSLT 2.0 has done a lot to address the first of these problems: complex logic in XSLT 2.0 can be expressed much more concisely. But the reputation will probably stick. Mark Fussell's famous blog post explaining Microsoft's decision to drop XSLT in favour of XQuery used syntax as its major selling point, and fudged any difference between the 1.0 and 2.0 versions of the language, showing eloquently that it is perceptions that matter.
How much does syntax really matter? Probably more than it should. Just as we form our first impressions of other people from the first eye contact, so we quickly make an assessment of a programming language from the first piece of code we see in that language. If that code looks ugly, the language has an uphill struggle.
As for the other conceptual hurdles, the fact that XQuery doesn't have template rules, and therefore looks and feels a lot more like SQL, is probably attractive to many people (especially, of course, SQL users). Among database query language aficionados XSLT has little credibility simply because it doesn't look and feel like a database query language (which is not surprising, given that its origins are much more founded in functional programming languages such as Scheme). These are all issues of perception, I think, but they will ensure that there is a community that likes XQuery and doesn't like XSLT, regardless of functionality.
Among many experienced XSLT users one can sense a similar distaste for XQuery. There are a number of reasons for this. One is sheer incomprehension: why should anyone be interested in a language that has so little functionality compared to XSLT? To some people, it feels as if W3C has spent five years taking a giant step backwards. Another is a feeling of resentment that XQuery has hijacked the future direction of XPath, overturning some of the design principles in XPath 1.0 such as weak typing and avoidance of dynamic errors. Associated with this is a feeling by old-timers that XML was designed for documents and that its increasing use for structured data is compromising the design principles.
Programming languages have always attracted an emotional response, so the existence of such a range of reactions should not be surprising. For the user evaluating technologies and trying to make an informed choice, the difficulty is to separate the emotion (whether it comes from your own developers or from vendor salespeople) from the reality. A healthy dose of scepticism is appropriate.

Implementation Factors

In choosing a language, you can't make a decision based solely on the features of the language itself: you also have to consider features of the language implementations. The product implementations define many important factors that are outside the scope of the W3C specifications: performance, product maturity and support, APIs and integration with other technologies, configurability, diagnostics, instrumentation.
I'm not concerned here with the virtues of one XSLT processor over another, or one XQuery implementation over another. I'm more interested in asking whether there are general characteristics of XSLT implementations that differ from the characteristics of XQuery implementations.
There are at least three reasons that such differences might exist:
The languages have inherent characteristics that constrain implementations.
The languages are aimed at different markets, and implementors optimize their products for the requirements of those markets.
XQuery implementors use other XQuery implementations as their benchmark, while XSLT implementors measure themselves against other XSLT implementations. The two therefore drift in different directions, with different metrics being considered important.
I think there are probably two qualities that are most important to users: product maturity (implying reliability, availability of support and training, availability of skilled developers, and so on) and performance.
On the maturity side, XSLT clearly has a head start, having been around for six years or so already. It's easy to verify that there are more XSLT developers, more training courses, more books, more active internet forums, and all the rest. Saxon alone has had a quarter of a million downloads, and is bundled with packages that ship in many times that quantity. The numbers put XSLT into the same kind of league as Perl, Python, or PHP as mainstream web programming languages. It's much too early to collect comparable data for XQuery.
Interestingly, there are a lot more XQuery 1.0 implementations than XSLT 2.0 implementations (until very recently Saxon had the field to itself among XSLT 2.0 processors). Perhaps this is itself an indicator of maturity: a mature market moves forward with less haste, and the number of vendors in the race tends to drop with time rather than increasing.
Of course, another factor that matters to many users is which vendors are supporting which language. For some people, the fact that Microsoft has stopped supporting Java or XSLT is reason enough to abandon the language. Many other users, however, welcome the opportunity to reduce their dependence on one supplier.
Performance is a big open question. I don't know of any serious studies so far to compare XSLT and XQuery performance. It would be quite hard to set the rules for such a race: how much would one be allowed to optimize the way the code is written to take best advantage of each language? I suspect that for a transformation-style workload (documents in, documents out) there wouldn't be a decisive win for either, and that the variations between products would be greater than any differences between the two languages.
The big area where XQuery should win is in handling queries on XML databases. In fact, in that area, it's probably a no-contest: no-one is building XSLT engines to tackle that job. It's in this area that query optimization is absolutely vital, and all the restrictions in XQuery functionality compared with XSLT can be seen as having one aim in life: to make it possible to optimize queries against large databases. I've heard people argue that XSLT could be optimized just as well, but I don't think the point will ever be tested, because I don't think anyone will seriously try to do it.

Summary and Conclusions

Let's try to draw some conclusions.
Firstly, in functionality alone, there is no doubt that XSLT 2.0 wins over XQuery 1.0. There are many jobs that XSLT 2.0 can do easily that are really difficult in XQuery 1.0. Many of these fall into the categories of up-conversion applications or rendition applications, but there are plenty of others. The example given earlier, of a stylesheet/query that copies a document except for the NOTE attributes, illustrates the point.
Secondly, it's probably true at present that XSLT is better at manipulating documents, and XQuery is better at manipulating data. Both languages should be able to do both jobs, but each seems to be better at some aspects of the job than at others.
The extra verbosity of XSLT (which still applies although to a lesser extent with XSLT 2.0) is probably most noticeable with very simple queries ("count how many employees will retire this month"). I find myself increasingly using XQuery for such one-liners in preference to XSLT. This applies whether it's an ad-hoc throwaway query, or something built into a Java application. In many such cases, in fact, all one needs is an XPath expression, and of course XPath is a pure subset of XQuery.
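As a rough illustration of such a one-liner built into a Java application, here is a minimal sketch using the XPath API that ships with the JDK. The staff document, the retires attribute, and the class and method names are hypothetical, and the JDK's engine implements XPath 1.0 rather than 2.0, but the same expression is also a valid XQuery 1.0 query once $month is declared as an external variable.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class RetirementCountSketch {

    // Hypothetical staff list; the retires attribute holds a year and month.
    static final String STAFF =
        "<staff>"
      + "<employee name='Ada' retires='2013-08'/>"
      + "<employee name='Bob' retires='2020-01'/>"
      + "<employee name='Cy' retires='2013-08'/>"
      + "</staff>";

    // Counts the employees whose retires attribute equals the given year-month.
    static int countRetiring(String xml, String yearMonth) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind $month from Java instead of splicing it into the expression text.
        xpath.setXPathVariableResolver(
            name -> "month".equals(name.getLocalPart()) ? yearMonth : null);
        try {
            Double n = (Double) xpath.evaluate(
                "count(/staff/employee[@retires = $month])",
                new InputSource(new StringReader(xml)),
                XPathConstants.NUMBER);
            return n.intValue();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("query failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countRetiring(STAFF, "2013-08"));
    }
}
```

The whole query is a single expression; an XSLT equivalent would need a stylesheet element, a template and an output instruction wrapped around the same XPath.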
If you are building XML databases, whether "native" XML databases or XML-over-relational databases, XQuery is certainly the language of choice. If you are transforming XML documents in filestore or in memory, I think it's much harder to justify preferring XQuery over XSLT at this stage of the game. In a year's time, perhaps there will be more data to justify making this choice, especially for data-oriented applications, but my feeling is that anyone who does so today is probably attaching rather too much weight to subjective criteria.
I would actually encourage any serious XML developer to have both tools in their kitbag. Once you have learnt one, it's easy enough to learn the other. I think that with time, there will be a good level of interoperation between XSLT and XQuery products, so using one language for one task doesn't get in the way of using the other language for another part of the same application. XQuery clearly wins for the database access, XSLT for the presentation side of the application; there are other bits, such as the business logic, where in many cases either language will do the job and it becomes a matter of personal preference.
Slogans can be simplistic, but I think a one-line conclusion is not too far out: XQuery is for query, XSLT is for transformation.

Wednesday, 21 August 2013
