XML Capabilities - FORXML Clause
With the release of SQL Server 2000, a lot more
support and functionality for XML was included in the product. In this article
we would target at some of the out of the box functionality that SQL Server
2000 offers us to convert our relational data as XML formatted data. Not
wasting much of the time lets get into the usage of these functionality into
our application ...
Well, New with SQL Server 2000 is the ability to
return the results of a query in XML format. This is accomplished by adding the
FOR XML clause at the end of the SELECT statement. The syntax is as follows:
FOR XML
mode
[, XMLDATA] [, ELEMENTS] [, BINARY BASE 64]
The arguements in the braket suggest that these are
optional. But lets understand each of these components before proceeding
further.
mode – this is the only required
argument. It specifies how the XML will be returned in the result set. There
are 3 values that can be used:
-
AUTO – returns query results as nested XML elements.
-
EXPLICIT – you specify the format of the
results.
-
RAW – each row is returned as an XML element.
XMLDATA – when specified, the
schema is returned along with the results.
ELEMENTS – this argument can only
be used in conjunction when mode is AUTO. The query results are returned as XML
sub-elements.
BINARY BASE64 – if the resultset
being returned will contain binary data then this attribute specifies what form
the results will look like. The binary data is returned as BASE 64.
Having laid down the introduction to understanding the
usage of XML capabilities in SQL Server 2000. Lets take a closer look into each
of these methods available to us in prducing XML from relational data.
Note: All the examples explained in this article would
use Pubs database (bundled with SQL Server 2000 standard installation)
RAW
Since the RAW format is the simplest of all and the
basic of all. Just run the following query:
Select au_Id, au_lname, au_fname
from authors
FOR XML RAW
The above example is an simple RAW XML format. The
output from the query would be something like:
XML_F52E2B61-18A1-11d1-B105-00805F49916B
--------------------------------------------------------------------------------
<row au_Id="409-56-7008" au_lname="Bennet" au_fname="Abraham"/>
<row au_Id="648-92-1872" au_lname="Blotchet-Halls" au_fname="Reginald"/>
......
The first thing you will notice is the column name,
its quite strange. This is a GUID that is just a placeholder for the name of
the column in the rowset. All results returned using FOR XML will have a GUID
prefixed by “XML_”. You cannot get rid of this GUID from appearing from your
header. For the further discussion we will ignore this header information.
All the columns in each row of the result set are
returned as attributes, with the result set beginning with the element row
as seen in the example above. The only time a column is not returned as an
attribute is when a column has a NULL value. Having said this we are interested
in the presense of the attribute. You can avoid this by specifying the
ISNULL expression on the columns. I would extend the above example and I am
assuming we have NULL fnames.
Select au_Id, au_lname,
ISNULL(au_fname,'') as [au_fname] from authors
FOR XML RAW
AUTO
The big difference between Raw and Auto is that AUTO
mode returns query results as nested XML elements. Lets extend the same example
we used above to understand the output of this method:
Select
au_Id, au_lname, au_fname from authors
FOR XML AUTO, Elements
The above query results in an element centric output
like below:
<authors>
<au_Id>409-56-7008</au_Id>
<au_lname>Bennet</au_lname>
<au_fname>Abraham</au_fname>
</authors>
<authors>
<au_Id>648-92-1872</au_Id>
<au_lname>Blotchet-Halls</au_lname>
<au_fname>Reginald</au_fname>
</authors>
<authors>..
One more interesting observation you can make from the
XML structure is that we now get the table name as the master
element tag name unlike row in the RAW format. To sum it up,
the differences being that the results are returned in a hierarchal order with
the column values as attributes and the table names as elements. This is much
nicer and easier to read. Hey !!! Where is the hierarchy you are trying to tell
? Well the above query was very basic and naive. Lets get a more complex query
that would get us a hierarchy. Now, lets get the list of all the titles
along with the authors information. Just run the below query
and whatch the output ...
Select
authors.au_Id, au_lname, au_fname, title from authors
Inner Join titleauthor
ON authors.au_id=titleauthor.au_id
Inner Join titles
ON titleauthor.title_id = titles.title_id
FOR XML AUTO, Elements
Output:
<authors>
<au_Id>238-95-7766</au_Id>
<au_lname>Carson</au_lname>
<au_fname>Cheryl</au_fname>
<titles>
<title>But Is
It User Friendly?</title>
</titles>
</authors>
<authors>
<au_Id>724-80-9391</au_Id>
<au_lname>MacFeather</au_lname>
<au_fname>Stearns</au_fname>
<titles>
<title>Computer
Phobic AND Non-Phobic Individuals: Behavior Variations</title>
</titles>
</authors>
<authors>....
So you can see that the titles node is represented as
child node automatically (remember AUTO). This is because there can be multiple
titles an author can make. And hence SQL Server has automatically made that as
a child element. Now this maynot meet the requirement of all. We want the
titles to be placed with the normal other information of the authors also.
Hence this is exactly the requirement that leads us to the next method.
EXPLICIT
With AUTO and RAW modes the user had no real control
over the form of the XML results returned by the query. With the EXPLICIT mode
the user has the ability to specify how the results will look and offers much
more flexibility over the AUTO and RAW modes. There is some responsibility that
the user needs to assume when using EXPLICIT mode by ensuring that the XML is
well formed when generated. The user must determine what their results need to
look like and then write the SELECT query to return those results.
Now getting into the intricacies of what EXPLICIT is,
the query needs to written so that a recordset produces something called a Universal
Table in which the XML document is generated. This is In-Memory
table whose sole purpose is to hold the XML document we want to generate. The
query using the EXPLICIT mode requires two columns of meta data. The first
column a named tag (Tag) number, of an integer type. The second column needs to
be a named (Parent) tag number. This column lists the tag number of the parent
element and is also an integer type.
Whew, getting confused already ... :) ... Lets
continue. The Tag column is required to be the first column in a FOR XML
EXPLICIT query. The Parent column is required to be the second column in a FOR
XML Explicit query. Both these columns donot correspond to a database column.
Now follow all the other columns from the tables after these columns. Now
Explicit does impose another format for representing the column name.
ElementName!TagNumber!AttributeName!Directive
The ElementName indicates name for the
elemets. Tagnumber would the nesting tag number. AttributeName
would be the alias for the column to be denoted as an attribute element. Directives
give you the flexibility to specify a series of values:
-
xml – identical to the element
directive except that the data is not encoded whereas the data is encoded when
the element directive is used.
-
hide – any column that has the
hide directive is not included in the XML document. Very handy. But if you
were to hide an element donot include it on first place in your select.
-
cdata – this causes the column
data to be placed in a CDATA section of an XML document. This data must be a
ntext, nvarchar, text, or varchar datatype. If this directive is used then the
attribute name cannot be used.
-
xmltext – handles overflow. Used
in conjunction with OPENXNL, when there are more elements or attributes than
table columns, these extra elements and attributes are sent into a column
(created by OPENXML) to deal with the overflow of data. The xmltext directive
fetches this extra data from this column.
-
element – causes the column data
to be specified as an element instead of an attribute.
Lets take an simple example to see how this works.
SELECT
1 as Tag, null as Parent,
au_fname as
[Author!1!FirstName!Hide],
au_lname as
[Author!1!LastName!Element],
au_id as [Author!1!ID]
FROM authors
FOR XML EXPLICIT
A fragment of the output is below:
<Author
ID="409-56-7008">
<LastName>Bennet</LastName>
</Author>
<Author ID="409-56-7018">
<LastName>Bennet</LastName>
</Author>
<Author ID="648-92-1872"> ...
There are a number of observation from the query
above. First we donot have FirstName tag, this is because we have given an
directive of Hide. Next we can see that the LastName has come as an element
unlike ID which has come as an attribute. You can do much more with XML
EXPLICIT option. We have just touched the surface here.
XMLDATA
This is an interesting part ... For generating the XML
we need a schema to generate. This is exactly what this option gives us. Look
at the example below, we have extended the same example we used above:
SELECT
1 as Tag, null as Parent,
au_fname as
[Author!1!FirstName!Hide],
au_lname as
[Author!1!LastName!Element],
au_id as [Author!1!ID]
FROM authors
FOR XML EXPLICIT, XMLDATA
Output:
<Schema
name="Schema1" xmlns="urn:schemas-microsoft-com:xml-data"
xmlns:dt="urn:schemas-microsoft-com:datatypes">
<ElementType name="Author" content="mixed"
model="open">
<AttributeType name="ID" dt:type="string"/>
<element type="LastName"/>
<attribute type="ID"/>
</ElementType>
<ElementType name="LastName"
content="textOnly" model="closed" dt:type="string"/>
</Schema>
<Author xmlns="x-schema:#Schema1" ID="409-56-7008">
<LastName>Bennet</LastName>
</Author>
<Author xmlns="x-schema:#Schema1" ID="409-56-7018">
<LastName>Bennet</LastName>
</Author>
<Author xmlns="x-schema:#Schema1" ID="648-92-1872">...
You can see that the Data schema has been
included in our result set and placed at the top of the actual row results.
Using XMLData comes in handy because it provides a definition of the data in
case you ever want to validate it.
BINARY BASE64
More often than not our queries might have image
datatypes that return a binary object. And these cannot be embedded into our
XML. But what if I were to select one. Lets tackle this problem with an
example.
Select
* from pub_info
Where pub_id=0877
FOR XML RAW
Running this query gives us an very descriptive error
ever to be seen. "FOR XML EXPLICIT and RAW modes currently do not support
addressing binary data as URLs in column 'logo'. Remove the column, or use the
BINARY BASE64 mode, or create the URL directly using the
'dbobject/TABLE[@PK1="V1"]/@COLUMN' syntax." The solution
to the problem is present in the error itself. Lets use BINARY BASE 64 and see:
Select
* from pub_info
Where pub_id=0877
FOR XML RAW, BINARY BASE64
And surprisingly the output is :
<row
pub_id="0877" logo="R0lGODl...AAAOw==" pr_info="This is ...shington, D.C."/>
We can observe that that the data that we have got is
not very readable. It would be great for us to get an pointer to the location
of the object rather than getting so many junk characters. We can use
SQLXML and have a pointer to this data. I wouldnot discuss this in this
article. But I am giving you a solution in hand for sure.
Conclusion
I think we have taken a small ride on the various
features we can use in getting XML data from SQL Server. I would also in the
later articles give you an insight in manipulating XML data at the SQL Server
while inserting / updating etc. But this should form the basis for getting us
started off in using the XML capabilities from SQL Server 2000.
|