書(shū)名： Hands-On Data Science with SQL Server 2017
作者名： Marek Chmel Vladimír Mu?n?
本章字?jǐn)?shù)： 919字
更新時(shí)間： 2021-06-10 19:14:02

Working with XML data

SQL Server is more than capable of working with XML data. Since the 2005 version, it's actually one of the supported data types, which allows you to store up to 2 GB of XML content per record. When you type a query for SQL Server, the query returns data as a rowset—set of rows and columns. But you can retrieve all the data as an XML document too. You can add a special clause to the query, FOR XML, which will render the result as an XML document instead of the result set.

When you would like to create a XML document as a result of your query, you can use four different modes to render the XML file. These are as follows:

Auto
Path
Explicit
Raw

The raw mode will just generate an element representing a row from the result set. In this mode, each line will be used to build an element named <row>, with attributes representing the columns from the result.

Let's consider a query to find orders for a customer. This query will link two tables with a JOIN operator, and for each customer, it will list all the orders placed by that customer. If a customer has not placed an order, the customer will be skipped in the result set:

SELECT c.[CustomerID],soh.SalesOrderID FROM Sales.Customer c
JOIN [Sales].[SalesOrderHeader] soh ON c.customerID = soh.CustomerID

When we would like to format the output in the form of an XML file, we can add the FOR XML directive to the end of the query:

SELECT c.[CustomerID],soh.SalesOrderID FROM Sales.Customer c
JOIN [Sales].[SalesOrderHeader] soh ON c.customerID = soh.CustomerID
FOR XML RAW

Such a result will generate an XML document with the following fragment:

<row CustomerID="11000" SalesOrderID="43793" />
<row CustomerID="11000" SalesOrderID="51522" />
<row CustomerID="11000" SalesOrderID="57418" />
<row CustomerID="11001" SalesOrderID="43767" />
<row CustomerID="11001" SalesOrderID="51493" />
<row CustomerID="11001" SalesOrderID="72773" />
<row CustomerID="11002" SalesOrderID="43736" />

For one customer, we have several different XML elements, where each element represents a single order. The AUTO mode generates nesting for such XML elements. You have little to no control over the nesting in this mode, since that is done via heuristics and controlled by the SQL Server engine itself. However, if we were to use the query to grab the same data using AUTO mode, then the results would look more interesting:

SELECT c.[CustomerID],soh.SalesOrderID FROM Sales.Customer c
JOIN [Sales].[SalesOrderHeader] soh ON c.customerID = soh.CustomerID
FOR XML AUTO -- AUTO mode instead of the RAW mode

The results are now nested in a way that one customer element encloses more order elements if there are more orders for one customer:

<c CustomerID="11000">
  <soh SalesOrderID="43793" />
  <soh SalesOrderID="51522" />
  <soh SalesOrderID="57418" />
</c>
<c CustomerID="11001">
  <soh SalesOrderID="43767" />
  <soh SalesOrderID="51493" />
  <soh SalesOrderID="72773" />
</c>

This is particularly useful for many queries that include JOIN operations to link more tables together via their keys. The XML document on the output is not formatted well, so we can add a few more options to the query definition, such as naming the elements and adding a root document:

SELECT c.[CustomerID],soh.SalesOrderID FROM Sales.Customer c
JOIN [Sales].[SalesOrderHeader] soh ON c.customerID = soh.CustomerID
FOR XML RAW('customer'), ROOT('CustomerList')

This row tagging is, however, not allowed in AUTO mode and you would need to use a more advanced mode, FOR PATH.

Once we know how the XML output can look on the SQL Server, we can think of importing the data as XML into the SQL Server. There are two approaches we can take here. First option is to open the XML document and parse the information into columnar storage, to extract each value from the attribute, and store the values in proper columns. The second option would be taking the whole XML document and storing it in the column of an XML data type.

We will prepare a new table for importing XML documents via the following set of commands:

CREATE DATABASE ImportedData
GO
USE ImportedData
GO

CREATE TABLE XMLwithOpenXML
(
    Id INT IDENTITY PRIMARY KEY,
    XMLData XML,
    LoadedDateTime DATETIME
)

Once the storage for the documents is created, we can use a function named OpenRowset, which can open the XML document and store the document in the table:

INSERT INTO XMLwithOpenXML(XMLData, LoadedDateTime)
SELECT CONVERT(XML, BulkColumn) AS BulkColumn, GETDATE()
FROM OPENROWSET(BULK 'C:\SQLData\Import.xml', SINGLE_BLOB) AS importedXML;

XML documents stored in SQL Server have a limited size of 2 GB, and if you consider storing numerous XML documents in a table and then performing any query on such a large amount of data, this can get very slow in terms of processing time. SQL Server can help you speed up and optimize operations with the XML data type by implementing four different types of XML indexes that can be used. You can have a primary XML index and three different kinds of a secondary XML index for your XML data type.

The primary XML index provides a basic object tree representing the XML document structure. This is used to speed up access to the elements and attributes of the XML document without the need to read the whole XML document stored in the table. The secondary indexes are used for specific types of queries based on the XML type functions used to access the data; those indexes are for PATH, PROPERTY, and VALUE, and used with XQuery. Secondary indexes can only be created if the primary index is in place.

An XML index can be created either with T-SQL code or with SQL Server Management Studio (SSMS). If you use SSMS, then you'll need to navigate to your table where you want to create the index and right-click on the Indexes item:

Once the primary index is available, you can create a secondary XML index for the proper XQuery functions, which you'll run on your XML data:

官术网_书友最值得收藏!

Hands-On Data Science with SQL Server 2017

Working with XML data