XML database explained

An XML database is a data persistence software system that allows data to be specified, and sometimes stored, in XML format. This data can be queried, transformed, exported and returned to a calling system. XML databases are a flavor of document-oriented databases which are in turn a category of NoSQL database.

Rationale for XML in databases

There are a number of reasons to directly specify data in XML or other document formats such as JSON. For XML in particular, they include:[1] [2]

Steve O'Connell gives one reason for the use of XML in databases: the increasingly common use of XML for data transport, which has meant that "data is extracted from databases and put into XML documents and vice-versa".[4] It may prove more efficient (in terms of conversion costs) and easier to store the data in XML format. In content-based applications, the ability of the native XML database also minimizes the need for extraction or entry of metadata to support searching and navigation.

XML-enabled databases

XML-enabled databases typically offer one or more of the following approaches to storing XML within the traditional relational structure:

  1. XML is stored into a CLOB (Character large object)
  2. XML is `shredded` into a series of Tables based on a Schema[5]
  3. XML is stored into a native XML Type as defined by ISO Standard 9075-14[6]

RDBMS that support the ISO XML Type are:

  1. IBM DB2 (pureXML[7])
  2. Microsoft SQL Server[8]
  3. Oracle Database[9]
  4. PostgreSQL[10]

Typically an XML-enabled database is best suited where the majority of data are non-XML. For datasets where the majority of data are XML, a native XML database is better suited.

Example of XML Type Query in IBM DB2 SQL

select id, vol, xmlquery('$j/name', passing journal as "j") as name from journalswhere xmlexists('$j[licence="CreativeCommons"]', passing journal as "j")

Native XML databases

Native XML databases are especially tailored for working with XML data. As managing XML as large strings would be inefficient, and due to the hierarchical nature of XML, custom optimized data structures are used for storage and querying. This usually increases performance both in terms of read-only queries and updates.[11] XML nodes and documents are the fundamental unit of (logical) storage, just as a relational database has fields and rows.

The standard for querying XML data per W3C recommendation is XQuery; the latest version is XQuery 3.1.[12] XQuery includes XPath as a sub-language and XML itself is a valid sub-syntax of XQuery. In addition to XPath, some XML databases support XSLT as a method of transforming documents or query results retrieved from the database.

Language features

NameLicenseNative LanguageXQuery 3.1XQuery 3.0XQuery 1.0XQuery UpdateXQuery Full TextEXPath ExtensionsEXQuery ExtensionsXSLT 2.0XForms 1.1XProc 1.0
BaseXJava
eXistJava
MarkLogic ServerC++
OpenText xDBJava
Oracle Berkeley DB XMLC/C++
QizxJava
SednaApache License 2.0C/C++

Supported APIs

NameXQJXML:DBRESTfulRESTXQWebDAV
BaseX
eXist
MarkLogic Server
Qizx
Sedna

Data-centric XML datasets

For data-centric XML datasets, the unique and distinct keyword search method, namely, XDMA[13] for XML databases is designed and developed based on dual indexing and mutual summation.

External links

Notes and References

  1. Web site: Nicola. Matthias. 5 Reasons for Storing XML in a Database. Native XML Database. 17 March 2015. 28 September 2010.
  2. Feldman. Damon. Moving from Relational Modeling to XML and MarkLogic Data Models. MarkLogic World. http://world.marklogic.com/. 11 April 2013. 17 March 2015.
  3. NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Addison-Wesley Educational Publishers Inc, 2009
  4. O'Connell, Steve. Advanced Databases Course Notes. Section 9.2. Syllabus. 2005. University of Southampton. Southampton, England.
  5. Book: Oracle XML DB Developer's Guide, 10g Release 2. August 2005. Oracle Corporation. http://docs.oracle.com/cd/B19306_01/appdev.102/b14259/xdb05sto.htm. 17 March 2015. XML Schema Storage and Query: Basic. . Section Creating XMLType Tables and Columns Based on XML Schema
  6. Web site: ISO/IEC 9075-14:2011: Information technology -- Database languages -- SQL -- Part 14: XML-Related Specifications (SQL/XML). International Organization for Standardization. 17 March 2015. 2011.
  7. Web site: pureXML overview -- DB2 as an XML database. IBM Knowledge Center. IBM. 17 March 2015.
  8. Web site: Using XML in SQL Server. Microsoft Developer Network. Microsoft Corporation. 17 March 2015.
  9. Book: Oracle XML DB Developer's Guide, 10g Release 2. August 2005. Oracle Corporation. http://docs.oracle.com/cd/B19306_01/appdev.102/b14259/xdb04cre.htm. 17 March 2015. XMLType Operations.
  10. Book: PostgreSQL 9.6 Documentation. https://www.postgresql.org/docs/9.6/static/datatype-xml.html. 1 April 2017. 8.13. XML Type.
  11. Web site: Matthias. Nicola. XML versus Relational Database Performance. Native XML Database. 28 Jun 2017. 22 August 2010.
  12. Web site: XQuery 3.1 Recommendation . 2017-03-21.
  13. Selvaganesan. S.. Haw. Su-Cheng. Soon. Lay-Ki. XDMA: A Dual Indexing and Mutual Summation Based Keyword Search Algorithm for XML Databases. International Journal of Software Engineering and Knowledge Engineering. en-US. 24. 4. 591–615. 10.1142/s0218194014500223. 2014.