XML PersistenceJuly 01, 2001 |
|||||||||
|
Main Points
|
I recently received an email from a reader who is interested in XML persistence.
I have had the same feelings -- the data is XML, why create yet another schema for it. I also agree that Microsoft's solution is cumbersome, but to be fair, so is Oracle's. The difficulty of shredding an XML document into a relational database and pasting it back together can not be overstated. Questions about API, transaction size, and relationship management must be answered.
It is therefore enticing to use a third party product that at least claims to read, write, and query XML documents natively. Native XML database products are sexy at first but fall apart when interrelated documents are stored in them. Native XML databases are not optimized yet for large amounts of data compared to relational databases and none of them can relate documents efficiently. Developers commonly need to "materialize" multiple views from the same persistent store. The hierarchical nature of XML provides much-needed human readability but locks you into a single view of the data. Most modern software applications no longer use hierarchical databases, because end-users almost always ask to see their data in many different ways. XLink, XPointer, and XInclude were not designed to solve this problem efficiently, and except in a very limited domains, they're not going to help. For example, let's say you have two main entities in your universe -- Customers and Orders. Customer support personnel want a document containing all of the company's customers. The sales department wants a document containing a list of orders for the last quarter. It's pretty clear that one XML document isn't going to satisfy both needs. So how many "views" are needed? Two? The service department needs a list of open orders made by a particular customer. That's yet another document. So far we have three types of documents and we're only dealing with two entities! Where does it end? Relational databases were designed to support large amounts of interrelated data. Relational databases can produce an almost infinite number of views of data. On the other hand, a native XML database can only return the documents that you put into it. Some products provide trivial transformations, usually via XSLT. However, they don't let you stitch together data from multiple documents in a flexible and scalable manner. In short, the convenience of native XML databases can be deceiving because as your project grows, you'll probably have to switch to a relational database, losing an enormous amount of development time. A similar thing happened with object-oriented database systems in the late 80s. Just about everyone wanted their software objects to become persistent with a minimum of hand-coding. But the object-oriented databases just didn't scale. They also lacked tools needed for deployment, such as fast backups, fast loaders, and management consoles. Not to mention technical support and a sustainable revenue model. Therefore, a relational database is probably in your
future. Microsoft's and Oracle's proprietary XML solutions are completely
incompatible but solve the same problems in roughly the same way.
Oracle tends to force developers to write Java code whereas Microsoft
focuses more on declarative XML mapping (which is preferable, in my
opinion). Both vendors support an Unfortunately, the customer often dictates which database vendor to use. If you find yourself needing to support both Oracle and SQL Server, the vendors' proprietary extensions aren't an option unless you want to write completely different code for each type of database. I have found that stored procedures are a good way to localize the differences between the vendors. The stored procedures are written in completely different languages (PL/SQL vs T-SQL) but the code that calls them is relatively free of vendor-specific details. Designing a clean API is not trivial. It's unwise to expose the details of the storage system to end-users. Even if you had a magic box that could persist XML documents, would you want to expose it to client code? The learning curve would be very steep and hackers could easily attack it over the Internet. How do you avoid this without hand-writing a lot of code? Unfortunately, I have not found a simple solution. Writing a custom API seems to be inevitable. If you want to study a totally XML-based API that manipulates persistent data, check out UDDI. The storage mechanism completely disappears behind the XML API. The API supports "find", "insert", "update", "drill down", and each verb is described by a separate XML document. The use of GUIDs as keys is interesting if not a bit questionable from a cross-platform perspective. Give the relational database extensions another five years and I'm sure they will be much more viable than they are today. Advancements in XQuery will make quite an impact on functionality, standardization, and openness. I suggest you wait before jumping into the deep end. The bad news is that you do have to design a database schema. I am not aware of a tool that can generate a database schema from, say, an XDR schema. Storing data in a database severely impairs your ability to alter your XML schema during the project lifecycle, especially after you ship. But the good news is that XML provides a buffer between your database schema and your code, so both can (at least theoretically) change independently. There are no silver bullets in this industry, and native XML databases certainly aren't one of them. Pre-XML (aka SQL) database technology is still relevant in the year 2001 for storing and retrieving data, and will be for quite some time to come. |
(c) 2001 XMLEverywhere.com