These tips were discovered after writing DTDs and XML
processing code for two years. These are guidelines, not mandates.
See also: DTD Overview
Tip 1:
If two or more different types of elements can appear at the same level in
a tree, create 'container' elements
Advantages:
- Makes the document easier to read
- Makes the document easier to process using the DOM: children of
the same type all appear within one sub-tree, so it's easy to tell
how many children there are
Without this rule, it is common to see documents like:
<!ELEMENT Foo (Bar*, Baf*)>
<Foo>
  <Bar/>
  <Bar/>
  <Baf/>
  <Baf/>
  <Baf/>
</Foo>
By following the rule, the document instead looks like:
<!ELEMENT Foo (Bars?, Bafs?)>
<!ELEMENT Bars (Bar*)>
<!ELEMENT Bafs (Baf*)>
<Foo>
  <Bars>
    <Bar/>
    <Bar/>
  </Bars>
  <Bafs>
    <Baf/>
    <Baf/>
    <Baf/>
  </Bafs>
</Foo>
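As a rough illustration of the DOM advantage, the Python sketch below counts children in both document shapes using the standard library's xml.dom.minidom. The helper names (element_children, count_flat, count_nested) and the two sample strings are invented for this example; they are not part of the original tips.

```python
from xml.dom.minidom import parseString

# Sample documents mirroring the two shapes discussed above.
FLAT = "<Foo><Bar/><Bar/><Baf/><Baf/><Baf/></Foo>"
NESTED = ("<Foo><Bars><Bar/><Bar/></Bars>"
          "<Bafs><Baf/><Baf/><Baf/></Bafs></Foo>")


def element_children(node):
    """Return only the element children of a DOM node (skips text nodes)."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


def count_flat(xml_text, tag):
    """Without containers: scan every child of the root and filter by name."""
    root = parseString(xml_text).documentElement
    return sum(1 for c in element_children(root) if c.tagName == tag)


def count_nested(xml_text, container_tag):
    """With containers: locate the container, then count its children."""
    root = parseString(xml_text).documentElement
    for c in element_children(root):
        if c.tagName == container_tag:
            return len(element_children(c))
    return 0  # the container is optional in the DTD (Bars?, Bafs?)


if __name__ == "__main__":
    print(count_flat(FLAT, "Baf"))
    print(count_nested(NESTED, "Bafs"))
```

With containers, the code that handles Bars never has to look at Baf elements at all, which is the main practical benefit when a parent has several kinds of children.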
Tip 2:
Use attributes when you can, and elements when you have to
Advantages:
- Makes the document easier to process using SAX or the DOM
- Makes documents smaller, so they can be processed more efficiently
This rule is fairly self-explanatory, but not everyone is convinced
that it is the best way. Arguments against attributes usually assert
that elements are easier for humans to read. However, attributes can
lead to significant performance improvements when documents are
processed by code, and their advantages cannot be ignored if
scalability is a priority.
You can't use attributes when an element needs "many" of something.
For example, a car has many tires. Use elements instead in this case.
Another argument against attributes used to be based upon tools:
"Tool X cannot use attributes." However, XML and XML tools have
matured to the point where these arguments are, for the most part,
relics of the past.
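To make the SAX claim concrete, here is a minimal Python sketch using the standard library's xml.sax. It reads the same value once from an attribute and once from a child element. The car element comes from the example above; the color attribute and element, the CarHandler class, and the parse helper are assumptions made up for illustration.

```python
import xml.sax


class CarHandler(xml.sax.ContentHandler):
    """Reads a car's color from an attribute or from a child element."""

    def __init__(self):
        super().__init__()
        self.color_from_attr = None
        self.color_from_elem = None
        self._buffer = None  # collects text only while inside <color>

    def startElement(self, name, attrs):
        # Attribute form: the value is available in a single callback.
        if name == "car" and "color" in attrs:
            self.color_from_attr = attrs["color"]
        # Element form: start buffering text until the end tag arrives.
        elif name == "color":
            self._buffer = []

    def characters(self, content):
        # SAX may deliver text in several chunks, so they must be joined.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "color" and self._buffer is not None:
            self.color_from_elem = "".join(self._buffer).strip()
            self._buffer = None


def parse(xml_text):
    """Parse a document string and return the populated handler."""
    handler = CarHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler


if __name__ == "__main__":
    print(parse('<car color="red"/>').color_from_attr)
    print(parse("<car><color>red</color></car>").color_from_elem)
```

The attribute form needs one callback and no state, while the element form needs three callbacks and a buffer. This says nothing about measured performance; it only shows the difference in handler code.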
Tip 3:
Advantages:
- Makes reuse possible without introducing awkward tags
Large XML projects invariably need to embed one document within
another. The best way to manage reuse with XML is to create separate
DTDs, with each DTD describing the structure of one "major" element
in the system.
If you design for reuse up front, it will save you maintenance
effort as the project gets larger. Today, there are almost no tools
to help with this job. Raw XML can be used to implement reuse by
embedding DTDs inside other DTDs using external parameter entities.
See the DTD page for more information.
A disadvantage of referring to external DTDs is that each DTD
requires a round-trip during the validation step. Each round-trip
represents a significant performance penalty.
Tip 4:
Avoid 'mixed' content
If you want mixed content (elements and text), the DTD must be
written as:
<!ELEMENT Foo (#PCDATA|Bar)*>
<!ELEMENT Bar EMPTY>
This content model is very loose: it constrains neither the order
nor the number of children. <Foo> can contain any number of <Bar>
elements, and it can also contain multiple blocks of text:
<Foo>
  hi
  <Bar/>
  bye
  <Bar/>
  <Bar/>
  goodnight
</Foo>
Mixed content is useful for marking up content and works well with
free-form text. However, unless this is exactly what you want, you
should not define your document so liberally. Instead, create a new
element that is defined as #PCDATA:
<!ELEMENT Foo (Data|Bar)>
<!ELEMENT Data (#PCDATA)>
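The cost of mixed content shows up in processing code. The Python sketch below uses the standard library's xml.etree.ElementTree, where text that follows a child element is stored on that child's tail rather than on the parent. The function names and the two sample strings are assumptions for this example only.

```python
import xml.etree.ElementTree as ET

# Mixed content, as allowed by <!ELEMENT Foo (#PCDATA|Bar)*>.
MIXED = "<Foo>hi<Bar/>bye<Bar/><Bar/>goodnight</Foo>"
# Text wrapped in its own element, as in <!ELEMENT Foo (Data|Bar)>.
WRAPPED = "<Foo><Data>hi</Data></Foo>"


def mixed_text_blocks(foo):
    """Collect every non-empty text block under foo, in document order."""
    blocks = []
    # Text before the first child lives on the parent's .text ...
    if foo.text and foo.text.strip():
        blocks.append(foo.text.strip())
    # ... but text after each child lives on that child's .tail.
    for child in foo:
        if child.tail and child.tail.strip():
            blocks.append(child.tail.strip())
    return blocks


def wrapped_text(foo):
    """With a dedicated Data element, the text is a single lookup."""
    return foo.findtext("Data")


if __name__ == "__main__":
    print(mixed_text_blocks(ET.fromstring(MIXED)))
    print(wrapped_text(ET.fromstring(WRAPPED)))
```

In the mixed case the code has to know where each fragment of text lives and how many fragments to expect; in the wrapped case there is exactly one place to look.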
Tip 5:
Plan for DTD maintenance
DTDs can be changed once they have been published, as long as
certain guidelines are followed:
- Elements cannot be removed
- Attributes cannot be removed
- Attributes cannot be changed from "implied" to "required"
- Default values should not be modified (generally)
- A "value" cannot be removed from an attribute "value list"
- The required structure of a document cannot be changed. For
example, ? cannot become + and a new element cannot be required to
appear inside an existing element. Only ? and * can be used when
changing the document structure.
- #PCDATA can't be removed from an element
If these guidelines can't be followed, a new type of document must be
created.
Another way to manage change is to plan for it. For example, a
top-level element could have a "version" attribute. A document that
conforms to the first version of the DTD would have version="1" and a
document that conforms to the second version would have version="2".
The version number would only change if the DTD were modified in a
way that violated the above guidelines. However, without diligent
coding, this method will fail. Any code that "forgets" to check the
version will break when a new version is introduced to the system.
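One way to make the version check hard to forget is to put it in the single function that every caller uses to load a document. The Python sketch below assumes the "version" attribute sits on the root element, as described above; SUPPORTED_VERSIONS, UnsupportedVersionError, and load_document are hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET

# Versions this code knows how to process (an assumption for the example).
SUPPORTED_VERSIONS = {"1", "2"}


class UnsupportedVersionError(ValueError):
    """Raised when a document declares a version this code cannot handle."""


def load_document(xml_text):
    """Parse a document and refuse it unless its version is supported.

    Returns a (version, root_element) pair so that callers can branch
    on the version explicitly.
    """
    root = ET.fromstring(xml_text)
    version = root.get("version")  # None when the attribute is missing
    if version not in SUPPORTED_VERSIONS:
        raise UnsupportedVersionError(
            f"unsupported document version: {version!r}")
    return version, root


if __name__ == "__main__":
    version, root = load_document('<Foo version="2"/>')
    print(version, root.tag)
```

Failing loudly on an unknown or missing version means that a new DTD version produces an immediate, clear error instead of subtle misbehavior further down the pipeline.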
Tip 6:
Use entities to encapsulate repetition
As an example, a traveler uses a vehicle to
get to his destination:
<transitMode><car/></transitMode>
<transitMode><boat/></transitMode>
The DTD fragment for this is:
<!ENTITY % VEHICLE "car | boat | train">
<!ELEMENT transitMode (%VEHICLE;)>
The VEHICLE entity is handy because it can be reused. It also makes
the DTD easier to maintain. For example, if you add another
type of vehicle, only the entity needs to be changed. The rest
of the DTD is unaffected. By the way, XML Schemas address this
need via two mechanisms: substitution groups and inheritance
(and it isn't easy to decide which one to use).