<? Cave Survey Data in XML ?>
Worldwide, cave surveys
collect and store masses of data in a variety of electronic formats each year.
Frequently the particular software product used to render the data drives the
format. This limits interoperability between software products because data,
the common denominator of all applications, is not immediately readable from
one program to another. As a result, the only ways to move data from one
program to another are re-keying the survey data, writing new software to
adapt to a legacy format, or developing data translation utilities.
In order to stimulate discussion and cooperation within the community of
people interested in the collection and electronic storage of cave data, this
article gives a brief introduction to XML fundamentals. Building upon those
fundamentals, it then introduces a simple XML document intended to support
discussion and provoke thought toward the creation of an XML standard for the
cave surveying community.
The
goal of data entry should be the complete and accurate electronic
representation of raw data for the purpose of communication, processing and
archival. Placing controls upon the practice of data representation, as it
concerns a logical class or group of data, is a fundamental tenet of data
management. A cooperative effort to establish the norms of data representation
is necessary if the quality of data as a whole is to improve.
The
rapid spread of XML creates a new opportunity to store, share and represent
cave survey data across existing and future software products. XML stands for
eXtensible Markup Language and is a human readable tagging language much like
HTML (of internet fame). Unlike HTML
however, which focuses on the display of data and how it looks, XML was
designed to structure, store and send data. XML focuses on what the data is,
not how it appears. XML is about describing information and is not a
replacement for HTML.
Storing data in XML creates
information that can be read by many different types of applications. XML
stores data in plain text files and this simplicity makes it easy to exchange
data between incompatible systems. XML supports Unicode, therefore making the
data internationally transportable. Since XML is independent of hardware and
software, your data is made available to a wider audience of existing (and yet
to exist) applications. Other clients and applications can access your XML
files as data sources, as if they were accessing databases.
XML can also be used to
store data in databases. Applications can be written to send and retrieve
information from the store, and generic applications can be used to display the
data. Finally, XML can be used to create new languages. For instance the
Wireless Markup Language, WML, is used to mark up Internet applications for
handheld devices like mobile phones. WML is written in XML. A whole host of
possible acronyms come to mind for a caving markup language, the most obvious,
CaML.
But XML does have its downside. It can be verbose; in some cases XML doubles
or triples the size of the data file. Careful planning can minimize this aspect
of XML, and the benefits of a common markup framework far outweigh this small
drawback. Hard disk space, memory and bandwidth are rarely limiting for files
of this size given today's rapid pace of technology. For all the benefit XML
delivers, however, it still doesn't solve the human issue of "cooperation". In
order for an XML based markup language to serve a community of users well, the
members of that community must develop their standard cooperatively.
An example of XML
The following XML document
is a simple note to Jane from Jack, stored as XML:
<note>
<to>Jane</to>
<from>Jack</from>
<heading>Spot</heading>
<body>Don't forget to see Spot run!</body>
</note>
Exhibit 1. A simple note expressed in XML
The note has a heading and a message body, indicated by the <heading> and
<body> tags. It also has receiver and sender information, in the <to> and
<from> tags. The XML file is just pure information wrapped in XML tags and
incapable of doing anything on its own. Someone must write a piece of software
to do anything with it, but because the data is stored in a common form,
several software developers can write complementary applications based upon a
common starting point.
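As a minimal sketch of such a piece of software, the following Python fragment reads the note from Exhibit 1 with the standard library's xml.etree.ElementTree module. The choice of Python and the variable names are assumptions made for illustration; any language with an XML parser would serve equally well.

```python
import xml.etree.ElementTree as ET

# The note from Exhibit 1, held in a string for illustration.
NOTE_XML = """<note>
<to>Jane</to>
<from>Jack</from>
<heading>Spot</heading>
<body>Don't forget to see Spot run!</body>
</note>"""

note = ET.fromstring(NOTE_XML)

# Look up each child element by its tag name and read its text.
recipient = note.findtext("to")
sender = note.findtext("from")
heading = note.findtext("heading")
body = note.findtext("body")

print(f"To: {recipient}\nFrom: {sender}\nSubject: {heading}\n\n{body}")
```

A second program could read the same note and do something entirely different with it, which is the point of storing the data in a common form.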
An example of how cave
survey data could be represented in XML follows a similar form to the note
above. Consider exhibit 2, an example of real cave survey data stored in XML,
but reduced in scope for brevity. Notice that XML documents use a
self-describing and simple syntax.
<?xml version="1.0"?>
<caveSurvey>
<caveName>Twisted Fissure</caveName>
<surveyName>D</surveyName>
<surveyDate>10 15 1994</surveyDate>
<surveyTeam>Miles Drake, Jo Smith, Paul Gillis</surveyTeam>
<surveyComment>Entrance Passage, West and East Stream Trunks</surveyComment>
<surveyData>
<shot>
<fromStation>r1</fromStation>
<toStation>r2</toStation>
<distance>9.8</distance>
<foreAzimuth>276</foreAzimuth>
<foreInclination>-12</foreInclination>
</shot>
<shot>
<fromStation>r2</fromStation>
<toStation>r3</toStation>
<distance>2.9</distance>
<foreAzimuth>275</foreAzimuth>
<foreInclination>4</foreInclination>
</shot>
</surveyData>
</caveSurvey>
Exhibit 2. Cave survey data stored in XML
The first line in exhibit 2,
the XML declaration, defines the XML version of the document. In this case the
document conforms to the 1.0 specification of XML. The next line describes the
root element of the document, <caveSurvey>. This is the equivalent of
declaring this document to be a "cave survey".
The next six lines contain
six child elements of the root (caveName, surveyName, surveyDate, surveyTeam,
surveyComment and surveyData). They identify elements of data commonly
collected during a cave survey.
<caveName>Twisted Fissure</caveName>
<surveyName>D</surveyName>
<surveyDate>10 15 1994</surveyDate>
<surveyTeam>Miles Drake, Jo Smith, Paul Gillis</surveyTeam>
<surveyComment>Entrance Passage, West and East Stream Trunks</surveyComment>
<surveyData>
Exhibit 3. XML elements describing typical cave survey
data
The <surveyData>
element, a child of the root element <caveSurvey>, also has children of
its own. In this case those elements are shots, and each <shot> element
in turn has its own children. The five lines after each <shot> tag
describe elements of a typical compass and tape survey shot (Exhibit 4).
<shot>
<fromStation>r1</fromStation>
<toStation>r2</toStation>
<distance>9.8</distance>
<foreAzimuth>276</foreAzimuth>
<foreInclination>-12</foreInclination>
</shot>
Exhibit 4. Data common to a compass and tape survey
captured in XML
In each of the previous exhibits the end of each element is indicated by a
closing tag with the same name as the opening tag, but prefixed with a slash
inside the brackets. For instance the <shot> element is ended by the closing
tag </shot>. A second <shot> element immediately after this marks the
beginning of a new series of data describing the next shot in the survey. In a
real survey file this would continue until all shots associated with this
particular <surveyName> had been recorded.
Finally, the last line of
exhibit 2 defines the end of the root element, </caveSurvey>, and the end
of the XML data file. Even with an application no more sophisticated than a
simple text editor it is a pretty simple thing to interpret the contents of an
XML data file.
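A program can interpret the file just as easily. The sketch below, which assumes Python and its standard xml.etree.ElementTree module, walks the structure of Exhibit 2 and collects each shot into a dictionary. The function name read_shots and the dictionary keys are hypothetical, chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

# The survey from Exhibit 2, held in a string for illustration.
SURVEY_XML = """<?xml version="1.0"?>
<caveSurvey>
<caveName>Twisted Fissure</caveName>
<surveyName>D</surveyName>
<surveyDate>10 15 1994</surveyDate>
<surveyTeam>Miles Drake, Jo Smith, Paul Gillis</surveyTeam>
<surveyComment>Entrance Passage, West and East Stream Trunks</surveyComment>
<surveyData>
<shot>
<fromStation>r1</fromStation>
<toStation>r2</toStation>
<distance>9.8</distance>
<foreAzimuth>276</foreAzimuth>
<foreInclination>-12</foreInclination>
</shot>
<shot>
<fromStation>r2</fromStation>
<toStation>r3</toStation>
<distance>2.9</distance>
<foreAzimuth>275</foreAzimuth>
<foreInclination>4</foreInclination>
</shot>
</surveyData>
</caveSurvey>"""


def read_shots(root):
    """Return one dict per <shot>, converting the numeric fields to float."""
    shots = []
    for shot in root.find("surveyData").findall("shot"):
        shots.append({
            "from": shot.findtext("fromStation"),
            "to": shot.findtext("toStation"),
            "distance": float(shot.findtext("distance")),
            "azimuth": float(shot.findtext("foreAzimuth")),
            "inclination": float(shot.findtext("foreInclination")),
        })
    return shots


root = ET.fromstring(SURVEY_XML)
shots = read_shots(root)
print(root.findtext("caveName"), "-", len(shots), "shots")
for s in shots:
    print(s["from"], "->", s["to"], s["distance"], s["azimuth"], s["inclination"])
```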
XML elements are extensible.
One of the strongest
benefits of XML is the ability to extend documents to carry more information.
Look at the following expanded cave survey example:
<?xml version="1.0"?>
<caveSurvey>
<caveName>Twisted Fissure</caveName>
<surveyName>r</surveyName>
<surveyDate>10 15 1994</surveyDate>
<surveyTeam>Miles Drake, Jo Smith, Paul Gillis</surveyTeam>
<surveyDeclination></surveyDeclination>
<surveyComment>Entrance Passage, West and East Stream Trunks</surveyComment>
<surveyData>
<shot>
<fromStation>r1</fromStation>
<toStation>r2</toStation>
<distance>9.8</distance>
<azimuth>
<foreAzimuth>276</foreAzimuth>
<backAzimuth>97</backAzimuth>
</azimuth>
<inclination>
<foreInclination>-12</foreInclination>
<backInclination>11</backInclination>
</inclination>
<shotComment></shotComment>
</shot>
<shot>
<fromStation>r2</fromStation>
<toStation>r3</toStation>
<distance>2.9</distance>
<azimuth>
<foreAzimuth>275</foreAzimuth>
<backAzimuth>95</backAzimuth>
</azimuth>
<inclination>
<foreInclination>5</foreInclination>
<backInclination>-6</backInclination>
</inclination>
<shotComment></shotComment>
</shot>
</surveyData>
</caveSurvey>
Exhibit 5. XML documents can be extended to carry
additional data
Imagine that an application
has been designed to work with the earlier simple XML example and it extracts
the <fromStation>, <toStation>, <distance>,
<foreAzimuth>, and <foreInclination>. With that data the application performs the typical functions of
a cave data reduction program. Now imagine new standards of cave survey have
created the need to include additional data in the XML file. This results in
the second, expanded cave survey data file. The author of the new XML file
rearranges the schema somewhat and includes <surveyDeclination>, <azimuth>,
<inclination>, <backAzimuth>, <backInclination>, and
<shotComment> tags and data.
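The article does not spell out what a data reduction program computes. One common core step is converting each shot into east, north and vertical offsets between its two stations, and the sketch below shows that step. It assumes azimuth is recorded in degrees clockwise from north and inclination in degrees with upward positive; the XML examples do not state their units, so treat these conventions and the function name shot_to_offsets as assumptions.

```python
import math


def shot_to_offsets(distance, azimuth_deg, inclination_deg):
    """Convert one survey shot to (east, north, up) offsets.

    Assumes azimuth in degrees clockwise from north and inclination in
    degrees above horizontal; offsets are in the same unit as distance.
    """
    az = math.radians(azimuth_deg)
    inc = math.radians(inclination_deg)
    horizontal = distance * math.cos(inc)  # length projected onto the plan view
    east = horizontal * math.sin(az)
    north = horizontal * math.cos(az)
    up = distance * math.sin(inc)
    return east, north, up


# First shot of the simple survey example: r1 to r2.
east, north, up = shot_to_offsets(9.8, 276, -12)
print(f"east={east:.2f} north={north:.2f} up={up:.2f}")
```

Summing these offsets from station to station gives coordinates for every station in the survey, which is the raw material for a line plot.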
Should the changes in the new data format cause the original application to
crash? No, provided the application locates elements by name rather than by
fixed position: it can still find the elements it needs, <fromStation>,
<toStation>, <distance>, <foreAzimuth>, and <foreInclination>, even though the
last two are now nested inside <azimuth> and <inclination>. It cannot,
however, take advantage of the new data added to the XML file. That is a
function of the new software program that reads the XML file.
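Whether an older program survives such a change depends on how it looks for its elements. The Python sketch below (the helper names read_by_name and read_direct_children are hypothetical) contrasts two lookup styles on a shot in the simple form and a shot in the expanded form. Searching by name at any depth finds the same values in both; searching only among the immediate children of <shot> loses the azimuth and inclination once they are nested.

```python
import xml.etree.ElementTree as ET

SIMPLE_SHOT = """<shot>
<fromStation>r1</fromStation>
<toStation>r2</toStation>
<distance>9.8</distance>
<foreAzimuth>276</foreAzimuth>
<foreInclination>-12</foreInclination>
</shot>"""

EXPANDED_SHOT = """<shot>
<fromStation>r1</fromStation>
<toStation>r2</toStation>
<distance>9.8</distance>
<azimuth>
<foreAzimuth>276</foreAzimuth>
<backAzimuth>97</backAzimuth>
</azimuth>
<inclination>
<foreInclination>-12</foreInclination>
<backInclination>11</backInclination>
</inclination>
<shotComment></shotComment>
</shot>"""

NEEDED = ("fromStation", "toStation", "distance",
          "foreAzimuth", "foreInclination")


def read_by_name(shot):
    """Find each needed element anywhere beneath <shot> (".//" = any depth)."""
    return {name: shot.findtext(".//" + name) for name in NEEDED}


def read_direct_children(shot):
    """Find each needed element only as an immediate child of <shot>."""
    return {name: shot.findtext(name) for name in NEEDED}


old = ET.fromstring(SIMPLE_SHOT)
new = ET.fromstring(EXPANDED_SHOT)

print(read_by_name(old) == read_by_name(new))
print(read_direct_children(new)["foreAzimuth"])
```

The name-based reader simply ignores <backAzimuth>, <backInclination> and <shotComment>, which is the behaviour described above.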
DTD's and their role.
Because XML is free and
extensible, XML tags are not predefined. Instead, you must define your own
tags. This is in contrast to HTML, where the tags are defined by standards
(e.g. HTML 4.0), and the author of an HTML document can only use tags defined
in the standard. The tags in the cave survey example above (e.g.
<caveSurvey> and <foreInclination>) are not defined in any XML
standard. Instead these were created for the purpose of this article. In order
for these tags to become widely useful however the tags and their relationships
need to be recorded.
When an author (or community
of authors) defines the elements and structure they intend to use within a
document those decisions must be preserved for later reference. This record is
kept in a DTD (Document Type Definition). XML uses DTD's to describe elements
of data and their relationships. An XML document that complies with a
particular DTD is self-describing and "valid".
<!DOCTYPE caveSurvey [
<!ELEMENT caveSurvey (caveName, surveyName, surveyDate, surveyTeam, surveyComment, surveyData*)>
<!ELEMENT surveyData (shot*)>
<!ELEMENT shot (fromStation, toStation, distance, foreAzimuth,
foreInclination)>
<!ELEMENT caveName (#PCDATA)>
<!ELEMENT surveyName (#PCDATA)>
<!ELEMENT surveyDate (#PCDATA)>
<!ELEMENT surveyTeam (#PCDATA)>
<!ELEMENT surveyComment (#PCDATA)>
<!ELEMENT fromStation (#PCDATA)>
<!ELEMENT toStation (#PCDATA)>
<!ELEMENT distance (#PCDATA)>
<!ELEMENT foreAzimuth (#PCDATA)>
<!ELEMENT foreInclination
(#PCDATA)>
]>
Exhibit 6. A DTD can be used to validate the form of an
XML data file.
The first line in exhibit 6 uses the statement "<!DOCTYPE caveSurvey [" to
declare the document to be of type caveSurvey. The line after this defines the
element caveSurvey and enumerates its children:
<!ELEMENT caveSurvey (caveName, surveyName, surveyDate, surveyTeam, surveyComment, surveyData*)>
Exhibit 7. The valid elements of caveSurvey are
enumerated in the DTD.
Note that the last element in this group, surveyData, is post-fixed with an
asterisk. This indicates that surveyData may occur zero or more times within
caveSurvey. The surveyData element in turn contains zero or more "shot"
elements, as indicated by another post-fixed asterisk, and each shot has a
fixed list of children of its own.
<!ELEMENT surveyData (shot*)>
<!ELEMENT shot (fromStation, toStation, distance, foreAzimuth,
foreInclination)>
Exhibit 8. The DTD lists children of shot, and
illustrates shot's relationship to surveyData.
These first three statements (caveSurvey, surveyData and shot)
define the hierarchical structure that an XML document must possess before it
can be validated by this DTD. After these structural statements are
made, the DTD goes on to define the specific elements and the type of
information they will carry.
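To make the idea of structural validation concrete, the sketch below checks a parsed document against the three structural statements by hand. It is not a DTD validator: Python's standard xml.etree.ElementTree module does not validate against DTD's, so the content models are transcribed manually into a dictionary, and the names CONTENT_MODELS, TEXT_ONLY and conforms are hypothetical. A validating XML parser would do this work directly from the DTD.

```python
import xml.etree.ElementTree as ET

# Content models transcribed by hand from the DTD: parent -> ordered children.
# A trailing "*" means "zero or more occurrences", as in the DTD.
CONTENT_MODELS = {
    "caveSurvey": ["caveName", "surveyName", "surveyDate", "surveyTeam",
                   "surveyComment", "surveyData*"],
    "surveyData": ["shot*"],
    "shot": ["fromStation", "toStation", "distance", "foreAzimuth",
             "foreInclination"],
}
# Elements declared as (#PCDATA): text only, no child elements.
TEXT_ONLY = {"caveName", "surveyName", "surveyDate", "surveyTeam",
             "surveyComment", "fromStation", "toStation", "distance",
             "foreAzimuth", "foreInclination"}


def conforms(element):
    """Return True if element and its descendants match the content models."""
    children = list(element)
    if element.tag in TEXT_ONLY:
        return not children
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        return False  # element is not declared at all
    pos = 0
    for item in model:
        name = item.rstrip("*")
        if item.endswith("*"):
            # Consume as many consecutive matching children as are present.
            while pos < len(children) and children[pos].tag == name:
                pos += 1
        elif pos < len(children) and children[pos].tag == name:
            pos += 1
        else:
            return False  # a required child is missing or out of order
    # No undeclared children may be left over, and each child must conform.
    return pos == len(children) and all(conforms(child) for child in children)


GOOD = ET.fromstring(
    "<surveyData><shot><fromStation>r1</fromStation><toStation>r2</toStation>"
    "<distance>9.8</distance><foreAzimuth>276</foreAzimuth>"
    "<foreInclination>-12</foreInclination></shot></surveyData>")
# This shot uses <azimuth> and <inclination>, which the DTD does not declare.
BAD = ET.fromstring(
    "<surveyData><shot><fromStation>r2</fromStation><toStation>r3</toStation>"
    "<distance>2.9</distance><azimuth>275</azimuth>"
    "<inclination>4</inclination></shot></surveyData>")

print(conforms(GOOD), conforms(BAD))
```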
<!ELEMENT caveName (#PCDATA)>
Exhibit 9. The DTD also identifies the type of data
carried by each element.
By now ELEMENT and caveName should be self-explanatory. #PCDATA means "Parsed
Character Data" and indicates to the XML reader that the caveName element
holds text, which the reader scans for markup and entity references. CDATA, or
Character Data, is a different notion: in a DTD it is used as an attribute
type, and in a document a CDATA section marks text that the reader does not
scan for markup.
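One place the difference becomes visible is in how a reader treats entity references. In the Python sketch below, the same characters are read once as parsed character data and once inside a CDATA section. The comment text "Wet &amp; muddy" is invented for illustration and does not come from the survey examples.

```python
import xml.etree.ElementTree as ET

# Parsed character data: the reader expands the &amp; entity reference.
parsed = ET.fromstring("<surveyComment>Wet &amp; muddy</surveyComment>")

# CDATA section: the reader passes the characters through untouched.
literal = ET.fromstring(
    "<surveyComment><![CDATA[Wet &amp; muddy]]></surveyComment>")

print(parsed.text)   # Wet & muddy
print(literal.text)  # Wet &amp; muddy
```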
DTD statements such as those
in exhibit 6 can be associated with an XML file via one of two methods. For
small, relatively simple XML documents it's easiest to include the DTD within
the XML document itself as header information, similar to the following
example.
<?xml version="1.0"?>
<!DOCTYPE caveSurvey [
<!ELEMENT caveSurvey (caveName, surveyName, surveyDate, surveyTeam, surveyComment, surveyData*)>
<!ELEMENT surveyData (shot*)>
<!ELEMENT shot (fromStation, toStation, distance, foreAzimuth,
foreInclination)>
<!ELEMENT caveName (#PCDATA)>
<!ELEMENT surveyName (#PCDATA)>
<!ELEMENT surveyDate (#PCDATA)>
<!ELEMENT surveyTeam (#PCDATA)>
<!ELEMENT surveyComment (#PCDATA)>
<!ELEMENT fromStation (#PCDATA)>
<!ELEMENT toStation (#PCDATA)>
<!ELEMENT distance (#PCDATA)>
<!ELEMENT foreAzimuth (#PCDATA)>
<!ELEMENT foreInclination
(#PCDATA)>
]>
<caveSurvey>
<caveName>Twisted Fissure</caveName>
<surveyName>D</surveyName>
<surveyDate>10 15 1994</surveyDate>
<surveyTeam>Miles Drake, Jo Smith, Paul Gillis</surveyTeam>
<surveyComment>Entrance Passage, West and East Stream Trunks</surveyComment>
<surveyData>
<shot>
<fromStation>r1</fromStation>
<toStation>r2</toStation>
<distance>9.8</distance>
<foreAzimuth>276</foreAzimuth>
<foreInclination>-12</foreInclination>
</shot>
<shot>
<fromStation>r2</fromStation>
<toStation>r3</toStation>
<distance>2.9</distance>
<foreAzimuth>275</foreAzimuth>
<foreInclination>4</foreInclination>
</shot>
</surveyData>
</caveSurvey>
Exhibit 10. DTD statements can be included within the XML
data file.
For more extensive DTD's it
may be desirable to place a reference inside the XML file to a DTD located in
another file, as in the following example.
<?xml version="1.0"?>
<!DOCTYPE caveSurvey SYSTEM
"caveSurvey.dtd">
<caveSurvey>
(…cave survey data…)
</caveSurvey>
Exhibit 11. XML files can reference DTD statements in
external files.
The SYSTEM identifier in the DOCTYPE declaration, "caveSurvey.dtd", tells the
XML reader where to find the DTD to validate this XML file; in this case, in a
file stored alongside the XML document. This avoids the need to repeat the
same DTD information in every XML file, thereby reducing file size. A SYSTEM
identifier can also refer to a DTD file located on another computer entirely,
as in the DOCTYPE declaration in exhibit 12.
<!DOCTYPE caveSurvey SYSTEM
"http://www.psc-cavers.org/dtd/caveSurvey.dtd">
Exhibit 12. DTD statements may be retrieved from remotely
located machines.
This statement would allow
the XML reader to retrieve the appropriate DTD for application to this XML file
from a web server on the network.
DTD's are much more flexible
than this short description can convey and in the process of developing a cave
survey DTD the caving community will likely need to use more advanced
capabilities of the DTD in order to accomplish their objectives.
What remains to be done?
So what next? How do we get
there from here?
First, popular support is
critical to the success of any form of community endeavor. Significant input
from the community of persons interested in cave survey data is sought and will
be shared on a range of topics as they arise. Further details to support these
endeavors are available on the web site www.psc-cavers.org/xml,
including examples, links to tutorials, tools and discussions.
Second, suggestions are
needed from the community for the "things" they think a cave survey
data file should include. This is a simple list that is not immediately
concerned with how those "things" are ultimately arranged in XML
format. The goal is to brainstorm those elements that are most critical to cave
surveys and will result in a flexible standard to meet the community's needs.
Third, submissions for
candidate arrangements of the data elements are needed. Those submissions could
take the form of raw XML files, as used above, or DTD statements, which drive
the design of the XML file. Within this phase it would be appropriate to
discuss issues of style, formatting, abbreviations, use of attributes, etc.
Summary
XML is going to be everywhere (and indeed its use is already widespread). The
XML standard has developed swiftly, and a large number of software vendors
have been quick to adopt it. XML holds the promise of becoming one of the most
common techniques for data storage and data transmission.
Storing cave survey data in
XML makes it more valuable to cave science because the data remains readable as software products change.
Surveyors can position themselves to develop increased capabilities in the
future by contributing to a joint effort to develop the XML representation of
cave survey data today. For further discussion of development of cave survey
data standards in XML please visit the web site www.psc-cavers.org/xml and review the
information it contains.
-Devin Kouts, November 26, 2000