What other work is being done in this field?
    UIS Proposal for CaveXML

What work has been done that we can benefit from?
    Leveraging CDI design requirements
    Mike Lake's CaveScript XML

A review of the CDI Standard Working Document, drafted and contributed by Taco van Ieperen and Larry Fish, produces several useful ideas for cave survey data that XML can readily facilitate. Below, several "requirements" have been replicated from that document. As these issues are worked through together, the community of interest should be pleased with how well XML answers the needs listed below. (under construction)

  1. SIMPLE. One of the biggest problems with the current survey exchange standards is that they try to anticipate every conceivable situation. As a result, the standards are so complex that they defeat the purpose of creating a simple method of exchanging data. As an example of the problems that occur with an ambiguous format, look at SEF. I have personally tested seven different SEF translators and every one has a problem reading one or more of my test files. The fundamental problem with SEF is that it has so many options that it is difficult to test all of the combinations. The basic structure of a survey exchange standard should be based on a simple, unambiguous format. No "call-back" routines or recursive descent compilers should be required to parse the data.

  2. SIMPLE NUMBER FORMAT. One of the places where survey formats go wrong is trying to support every conceivable number format. For example, SEF supports various combinations of fixed field and character delimited numbers. This kind of flexibility would allow you to read numbers from a wide range of languages and media, but in this day and age, it has no practical use. As a result, a file exchange format should use a simple, regular number format. The details don't matter, but only one number format should be used throughout the format. I can easily write one routine in just about any language that will read almost any number format. However, I don't want to have to write and test routines that will read six different number formats. I have a bias toward comma delimited number strings. They are easy to parse and they are widely used in databases.

  3. SIMPLE UNITS FORMAT. Another place where exchange formats can go wrong is attempting to present the data in every conceivable measurement unit. There are literally dozens of units that cavers use to measure cave surveys. Parsing units like degrees and minutes or feet and inches adds another layer of complexity to the data transfer process. As a result, all data should be presented in a fixed set of units. All of these units can be reduced to simple common measures like meters and degrees. A set of flags can be used to save the specifications for the original data format. As long as enough digits of precision are saved, the original units can be easily restored once the data is transferred.

  4. FIXED ITEM ORDER. One of the biggest problems with exchange formats is configurable data order. The rationale for using configurable data order is that cave surveyors use a variety of formats when entering data in a survey book. However, processing files with a configurable item order adds another layer of complexity to the translator. Moreover, presenting the data in the original order is really the responsibility of a survey editor, not an exchange file format. Making the item order fixed makes the translator much easier to write. A set of flags can be used to save the original data order.

  5. HUMAN READABLE. The exchange format should be written in simple ASCII text and should be human readable. Making it human readable makes it self-documenting, makes it easier to debug problems, and also makes it accessible to other programs like word processors, spreadsheets, databases, etc. It also makes it possible for the files to be transmitted over the internet.

  6. EASY TO PARSE. This goal is implied by some of the other goals listed above, but it is important to emphasize that the format should be easy to parse. One of the main purposes of a file interchange format is to be able to exchange data between ALL of the cave survey programs. If people cannot write a simple program to parse the data, then few people will adopt the standard. The fewer people that adopt the standard, the less useful it will be.

  7. EXTENSIBLE. Because it is impossible to anticipate all the needs of cave surveyors, the format should be extensible. The extensibility should be structured in such a way that extensions to the format do not result in chaos. For example, extensions could be placed inside of "begin-end" pairs so that older translators can ignore new data items.
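As a rough illustration of how XML could address requirements 3 and 7, the sketch below reads one survey shot stored in fixed units (meters and degrees), keeps a flag recording the original length unit, and skips an extension element it does not recognize. Every element and attribute name here (Shot, originalLengthUnit, Extension, PassageColor and so on) is a hypothetical placeholder, not part of the CDI document or of any existing cave survey schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: all element and attribute names are illustrative only.
SAMPLE = """
<Shot originalLengthUnit="feet">
    <From>A1</From>
    <To>A2</To>
    <Length>3.048</Length>
    <Azimuth>45.0</Azimuth>
    <Inclination>-5.0</Inclination>
    <Extension><PassageColor>red</PassageColor></Extension>
</Shot>
"""

METERS_PER_FOOT = 0.3048
TEXT_FIELDS = ("From", "To")
NUMBER_FIELDS = ("Length", "Azimuth", "Inclination")


def read_shot(xml_text):
    """Read only the fields this translator knows; anything else is skipped."""
    shot = ET.fromstring(xml_text)
    record = {name: shot.findtext(name) for name in TEXT_FIELDS}
    for name in NUMBER_FIELDS:
        # One number format throughout: plain decimal text.
        record[name] = float(shot.findtext(name))
    # The flag preserves the surveyor's original unit; default assumed to be meters.
    record["original_length_unit"] = shot.get("originalLengthUnit", "meters")
    return record


def length_in_original_unit(record):
    """Restore the length in the unit the surveyor originally recorded."""
    if record["original_length_unit"] == "feet":
        return record["Length"] / METERS_PER_FOOT
    return record["Length"]


record = read_shot(SAMPLE)
print(record["From"], record["To"], round(length_in_original_unit(record), 3))
```

Because the reader asks for known element names rather than walking the file position by position, the unrecognized Extension element is simply never visited, which is the behaviour the "begin-end" pair idea aims for.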

You can read the whole original CDI standard if you care to...

A review of Mike Lake's CaveScript XML, available online at http://www.science.uts.edu.au/~michael-lake/cavescript/docs/DTD.html, offers several useful perspectives that cave survey data in XML should consider adopting.

  1. The inclusion of external survey files is very useful and a logical application of XML to the description of Cave Survey data.

  2. A Preface or comment area could be useful for the inclusion of ad hoc, unstructured information about the survey.

  3. Area Elements would be useful for describing the geographic location of the cave, e.g. country, state, province, county, etc.

  4. Equating survey stations will continue to be necessary.

  5. Theodolite, Topofil and diving surveys are suggested for support in CaveScript. This is a desirable goal worthy of further consideration.

On the other hand, there are also some aspects of CaveScript that may limit its ability to meet the needs of the community of cave surveyors and software developers.
  1. The CaveScript approach to representing survey data in XML, while robust, is decidedly orientated toward the Survex method of representing survey data. It appears that CaveScript begins with the basic Survex data file and essentially migrates much of it into an XML representation. There is nothing wrong with this approach; it will work for any software developer who wishes to read survey data from a CaveScript file. But from a "community" perspective, CaveScript doesn't seem to give much consideration to the needs of other software developers.

  2. CaveScript's application of XML could limit future flexibility. XML creates "tags" to identify "Elements" of data, and Elements can have "Attributes". An example could look like this:

    <Date format="YYMMDD">991026</Date>
    Here, Date is the Element and format is its Attribute. 991026 is the data (October 26, 1999).

    While there are no hard and fast rules about how one uses Elements versus Attributes, there are some definite consequences of using one or the other. Elements are designed to be organized in a nested fashion. This ability means limitless amounts of data can be organized, searched and addressed in an easily navigable hierarchical structure.

    Each Element can have Attributes, as many as the author wishes, but Attributes cannot be nested or organized hierarchically. Instead, Attributes are merely appended one after the other to the Element they modify. The consequence of this is a very "flat" collection of data, and Attributes (which are basically sub-elements of a super Element) cannot easily be given Attributes of their own.

    A rule of thumb that is emerging from the experience of XML authors is this: Use Elements to enclose data to the greatest degree possible. Use Attributes to modify Elements in those cases where the data is highly static. For example:

          <Compass serialNumber="123456">
              <Owner>Bob McGee</Owner>
              <Correction> </Correction>
          </Compass>

    In a nutshell, extensive lists of Attributes describing a single Element are indicative of data that lends itself well to a nested structure.
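To make the Element versus Attribute distinction concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It reads the Date example above, where the format Attribute tells the program how to interpret the Element's text, and a Compass record where the static serial number is an Attribute and the remaining data sits in nested Elements. The closing Compass tag, the Correction value of 1.5, and the FORMATS lookup table are assumptions added for illustration; they are not taken from CaveScript or any published schema.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

DATE_XML = '<Date format="YYMMDD">991026</Date>'

# The closing tag and the Correction value are assumed so the sample parses.
COMPASS_XML = """
<Compass serialNumber="123456">
    <Owner>Bob McGee</Owner>
    <Correction>1.5</Correction>
</Compass>
"""

# Hypothetical mapping from the format Attribute to a strptime pattern.
FORMATS = {"YYMMDD": "%y%m%d"}


def read_date(xml_text):
    """The Attribute modifies how the Element's text is interpreted."""
    element = ET.fromstring(xml_text)
    pattern = FORMATS[element.get("format")]
    return datetime.strptime(element.text, pattern).date()


def read_compass(xml_text):
    """Static data comes from an Attribute, the rest from nested Elements."""
    compass = ET.fromstring(xml_text)
    return {
        "serial_number": compass.get("serialNumber"),
        "owner": compass.findtext("Owner"),
        "correction": float(compass.findtext("Correction")),
    }


print(read_date(DATE_XML))
print(read_compass(COMPASS_XML))
```

Note that an Attribute value is only ever a flat string, while a child such as Owner could later gain children or Attributes of its own without changing how existing readers locate it.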