General Discussion:
Several software developers, those whose cave survey software is under the most active development, have been contacted and asked for input. Their useful comments are coming in all the time and are replicated, to some degree, below. (I readily recognize that I don't have enough input from our European counterparts. This is not an intentional effort to ignore them; I merely don't have the names or email addresses of the right people to contact. Any suggestions to fix this situation will be gratefully accepted.)


Feedback on v0.4

From Garry Petrie -

  • more attributes for shots. They include comments on a shot by shot basis, and processing codes, e.g. length exclusion, void, etc.

    [ I did add a tag to version 0.4 for the purpose of housing such processing instructions. The tag I chose is <Handling>, but already I'm thinking this should be changed to <Processing>. This will need further expansion to establish the children or data types that could be housed within this "processing" portion of each shot. Send some ideas on what you think appropriate tags or flags should be. I'll also be more liberal with the <Comment> tag. It should basically appear in every major subsection of the file hierarchy.

    I'm also thinking of adding a <Heading> tag. This would contain all the header information that doesn't fit inside a pair of survey tags, like CaveName, GeographicData, CaveNumber, etc. After doing this there would be just three children of the root <CaveSurvey> tag: <DataFileVersion>, <Heading> and <Survey>. DSK ]
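
    To make the idea concrete, here is a rough sketch of how such a file might open. The particular child elements shown inside <Heading> are placeholders for illustration only, not settled tag names:

      <CaveSurvey>
        <DataFileVersion>0.4</DataFileVersion>
        <Heading>
          <CaveName>Example Cave</CaveName>
          <CaveNumber>001</CaveNumber>
          <GeographicData> ... </GeographicData>
        </Heading>
        <Survey>
          <!-- shots, stations, instrument descriptions, etc. -->
        </Survey>
      </CaveSurvey>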

    Feedback on v0.3

    From Taco van Ieperen -

  • <DataFileVersion> 0.3 </DataFileVersion> Is this necessary in XML, or does it encourage bad behaviour? In general your format should be backwards compatible, and if it isn't, it should be a new format, <CaveSurvey2> or something.

    [ At this point in the game the product is so underdeveloped that I can't attempt to enforce compatibility between versions 1, 2, and 3. But in the future I agree, backward compatibility is a design goal. Compatibility will become very serious when we get to a version 1.0, which should represent the "baseline" implementation from which all future work builds. I'm using DataFileVersion now, at the suggestion of others, to keep track of where we are in the development process. In the future the data inside this tag could be used by an application developer to understand what types of data might appear in a given XML file. For example, v1.0 might not support cave diving survey data, but maybe version 2.0 will. An application could gain some basic information based on that simple element. DSK ]

  • <Survey> I wouldn't embed the <EquivalentStation> in the From and To fields. I think it should go up one level in the scope, so that station equivalences are basically similar to survey shots except that all you need to do is identify some station names. The same holds for the GeoLocation. GeoLocation should also specify coordinate systems. This would allow you to have a survey containing geolocations only (for a radio location day, for example), or have a single folder with all your station equivalences.

    [I'm starting to think the same thing, too. I threw the <EquivalentStation> in there to see how it would work. It is nice having it in the context to which it applies, i.e. within the <From> or <To> tag pair that it describes, but I'm having second thoughts. There's also merit to placing it at a "global" level as you suggest. Further, though, there's talk underway in the UIS discussion of using XPointer and XLink to associate data elements in one location (or file) with data elements in another location (or file). This will inevitably be one of the thornier issues to work out. I'm not wedded to what we have now, but I'd like to keep it there for the time being, at least until some more thought and research can be brought to bear. DSK]
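
    For comparison, a rough sketch of the "global" placement Taco suggests might look something like the following, with the equivalence listed directly under <Survey> rather than inside a <From> or <To> pair (this is only an illustration, not a settled structure):

      <Survey>
        <EquivalentStation>
          <StationName>A15</StationName>
          <StationName>BZ3</StationName>
        </EquivalentStation>
        <Shot> ... </Shot>
      </Survey>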

  • Accuracy: There needs to be information on instrument accuracy. Americans especially are obsessive about surveying to the nearest 1/10th of a mm using lasers, whereas many foreign cavers are content to do a shot to about head level in the middle of a passage. It should be possible to specify that one survey is accurate to about +/- 1 cm/degree and that another is +/- 5, so that the loop correction code can work properly.

    [Great one, thanks. I assume accuracy will be an attribute of each type of instrument (compass, clino, tape, etc.). Therefore I'll add a tag like <Accuracy> to each of the "instrument" element structures, right alongside the <Correction> tag. DSK]
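
    As a rough illustration of that change (the <Compass> element name and the values shown are placeholders only), an instrument description might then carry both tags:

      <Compass>
        <Correction>1.5</Correction>   <!-- applied to every reading taken with this instrument -->
        <Accuracy>0.5</Accuracy>       <!-- plus/minus, in the declared compass units -->
      </Compass>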

  • <Survey> I would add units to the start of the survey:
    <LengthUnits>[Meters | Centimeters | DecimalFeet | FeetAndInches]</LengthUnits>
    <CompassUnits>[Degrees | WhateverThe400degreeCircleIsCalled]</CompassUnits>
    <ClinoUnits>[Degrees | WhateverThe400degreeCircleIsCalled]</ClinoUnits>
    <DiveUnits> ... </DiveUnits>
    I would keep this with the instrument description stuff, since you already have a correction with the instruments and you could reasonably assume the correction to be in the same units.

    [I think this is a good call too. And I'll up the ante. Since we're placing data at the global level, we should naturally create the ability to override it at the local level. E.g. suppose for some reason someone sneaks a shot into the middle of the survey that isn't in decimal feet, but is instead in feet and inches. A tag incorporated in that shot would tell you the units used for that particular shot and allow the processing software to deal with it accordingly. If no local units were indicated for a shot then the global units would be used. DSK]
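
    A quick sketch of that override idea, assuming the global units appear once near the top of the survey and a local units tag inside the odd shot takes precedence (the value formats shown are illustrative only):

      <Survey>
        <LengthUnits>DecimalFeet</LengthUnits>
        <Shot>
          <LengthUnits>FeetAndInches</LengthUnits>  <!-- local override for this shot only -->
          <Length>12' 6"</Length>
          ...
        </Shot>
      </Survey>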

  • Another thing you need to add is wall measurements (up, down, left, right) at each station. There are several ways of describing this data.

    [Actually, this is already in there. If you look immediately below the tag <StationName> you'll see it. But this brings up a point I've been thinking about. I'm not really pleased with the method I've used to organize the data of a survey. What we have here starts with a high-level <CaveSurvey> that contains <Shot>s, each of which contains a single <From> station and a single <To> station. This is a decent hierarchical representation of a cave survey, but there may be some method of crafting a less cumbersome structure, something similar to what gets entered into the survey book. I've got to think about this one more. But suggestions would definitely be helpful here. DSK]

  • A final consideration is that the order of measurements needs to be preserved. For instance, if I put in 100 stations as Length, Clino, Compass, and the software comes back with Length, Compass, Clino when I reload, that would be HIGHLY annoying if I need to proofread afterwards. This is a pretty universal problem, so maybe the following tags could help: <DisplayFields> <Field>Length</Field> <Field>Compass</Field> .... Clino, ClinoBack, Compass, CompassBack, Up, Down, Left, Right, Depth </DisplayFields>

    [OK, I need my soap box for this one. Here's a very clear instance of mixing presentation into the survey data. I agree that cavers want to see their data "presented" in exactly the same way they entered it into the "editing application", but tracking the settings of that "presentation layer data" should be handled separately from the storage of the actual cave survey data. Those presentation settings should be recorded for later use, but they should be recorded in an "application data" file, and not the "survey data" file. For instance, the application developer could save the user's "Editor Appearance" configuration settings in an XML file called EditorSettings.XML. But there's no reason for those settings to go into the actual CaveSurvey data file. That would introduce exactly the kind of data that some other cave survey rendering program would view as unnecessary. Please see the comments on this topic in my rant titled "Data Quality Comments" below. Soapbox off. DSK]
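
    To illustrate the separation being argued for, a hypothetical EditorSettings.XML kept alongside, but apart from, the survey data might look something like this; none of these settings belong in the survey data file itself:

      <EditorSettings>
        <DisplayFields>
          <Field>Length</Field>
          <Field>Compass</Field>
          <Field>Clino</Field>
        </DisplayFields>
      </EditorSettings>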

  • Also, people want to distinguish between surface surveys and underground surveys. They also want to know if a shot counts in the system length (maybe the shot is a measure to a wall that isn't part of the real survey). This is on a per-shot rather than a per-survey basis. That is all I can think of right now, but it should be enough to start you on Version 0.4.

    [Differentiating between surveys was suggested by someone else previously (maybe you), but I haven't put much effort into it just yet. The difficulty is knowing when you actually go from below ground to above ground; unless you use two different surveys (with different names) it could be difficult to tell. For instance, if I survey my way out of an entrance, down the valley and into another entrance, I haven't done anything to indicate I'm out of the cave. I know this normally doesn't occur, but I use it to illustrate that surveying below ground differs little from surveying above ground. You just lose the left, right, up and down. One way of dealing with this is to add a comment, either globally, if the entire survey is above ground, that tells you this is a surface survey, or locally, if you just have a few shots within a survey that are above ground. This could be taken to the other extreme too: how do we know when we're looking at a regular cave survey versus a diving cave survey?

    Knowing how to handle a shot, i.e. whether it counts in the system length, is a good one. I consider this kind of thing to be a processing instruction that, while not recorded in the cave, definitely applies to how the survey data should be handled (and is not a presentation issue like order of entry). The difficulty here is coming up with a convention that software developers can all agree upon. This one borders very closely on proprietary extensions, but I'll try to get something into the next version that addresses this; a rough sketch of one possibility appears after this response.

    Thanks for all the excellent commentary, Taco. You've definitely made a big contribution. DSK]
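
    As a starting point for that discussion, here is one possible sketch of a per-shot processing instruction, using the <Processing> tag already under consideration; the individual flag names are purely illustrative:

      <Shot>
        ...
        <Processing>
          <ExcludeFromLength/>   <!-- don't count this shot in the system length -->
          <SurfaceShot/>         <!-- this shot was taken above ground -->
        </Processing>
      </Shot>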


    Feedback on the Second draft, v0.2

    From Larry Fish -

    The only suggestion I have at this point is to maybe add one more layer of hierarchy above the <CaveSurvey> level. For example, it is often useful to group several caves together into a system. You might want to add a tag like <CaveSystem>.

    Another approach would be have the groupings done with a separate file which controls how the different caves are assembled into a system. This would make the system more flexible because you would not have to subdivide a file if you wanted to work with individual caves.

    [ This second suggestion is probably the better approach. There are two developments within the XML world, XPointer and XLink, that facilitate the association of files, and their constituent data, to one another. Conversation has begun on the Cave Surveying discussion group to develop an international standard similar to the effort presented here. In that conversation XPointer and XLink have been discussed as methods of associating one survey with another in different files, or even equating one station to another, in the same file or a different file. I'll probably watch developments in that group while continuing to make progress with this effort. DSK]
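
    If the separate-file approach were adopted, a grouping file might use XLink attributes to point at the individual cave files, roughly along these lines (the <CaveSystem> and <CaveReference> names, and the file names, are illustrative only):

      <CaveSystem xmlns:xlink="http://www.w3.org/1999/xlink">
        <CaveReference xlink:href="LowerCave.xml"/>
        <CaveReference xlink:href="UpperCave.xml"/>
      </CaveSystem>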


    Developer feedback on the first draft:    

  • Points from Olly Betts    
  • Comments from Larry Fish    
  • Questions from Garry Petrie.    
  • Ralph Hartley's inputs    
  • Observations from David McKenzie.


    Opening comments, before the first draft submission:    
  • Thoughts from Larry Fish    
  • Input from Taco van Ieperen.    
  • Bob Thrun's two cents.    
  • Devin Kouts' comments on Data Quality.

    Developer Feedback on the first draft of Cave Survey data in XML

    December 19, 2000 -
    From Olly Betts

  • Hmm, surely "station comment" isn't the same as "shot comment"? If you do mean "shot comment", a data line in a .svx file which has a comment after it could reasonably be regarded as having a "shot comment". [ Good point, shots and stations are two different things, and could have different comments associated with them. I think something similar to what Ralph suggests below, a general comment tag that you could use anywhere within the Cave Survey data structure, would be an appropriate thing. DSK ]

  • Survex allows control points to have variances and covariances. Pretty much vital now GPS units are so common... [ Sounds like a reasonable sub-element of the control point element. DSK ]

  • Survex allows shots with no clinometer reading (they're assumed to be flat with a high vertical error). But that's not really important to data exchange unless anything else allows it. [ This opens an interesting topic: default data values. It's not unreasonable to assume a certain value if the surveyor fails to enter a specific value. Zero, for the assumed clinometer reading, seems reasonable as a placeholder and in fact is used by at least one other cave survey rendering program - Compass. In the case of XML it's just as easy to assume zero when you encounter a <Clinometer> tag that possesses no data. DSK ]
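
    For example, an importing program might treat an empty clinometer element as an assumed zero reading; this is a convention the reading software would apply, not something the format itself enforces (tag names here are illustrative):

      <Shot>
        <Length>15</Length>
        <Clinometer></Clinometer>   <!-- no reading taken; the importing program assumes 0 -->
      </Shot>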

    From Larry Fish

  • The only big thing that I see that is missing is information about how the data should be organized. This gets into the Survex issues of local and global station names and the "tree" like directory structure that most survey programs support. [ Organization will be introduced in the next release of the draft. DSK ]

  • you will have to convince the various cave survey programmers to invest the time in writing software to support the new standard. There has to be some good reason for everyone to invest their time in a new standard. [ I hope to do this with a discussion of the utility that comes with XML, e.g. industry-standard APIs that save developers the effort of rolling their own. DSK ]

  • what would be the advantage of the new standard, particularly if it doesn't deal with directory structure and local/global symbols? [ Agreed, and this can be addressed as we move forward. DSK ]

  • this standard will be used primarily for data interchange. Data interchange is a one-shot event. You convert the data, fix any conversion problems, and you are done. [ If we are successful it may become more useful than a simple method of interchange, but that's a personal decision on the part of the software developer. DSK ]

  • The one way that you might be able to get people to adopt a new standard is to find a way to deal with the incompatibilities between the British system and the other systems. [ Agreed, will have to seek some discussion on this topic. DSK ]

    From Garry Petrie

  • For the <GeoLocation>, how are the coordinates expressed (DMS, DDM, etc.), and are unreferenced x, y and z numbers allowed?
  • For <Datum>, what are the allowed names, e.g. NAD27, NAD CONUS 27?
  • Which station does the LRUD measurement refer to?
  • What is the minimum number of elements that make up a shot, e.g. can one have a forward compass and a reverse inclination?
  • There are structural issues also, e.g. must every shot in a survey be connected?
  • Can shots have zero length?
  • Can surveys exist without ties?
  • Can a <GeoLocation> exist without a survey using it?
  • Can surveys have branches? Toporobot does not allow them because it requires an extra LRUD for every survey leg.
  • What about surface surveys?
  • How are surveys of different quality indicated, for loop closure?
  • Is shot data expressed in native units? If so, what are the accepted unit descriptors? If not, how is the original look and feel of the data book preserved?
    [ Garry has a bunch of excellent observations and questions here. I won't try to comment on them just yet but record them here for everyone to consider. DSK ]

    From Ralph Hartley

  • I assume <CompassCorrection> was omitted by accident. [ Yes, it was. DSK ]

  • Is there a good reason for separate tags <SurveyComment> and <ShotComment> as opposed to a single <Comment> tag that can go anywhere? [ Excellent suggestion, I'll implement it. DSK ]

  • How about a <Vertical> tag for shots made with a plumb bob and tape? These are very common. I know <Vertical>-15</Vertical> is the same displacement as <Length>15</Length><ForwardInclination>-90</ForwardInclination>, and many programs represent vertical shots that way, but the distinction may be more important to advanced loop closure algorithms that care about different sources of error. [ Hmm, interesting suggestion, and it seems like a reasonable thing to consider. I'll put this one out to the community of interested persons for comment. DSK ]
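
    For clarity, the two equivalent representations Ralph describes would look roughly like this (the <Vertical> tag is only a proposal at this point):

      <!-- a plumb-bob shot recorded as such -->
      <Vertical>-15</Vertical>

      <!-- the same displacement expressed as an ordinary shot -->
      <Length>15</Length>
      <ForwardInclination>-90</ForwardInclination>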

    From David McKenzie

  • What sort of cave survey data will [the XML proposal] store? Will it contain 1) processed data (adjusted to be consistent), 2) partially processed data (some units and measurement formats converted), or 3) raw data exactly as it was originally keyed in by the surveyors (character strings potentially representing all the data forms that appear in field books)? My view is that [it] will need to be the latter type (or ideally a combo of all three, offering "fall-back" options for the importing program) if it's to serve as a format for archived survey data. [ The answer is 3, raw data as it was originally collected by the surveyors. It is from that point that all other cave survey science flows. The issue of post-processed data is worthy of consideration, but it should be considered separately from the storage of "original, authoritative" data. DSK ]

  • This is also true if its main purpose is to facilitate *platform changes* -- that is, moving projects from program A to program B without discarding important information. Since data proofing and debugging are an ongoing requirement in the largest projects, I wouldn't want quad bearings, for example, to be irreversibly converted to azimuth degrees and tenths. Nor would I want Lat/Long converted to UTM, or NAD27 GPS locations converted to WGS84 locations, even though the raw data could conceivably reference both datums. Certainly the station name qualifiers that control scope and all the descriptive attributes assigned to both stations and vectors would have to be preserved. ... I don't believe comprehensiveness should be given up for simplicity or ease of importing. [ I concur, original data should be preserved at all costs, and never converted for the sake of convenience. See my previous discussion of this subject below. DSK ]
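
    One possible way to preserve the raw forms David describes would be to record each reading exactly as keyed in and mark its original format, for example (the tag and attribute names here are illustrative, not part of the draft):

      <!-- compass reading stored exactly as keyed in, with its original format noted -->
      <ForwardCompass format="quads">N25W</ForwardCompass>

      <!-- geographic reference kept in the datum in which it was taken -->
      <GeoLocation>
        <Datum>NAD27</Datum>
        ...
      </GeoLocation>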

    Earlier Discussions:

    Thoughts from Larry Fish:

  • "A few years ago, I worked with Taco on the CDI standard. While I was working on developing COMPASS, I had been forced to do a lot of work with the SMAPS standards RSD, SEF and HTO. I thought they were all terrible standards. Taco had had a similar experience and we thought that we could come up with a simpler and more logical standard. The result was CDI. The idea behind CDI was to make it simple and extensible, while supporting the minimal set of information needed to convey survey data information. That way you could add new features, but at the same time, programs could ignore features that they didn't understand".

  • "British survey techniques are very different from those in other parts of the world. Apparently, British surveys use a system where the station name is often just a simple number like "1", "2", "3", etc. This means that there are lots of duplicate station names. They are resolved by having "section", "sub-section" and "sub-sub-section" names that are combined and added to the station name so each is uniquely defined. This structure is built into the British cave survey program Survex".

  • "I'm not sure that the problem is "what goes in the file" or "how you represent the data." I think the big problem is the underlying structure. The British system requires a complicated scheme of local and global station labels. In this system, station labels don't have to have unique names. Most of the other survey programs require that every station label has a unique name".

  • "Taco and I tried to solve this problem by writing software that would recurse through Survex data and convert the station labels to unique names. This didn't solve the problem to the satisfaction of the British, because all the structure of their data would be lost in the conversion process".

  • "COMPASS stores all the information in the survey files in one set of units. These units are feet and degrees. Each survey has a set of flags that tells how the data was originally entered or how the user wants it displayed and edited. This way, when you view the data or edit the data, it is always in the units you prefer".

  • "It might seem that it would be better to store all the data in the original units. However, if you do that, there is a much greater chance of error and processing the data is much slower. For example, if you enter the compass data in "Quads" (where W25W is the same as 335 degrees), the program would have to test and convert the data on the fly. This would slow down the processing, and if an error were found in the Quads format, processing would have to stop. By storing the data in a fixed format, the data is tested as it is entered and you always know you are processing valid numbers".

  • "The trick is going to be preserving Survex's structure without placing an undue burden on the other programs".

  • "A lot of cave divers use SMAPS version 4 because SMAPS 5 changed the way Depth Gauge readings were understood".

    From Taco van Ieperen:

  • "The European problem may still be a big one, and ultimately it may require two slightly different formats for the actual survey shots. The major problem, as I am sure you understand, is that the Europeans name all of their station numbers the same way. This means that a system could have 500 stations all called "1". Most NA cave software assumes unique names".

  • "We solved this in CDI by having the full path to the station be part of the name, i.e. station 1 would be "\WindCave\EastGallery\UpperLeads\BigCrawl\1". This is of course a nasty way of doing things because if you rename a folder or a survey all of your connections vanish".

  • "The European software solves this problem [with] a series of command-line programs that you write scripts for, which generate a compiled point file that you feed into a renderer. Ambiguity is resolved by hand in these command-line scripts".

    Comments from Bob Thrun:

  • The major difficulties in exchanging data between programs that I can think of are:
    1. station names
    2. survey shot ordering
    3. unusual survey methods
    4. passage dimension conventions
  • I would prefer that an effort toward establishing a data interchange standard work on the real problems. It was not stated whether XML would be the normal working format for data reduction programs or whether it would be only an export/import format.

    Data Quality Comments from Devin Kouts:

    After reviewing the proprietary data formats of Compass, OnStation, Survex, Walls and WinKarst, I've noticed a state of affairs that, if left unaddressed, will undermine the simplicity of an eventual XML solution to representing cave survey data.

    Probably for convenience's sake, the various authors of cave survey rendering software have slipped into the habit of storing proprietary data elements within the same file as the raw cave survey data. This is a poor practice for at least three reasons:

    1. a new opportunity to corrupt the survey data is created every time the data file is opened and written to,
    2. once survey data has been placed in a form usable to one application, the proprietary add-ins make it much less usable to other applications, and
    3. proprietary data needlessly increases the overall "payload" of the cave survey data file.

    Co-locating data in a single file, while convenient for the developer, has the net effect of "contaminating" the survey data, as seen from another application's perspective.

    In order to ensure the highest possible integrity of raw cave survey data, authors of cave survey software should endeavor to store modifying data, proprietary data, or data derived from processing in a separate data store (i.e. file). These practices increase the quality of cave survey data collectively and make survey data more interchangeable, because there's less need to navigate "extraneous" elements added into the raw data store.

    Data is sacred. Science lives and dies upon the quality of the data it collects. Therefore the treatment of data must be taken seriously and its integrity protected at every turn. Surveyors are ultimately responsible for the quality of the data they collect in the pursuit of cave science. But application developers hold an obligation to the "data collector" to protect the data and minimize any possibility of data corruption in the subsequent processing, representation and transfer of that data.