HISPID/ABCD Schema Work In Progress

From Hiscom


Note: this document is no longer relevant. Please consult HISPID5, HISPID/ABCD_Workshop_Executive_Summary and HISPID_Mapping_to_ABCD for the most up to date information.




The purpose of this document is to keep track of which parts of the HISPID Mapping to ABCD have been completed in the schema. I've uploaded the new hispid6.xsd schema to the MEL CVS repository. Note that something is wrong with cvsview, so the files aren't showing up there, but you can still check them out; the project name is hispid2abcd.

Initial steps (but see Second Version below)

  1. I created a new schema which was a copy of ABCD2.06b.XSD (development version sent by Walter Berendsohn 10 June 2007)
  2. I changed the default namespace to http://www.chah.org.au/schemas/hispid/6
  3. I imported the abcd2.06b schema
  4. I globally edited every type="..." attribute to include the abcd: prefix
  5. I started deleting all the types that we definitely weren't going to consider (e.g. NameViral, NameZoological, etc.) --Peter 17:11, 6 July 2007 (EST)
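Step 4 was a global find-and-replace. Below is a minimal sketch of that edit in Python. It assumes the schema is treated as plain text and that only type="..." attributes need changing. The function name prefix_types is hypothetical. Values that already carry a prefix, such as xs:string, are left alone. Other QName-valued attributes (base=, ref=) would need the same treatment if they occur.

```python
import re

# Matches type="Name" values that do not yet carry a namespace prefix.
# A value such as type="xs:string" contains a colon, so it never matches.
UNPREFIXED_TYPE = re.compile(r'\btype="([A-Za-z_][\w.\-]*)"')

def prefix_types(xsd_text: str, prefix: str = "abcd") -> str:
    """Return xsd_text with every unprefixed type="..." reference given `prefix:`."""
    return UNPREFIXED_TYPE.sub(lambda m: f'type="{prefix}:{m.group(1)}"', xsd_text)

snippet = '<xs:element name="UnitID" type="String"/><xs:element name="Lang" type="xs:language"/>'
print(prefix_types(snippet))
# → <xs:element name="UnitID" type="abcd:String"/><xs:element name="Lang" type="xs:language"/>
```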

Changes to the schema

CoordinateMethod

I started with a fairly simple one: restricting the CoordinateMethod element to an enumerated list.
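An enumerated restriction is an xs:simpleType containing an xs:restriction with one xs:enumeration per allowed value. The sketch below generates that structure with Python's standard xml.etree.ElementTree so it can be checked programmatically. The type name CoordinateMethodEnum and the three values are placeholders only; the agreed list is not given here.

```python
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"

def enumerated_simple_type(name: str, base: str, values: list[str]) -> ET.Element:
    """Build <xs:simpleType name=...><xs:restriction base=...> with one xs:enumeration per value."""
    simple = ET.Element(f"{{{XS_NS}}}simpleType", name=name)
    restriction = ET.SubElement(simple, f"{{{XS_NS}}}restriction", base=base)
    for value in values:
        ET.SubElement(restriction, f"{{{XS_NS}}}enumeration", value=value)
    return simple

ET.register_namespace("xs", XS_NS)  # serialise with the conventional xs: prefix
# Hypothetical type name and values, for illustration only.
coord_method = enumerated_simple_type(
    "CoordinateMethodEnum", "xs:string", ["GPS", "map", "gazetteer"]
)
print(ET.tostring(coord_method, encoding="unicode"))
```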

Collection Date (cdat)

There has been some discussion of this on the hiscom mailing list, so this is a summary of where we are on this.

It was found that the documentation in ABCD didn't really reflect what was in the schema. The relevant ABCD element is Gathering/GatheringDateTime/ISODateTimeBegin. Its type, abcd:DateTimeISO, is a string type that has been restricted to accept the following values:

For the example 15 Jul 2006, 9:34pm, the following values validate:

2006-07-15T21:34
2006-07-15
2006-07		# year and month only
2006		# year only
--07		# month only
--07-15		# month and day only
---15		# day only

# Note: the following do not validate
2006-07-00
2006-07-
2006-07--
2006-
2006--
2006---
-07-15

So I don't think any further restriction is required. We can accept this element and agree on the allowed values (especially for missing values). Note that some other date elements use the xs:dateTime type instead (e.g. dateLastEdited). These are quite different, because xs:dateTime requires a complete date and time such as 2006-07-15T21:34:00.
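One way to check the examples above is to express them as a regular expression. The Python sketch below is a hypothetical approximation reconstructed only from the listed values, not copied from the ABCD schema. The real abcd:DateTimeISO pattern may accept further forms (for instance seconds or time zones). fullmatch() is used because XSD pattern facets always match the whole value.

```python
import re

# Hypothetical approximation of abcd:DateTimeISO, built from the examples above.
YEAR = r"\d{4}"
MONTH = r"(?:0[1-9]|1[0-2])"
DAY = r"(?:0[1-9]|[12]\d|3[01])"
TIME = r"T(?:[01]\d|2[0-3]):[0-5]\d(?::[0-5]\d)?"

DATE_TIME_ISO = re.compile(
    "|".join([
        rf"{YEAR}-{MONTH}-{DAY}(?:{TIME})?",  # full date, optional time
        rf"{YEAR}-{MONTH}",                   # year and month only
        rf"{YEAR}",                           # year only
        rf"--{MONTH}(?:-{DAY})?",             # month only, or month and day
        rf"---{DAY}",                         # day only
    ])
)

def is_valid_datetime_iso(value: str) -> bool:
    """True if the entire string matches one of the allowed forms."""
    return DATE_TIME_ISO.fullmatch(value) is not None

for v in ["2006-07-15T21:34", "--07-15", "2006-07-00", "-07-15"]:
    print(v, is_valid_datetime_iso(v))
```

Under these assumptions, all seven "validate" examples pass and all seven "do not validate" examples fail.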

State (pru)

This is a new element, so I decided to put it in a HispidUnitType extension. That way all the additional elements are kept together, which makes them easier to document and track. I created a HispidUnitType and started placing all the extra elements in it. Then I just added an element of type HispidUnitType inside UnitExtension.
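In an instance document, this nesting means the HISPID-only data sits inside UnitExtension under its own namespace. The Python sketch below builds such a fragment and reads the value back. It makes several assumptions:

- The ABCD namespace URI is assumed, not taken from this page.
- The HISPID URI is the default namespace from the first version of the schema.
- The element names HispidUnit and State and the sample values are hypothetical.
- The fragment omits other elements a real ABCD Unit needs, so on its own it would not validate.

```python
import xml.etree.ElementTree as ET

ABCD = "http://www.tdwg.org/schemas/abcd/2.06"   # assumed ABCD 2.06 namespace
HISPID = "http://www.chah.org.au/schemas/hispid/6"

def build_unit(unit_id: str, state: str) -> ET.Element:
    """Build a minimal Unit whose UnitExtension carries a hypothetical HispidUnit/State."""
    unit = ET.Element(f"{{{ABCD}}}Unit")
    ET.SubElement(unit, f"{{{ABCD}}}UnitID").text = unit_id
    extension = ET.SubElement(unit, f"{{{ABCD}}}UnitExtension")
    hispid_unit = ET.SubElement(extension, f"{{{HISPID}}}HispidUnit")
    ET.SubElement(hispid_unit, f"{{{HISPID}}}State").text = state
    return unit

ET.register_namespace("abcd", ABCD)
ET.register_namespace("hispid", HISPID)

unit = build_unit("MEL 1234567", "NSW")
print(ET.tostring(unit, encoding="unicode"))

ns = {"abcd": ABCD, "hispid": HISPID}
print(unit.find("abcd:UnitExtension/hispid:HispidUnit/hispid:State", ns).text)
```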

Collector's Name(s) (cnam, cnam2, cper)

We cannot add a per-collector attribute to the GatheringAgent element, because documents using it would no longer validate against ABCD. Document this as a possible inclusion. Placed in the extension for now.

Geocode in Degrees minutes and seconds

We decided we would add a new element called CoordinatesDMS. However, placing it directly in the schema would break validation against ABCD, so I made a CoordinatesDMS type and included a CoordinatesDMS element in the HispidUnit extension. Is this the best way to do it? The documentation on creating an extension to ABCD suggests that it's best to keep to simple types. It might be best to think about encapsulating all the extension components in one file.
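This page doesn't show the CoordinatesDMS type itself. The Python sketch below illustrates what such a type needs to carry and check. The class and field names are hypothetical. The conversion to decimal degrees is the standard one: degrees + minutes/60 + seconds/3600, negated for S and W.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CoordinateDMS:
    """A single latitude or longitude in degrees, minutes and seconds (hypothetical layout)."""
    degrees: int
    minutes: int
    seconds: float
    hemisphere: str  # "N" or "S" for latitude, "E" or "W" for longitude

    def __post_init__(self) -> None:
        if self.hemisphere not in ("N", "S", "E", "W"):
            raise ValueError(f"unknown hemisphere: {self.hemisphere!r}")
        if self.degrees < 0 or not 0 <= self.minutes < 60 or not 0 <= self.seconds < 60:
            raise ValueError("degrees, minutes or seconds out of range")
        limit = 90 if self.hemisphere in ("N", "S") else 180
        if abs(self.to_decimal()) > limit:
            raise ValueError(f"coordinate exceeds {limit} degrees")

    def to_decimal(self) -> float:
        """Convert to signed decimal degrees (south and west are negative)."""
        value = self.degrees + self.minutes / 60 + self.seconds / 3600
        return -value if self.hemisphere in ("S", "W") else value


@dataclass(frozen=True)
class CoordinatesDMS:
    """A latitude/longitude pair, mirroring the proposed CoordinatesDMS element."""
    latitude: CoordinateDMS
    longitude: CoordinateDMS

    def __post_init__(self) -> None:
        if self.latitude.hemisphere not in ("N", "S"):
            raise ValueError("latitude must be N or S")
        if self.longitude.hemisphere not in ("E", "W"):
            raise ValueError("longitude must be E or W")

    def to_decimal(self) -> tuple[float, float]:
        return self.latitude.to_decimal(), self.longitude.to_decimal()


# Illustrative coordinates only.
site = CoordinatesDMS(CoordinateDMS(37, 48, 30, "S"), CoordinateDMS(144, 57, 0, "E"))
print(site.to_decimal())
```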

Points for Discussion

New elements

What do we do with completely new elements? If we put them in the schema wherever we like, we will not be able to validate against ABCD. So we can either go ahead and do that anyway, knowing that we will break the schema, or we can put them in the UnitExtension or DataSetExtension.

New attributes

Need to find out what happens if we add extra attributes to an element (e.g. a collector's number in the GatheringAgent). Will this stop it validating against ABCD? [Yes. --Peter 15:25, 9 July 2007 (EST)]

Size of document

How are we going to keep this (fairly big) schema document in sync with ABCD? If they change elements around, we have to be able to make exactly the same changes so that our documents will validate against ABCD. I can't see an easy way to do this.
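One partial mitigation is to detect automatically what changed between two ABCD releases. The Python sketch below lists named declarations added or removed between two schema versions; the helper names are hypothetical. Local elements that share a name inside different types collapse into one entry, so this only flags candidates for manual review. It is not a full schema diff.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

def declared_names(xsd_text: str) -> set[str]:
    """Collect 'kind:name' for every named element, type and attribute in a schema."""
    root = ET.fromstring(xsd_text)
    names = set()
    for kind in ("element", "complexType", "simpleType", "attribute"):
        for node in root.iter(f"{XS}{kind}"):
            name = node.get("name")
            if name is not None:
                names.add(f"{kind}:{name}")
    return names

def schema_diff(old_xsd: str, new_xsd: str) -> tuple[set[str], set[str]]:
    """Return (added, removed) declarations between two schema versions."""
    old, new = declared_names(old_xsd), declared_names(new_xsd)
    return new - old, old - new

old = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"><xs:element name="Unit"/><xs:simpleType name="A"/></xs:schema>'
new = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"><xs:element name="Unit"/><xs:simpleType name="B"/></xs:schema>'
print(schema_diff(old, new))
```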

What do we leave in

This is related to the point above. If we can get rid of all the elements we don't think are useful, then we can trim down the beast and make it easier to maintain.

Namespace

I think we should prefix the HISPID types rather than using the default namespace for HISPID, so that there is no confusion. [Done --Peter 15:25, 9 July 2007 (EST)]

Version

Is hispid6 the right name for this? I just thought I'd jump ahead to avoid any confusion - any thoughts?

Validation

How far do we go with validation, e.g. enforcing mixed case? We could do this with regular expressions, but is it worth it?
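To make the trade-off concrete, here is a Python sketch of one hypothetical mixed-case rule: a genus name is one capital letter followed by lowercase letters. The same expression could go into an xs:pattern facet, since XSD patterns, like fullmatch(), test the whole value.

```python
import re

# Hypothetical rule: one uppercase letter followed by one or more lowercase letters.
GENUS_PATTERN = re.compile(r"[A-Z][a-z]+")

def is_mixed_case_genus(name: str) -> bool:
    """True if the whole name follows the hypothetical capitalisation rule."""
    return GENUS_PATTERN.fullmatch(name) is not None

for candidate in ("Eucalyptus", "eucalyptus", "EUCALYPTUS", "Eucalyptus "):
    print(f"{candidate!r}: {is_mixed_case_genus(candidate)}")
```

A rule this strict would also reject some legitimate names, such as hybrid genera written with a leading multiplication sign. That is one argument for keeping such checks loose.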

Some useful links

Understanding XML Schema: http://msdn2.microsoft.com/en-us/library/aa468557.aspx

Designing XML Schema Libraries: http://msdn2.microsoft.com/en-us/library/aa468549.aspx


Second Version

Due to some of the problems mentioned above (in particular, the problem of keeping this document in sync with any changes to ABCD), I decided to take another approach.

Instead of creating a new schema with a new HISPID namespace, the original ABCD document was used, keeping the abcd namespace. By doing this we avoid having to globally edit the namespace throughout the document. We then import the HISPID extension and use the hispid namespace for it. This approach is similar to the way that EFG extends ABCD, although it goes further by changing element types where we want to use a more restricted HISPID version. This makes any customisations easy to find (by searching for types with the hispid: prefix), and it keeps the new elements and restrictions together in the extension. The disadvantage is that the document is larger, because we have not deleted any of the unwanted elements.
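Searching for the hispid: prefix can also be scripted. The Python sketch below lists every element declaration whose type uses that prefix. ElementTree returns attribute values verbatim and does not resolve QName prefixes in them, so this relies on the prefix being literally "hispid". The namespace URI urn:example:hispid and the sample element names are placeholders.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

def hispid_customisations(xsd_text: str, prefix: str = "hispid") -> list[tuple[str, str]]:
    """List (element name, type) for every xs:element whose type uses the given prefix."""
    root = ET.fromstring(xsd_text)
    found = []
    for element in root.iter(f"{XS}element"):
        type_name = element.get("type", "")
        if type_name.startswith(f"{prefix}:"):
            found.append((element.get("name", "?"), type_name))
    return found

sample = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:hispid="urn:example:hispid">
  <xs:element name="CoordinateMethod" type="hispid:CoordinateMethodEnum"/>
  <xs:element name="Notes" type="String"/>
</xs:schema>"""
print(hispid_customisations(sample))
# → [('CoordinateMethod', 'hispid:CoordinateMethodEnum')]
```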

So, we take the following approach to elements:

http://hiscom.chah.org.au/files/hispid_abcd_docs.png

The file ABCD2.06_impl_HISPID.xsd is a copy of ABCD2.06.xsd with the following modifications:

  • imports the HISPID_extension1.0.xsd document
  • adds the extra HispidGathering, HispidIdentification and HispidUnit elements under UnitExtension
  • changes the type of each element we are restricting so that it uses the hispid: prefix and references the new type defined in the extension.

HigherTaxonRank

Another extension that will break the schema: adding extra values to HigherTaxonRankEnum (subgenus, section, subsection, series, subseries). We can do this with a union that includes the extra enumerations. Perhaps what we need is two namespaces. One would hold the elements that won't break the schema (i.e. restrictions, or elements placed inside an xs:any element). The other would hold those that will break it (extensions in places where there is no legal extension point)?
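In XML Schema, an xs:union accepts a value if any of its member types accepts it. The Python sketch below models that rule. The ABCD-side values are hypothetical placeholders, since the real HigherTaxonRankEnum list lives in the ABCD schema and is not reproduced here. The HISPID values are the ones listed above.

```python
# Hypothetical stand-ins for a few abcd:HigherTaxonRankEnum values.
ABCD_HIGHER_RANKS = frozenset({"familia", "ordo", "classis"})
# Extra HISPID values proposed above.
HISPID_EXTRA_RANKS = frozenset({"subgenus", "section", "subsection", "series", "subseries"})

def is_valid_higher_rank(value: str) -> bool:
    """Mimic an xs:union of two enumerations: valid if either member type accepts it."""
    return value in ABCD_HIGHER_RANKS or value in HISPID_EXTRA_RANKS

for rank in ("familia", "section", "Section"):
    print(rank, is_valid_higher_rank(rank))
```

The check is case-sensitive, as XSD enumerations are. A document using one of the extra ranks would still fail validation against plain ABCD, which is the breakage described above.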

Type Qualifier Flag (tql)

Added an enumeration of the legal values.

Non-computerised Data Flag

Added this to HispidUnit, along with a YesNoEnum type. On second thoughts, let's just make it an xs:boolean type.
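Switching to xs:boolean narrows what a document may contain. Its lexical forms are only "true", "false", "1" and "0", with surrounding whitespace collapsed. The small Python sketch below accepts exactly those forms; the function name is hypothetical.

```python
# Lexical forms allowed by xs:boolean; its whiteSpace facet is "collapse",
# so leading and trailing XML whitespace is ignored.
_XS_BOOLEAN = {"true": True, "1": True, "false": False, "0": False}

def parse_xs_boolean(text: str) -> bool:
    """Convert an xs:boolean lexical value to a Python bool, rejecting anything else."""
    try:
        return _XS_BOOLEAN[text.strip(" \t\r\n")]
    except KeyError:
        raise ValueError(f"not a valid xs:boolean: {text!r}") from None

print(parse_xs_boolean("true"), parse_xs_boolean(" 0 "))
```

Under this choice, legacy values such as Yes/No would need converting to true/false (or 1/0) before export.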

Habit / Life Form

Added a LifeForm element to HispidUnit

POSNAT, POSCUL, POSINT

  • Added NaturalOccurrenceEnum as a type to restrict the abcd:NaturalOccurrence element
  • Added CultivatedOccurrenceEnum as a type to restrict the abcd:CultivatedOccurrence element
  • Added IntroductionAgency to the HispidUnit Extension