HISPID/ABCD Workshop Executive Summary

The HISPID/ABCD Workshop took place at the State Herbarium of South Australia in Adelaide, 6–8 June 2007. This is an executive summary of the outcome of the meeting. A longer article documents the change to HISPID in terms familiar to HISPID users.

Australian herbaria have been using the Australian standard for specimen data, known as Herbarium Information Standards and Protocols for Interchange of Data (HISPID), for almost two decades (Croft 1989; Whalen 1993; Conn 1996). Its use continues because the databases in these institutions were developed in concert with HISPID, so most institutions already have reports that export data in this format. The current version of Australia's Virtual Herbarium, first prototyped in 1998, uses HISPID 3 (Conn 1996). HISPID was TDWG’s specimen data interchange standard until it was superseded by ABCD in 2004.

ABCD was not designed with the same constraints as HISPID, which makes it difficult to validate interchanged data to the extent possible with HISPID. ABCD is also considerably more complex than HISPID.

Participants

 * Bill Barker (State Herbarium of South Australia, Department of Environment and Heritage, South Australia)
 * Rex Croft (State Herbarium of South Australia, Department of Environment and Heritage, South Australia)
 * Peter Neish (National Herbarium of Victoria, Royal Botanic Gardens Melbourne, Victoria)
 * Ben Richardson (Western Australian Herbarium, Department of Environment and Conservation, Western Australia)
 * Greg Whitbread (Australian National Herbarium, Centre for Plant Biodiversity Research, Australian Capital Territory)

Summary
It was agreed to build an extension schema that allows HISPID to become an XML-based document format based on ABCD while continuing to use the HISPID vocabulary. We would do this by extending ABCD in the same manner as the Extension For Geosciences (EFG). We would further introduce restrictions on standard ABCD elements so that the resulting XML instance documents can be validated against the HISPID vocabulary. We did not make any changes to the schema that would conflict with ABCD, so every HISPID 5 instance document will also be valid ABCD 2.06.

We also took the opportunity to upgrade the definitions to:


 * adopt ABCD controlled vocabularies where possible
 * remove abbreviations and use full words or phrases unless this reduced their usefulness
 * make capitalisation consistent
 * improve definitions in clarity or syntax
 * recommend extensions to ABCD elements
 * recommend extensions to ABCD controlled vocabularies
 * remove redundant elements

We followed the method used by Extension For Geosciences (EFG) as follows:


 * Placed our extensions in an extension schema document which includes the ABCD schema document for reference to ABCD types.
 * Created a HISPID 5 schema document that references the extension schema, so that we can migrate to a newer version of ABCD with minimal effort.

Recommended New Concepts and Attributes
The HISPID extension to ABCD includes the following concepts. It is recommended that these are considered for future ABCD versions.

ABCD supplies a "catch-all" element, SiteMeasurementOrFact, that can be used for many data types that are not specifically a part of ABCD. We found that HISPID has enough extra elements that would otherwise fall into SiteMeasurementOrFact to justify extending ABCD with specific elements for them.

NameFormula
It is not possible to represent name formulae for informal hybrid names, which are common in Australian specimen databases, e.g. Acacia ? desertorum x heteroneura var. desertorum x jutsonii. New elements are needed in ABCD to represent the parents of taxa in an informal hybrid.

PrimaryRecordingUnit
The HISCOM community considers this to be core information (as the Country element is) that should be transferred in its own element rather than as a NamedArea.

Substrate
This has been transferred as standard data in HISPID. The alternative of placing it in the SiteMeasurementOrFact reduces its usefulness.

SoilType
This has been transferred as standard data in HISPID. The alternative of placing it in the SiteMeasurementOrFact reduces its usefulness.

Vegetation
This has been transferred as standard data in HISPID. The alternative of placing it in the SiteMeasurementOrFact reduces its usefulness.

NumberOfSheets
This has been transferred as standard data in HISPID and is very useful when a data file accompanies a loan. The alternative of placing it in the SiteMeasurementOrFact reduces its usefulness.

FieldCollectionComponent
During discussion we recognised that the TypeOfCultivatedMaterial (tcul) field should be expanded to include all components collected in the field.

Allowed values:



ProvenanceType
This has been transferred as standard data in HISPID. The alternative of placing it in the SiteMeasurementOrFact reduces its usefulness. It cannot be placed in the BotanicalGardenUnit because the HISPID field refers to cultivated material associated with the record, not the record itself.

PropagationHistory
As for ProvenanceType above.

DonorType
As for ProvenanceType above.

Frequency
This has been transferred as standard data in HISPID. The alternative of placing it in the SiteMeasurementOrFact reduces its usefulness.

IntroductionAgency
An important field with regard to invasive plants. It requires a controlled vocabulary, which is unavailable when placing it in SiteMeasurementOrFact.

Allowed values:



LifeForm
This has been transferred as standard data in HISPID and is commonly recorded by botanists. The alternative of placing it in the SiteMeasurementOrFact reduces its usefulness.

Phenology
An important field (e.g. to determine flowering time associated with climate change). It requires a controlled vocabulary, which is unavailable when placing it in SiteMeasurementOrFact.

Allowed values:



NonComputerisedDataFlag
This has been transferred as standard data in HISPID. The alternative of placing it in the SiteMeasurementOrFact reduces its usefulness.

InfraGenericTaxon
Previous versions of HISPID 5 had extended ABCD's HigherTaxon to permit the use of several infra-generic ranks; however, these ranks are not higher taxa in the general definition of a higher taxon. There was additional confusion because ABCD preferred to use Latin terms, or at least attributed each term with a language, while HISPID 5's extension preferred un-attributed English terms. See InfraGenericRankEnum.

HispidAccessions
This element corrects a probable oversight in ABCD 2.06b. The ABCD Accessions element contains three repeatable sub-elements, but the way it was constructed makes it difficult to use: there needs to be an Accession element between Accessions and the three current sub-elements. This was corrected in HISPID 5.0.3 with the addition of the HispidAccessionsType.
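The difficulty with repeatable sub-elements that lack a wrapper can be illustrated with a minimal Python sketch. The sub-element names (AccessionCatalogue, AccessionDate, AccessionNumber), the sample values and the helper functions are assumptions made for illustration only; this is not code from the HISPID or ABCD schemas.

```python
import xml.etree.ElementTree as ET

# Assumed names for the three repeatable sub-elements (illustrative only).
FIELDS = ("AccessionCatalogue", "AccessionDate", "AccessionNumber")


def flat_accessions(records):
    """Write every value directly under Accessions, grouped by element name."""
    parent = ET.Element("Accessions")
    for field in FIELDS:
        for record in records:
            if field in record:
                ET.SubElement(parent, field).text = record[field]
    return parent


def grouped_accessions(records):
    """Write each record inside its own Accession wrapper element."""
    parent = ET.Element("Accessions")
    for record in records:
        wrapper = ET.SubElement(parent, "Accession")
        for field in FIELDS:
            if field in record:
                ET.SubElement(wrapper, field).text = record[field]
    return parent


# Two hypothetical accession records with different fields filled in.
records = [
    {"AccessionCatalogue": "Main", "AccessionNumber": "A-001"},
    {"AccessionDate": "2007-06-08", "AccessionNumber": "B-002"},
]

flat = flat_accessions(records)
grouped = grouped_accessions(records)
```

In the flat form a reader cannot tell which accession number the single date belongs to, whereas the grouped form keeps each date, catalogue and number together.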

Recommended Restrictions
The HISCOM community needs to restrict the following types over and above the restrictions in ABCD. Whether these restrictions would be applicable to other communities is not certain, but they are presented here for discussion. (Note: all the additional HISPID types are defined at the end of the HISPID extension.)

/DataSets/DataSet/Units/Unit/Identifications/Identification/Result/TaxonIdentified/ScientificName/NameAddendum

 * abcd type: abcd:String
 * hispid type: hispid:NameAddendumEnum

Restricted to:



/DataSets/DataSet/Units/Unit/SpecimenUnit/NomenclaturalTypeDesignations/NomenclaturalTypeDesignation/TypeStatus

 * abcd type: abcd:String
 * hispid type: hispid:TypeStatusEnum

Restricted to:


 * (type material of unknown status)

/DataSets/DataSet/Units/Unit/Identifications/Identification/Identifiers/IdentifierRole

 * abcd type: abcd:String
 * hispid type: hispid:IdentifierRoleEnum

Restricted to (abbreviations consistent with convention):


 * - confirmavit (the identifier has agreed with, i.e. confirmed, the identification).
 * - citavit (used when a specimen is cited in a publication).
 * - determinavit (the identifier has determined the identification).
 * - scripsit (identification communicated in written correspondence).
 * - vidit (identification seen and communicated verbally).
 * - upgrade (taxonomic update based on literature - where specimen not cited).
 * - temporary sorting slip (used to temporarily name a specimen prior to critical examination).
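As a minimal sketch of how a consumer of the data might enforce this controlled vocabulary outside the schema, the roles can be modelled as a Python enumeration. The full terms are used as the enumeration values here, which is an assumption: the values actually defined in hispid:IdentifierRoleEnum may be the conventional abbreviations instead. The function name is hypothetical.

```python
from enum import Enum


class IdentifierRole(Enum):
    """Identifier roles from the list above (values assumed to be full terms)."""
    CONFIRMAVIT = "confirmavit"
    CITAVIT = "citavit"
    DETERMINAVIT = "determinavit"
    SCRIPSIT = "scripsit"
    VIDIT = "vidit"
    UPGRADE = "upgrade"
    TEMPORARY_SORTING_SLIP = "temporary sorting slip"


def parse_identifier_role(text):
    """Return the matching role, or raise ValueError for an unknown value.

    Whitespace and case are normalised first, which is more lenient than
    an XML Schema enumeration would be.
    """
    return IdentifierRole(text.strip().lower())
```

A value outside the list, such as a free-text note, is rejected with a ValueError rather than silently accepted, which is the behaviour a plain string type cannot offer.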

/DataSets/DataSet/Units/Unit/Identifications/Identification/Result/TaxonIdentified/ScientificName/IdentificationQualifier

 * abcd type: Anonymous Type
 * hispid type:

Not done, as an anonymous type is difficult to maintain in future schema versions. Restrictions would be:


 * - Akin to or bordering
 * - Compare with
 * - Incorrect
 * - Perhaps
 * - Close to
 * - Questionable

/DataSets/DataSet/Units/Unit/Gathering/Altitude/MeasurementOrFactAtomised/MeasuredBy

 * abcd type: abcd:String
 * hispid type:

Not done. MeasurementOrFactAtomised is repeated throughout the schema, so cannot easily be restricted in different contexts. Would restrict to:



/DataSets/DataSet/Units/Unit/Gathering/Altitude/MeasurementOrFactAtomised/Method

 * abcd type: abcd:String
 * hispid type:

Not done. MeasurementOrFactAtomised is repeated throughout the schema, so cannot easily be restricted in different contexts. Would restrict to:


 * - Digital Elevation Model
 * - Global Positioning System (GPS) unit

/DataSets/DataSet/Units/Unit/Gathering/Depth/MeasurementOrFactAtomised/MeasuredBy

 * abcd type: abcd:String
 * hispid type:

Not done. MeasurementOrFactAtomised is repeated throughout the schema, so cannot easily be restricted in different contexts. Would restrict to:



/DataSets/DataSet/Units/Unit/KindOfUnit

 * abcd type: abcd:StringL
 * hispid type: hispid:KindOfUnitEnum

Restricted to:



/DataSets/DataSet/Units/Unit/Gathering/SiteCoordinateSets/SiteCoordinates/CoordinateMethod

 * abcd type: abcd:String
 * hispid type: hispid:CoordinateMethodEnum (base abcd:String)

Restricted to:



/DataSets/DataSet/Units/Unit/HerbariumUnit/NaturalOccurrence

 * abcd type: abcd:StringL
 * hispid type: hispid:NaturalOccurrenceEnum

Restricted to:



/DataSets/DataSet/Units/Unit/HerbariumUnit/CultivatedOccurrence

 * abcd type: abcd:StringL
 * hispid type: hispid:CultivatedOccurrenceEnum

Restricted to:



/DataSets/DataSet/Units/Unit/SpecimenUnit/NomenclaturalTypeDesignations/NomenclaturalTypeDesignation/DoubtfulFlag/Coordinates/

 * abcd type: abcd:String
 * hispid type: hispid:DoubtfulFlagEnum

Restricted to:



New elements that should really be extensions
The following elements are included in the HISPID extension as an interim measure to allow the information to be transferred. The most desirable outcome would be if they were included as additional attributes or elements of existing ABCD types as documented below.

CoordinatesDMS
To allow transfer of verbatim geocode components (see CoordinatesLatLong below).

PerCollector
To allow the GatheringAgent to be recorded as a 'per' collector (see GatheringAgent below).

SecondaryCollectorIdentifier
To allow multiple collectors' numbers (see GatheringAgent below).

Recommended extensions to existing ABCD elements
Extending an existing ABCD element could make instance documents invalid as ABCD. Any ABCD elements that have been extended have therefore been quarantined in the ABCD extension elements. In this way their potential usefulness can be demonstrated while the instance documents remain valid ABCD. These are documented below for the consideration of the ABCD working group.

/DataSets/DataSet/Units/Unit/Gathering/SiteCoordinateSets/SiteCoordinates/CoordinatesLatLong
Rather than adding a new element for CoordinatesDMS (degrees, minutes and seconds), it might be more parsimonious to include the individual elements within the CoordinatesLatLong type. So additional elements would be required for: LatitudeDegrees, LatitudeMinutes, LatitudeSeconds, LatitudeDirection, LongitudeDegrees, LongitudeMinutes, LongitudeSeconds, LongitudeDirection.

There is a need to transfer coordinates in degrees, minutes and seconds (as often recorded by the original collector). These need to be transferred in discrete units to enable validation (therefore the VerbatimLongitude and VerbatimLatitude mooted for ABCD version 2.06c are not sufficient).
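The value of discrete components can be shown with a short Python sketch: each part can be range-checked on its own and the whole converted to decimal degrees, which is not possible with a free-text verbatim string. The class and method names are hypothetical, and the range rules are the usual geographic limits rather than anything taken from the HISPID schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DmsCoordinate:
    """One latitude or longitude held as discrete components."""
    degrees: int
    minutes: int
    seconds: float
    direction: str  # "N" or "S" for latitude, "E" or "W" for longitude

    def validate(self):
        """Raise ValueError if any component is out of range."""
        if self.direction in ("N", "S"):
            max_degrees = 90
        elif self.direction in ("E", "W"):
            max_degrees = 180
        else:
            raise ValueError(f"unknown direction: {self.direction!r}")
        if not 0 <= self.degrees <= max_degrees:
            raise ValueError(f"degrees out of range: {self.degrees}")
        if not 0 <= self.minutes < 60:
            raise ValueError(f"minutes out of range: {self.minutes}")
        if not 0 <= self.seconds < 60:
            raise ValueError(f"seconds out of range: {self.seconds}")
        if self.degrees == max_degrees and (self.minutes or self.seconds):
            raise ValueError("coordinate exceeds the maximum for its axis")

    def to_decimal(self):
        """Return signed decimal degrees (south and west are negative)."""
        self.validate()
        value = self.degrees + self.minutes / 60 + self.seconds / 3600
        return -value if self.direction in ("S", "W") else value


# Hypothetical example values for a site near Adelaide.
latitude = DmsCoordinate(34, 55, 30.0, "S")
longitude = DmsCoordinate(138, 36, 0.0, "E")
```

Keeping the direction as a separate component, rather than a sign on the degrees, mirrors the proposed LatitudeDirection and LongitudeDirection elements and preserves what the collector actually wrote.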

/DataSets/DataSet/Units/Unit/Gathering/Agents/GatheringAgent
The current ABCD GatheringAgent concept is not able to readily incorporate multiple collectors' numbers. It is suggested that a Collectors Field Number attribute or element be added to this concept rather than being placed as a child of the Unit element.

A “per” collector is an amateur or casual collector who collected a specimen on behalf of a primary collector. Adding a boolean 'percollector' attribute to the GatheringAgent would seem the best way to include this information.
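A minimal Python sketch of what the two suggestions for GatheringAgent might look like in an instance document follows. The attribute name percollector comes from the text above; the CollectorsFieldNumber and AgentText element names, the helper function and the sample values are assumptions for illustration and are not taken from a published schema.

```python
import xml.etree.ElementTree as ET


def gathering_agent(name, field_number=None, per_collector=False):
    """Build one GatheringAgent element with the suggested additions."""
    agent = ET.Element("GatheringAgent")
    # Boolean attribute written in the XML Schema lexical form "true"/"false".
    agent.set("percollector", "true" if per_collector else "false")
    ET.SubElement(agent, "AgentText").text = name
    if field_number is not None:
        # Each collector carries their own field number (hypothetical element).
        ET.SubElement(agent, "CollectorsFieldNumber").text = field_number
    return agent


# Hypothetical collectors: one primary collector and one 'per' collector.
agents = ET.Element("Agents")
agents.append(gathering_agent("Collector A", "1234"))
agents.append(gathering_agent("Collector B", "17", per_collector=True))

xml_text = ET.tostring(agents, encoding="unicode")
```

Attaching the field number to each agent, rather than to the Unit, is what allows several collectors' numbers to be kept apart.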

/DataSets/DataSet/Units/Unit/Gathering/Aspect/Ordination
Recommend that this field be extended to include the values:



HigherTaxonRankEnum Type
The HigherTaxonRankEnum extension type was removed from HISPID 5.0.3 because it did not follow ABCD properly. We had extended HigherTaxonRankEnum to add taxon ranks below that of genus; however, these ranks are not considered to be higher taxa. In its place we created InfraGenericRankEnum with the same controlled vocabulary.

InfraGenericRankEnum
InfraGenericRankEnum, added in HISPID 5.0.3, contains just the extra elements needed from HISPID 3 for infra-generic ranks commonly used in the Australian flora, i.e.



In HISPID 5.0.3 these values are also properly attributed as English terms.

General comments on mapping process
A very generalised complex type that is reused throughout the schema is hard (impossible?) to restrict, because a restriction in one context might not apply in another. An example is the HISPID Method of Altitude Determination field, which uses the ABCD Gathering/Altitude/MeasurementOrFactAtomised/Method concept. We would like to control the vocabulary of this field to the following values: dem, gps, field estimate, altimeter, map, unknown. However, this would restrict all uses of this complex type (e.g. for Method of Depth Determination). The only way to restrict is to create a new type for every MeasurementOrFact item that needs to be controlled. So we would need a separate AltitudeMeasurementOrFact type and a DepthMeasurementOrFact type.
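The idea of one vocabulary per context can be sketched in Python as a lookup keyed on the parent element. This is only an illustration of the reasoning, not an XML Schema solution: the altitude values are those listed above, while the depth vocabulary, the context names and the function name are hypothetical.

```python
# Altitude methods as listed in the text.
ALTITUDE_METHODS = frozenset(
    {"dem", "gps", "field estimate", "altimeter", "map", "unknown"}
)

# Hypothetical depth vocabulary, invented purely for illustration.
DEPTH_METHODS = frozenset({"sounding", "field estimate", "unknown"})

# One vocabulary per context, mirroring one restricted type per context.
METHODS_BY_CONTEXT = {
    "Altitude": ALTITUDE_METHODS,
    "Depth": DEPTH_METHODS,
}


def method_is_valid(context, method):
    """Check a Method value against the vocabulary for its context.

    Contexts without a registered vocabulary are left unrestricted,
    as they are in plain ABCD.
    """
    allowed = METHODS_BY_CONTEXT.get(context)
    if allowed is None:
        return True
    return method in allowed
```

A single shared vocabulary would have to accept "gps" for depth or reject it for altitude; separate entries per context avoid that conflict, which is the same reason separate AltitudeMeasurementOrFact and DepthMeasurementOrFact types would be needed in the schema.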

Anonymous types make maintenance difficult. For example, IdentificationQualifier is defined as an anonymous complex type, so instead of changing just the type (e.g. from abcd:IdentificationQualifier to hispid:IdentificationQualifier) we must construct a new type and remove the anonymous type completely. This makes the schema harder to maintain as new versions of ABCD are introduced, and one of the goals is to make this sustainable. It may therefore be worth avoiding anonymous types in ABCD.

The complete documentation for HISPID 5 is now available.