HISPID/ABCD Workshop Executive Summary

From Hiscom


The HISPID/ABCD Workshop took place at the State Herbarium of South Australia in Adelaide, 6–8 June 2007. This is an executive summary of the outcome of the meeting. A longer article documents the change to HISPID in terms familiar to HISPID users.

Australian herbaria have been using the Australian standard for specimen data, known as Herbarium Information Standards and Protocols for Interchange of Data (HISPID), for almost two decades (Croft 1989; Whalen 1993; Conn 1996). It remains in use in Australian herbaria because their databases were developed in concert with HISPID, so most institutions already have reports that export data in this format. The current version of Australia's Virtual Herbarium, first prototyped in 1998, uses HISPID 3 (Conn 1996). HISPID 3 was also adopted as TDWG’s specimen data interchange standard and remained so until it was superseded by ABCD in 2004.

ABCD was not designed with the same constraints as HISPID, which makes it difficult to validate interchanged data to the extent that is possible with HISPID. ABCD is also considerably more complex than HISPID.


Participants

  • Bill Barker (State Herbarium of South Australia, Department of Environment and Heritage, South Australia)
  • Rex Croft (State Herbarium of South Australia, Department of Environment and Heritage, South Australia)
  • Peter Neish (National Herbarium of Victoria, Royal Botanic Gardens Melbourne, Victoria)
  • Ben Richardson (Western Australian Herbarium, Department of Environment and Conservation, Western Australia)
  • Greg Whitbread (Australian National Herbarium, Centre for Plant Biodiversity Research, Australian Capital Territory)

Summary

It was agreed to build an extension schema that allows HISPID to become an XML-based document format based on ABCD while continuing to use the HISPID vocabulary. We would do this by extending ABCD in the same manner as the Extension For Geosciences (EFG). We would also restrict standard ABCD elements so that the resulting XML instance documents can be validated against the HISPID vocabulary. We did not make any changes to the schema that would conflict with ABCD; therefore every HISPID 5 instance document will also be valid ABCD 2.06.

We also took the opportunity to upgrade the definitions to:

  • adopt ABCD controlled vocabularies where possible
  • remove abbreviations and use full words or phrases, unless doing so reduced their usefulness
  • make capitalisation consistent
  • improve the clarity or syntax of definitions
  • recommend extensions to ABCD elements
  • recommend extensions to ABCD controlled vocabularies
  • remove redundant elements

We followed the method used by Extension For Geosciences (EFG) as follows:

  • Placed our extensions in an extension schema document which includes the ABCD schema document for reference to ABCD types.
  • Created a HISPID 5 schema document that references the extension schema, so that we can migrate to a newer version of ABCD with minimal effort.

Recommended New Concepts and Attributes

The HISPID extension to ABCD includes the following concepts. We recommend that these be considered for future ABCD versions.

ABCD supplies a "catch-all" element, SiteMeasurementOrFact, that can be used for many data types that are not specifically part of ABCD. However, so many HISPID fields would otherwise have had to be carried in SiteMeasurementOrFact that we chose instead to extend ABCD with specific elements for them.

NameFormula

ABCD cannot represent the name formulae of informal hybrids, which are common in Australian specimen databases, e.g. Acacia ? desertorum x heteroneura var. desertorum x jutsonii. New elements are needed in ABCD to represent the parents of taxa in an informal hybrid.
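To illustrate what a structured NameFormula could carry, the minimal Python sketch below splits an informal hybrid formula into a genus and its parent taxa. It assumes the genus is the first word and that parents are separated by " x " or " × "; the `NameFormula` class, its fields and `parse_name_formula` are hypothetical and are not HISPID or ABCD element names.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class NameFormula:
    """Hypothetical container for an informal hybrid: a genus plus its parent taxa."""
    genus: str
    parents: List[str] = field(default_factory=list)


# Assumption: parents are separated by a hybrid sign ("x" or "×") with whitespace on both sides.
HYBRID_SIGN = re.compile(r"\s+[x×]\s+")


def parse_name_formula(formula: str) -> NameFormula:
    """Split 'Genus a x b var. c' into the genus and the list of parent epithets."""
    genus, rest = formula.strip().split(maxsplit=1)
    parents = [part.strip() for part in HYBRID_SIGN.split(rest) if part.strip()]
    if len(parents) < 2:
        raise ValueError(f"not a hybrid formula: {formula!r}")
    return NameFormula(genus=genus, parents=parents)
```

Keeping each parent as a separate value is what would allow an exchange format to validate or search the parents individually, which a single free-text name cannot support.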

PrimaryRecordingUnit

The HISCOM community considers this core information (as is the Country element) that should be transferred in its own element rather than as a NamedArea.

Substrate

This has been transferred as standard data in HISPID. Placing it in SiteMeasurementOrFact instead would reduce its usefulness.

SoilType

This has been transferred as standard data in HISPID. Placing it in SiteMeasurementOrFact instead would reduce its usefulness.

Vegetation

This has been transferred as standard data in HISPID. Placing it in SiteMeasurementOrFact instead would reduce its usefulness.

NumberOfSheets

This has been transferred as standard data in HISPID and is very useful when a data file accompanies a loan. Placing it in SiteMeasurementOrFact instead would reduce its usefulness.

FieldCollectionComponent

During discussion we recognised that the TypeOfCultivatedMaterial (tcul) field should be expanded to include all components collected in the field.

Allowed values:

  • cutting
  • cytological
  • division
  • dna
  • photograph
  • plant
  • seed

ProvenanceType

This has been transferred as standard data in HISPID. Placing it in SiteMeasurementOrFact instead would reduce its usefulness. It cannot be placed in BotanicalGardenUnit because the HISPID field refers to cultivated material associated with the record, not to the record itself.

PropagationHistory

As for ProvenanceType above.

DonorType

As for ProvenanceType above.

Frequency

This has been transferred as standard data in HISPID. Placing it in SiteMeasurementOrFact instead would reduce its usefulness.

IntroductionAgency

An important field for invasive plants. It requires a controlled vocabulary, which is not available if the value is placed in SiteMeasurementOrFact.

Allowed values:

  • introduced by humans
  • introduced by natural means
  • no information
  • not applicable
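A dedicated element lets a closed list like this be enforced, which free text in SiteMeasurementOrFact cannot. Application code exporting HISPID might mirror the same list as an enumeration, as in this minimal Python sketch; the class and function names are illustrative only, and trimming and lower-casing the input before lookup is an assumption, not a HISPID rule.

```python
from enum import Enum


class IntroductionAgency(Enum):
    """Controlled vocabulary for IntroductionAgency, copied from the list above."""
    HUMANS = "introduced by humans"
    NATURAL_MEANS = "introduced by natural means"
    NO_INFORMATION = "no information"
    NOT_APPLICABLE = "not applicable"


def parse_introduction_agency(text: str) -> IntroductionAgency:
    """Map a raw field value onto the vocabulary; Enum lookup raises ValueError otherwise."""
    return IntroductionAgency(text.strip().lower())
```

Rejecting unknown values at export time surfaces problems in the source database before the data are interchanged.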

LifeForm

This has been transferred as standard data in HISPID and is commonly recorded by botanists. Placing it in SiteMeasurementOrFact instead would reduce its usefulness.

Phenology

An important field (e.g. for relating flowering time to climate change) that requires a controlled vocabulary, which is not available if the value is placed in SiteMeasurementOrFact.

Allowed values:

  • bisexual flowers
  • buds
  • female cones
  • female flowers
  • flowers
  • male/female cones
  • male cones
  • male flowers
  • fruit
  • fruiting cones
  • gametophyte
  • sporophyte
  • spore-bearing bodies
  • fertile
  • sterile
  • leafless

NonComputerisedDataFlag

This has been transferred as standard data in HISPID. Placing it in SiteMeasurementOrFact instead would reduce its usefulness.

InfraGenericTaxon

Previous versions of HISPID 5 had extended ABCD's HigherTaxon to permit the use of several infra-generic ranks; however, these ranks are not higher taxa in the usual sense of the term. There was additional confusion because ABCD prefers Latin terms, e.g. familia, or at least attributes each term with a language, while HISPID 5's extension used unattributed English terms. See InfraGenericRankEnum.

HispidAccessions

This element corrects a probable oversight in ABCD 2.06b. SpecimenUnit/Accessions contains three repeatable sub-elements, but as constructed it is difficult to use because the sub-elements belonging to one accession are not grouped together. An Accession element is needed between Accessions and the three current sub-elements. This was corrected in HISPID 5.0.3 with the addition of HispidAccessionsType.
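The ambiguity shows up as soon as a flat Accessions element is regrouped: without an Accession parent, the only way to decide which number belongs with which date is by position. The minimal Python sketch below performs exactly that positional pairing with ElementTree. It assumes the three ABCD sub-elements are named AccessionCatalogue, AccessionDate and AccessionNumber and it ignores XML namespaces; both simplifications should be checked against the actual ABCD and HISPID schemas.

```python
import xml.etree.ElementTree as ET

# Assumed names of the three repeatable ABCD sub-elements of Accessions.
CHILD_TAGS = ("AccessionCatalogue", "AccessionDate", "AccessionNumber")


def wrap_accessions(flat: ET.Element) -> ET.Element:
    """Return a new <Accessions> in which the i-th child of each kind is placed
    under the i-th <Accession>. Pairing by position is only a guess, which is
    exactly the ambiguity an explicit Accession wrapper removes."""
    columns = {tag: flat.findall(tag) for tag in CHILD_TAGS}
    count = max(len(found) for found in columns.values())
    wrapped = ET.Element("Accessions")
    for i in range(count):
        accession = ET.SubElement(wrapped, "Accession")
        for tag in CHILD_TAGS:
            if i < len(columns[tag]):
                ET.SubElement(accession, tag).text = columns[tag][i].text
    return wrapped
```

When the lists have different lengths, as in two numbers but one date, the sketch cannot know which number the date belongs to; with a wrapper, the sender states it explicitly.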

Recommended Restrictions

The HISCOM community needs to restrict the following types beyond the restrictions in ABCD. It is not certain whether these restrictions would apply to other communities, but they are presented here for discussion. (Note: all the additional HISPID types are defined at the end of the HISPID extension.)

/DataSets/DataSet/Units/Unit/Identifications/Identification/Result/TaxonIdentified/ScientificName/NameAddendum

  • abcd type: abcd:String
  • hispid type: hispid:NameAddendumEnum

Restricted to:

  • agg.
  • s. lat.
  • s. str.

/DataSets/DataSet/Units/Unit/SpecimenUnit/NomenclaturalTypeDesignations/NomenclaturalTypeDesignation/TypeStatus

  • abcd type: abcd:String
  • hispid type: hispid:TypeStatusEnum

Restricted to:

  • epitype
  • holotype
  • isotype
  • isolectotype
  • isoneotype
  • isoparatype
  • isoparalectotype
  • isosyntype
  • kleptotype
  • lectotype
  • neotype
  • paratype
  • paralectotype
  • syntype
  • topotype
  • type (type material of unknown status)

/DataSets/DataSet/Units/Unit/Identifications/Identification/Identifiers/IdentifierRole

  • abcd type: abcd:String
  • hispid type: hispid:IdentifierRoleEnum

Restricted to (abbreviations consistent with convention):

  • conf. - confirmavit (the identifier has agreed with, i.e. confirmed, the identification).
  • cit. - citavit (used when a specimen is cited in a publication).
  • det. - determinavit (the identifier has determined the identification).
  • scrips. - scripsit (identification communicated in written correspondence).
  • vid. - vidit (identification seen and communicated verbally).
  • upg. - upgrade (taxonomic update based on the literature, where the specimen is not cited).
  • tss. - temporary sorting slip (used to temporarily name a specimen prior to critical examination).

/DataSets/DataSet/Units/Unit/Identifications/Identification/Result/TaxonIdentified/ScientificName/IdentificationQualifier

  • abcd type: Anonymous Type
  • hispid type:

Not done, as an anonymous type is difficult to maintain in future schema versions. The restrictions would be:

  • aff. - Akin to or bordering
  • cf. - Compare with
  • incorrect - Incorrect
  • forsan - Perhaps
  • near - Close to
  • ? - Questionable

/DataSets/DataSet/Units/Unit/Gathering/Altitude/MeasurementOrFactAtomised/MeasuredBy

  • abcd type: abcd:String
  • hispid type:

Not done. MeasurementOrFactAtomised is reused throughout the schema, so it cannot easily be restricted differently in different contexts. The restriction would be:

  • collector
  • compiler
  • automatically generated

/DataSets/DataSet/Units/Unit/Gathering/Altitude/MeasurementOrFactAtomised/Method

  • abcd type: abcd:String
  • hispid type:

Not done. MeasurementOrFactAtomised is reused throughout the schema, so it cannot easily be restricted differently in different contexts. The restriction would be:

  • dem - Digital Elevation Model
  • gps - Global Positioning System (GPS) unit
  • field estimate
  • altimeter
  • map
  • unknown

/DataSets/DataSet/Units/Unit/Gathering/Depth/MeasurementOrFactAtomised/MeasuredBy

  • abcd type: abcd:String
  • hispid type:

Not done. MeasurementOrFactAtomised is reused throughout the schema, so it cannot easily be restricted differently in different contexts. The restriction would be:

  • collector
  • compiler
  • automatically generated

/DataSets/DataSet/Units/Unit/KindOfUnit

  • abcd type: abcd:StringL
  • hispid type: hispid:KindOfUnitEnum

Restricted to:

  • alcohol
  • bark
  • boxed
  • cytological
  • fruit
  • illustration
  • image
  • other
  • packet
  • pollen
  • print
  • reference
  • seed
  • sheet
  • slide
  • transparency
  • vertical
  • wood

/DataSets/DataSet/Units/Unit/Gathering/SiteCoordinateSets/SiteCoordinates/CoordinateMethod

  • abcd type: abcd:String
  • hispid type: hispid:CoordinateMethodEnum (base abcd:String)

Restricted to:

  • collector
  • compiler
  • generalised arbitrary point
  • automatically generated
  • gps
  • topo
  • unknown

/DataSets/DataSet/Units/Unit/HerbariumUnit/NaturalOccurrence

  • abcd type: abcd:StringL
  • hispid type: hispid:NaturalOccurrenceEnum

Restricted to:

  • native
  • assumed to be native
  • doubtfully native
  • formerly native (extinct)
  • not native
  • recorded as native in error
  • no information
  • none of the above
  • not applicable

/DataSets/DataSet/Units/Unit/HerbariumUnit/CultivatedOccurrence

  • abcd type: abcd:StringL
  • hispid type: hispid:CultivatedOccurrenceEnum

Restricted to:

  • cultivated
  • assumed to be cultivated
  • doubtfully cultivated
  • formerly cultivated (extinct)
  • not cultivated
  • recorded as cultivated in error
  • no information
  • none of the above
  • not applicable

/DataSets/DataSet/Units/Unit/SpecimenUnit/NomenclaturalTypeDesignations/NomenclaturalTypeDesignation/DoubtfulFlag/Coordinates/

  • abcd type: abcd:String
  • hispid type: hispid:DoubtfulFlagEnum

Restricted to:

  • possibly
  • probably
  • ?
  • not

New elements that should really be extensions

The following elements are included in the HISPID extension as an interim measure to allow the information to be transferred. Ideally, they would instead be included as additional attributes or elements of existing ABCD types, as documented below.

CoordinatesDMS

To allow transfer of verbatim geocode components (see CoordinatesLatLong below).

PerCollector

To allow the GatheringAgent to be recorded as a 'per' collector (see GatheringAgent below).

SecondaryCollectorIdentifier

To allow multiple collectors' numbers to be recorded (see GatheringAgent below).

Recommended extensions to existing ABCD elements

Extending an existing ABCD element could produce instance documents that are not valid ABCD. Any extensions to existing ABCD elements have therefore been quarantined in ABCD's <xs:any> extension elements. In this way their potential usefulness can be demonstrated while the instance documents remain valid ABCD. These extensions are documented below for the consideration of the ABCD working group.

/DataSets/DataSet/Units/Unit/Gathering/SiteCoordinateSets/SiteCoordinates/CoordinatesLatLong

Rather than adding a new element for CoordinatesDMS (degrees, minutes and seconds), it might be more parsimonious to include the individual elements within the CoordinatesLatLong type. This would require additional elements for: LatitudeDegrees, LatitudeMinutes, LatitudeSeconds, LatitudeDirection, LongitudeDegrees, LongitudeMinutes, LongitudeSeconds, LongitudeDirection.

There is a need to transfer coordinates in degrees, minutes and seconds (as often recorded by the original collector). These need to be transferred in discrete units to enable validation (therefore the VerbatimLongitude and VerbatimLatitude mooted for ABCD version 2.06c are not sufficient).
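Discrete components allow each part to be checked on its own (minutes and seconds below 60, a direction that matches the axis, degrees within range) before any conversion. The minimal Python sketch below does this and converts the result to signed decimal degrees. The fields mirror the proposed LatitudeDegrees/LatitudeMinutes/LatitudeSeconds/LatitudeDirection elements, but the `DMS` class and `dms_to_decimal` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DMS:
    """One coordinate held as discrete components, as the proposed elements would transfer it."""
    degrees: int
    minutes: int
    seconds: float
    direction: str  # "N"/"S" for latitude, "E"/"W" for longitude


def dms_to_decimal(value: DMS, is_latitude: bool) -> float:
    """Validate each component separately, then return signed decimal degrees."""
    directions = ("N", "S") if is_latitude else ("E", "W")
    max_degrees = 90 if is_latitude else 180
    if value.direction not in directions:
        raise ValueError(f"direction {value.direction!r} not in {directions}")
    if not 0 <= value.minutes < 60 or not 0 <= value.seconds < 60:
        raise ValueError("minutes and seconds must be in the range [0, 60)")
    decimal = value.degrees + value.minutes / 60 + value.seconds / 3600
    if not 0 <= decimal <= max_degrees:
        raise ValueError(f"coordinate outside 0..{max_degrees} degrees")
    # Southern latitudes and western longitudes are negative by convention.
    return -decimal if value.direction in ("S", "W") else decimal
```

A single verbatim string such as 34°55'30"S would first have to be parsed before any of these checks could run, and a malformed string might not be detectable at all.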

/DataSets/DataSet/Units/Unit/Gathering/Agents/GatheringAgent

The current ABCD GatheringAgent concept cannot readily accommodate multiple collectors' numbers. We suggest adding a Collectors Field Number attribute or element to this concept, rather than placing the number as a child of the Unit element.

A “per” collector is an amateur or casual collector who collected a specimen on behalf of a primary collector. Adding a boolean 'percollector' attribute to the GatheringAgent would seem the best way to include this information.
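The two proposals together would let each agent carry its own number and role. The minimal Python sketch below builds such an element with ElementTree. The `percollector` attribute and a `CollectorsFieldNumber` child of GatheringAgent are the hypothetical additions discussed above, not part of ABCD 2.06; the Person/FullName structure is assumed from ABCD, and namespaces are omitted.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def gathering_agent(full_name: str, field_number: Optional[str] = None,
                    per_collector: bool = False) -> ET.Element:
    """Build a GatheringAgent carrying the two proposed (hypothetical) additions."""
    # xs:boolean uses the lexical values "true" and "false".
    agent = ET.Element("GatheringAgent",
                       {"percollector": "true" if per_collector else "false"})
    person = ET.SubElement(agent, "Person")
    ET.SubElement(person, "FullName").text = full_name
    if field_number is not None:
        # Each collector keeps their own number instead of one number per Unit.
        ET.SubElement(agent, "CollectorsFieldNumber").text = field_number
    return agent
```

For example, a primary collector could be written as `gathering_agent("A. Primary", "1234")` and a helper as `gathering_agent("B. Helper", per_collector=True)`, both inside the same Agents element.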

/DataSets/DataSet/Units/Unit/Gathering/Aspect/Ordination

We recommend that this field be extended to include the values:

  • NNE
  • ENE
  • ESE
  • SSE
  • NNW
  • WNW
  • WSW
  • SSW
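With these additions, Ordination could express a full sixteen-point compass rose, so an aspect measured as a bearing could be recorded at 22.5° resolution. The minimal Python sketch below maps a bearing to the nearest point. It assumes ABCD already allows the eight principal and intercardinal points (N, NE, E, SE, S, SW, W, NW); the tuple and function are illustrative and not part of HISPID or ABCD.

```python
# Sixteen points in clockwise order: the eight assumed existing values plus the eight proposed above.
COMPASS_16 = ("N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
              "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW")


def bearing_to_ordination(bearing: float) -> str:
    """Return the nearest of the 16 points; each point covers a 22.5-degree sector."""
    sector = int((bearing % 360) / 22.5 + 0.5) % 16
    return COMPASS_16[sector]
```

With only eight points, the same bearing would have to be rounded to the nearest 45°, discarding precision the collector may have recorded.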

HigherTaxonRankEnum Type

The HigherTaxonRankEnum extension type was removed from HISPID 5.0.3 because it did not follow ABCD correctly. We had extended HigherTaxonRankEnum to add taxon ranks below genus; however, these ranks are not considered higher taxa. In its place we created InfraGenericRankEnum with the same controlled vocabulary.

InfraGenericRankEnum

InfraGenericRankEnum, added in HISPID 5.0.3, contains only the extra values needed from HISPID 3 for the infra-generic ranks commonly used in the Australian flora:

  • section
  • subsection
  • subgenus
  • series
  • subseries

In HISPID 5.0.3 these values are also properly attributed as English terms using xml:lang="en".
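The minimal Python sketch below shows what a language-attributed rank value looks like when generated with ElementTree, which serialises the reserved XML namespace as the `xml:` prefix. The element name `InfraGenericRank` and the function are hypothetical (the actual HISPID element names may differ), and namespaces for ABCD and HISPID are omitted.

```python
import xml.etree.ElementTree as ET

# Clark-notation name that ElementTree writes out as the reserved xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
INFRA_GENERIC_RANKS = ("section", "subsection", "subgenus", "series", "subseries")


def infra_generic_rank(rank: str) -> ET.Element:
    """Build a rank element whose English term is explicitly attributed with xml:lang."""
    if rank not in INFRA_GENERIC_RANKS:
        raise ValueError(f"{rank!r} is not in InfraGenericRankEnum")
    element = ET.Element("InfraGenericRank", {XML_LANG: "en"})
    element.text = rank
    return element
```

`ET.tostring(infra_generic_rank("section"), encoding="unicode")` yields an element with `xml:lang="en"` and the text `section`, so a consumer that expects Latin terms can see that the value is English.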

General comments on mapping process

Where something is defined as a very generalised complex type that is reused throughout the schema, it is hard, perhaps impossible, to restrict, because a restriction in one context might not apply in another. An example is the HISPID Method of Altitude Determination field, which uses the ABCD Gathering/Altitude/MeasurementOrFactAtomised/Method concept. We would like to restrict the vocabulary of this field to the following values: dem, gps, field estimate, altimeter, map, unknown. However, this would restrict every use of this complex type (e.g. for Method of Depth Determination). The only way to restrict it is to create a new type for every MeasurementOrFact item that needs to be controlled, so we would need a separate AltitudeMeasurementOrFact type and DepthMeasurementOrFact type.
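Outside XML Schema, the same context-dependent rules can be checked by path rather than by type. The minimal Python sketch below restricts Method and MeasuredBy under Altitude, and MeasuredBy under Depth, while leaving the Depth Method as free text, something a single restriction on the shared MeasurementOrFactAtomised type cannot express. It is an illustration only: the vocabularies are the ones proposed in this summary, namespaces are ignored, and HISPID itself does not prescribe this kind of validator.

```python
import xml.etree.ElementTree as ET

MEASURED_BY = {"collector", "compiler", "automatically generated"}

# Restrictions keyed by path from <Gathering>, so the same ABCD element can carry
# a vocabulary in one context and remain free text in another.
CONTEXT_VOCABULARIES = {
    "Altitude/MeasurementOrFactAtomised/Method":
        {"dem", "gps", "field estimate", "altimeter", "map", "unknown"},
    "Altitude/MeasurementOrFactAtomised/MeasuredBy": MEASURED_BY,
    "Depth/MeasurementOrFactAtomised/MeasuredBy": MEASURED_BY,
    # No entry for Depth Method: no vocabulary was proposed for it.
}


def check_gathering(gathering: ET.Element) -> list:
    """Return (path, value) pairs whose value is outside the vocabulary for that context."""
    problems = []
    for path, allowed in CONTEXT_VOCABULARIES.items():
        for element in gathering.findall(path):
            value = (element.text or "").strip()
            if value not in allowed:
                problems.append((path, value))
    return problems
```

A document with an Altitude Method of "gps" and a Depth Method of "sounding line" passes, whereas an Altitude Method of "guess" is reported.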

Anonymous types make maintenance difficult. For example, IdentificationQualifier is defined as an anonymous complex type, so instead of simply changing the type (e.g. from abcd:IdentificationQualifier to hispid:IdentificationQualifier) we must construct a new type and remove the anonymous type completely. This makes the schema harder to maintain as new versions of ABCD are introduced, and one of our goals is to make it sustainable. We would therefore tentatively suggest that ABCD avoid anonymous types.

The complete documentation for HISPID 5 is now available.

References

  • Conn, R.J. (ed.) (1996). HISPID3. Herbarium Information Standards and Protocols for Interchange of Data. (Council of Heads of Australian Herbaria at Royal Botanic Gardens: Sydney). Viewed at http://plantnet.rbgsyd.nsw.gov.au/HISCOM/HISPID/HISPID3/H3.html on 18 June 2007.
  • Croft, J.R. (ed.) (1989). HISPID - Herbarium Information Standards and Protocols for Interchange of Data [Version 1]. (Australian National Botanic Gardens: Canberra).
  • Whalen, A. (ed.) (1993). HISPID - Herbarium Information Standards and Protocols for Interchange of Data [Version 2]. (National Herbarium of New South Wales: Sydney).