AVH fields-HISPID discussion for Adelaide meeting 2014

Please see the AVH fields Google Doc for an up-to-date list of AVH fields and concepts provided by the different providers.

I would like to discuss the following items at the HISCOM meeting:

Concept names
We deliver AVH data to ALA as Darwin Core Archives (see AVH data), and in future no doubt more and more herbaria will deliver directly to AVH in the same way. In the meta.xml files that go with these archives, Darwin Core concepts are referenced by their URL. Darwin Core has nice informative URLs, for example http://rs.tdwg.org/dwc/terms/scientificNameAuthorship, of which the last URI segment is the concept name.

So far, I have been sending the ABCD URLs for ABCD concepts we use for which there is no equivalent in Darwin Core. ABCD URLs are not informative at all, as ABCD concepts are numbered, so the URL looks like http://wiki.tdwg.org/twiki/bin/view/ABCD/AbcdConcept0295. Of course everyone sees immediately that this is the scientific name. ALA can't see what this means, so these fields are stored as 'ABCD Concept ####', which is how they show up on the occurrence detail pages. Also, it is not quite right to deliver ABCD URLs with CSV, as ABCD can really only be XML.
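To illustrate the difference between the two kinds of URL, here is a minimal sketch of how a consumer could derive a display name from the last URI segment. The function name concept_name is made up for this example, and this is not necessarily how ALA does it.

```python
from urllib.parse import urlparse


def concept_name(term_url: str) -> str:
    """Return the last path segment of a term URL (hypothetical helper)."""
    path = urlparse(term_url).path
    # Drop any trailing slash, then keep what follows the last remaining slash.
    return path.rstrip("/").rsplit("/", 1)[-1]


dwc_name = concept_name("http://rs.tdwg.org/dwc/terms/scientificNameAuthorship")
abcd_name = concept_name("http://wiki.tdwg.org/twiki/bin/view/ABCD/AbcdConcept0295")
print(dwc_name)   # informative: scientificNameAuthorship
print(abcd_name)  # not informative: AbcdConcept0295
```

The Darwin Core URL yields a usable field name, whereas the ABCD URL only yields the concept number.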

I like the Darwin Core URLs, so I created URLs for all HISPID concepts in the different versions of HISPID that we have on the Wiki, for instance:
 * http://hiscom.chah.org.au/hispid/3/terms/insid
 * http://hiscom.chah.org.au/hispid/4/terms/fulnam
 * http://hiscom.chah.org.au/hispid/5/terms/cou

The same has been done for all the fields on the AVH data page:
 * http://hiscom.chah.org.au/avhfields/TypifiedName

I would like to replace all the ABCD URLs that we use with the AVH URLs, and eventually with the new HISPID URLs. To do that we need to redeliver the entire AVH data set, which is not exactly a trivial thing to do. Before we do that, we need a decision from HISCOM on what the names of these concepts should be, as well as a decision on some of the things below, so that we have to do it only once. Changing the rest of the URL afterwards is probably not that big a deal, as it only involves the aggregator (I think).

Notes fields
When we first delivered the complete AVH 3 database to ALA, Miles mapped the ABCD Gathering Notes to the Darwin Core occurrenceRemarks. We didn't have Unit Notes in AVH 3, but since I discovered that some herbaria were delivering them, I have delivered them to ALA as an AVH custom field, 'ABCD Unit Notes'.

The "official" ABCD to Darwin Core mapping maps Gathering Notes to eventRemarks and Unit Notes to occurrenceRemarks. I have made the change on the AVH data page and you will find it like this in the AVH fields Google Doc, but so far I haven't changed the way it is delivered to ALA. This is because I think that our Gathering Notes are more appropriately mapped to occurrenceRemarks. HISPID 5 for HISPID Users mapped the HISPID 3 Descriptive Notes (cnot) to Gathering Notes and Miscellaneous Notes (misc) to Unit Notes. Descriptive Notes – and perhaps botanical collecting notes more generally – cannot be eventRemarks, as an Event may apply to multiple taxa. Also, what is delivered as Unit Notes might be more appropriately mapped to a custom Miscellaneous Notes field than to occurrenceRemarks.


 * I found the Darwin Core fieldNotes field, which seems appropriate for our Collecting notes. Then we could use occurrenceRemarks for Miscellaneous notes (????). eventRemarks might be appropriate for collecting trip information. I think I would really prefer not to use occurrenceRemarks for Miscellaneous notes. Probably best to leave this for the HISPID review and maintain the status quo for now. --NielsKlazenga (talk)

Identification qualifiers
This is just here to flag that we really want ALA to do something about identification qualifiers – it has been an issue in the ALA issue tracker since the HISCOM meeting last year. But HISCOM need to discuss with ALA how this should be done. FCIG should be included as well. I think this is the most important outstanding issue with the new AVH.

Cultivated and introduced plants
We have CultivatedOccurrence and NaturalOccurrence mapped to dwc:establishmentMeans. At the moment the values for CultivatedOccurrence and NaturalOccurrence are concatenated, but there is a GBIF vocabulary for establishment means that I think we should use. The POSS terms that HISPID adopted apply to taxon–area relationships rather than individual occurrences and half of the terms have no bearing on specimen data.

Establishment means is important to many Atlas users, but is not reliably delivered – or databased – with AVH data, which makes sense and is not a problem in itself, as it is not specimen data. I would like us to think about a way to populate this field as much as possible from the state censuses and only use the specimen data for native taxa that also have adventive occurrences.

Actually, it is all in my report.

Spatial datum
This field was overlooked in the MoU Technical Addendum and is neither required nor recommended. It should definitely be a highly recommended field, but since all current providers are already supplying it and it is important for many AVH use cases, I think we might want to make it a required field.

We need some more documentation on this in the AVH Help and, from what I have seen in the harvested data, also in HISPID. The recommended way to deliver spatial datum in Darwin Core is as EPSG codes. On the occurrence detail page all records have epsg:4326 as their spatial datum, because all lats and longs in ALA are transformed to WGS84. I am not sure if our users know what that means.
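
As an illustration of what the documentation could cover, here is a small sketch that translates datum names into EPSG codes. The list of datums and the spelling variants it accepts are my assumptions for the example, not a record of what providers actually deliver. The EPSG codes themselves are the standard ones for these geographic coordinate systems.

```python
from typing import Optional

# Datum names here are illustrative; the EPSG codes are the standard ones.
EPSG_BY_DATUM = {
    "WGS84": "epsg:4326",
    "GDA94": "epsg:4283",
    "AGD66": "epsg:4202",
    "AGD84": "epsg:4203",
}


def datum_to_epsg(datum: str) -> Optional[str]:
    """Return the EPSG code for a datum name, or None if it is not recognised."""
    # Ignore case, spaces and punctuation, so 'WGS 84' and 'wgs-84' both match.
    key = "".join(ch for ch in datum.upper() if ch.isalnum())
    return EPSG_BY_DATUM.get(key)


print(datum_to_epsg("WGS 84"))
print(datum_to_epsg("gda94"))
print(datum_to_epsg("unknown datum"))
```

Unrecognised values come back as None rather than being guessed, so they can be reported back to the provider.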

SourceInstitutionID and SourceID
There is some confusion about what to deliver as SourceInstitutionID and SourceID, as in HISPID there is only the Institution ID. SourceInstitutionID and SourceID map to Darwin Core institutionCode and collectionCode respectively, so the correct SourceInstitutionID and SourceID for current providers are as follows:

The aggregator script has a function to deal with the data that is currently delivered in these fields, so, if you want to fix it, let me know, as fixing stuff on your side will break stuff on my side. If BRI wants to use AQ, which is currently prepended to the catalogue number, as collectionCode, this should be cleared with ALA first, as the catalogue number is used for matching records that come in with records already in the BioCache.
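
To show why this needs to be cleared first, here is a minimal sketch of what happens to record matching when a prefix is moved out of the catalogue number. The catalogue number AQ123456 and the function name split_prefix are made up for the example, and the set lookup is only a stand-in for whatever matching the BioCache really does.

```python
from typing import Optional, Tuple


def split_prefix(catalogue_number: str, prefix: str) -> Tuple[Optional[str], str]:
    """Split a prepended collection code off a catalogue number (illustrative)."""
    if catalogue_number.startswith(prefix):
        return prefix, catalogue_number[len(prefix):].lstrip()
    # No prefix found: leave the catalogue number untouched.
    return None, catalogue_number


# Hypothetical catalogue number that is already held by the aggregator.
already_held = {"AQ123456"}

collection_code, new_number = split_prefix("AQ123456", "AQ")
print(collection_code, new_number)
print(new_number in already_held)  # the old key no longer matches
```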