Wrappers for Herbarium Data

From Hiscom


Final report now available

Jorg Holetschek has provided his report on BioCASe wrappers for Australian herbaria: ALA travelling BioCASe circus

Background

The Atlas of Living Australia Collections Data Management panel has determined that getting data out of institutions is a high priority. Jorg Holetschek will come from Berlin to run a hands-on session with herbaria staff in Melbourne, and then Jorg and another technical person will travel to each herbarium to help with local implementations. Jorg can come to Australia in February and March 2010.


Questions put to HISCOM, October 2009:

  • Given that the problems with getting data out of institutions are not just technical, is this a sensible approach?
  • And, if so, what do we need to do to get ready?


Three general use cases for wrappers:

  1. Those for whom the ALA could house and run infrastructure
  2. Those that want to host a web service, but need technical assistance
  3. Those that have an implementation, and want some further support


Web Services – TAPIR, BioCASE

Several protocols are in use, with BioCASE and TAPIR the most likely candidates. There are several implementations of TAPIR (.NET, PHP and Python), though I have not distinguished between them here.

TAPIRlink is a PHP implementation of TAPIR that supports Darwin Core. TAPIRlink doesn't fully support ABCD: it cannot produce repeatable elements (such as multiple identifications or identification history), which are a key feature of ABCD.

IPT doesn’t handle nested structures – just Darwin Core. TAPIR doesn’t handle nested structures very well.

BioCASE is supported (by Jorg), though TDWG says it is no longer supported. BioCASE can move nested data.

Jorg: Web services provided by the BioCASe software can be used by any component - not only by an aggregator, but also by one of the herbaria. However, the software will only produce ABCD documents (from the collection database); it cannot read ABCD documents from another service and feed data into the database. It is a read-only application.

Data needs to be moved according to HISPID5 specs. HISPID5 data is nested, i.e. there can be multiple items per specimen (e.g. the history of identifications, all the gathering agents).
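To make "nested" concrete, here is a minimal Python sketch of one specimen carrying several gathering agents and an identification history, built by grouping child rows under their parent. The class names, field names, catalogue number and people are made up for illustration; they are not HISPID5 or ABCD element names.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Identification:
    scientific_name: str
    identified_by: str
    date: str


@dataclass
class Specimen:
    # Hypothetical field names, not taken from the HISPID5 schema.
    catalogue_number: str
    gathering_agents: list[str] = field(default_factory=list)
    identifications: list[Identification] = field(default_factory=list)


def nest(specimen_rows, agent_rows, identification_rows):
    """Group child rows under their parent specimen by catalogue number."""
    agents = defaultdict(list)
    for cat_no, name in agent_rows:
        agents[cat_no].append(name)
    idents = defaultdict(list)
    for cat_no, name, identified_by, date in identification_rows:
        idents[cat_no].append(Identification(name, identified_by, date))
    return [
        Specimen(cat_no, agents[cat_no], idents[cat_no])
        for (cat_no,) in specimen_rows
    ]


# Made-up rows, as they might come from three relational tables.
specimens = nest(
    specimen_rows=[("X 0001",)],
    agent_rows=[("X 0001", "A. Collector"), ("X 0001", "B. Helper")],
    identification_rows=[
        ("X 0001", "Eucalyptus sp.", "A. Collector", "1995-03-02"),
        ("X 0001", "Eucalyptus regnans", "C. Botanist", "2004-11-17"),
    ],
)
```

A flat, one-row-per-specimen format has room for only one of those identifications, which is the information that would be lost without nesting.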

BioCASE – we need to define HISPID5 as part of BioCASE’s concept mapping scheme. This part of BioCASE, however, is broken. This can be worked around with scripts to massage the data; MEL and HO are working examples of this.

Jorg: HISPID5 is an ABCD extension. ABCD is BioCASE's data schema - so I guess all that is needed is some conversion of file formats (xsd -> cmf).

Rex Croft: We would need a new CMF file and instructions on how to upgrade those already running BioCase with the standard ABCD 2.06 mapping. We then need to update our BioCase database mappings, and then AVH has to be modified to accept those new fields.

Is nesting needed? Can we just deliver HISPID3 (un-nested)?

Another way to do this is to create a holding database with an XML form of nested HISPID5 data, and use TAPIRlink, or OAI-PMH, to deliver that.
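A rough sketch of the holding-database idea, assuming SQLite as the store and OAI-PMH as the delivery protocol. The table layout, the XML element names, the base URL and the "hispid5" metadata prefix are all hypothetical; only the `ListRecords` verb and the `metadataPrefix` and `from` arguments come from the OAI-PMH specification.

```python
import sqlite3
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def specimen_xml(catalogue_number, identifications):
    """Serialise one specimen with a repeatable Identification element."""
    # Element names are illustrative only, not the HISPID5 schema.
    root = ET.Element("Specimen")
    ET.SubElement(root, "CatalogueNumber").text = catalogue_number
    history = ET.SubElement(root, "Identifications")
    for name in identifications:
        ET.SubElement(history, "Identification").text = name
    return ET.tostring(root, encoding="unicode")


# Holding database: one pre-built XML document per specimen.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE holding (identifier TEXT PRIMARY KEY, datestamp TEXT, xml TEXT)"
)
db.execute(
    "INSERT INTO holding VALUES (?, ?, ?)",
    (
        "X 0001",
        "2010-02-04",
        specimen_xml("X 0001", ["Eucalyptus sp.", "Eucalyptus regnans"]),
    ),
)


def changed_since(connection, from_date):
    """Return stored documents with a datestamp on or after from_date."""
    rows = connection.execute(
        "SELECT xml FROM holding WHERE datestamp >= ? ORDER BY identifier",
        (from_date,),
    )
    return [xml for (xml,) in rows]


def list_records_url(base_url, metadata_prefix, from_date):
    """Build the harvester's OAI-PMH ListRecords request URL."""
    query = urlencode(
        {"verb": "ListRecords", "metadataPrefix": metadata_prefix, "from": from_date}
    )
    return f"{base_url}?{query}"


records = changed_since(db, "2010-02-01")
url = list_records_url("https://example.org/oai", "hispid5", "2010-02-01")
```

Because the nested XML is built once and stored, the delivery layer only has to hand out documents by date, so it never needs to understand the nesting itself.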

Should we use BioCASE or TAPIR?

  • Both are needed.
  • AVH can harvest both BioCASE and TAPIR. CSV files are also being delivered to AVH.
  • There has been a large investment of time and money, and experience gained, in installing one system or another; this should not be wasted.


Current status of data wrappers

Summary table for State and Territory herbaria – October 2009
Herbarium | Database  | Intermediary               | Web service     | Status                                        | Comment
BRI       | Oracle    | MySQL                      | To do: BioCASE  | Already in HISPID. Copy AD                    | After Jorg
NSW       | KE EMu    | To do: PostgreSQL to MySQL | To do: BioCASE  | To do: Copy MEL                               |
MEL       | Texpress  | MySQL                      | BioCASE         | Working                                       |
PERTH     | Texpress  | MySQL                      | Testing BioCASE | Close to working                              |
AD        | Texpress  | Oracle to MySQL            | BioCASE         | Working                                       |
HO        | FileMaker | MySQL to MySQL             | TAPIRlink       | Scripts for Botanical and Faunal data. Close. |
CANB      | Oracle    | MySQL                      | Testing TAPIR   | Experimenting. Maybe BioCASE                  |
DNA       | Oracle    |                            | BioCASE         | Needs to be mapped                            |

Other

Some institutions will want a turnkey solution. ALA should host:

  • Data provider database – and accept data via CSV. Applicable for those institutions that manage data locally, but can’t, or don’t want to, manage their own web service. This could also be used as a second web service for an institution.
  • Collections management database – for small institutions that cannot manage a database.
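As a sketch of what accepting data via CSV into a hosted data provider database might involve, the following loads a CSV export into SQLite and reports the lines it rejects. The column names, the required-field rule and the table layout are assumptions made for illustration, not an agreed ALA format.

```python
import csv
import io
import sqlite3

# Hypothetical minimum set of columns a row needs before it is accepted.
REQUIRED = ("catalogue_number", "scientific_name")


def load_csv(connection, csv_text):
    """Load CSV rows into the specimen table.

    Returns the number of rows loaded and the file line numbers rejected.
    """
    connection.execute(
        "CREATE TABLE IF NOT EXISTS specimen ("
        "catalogue_number TEXT PRIMARY KEY, "
        "scientific_name TEXT NOT NULL, "
        "collector TEXT)"
    )
    loaded, rejected = 0, []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        if any(not (row.get(col) or "").strip() for col in REQUIRED):
            rejected.append(line_no)
            continue
        # Re-delivering a specimen replaces the earlier copy.
        connection.execute(
            "INSERT OR REPLACE INTO specimen VALUES (?, ?, ?)",
            (row["catalogue_number"], row["scientific_name"], row.get("collector") or ""),
        )
        loaded += 1
    connection.commit()
    return loaded, rejected


# Made-up delivery: the second data row lacks a scientific name.
sample = (
    "catalogue_number,scientific_name,collector\n"
    "X 0001,Eucalyptus regnans,A. Collector\n"
    "X 0002,,B. Helper\n"
)
provider_db = sqlite3.connect(":memory:")
loaded, rejected = load_csv(provider_db, sample)
```

Reporting rejected line numbers back to the institution, rather than failing the whole file, keeps the hand-off workable for places without local technical support.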


Data exchange between herbaria. This requires each participating herbarium to be a harvester. Leave this to stage 2.


Jorg is a BioCASE expert. Is he a TAPIR person?

Kevin Richards, in New Zealand, is a TAPIRlink expert.


Actions underway

Jorg Holetschek has agreed to come to Australia. I expect that he will work with those that have implemented BioCASE. He would be invaluable in showing how the database should be structured so that a web service can be built on top of it. He will create local champions, who can then work with others.


Calendar

November 2009

  • John Tann to determine what is needed from AVH to be ready for Jorg.
  • Scope what needs to be done at each herbarium and the training outcomes we would like to achieve. Most of this has already been done but will need to be revisited.
  • Let Brett Summerell know if your herbarium would be able to host a visit and would benefit from it. Would it be achievable given IT, departmental and technological constraints?

February 2010

  • 4-5 February. Hands-on BioCASe Workshop in Melbourne for IT people.
  • Visits to herbaria by Jorg.

March 2010

  • Visits to herbaria by Jorg.