HISCOM 2012 AGM Canberra minutes
12–14 November 2012 (HISCOM meetings 12–13 November, HISCOM/CHAH and HISCOM/MAHC (did I miss this, or was it CHAH/MAHC? -NielsKlazenga 22:35, 21 December 2012 (EST)) meetings, Launch of AVH and OZCAM - 14 November)
Crosbie Morrison Building, Australian National Botanic Gardens, Canberra
- 1 Attendees
- 2 Apologies
- 3 1. HISCOM housekeeping
- 4 2. AVH management group
- 5 3. Brief report on ALA activities
- 6 4. Australia's Virtual Herbarium
- 6.1 1. Data delivery
- 6.2 2. Engagement with university herbaria
- 6.3 3. Memorandum of Understanding (MoU)
- 6.4 4. Images
- 6.5 5. User feedback
- 6.6 6. AVH Annotations
- 6.7 7. Sensitive data
- 6.8 8. Data delivery to GBIF
- 6.9 9. Future work on AVH
- 6.10 10. Collectory/Resources of Australasian Herbaria (RAH)
- 7 5. World Flora Online
- 8 6. National Species Lists
- 9 7. MEL’s quality control tools
- 10 8. Report from FCIG (Alison Vaughan)
- 11 9. AVH Trust Project
- 12 10. Wiki editing workshop
- 13 11. HISPID
- 14 12. Units and GUIDs
- 15 13. ALA (Peter Doherty)
- 16 14. Morphbank (Peter Brenton)
- 17 15. AVH Trust project proposal (Greg Whitbread)
- 18 16. Report from TDWG (Paul Flemons)
- 19 17. Biodiversity Volunteer Portal (Paul Flemons)
Attendees
Peter Bostock (BRI), Gary Chapple (NSW), Wayne Cherry (NSW), Ian Cowie (DNA), Jim Croft (CANB), Anne Fuchs (ANBG), Niels Klazenga (MEL), Dave Martin (ALA, part of meeting), Matt Miles (AD), Kevin Thiele (PERTH), Helen Thompson (ABRS), Alison Vaughan (MEL) (Minutes), Michelle Waycott (AD), Greg Whitbread (CANB), Aaron Wilton (CHR) (Chair)
Wednesday morning: Peter Brenton (ALA), Peter Doherty (ALA), Paul Flemons (FCIG), John Hook (CANB), Paul Murray (CANB)
Apologies
Eleanor Crichton (AD), Beth Mantle (FCIG), Laurence Paine (HO), Ben Richardson (PERTH), Brett Summerell (NSW)
1. HISCOM housekeeping
1. Minutes from previous meetings
The minutes of the previous meeting were accepted without modification.
The outstanding actions from previous meetings were reviewed; most will be discussed under other agenda items. Greg thinks he found the minutes of the 2010 AGM on the Internet Archive, but they need some work to make them readable.
The Wiki is running quite well at the moment. The URLs no longer include ‘index.php/’; several extensions have been installed (e.g. syntax highlighting, better user administration tools). The Wiki needs to be cleaned up and be better organised, using categories to give it a more hierarchical structure.
Niels is happy to run the Wiki on the MEL server for now, but it makes sense to host it more centrally. Niels can’t get the reverse proxy to work in Melbourne. The CHAH website is currently hosted by Aussie Hosting (a Sydney-based company), which is also used for CHAHBG, CABC, the Australian Seedbank Partnership, BGANZ etc. It seems to be working okay, but we would lose the short URLs if we transferred the HISCOM Wiki to it.
Some people have multiple user profiles in the Wiki, and other people don’t have access. The user list needs to be cleaned up.
FCIG has recently adopted Technical Manager and Web Content Manager roles. It was agreed that this could be a good model for HISCOM to follow. It was pointed out that FCIG is a much smaller committee, so HISCOM may need to have more than two specialised roles. It was also noted that we need to be careful that the adoption of such roles doesn’t exclude participation by others in the committee. It was suggested that, rather than having a single person responsible for each function or role, two or three people could be jointly responsible for it.
It was agreed that the roles should be adopted, but that they should be coordination roles, instead of management roles.
The draft roles were discussed and modified. It was agreed that we need a separate User Liaison role.
We need to consider how these roles fit in with the proposed AVH management group.
AVH now has a news page; there was some discussion about appropriate content and how the creation of news items might be coordinated. This would be seen as part of the Web Content Coordinator role, if it is adopted.
We need to discuss with CHAH how the CHAH website will be managed, and what interaction between HISCOM and CHAH is needed with regard to web content.
The User Liaison is responsible for:
- Coordinating responses to feedback from users
- Administering user access to sensitive data.
The Technical Coordinator is responsible for:
- Being a point of contact for ALA and others on technical matters
- Coordinating the management of:
  - the HISCOM and CHAH portals
  - the HISCOM Wiki
  - data delivery mechanisms for the AVH
- Coordinating responses to technical enquiries regarding the AVH, data access, delivery and standards.
The Web Content Coordinator is responsible for:
- Being a point of contact for HISCOM Wiki and AVH static content
- Maintaining web content related to HISCOM activities
- Updating, and soliciting, content from HISCOM to maintain the news feed
- Ensuring that AVH and CHAH websites meet accessibility, security and usability standards.
4. Election of Chair and other roles
- Aaron was re-elected as Chair for the coming year.
- Alison Vaughan was nominated for the role of User Liaison.
- Niels Klazenga was nominated for the role of Technical Coordinator.
We elected Michelle as Deputy Chair, and thanked Brett for his contributions.
Recommendation 1: CHAH to endorse the following nominations:
- Chair: Aaron Wilton
- Deputy Chair: Michelle Waycott
- Technical Coordinator: Niels Klazenga
- User Liaison: Alison Vaughan
5. Terms of Reference
The changes agreed at the last HISCOM meeting have been incorporated (e.g. that the Chair of HISCOM is not a CHAH member, and that the Deputy Chair is a member of CHAH). It was agreed that subscription to the HISCOM e-mail list (and inclusion in other communication mechanisms) is granted by request, approved by the HISCOM Chair.
The wording around the nomination of transitory roles was modified, and the period of office for executive roles was clarified.
The sections about the creation of Project Manager and Coordinator roles were removed from the ToR.
It was agreed that having an ALA technical staff member as a permanent member of HISCOM would be greatly beneficial.
Ben, Greg and Jim are responsible for moderating HISCOM-L. Jim has been moderating the list based on whether or not something is relevant to the group, rather than based on who is on the list.
The HISCOM list was then reviewed.
Aaron noted that the HISCOM list has low traffic compared to the MAHC and CHAH lists, and suggested that it could be used more effectively.
2. AVH management group
The idea of developing an AVH management group to allow HISCOM to move away from AVH and focus on new projects was discussed. The original view of AVH was that it included everything to do with what goes on with a herbarium, so it was felt that it doesn’t make sense to split off the management of ‘AVH’ to work on ‘other’ projects. There was a strong feeling that there is no need for an AVH management group.
3. Brief report on ALA activities
Dave Martin provided an overview of recent ALA activities:
- Released Version 1 of Delta
- Funded by Australian National Data Service to supply web services developed by people at JCU to produce species distribution polygons that can be used for quality assertions; this can now be used for other taxonomic groups (it was developed for birds)
- Tools to assist in the identification of weeds
- Developed a hub for the Australian Seedbank Partnership
- Developed a hub for the microbial community
- Looking at developing a hub for OBIS
- In February a tool called Fish Map will be released
- PhyloJive: integration of phylogenetic trees, point data and characters from Identify Life so you can map characters or species (Jim encourages everyone to have a look at the maps; the dots on maps for multiple species or characters look really good)
- Volunteer Portal: work on templates so people transcribing data from labels can use existing data to search for related collecting events etc.
- Sandbox: tool that allows you to upload data and pass it through the same sort of validation and integration tools so it can be tested; it can also be faceted against ad hoc parameters (see blog post about termite mounds data)
- Duplicate record detection: in the case of observations it is looking at duplicate records (i.e. the exact same record coming in via different pathways); for specimen-based records it’s looking for duplicate specimens. The notion of a representative record is used to indicate which is the more representative or reliable record, e.g. if two records asserted to be related have different levels of geocoordinate precision, the record with the higher precision will be the representative specimen.
- Species interaction work: working closely with Greg; data mining to extract species interactions between Acacia and wasps.
- Soils to satellites project: producing a single portal that combines distribution data and ecological data
- ALA blog: http://www.ala.org.au/blogs-news/
It was noted that there are a number of ALA projects running in the faunal community, but none in the botanical community.
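The representative-record idea mentioned under duplicate record detection can be illustrated with a minimal sketch. It is not ALA's implementation: the `Record` class, the `uncertainty_m` field and the rule that records with unknown uncertainty rank last are all assumptions made for illustration. The only thing taken from the discussion is that, of two related records, the one with the more precise geocoordinates is treated as representative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    # Hypothetical structure; field names are illustrative only.
    record_id: str
    # Coordinate uncertainty in metres; smaller means more precise.
    # None means the uncertainty is unknown.
    uncertainty_m: Optional[float]


def pick_representative(records: List[Record]) -> Record:
    """Return the record with the most precise coordinates."""
    if not records:
        raise ValueError("need at least one record")
    # Sort key: records with known uncertainty first, then smallest uncertainty.
    return min(
        records,
        key=lambda r: (r.uncertainty_m is None, r.uncertainty_m or 0.0),
    )


duplicates = [
    Record("MEL 1", 1000.0),
    Record("CANB 2", 50.0),
    Record("AD 3", None),
]
representative = pick_representative(duplicates)
print(representative.record_id)
```

A real system would probably weigh more than coordinate precision (for example completeness of other fields), so this should be read only as one possible ranking rule.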
4. Australia's Virtual Herbarium
1. Data delivery
Niels has summarised which fields are delivered to AVH on the AVH data providers page. All herbaria should deliver as many fields as they can. The fields to prioritise providing are:
- typification fields
- loan fields
- cultivated flag
- phenology (reproductive condition in DwC)
- determination history (previous identification in DwC)
BioCASe v. 2.5 onwards has debugging on by default.
2. Engagement with university herbaria
The University of New England (UNE) will start delivering to AVH via a BioCASe provider soon. There was a suggestion that the state and territory herbaria could play a mentoring role with their local university herbaria. The focus of university herbaria varies; some are primarily teaching herbaria, but others participate in loans and exchange programs too. There was an acknowledgement that there needs to be interest and enthusiasm from the university herbaria for a mentoring/knowledge-sharing role to be effective.
3. Memorandum of Understanding (MoU)
There needs to be a Memorandum of Understanding that outlines what fields must be delivered, what data delivery mechanism should be used, how often data should be updated, and what licence it should be delivered under. We might also need a separate MoU to cover the delivery of images (these might not all be delivered under a CC-BY licence).
4. Images
If records come in with a URL for an image, ALA can consume that. If that is not possible, image files can also be sent in, with a table that links the file name with the record identifier. ALA has been working on Morphbank; the new version of Morphbank has just been released.
Jim noted that it is important that we manage our image data with the same rigour that we curate our specimen data.
There was some discussion about how to obfuscate sensitive data on labels in digital images.
5. User feedback
Alison provided a summary of the feedback from AVH users. The User Voice system was disabled in September after we realised that users were not getting alerted when someone had responded to their feedback. Other feedback has come in through email@example.com, firstname.lastname@example.org and email@example.com.
Several bug fixes were reported (e.g. problems with taxon name queries, problems using AVH in IE8, missing fields), which were resolved quickly by the ALA team.
The main requests for improvements have been:
- Implementation of a bounding box query (which was available in AVH3)
- Requests to have the map tab first (FCIG are happy to make this change in OZCAM too)
- Better rainfall and temperature layers, so the values can be interpreted more easily.
There was some discussion about whether firstname.lastname@example.org is the best email address to use, or whether it should be email@example.com.
6. AVH Annotations
Dave gave a demonstration of how the annotation system works.
- The OZCAM community has provided some feedback on the issue types. ALA hasn’t received any feedback from the AVH community, but the list of issue types can be expanded if necessary.
- Implications of flagging a geospatial issue, incorrect habitat or suspected outlier: the record is not removed from the biocache, but is removed from the spatial portal.
- There is a facet missing; you should be able to facet by records with annotations; Dave will look into this.
- Contacts on the Collectory page are authorised to respond to annotations within AVH (i.e. to verify them); there can be multiple contacts for the one institution (collections manager, data manager etc.). Send details directly to Miles if you want extra contacts added (or if any other details need to be updated: collections.ala.org.au).
- HISCOM would like to ingest annotations from AVH and pass responses back to AVH with data delivery.
- Discussion of delivering HISPID verification level flag and displaying it on the Record detail page.
- It was pointed out that it would be good to deliver herbarium-based annotations to AVH.
- Discussion of how to prevent multiple annotations of the same type for the same specimen.
- The system keeps a copy of the annotated record, and will check newly uploaded versions of the record to see if the part of the record that the annotation relates to has been changed; the original annotation would not be removed, but would be flagged as possibly resolved (this facility is still being tested, but the capability is there).
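The re-checking behaviour described in the last point can be sketched roughly as follows. This is an illustration under stated assumptions, not the ALA code: the `Annotation` class, the status strings and the use of a plain dictionary for a record are all hypothetical, and `decimalLatitude` is used only as an example field name.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    # Hypothetical structure for illustration.
    field: str                 # the part of the record the annotation relates to
    value_when_annotated: str  # copy of that value kept at annotation time
    status: str = "open"


def recheck(annotation: Annotation, new_record: dict) -> Annotation:
    """Flag the annotation as possibly resolved if the annotated field changed.

    The annotation itself is never removed; only its status is updated.
    """
    current = new_record.get(annotation.field)
    if current != annotation.value_when_annotated:
        annotation.status = "possibly resolved"
    return annotation


note = Annotation(field="decimalLatitude", value_when_annotated="-35.28")
recheck(note, {"decimalLatitude": "-35.28"})   # unchanged, stays open
status_after_same = note.status
recheck(note, {"decimalLatitude": "-35.30"})   # changed, flagged
status_after_change = note.status
```

Comparing only the annotated field, rather than the whole record, keeps unrelated updates from marking an annotation as resolved.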
7. Sensitive data
We need to clarify the procedure for approving who gets access to sensitive data, and how they access it (blanket access, or a one-off data dump sent to them).
8. Data delivery to GBIF
If AVH data is delivered to GBIF, the default action is for the whole AVH data archive to be downloaded in one go. The sensitive data is masked in the archive. An AVH archive has already been created, but has not been exposed to GBIF yet.
9. Future work on AVH
ALA does not have as many resources to commit to AVH as it has had previously, so any proposals for development will need to be properly documented and prioritised so that the time needed and resources available can be assessed.
Enhancements will need to be discussed in detail with the ALA team so that they can be mapped out and planned, as resources are limited.
10. Collectory/Resources of Australasian Herbaria (RAH)
CHAH had decided that the ALA collectory pages would become the new Resources of Australasian Herbaria, and would form part of AVH, but it is unclear how the work towards this is progressing.
At the moment there is no relationship between the AVH static content and the Collectory. The Collectory pages provide useful statistics on the number of records delivered to AVH, and the number of records downloaded. It was agreed that institutional or departmental logos should link to the relevant institutional or departmental page, but that we should include a list of AVH data providers on the AVH data page, which will link to the Collectory pages.
It was agreed that it would be good to have a MAHC contact as well as a HISCOM contact (i.e. a collections manager and a data manager) on each institution’s Collectory page.
5. World Flora Online
The World Flora Online (WFO) Technical Working Group (TWG) has been established, and Terms of Reference have been written. Chuck Miller is the chair of the TWG; Greg Whitbread is a member. There have been discussions about the best way of developing the WFO, with some concern that the current proposal is not the best way forward. The Kew-based members of the TWG are coming to Canberra in early 2013 to discuss IPNI2 and whether it can form the basis of WFO. It is important that ABRS and other institutions who are working on online flora projects are kept in the loop.
6. National Species Lists
Greg provided an update on the National Species Lists project. Details about the different services are available at: biodiversity.org.au/.
Greg talked about taxonomic concepts. And names. At length.
In AVH, there are separate classifications for the names that are in APC and those that are not. The two classifications overlie each other in AVH.
7. MEL’s quality control tools
Alison demonstrated MEL's Fancy Quality Control Machine (FQCM) and GPI error checker, and discussed the enthusiastic uptake of the tools by data entry staff.
There was strong interest from other HISCOM members in this approach, and requests for the underlying queries used to detect data entry errors.
8. Report from FCIG (Alison Vaughan)
- FCIG had a discussion about the need for a transfer field for Expedition ID, so specimens in different collections that were collected on the same expedition (e.g. BushBlitz) can be effectively queried; who records expedition or collecting trip in their database?
- Agreed that this is important and useful (implications for collecting permits etc.)
Field names etc.
- The idea of making the vocabulary consistent across the OZCAM and AVH sites was discussed; FCIG agree that this is a good idea
- FCIG agree that it would be better to have the maps tab first in the results, instead of the list of matching records.
Biodiversity Heritage Library (Joe Coleman)
- Scanning takes approximately 4 hours per book (300 pp.), plus metadata processing (0.5-1 hr) and marking articles
- They decided it was not viable to set up mobile scanning stations (less efficient than centralised scanning; loss of volunteers and local knowledge)
- Sites with digital publications can upload them remotely (Alison has e-mailed Joe Coleman from MV requesting more information about how this works)
- A program called Macaw is used to clean up the scans (it makes PDFs look better, so reasonably poor quality PDFs can actually turn out okay)
9. AVH Trust Project
The AVH Trust is funding the writing of a proposal for the digitisation of herbarium specimens from Papua New Guinea. There is an additional $60,000 available from the AVH Trust. Two project ideas were discussed:
- Resolving issues delivering high-res images to ALA (there was concern that this is too closely linked to ALA for the AVH Trust to want to fund it)
- APNI/APC editor (there was support for this idea): $58,000 would fund completion of the APNI/APC editor and allow re-use of APNI/APC for the development of state floras
10. Wiki editing workshop
Niels gave a short workshop on how to edit the Wiki. Editing help is available at http://meta.wikimedia.org/wiki/Help:Editing.
11. HISPID
At the last HISCOM meeting, several required changes to HISPID were flagged. Since then, Niels and Alison have identified some additional areas where new concepts could be added or existing vocabularies need to be updated. Niels has added some notes to the HISPID 5 for HISPID Users page.
This is a good time to review HISPID, as Walter Berendsohn is planning to bring out a new version of ABCD. We should aim to have a list of additions and fixes for ABCD ready to be sent off to Berlin very soon after the HISCOM meeting.
1. Hybrid formulas
HISCOM wants to be able to deliver atomised hybrid formulas, as well as the full scientific name of the hybrid's parents. Although this won’t be used by ALA, it will be useful for direct harvesting from other BioCASe providers. For this to work, we need the hybrid fields to be added to the identification element of ABCD, and not just in the HISPID extension.
2. Phenology
Michelle raised the issue that the HISPID vocabulary for phenology is much smaller than what is implemented in AD, and that the vocabulary needs to be expanded to better describe non-flowering plants.
3. Expedition ID
At the recent FCIG meeting, there was a discussion about the need for a transfer field for Expedition ID, so specimens in different collections that were collected on the same expedition can be effectively queried. HISCOM agreed that this is an important and useful field to add to HISPID.
Note: This is already in HISPID – //element(*,Unit)/NamedCollectionsOrSurveys/NamedCollectionOrSurvey –. -NielsKlazenga 17:57, 21 December 2012 (EST)
4. Geocode source
This is a mixed concept and only partly corresponds with abcd:CoordinateMethod. There is no ABCD concept to deal with the first two items in the HISPID vocabulary ('collector' and 'compiler'), but there is one in Darwin Core: dwc:georeferencedBy. This element should be added to ABCD as well. A person’s name might be more useful than just 'collector' or 'compiler', but would probably have to be translated back in AVH for privacy reasons.
5. Plant occurrence and status
The posnat vocabulary contains some concepts that apply at the specimen level, and others that apply to presence of a taxon in an area.
6. Substrate
The current definition of substrate is ambiguous. It seems to refer more to the underlying geology of the collecting locality, than to what the specimen itself was growing on. At MEL, substrate has primarily been used to record microhabitat (it has mostly been used for cryptogams). MEL has recently added a separate field for geological substrate (i.e. the HISPID concept), and would like to be able to deliver both concepts.
7. Deaccession flag
HISCOM agreed that we need to include a deaccession field in HISPID to indicate when records need to be flagged as deaccessioned in AVH.
8. Deaccession reason
For the remaining occurrence record to be usefully interpreted by users, the reason for deaccessioning the specimen should be recorded.
12. Units and GUIDs
It was agreed that we need a working group comprising members of each major herbarium to come up with a strategy for implementing GUIDs.
13. ALA (Peter Doherty)
- A funding proposal has been submitted to the Department of Science
- Dave Martin's and Miles Nicholls' positions are secure through to June 2013, and will probably extend beyond that.
- Any major development work for AVH needs to be well scoped; development that also relates to the Museum community may be prioritised.
14. Morphbank (Peter Brenton)
- ALA partnered with US developers to re-develop Morphbank in order for it to meet modern standards (it has been minimally funded since the late 1990s)
- Australian node of Morphbank: content focus is Australian-based images; about 10,000 new images (large batches from ANIC, Queensland Museum and about to get images from Southern Cross University Herbarium)
- Morphbank is a repository and is not designed for bulk downloads, but you can download individual images
- Most images have CC-BY licence, but it does support other licences
- ANIC SATSCAN images are about 450 MB; they were put up online primarily to allow researchers to view collections prior to making loan requests
- The IIP zooming tool is used (it is similar to Zoomify, but open-source, not proprietary)
- The full-sized images are stored in a file server
- It is mostly being used as a mechanism for sharing images and not as a primary store for images (although a couple of institutions are using it for this). The main benefit of Morphbank is the zooming viewer.
- The beta version of Morphbank incorporated into ALA was released a couple of weeks ago
- Batch upload processes are available (requires admin access)
- Potential problem with duplication of records in Morphbank and ALA
- Records in Morphbank can only be edited manually, so the best option would be to only supply the minimum metadata required to link the image with the associated record in ALA.
- There is a minimum set of ten mandatory fields (TSN, Scientific name etc.)
- Multiple versions of metadata can be associated with the one image; each set of metadata has a separate URL
- URLs are permanent, so can be cited safely
- The process for subscribing is straightforward (gatekeeping could be devolved to CHAH and CHAFC)
- Currently a many-to-one relationship between images and specimens, but intention to make it many-to-many
- Tools to mark up images; marked-up images would become a sub-image of the original
- Work is continuing on the new version; ALA has run out of money for it, but the international partners are committed to it.
15. AVH Trust project proposal (Greg Whitbread)
Greg presented his AVH Trust funding proposal.
16. Report from TDWG (Paul Flemons)
The 2012 meeting was in Beijing; only 85 people attended (compared to 300+ at other meetings).
- Kevin Richards is leading a group to redevelop the TDWG website
- A separate conference management website will be implemented
- Conference planning: an effort is being made to plan TDWG meetings further in advance to make it easier for people to attend
- Main themes under discussion were:
  - genomic data standards
  - tissue data standards
17. Biodiversity Volunteer Portal (Paul Flemons)
Paul gave a demonstration of the Biodiversity Volunteer Portal (BVP). The BVP is based around the idea of virtual expeditions.
There are two steps in the process:
Validators are either in-house staff, or are skilled volunteer transcribers who are invited to become validators. Volunteers are good at (and comfortable with) transcription, but are not always comfortable with georeferencing.
Data entry templates can be customised for different expeditions. Talk to Paul if you’re interested. A customised template would cost $5,000 – $10,000 to develop, or you can use one of the existing templates. To get going, you need:
- images (images must have a URL that the BVP can grab them from)
- upload files
- a tutorial for users.
The upload file needs to contain the image filename, its URL, institution and registration/accession number (species name is optional).
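As a rough illustration of the upload file requirements, the sketch below checks a CSV file before it is sent to the BVP. The file format (CSV) and the column names are assumptions made for this example; the minutes only specify which pieces of information are needed, so the actual layout should be confirmed with Paul.

```python
import csv
import io

# Hypothetical column names; the BVP's real upload format may differ.
REQUIRED = ["image_filename", "image_url", "institution", "registration_number"]
OPTIONAL = ["species_name"]


def check_upload(text):
    """Return a list of (row_number, problem) tuples for a CSV upload file."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return [(1, "missing column(s): " + ", ".join(missing))]
    problems = []
    # Data rows start on line 2, after the header row.
    for row_number, row in enumerate(reader, start=2):
        for col in REQUIRED:
            if not (row.get(col) or "").strip():
                problems.append((row_number, "empty " + col))
    return problems


sample = (
    "image_filename,image_url,institution,registration_number,species_name\n"
    "img1.jpg,http://example.org/img1.jpg,MEL,MEL 123,\n"
    "img2.jpg,,MEL,MEL 124,Acacia\n"
)
problems = check_upload(sample)
print(problems)
```

The species name column is deliberately not checked, since it is optional.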
Records are included in the Atlas as drafts. Once the record is ingested into the home institution’s database and delivered to ALA, the draft BVP record is removed.
The BVP tries not to have too many transcription projects running at the same time, so as not to overwhelm volunteers or spread the effort too thin. Currently there are usually between two and ten projects running, and six pending (and more in the pipeline). Paul is keen to get the herbarium community involved, so we can jump the queue if we have a botanical project to put up.
Paul’s estimate of the efficiency of the process is that you get the equivalent of three people’s work for the cost of one (but this is based on insects, which are very time-consuming to unpin and photograph; herbarium specimens will be much more efficient to image).
It would be possible for us to run a test project to gauge efficiency and test workflows, and then remove the dataset from ALA.