Joint HISCOM/MAHC/FCIG 2017 Melbourne

= WEDNESDAY 18th October 2017 Joint HISCOM / FCIG / MAHC Meeting =
Location: Melbourne Museum. This is the first joint meeting of the three groups where we have had a day together.

Attendees

 * HISCOM - Wayne Cherry (NSW), Ainsley Calladine (AD), Nick dos Remedios (ALA), Michael Hope (ALA), Anne Fuchs (ANBG), Greg Clarke (ANBG), Niels Klazenga (MEL), Nimal Karunajeewa (MEL), Donna Lewis (DNA, chair)


 * MAHC - Shelly James (NSW), Caroline Ricci (AD), Jo Palmer (CANB), Joanne Birch (MELU), Melissa Makin (MELU), Karina Knight (PERTH), Lyn Cave (HO), Sarah Hirst (DNA), Karen Marais (MQU), Frank Zich (CNS, chair), Antony Kusabs (WELT), Ines Schonberger (CHR), Gill Brown (BRI, minutes), Helen Vonow (AD), Pina Milne (MEL), Kaylene Bransgrove (BRIP)


 * FCIG - Nicole Fisher (ANIC, chair), Margaret Cawsey (Wildlife collections), Keith McGuire (SA), Clifford Davy (TMAG), Ursula Smith (Museums Victoria), Paul Flemons (Aus Museum)

Apologies

 * MAHC -
 * HISCOM - Aaron Wilton (CHR), Laurence Paine (HO), Eleanor Crichton (AD), Simon Checksfield (CSIRO), Ben Richardson (PERTH)
 * FCIG -

General Discussion
With the delivery of the revised HISPID standard and the Australasian Virtual Herbarium, the next set of issues/drivers/focus for HISCOM needs to come from the herbarium community: what are the technical blockages between institutions, and what technical issues need to be addressed to deliver the best outcome for herbaria?

Recognised that communication flow needs to be improved across the board. Noted that it would be good to have a go-to person on the CHAH executive, though not necessarily attending each meeting.

Managers of Australasian Herbarium Collections (Frank Zich - MAHC Chair)
Subcommittee of CHAH, made up of collection managers from Australian and New Zealand herbaria. First established in 2009 with the purpose of exchanging information about collections and developing guidelines and common standards that help us aspire to good technical practice in our collections. Our group is great for information sharing and resources. We have a primary list of contacts for all herbaria in Australia and New Zealand, lists of plant taxa susceptible to insects/pests, and an email list for all things collections.

We have 2 teleconferences and one face to face (AGM) meeting per year. We always meet at someone's institution and do a tour to share best practice and see how others do things. We have worked on:
 * Quarantine/Biosecurity documentation
 * Loans documentation - conditions and agreements
 * Guidelines for destructive sampling and vouchering (particularly for genetics)
 * National standards for Australasian Herbaria (DRAFT, with CHAH for c. 18 months) - best practice guidelines
 * TYPE Imaging procedure (for GPI project)

MAHC will continue to work on standards, guidelines, disaster planning, Nagoya Protocol and other collections matters.

Herbarium Information Systems Committee (Donna Lewis - HISCOM Chair)
Established in 1995 to assist with developing the Australian Virtual Herbarium (AVH) and to provide technical support and ongoing maintenance. In June 2016 the AVH became the Australasian Virtual Herbarium, with data from Australian and New Zealand institutions. HISCOM have worked on a number of projects including the AVH, which now has 26 data providers including state, territory and university herbaria in Australia and New Zealand.

In the 2014–15 financial year HISCOM committed to undertake a review of HISPID (Herbarium Information Standards and Protocols for Interchange of Data). HISPID has been the standard used in Australian herbaria (at least) since the first version was published in 1989. The last major update of HISPID took place in 2007, when the transfer format was changed from one based on ASN.1 to one based on XML using ABCD’s extension schema capability. The definitions in both ABCD and Darwin Core tend to be rather general, and we still need HISPID to provide definitions and vocabularies that are more specialised towards herbarium data (while still complying with the broader definitions in Darwin Core). The most recent version, HISPID 6.0, was published in 2016 and is available on GitHub at http://hiscom.github.io/hispid/terms/.

HISCOM also facilitated setting up BioCASe providers (12 in total) for Australian state/territory and university herbaria, and assisted institutions with alternative methods for data delivery to the AVH, including BRI, which delivers DwCA directly to ALA, and the NZ herbaria, which deliver via CHR.

HISCOM Positions
 * Chair - Donna Lewis current (Ben Richardson, just elected as next chair) - 3 year term
 * Assistant Chair - Anne Fuchs (just elected) - 3 year term
 * Technical Co-ordinator - Niels Klazenga
 * AVH user liaison group - Niels Klazenga & Nick dos Remedios (technical enquiries), Nimal & Eleanor (data related enquiries), Aaron Wilton (NZ) but proposed additional NZ reps

We have 2-3 teleconferences and one face to face meeting (AGM) per year. A local CHAH member will be invited to the AGM, and a CHAH rep will also be invited to teleconferences. In the past we have also had separate meetings for working groups (e.g. HISPID working group, 4-5 members, etc.) if needed. Meetings are funded by each institution, but there is a limited amount of discretionary CHAH funding that can support single people attending face to face meetings (for both MAHC and HISCOM).

Faunal Collections Information Group (Nicole Fisher - FCIG Chair)
FCIG is a small and productive group that is similar to HISCOM but only Australian. It includes an ALA representative, as HISCOM does. There is no CHAFC member sitting on FCIG, but the CHAFC chair gets copied in on minutes and agendas of meetings. FCIG has worked on big projects in the past, including OzCam. Sometimes we have working groups, e.g. the tissue working group. The group started with OzCam (2003-4), but FCIG started properly in 2007/9? to broaden out to cover data issues across the museum sector. There is no equivalent of MAHC in faunal collections. There is a group called CHAEC (Council Heads of Entomological Collections) but it is not currently active. This group does have a collections group and has a representative at CHAFC meetings.

Actions from 2016 MAHC/HISCOM AGM
HISCOM and MAHC went through the open and closed action items from the MAHC/HISCOM Joint Meeting in Brisbane 2016 and updated or closed the actions. Actions can be found at https://github.com/hiscom/business/issues. The joint actions are labelled MAHC and the year of the AGM at which the action was created.

ALA Collections Community Coordinators
Neither community currently has a coordinator, so both positions are vacant at the moment. Feedback to Michael is that the role really relies on the personality of the person in it. John (ALA) is happy to continue to support the roles, but if we decide to continue with CCCs we may need to redefine the role; the PD does not need to be the same for both communities. The term of the roles is likely to be 18 months, as ALA can currently write contracts until the end of June 2019. FCIG decided the role worked best when the CCC was a member of FCIG and knew what the issues were.

Discussion around the way the position should be filled: as 2 half-time positions or 1 full-time position. The preferred way for both groups was to continue with 2 half-time positions. Agreed that the role needs to be a communication role (promotion, marketing), which are not natural strengths for many researchers and scientists. The person needs to understand science and be able to communicate it to developers, or to Michael who can translate to developers.

ALA Update (Michael Hope)
Things ALA have worked on and are currently working on. ALA has spent a lot more time looking inward over the past 12 months, because most of the systems were the originals that were first set up and needed updates before ALA could move on. ALA has 2 years' worth of funding (9M over the next two years). Now at 72 million records, with a steady rise of around 9-10 million new records each year. Biological collections numbers are around 6.2 million for plants and 5 million for animals. There is now no limit on how many records you can download at one time.

NCRIS facility
 * ALA/AEKOS data integration complete. After deduplication, only around 45,000 new records.
 * All NCRIS biodiversity facilities are now in talks to start collaborating more (e.g. TERN, ZoaTrack/Nectar/BCCVL/ANDS/RDS)
 * ALA/NCRIS Symposium in May 2017. Good feedback from NCRIS community.

ALA support is going to move to the Nectar user support service, so it will be a ticketed system rather than an email group, which should be more efficient in dealing with calls. ALA is starting to look at impact analysis, not just use of data.

Infrastructure update
 * System upgrades
 * Clustered processing server - more servers can be added to the cluster if performance issues are hit, improving scalability and the performance of re-indexing all ALA records
 * Load-balanced biocache servers (now 4) to improve stability and performance
 * Offline download server - now sends you an email with your data; record limit removed.

Updated Names Infrastructure - Putting together the name index to identify all records. ALA takes the NSL and adds to it to cover exotic and foreign species. Each iteration improves the merging of sources and the matching. The manual process and checks take 2-3 months, so a new name in APC won't immediately be reflected in ALA. New sources of names added:
 * NZOR - done
 * WoRMS - underway
 * Genomic (GreenGenes, SILVA,?) - still under investigation as adding more bacterial/eDNA type data

If you go to names on a species page, the source of the name is identified and it will now give you a list of all sources of that name.

Hubs
 * Updates to ALA - new look and skin. More changes coming to the interface of ALA. Watch this space.
 * Updates to AVH - NZ data incorporated.
 * OZCAM - up to date
 * APPD (Australian Plant Pest Database) - restricted access for biosecurity. Requires updates.

Biocollect - 530+ community projects, up from 400 last year. Large presence in the citizen science base in Australia. Originally going to be a community project finder, but it has grown and is now also used for the ALA "record a sighting" function (not a Biocollect project).

Custom portals - a lot of agencies are coming to ALA to have custom portals written (e.g. Govt agencies, NRM, NGOs).

Profiles
 * eFlora now released https://profiles.ala.org.au/opus/foa or www.ausflora.org.au
 * User Interface overhaul and tidy up
 * Streamlined searching
 * Masterlist - works at a collection administration level to help manage the collection within the flora. You can now say "my flora contains these taxonomic groups", create a master list and apply the flora to that list, and search and browse functions for that list will be created automatically. A stub profile will be created for taxa that don't already have one, with automated features (distribution map, etc.) filled in.
 * Filter functions - similar functionality to masterlist but at a user level; you can apply a filter to a flora ("I am only interested in these species") to limit your view of the flora to that species list

Future options
 * IEK
 * State Flora
 * Traits?
 * Field guides?

Genomics Data - for phylogenetic tools we have Phylolink, but ALA is starting to wonder whether it should build these domain-specific tools itself, or focus on aggregation and delivery of data and rely on Nectar to produce the tools (e.g. BCCVL), or on someone else who can build them.

ZoaTrack - coming along. A tool that stores and analyses telemetry data and lets researchers track their telemetry data. Only around 3-4 tools across the world do this. Peggy (Museums Victoria), who manages this, is highly involved in an international telemetry standards group on sharing data, so that tools can begin to access each other's data. Trying to bring some of this summary data into the ALA.

Annotations and alerts - Nick did a major release and fixed this around a month ago. The email alert now has a table, and you can click on a link that will take you straight to the record that has been annotated. All annotations now have an unverified, verified or refuted status. If you want to check your institution's annotations, the CCCs put together a document and sent it around in 2016. Go to your collections page on the ALA, select the red number of records at the top, then in the facets on the side go to "Assertions" and select "Unconfirmed" under the "Has user assertion" facet. To edit/verify annotations you need to be listed as a contact with edit privileges on the collection page. Your ALA login needs to match (case sensitive) your email on the collection page.

National Species List and Australian Faunal Directory (Anne Fuchs)
APNI Community editing:

CHAH requested around 12 months ago that it be a community exercise, so Kirsten Cowley and Anna Munro, who have edited APNI for years, ran two 3-day training sessions for one staff member from each state/territory institution. This training enabled staff to:
 * Become familiar with data requirements for consistency and completeness
 * Each institution take on responsibility for institution journals, state census or allocated publications
 * Anna and Kirsten are still available as support

A number of improvements have been made to the name editor after more people started using it. Every institution has been trained and all but one has been doing edits since the training.

NSL search:

Feedback from the community was that the advanced search page output formatting was not what was expected and not always complete or intuitive. Prototype last year: on track with content format, search options partially met. The NSL team aimed to:
 * Format the output for community expectations
 * Balance ease of search with richness
 * Scalable to additional shards/disciplines (e.g. lichens)

Names search (this is the next iteration, and the NSL team are continuing to develop it). Work is under way to ensure it can be presented on any type of device. You can now choose a number of options including what sort of name you want, advanced search by author, etc.

The NSL team would like a small group of people who are interested and available to contribute. Please contact Anne if you are interested. A name check facility exists where you can paste in a list of names and check all of them against the APC at one time to find out if all those names exist, are accepted names, etc.

Margaret asked what the timeline on an NSL for animals was. Anne said they are working with Anthony Whalen (ABRS) on status of AFD. But there needs to be data work done so they can get the data out of AFD and into the NSL.

DAWR will be teaming with Anne's group to have a version of the NSL for their organisms. They have recruited a developer who will be meeting with Anne and Greg very soon.

Images
As an outcome of the ALA Collections Community Workshop in Canberra, February 2016, the CCCs produced a document on biological collections image delivery to ALA, including screenshots. This was circulated to FCIG, but HISCOM and MAHC do not remember seeing it. Other documents have been produced without knowledge of the first, so work is being repeated.

The ALA has no minimum metadata requirements but recommends creator and licence.

Ursula and Michael discussed that non-vouchered (orphaned) images can be put onto Flickr and machine tagged, and then pulled into ALA. Pioneered by BHL.

The image tab of species records in ALA for common animals gets filled up with poor images. The ALA has a preferred species list and will be putting it out to the community to provide preferred images. In the meantime you can click on images and vote them up or down (you must be logged in to vote). You can also select the star on the image if you have administration rights; the CCCs did this recently.

We can send images to ALA for specific taxa that are not related to voucher specimens if we want. How will we manage taxon names on non-vouchered images? OK if it is a straightforward name change, but more difficult if not.

Asked whether the URL needs to be persistent if delivering images this way (via URL). Nick responded that it will work with a non-persistent URL, but the link back to the image will then be broken, so this is not preferred. Images can also be dropped onto the FTP upload server to deliver them to ALA. The image viewing plugin can handle high resolution images and will only deliver what the user can see. ALA has no size requirement and can handle high resolution; the size restrictions are around delivery from institution to ALA. ANIC provided 900MB images of drawers of insects, in which you can see the whole drawer and then zoom down to single insects.

http://specimens.ala.org.au is a page, "Images of specimens from Australia's Natural History Collections", that aggregates all images from each institution.

AF: MAHC would like a consolidated reference document on supplying images.

Nagoya
Working group - Gill and Pina from MAHC have formed a working group to advance Nagoya compliance within the collections community. Margaret and Shelly have expressed interest in joining to further Nagoya compliance in our collections. Anyone else who is interested, please speak to Gill and/or Pina. AF: could we put a link to Gill's presentation here on the work being done by QLD as it seemed like a good model for considering all the aspects?

Data - HISPID/data exchange standards use fields from GGBN; these do not include PIC but do include permit. We still need to store restrictions around use and nobody has yet sorted this out. Could probably categorise the types. Only really need:
 * Permit number (start/expiry date)
 * Who permit is issued to
 * Conditions
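The three items above could be captured in a very small record structure. The following is only a sketch to make the discussion concrete: the class name, field names and example values are all made up for illustration and are not taken from HISPID, GGBN or any institution's CMS.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class CollectingPermit:
    """Hypothetical minimal permit record; field names are illustrative only."""
    permit_number: str
    issued_to: str
    start_date: Optional[date] = None
    expiry_date: Optional[date] = None
    conditions: List[str] = field(default_factory=list)

    def is_valid_on(self, day: date) -> bool:
        """True if `day` falls inside the permit period (a missing date means open-ended)."""
        if self.start_date and day < self.start_date:
            return False
        if self.expiry_date and day > self.expiry_date:
            return False
        return True


# Made-up example values, not a real permit.
permit = CollectingPermit(
    permit_number="EXAMPLE-0001",
    issued_to="Example Herbarium",
    start_date=date(2017, 1, 1),
    expiry_date=date(2017, 12, 31),
    conditions=["No commercial use"],
)
```

Restrictions around use would sit in `conditions` here as free text; categorising the types, as suggested above, would mean replacing that free text with a controlled list.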

The Australian National Wildlife Collection (ANWC) will be scanning permits and storing them in its new CMS.

Duplicates
Recognition that 'duplicates' covers two distinct scenarios: 1) specimens which have been sent to different institutions and go on to have their own 'life', e.g. different determinations etc.; 2) the same data which has been provided to multiple datasets, each of which then sends it to the ALA. The two scenarios have similar but slightly different requirements in terms of identifying duplicates.

Data versus Specimens - The issue came up at TMAG, where duplicates were found when records were entered for collection events (e.g. the Investigator expedition): records end up being uploaded while unregistered and then uploaded again when they get registered in a particular TMAG collection. Observations and collection items will be flagged as different types (observation and preserved specimen) but they won't be noted as duplicates.

The Investigator expedition with the Marine National Facility (part of CSIRO) requires that data be published within 12 months of the trip, but it is unclear whether it was to be published on ALA or just in a spreadsheet or PDF. This highlights the need to have these discussions before people go on these trips. It needs to be clear in the contracts, or up front, so that everyone knows what will happen to the data and how it will be managed.

We need a good and consistent way to link observation data and lodged specimen data. Margaret asked if this should be like BushBlitz, e.g. giving a collecting trip a name. Ursula says we are talking about this amongst ourselves but need to talk about it more with the people collecting the material.

This is a good case for ALA to use in developing their duplicate detection tools, but other fields such as field number may also be useful for saying that two things are the same.
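As a rough illustration of how a field number could tie an observation record to a lodged specimen record, here is a minimal sketch. It assumes records are plain dictionaries keyed by the Darwin Core terms `recordedBy`, `fieldNumber` and `basisOfRecord`; the function name and the sample records are invented, and this is not how ALA's duplicate detection works.

```python
from collections import defaultdict


def link_by_field_number(records):
    """Group records sharing collector and field number; keep groups that mix record types."""
    groups = defaultdict(list)
    for rec in records:
        collector = (rec.get("recordedBy") or "").strip().lower()
        field_no = (rec.get("fieldNumber") or "").strip().lower()
        if collector and field_no:  # skip records missing either value
            groups[(collector, field_no)].append(rec)
    # Only groups with more than one basisOfRecord are candidate observation/specimen links.
    return {
        key: members
        for key, members in groups.items()
        if len({m.get("basisOfRecord") for m in members}) > 1
    }


# Invented sample data for illustration only.
records = [
    {"id": "obs-1", "recordedBy": "A. Collector", "fieldNumber": "AC 101",
     "basisOfRecord": "HumanObservation"},
    {"id": "spec-9", "recordedBy": "A. Collector", "fieldNumber": "AC 101",
     "basisOfRecord": "PreservedSpecimen"},
    {"id": "spec-10", "recordedBy": "A. Collector", "fieldNumber": "AC 102",
     "basisOfRecord": "PreservedSpecimen"},
]
linked = link_by_field_number(records)
```

The output is a set of candidate links only; someone would still need to confirm that the grouped records really describe the same thing.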

The BushBlitz expedition name should be assigned before the trip, but it can also be found on the website under the Resources tab. To report on BushBlitz records, we need to deliver survey expedition names to ALA. The list is still on the BushBlitz website: under the Resources tab, at the bottom of the list, is “Bush Blitz Point data name formats”.

Wildcard characters don't currently work in the URL query or search fields for ALA, but Michael will wrangle a developer to fix it.

The duplicate detection that Ryonen (CCC) was working on last year, and the processing ALA does around duplicate detection, is about finding multiple records that point to the same record (data duplicate), not a duplicate specimen in a separate institution (specimen duplicate for herbaria, similar to the FCIG tissue problem). There is not a lot the ALA can do in terms of automatic detection, but it can make it easier for us to detect or flag potential duplicates, because at the end of the day a human will still need to decide what is the duplicate. If donor and original accession number were supplied, ALA could start to put some smarts around joining the specimens up and finding out when identifications or geocodes change.
 * Different definitions of duplicate
 * Very complex
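The donor and original accession number idea can be sketched as follows. This is an assumption-laden illustration, not an ALA feature: the record keys (`institution`, `accession`, `donor`, `original_accession`, `scientific_name`), the function names and the sample values are all hypothetical. A record with a donor is keyed on the donor's accession, so it lands in the same group as the original sheet.

```python
from collections import defaultdict


def specimen_key(rec):
    """Key a record on its origin: donor accession if present, else its own accession."""
    if rec.get("donor") and rec.get("original_accession"):
        return (rec["donor"], rec["original_accession"])
    return (rec["institution"], rec["accession"])


def flag_specimen_duplicates(records):
    """Join up specimen duplicates and mark groups whose identifications differ."""
    groups = defaultdict(list)
    for rec in records:
        groups[specimen_key(rec)].append(rec)
    flagged = []
    for origin, members in groups.items():
        if len(members) < 2:
            continue  # no duplicate known for this specimen
        names = sorted({m["scientific_name"] for m in members})
        flagged.append({
            "origin": origin,
            "records": [(m["institution"], m["accession"]) for m in members],
            "names": names,
            "needs_review": len(names) > 1,  # a human decides what to do
        })
    return flagged


# Invented sample data; the taxon names are placeholders.
records = [
    {"institution": "MEL", "accession": "100", "scientific_name": "Genus alpha"},
    {"institution": "NSW", "accession": "555", "donor": "MEL",
     "original_accession": "100", "scientific_name": "Genus beta"},
    {"institution": "AD", "accession": "777", "scientific_name": "Genus gamma"},
]
flagged = flag_specimen_duplicates(records)
```

The same grouping could compare geocodes instead of names; in either case the result is a flag for review rather than an automatic decision.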

ALA treats everything as an observation, even if its basisOfRecord is preserved specimen or anything other than observation. ALA needs to look at how this is treated going forward, because this won't work in the future with genomic data coming online.

Document storage and data sharing between groups
Options (note: Google Docs is not an option because it is blocked by some institutions):
 * Basecamp (ALA has a licence)
 * Others?

CCCs could co-ordinate this and it may help them to get their head around what our major issues are. In the first instance, the Chairs (Frank, Nicole, Donna) are to scope what the groups need exactly, then look at the options - this was also supported by CHAH at the combined HISCOM/CHAH meeting.

2018 SPNHC & TDWG Conferences and Committee Meetings
HISCOM is looking to meet the day before TDWG/SPNHC, and MAHC will try to co-ordinate. This is a good opportunity for NZ herbaria who can't normally make MAHC/HISCOM meetings to attend a face to face meeting.