HISCOM/CHAH Workshop: Towards a common approach to Australasian Electronic Floras

Venue
Noel Lothian Lecture Theatre, Botanic Gardens of Adelaide, Hackney Road, Adelaide.

Site map. The Lecture Theatre is at location 17, at the rear of the Plant Biodiversity Centre building, which houses the State Herbarium, facing the Bicentennial Conservatory.

Access is through the Gardens from the city, or through the Hackney Road entrance to the Gardens between the Plant Biodiversity Centre and the Goodman Building to its immediate south (the three-storey red-brick Gardens administration building).

Taxi
Phone 131008. The Plant Biodiversity Centre is location number 4860.

Pick-up point on footpath outside garden fronting the PBC building, facing entry point on Hackney Road.

Date
Monday to Tuesday, 3-4 December 2007. Arrive at 8.30 am for a 9 am start.

Scope
This specification for Electronic Floras covers the data representation of traditional floras, monographs and revisions when managed and viewed as digital media, either online or on CD-ROM or DVD. This includes all organisms traditionally associated with botany - i.e. plants, bryophytes, fungi, algae etc. It is built on the standards of TDWG and deals with:


 * introductory essays and background material
 * descriptive taxon profiles
 * taxon fact sheets
 * identification keys
 * taxon names and synonyms
 * descriptions
 * associated text information
 * associated image information
 * references and bibliography
 * links to external data sources, including:
   * checklists
   * censuses
   * census-linked data and nomenclators
   * specimen data (maps)
   * images
   * analytical and visualization tools
 * versioned textual input from authors and editors

The purpose of the specification is to enable consistent and interoperable data management across Flora projects, the sharing of data, and the ability to undertake research and create new Flora products from multiple data sources.

Out of scope:

Details of multiple-entry atomized matrix keys (DELTA, LUCID type) are out of scope as they are the subject of specific application development. These identification tools often contain taxon profile or fact sheet information which may be referenced by electronic floras.

Anticipated attendees

 * Helen Thompson	- ABRS
 * Dale Dixon	- DNA
 * Greg Whitbread	- CANB
 * Jim Croft	- CANB
 * Marco Duretto	- Tasmanian Herbarium (HO)
 * Laurence Paine	- Department of Tourism, Arts and the Environment, Tasmania (HO)
 * Karen Wilson	- NSW
 * Brett Summerell (CHAH Chair)	- NSW
 * Kevin Thiele	- PERTH
 * Peter Neish	- MEL
 * Neville Walsh	- MEL
 * Ailsa Holland	- BRI
 * Murray Henwood	- SYD
 * Ilse Breitwieser	- CHR
 * Jerry Cooper	- CHR
 * Belinda Pellow	- Wollongong University
 * Bill Barker (Convener)	- State Herbarium of South Australia (AD)
 * Juergen Kellermann	- State Herbarium of South Australia (AD)
 * Hellmut Toelken	- State Herbarium of South Australia (AD)
 * Robyn Barker	- State Herbarium of South Australia (AD)
 * Fred Gurgel 	- State Herbarium of South Australia (AD)
 * Peter Thornton	- SA Department for Environment & Heritage (AD)
 * Rex Croft	- SA Department for Environment & Heritage (AD)
 * Paul Coddington	- South Australian Partnership for Advanced Computing /University of Adelaide

Agenda
Adjustments welcome!

PREMEETING - YOU ARE ENCOURAGED TO DEVELOP CONTENT UNDER THESE HEADINGS

As part of developing the agenda, it would be very helpful if we could start work on the framework of the standard itself. See Australasian eFloras versus Flora of Australia schema

Welcome
Meeting opened at 9:15am with a welcome from Bill Barker.

Brett Summerell was invited to chair the meeting.

Laurence Paine was invited to record the minutes.

Clarify Workshop Agenda, Objectives
Bill Barker provided background on the origins of the workshop, indicating that electronic floras have been a topic of interest at the previous two HISCOM meetings (Hobart and Darwin). Bill provided an overview of the collaborative achievements of the CHAH community (HISPID, AVH) and indicated that the goal of the workshop should be to develop a common approach to the exchange of data to assist with the collaborative development of electronic floras.

The chair invited further comments on the agenda and objectives of the workshop, and the attendees agreed to enhance and alter the agenda as the morning progressed.

[[Media:Efloras_wrkshp_introduction_01.pps|Introduction Presentation]] (Bill Barker spoke to this)

Current Flora projects
Electronic Flora products on-line, on disk or in production (proforma handout filled out before meeting - brief presentation at meeting)

 * FloraBase - the Western Australian Flora (Kevin Thiele)

FloraBase is not a flora (it has no capacity for identification); however, it does have an advanced search function on a number of attributes, which assists end-users in determining the likely species they are inquiring about.

Goal: Harvest descriptive information from other sources to supplement the information provided. Provide a search function on upper-level keys.

Issues: Currency of information

[[Media:E-flora_meeting_Adelaide_Dec_2007.pps|Presentation]]
 * Flora of Australia Online (Helen Thompson)

Flora of Australia deals with data, not specimen collection information (apart from specimen citations), which has its own unique challenges for the management of the currency of information. Aiming for integration with APNI/APC for nomenclature/taxonomy. LSIDs are considered a key part of the future and are planned for use with the Australian Faunal Directory (AFD) before the Flora of Australia.

Issues: IP and copyright (The consensus of the group was that acknowledgment of the author/contributor should be sufficient to resolve the majority of issues, however there would be a significant overhead if the author needed to provide permission every time it was used by a different state.)


 * The Flora of the Sydney Region (Murray Henwood)

(Presentation is too large to store in the wiki)

Flora of the Sydney Region is specifically designed as a portable publication for the NSW central coast botanical region (supports teaching endeavors)

Keys are deliberately directed to supporting the delivery of field based teaching, with species descriptions embedded within the key pairs.

Goal: Revised edition which will be available by print-on-demand in hard format

Issues: Composition of keys (order in particular) is an issue for consideration

Kevin Thiele asked why the Flora of the Sydney Region was not a subset of the NSW Flora. Belinda Pellow indicated that it could be a subset of the NSW Flora; however, the teaching aspect needed consideration. Karen Wilson concurred that it would be feasible, but said that to date it had never been a priority.


 * PlantNet FloraOnline - Flora of New South Wales (Karen Wilson)

[[Media:AD_NSW_FL.pps|Presentation (Click to download)]]

Goal: Supplement the hard-copy version.

Issue: Synchronisation of information between specimens and flora information.
 * Electronic Flora of South Australia (Bill Barker)

[[Media:EFloraSA_2007_efloras_wrkshp_01.pps|Presentation (Click to download)]]

eFlora is a multipart publication and single volume field guide.

Issues: APC compatibility (misapplication of names especially), maintaining the currency of census data and specimens, capacity to access information via synonyms.


 * Flora of New Zealand (Jerry Cooper)

[[Media:JerryCooperLandcare.pps|Presentation (Click to download)]]

NZ has developed a number of front-end tools to assist with the maintenance of flora information (.NET/SQL Server technology focussed). Australasian integration is seen as a logical outcome.

Jim Croft made the observation that the logical model of the NZ solution was remarkably similar to the Atlas of Australia model.

Other States

Victoria: No current eFlora solution, some data available as by-products of other projects. Algae and Fungi would be easy to implement.

Northern Territory: An online electronic flora solution will be available in 2008 (the aim is mid-2008).

Tasmania: Tasmania will have an online electronic flora solution in 2008. It will begin with simple accounts being made available as PDF in early 2008, and will develop over the year.

ANBG: Work on an electronic flora has stalled, but the project is still active. This workshop will address some of the standards to adopt.

Umbrella or Related Projects
The following projects have elements that will, at least in part, intersect with Electronic Floras, and will have applicable standards and conventions:


 * Atlas of Living Australia
 * Australia's Virtual Herbarium
 * Global Biodiversity Information Facility
 * Encyclopedia of Life

 * Atlas of Living Australia

Kevin Thiele provided an update on the current status of ALA. A business plan for 2007/08 has been developed, which includes:
 * Tools available assessment (a gap analysis of the tools available to plug into ALA)
 * Data available assessment (names, descriptive, DNA, multimedia) - a gap analysis of the stores and maturity of the available data
 * IdentifyLife collaboration (web-based repository for atomized descriptive data to assist with the identification of organisms)

Phoenix - a commercial tool to assist with the management of dichotomous keys. It uses both textual and graphical prompts to assist with the process of identification. Phoenix is a desktop based solution which has a web publication process available. Greg Whitbread indicated that the ANBG had successfully built a server side solution to build dynamic keys (implemented Phoenix using Flora of Australia coded key). Jim Croft raised an issue with using a commercial solution as part of the delivery to ALA/GBIF (IP problem).


 * Australia's Virtual Herbarium

Brett Summerell provided an update on the AVH Trust Board. Current highlights are:
 * CHAH to be more involved with the management of the AVH Trust
 * Changes to the Board due to resignations of incumbents
 * It was always envisaged that AVH would encompass eFloras
 * AVH is the plant community contribution to the ALA (through CHAH)

Bill Barker raised the issue of dynamic data provision to AVH by some institutions (HO, DNA, BRI, and others), and the status of the funding request made to ALA. Greg Whitbread indicated that he had prepared and submitted the proposal. Kevin Thiele indicated that he was not aware of the ALA management committee reviewing the proposal. Helen Thompson indicated that ABRS submitted a number of proposals (regarding species lists) as part of the initial submission to DEST for ALA project funding. None of these proposals ended up as part of the ALA, but the development, completion and use of such species lists could be of use to electronic floras.

Jim Croft raised the issue of formalizing the scope of the information which fits under the AVH product. Bill Barker concurred that the AVH should be the brand by which all Australasian Flora is known.

ACTION: Finalize the scope of information which will be delivered through the AVH product, and submit to CHAH for ratification. (Bill Barker)

NZ Landcare Research indicated that it had in principle agreement to provide specimen records to AVH.

 * Global Biodiversity Information Facility

Jim Croft made the following observations regarding GBIF:
 * Descriptive data was not part of the original scope of information capture, but is now becoming the focus of standards development by TDWG.
 * Changes to executive staff at the GBIF secretariat have provided some exciting opportunities to reaffirm the primary direction and mission of GBIF.

 * Encyclopedia of Life

Jim Croft observed that the Encyclopedia of Life fact sheets may overlap with specimen descriptions and electronic floras, and that this was a risk that needed to be considered.

EOL potentially has the same issues as ALA - no identified funding for content, and the EOL/ALA partnership needs some work. EOL has invited ALA to be a member of the project, and join the EOL Board (ALA to look into a suitable MOU or other agreement with EOL).

Electronic Floras will need to have the capacity to feed into each of these.

Greg Whitbread offered the view that all these projects are complementary to each other, but that developing electronic floras will involve some competing activities.

Jim Croft discussed the overlap of entities that provide names lists (Species2000, ITIS etc), and the need to clarify which entities provide nomenclature, concepts and checklists authoritatively.

Applicable standards
The following TDWG standards apply:

 * Species Profile Model (SPM) (Greg Whitbread)

The SPM standard is still under development. It is a high-level schema for species profile information.

 * Structured Descriptive Data (SDD) (Kevin Thiele)

SDD is an XML schema for representing descriptive paragraphs and dichotomous key data at both the atomised and "blob" (textual) level, with markup in between.

 * Taxonomic Concept Schema (TCS) (Jerry Cooper)

TCS represents taxonomic concept information, which makes it feasible to exchange species checklists, distribution and identification data.


 * Access to Biological Collections Data (ABCD)  (Peter Neish)
 * Herbarium Information Standards and Protocols for Interchange of Data (HISPID)   (Peter Neish)

Schemas to support the interchange of specimen and observational data.

And activities:
 * Taxonomic Literature Interest Group

Greg Whitbread made the observation that all of these standards will have an impact on individual components of any electronic flora project.

TDWG 2007 reports
These should provide some of the current global context for this workshop.

Alex Chapman
(Discussion led by Kevin Thiele in Alex Chapman's absence)

My TDWG 2007 talk was entitled Mechanisms for coordination and delivery of descriptive data and taxon profiles in the Australasian Biodiversity Federation and covered a number of issues to be discussed in the proposed Workshop.

The full presentation includes a short comparison of the current Flora of Australia schema - a current candidate model for presenting a polythetic eFlora from printed and online sources of various types - with TDWG's newly-proposed Species Profile Model - a high-level semantic web RDF schema.

Alexc 01:08, 8 October 2007 (EST)

Seeking agreement towards common standards
Karen Wilson raised the issue of understanding the Australian use cases. Bill Barker provided a couple of examples of maritime projects which seek to develop species profiles of regional flora (joint SA/WA coastline and Great Barrier Reef). The ensuing group discussion identified the following use cases:
 * Aggregation of information
 * Re-purposing of existing content
 * Catalogue of existing information
 * Notification (alerts) by subscription
 * Collaborative enhancement and maintenance of data
 * Attribution (use or a requirement of publishing?)
 * Peer review (validation?)

Metadata may need to be applied to the information to assist end-users with understanding the scope and context of the applicability of the information (state-scoped description v national-scoped description).

Fact sheet content

 * Suggest starting with the Flora of Australia model as the foundation
 * Information 'chunks' from the Flora of Australia are given in the XML examples, but in a hierarchical/nested view. The three pages at the beginning of the PDF cut these down to the base chunks and list them alphabetically (http://www.environment.gov.au/biodiversity/abrs/online-resources/flora/pubs/flora-of-australia-online-schema-documentation-july-2007.pdf).
 * Expand as necessary to accommodate other Electronic Flora project data content

The existing FoA schema was compared against the needs of the group. Additional attributes were identified. Some issues that were raised during the discussion include:
 * citation schema - there does not appear to be an existing TDWG standard
 * frequency (number of taxa)
 * vernacular and indigenous names (these plus common name from TCS, not SDD)
 * proclaimed [weed/conservation] status (atomised with a controlled vocabulary?)
 * atomization of description to logical levels (not to a codification level for this exercise; diagnostic as well as full descriptions)
 * distribution (textual) based on regional vocabularies
 * qualifier (e.g., cf.) to be kept in SDD
 * images - own schema to support attribution etc elements
 * phenology - will need to take into account different plant types (algae, vascular etc.)
 * notes - sub-levels (contributor, name, protologue, biology, ecology) to allow extra info to be provided. (Need capacity for sub-attributes e.g. fire as a sub-attribute of ecology)
 * description layers and taxon relationships, scope of applicability of description
 * scope of description/synonyms (synonymy from TCS)

[[Media:e-flora_interchange_updated.doc|SDD document for an eFlora (Click to download)]]

ACTION: Helen Thompson and Jim Croft to coordinate the collaboration of the SDD modeling process.

Re-presenting printed keys

 * Primary focus on dichotomous keys in published Floras
 * Interactive electronic renditions (e.g. Phoenix)
 * Matrix key fragments

At the request of Rex Croft, Kevin Thiele provided an overview of the capabilities and limitations of Phoenix with respect to interactive keys. A suggestion was made that consideration be given to an AVH-style project for open-source key development. Jim Croft argued that the community should not be considering development of that style of workbench tool; instead it should focus on ensuring standards are accepted by commercial providers, so that information remains open and accessible to the community. Jerry Cooper raised the issue that the community needs to create and provide information management tools. The general consensus was that the tools used to generate the information must produce standards-based information to ensure ongoing interoperability.

Greg Whitbread raised the issue of how to manage and present keys in the future. The consensus of the group was that both dichotomous and matrix style keys will need to be supported for the foreseeable future.

ACTION: Small working group to review and provide recommendation on legacy (printed) and electronic keys, representation in an online environment, integration with interactive keys, including consideration of standards. Development of an appropriate proposal for ALA. (Kevin Thiele)

Possible attendees:
 * Bill Barker
 * Karen Wilson
 * Dale Dixon
 * Marco Duretto
 * Murray Henwood
 * Helen Thompson (Helen to provide a list of known existing CDRom & online keys)
 * Greg Jordan

Discussion continued regarding dichotomous keys and web-based representation, which could allow greater flexibility for decision trees.

Sharing of content

 * Nationally
 * Internationally

Jim Croft raised the issue that there will need to be local level consideration given to restricting access to some data to prevent it from being transferred.

Geographic coverage of content

 * International
 * State, Territory
 * Regional, Local
 * Special Geographic Areas

Juergen Kellermann raised the issue that qualification of geographical scope will need to be available in the standards, to assist with contextual applicability of information.

A model for the overall system
Bill's eFloraSA diagram to promote discussion.

ACTION: Conversion of the conceptual diagram into a logical model. (Bill Barker)

It would be useful to have other publication models visible here for comparative purposes. eg.


 * Flora of Australia
 * PlantNet
 * FloraBase
 * EoL ...

Data sources

 * Existing on-line Floras
 * Paper-based Floras
 * Taxonomic journals
 * Interactive Keys
   * Lucid
   * Delta
 * APNI access (bibliographic especially)

Data storage methods



 * Flora of Australia Online xml schemas and examples
 * Text, PDF
 * HTML, CSS
 * XML, XML Schema
 * Relational database, database gateway

Data exchange standards

 * TDWG SDD
 * TDWG SPM
 * TDWG TCS
 * TDWG ABCD
 * Taxonomic literature standard (taXMLit, TaXML)
 * Darwin Core

Jim Croft raised the question of which preferred components of each standard we would use. Greg Whitbread and Laurence Paine had a discussion regarding the merits of a single document as against assembling multiple documents to produce a final presentation of the eFlora. From the discussion it was agreed that neither is the ideal solution, however for flexibility and re-usability the multiple part solution appears the better outcome.
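The multiple-part approach preferred in this discussion can be pictured with a minimal sketch. It assumes, purely for illustration, that each component (names, descriptions, images) is held as a separate set of records keyed by a shared taxon identifier; the identifier `taxon:0001`, the part names and the function `assemble_profile` are hypothetical and do not come from any of the standards listed above.

```python
def assemble_profile(taxon_id, sources):
    """Build one taxon page from independently maintained parts.

    `sources` maps a part name (e.g. "names") to a dict of records keyed
    by taxon identifier. Parts with no record for the taxon are reported
    rather than silently dropped, so a presentation layer can decide
    what to do about them.
    """
    profile = {"taxon_id": taxon_id}
    missing = []
    for part, records in sources.items():
        if taxon_id in records:
            profile[part] = records[taxon_id]
        else:
            missing.append(part)
    profile["missing_parts"] = missing
    return profile


# Hypothetical placeholder records, one store per component.
sources = {
    "names": {"taxon:0001": {"accepted": "Genus species", "synonyms": []}},
    "descriptions": {"taxon:0001": "Placeholder descriptive text."},
    "images": {},  # no image record yet for this taxon
}

profile = assemble_profile("taxon:0001", sources)
```

Because each part is fetched separately, the same names record could be reused by a state flora and a national flora without duplicating the description, which is the flexibility and re-usability argument made above.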

The key protocol for the delivery of information is TAPIR. A possible alternative is to use RDF and LSIDs.

ACTION: Put together a small working party to recommend a preferred model, and which delivery mechanism will be the most logical. (Greg Whitbread/HISCOM)

Australasian eFloras versus Flora of Australia schema
ACTION: Add comments to this section of the wiki as issues and opportunities are identified. (All)

PLEASE TRY COMPLETING TABLE AS FAR AS APPROPRIATE BEFORE THE MEETING.

Note: This is a preliminary exercise which will show how each eFlora system, as far as it has been developed or is planned, compares with the Flora of Australia. From this exercise we should be in a better position to discuss standards for integrating our data in various ways.

Flora of Australia Online xml schemas and examples:
 * Definition of Flora of Australia elements listed in table http://www.environment.gov.au/biodiversity/abrs/online-resources/flora/pubs/flora-of-australia-online-schema-documentation-july-2007.pdf
 * An example of records from the Flora of Australia to enable comparison of content http://www.environment.gov.au/biodiversity/abrs/online-resources/flora/pubs/flora-of-australia-text-example.doc

You might find this file useful for matching your fields: The Flora of Australia schema at the level presented here has elements which most of us will have, others which we are not planning to include, others which may have data which are put in more than one of our own eFlora elements, and others which we have divided into more than one element.

Table 1. The equivalency of Australasian (e)Floras with the Flora of Australia schema
KEY:
 * Content - equivalent, split (in other FoA element groupings), deficient (some data fields missing), not available; enter C e,s,d,n.
 * Arrangement - same, different; enter A s,d.
 * Structure - atomised further, same, included in broader element grouping; enter S a, s, i.
 * Current functionality - suffix * indicates under construction (or being fixed!)
 * Major uncertainty on relationship - blank cell
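For anyone collating completed tables, the key above could be decoded mechanically. The sketch below assumes a cell is typed as, for example, `C e; A s; S a*`; that exact cell layout is an assumption, since the key only gives the letters, and the function name `decode_cell` is hypothetical.

```python
import re

# Meanings taken from the KEY above; the dictionary layout is illustrative.
LEGEND = {
    "C": ("content", {"e": "equivalent", "s": "split",
                      "d": "deficient", "n": "not available"}),
    "A": ("arrangement", {"s": "same", "d": "different"}),
    "S": ("structure", {"a": "atomised further", "s": "same",
                        "i": "included in broader element grouping"}),
}

# An axis letter (C, A or S) followed by a single lower-case code letter.
TOKEN = re.compile(r"\b([CAS])\s*([a-z])\b")


def decode_cell(cell):
    """Decode one table cell; a blank cell means major uncertainty."""
    text = cell.strip()
    if not text:
        return None
    decoded = {"under_construction": text.endswith("*")}
    for axis, code in TOKEN.findall(text):
        name, meanings = LEGEND[axis]
        if code not in meanings:
            raise ValueError(f"unknown {name} code {code!r} in {cell!r}")
        decoded[name] = meanings[code]
    return decoded


example = decode_cell("C e; A s; S a*")
```

Rejecting unknown letters instead of ignoring them should make typing slips in a shared table visible early.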

NOTES:

1 HO - Early thinking stages of an eFlora system (functional design stage). Some synergies exist between existing collection databases and FoA, but with a new Collection Management system under development, we are in a position to align ourselves with whatever standards and schemas are necessary.

2 PERTH - A composite scoring based on existing FloraBase taxon profiles and planned integration of Nuytsia journal content.

Image handling

 * Image formats
 * Image quality, colour depth
 * Image resolution, size
 * Image tiling, progressive revelation of detail
   * Zoomify
 * Image storage repository
   * Morphbank

Agreed that the standard used to store the images, size etc is not really an issue for web presentation. Agreed that the ability to have a thumbnail and an actual image available is probably more useful. Murray Henwood indicated that their eBot project is using iSpheres imaging technology, which others might like to investigate.

Dublin Core metadata provides a good set of attributes for consideration for managing image information. The Dublin Core metadata standard has been expressed as an RDF schema.
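As a rough illustration of what a Dublin Core record for an image might look like, the sketch below writes a few of the fifteen Dublin Core elements as plain XML using Python's standard library. It uses simple XML rather than the RDF expression mentioned above, and the wrapper element `record`, the function `image_metadata` and all field values are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def image_metadata(fields):
    """Serialise a dict of Dublin Core fields as a small XML record."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not a Dublin Core element")
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(record, encoding="unicode")


# Placeholder values only; no real image is described here.
xml_text = image_metadata({
    "title": "Example leaf image",
    "creator": "Example Photographer",
    "format": "image/jpeg",
    "rights": "Attribution required",
})
```

The `creator` and `rights` elements are the ones most relevant to the attribution concerns raised elsewhere in these notes.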

The biggest issue is discoverability of existing digital assets - determining that an image is available.

ACTION: Need to encourage an ALA project relating to image storage, delivery and exchange. (Kevin Thiele)

Front end for author and editor input/modification of textual content

 * In-house tools
 * Distributed tools
 * On-line database
 * Data maintenance - Updating data (especially keys) (HT)
 * Wiki, Web 2.0 tools
 * Taxonomic workbenches such as:
   * CATE
   * EDIT Scratchpads
   * ITIS workbench

The consensus of the group was that a tool to assist with the authoring of content would be advantageous for collaborative flora development. However, the majority of the group had no immediate need for such a solution. The Australian Faunal Directory (AFD) is developing an authoring tool that may be of use (it may be available for Flora of Australia Online in a couple of years?)

Output to simple print ready documents

 * Text
 * PDF
 * LaTeX
 * InDesign
 * PageMaker
 * Other

Certainly a level of automation can be applied to generate content; however, it is envisaged that a flora publication would need a designer's finesse to produce a polished product.

Output to Web

 * HTML, CSS
 * XML, XSL, XSLT

Consensus was that both formats would need to be supported.

Intellectual property management

 * Data sources, repositories
 * Vouchering, original material
 * Attribution and Acknowledgment
 * Custodianship
 * Copyright
 * Access
 * Access levels
 * Licences

Full attribution and acknowledgment are seen as the solution to this particular problem. Institutions which have not signed a formal CHAH instrument may not adequately understand or be covered for intellectual property issues.

ACTION: Review MoU documents to ensure that new institutions supplying data via AVH are adequately covered. (Brett Summerell)

The consensus of the group was that multiple levels of access to information will be required in a collaborative environment.

Citing Electronic Floras
Bill Barker raised the issue of citations for electronic flora.

Consensus of the group was that some form of snapshot solution (online or hard copy) is required to maintain the integrity of literature. This is not currently done for FloraBase or PlantNet.
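One possible shape for an online snapshot, offered only as a sketch: record the date of the snapshot together with a short fingerprint of the text as it stood, so that a citation points at fixed content. The citation wording, the use of a SHA-256 digest and the function name `snapshot_citation` are all assumptions made for illustration; the group did not agree a format.

```python
import hashlib
from datetime import date


def snapshot_citation(title, text, snapshot_date):
    """Return a citation string tied to one fixed version of the text."""
    # A short digest of the content: any later edit changes the identifier.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    return f"{title}. Snapshot {snapshot_date.isoformat()}, content id {digest}."


# Placeholder title and text, dated to the second day of the workshop.
original = snapshot_citation(
    "Example eFlora treatment", "Placeholder treatment text.", date(2007, 12, 4)
)
revised = snapshot_citation(
    "Example eFlora treatment", "Placeholder treatment text, revised.", date(2007, 12, 4)
)
```

Two citations with the same date but different content identifiers would show a reader that the treatment had changed between them.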

RDF and OWL
Jim Croft raised the issue of whether RDF was necessary for the implementation of the product. Greg Whitbread suggested that OWL might be a better solution; however, just getting the structure down in Word in the first instance was more than acceptable.

Summary session
Ways forward outlined and agreed.

Actions were reviewed and agreed to.

Issues to raise (through Kevin Thiele) with ALA Management committee:


 * Image Indexing service/Repository
 * Host meeting about keys
 * Help get TAPIR providers operational
 * Formalise Ontology for overall Flora species pages
 * eFlora aggregator - show links to and aggregate content from all the Flora pages
 * Help establish LSIDs?