AVH web pages QA - issues and suggested action
AVH2 - Examples of issues to be considered before release
note: STILL BEING TIDIED UP - BILL 31/3/08
Issues from AD
Key to comment flags:
** = NOT IN AGREEMENT WITH SPECS, PROBABLY DOWN TO HERBARIA TO DO
## = NOT IN AGREEMENT WITH SPECS, PROBABLY SAPAC
Public interface try out 12/3/08
Download as PDF, PNG, open from download window works, but noticed ...
- ISSUE ## No embedded “From Australia’s Virtual Herbarium © Council of Heads of Australasian Herbaria” or “©Australia’s Virtual Herbarium”
Download of data
I did an email download of Hakea leucoptera (should be all herbaria), 1014 records downloaded. Comments on HTML output, added to following discussion with Rex Croft:
- NOW THAT WE ARE LOOKING AT DOWNLOAD OF DATA OFF THE AVH … we can potentially add filters to deal with data before it goes into the AVH:
- Spell out state (NT => Northern Territory)
- Decapitalise State and Country.
- Adjust ways we deal with nulls
- We can also adjust outputs:
- Sort order (can’t work out on what basis the herbaria, which come out grouped, are ordered – will they continue to be grouped, or will they come out shuffled as records are updated?)
- Accuracy of geocodes
- Null values
- In the Public access area, we should have data that can be mapped: so the following fields need to be in consistent format: plant name, geocode lat/long, geocode precision, collection date, as well as date last edit. All should have blank nulls. WE HAVE THIS HERE except in plant name (see below).
- State, Country, and geocode source should also be standardised (see below).
- The other fields should be our normal textual data (s.n. in collector number,  where interpretation of an inadequate label, etc.)
- People wanting access to alternative data would approach or be part of a CHAH member herbarium.
- Different approaches to blank entries (the “non-break space”) across the HTML table – Additional collectors has a blank. This needs to match the other fields so that the grid lines run across the whole table width without a break.
- MISSING HERBARIUM IS BRI
- We are outputting the taxonomic name field. As a result there are many inconsistencies, e.g. in rank (“ssp”, not “ssp.” or preferably “subsp.”). Haven’t checked how the qualifier is output?
- MEL: “Hakea leucoptera ssp sericipes “, “Hakea leucoptera ssp indet.” and “Hakea leucoptera”
- CANB/CBG have: “Hakea leucoptera” and “Hakea leucoptera subsp. leucoptera”
- DNA “Hakea leucoptera” only, which is fine – they don’t have to id down to subspecies if they don’t believe in my taxonomy (smile)
- PERTH: “Hakea leucoptera subsp. sericipes” and “Hakea leucoptera W.R.Barker subsp. sericipes W.R.Barker”, “Hakea leucoptera R.Br.”
- NSW “Hakea leucoptera”.
- HO “Hakea leucoptera”
- AD (purists might argue for ssp. being replaced with subsp. – I don’t care as long as we are consistent) “Hakea leucoptera R.Br. ssp. leucoptera”, “Hakea leucoptera R.Br.”, “Hakea leucoptera R.Br. ssp. sericipes W.R.Barker”
- ISSUE: WE SHOULD NOT OUTPUT the name string field. Instead AVH2 should produce separate component fields, including the qualifier and the rank qualified, and maybe also output a concatenation of these. The concatenation is made more complex by needing a script to insert the qualifier in the correct place. (See GIS use above).
- ANSWER: AVH2 now outputs the genus, species, infrasp.rank and infrasp.name fields in addition to the full scientific name. Rex 15:06, 18 September 2008 (EST)
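A rough sketch of the concatenation idea, for discussion only. It assumes the component fields arrive as plain strings; the function names, the parameter names and the values used for the rank qualified ("genus", "species", "infraspecies") are all made up for illustration and are not the AVH2 or HISPID definitions. Author strings are left out.

```python
# Sketch only: field names and rank_qualified values are assumptions,
# not the actual AVH2 or HISPID definitions.
RANK_ALIASES = {
    "ssp": "subsp.", "ssp.": "subsp.",
    "subsp": "subsp.", "subsp.": "subsp.",
    "var": "var.", "var.": "var.",
}


def normalise_rank(rank):
    """Map variant spellings of a rank to one agreed form."""
    cleaned = rank.strip()
    return RANK_ALIASES.get(cleaned.lower(), cleaned)


def build_name(genus, species="", infra_rank="", infra_name="",
               qualifier="", rank_qualified=""):
    """Concatenate component fields, placing the qualifier directly
    before the name element of the rank it qualifies."""
    parts = []
    if qualifier and rank_qualified == "genus":
        parts.append(qualifier)
    parts.append(genus)
    if species.strip():
        if qualifier and rank_qualified == "species":
            parts.append(qualifier)
        parts.append(species)
    if infra_name.strip():
        parts.append(normalise_rank(infra_rank))
        if qualifier and rank_qualified == "infraspecies":
            parts.append(qualifier)
        parts.append(infra_name)
    # Drop empty components so no double spaces appear in the result.
    return " ".join(p.strip() for p in parts if p.strip())


print(build_name("Hakea", "leucoptera", "ssp", "sericipes"))
```

With something like this, the MEL “ssp” and AD “ssp.” forms would both come out as “subsp.”, whichever spelling CHAH settles on.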
- Institutional Acronym
- OK for all but for…
- AD -- needs to add AD-A and AD-C here rather than as part of record id.
- Corrected. Rex 15:06, 18 September 2008 (EST)
- Record ID:
- Most MEL record ids have an A suffix – not seen this before – is A for Angiosperm? But there is a “111863B”. Is this OK?
- The 'A' and 'B' represent the part number for mixed collections (i.e. we assign one number to a sheet, and then suffix the number for each part). So, in short, yes, the B is okay... sometimes they go up to J (for mosses mostly).--Alisonv 16:23, 14 May 2008 (EST)
- DNA has its A prefix: how do we cite this in specimens examined – should it be stripped?
- (a HISCOM policy issue maybe if DNA want to keep; BRI would supply its BRI numbers here, presumably not its AQ data record ids)
- BRI does not use any numbers other than the Acquisition Number, which must be prefixed by AQ to distinguish it from non-databased and long-defunct BRI Sheet numbers. --Pbostock 15:21, 14 June 2008 (EST)
- AD has AD as a prefix – it should be stripped. [Rex did this because of the AD-A and AD-C series.] ** These need to be put under the Institutional Acronym field.
- SOLUTION: AD, AD-A and AD-C are now stored in the source_institution_id field, with the recordno stored in the unit_id field. Rex 15:06, 18 September 2008 (EST)
- NSW, HO, PERTH seem OK.
- Lats and longs:
- Spec is degrees to one decimal place only, blank if not available.
- ALL HERBARIA OK. Perfect for GIS, BUT…
- ISSUE: This should be to nearest 10 minutes. This is 1/6th = 0.166667 degree (NOT 1/600 degree) --Pbostock 15:21, 14 June 2008 (EST). I suggest that if we use 1/5th degree = 0.2 degree (nearest 12 minutes), we can make the dumbing down obvious … 32.0, 32.2, 32.4, 32.6 degrees etc. This has great benefits for users who don’t read caveats. Otherwise our dumbed down data will be given false precision – not a good message if publications using our data receive subsequent criticism when we could have been approached for a more precise dataset.
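To make the 0.2 degree suggestion concrete, here is a minimal sketch of the rounding. It assumes coordinates arrive as decimal degrees (or None where unavailable) and that a blank string is the agreed null; the function name is invented for illustration. Decimal arithmetic is used so that the output is always a clean one-decimal value.

```python
from decimal import Decimal, ROUND_HALF_UP


def generalise_coordinate(value, step="0.2"):
    """Round a decimal-degree coordinate to the nearest multiple of step.

    Returns a string with one decimal place, or "" for a missing value.
    """
    if value is None:
        return ""
    step_d = Decimal(step)
    # Count whole steps, rounding halves away from zero, then scale back.
    steps = (Decimal(str(value)) / step_d).quantize(
        Decimal("1"), rounding=ROUND_HALF_UP)
    result = steps * step_d
    if result.is_zero():
        result = result.copy_abs()  # avoid printing "-0.0"
    return str(result)


for raw in (-32.4567, 138.31, None):
    print(repr(raw), "->", repr(generalise_coordinate(raw)))
```

Every output value ends in .0, .2, .4, .6 or .8, so a user can see at a glance that the coordinate has been generalised.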
- Generalised locality.
- MEL has “Nearest locality not available”.
- HO has “not available”.
- AD only one delivering a meaningful location (Spear Creek), but has a blank where none available.
- CANB/CBG, DNA delivering a blank
- PERTH have a couple of locations – they were introducing a way of generating from their gazetteer as I recollect.
- ISSUE: Referring to b.iii. above, Rex proposes we should send a blank to AVH and it should add “not available”, as in this output, as required. But note: CHAH has agreed to provide a generalised location.
- State/Territory and Country
- CANB/CBG. No state/territory. Only info is “AUSTRALIA”
- DNA, HO, AD. Comply with spec (upper/lower case full strings)
- PERTH. “NT” not spelt out in former, “Australia” fine in latter
- NSW has “NT” and “AUSTRALIA” – upper case issue
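A filter of the kind discussed above (spell out the state, decapitalise State and Country) might look something like the sketch below. The function names are invented, and the abbreviation table is only a starting point; each herbarium would need to check it against what its own database actually holds.

```python
# Sketch only: function names are hypothetical and the abbreviation
# table covers just the common Australian state/territory codes.
STATE_NAMES = {
    "ACT": "Australian Capital Territory",
    "NSW": "New South Wales",
    "NT": "Northern Territory",
    "QLD": "Queensland",
    "SA": "South Australia",
    "TAS": "Tasmania",
    "VIC": "Victoria",
    "WA": "Western Australia",
}


def normalise_state(value):
    """Spell out abbreviations and convert to upper/lower case."""
    if value is None or not value.strip():
        return ""  # blank null
    cleaned = value.strip()
    key = cleaned.upper().replace(".", "")
    return STATE_NAMES.get(key, cleaned.title())


def normalise_country(value):
    """Convert an all-capitals country name to upper/lower case."""
    if value is None or not value.strip():
        return ""
    return value.strip().title()


print(normalise_state("NT"), "|", normalise_country("AUSTRALIA"))
```

Note that str.title() capitalises every word, so names containing lower-case words or apostrophes would need a lookup table rather than this shortcut.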
- Collection date
- We have a date field here: format “24-Nov-1860”.
- ALL HERBARIA OK. Enables date sorting/mapping in GIS. We need the textual field too for specimens examined: check if in restricted area.
- ISSUE: HISCOM/CHAH policy: systematics researchers apply to herbaria for full data access for revisional studies use.
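For anyone taking the download into a script rather than a GIS, the “24-Nov-1860” format can be turned into a sortable date along these lines. This is a sketch only, assuming English three-letter month abbreviations and a blank string for missing dates; the function name is made up.

```python
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def parse_avh_date(text):
    """Parse a date such as "24-Nov-1860"; return None for a blank."""
    if text is None or not text.strip():
        return None
    day, month, year = text.strip().split("-")
    return date(int(year), MONTHS[month.title()], int(day))


dates = ["03-Feb-1995", "24-Nov-1860", ""]
parsed = [d for d in map(parse_avh_date, dates) if d is not None]
print(sorted(parsed))
```

The month table is spelled out rather than relying on locale-dependent parsing, so the result does not change with the machine's language settings.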
- GeoPrecision rating
- “10000.0” is wrong. It should be an integer “10,000”.
- Geocode source
- This should have null value of “unknown” or whatever is in HISPID5 dictionary.
- Collector number.
- This again seems on occasions to have textual information stripped out. Is this necessary? I don’t think so, and therefore it is not desirable. Not sure where stripping has taken place, but have marked as herbarium issue as MEL is fine.
- AD. Numbers, but no s.n. (these were stripped out – we’ll put them back!)
- AD conversion program had a bug whereby the collector series prefix (normally text) was not added. This has been corrected. Rex 15:06, 18 September 2008 (EST)
- Date last edited OK.
- We have a date field here: format “24-Nov-1860”.
- ALL HERBARIA OK.
Other issues out of spec.
- Rex indicates you cannot search on a “blank”. Texpress allows for inserting “!*” [= not anything] queries, for example.
- AVH2 has been modified so that the actual word NULL can be entered in the infraspecific name field to force the SQL match "IS NULL". This technique could be extended to other fields if required.
- Rex 14:54, 18 September 2008 (EST)
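The NULL technique might be sketched as follows. This is not the AVH2 code: the table name specimen, the column names species and infrasp_name, and the choice to match only the exact upper-case word NULL are all assumptions for illustration, and SQLite stands in for whatever database AVH2 runs on.

```python
import sqlite3


def infrasp_clause(term):
    """Return an SQL fragment and its parameters for the search term.

    The literal word NULL forces an IS NULL match; anything else is
    passed as an ordinary bound parameter.
    """
    if term == "NULL":
        return "infrasp_name IS NULL", []
    return "infrasp_name = ?", [term]


def search(conn, species, infrasp_term):
    clause, params = infrasp_clause(infrasp_term)
    sql = ("SELECT unit_id FROM specimen "
           f"WHERE species = ? AND {clause} ORDER BY unit_id")
    return [row[0] for row in conn.execute(sql, [species, *params])]


# Hypothetical demonstration data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE specimen (unit_id TEXT, species TEXT, infrasp_name TEXT)")
conn.executemany("INSERT INTO specimen VALUES (?, ?, ?)", [
    ("1", "leucoptera", None),
    ("2", "leucoptera", "sericipes"),
    ("3", "leucoptera", None),
])
print(search(conn, "leucoptera", "NULL"))
print(search(conn, "leucoptera", "sericipes"))
```

Only the fixed IS NULL fragment is ever spliced into the SQL text; the user's term itself always goes in as a parameter, which is what would make the same approach safe to extend to other fields.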
Query screen layout
- a. File download options.
- These in the Public and Simple and Extended in Restricted area query pages have the following deficiencies:
- Hitting a data format or entering an email address or other action in the Data Download area should activate the Data Download button
- Even better is Rex’s suggestion: only the buttons and headers for the Browser, Data Download and Map Download options should be on display. Only on hitting the Data Download area should it expand out to display the options currently set out beneath it.
Updating records in the AVH
- ISSUE: Where a herbarium has improved its conversion filters for delivering data but not changed the record, how do we get the update to the AVH? For example, we suspect that CANB already has State field data, but Greg hasn’t got round to building the conversion. The AVH is updated on a change to the Last Edit Date field.
- ISSUE: How do we deal with such a universal update of records? Ideally we might update just a field, but Rex indicates it would need a full replacement of records. This potentially clogs up servers hosting the AVH. So his suggestion is to have a “bit at a time” upload. Are there going to be problems with this, for example gaps in data downloads at times?
- SOLUTION 1: If you have a local copy of the 'avh' database, create SQL to populate or update the field, and distribute the SQL to all copies of AVH2. I have done this for MEL's State field, and AD's typestat and family field. And the numeric fields for both MEL and AD. Rex 14:50, 18 September 2008 (EST)
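As an illustration of Solution 1, the sketch below builds parameterised UPDATE statements for one herbarium's State field and applies them in a single transaction. It is a sketch under assumptions: the table name specimen and the column name state are invented (source_institution_id is the field named earlier in these notes), and SQLite is used only so the example runs on its own.

```python
import sqlite3


def build_state_updates(institution, mapping):
    """One UPDATE per abbreviation, limited to a single herbarium."""
    sql = ("UPDATE specimen SET state = ? "
           "WHERE source_institution_id = ? AND state = ?")
    return [(sql, (full, institution, abbrev))
            for abbrev, full in mapping.items()]


def apply_updates(conn, statements):
    """Run all statements in one transaction; return rows changed."""
    changed = 0
    with conn:  # commits on success, rolls back on error
        for sql, params in statements:
            changed += conn.execute(sql, params).rowcount
    return changed


# Hypothetical demonstration data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specimen "
             "(unit_id TEXT, source_institution_id TEXT, state TEXT)")
conn.executemany("INSERT INTO specimen VALUES (?, ?, ?)", [
    ("1", "MEL", "NT"), ("2", "MEL", "NT"),
    ("3", "MEL", "Victoria"), ("4", "AD", "NT"),
])
updates = build_state_updates("MEL", {"NT": "Northern Territory"})
print(apply_updates(conn, updates))
```

Because the same list of statements can be run against every copy of AVH2, only the changed field is touched and no full replacement of records is needed.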
- 1. This seems to be Site Stats rather than “My” stats.
- 2. It is currently limited to analysis of 500 hits. This seems very low, but it is what we can get for nothing. Is that sufficient?
- 3. Not sure if it’s working right as there looks like a lot of repetition of similar front ends, but then maybe that’s because no-one’s using it.
- 4. From the URLs it seems to repeat pages, or at least the pages seem to be identical – however, it is more that the names of the pages are poorly devised, e.g. one of them has avhadmin when it is a main query page in the restricted area.
- 5. Someone conversant with this facility on HISCOM needs to have a good look at it and comment. I think it’s probably a very nice free package, but could we get more out of it, or are there alternatives?
- Only in Public area. If you’ve got the password then we assume you agree to do the right thing. But what is the right thing, every user will ask, won’t they?!
- Bill 13/3/08, minor updates 18/3