Units and their IDs

From Hiscom
[Transcluded from Discussion page]

Starting points / recommendations

  1. ABCD/HISPID UnitID is a catalogue number, not some other identifier, see: abcd:UnitID, hispid:accid, dwc:catalogNumber.
  2. Catalogue numbers are identifiers and are opaque and indivisible: there is no meaning in the catalogue number or in any part of it. Catalogue numbers are not big-endian, little-endian or somewhere-in-the-middleian, so you can't query for them – in any but your own database – using wildcards.
  3. Solutions for specific cases, such as mixed collections, multiple specimens or sets of label data on the same sheet, and specimens divided over multiple sheets, should be confined to the situation at hand and should isolate any problems, not smear them out over the entire collection.
  4. Curation practices, in particular in relation to the catalogue number, may affect the usability of the data and should consider use outside the institution as well.
  5. Suffixes might be okay, for strictly internal use, but shouldn't be depended on to make catalogue numbers unique (see 4).
  6. The catalogue number is what links the AVH record to the specimen, so catalogue numbers that are delivered to AVH should be the same as the number on the specimen (see 2, 4). Otherwise, people using the specimens or looking up a cited specimen will not be able to find the record in AVH. Local knowledge or methods in place at institutions to relate barcodes to catalogue numbers are not available to external users of our specimens and specimen data.

This page arises from Action 36 in the minutes of the HISCOM AGM in Canberra, 12–14 November 2012. At the meeting it was decided to establish a working group to come up with a strategy for implementing GUIDs (Globally Unique Identifiers). As a start, we need to know what units (collection objects, gatherings, determinations etc.) we should assign GUIDs to. In earlier discussions and again at the AGM, it was felt that we don't have a sufficient overview of what different herbaria regard as "specimens" and what numbering practices are in place. Therefore it was decided to document specimen concepts at different herbaria and how unit IDs are assigned. It should be noted that a survey was initiated by Ben more than a year ago, but so far nobody has filled it in. The survey below repeats the questions in Ben's survey and gives some extra space for comments.

For the purpose of the survey, unit IDs can be identifiers (not primary keys) in a database, or the catalogue (or accession) number that is on the physical specimen. We want to know both, but it is important to know how the database record relates to the physical specimen, so an extra question has been added to indicate to what units catalogue numbers are assigned.

AD

An original specimen (before it is divided into parts) always receives its own unique ID: Not always. A unique barcode is assigned to each taxon as part of a collection where possible. However, separate sheets have their own unique barcodes. There may be multiple barcodes on the same sheet where there is more than one taxon.
The divided parts of an original specimen always receive their own IDs: No. Pickled material, for example, has the same AD-code, but separate sheets have their own numbers and slides have a different number series. Mixed specimens (e.g. mosses) have a suffix attached to the same number, referring to a specimen on a sheet.
Catalogue numbers are assigned to: We use barcodes as catalogue numbers, and these are assigned as above.
Notes: We are trying to evaluate models for applying unique identifiers to the various objects associated with a specimen.

AK

An original specimen (before it is divided into parts) always receives its own unique ID
The divided parts of an original specimen always receive their own IDs
Catalogue numbers are assigned to
Notes


BRI

An original specimen (before it is divided into parts) always receives its own unique ID: Yes. This is called the "AQ Number": a leading "AQ" followed by up to 7 digits. JSTOR Unit IDs, for example, are of the form "BRI-AQ1234567".
The divided parts of an original specimen always receive their own IDs: No. All parts live under the single AQ Number.
Catalogue numbers are assigned to: A collection, including all of its parts. Some internal indexing occurs for spirit bottles and wood samples, but this cannot be relied on for permanency.
Notes: BRI assigns an AQ number manually to each original specimen during the data-entry process. Note that AQ numbers in the range 900,000–999,999 are assigned to specimens presumed or known to be held in other herbaria, and are cited (in BRI lit.) as "No specimen in BRI". This is a way of keeping track of all Qld taxa, including those where we have never received a specimen, enabling the Qld Census to be auto-generated.
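The AQ Number conventions described above could be checked inside an institution's own database roughly as follows. This is a minimal sketch, not BRI's actual implementation: the function names are hypothetical, and it assumes that the number is written without spaces and that leading zeros, if any, do not change its numeric value.

```python
import re

# "AQ" followed by one to seven digits, as described in the BRI response.
AQ_PATTERN = re.compile(r"AQ(\d{1,7})")


def parse_aq_number(value: str) -> int:
    """Return the numeric part of an AQ number, or raise ValueError."""
    match = AQ_PATTERN.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not an AQ number: {value!r}")
    return int(match.group(1))


def is_held_elsewhere(value: str) -> bool:
    """True if the number is in the 900,000-999,999 range, which BRI
    assigns to specimens presumed or known to be held in other herbaria."""
    return 900_000 <= parse_aq_number(value) <= 999_999


def jstor_unit_id(value: str) -> str:
    """Build a unit ID of the form 'BRI-AQ1234567'."""
    parse_aq_number(value)  # validation only; the string itself is kept as-is
    return f"BRI-{value.strip()}"
```

Parsing of this kind is only safe for the institution that issued the numbers; external users should still treat the catalogue number as opaque, as recommended at the top of this page.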

CANB

An original specimen (before it is divided into parts) always receives its own unique ID: Currently yes, but previously often not. Since about July 1999 each specimen/gathering has received a unique 6-digit sequential CANB accession number. Prior to that, when the CANB (Australian National Herbarium) and CBG (Australian National Botanic Gardens) collections were processed separately, each ‘item’ or part of a gathering (e.g. 2 sheets of one gathering) received a different (5- or 6-digit) CANB accession number, or the same (5- or 7-digit) CBG accession number.
The divided parts of an original specimen always receive their own IDs: Yes. This is achieved by adding a ‘point’ number to the main accession number. So, for example, a sheet of A.B.Smith 45 (one gathering/specimen) would be CANB xxxxxx.1 and a spirit specimen, fruit separate or 2nd sheet etc. would be CANB xxxxxx.2. For older accessions (see above), the two divided parts usually would be CANB xxxxxx.1 and CANB yyyyyy.2, or CBG zzzzzzz.1 and CBG zzzzzzz.2, or CANB xxxxxx.1 and CBG zzzzzzz.2 (if CANB and CBG replicates existed and have now been combined on the database and in the physical collection). If there are more than 2 divided parts, we use .3, .4 etc. Occasionally, when the main/base accession number differs for the various parts, there may be more than one ‘.1’ number, e.g. CANB xxxxxx.1 and CANB yyyyyy.1, or CANB xxxxxx.1 and CBG zzzzzzz.1, but the overall ID for the part is still unique to it.
Catalogue numbers are assigned to: Current practice (for new accessions): a specimen comprising a herbarium sheet and a spirit jar will have the same 6-digit CANB sequential accession number assigned to each of those parts/objects/items, and the record(s) can be retrieved from our database using this number, even though each part has a different point number. Previous practice (for accessions before May 1999 and still applying to these specimens): specimens/gatherings with more than one part/item/object have different 5- or 6-digit CANB accession numbers for each part, or the same 5- or 7-digit CBG accession number.
Notes: Given the complications associated with the different prior accession-numbering systems of the CANB and CBG collections, it is best to consider that each part/item/object of a specimen/gathering now in CANB is uniquely identified by: (code + base accession number (of 5-7 digits) + a point number), and it is on this basis that records are supplied to the AVH.
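The composite identifier described in the CANB notes (code + base accession number + point number) can be modelled as a small value type. The sketch below is illustrative only: the class name, field names and the printed form "CODE base.point" are assumptions based on the examples in this response, not CANB's database schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartID:
    """Identifier for one part/item/object of a specimen/gathering."""
    code: str   # collection code: "CANB" or "CBG"
    base: str   # base accession number, kept as a string of 5 to 7 digits
    point: int  # point number: 1, 2, 3, ...

    def __post_init__(self):
        if self.code not in ("CANB", "CBG"):
            raise ValueError(f"unknown code: {self.code!r}")
        if not (self.base.isascii() and self.base.isdigit()
                and 5 <= len(self.base) <= 7):
            raise ValueError(f"base must be 5 to 7 digits: {self.base!r}")
        if self.point < 1:
            raise ValueError("point number must be 1 or greater")

    def __str__(self) -> str:
        return f"{self.code} {self.base}.{self.point}"


# Hypothetical accession numbers: two parts share a point number but have
# different base numbers, so the full identifiers remain distinct.
parts = {
    PartID("CANB", "123456", 1),
    PartID("CANB", "654321", 1),
    PartID("CBG", "7654321", 2),
}
```

Uniqueness comes from the full triple, which is why more than one ‘.1’ can exist among the older accessions without any two parts sharing an identifier.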

CHR

An original specimen (before it is divided into parts) always receives its own unique ID: Want to answer No (it does not get a unique number that represents the whole), but maybe/sometimes... If an assignment is made, it is the catalogue number, but it will be altered to accommodate the number of parts (as described below). If the material is not accessioned, there is a protocol that determines whether these 'unpublished/unconfirmed' catalogue numbers can be re-assigned.
The divided parts of an original specimen always receive their own IDs: Yes. Each part is assigned a GUID as well as the catalogue number.
Catalogue numbers are assigned to: Any specimen (whether sheet, packets, ethanol (when needed only), slide etc.), as well as images when the specimen consists only of the image.
Notes: Our catalogue numbers are of three parts: prefix (always CHR); sequential number; optional single-letter suffix. Specimens of a single part are not given a suffix; however, specimens consisting of several parts are normally given the same sequential number and a single-letter suffix to form the unique catalogue number (e.g. CHR 1234 A; CHR 1234 B). There are a few exceptions to this, where a single specimen with multiple parts has been given different sequential numbers.

Many of our "ancillary" collections have their own numbering system. We do not consider that these numbers will necessarily be permanent, depending on the material, and they are definitely not citable. A large amount of ancillary material is only assigned permanent catalogue ("CHR") numbers when they are required for loan or to be cited.
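The three-part structure of CHR catalogue numbers described in the notes above (prefix, sequential number, optional single-letter suffix) could be parsed as in the following sketch. The names are hypothetical, and the pattern assumes single spaces between the parts and an upper-case suffix, as in the examples "CHR 1234 A" and "CHR 1234 B"; actual CHR formatting may differ.

```python
import re
from typing import NamedTuple, Optional


class ChrNumber(NamedTuple):
    prefix: str            # always "CHR"
    number: int            # sequential number
    suffix: Optional[str]  # single letter, or None for single-part specimens


# Assumed layout: "CHR", a space, digits, then optionally a space and a letter.
CHR_PATTERN = re.compile(r"(CHR) (\d+)(?: ([A-Z]))?")


def parse_chr(value: str) -> ChrNumber:
    """Split a CHR catalogue number into its three parts."""
    match = CHR_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"not a CHR catalogue number: {value!r}")
    prefix, number, suffix = match.groups()
    return ChrNumber(prefix, int(number), suffix)
```

Because a few multi-part specimens carry different sequential numbers, a shared sequential number suggests, but does not prove, that two parts belong to the same specimen.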

CNS

An original specimen (before it is divided into parts) always receives its own unique ID: Currently yes, but previously not. Since about July 2008 each specimen/gathering has received a unique 6-digit sequential CNS accession number. Prior to that, when the QRS (Australian National Herbarium - Atherton) collections were processed, each ‘item’ or part of a gathering (e.g. 2 sheets of one gathering) received a different QRS accession number (of up to 6 digits). The MBA herbarium data is currently being processed for import to the database (10,000 records); these have a unique MBA accession number of up to 5 digits.
The divided parts of an original specimen always receive their own IDs: Yes. This is achieved by adding a ‘point’ or 'Item' number to the main accession number. So, for example, a sheet of A.B.Smith 45 (one gathering/specimen) would be CNS 123456.1 and a spirit specimen, fruit separate or 2nd sheet etc. would be CNS 123456.2. For older accessions (see above), the two divided parts usually would be characterised by the example QRS 123456.1 and QRS 123457.2, or QRS xxxxxx.1 and MBA zzzzzzz.2 (if QRS and MBA replicates exist and have now been combined in the database and in the physical collection). If there are more than 2 divided parts, we use .3, .4 etc.
Catalogue numbers are assigned to: Current practice (for new accessions): a specimen comprising a herbarium sheet and a spirit jar will have the same 6-digit CNS sequential accession number assigned to each of those parts/objects/items, and the record(s) can be retrieved from our database using this number, even though each part has a different point number. Previous practice (for accessions before July 2008 and still applying to these specimens): specimens/gatherings with more than one part/item/object have a different QRS accession number (of up to 6 digits) for each part.
Notes: Given the complications associated with the different prior accession-numbering systems of the CNS and QRS collections, it is best to consider that each part/item/object of a specimen/gathering now in CNS is uniquely identified by: (code + base accession number (of up to 6 digits) + a point number), and it is on this basis that records are supplied to the AVH. Also, our database has the capacity to deliver 'base' accession numbers of more than 6 digits if required.

DNA

An original specimen (before it is divided into parts) always receives its own unique ID: Currently yes, but some 4000 records are duplicated in the database.
The divided parts of an original specimen always receive their own IDs: No. Currently the presence of different parts is indicated only by flag fields.
Catalogue numbers are assigned to: Whole gatherings, not the parts.
Notes: In the field, the unique identifier for a gathering is the collector's name and number. Once this is databased, all parts of a gathering come under the one accession number (e.g. DNA 00123456), which applies to one or more mounted sheets and all other parts such as spirit material or carpological material. The presence of spirit material, carpological material, photographs etc. for that gathering is indicated by flag fields for that accession. Where a gathering is later found to be two taxa, these may be split, with one part remounted and added as a new accession; where this is trivial, the sheet is simply annotated.

However, for a few years in the early 1970s individual gatherings were databased separately at Alice Springs and Darwin (i.e. under separate 'D' and 'A' numbers). There are around 4,000 of these duplicated accessions, based on records with matching collector, collector's number, date and taxon name. Where they are found at DNA, we have been de-accessioning one sheet (the A number). Occasionally there are multiple records in the database of what is clearly only one accession (e.g. someone has mistakenly hit save multiple times when databasing an accession). We also have a policy of de-accessioning these duplicated records.
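The matching rule mentioned above (same collector, collector's number, date and taxon name) can be sketched as a simple grouping step. The field names and the sample accession numbers below are hypothetical and are not taken from the DNA database; real data would also need normalisation of names and dates before comparison.

```python
from collections import defaultdict


def find_duplicate_candidates(records):
    """Group accession numbers whose collector, collector's number,
    date and taxon name all match; return groups with more than one."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec["collector"], rec["collector_number"],
               rec["date"], rec["taxon"])
        groups[key].append(rec["accession"])
    return [accessions for accessions in groups.values()
            if len(accessions) > 1]


# Hypothetical sample: the first two records describe the same gathering.
sample = [
    {"accession": "D0001234", "collector": "A.B.Smith",
     "collector_number": "45", "date": "1972-05-03", "taxon": "Taxon one"},
    {"accession": "A0005678", "collector": "A.B.Smith",
     "collector_number": "45", "date": "1972-05-03", "taxon": "Taxon one"},
    {"accession": "D0004321", "collector": "A.B.Smith",
     "collector_number": "46", "date": "1972-05-03", "taxon": "Taxon two"},
]
duplicates = find_duplicate_candidates(sample)
```

The output is only a list of candidates; deciding which record to de-accession remains a curatorial decision.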

HO

An original specimen (before it is divided into parts) always receives its own unique ID: Yes. If a large specimen is spread across several herbarium sheets, the individual sheets share the same HO number (sheet 1 of 2, sheet 2 of 2, etc.).
The divided parts of an original specimen always receive their own IDs: No, with the exception of specimens with dried and spirit collections (see below).
Catalogue numbers are assigned to: Current practice: a specimen comprising a herbarium sheet and a spirit jar will have a different HO number assigned to each of those elements.
Notes: Our collection reflects the changing policies of different curators over time. We haven't attempted to bring legacy specimens in line with most recent practice.

MEL

An original specimen (before it is divided into parts) always receives its own unique ID: No
The divided parts of an original specimen always receive their own IDs: Yes
Catalogue numbers are assigned to:
  • each herbarium sheet or packet
  • spirit jars
  • carpological material (each item)
Notes
  1. We distinguish between parts of a collection or preparations that are made as part of the curation process (the ones listed above), which all get their own catalogue numbers, and preparations that are made by scientists that study the specimen later. For example, microscope slides will only get a catalogue number if they are the primary preparation (i.e. there is no sheet, packet or spirit collection).
  2. There will be at most one spirit preparation record (and hence only one catalogue number) assigned for each original specimen. This is because the contents of spirit jars are sometimes merged or split after catalogue numbers have been assigned.
  3. Our current database allows for containers and hence individual IDs for original specimens. However, our previous database did not. We had (and still have) a "multisheet" string, but this cannot be used to retrospectively assign containers and unique IDs, as it has been used to list (i) all parts of an original specimen (as it arrived at MEL) as well as (ii) all specimens that are putatively the result of the same collecting event (based on information already in the database). As we cannot say with certainty which of the two is the case for any particular record, basing unique IDs on our "multisheet" strings would be presenting assertion as fact. We could probably safely decide that from a certain time (in the past) all multisheet strings represent original specimens rather than putative collecting events. We put this in the too-hard basket when we migrated our collections database from Texpress to Specify, so this is still an area in which we've got some work to do.
  4. MEL uses a single "MEL number" for mixed collections and sheets with multiple specimens or sets of label data (the number of specimens on the sheet is not always identical to the number of sets of label data). A suffix (A–Z) is used for the different components in order to make the catalogue number unique.

NE

An original specimen (before it is divided into parts) always receives its own unique ID
The divided parts of an original specimen always receive their own IDs
Catalogue numbers are assigned to
Notes

NSW

An original specimen (before it is divided into parts) always receives its own unique ID: This depends on whether the specimen has been mounted or not. Most of our older collections are not mounted, and as such all the material in the accession has one accession number, except where there is separate spirit or carpological material, which would have a separate unique accession number. For example, our Eucalypt collection is unmounted. Most specimens have enough material to mount on a number of sheets. If we were to mount a specimen like this, each sheet would be given a unique NSW accession number and recorded as Sheet 1 of 3, 2 of 3 etc. on the database and on the sheet. When mounting newly arriving material, if there is a need for multiple sheets, we give each sheet of an accession a unique NSW accession number, along with any associated spirit and carpological material. We use the sheet 1 of 3 etc. to link the sheets, and the EMU database has the ability to link other associated collections as well (e.g. spirit and carpological material, if they exist).
The divided parts of an original specimen always receive their own IDs: Yes
Catalogue numbers are assigned to: All parts of the same collection, and the numbers are unique.
Notes

PERTH

An original specimen (before it is divided into parts) always receives its own unique ID: Not always. A unique barcode is assigned to each taxon as a part of a collection where possible; however, separate sheets usually get their own barcodes (and when this happens, the original undivided specimen does not receive an ID). In no case will there be multiple barcodes on the same sheet where there is more than one taxon: such a sheet would always be split.
The divided parts of an original specimen always receive their own IDs: No, but the exact situation where this doesn't occur eludes me.
Catalogue numbers are assigned to: The original specimen collected from the field, or the parts of the specimen, depending on specimen size or storage method. Carpological specimens receive their own ID rather than the ID of any accompanying sheets.
Notes

WELT

An original specimen (before it is divided into parts) always receives its own unique ID
The divided parts of an original specimen always receive their own IDs
Catalogue numbers are assigned to
Notes