This page lists metadata fields extracted from the spreadsheet version 3.0.4.
| Name | Data type | Value | Description of metadata field | Populated by whom and when | Updated where and when | Generated by | Maps to in ARS | Maps to in Specify | Notes and issues |
|---|---|---|---|---|---|---|---|---|---|
| "asset_created_by" | String | Username (both service users and human users) or null. | The name of the user who did the initial sync with ERDA. A file is considered created when it has been successfully synced with ERDA. Merely uploading a file to an asset file share is not an action that gets persisted or has any lasting effect on an asset. | ARS populates this field when media files are synced for the first time with ERDA. | Never | ARS | "asset_created_by" and the event user in CREATE_ASSET event. | nan | Find a way to have this persist in ARS |
| "asset_deleted_by" | String | Username of an admin user or null. | The name of the user who deleted all files belonging to the asset. Deleting an asset requires ADMIN level permissions. Deleting an asset's files does not delete the metadata entry of the asset. | Populated by ARS when the last file(s) belonging to an asset have been deleted from ERDA (synced with an empty share). | Never | ARS | "asset_deleted_by" and the event user in DELETE_ASSET event. | nan | For this field to be useful there needs to be an event in ARS protocol that takes this field and uses it as the user in a delete file event. |
| "asset_guid" | String | Example: "7e8-6-0a-08-32-39-0-002-03-000-048fe2-00000" or "7e8-6-0a-08-32-39-0-002-03-000-048fe2-00000_72". | This is the unique id generated for each asset and is generated before incorporation into the storage system. Parts of the string are defined based on things such as the workstation, institution and date, the other parts are randomly generated. This is to enable a unique name for each asset. Derivatives can be recognized by ending with _72 or _400. | Ingestion server after receiving initial data via ingestion client. HPC scripts when creating a derivative. | Never | Ingestion server or pipeline (HPC scripts). ARS (future if assets come from Specify) | asset_guid | nan | How would this be populated when coming from Specify? How would this be populated if an asset was created manually? |
| "asset_locked" | Boolean | True, False - default is False. | Boolean that lets us control whether an asset's files can be changed (synced with ERDA). A locked asset cannot have its files changed. A locked asset can be created/synced with Specify. An unlocked asset can have its files updated in ARS/ERDA. Used by ARS to determine if the asset is ready for syncing with Specify. | Set to false when asset is created by ARS. | Updated in ARS. When has not been decided. | ARS | asset_locked | nan | Not part of the metadata before creation in ARS. Does this require special (ADMIN) permission to unlock? Need to work through all of the scenarios of how syncing to and from Specify happens and agree |
| "asset_pid" | String | Undecided. Currently we insert a dummy value "INSERT_FOR_TESTING_PURPOSES" (21/02/25). | We have not decided how this is constructed yet. One possible PID is to construct a URL like pid.dassco.dk/GUID1234555677243. This is then the unique and resolvable identifier that we will use when sharing. It is mandatory for our funding that we have persistent identifiers for each asset (ideally resolvable as well). So we imagined an easy way to do this would be to incorporate the guid into a persistent identifier that can be clicked on to resolve (see asset_pid). This is a required field for creating an asset in ARS. | TBD | Never | Undecided | asset_pid | Undecided, None | This is a requirement for creating an asset in ARS. We don't have a way of creating this. We are inserting dummy values for now. We need to decide on a system soon. It is a condition of our grant. Postponed decisions for now (31-1-25). |
| "asset_subject" | String | Examples: "folder", "device target", "specimen", "label". | This defines what the asset is a representation of (e.g., a specimen, a label etc). This helps us differentiate between the different asset subjects which are treated and reported on differently. It is a way of categorising the types of assets. For non-CT-scan pipelines this will be populated during the barcode scanning process (CT scans are presumably always specimens). | Running pipeline processes and by the integration server. | Never | HPC pipeline process. | asset_subject | Title [in part] and Type/description of attachment (remarks) in Specify attachment record | Which field will this map to in Specify and vice versa? |
| "asset_updated_by" | String | Username (both service users and human users) or null. | User that last updated the asset files. This is the user from keycloak which could include a service user, e.g. processing script. Updates include the deletion of files as long as not all files are deleted. Updates include adding files to an asset as long as it is not the initial addition of files. | ARS populates this field when media files are changed, and the change has been persisted in ERDA. | When asset files are changed/have changes made to them and the change has been persisted in ERDA. | ARS | "asset_updated_by" and partly the event user for UPDATE_ASSET event. | nan | The field does not exist in ARS. It should be used as the user in the event protocol if we implement updating the media as part of that. BS: Maybe this is not MVP, but let's just get an estimate |
| "audited" | Boolean | True or False. | This marks the record as having been manually audited. Auditing can occur after complete processing and syncing with Specify or in some cases where there is an issue. When an asset marked as audited has any changes made to either its metadata or its files, it will no longer be considered audited, and the status will revert to "false". Default value is false. | ARS when asset is created. | In the ARS ui by a user when an audit happens. By ARS when changes are made to an asset that has its audit status set to "true". | Yes | audited | nan | nan |
| "audited_by" | String | Username of a user or null. | This is the username of the last person who audited the asset. Auditing will be done independently of the digitisation and usually by the technical team leader or a senior digitiser. Auditing an asset cannot be done by the same user that is the main digitiser. | ARS when an asset has its audit status set to "true". | In ARS when a user audits an asset that had previously been audited. | ARS | audited_by | nan | nan |
| "barcode" | List of strings | Digit code that can be prefaced by the institution. The institution string is NOT part of the encoded digit code. E.g. NHMD00929517 | This refers to a physical barcode on the specimen. It is the specimen number and links to the physical object and its Specify record. Combined with institution and collection, it forms part of an ID unique to that specimen, consisting of an institution acronym, a collection acronym, and the barcode. We can have one specimen barcode related to multiple assets and in some cases one asset with multiple specimen barcodes (an example of the latter is a multispecimen herbarium sheet). | Integration server when running pipeline or ARS when sync with Specify. | If an error is discovered during auditing or processing. This will likely be handled manually by a user in the dassco ui. | Pipeline process (barcode reader), Specify. | Each entry in the list maps to a "barcode" field in the specimens protocol | Catalog Number | None. |
| "camera_setting_control" | String | cam1.4, cam1.4_error or null | This indicates if the camera's EXIF data falls into certain predetermined categories. These categories need to be determined. | Processing module on HPC server. Reads from EXIF data and checks against a list of values. | Never | Integration server. | camera_setting_control | nan | Missing implementation and list of camera settings to check against. |
| "collection" | String | Example: "Vascular plants", "Entomology" | This is the collection name (a collection of related specimens) within the institution that holds the specimen and should align with Specify collections for synchronisation (also part of specimen registration number for NHMA). It groups specimens, it maps to Specify records and in combination with institution and barcode produces a unique physical specimen number. This is a required field for creating an asset in ARS. | Input comes from the digitiser when using the ingestion client, and the field gets populated when the asset has its initial metadata created by the ingestion server. | If an error is discovered during auditing or processing. This will likely be handled manually by a user in the dassco ui. | Ingestion server. ARS (future if assets come from Specify). | collection | Maps to collection in Specify, but not in attachment record. Check with Specify experts. | Postponed decision on what to do if an asset can change collection (31-1-25). |
| "complete_digitiser_list" | List of strings | Example: ["Mary", "Jane"] | In cases where multiple digitisers have worked as a team to digitise, they will all be included in this list. We won't know who did exactly which part of the digitisation process. All will be given credit for the process. | Ingestion server from input when digitisers start a session. | Never | Ingestion server. | complete_digitiser_list | copyright holders? | Requires updating how ARS handles statistics. Needs implementation in ARS and the pipeline. Mapping with Specify needs to be defined. |
| "date_asset_created_ars" | String | Timestamp of the type: ISO 8601:YYYY-MM-DDThh:mm:ssZ or null. | The timestamp of the initial sync with ERDA. A file is considered created when it has been successfully synced with ERDA. Merely uploading a file to an asset file share is not an action that gets persisted or has any lasting effect on an asset. | Populated by ARS when the asset first has media files added and synced with ERDA. | Never. | ARS | "date_asset_created_ars" and timestamp in event protocol for CREATE_ASSET. | nan | None. |
| "date_asset_deleted_ars" | String | Timestamp of the type: ISO 8601:YYYY-MM-DDThh:mm:ssZ or null. | The timestamp when all files belonging to the asset have been deleted by syncing an empty fileshare with ERDA. Deleting an asset requires ADMIN level permissions. Deleting an asset's files does not delete the metadata entry of the asset. | Populated by ARS when the last file(s) belonging to an asset have been deleted from ERDA (synced with an empty share). | Never. | ARS | "date_asset_deleted_ars" and timestamp in event protocol for DELETE_ASSET. | Date Media Deleted | None. |
| "date_asset_finalised" | String | Timestamp of the type: ISO 8601:YYYY-MM-DDThh:mm:ssZ or null. | The date when an asset was finalised (the asset_locked metadata field should change to true at the same time). Finalised means that all processing has been completed and that the asset cannot be changed at all, but the metadata still can. The idea would be that a cited asset can be completely frozen. Not currently being used. Default value should be null. | Ingestion server defaults to null. | A user requests this, most likely after citing. By a user in ARS. | ARS | "date_asset_finalised" | nan | Needs to correspond with some event in the ARS event protocol. Also needs some protocol on our side to determine if a copy is needed for. Considering deleting this field. Postponed decisions for what will happen with this field (07/02/25). |
| "date_asset_taken" | String | Timestamp of the type: ISO 8601:YYYY-MM-DDThh:mm:ssZ or null. | The date and time of when the original media were created (e.g., when the original raw image was taken). This should also be available from the exif data of each image. | Ingestion server when metadata is first created. | Never. | Ingestion server. | date_asset_taken | Date media created (fileCreatedDate) in attachment record | None. |
| "date_asset_updated_ars" | String | Timestamp of the type: ISO 8601:YYYY-MM-DDThh:mm:ssZ or null. | Timestamp for when an asset's files have been changed. This is defined as when a change has been successfully synced with ERDA. Changes can be modifications, deletions or additions of files. | ARS when a sync with ERDA that changes the files belonging to an asset happens. | ARS when a sync with ERDA that changes the files belonging to an asset happens. | ARS | nan | nan | nan |
| "date_audited" | String | Timestamp of the type: ISO 8601:YYYY-MM-DDThh:mm:ssZ or null. | This is the last date the asset was audited. Auditing happens following partial/complete processing of the asset following digitisation. | ARS when an asset has its audit status set to "true". | In ARS when a user audits an asset that had previously been audited. | ARS | date_audited | nan | nan |
| "date_metadata_created_ars" | String | Timestamp of the type: ISO 8601:YYYY-MM-DDThh:mm:ssZ. | Timestamp for when the metadata relating to an asset was created in ARS. | ARS during creation of metadata record. | Never | ARS | nan | nan | nan |
| "date_metadata_ingested" | String | Timestamp of the type: ISO 8601:YYYY-MM-DDThh:mm:ssZ or null. | This field records the date the ingestion client or server generated the initial metadata of the asset. This can help us troubleshoot any issues which occurred during ingestion, and could be useful for data validation during auditing. | Ingestion server when metadata is first created. | Never | Ingestion server. | "date_metadata_ingested" | nan | nan |
| "date_metadata_updated_ars" | String | Timestamp of the type: ISO 8601:YYYY-MM-DDThh:mm:ssZ or null. | Timestamp for when an asset's metadata was last updated. | Populated by ARS when the first change to an asset's metadata is registered. | After each update to an asset's metadata in ARS. | ARS | nan | nan | nan |
| "date_pushed_to_specify" | String | Timestamp of the type: ISO 8601:YYYY-MM-DDThh:mm:ssZ or null. | This is the date the asset and/or its metadata was initially synced with Specify. | ARS when initial sync with specify happens. | Never. | ARS. | nan | nan | nan |
| "digitiser" | String | Name of the digitiser, example: "Bo", "Marianne" | This is the name of the person who created the original asset (e.g., the name of the person who imaged the original specimen). For mass digitisation, this is filled in via the Ingestion Client. In case of multiple digitisers cooperating to create an asset they must agree to have a main digitiser appointed that goes into this field. The others will be added to the "complete_digitiser_list" field. All receive equal credit for the process. | Input comes from the digitiser when using the ingestion client, and the field gets populated when the asset has its initial metadata created by the ingestion server. | If an error is discovered during auditing or processing. This will likely be handled manually by a user in the dassco ui. | Ingestion server. | "digitiser" | Most likely the "credit" field. TBD. | None. |
| "external_publishers" | List of objects | For now an object in the list looks like this: {"name": str} | A list of objects containing information about the names of the publishers to which we are publishing this asset and from which a user can download the asset (a reverse link). We can add URLs to external publisher sites or any further information we need later. This should be searchable to see what is being published where. Knowing where we are publishing to (and how many/which records) is important for reporting and managing those relationships. | Either by entering it directly (e.g., in UI), via a bespoke script/pipeline or populated via a pipeline | Mostly by direct user input in UI (e.g., when publishing to Morphosource) | TBD. | external_publisher | nan | Really need to think about how this could work in reality and if there is a smarter way to do this. Postponed decision on how to implement usage (31-1-25). |
| "Field name" | String, boolean | Null or some example. | This is a field template. Please remember that there are two white spaces after each header. Also all these headers must be present; if you don't have any input for one, just write TBD or None in it. | Note | Note | nan | nan | nan | Field template. |
| "file_format" | String | The file extension of the asset file. Example "tif". | The format of the main asset file(s). This means that it is possible for an asset to have a minor text file attached to it that will not be part of this. | Ingestion server will set this when the asset metadata is created. | If an error is discovered during auditing or processing. This will likely be handled manually by a user in the dassco ui. | Ingestion server. | file_formats | File format [mimeType] | Name mapping issue "file_format" to "file_formats". We are using a single value in lower case. ARS is using a list of values all in upper case that must come from an enum list. Specify is using a single value I think. What happens if assets can have thumbnails (jpegs) added to them? PB: If an asset has thumbnails added to it, these should not be included in the file_format. We accept the current discrepancy with the expectation that an asset cannot be composed of multiple file formats (21-1-25). |
| "funding" | List of strings | Example: "Dassco Tranche 1" | A list of entities that helped fund the processing of the asset in some way. This is used for reporting on and citing. | Ingestion server when metadata is first created. | This is editable in case of retrospective fixes or additional sponsors if an asset needs to go through further processing. Updates can happen manually through the dassco ui or via the integration server running pipeline processes. | Ingestion server. | funding | nan | None. |
| "has_thumbnail" | Boolean | True or False. | Flag that denotes if the asset has a thumbnail. | When a new asset is created (ARS, ingestion server or processing pipelines) this should be filled out. | When a thumbnail for the asset is created or deleted, the update should be registered first where the change occurs and then in ARS. | ARS, ingestion server or processing pipelines. | nan | nan | nan |
| "institution" | String | NHMD, AU | The name of the institution which owns and digitised the specimen. This tells us ownership, allows us to map to Specify and facilitates management and reporting. Required field for creating an asset in ARS. | Input comes from the digitiser when using the ingestion client, and the field gets populated when the asset has its initial metadata created by the ingestion server. | Never. | Ingestion server. | institution | Maps to Specify Installation. Defines the values in Copyright Holder, License and Credit | An asset cannot change institution. |
| "issues" | List of objects | An object looks like: {category:string, name:string, timestamp:datetime, status: string, description: string, notes: string, solved: boolean} Expandable list of enums for the category field in the object. | List of known issues that have been attributed to an asset. This is not a list of errors but a list of issues that can have various effects on the asset (e.g. preventing publishing, further processing). This will give context to these conditions. The fields in the issue object: - category : this is the overall category for the issue and belongs to a defined list of values - name : the name of the issue, this can be anything but should aim to be short and descriptive - timestamp : the timestamp for when the issue was noted - status : the asset's status when the issue was discovered - description : the description of the issue - notes : any further information about the issue that isn't covered by description - solved : flag noting if the issue has been taken care of, it is not a problem to have unsolved issues | Processing pipeline when issues are discovered. Auditors when issues are discovered. | When an issue is resolved or a new one is discovered. | Any type of user that discovers an issue. | issues | nan | nan |
| "legality" | Object | Three fields inside the object: "copyright", "credit" and "license", each with null or a string as value. | This contains information about the legal status of an asset: who the copyright belongs to, who should be credited, and what type of license governs it. | ARS when reverse syncing with Specify. | When new data is discovered in Specify by ARS. | Specify | "legality" | License, copyright and credit. | nan |
| "make_public" | Boolean | True, False - default is True. | Not all assets should necessarily be made available to external publishers (e.g., documents) or in some cases where an issue is detected with the asset. The aim of this field is somewhat ambiguous. | Populated by ingestion server or when an asset's metadata is first created. | Integration server when information has been gathered through the pipeline processes that this should be made public. | Integration server when pipeline processes are run. | "make_public" | Make Public [isPublic] | Specify is the source of truth |
| "metadata_created_by" | String | Username (both service users and human users). | The name of the user that creates the asset in ARS. An asset is created with only metadata to start (media files get added later). A user can be human or a system name (integration server, Specify or something else). | Populated by ARS when the asset is created in ARS. | Never | ARS | "metadata_created_by" and the event user in CREATE_ASSET_METADATA event. | nan | Leaves us without a clear field for who created the metadata in the first place. Rename to ARS_metadata_creator would be nice. Could be changed to have both a user and a pipeline as part of its value. |
| "metadata_source" | String | Example: "ingestion_server_v1.0" | This field records where the metadata is initially generated in JSON format. Knowing this helps us troubleshoot any issues which occurred during ingestion or other processes. | Populated by ingestion server when an asset's metadata is first created. Can also be populated by other users creating metadata. | Never | Ingestion server or pipeline scripts. | "metadata_source" | nan | nan |
| "metadata_updated_by" | String | Username (both service users and human users) or null. | User that last updated the metadata of an asset. This is the user from keycloak which could include a service user. Updating means adding or changing a value of one of the fields in the metadata. If no updates have been made this will be "null". | Populated by ARS the first time an asset has its metadata updated. | In ARS when metadata is updated. | ARS | "metadata_update_by" and the event user in UPDATE_ASSET_METADATA event. | nan | nan |
| "metadata_version" | String | Example: "metadata_template_v1.0.1" | The version of the metadata template used to create this particular metadata. | Populated by ingestion server when an asset's metadata is first created. Can also be populated by other users creating metadata. | If the metadata undergoes changes that include conforming to a different metadata template, this will be updated by the system user. | Ingestion server. | metadata_version | nan | Should have a protocol on how to update the metadata to a new version. Hint: metadata conversion repo |
| "mime_type" | String | application/json, image/tif | The MIME type of the asset's main file, also known as media type. This is the standard way of describing media. We map the expected types based on the file type detected. | Populated by ARS from the header when the file is uploaded. | In case the asset file changes, whoever changed it should update the MIME type to fit the new file type. | Integration server, Specify. | nan | nan | What happens if not prepopulated by the asset's source (e.g. when syncing from Specify the MIME type should already be present)? |
| "mos_id" | String | Combined unique id based on workstation name, date of imaging and the mos barcode. Example: "WORKHERB0001_20032022_MOS 174" | Id binding multi object specimens together. This is relevant when a specimen is in multiple parts across multiple assets. Each asset has its own barcode but this id lets us identify them as a whole. When digitisers use the digi app to input data and a MOS is found, the entry and the other part of the MOS are set to be part of a container with a unique name ("container name" in Specify). This id will overwrite the value created by the integration server when a sync with Specify happens. | Integration server when receiving the barcode results denoting the asset as a MOS. | When syncing with Specify, if the asset is part of a container, the container name will be synced to be the mos_id. | Integration server and Specify. | "mos_id" | container id(?) | Need to know the exact mapping with Specify. |
| "multi_specimen" | Boolean | True or False. | A multispecimen is a single object (such as a sheet) that contains multiple specimens in it. This is often represented by a single image (or other media) associated with multiple specimen numbers (barcodes). Thus one asset is linked to multiple specimens. Default is false and ARS is keeping track of this field, and it cannot be manually changed. Multiple specimens are discovered by running pipeline processes which return multiple barcodes from one asset. | ARS when an asset is created. | ARS will update this depending on the number of registered specimens in an asset. If the number of specimens in the specimens protocol goes above 1 it will change to "true" or from above 1 down to 1 it will change to "false". | ARS | multi_specimen | nan | Integration server also keeps track of this on its own. |
| "parent_guids" | List of strings | Example: ["7e8-6-0a-08-32-39-0-002-03-000-048fe2-00000"] | This is the name (asset_guid) of the parent assets in the case of derivatives (e.g., a cropped and downsampled jpg derived from a tif). It lets us easily locate the unprocessed parent image, allows us to link related assets and allows us to filter when searching for assets. | Integration server when running pipeline. | Never. | Integration server when pipeline processes are run. | parent_guid | nan | nan |
| "payload_type" | String | Examples: "image", "master image", "ct scan". | Describes what the asset represents (such as an image or a CT scan). Lets us differentiate between the different payload types which are treated and reported on differently. It is a way of categorising the types of assets. | Ingestion server when metadata is created or integration server when running pipeline processes. | Never. | Ingestion server or integration server. | payload_type | nan | Better naming needed for at least original files, derivatives and resolution |
| "pipeline_name" | String | 8 characters followed by 4 digits. Example: PIPEHERB0001 | The name of the pipeline. A pipeline is the full flow an asset goes through from digitisation to its final state (most likely published). This determines the processing jobs an asset will run through. This is a required field for creating an asset in ARS. Multiple pipelines can be added in ARS through the event protocol. The original pipeline will be filled in the pipeline_name and corresponding pipeline field. | Ingestion server when creating the metadata. | Never | Ingestion server. | pipeline | nan | ARS calls this field "pipeline". |
| "preparation_type" | List of strings | Examples: "sheet", "pinned", "dry", "slide" (and more will be added as the project progresses). | This shows the way the specimen has been prepared (e.g., a pinned insect or mounted on a slide). It is possible for a specimen to contain multiple preparation types. Example: DNA sample from a pinned insect. | Integration server when running pipeline processes. | If an error is discovered during auditing or processing. This will likely be handled manually by a user in the dassco ui. | Ingestion server when creating the metadata based on the input in the ingestion client. | "preparation_type" in the specimen protocol. | None [Not in attachments record, but in specimen catalogue record - something to think about for the future - low priority] | Question: Does Specify support multiple prep types or asset level prep types, and can we map a field for this? |
| "push_to_specify" | Boolean | True, False - default is False. | Not all assets will be pushed to Specify. Some are not needed in specify and for others there could be issues found during processing. This field will be populated during image processing based on information gained from reading barcodes primarily. Defaults to "false". | Ingestion server when metadata is created. | Integration server at the end of pipeline processes. Other users as needed. | Ingestion server. | push_to_specify | nan | nan |
| "restricted_access" | List of strings | nan | [Currently unsure how this will work]. Need to think about embargoes put in Specify and whether this should be reflected in here? | Note | Note | ? | restricted_access | nan | We have not decided how this is meant to work. There is a schema in the pipeline write up appendix for values that could go here. ARS has its own enum list that values must conform to. Further we are not sure how we will be using this at all. It does not have any practical importance for keycloak. Decisions on how this works have been postponed (31-1-25). |
| "session_id" | String | Example: "session0001" | Session id from the ALICE setup. This gets added to the image filenames of each image of an asset. It is used so we know which assets should refer to which image of the device target. This field does not exist in ARS since it has no relevance there. We use it solely for matching images with a specific device target shot. | Ingestion server from input that comes from digitisers when using a workstation with the ALICE setup. | Never | Ingestion server. | nan | nan | This is not part of the metadata in ARS. It is only relevant for handling errors that occur between digitisation and the creation of the first set of derivatives and the adding of ribbons to the asset images. |
| "specify_attachment_remarks" | String | Example: "Specimen image (AU_5423784})" | This maps directly to the Specify field "remarks". It will usually contain the asset_subject and maybe further information deemed relevant for Specify. | Integration server when an asset is determined to become a Specify attachment. | When remarks in Specify are edited they should sync with ARS. | Integration server. | nan | nan | We have not decided exactly how this will look. This can potentially be mapped through Specify and to Darwin Core. We could, for example, give information about parents for derived attachments here, or any other type of information we would like end users to have access to (checked with dassco's internal Specify team). |
| "specify_attachment_title" | String | Example: "Image." | This maps directly to the Specify field "title". It will usually contain the asset_subject and asset_payload with "of" between them. This is the temporary title for an attachment. Once syncing from Specify to ARS happens this will change to include the specimen taxonomy name. | Integration server when an asset is determined to become a Specify attachment. | When Specify syncs with ARS, if the specimen the attachment belongs to has taxonomy names, then the title should be updated by ARS to reflect this and then resynced with Specify. | Integration server. | nan | nan | We have not decided exactly how this will look. The syncing from Specify to ARS will come later so the initial title created by the integration server is currently the permanent temporary title. |
| "specimen_pid" | String | nan | We need a system to uniquely resolve digital specimen data. We don't currently have a system in place for this. We need to develop this. This could potentially be developed at the same time as asset pids using a similar system. The resolvable identifier for digital specimens that could be shared and link ARS, Specify and external publishers. Could be used to go directly to relevant specimen information. | Note | Never. | Integration server? | "specimen_pid" part of the specimens protocol. | Persistent Identifier. PB comment: are we sure about this? The persistent identifier in Specify is not necessarily what we will use in the future. | Missing decision on how and where to construct this. Decisions on how this works have been postponed (31-1-25). There is asset pid for the whole asset so it's different - the way it would currently work is that an asset with 2+ specimens would have both specimens added in the specimens protocol but would share the same specimen pid. ARS won't need to change to accommodate changes here. For assets with multiple specimens would we not want more than one specimen pid? When assigning a specimen pid to a specimen in the specimen protocol the same specimen cannot be added to other assets. Problem for parent/child assets. |
| "status" | String | Examples: "for processing", "being processed", "working copy", "archive", "processing halted", "for deletion", "issue with media", "issue with metadata". | Status describing what the asset is currently doing. This is meant as an overall status and not to specify which part of the status the asset is currently in/doing. For example, "being processed" would not give information about the current part of the process. Values come from an enumerated list of strings. This list can be edited in ARS with admin privileges through an endpoint. Required field for creating an asset in ARS. | Populated by ingestion server or when an asset's metadata is first created. | Integration server when an asset changes its status within the pipeline. | Ingestion server or pipeline scripts. | status | nan | Need a new list of statuses. Is used for cases of derivatives not being created, using the status "issue with media". Use case for auditing being in progress. Suggested list of statuses: IN_PIPELINE - while automated processing is happening, ERROR - automated processing failed, DELETED - asset media was deleted, CORRUPTED - something has gone horribly wrong and asset is in an unsalvageable state, AUDITING - auditing in progress, WORKING - someone is working with this asset manually, ARCHIVED - asset has finished automatic processing and can be accessed. |
| "tags" | Dictionary containing key strings each with a value string. | A dictionary of dynamic properties. E.g. "ocr": ocr_tex | We are still developing our pipelines and can imagine the need to add additional fields in the future. It would be good to have a field to cover ourselves if we discover the need to additionally annotate our metadata assets until we can add more. | Can be populated by anyone at any time. | Anytime and anywhere. | Anyone. | tags | nan | Best field ever! Does everything when you need it. |
| "workstation_name" | String | 8 characters followed by 4 digits. E.g. "WORKHERB0001" | This is the name of the workstation used to do the imaging. Details of the technical set up (e.g., hardware and software) should be recorded for each workstation name as a record. Updates to these should result in a change in the name of the workstation. Tells us which workstation an asset was originally created on. Can monitor digitisation, troubleshoot and retrieve technical details associated with the workstation and images. | Populated by ingestion server or when an asset's metadata is first created. | Never. | Ingestion server. | workstation | nan | ARS calls this field "workstation". Ingestion and integration server call it "workstation_name". |
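
Several rows in the table state format rules that can be checked mechanically: the `YYYY-MM-DDThh:mm:ssZ` timestamp shape, the "8 characters followed by 4 digits" naming of `workstation_name` and `pipeline_name`, and the `_72` / `_400` suffix on derivative `asset_guid` values. The following is a minimal Python sketch of such checks. It is not part of ARS or the ingestion server. The function names, the reading of "8 characters" as uppercase letters, and the use of the `barcode` list length as a stand-in for the specimen count behind `multi_specimen` are all assumptions made for illustration.

```python
import re
from datetime import datetime

# Assumption: "8 characters followed by 4 digits" means 8 uppercase letters,
# as in the examples WORKHERB0001 and PIPEHERB0001.
NAME_PATTERN = re.compile(r"^[A-Z]{8}\d{4}$")
# Shape of the timestamps listed in the table: YYYY-MM-DDThh:mm:ssZ.
TIMESTAMP_SHAPE = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$")
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"
# Suffixes the table gives for recognising derivatives.
DERIVATIVE_SUFFIXES = ("_72", "_400")


def is_valid_timestamp(value):
    """Return True for null (None) or a well-formed YYYY-MM-DDThh:mm:ssZ string."""
    if value is None:
        return True
    if not isinstance(value, str) or not TIMESTAMP_SHAPE.match(value):
        return False
    try:
        # Rejects impossible dates such as month 13.
        datetime.strptime(value, TIMESTAMP_FORMAT)
    except ValueError:
        return False
    return True


def is_derivative(asset_guid):
    """Return True if the guid ends with one of the derivative suffixes."""
    return asset_guid.endswith(DERIVATIVE_SUFFIXES)


def check_record(record):
    """Return a list of human-readable problems found in a metadata dict."""
    problems = []
    for field in ("workstation_name", "pipeline_name"):
        value = record.get(field)
        if not isinstance(value, str) or not NAME_PATTERN.match(value):
            problems.append(f"{field}: expected 8 characters followed by 4 digits")
    for field, value in record.items():
        if field.startswith("date_") and not is_valid_timestamp(value):
            problems.append(f"{field}: expected YYYY-MM-DDThh:mm:ssZ or null")
    # Assumption: more than one barcode implies a multispecimen asset.
    expected_multi = len(record.get("barcode", [])) > 1
    if record.get("multi_specimen", False) != expected_multi:
        problems.append("multi_specimen: does not match the number of barcodes")
    return problems


example = {
    "asset_guid": "7e8-6-0a-08-32-39-0-002-03-000-048fe2-00000_72",
    "workstation_name": "WORKHERB0001",
    "pipeline_name": "PIPEHERB0001",
    "date_asset_taken": "2025-02-21T10:15:00Z",
    "date_asset_deleted_ars": None,
    "barcode": ["NHMD00929517"],
    "multi_specimen": False,
}
print(check_record(example))
print(is_derivative(example["asset_guid"]))
```

The checks only cover rules stated explicitly in the table; fields whose format is still undecided (for example `asset_pid` and `specimen_pid`) are deliberately left out.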