Annotation template

Overview

The Curation home page provides a high-level introduction to key aspects of the curation process, the definition of an immune signature, and the kinds of information to be captured. It also covers how to choose the tissue and response component for cell-type signatures (see "Example tissue and response component combinations").

This page describes every field in the example annotation template.

As a general rule, curation should be faithful to the manner in which findings are reported in the source manuscript. If need be, resolution of discrepancies (e.g. inconsistent use of controlled vocabulary terms by manuscript authors to specify gene symbols, cell types or protein markers) will be performed by the project team, in a post-curation review.

Results that are qualitative only, e.g. a visual comparison in a heatmap without a statistical test, should generally not be captured (for an example, see figure 4C under "Example annotation").

Capitalization

Values should be entered in lower case except where capitalization carries semantic information, e.g. human gene symbols, which are written in upper case (in contrast to mouse gene symbols, where only the first letter is capitalized):

Column name | Example
response_component (gene symbols) | STAT1

The annotation template

The table below is a transposed version of the annotation sheet from the example annotation template.  Each row of the table represents a column of the sheet. Permissible values are described under "Vocabulary"; "choose from list" means that values should come from the term lists specified in the "terms" sheet of the annotation template.

In the annotation sheet, curators can add as many signature rows as needed to capture all pertinent immune signatures found, one row per signature.

We suggest reviewing the table with the sample annotation sheet open in another browser window, so that its example immune signatures are at hand to clarify the points made below.

Curation sheet column header | Descriptive text on Dashboard | Vocabulary | Additional information for curators

curation_date 

curation date YYYY-MM-DD 

YYYY-MM-DD

Date curation completed

cohort

cohort - any characteristics of the population(s) studied, plus whether the result was taken from a subgroup of the broader cohort tested.

free text

Example: COVID-19 infected patients and age/sex-matched healthy controls (Atlanta, Georgia)

Also report if the result was limited to a subgroup of the tested cohort, e.g.

  • subjects suffering adverse events
  • subjects above or below a particular antibody titer threshold
  • subjects receiving a particular treatment

age_min

age_min - age of youngest subject including both cases and controls

number

Include both case/affected and control subjects

age_max

age_max - age of oldest subject including both cases and controls

number

Include both case/affected and control subjects

age_units

age units

choose from list

hours, days, months, years

number_subjects

number of subjects - count of case plus control subjects used in the measurement

number

Often differs by signature within a publication. If the number of subjects for a particular signature is not clear from the text, use the total for the cohort.

tissue_type

tissue type

Free text.  Multiple entries must be separated with a semicolon

As reported. The parent tissue of the response components. For cell-type results, the tissue is often PBMCs, but can also be a specific base cell type. For gene expression, the tissue might be a specific base cell type

tissue_type_term_id

Cell Ontology ID of tissue

Choose from list or look up the Cell Ontology code. Multiple entries must be separated by a semicolon, e.g. CL:0000576 (monocyte); CL:0000235 (macrophage). Just the code is also acceptable, e.g. CL:0000576.

Cell Ontology IDs for tissues in column “tissue_type”.  If there is no matching cell type in the pulldown list, the curator can try to look up a matching term in the Cell Ontology. If there is no appropriate Cell Ontology term, UBERON codes can also be used.
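The entry format described for tissue_type_term_id can be checked mechanically. The following is a minimal sketch, not part of the template or its tooling: the function name parse_tissue_term_ids is hypothetical, and the pattern assumes that CL and UBERON codes are a prefix followed by seven digits, as in the examples above.

```python
import re

# Assumed pattern: CL or UBERON prefix, seven digits, optional label in parentheses.
TERM_RE = re.compile(r"^(CL|UBERON):(\d{7})(?:\s*\((.+)\))?$")


def parse_tissue_term_ids(value):
    """Split a tissue_type_term_id cell into (term_id, label) pairs.

    The label is None when only the code was entered.
    """
    terms = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing semicolon
        match = TERM_RE.match(entry)
        if match is None:
            raise ValueError(f"unrecognized ontology term: {entry!r}")
        prefix, digits, label = match.groups()
        terms.append((f"{prefix}:{digits}", label))
    return terms
```

For example, parse_tissue_term_ids("CL:0000576 (monocyte); CL:0000235 (macrophage)") yields two (code, label) pairs, and an entry that does not follow the pattern raises a ValueError so it can be flagged for review.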

method

method - primary experimental method used to measure the response

Choose from list. Only one entry expected. You can add new methods.

The primary experimental method used to measure the response, e.g. RNA-seq, CyTOF, CITE-seq.

response_component

response component

Gene or protein symbols can be separated with commas or semicolons. Cell types or other names must be separated using semicolons

The entities whose response is being measured. Please copy symbols and names exactly as reported in the publication, except spell out Greek letters or other special characters. For cell types, this includes all markers. Examples for a signature with three cell types: T cells CD3+/CD4+/Ki67+; T cells CD3+/CD8+/Ki67+ CD86+ myeloid dendritic cell (DC); CD86+ monocyte
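The separator rules for response_component can be sketched as below. This is an illustration only: the function name and the is_gene_or_protein flag are hypothetical, and in practice the choice of rule would follow from the response behavior type of the row.

```python
import re


def split_response_components(value, is_gene_or_protein):
    """Split a response_component cell into individual entries.

    Gene or protein symbols may be separated by commas or semicolons;
    cell types and other names are split on semicolons only.
    """
    pattern = r"[;,]" if is_gene_or_protein else r";"
    return [part.strip() for part in re.split(pattern, value) if part.strip()]
```

With is_gene_or_protein set to False, an entry such as "CD86+ myeloid dendritic cell (DC); CD86+ monocyte" is split into two cell types, and any commas inside a name are left untouched.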

is_model

signature was derived from a computational model

Y/N

Were the response components chosen using a classification or other model-building strategy?

response_behavior_type

response behavior type

Choose from list. Only one entry allowed.

The type of change being measured, e.g. gene expression, cell-type frequency.

response_behavior

response behavior (direction, correlation type etc.)

Choose from list. Only one entry allowed. Add new behaviors if required.

Common values in the pulldown list:

  • up, down
  • positively correlated, negatively correlated, correlated
  • positively predictive, negatively predictive, predictive

comparison

comparison (affected vs control, correlated variable, time vs baseline event etc.)

Free text. Use "vs" rather than a dash to separate comparison terms (A vs B). Separate multiple comparison entries with semicolons (A vs B; C vs D).

Comparisons are typically between two groups, or may reflect a correlation of the response component with some other measured variable. Only report significant results. Examples:

  • severe COVID-19 cases (N=24) vs healthy (N=28)
  • moderate COVID-19 cases vs healthy; severe COVID-19 cases vs healthy
  • interferon-stimulated genes in COVID-19 vs healthy
  • bacterial DNA levels across COVID-19 and healthy subjects

Include group sizes (e.g., N=24). Include a time comparison if relevant, e.g. 7d vs 0d, where the times are relative to the baseline reference event time (0d). Times before the baseline event can be entered as negative numbers, e.g. days before vaccination (7d vs -1d).

**Please be concise**
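To show how the "A vs B; C vs D" convention for the comparison field could be read back by a script, here is a minimal sketch. The function name parse_comparisons and the returned dictionary layout are assumptions made for illustration; group sizes are recognized only in the "(N=24)" form used in the examples above.

```python
import re

GROUP_SIZE_RE = re.compile(r"\(N=(\d+)\)")


def parse_comparisons(value):
    """Return one dict per semicolon-separated comparison entry."""
    parsed = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the standalone word "vs"; entries without it
        # (e.g. a correlation) are kept as a single group.
        groups = [group.strip() for group in re.split(r"\s+vs\s+", entry)]
        sizes = [int(n) for n in GROUP_SIZE_RE.findall(entry)]
        parsed.append({"groups": groups, "sizes": sizes})
    return parsed
```

Because the split requires whitespace around "vs", negative time points such as "7d vs -1d" are preserved as written.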

baseline_time_event

baseline time event

Free text

The reference event from which the time of the experimental response is measured, e.g. hospital admission, onset of symptoms

time_point

time point relative to baseline event at which response was measured

Number or free text

Time point when response was measured, e.g. “7”, “various”, “0 to 8”

time_point_units

time point units

Choose from list. Lowercase only

days, months etc.

exposure_material

infection: exposure material (pathogen name); vaccine: exposure material (vaccine name) 

free text

Enter the pathogen or vaccine underlying the immune exposure, as reported in the publication, or use the NCBI Taxonomy term name

exposure_material_id

(vaccine and infection templates use different ontologies) 

infection: exposure material (NCBI taxid); vaccine: exposure material (vaccine ontology).

infection: Choose from list or use format ncbi_taxid:2697049. vaccine: use format VO:0000045.

infection: NCBI Taxonomy ID of pathogen causing disease. vaccine: Vaccine Ontology ID of vaccine administered

exposure_process (infection template)

exposure process - method by which immune exposure  occurred

Choose from list. Enter new process if needed.

Method by which exposure to pathogen occurred

disease_name (infection template)

disease name

free text

E.g. COVID-19

disease_stage (infection template)

disease stage - reported disease stage(s) of affected subjects

free text

Reported disease stage(s) of affected subjects in all comparisons entered in row (not including control subjects). We will not attempt to match these directly with the comparisons. Examples include moderate, severe, ICU etc.  Pooled can also be added.

additional_exposure_material (vaccine templates)

exposure material - additional

free text

Any additional exposure material, e.g. “Live attenuated vaccine TC-83 challenge”, “ex-vivo restimulation with live VZV”

target_pathogen (vaccine templates)

target pathogen

text

NCBI taxonomy name (non-influenza pathogens).  For influenza, can just note e.g. “influenza A virus; influenza B virus”.  Values will be filled in by script based on vaccine year.

target_pathogen_taxonid (vaccine templates)

target pathogen (NCBI TaxID)

“influ:xxxx”, “ncbi_taxid:nnnnn”,  “ncbi_taxid:10335 (Human alphaherpesvirus 3)”.  Separate multiple entries with a semicolon.

For influenza, enter the tag “influ:xxxx” for the vaccine year/type from the vaccine_years.xlsx spreadsheet, e.g. “influ:2008”, “influ:2009mv”.  For all other pathogens, enter a matching provided value e.g. “ncbi_taxid:10335 (Human alphaherpesvirus 3)” (see file “ncbi_txids.tsv”) or enter the NCBI Taxonomy ID of pathogen if available, or nearest higher taxonomy entry if not. Multiple entries are allowed (expected for influenza only).
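The two tag styles accepted in target_pathogen_taxonid can be told apart with a small parser. This is a sketch under stated assumptions: the function name parse_target_pathogen_ids is hypothetical, and the influenza pattern (four digits followed by an optional suffix) is inferred only from the examples "influ:2008" and "influ:2009mv". It does not check tags against vaccine_years.xlsx.

```python
import re

# Patterns inferred from the examples in this field's description.
INFLU_RE = re.compile(r"^influ:(\d{4}\w*)$")
TAXID_RE = re.compile(r"^ncbi_taxid:(\d+)(?:\s*\((.+)\))?$")


def parse_target_pathogen_ids(value):
    """Split a target_pathogen_taxonid cell into (kind, id, label) tuples."""
    results = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        influ = INFLU_RE.match(entry)
        taxid = TAXID_RE.match(entry)
        if influ:
            results.append(("influ", influ.group(1), None))
        elif taxid:
            results.append(("ncbi_taxid", taxid.group(1), taxid.group(2)))
        else:
            raise ValueError(f"unrecognized target pathogen entry: {entry!r}")
    return results
```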

vaccine_year (vaccine templates)

vaccine year 

YYYY, comma separated if multiple

For influenza vaccines only, enter the official year of the vaccine, e.g. 2008, or the tag from the vaccine_years.xlsx spreadsheet, e.g. “2009mv”.  Multiple years allowed, e.g. “2008, 2009, 2010”

adjuvant (vaccine templates)

adjuvant

free text

Name of adjuvant, and vaccine ontology ID if available, e.g. “AS03 (VO:0001320)”

route (vaccine templates)

route

free text

i.m., i.n., i.d., po, subcutaneous etc.

scheduling (vaccine templates)

scheduling

free text

Has been used to record number of doses, e.g. 1 dose, 2 doses.

publication_reference_id 

publication reference (PMID)

pmid:nnnnnnnn or nnnnnnnn

Enter the PMID for the article curated, using the format pmid:id, such as pmid:32788292, or just the id number, e.g. 32788292
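Since both "pmid:32788292" and "32788292" are accepted, a script consuming the sheet would likely normalize them to one form. The sketch below picks the prefixed form; that choice and the function name normalize_pmid are assumptions, not something the template prescribes.

```python
import re

PMID_RE = re.compile(r"^(?:pmid:)?(\d+)$")


def normalize_pmid(value):
    """Return the PMID in the prefixed form 'pmid:<digits>'."""
    match = PMID_RE.match(value.strip())
    if match is None:
        raise ValueError(f"not a PMID: {value!r}")
    return f"pmid:{match.group(1)}"
```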

publication_date

print publication date or posted date YYYY-MM-DD

YYYY-MM-DD

Partial dates are OK, e.g. YYYY-MM
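The date rules for curation_date (full YYYY-MM-DD) and publication_date (partial dates allowed) can be checked as sketched below. The function name is_valid_date is hypothetical, and accepting a bare YYYY as a partial date is an assumption; the template only gives YYYY-MM as an example.

```python
from datetime import datetime

# (format, expected length); the length check rejects values that are
# not zero-padded, such as "2020-8-7", which strptime alone would accept.
FORMATS = [("%Y-%m-%d", 10), ("%Y-%m", 7), ("%Y", 4)]


def is_valid_date(value, allow_partial=False):
    """Check a date cell; partial dates pass only if allow_partial is True."""
    candidates = FORMATS if allow_partial else FORMATS[:1]
    for fmt, length in candidates:
        if len(value) != length:
            continue
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            continue
        return True
    return False
```

A curation_date would be checked with the default arguments, and a publication_date with allow_partial=True.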

publication_reference_url

publication URL

The URL only, not any associated display text

URL of the article curated. Please use PubMed if available

signature_source

signature source - figure, table or text section

Free text. Multiple entries allowed if needed

Figure, table number etc. where the signature was found, as given in the source listed in the publication_reference_url. Multiple entries allowed, e.g. if a result is drawn from both a primary and a supplemental source

comments

comments and additional details

Free text, limit of 250 characters

Details clarifying any aspect of the signature not captured in other fields.  Will appear in Dashboard

curator_comments

curator comments

Free text, no limit

Questions or notes for further examination from the curator. Will not appear in Dashboard

Notes

*Be concise: a subset of the columns above (as specified by a signature-specific template) will be combined to generate a human-readable summary of the signature, to be displayed on the HIPC Dashboard. Please be as concise as possible when annotating free-text columns.