The aim of this project is to capture human immune signatures of infection that reflect a statistically significant change in a quantity of biological interest (e.g., expression of a gene; or abundance of an immune cell type). Our focus is on measurements from in vivo studies, e.g., from blood samples rather than cell stimulation experiments.
Signature - A signature encapsulates both a change in the behavior or abundance of a biological response component, such as an immune cell type or a gene, as well as the metadata describing the context under which the signature was identified. The metadata is explicitly modeled in the signature representation, using controlled vocabularies where possible.
Metadata - The metadata include (1) the tissue in which the signature was observed, (2) the immune exposure and timing underlying the observed comparison, (3) clinical details of the cohort from which tissue samples were taken (including sex, age, etc.), and (4) a reference to the publication in which the signature was reported.
Some of the metadata will have already been collected before the manuscript is passed on to a volunteer curator. Major details that need to be captured include:
- the cohort - clinical details of the subjects from which tissue samples were taken (may include sex, age, geographic location etc., e.g., African-Americans aged 55+).
- the tissue in which the signature was observed, e.g. blood, PBMCs, etc.
- the immune exposure, e.g., SARS-CoV-2
- the comparison reported, such as change in gene expression between day 14 and day 0
- the type of change, such as up or down regulation of genes, or correlation of differential gene expression with another experimental or clinical variable
We are currently focused on signatures of:
- gene expression
- cell-type frequency - how a particular cell type changed in frequency among measured types e.g. in samples of PBMCs after infection.
- cell activation state - generally a change in the presence of a particular cell protein or set of proteins.
- cytokine/protein level
Curating an example publication containing immune signatures
As an example, we will look at the paper "Systems biological assessment of immunity to mild versus severe COVID-19 infection in humans" published in Science (Arunachalam et al. 2020, DOI: 10.1126/science.abc6261).
The authors used a systems biology approach to characterize immune response in 76 COVID-19 patients and 69 age- and sex-matched controls from two geographically distant cohorts. The analysis involved the integration of data from mass cytometry (CyTOF) and single-cell transcriptomics of leukocytes, transcriptomics of bulk peripheral blood mononuclear cells (PBMCs), and multiplex profiling of plasma cytokines.
- In myeloid cells derived from the PBMCs of COVID-19 patients, the authors observed reduced expression of proinflammatory cytokines and of HLA-DR (the human leukocyte antigen DR class II gene). Additionally, plasmacytoid dendritic cells demonstrated impaired mTOR (mammalian target of rapamycin) signaling and reduced production of IFN-α (interferon-α).
- By contrast, they detected enhanced plasma levels of inflammatory mediators—including EN-RAGE, TNFSF14, and oncostatin M—which correlated with disease severity and increased bacterial products in plasma.
- Single-cell transcriptomics revealed a lack of type I IFNs (interferons), reduced HLA-DR in the myeloid cells of patients with severe COVID-19, and transient expression of IFN-stimulated genes. This was consistent with bulk PBMC transcriptomics and transient, low IFN-α levels in plasma during infection.
- The paper concludes that, taken together, these data suggest that COVID-19 causes an impaired type I IFN response in the periphery.
These and other findings are discussed in various sections of the paper, including in tables and supplementary material. The goal of this project is to identify the relevant information from the paper and to represent it using a standardized spreadsheet format (details under section “Annotation Template”) which allows for a concise, stylized, human-readable summary of the research finding (an immune signature).
The parts of an immune signature (the data model) illustrated
The publication mentioned above has been fully curated and the resulting immune signatures are available in this Google spreadsheet, one row per signature. We use this spreadsheet below as the basis of an introductory discussion of the curation process. More details about the sheet itself can be found on the Annotation template page.
The columns of the spreadsheet represent the components of our data model. They capture the essential information embedded in a published immune signature. Key elements of this data model (genes, cell types, pathogens) are specified using controlled vocabularies. This ensures their consistent codification across all signatures and facilitates efficient cross-referencing and search in the HIPC Dashboard.
Each immune signature begins by specifying the cohort where the signature was observed. For comparison-derived signatures, a cohort would include all comparison groups, e.g., affected and healthy individuals; or severe and mild cases. A publication will typically report the age ranges of the subjects, and a geographical location. In the example publication, there were two separate cohorts, one in Hong Kong, and the other in Atlanta. Some results involve only one or the other cohort, whereas others combine results from both. The cohorts involved are reported for each signature.
Results for each cohort are entered on separate rows of the curation sheet if they are presented separately in the publication. If only a part of a cohort is used in a given experiment, and the age range is not given for this group, use the ages of the entire cohort.
- cohort: COVID-19 infected patients and age/sex-matched healthy controls (Atlanta, Georgia)
- age_min: 23, age_max: 94, age_units: years
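The age rule described above can be sketched in Python. This is a minimal illustration only, not part of any curation tooling: the field names `cohort`, `age_min`, `age_max`, and `age_units` come from the example row, while the `Cohort` class and the `cohort_columns` function are hypothetical helpers introduced here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Cohort:
    """Cohort-level fields, named after the columns in the example row."""
    cohort: str
    age_min: int
    age_max: int
    age_units: str = "years"


def cohort_columns(full: Cohort,
                   subgroup_ages: Optional[Tuple[int, int]] = None) -> dict:
    """Build the cohort columns for one signature row (hypothetical helper).

    If the publication reports an age range for the subgroup used in the
    experiment, use it; otherwise fall back to the ages of the entire cohort.
    """
    if subgroup_ages is not None:
        age_min, age_max = subgroup_ages
    else:
        age_min, age_max = full.age_min, full.age_max
    return {
        "cohort": full.cohort,
        "age_min": age_min,
        "age_max": age_max,
        "age_units": full.age_units,
    }


atlanta = Cohort(
    cohort="COVID-19 infected patients and age/sex-matched healthy controls "
           "(Atlanta, Georgia)",
    age_min=23,
    age_max=94,
)

# No subgroup age range reported, so the whole-cohort ages are used.
row = cohort_columns(atlanta)
```

Passing a tuple such as `(30, 60)` as `subgroup_ages` (an invented range, for illustration) would override the cohort-wide values for that row only.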
The central element of an immune signature is the change in the quantity or state of a biological "unit", in response to an immune exposure. The units whose changes are measured are called the response components. In this project we are most concerned with changes in the expression of genes, in the abundance of proteins, and in the frequency or activation state of certain blood immune cell types. We may also, with lesser priority, capture changes in the level of metabolites, or of enrichment of gene expression in predefined pathways. Technologies typically employed in these measurements are RNA-seq for gene expression, flow cytometry or mass cytometry (CyTOF) for characterizing proteins expressed by immune cells, and CITE-seq for combined gene and protein expression at the single cell level.
For convenience, curators can aggregate multiple response components into one immune signature, e.g., all the genes that are found to be up- or down-regulated in a comparison. For downstream analysis of gene expression signatures, however, it is desirable to report up- and down-regulated genes separately. For this reason, we capture the two response directions as distinct immune signatures, using separate rows in the annotation sheet. There is also an option to report the change in gene expression in a directionless manner, by specifying "differentially expressed". Use this option only when the actual direction of change ("up" or "down") cannot be readily inferred from the information in the manuscript.
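As a rough sketch of the one-row-per-direction convention, the Python snippet below groups a list of genes into separate signature rows. The function name `split_by_direction`, the column name `response_behavior`, and the placeholder gene symbols are assumptions made for illustration; only the values "up", "down", and "differentially expressed" come from the text above.

```python
from typing import Iterable, List, Optional, Tuple

DIRECTIONS = ("up", "down")


def split_by_direction(
    gene_directions: Iterable[Tuple[str, Optional[str]]]
) -> List[dict]:
    """Group genes into one signature row per response direction.

    Each input item is (gene_symbol, direction), where direction is "up",
    "down", or None when the direction cannot be inferred from the manuscript.
    """
    rows = {}
    for gene, direction in gene_directions:
        if direction is None:
            behavior = "differentially expressed"
        elif direction in DIRECTIONS:
            behavior = direction
        else:
            raise ValueError(f"unexpected direction for {gene}: {direction!r}")
        rows.setdefault(behavior, []).append(gene)
    # One row per behavior; genes are aggregated into a comma-separated list.
    return [
        {"response_component": ", ".join(genes), "response_behavior": behavior}
        for behavior, genes in rows.items()
    ]


# Placeholder gene symbols, not taken from the publication.
signature_rows = split_by_direction(
    [("GENE_A", "up"), ("GENE_B", "down"), ("GENE_C", "up")]
)
```

Here `signature_rows` holds two rows, one for the up-regulated genes and one for the down-regulated gene, which mirrors how the two directions occupy separate rows in the annotation sheet.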
Below are examples of response components from (1) gene expression, and (2) cell type abundance signatures (copied from the annotation spreadsheet):
- response_component: RPS4Y1, HLA-DQA2, CLEC10A, RPL27A, EEF1B2, RPL13A, MS4A6A, RPS20, RPL23, HLA-DPB1, RPS26, RPS11, RPL3, RPS3A, RPL7, RPS18, RPLP0, TOMM7, RPS6, RPL9, RPLP1, RPS14, HLA-DRB5, RPL23A, RPS2, RPL10, RPS27A, TPT1, RPL21, DDX3Y, RPS8, EIF3L, HLA-DPA1, EEF1A1, MT-ND3, RPS23, RPL5, FCER1A, RPL10A, RPL38, RPL13, SEC11A, RPS16, RPL7A, NOP53, RPS3, ZFAS1, RPS24, BRI3, RPL37A, RPL15, RPL6, RPL29, SNHG8, BATF3, RPL18A, EIF3E, RPS7, RPL31, EIF3F, RPS12, RPL27, RPL24, RPS27, RPS5, LST1
- response_component: plasmablasts (CD3-, CD20-, CD56-, HLA-DR+, CD14-, CD16-, CD11c-, CD123-, CD19lo, CD27hi, and CD38hi); effector cd8 T cells(CD3+, CD8+, CD38hi, and HLA-DRhi)
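Cell-type response components like the one above pair a cell type name with a list of marker definitions. A minimal Python sketch of how such a string could be broken into structured parts follows. The parser `parse_cell_types` is hypothetical, and it assumes the conventions visible in the example: cell types separated by semicolons, markers in parentheses separated by commas, and each marker ending in `+`, `-`, `hi`, or `lo`.

```python
import re

# "name (marker, marker, and marker)"; the space before "(" is optional.
_ENTRY = re.compile(r"^\s*(.+?)\s*\((.*)\)\s*$")
# Marker name followed by a level suffix; "−" (U+2212) is accepted as a minus.
_MARKER = re.compile(r"^(.+?)(\+|-|−|hi|lo)$")


def parse_cell_types(response_component: str):
    """Return a list of (cell_type_name, {marker: level}) tuples."""
    parsed = []
    for entry in response_component.split(";"):
        if not entry.strip():
            continue
        match = _ENTRY.match(entry)
        if match is None:
            # No marker definition given; keep the bare cell type name.
            parsed.append((entry.strip(), {}))
            continue
        name, inner = match.groups()
        markers = {}
        for token in inner.split(","):
            token = re.sub(r"^and\s+", "", token.strip())
            marker = _MARKER.match(token)
            if marker is None:
                raise ValueError(f"unrecognized marker: {token!r}")
            markers[marker.group(1)] = marker.group(2)
        parsed.append((name, markers))
    return parsed


example = (
    "plasmablasts (CD3-, CD20-, CD56-, HLA-DR+, CD14-, CD16-, CD11c-, "
    "CD123-, CD19lo, CD27hi, and CD38hi); "
    "effector cd8 T cells(CD3+, CD8+, CD38hi, and HLA-DRhi)"
)
cell_types = parse_cell_types(example)
```

A structured form like this makes it easier to check marker definitions for consistency across signatures, though the spreadsheet itself stores them as plain text.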
The behavior field captures the directionality (e.g. up or down, positively- or negatively-correlated) of the changes to the response components, under the comparison described by the immune signature.
The tissue type describes the tissue context where the measurement was performed. It can be thought of as one level above whatever is reported as the response component. For example, in an immune signature of cell abundance, the response component can be a particular PBMC cell subtype, e.g., plasmablasts. In that case, the tissue type is PBMC. For immune signatures of differential expression, where the response components are genes/proteins, the tissue type will be that of the sample used for the expression profiling experiment.
Example tissue and response component combinations from the paper.
Note that the same cell type can appear as a response component (first row) or as a tissue (second row) depending on whether a cell frequency or e.g. a protein expression level is being measured (the "Example #" column below references the bullet number of the "Key findings" listed above):
|Example #||Tissue||Response Component||Response Behavior||Figure|
|1||peripheral blood mononuclear cell||plasmacytoid dendritic cell (CD3−, CD20−, CD56−, HLA-DR+, CD14−, CD16−, CD11c−, and CD123+)||down||Fig. 1C|
|1||plasmacytoid dendritic cell||S6 (mTOR related)||down||Fig. 1D|
|1||myeloid dendritic cell||IkappaBalpha (interferon-α related)||down||Fig. 1E|
|1, 3||myeloid dendritic cell||HLA-DR||down||Figs. 5A, 5B|
|2||blood plasma||EN-RAGE, TNFSF14, and oncostatin M||positively correlated||Fig. 3|
|2||blood plasma||IL-6, MCP-3, CXCL10||up||Fig. 3|
|2||blood plasma||IL-6, TNF, MCP-3, EN-RAGE, OSM, TNFSF14||positively correlated||Fig. 6C|
|3||peripheral blood mononuclear cell||TAP1, IFITM1, IFIT2, STAT1, IFIT1, IFIT3, DDX60, IFIH1, DDX58, OASL, PML, RSAD2, SP100, HESX1, HERC5, USP18, EIF2AK2, PLSCR1, OAS1, IRF7, SERPING1, PARP9, MX2, IFI27, APOBEC3A, OAS3, SIGLEC1, ATF3, C1QB, BCL2A1, IFNGR2, MGST1, HLX||up||Fig. 4G|
The “comparison” field describes the cohort groups whose differential response under the perturbation is measured. Examples of comparison groups include measurements taken at two different time points (e.g. day 1 vs. day 0); correlation with antibody response; differing antibody response outcomes (e.g. high vs. low responders); comparisons across different demographic parameters such as age or sex (e.g. younger vs. older, female vs. male); or comparisons of different disease states (e.g., mild vs. severe). If the sizes of the comparison groups are known, they should be included in the description. E.g.:
- comparison: COVID-19 patients (N=73) vs healthy controls (N=62)
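When group sizes are written in the `(N=...)` style shown above, they can be recovered programmatically. The Python sketch below assumes that groups are separated by "vs" or "vs." and that each size, if present, appears as a trailing `(N=<number>)`; the helper name `comparison_groups` is hypothetical.

```python
import re
from typing import List, Optional, Tuple

_GROUP = re.compile(r"^(.*?)\s*\(N\s*=\s*(\d+)\)\s*$")


def comparison_groups(comparison: str) -> List[Tuple[str, Optional[int]]]:
    """Split a comparison string into (group_label, group_size) pairs.

    The size is None when the description does not include "(N=...)".
    """
    groups = []
    for part in re.split(r"\s+vs\.?\s+", comparison.strip()):
        match = _GROUP.match(part)
        if match:
            groups.append((match.group(1), int(match.group(2))))
        else:
            groups.append((part.strip(), None))
    return groups


groups = comparison_groups(
    "COVID-19 patients (N=73) vs healthy controls (N=62)"
)
```

Comparisons without sizes, such as "day 1 vs. day 0", are still split into their two groups, with `None` recorded for each size.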
To annotate the infections or vaccinations driving each signature, we utilized the Immune Exposure model, which provides a standardized description of a broad range of potential and actual exposures to different immunological agents (e.g., from laboratory confirmed infection to living in an endemic area). Immune exposures are broken down into their component parts of Exposure Process, Exposure Material, Disease Name, and Disease Stage. Each component is modeled using standardized ontology terminology. For example, the Exposure Material in this study is SARS-CoV-2, which is recorded as the NCBI taxonomy code "ncbi_taxid:2697049". Vaccines can be captured using terms in the Vaccine Ontology (VO), which further link to target pathogens and strains using the NCBI Taxonomy. Utilizing the Immune Exposure model promotes interoperability with other projects that have adopted its use, both within and outside of HIPC, including repositories such as the immune data portal ImmPort and the Immune Epitope Database.
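The four components of an immune exposure can be pictured as a small record. The Python sketch below is an illustration under stated assumptions, not the Immune Exposure model's actual schema: the class `ImmuneExposure`, its attribute names, and the `material_namespace` method are hypothetical, and only the Exposure Material value is taken from the text. The other three components are left unset because their values for this study are not given here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ImmuneExposure:
    """The four components named in the text (attribute names are assumed)."""
    exposure_material: str
    exposure_process: Optional[str] = None
    disease_name: Optional[str] = None
    disease_stage: Optional[str] = None

    def material_namespace(self) -> str:
        """Return the vocabulary prefix of a "prefix:identifier" code."""
        prefix, sep, local_id = self.exposure_material.partition(":")
        if not (sep and prefix and local_id):
            raise ValueError(
                f"expected 'prefix:identifier', got {self.exposure_material!r}"
            )
        return prefix


# SARS-CoV-2, recorded with the NCBI taxonomy code quoted above.
exposure = ImmuneExposure(exposure_material="ncbi_taxid:2697049")
namespace = exposure.material_namespace()
```

Keeping the vocabulary prefix together with the identifier is what lets records from different projects be matched on the same term.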
Other annotation fields
See the following page, Annotation template, for further fields involved in the curation process.