Research ArticleHIV

A vaccine-induced gene expression signature correlates with protection against SIV and HIV in multiple trials

See allHide authors and affiliations

Science Translational Medicine  28 Aug 2019:
Vol. 11, Issue 507, eaaw4236
DOI: 10.1126/scitranslmed.aaw4236

A proxy for protection

Current vaccines for HIV and simian immunodeficiency virus (SIV) offer only partial protection, and better vaccine candidates are much needed. Ehrenberg et al. found a gene expression signature that correlated with protection against SIV and HIV infection across multiple vaccine trials, using different vaccine regimens, in both nonhuman primates and humans. The identified signature comprised gene expression changes related to the B cell activation required for an effective immune response. This gene expression signature might be useful to evaluate which vaccine candidates may offer greater protection against HIV.


Current HIV vaccines are only partially efficacious, presenting an opportunity to identify correlates of protection and, thereby, potential insight into mechanisms that prevent HIV acquisition. Two independent preclinical challenge studies in nonhuman primates (NHPs) previously showed partial efficacy of a mosaic adenovirus 26 (Ad26)–based HIV-1 vaccine candidate. To investigate the basis of this protection, we performed whole transcriptomics profiling by RNA sequencing (RNA-seq) in sorted lymphocytes from peripheral blood samples taken during these studies at different time points after vaccination but before challenge. We observed a transcriptional signature in B cells that associated with protection from acquisition of simian immunodeficiency virus (SIV) or the simian-human immunodeficiency virus (SHIV) in both studies. Strong antibody responses were elicited, and genes from the signature for which expression was enriched specifically associated with higher magnitude of functional antibody responses. The same gene expression signature also associated with protection in RV144 in the only human HIV vaccine trial to date that has shown efficacy and in two additional NHP studies that evaluated similar canarypox-based vaccine regimens. A composite gene expression score derived from the gene signature was one of the top-ranked correlates of protection in the NHP vaccine studies. This study aims to bridge preclinical and clinical data with the identification of a gene signature in B cells that is associated with protection from SIV and HIV infection by providing a new approach for evaluating future vaccine candidates.


A highly efficacious preventive HIV vaccine in humans has been elusive to date. The RV144 HIV-1 vaccine showed partial efficacy in people from Thailand but was insufficient for licensure (1). Mathematical modeling has demonstrated that with effective preexposure prophylaxis, the ability to reduce HIV risk by even 50% could be an effective public health approach to the HIV pandemic (2). Some vaccine regimens have shown 50% or more efficacy in the nonhuman primate (NHP) simian immunodeficiency virus (SIV) or the chimeric simian HIV (SHIV) challenge models, and correlates of infection risk that are linked to vaccine-evoked immune responses have been identified (35). With human clinical trials being time-consuming and expensive, preclinical trials in NHPs are invaluable. There are clear differences in virus exposure and transmission between humans and NHPs, but being able to test a product rapidly from bench to bedside makes the NHP model an attractive alternative to other animal models. Steps are being taken to understand events leading from human exposure to infectious virus to productive infection in order to design an NHP model that most accurately imitates natural HIV infection (6). Although the protection observed in NHPs can be used as evidence that a vaccine regimen may be effective and deserves further development, strong correlates that predict immunogenicity and protection in clinical trials are also needed. Recent advances in next-generation sequencing (NGS) technologies promote a more prominent role for gene expression analysis both in the optimization of vaccination regimens and in gaining understanding of the immunological basis for protection (79).

Protective efficacy of 50% was previously identified in an SIV challenge study with an adenovirus 26 (Ad26) prime followed by a purified envelope glycoprotein (gp140) boost vaccine regimen (3). These findings were replicated in the SHIV challenge model in an independent NHP study (4). We proposed that the partial efficacy observed could be due to variation in host gene expression elicited by vaccination. Here, we used whole transcriptional profiling in different lymphocyte subsets to find a gene signature that correlated with protection in multiple studies.


A B cell signature correlates with protection in multiple trials

We analyzed peripheral blood mononuclear cells (PBMCs) that were available from rhesus monkeys (09-11 study) immunized with the Ad26/gp140 vaccine regimen, where 50% of the animals remained uninfected after six intrarectal SIVmac251 challenges (3). We also included samples that were available from a second NHP preclinical trial (13-19 study) where rhesus monkeys were immunized with the same vaccine that showed 33% efficacy after six intrarectal SHIV-SF162P3 challenges (4). We extracted total RNA at different time points after vaccination but before challenge from four sorted lymphocyte populations of CD4+ T cells, CD8+ T cells, natural killer (NK) cells, and B cells (fig. S1). We then generated whole transcriptome RNA sequencing (RNA-seq) data from sorted populations in 21 animals from the Ad26/gp140 arms of the SIV and SHIV preclinical challenge trials (1012). Differential expression analysis found no overlapping genes that were significant (absolute fold change >1.5, P < 0.05, q < 0.10) when comparing samples from animals that were protected from infection or succumbed to infection at the end of six challenges. We performed gene set enrichment analysis (GSEA) in both studies to identify whether previously defined sets of genes (gene signatures) were associated with infection status at the end of six challenges (1315). We screened signatures from the Molecular Signatures Database (MSigDB) previously observed to be associated with biological phenotypes for enrichment in monkeys that remained uninfected after the challenge [normalized enrichment score (NES) ≥1.4, P < 0.001] (13). We identified one gene set that was significantly enriched in B cells from uninfected animals in both the 09-11 and 13-19 studies (NES ≥1.4, P < 0.001) (Fig. 1A and data file S1). This gene set consisting of 200 genes was previously generated from gene expression datasets from human immunogenicity trials of influenza vaccine when comparing B cells to monocytes at day 7 after vaccination (16). Of the 200 genes enriched in humans, only 140 genes were detected in the RNA-seq data generated in the NHP studies. Enrichment was greatest in the 13-19 study sampled at 2 weeks after vaccination (NES = 2.13, P < 0.001) but was still present in the 09-11 study analyzed 20 weeks after the last vaccine dose (NES = 1.48, P < 0.001). Other than the B cell subset, there were no enriched gene sets that were common between the two studies in the other lymphocyte populations (NES ≥1.4, P < 0.001) (data file S1).

Fig. 1 A gene expression signature seen in B cells is associated with protection in multiple SIV/HIV vaccine studies.

(A) The same B cell–based gene signature was significantly enriched in the Ad26/gp140 arms of the 09-11 (n = 10) and 13-19 (n = 11) NHP studies in the uninfected compared with infected monkeys, (B) in the human RV144 vaccine study (n = 172), (C and D) and in two independent RV144-like pox-protein vaccine regimens in NHP (n = 39).

Given that we observed the same signal in two independent NHP studies, we asked whether this signature was enriched in the human RV144 HIV-1 vaccine efficacy trial (1). Although the vaccine regimen in the RV144 study was different from those of the NHP preclinical trials and included a canarypox virus ALVAC prime with a gp120 protein boost, we observed the same B cell gene signature enrichment in the uninfected compared with the infected RV144 vaccinees in total PBMCs from 172 donors 2 weeks after vaccination (NES = 2.04, P < 0.001) (Fig. 1B) (1). We also observed enrichment of this signature in two independent SIV challenge studies that showed some vaccine protection and used pox-protein regimens similar to RV144 (NES = 2.66, P < 0.001), although significance was slightly decreased for the ALVAC-SIV/gp120 study, where gene expression was measured before the last vaccination (NES = 1.59, P < 0.002) (Fig. 1, C and D) (17, 18). The total number of enriched genes associating with protection from infection in the B cell signature varied between the different studies (table S1).

A composite gene expression score is a predictor of SIV/SHIV acquisition in NHPs

Because this B cell signature associated with protection in multiple studies, we investigated whether the enriched genes in the signature from the Ad26/gp140 arm of the 09-11 study (3) could predict acquisition in the Ad26/gp140 arm of the 13-19 study (4). There were 58 enriched genes in the B cell signature in the 09-11 study, of which only 53 passed the gene expression filtering step in the 13-19 study. We used a composite gene expression score (GES) consisting of an average of standardized expression of 53 normalized genes from the 09-11 training dataset to build our prediction model. Internal validation using optimism-corrected bootstrapping determined an area under the curve (AUC) of 0.84, suggesting that this classifier was able to discriminate infected from uninfected animals in the 09-11 study. This model, consisting of 53 expressed genes, was able to predict protection in the 13-19 test dataset (AUC = 0.86) (Fig. 2A). We then tested this prediction model in an independent arm of the 13-19 study that had received an additional Ad26 boost and was previously shown to be the most protective, with 67% of the animals remaining uninfected after the last challenge (4). We performed RNA-seq on sorted B cells from animals of this arm of the 13-19 study and computed the composite GES for the 53 genes that were enriched in the 09-11 study. The classifier from the 09-11 study was able to predict acquisition in the independent Ad26/Ad26 + gp140 validation dataset with an AUC = 0.75 (Fig. 2B). The GES was not able to predict protection in the Ad26/MVA + gp140 arm of the 13-19 study (AUC = 0.44), which had previously demonstrated 42% efficacy, lower than the Ad26/Ad26 + gp140 arm (fig. S2) (4). We validated the association of higher GES with reduced acquisition by quantitative polymerase chain reaction (qPCR) in the 09-11 cohort (fig. S3). In addition to testing infection as a dichotomous outcome, a time-to-event analysis of the number of SHIV exposures before infection was performed combining animals from the two arms of the 13-19 study. This survival analysis showed that monkeys with a higher GES significantly associated with longer time to infection [hazard ratio (HR) = 0.01, P = 0.005]. This association, however, did not reach significance in the 09-11 study (HR = 0.04, P = 0.06) (Fig. 2C).

Fig. 2 A subset of genes from the B cell gene signature predicts protection and is associated with reduced SIV/SHIV infection.

(A) Fifty-three enriched genes from the 09-11 study B cell signature were used to generate a composite GES consisting of the average of standardized normalized gene expression. The receiver operator characteristic (ROC) curve illustrates the discriminating ability of the GES of enriched genes from the Ad26/gp140 arms of the 09-11 training dataset (AUC = 0.84; 95% CI, 0.52 to 1) and the 13-19 test dataset (AUC = 0.86; 95% CI, 0.56 to 1) to predict protection from acquisition of SIV (n = 10) and SHIV (n = 11), respectively. (B) The model from the 09-11 study was also able to predict protection from acquisition of SHIV in the Ad26/Ad26 + gp140 validation arm of the 13-19 study (n = 12) by logistic regression (AUC = 0.75; 95% CI, 0.41 to 1). (C) Effect of the GES as a continuous variable by Cox proportional hazards ratio (HR) is shown for both NHP studies.

GES is an independent correlate of protection in NHPs

Both studies previously identified functional antibody-based correlates of protection (3, 4). We tested whether the B cell gene signature was enriched with increased magnitude of cellular and antibody immune responses in both studies (table S2). The immune correlates were measured at specific time points and analyzed as categorical variables. When independently testing each individual immune response for enrichment of the B cell gene signature, we observed that it associated with higher antibody-dependent cellular phagocytosis (ADCP), percent tier 2 neutralization, and cellular responses to Pol as measured by enzyme-linked immunosorbent spot (ELISPOT) in the 09-11 study (NES ≥1.4, P < 0.001) (table S3). Higher magnitudes of ELISPOT to gp140, ADCP, Fc antibody polyfunctionality, interferon-γ, immunoglobulin G (IgG) breadth, and antibody-dependent neutrophil phagocytosis (ADNP) also showed significant enrichment of the B cell gene signature in the combined arms of the 13-19 study (NES ≥1.4, P < 0.001) (table S3). ADCP was the only immune response that associated with the B cell gene signature in both NHP studies (NES = 1.99 to 2.21; P < 0.001) (Fig. 3, A and B). Given that the GES was a good predictor of SIV or SHIV acquisition, we tested its relative ranking by Cox regression with respect to other immune measures in the two NHP studies. We used an elastic net method to generate a variable importance in projection (VIP) plot for the immune responses along with the GES and ranked them within each study. In both studies, the GES was present as one of the top-ranked correlates of protection even after accounting for other potential correlates (Fig. 3, C and D). These findings were further supported by a univariate analysis of the GES on time to infection in the 13-19 study but did not reach statistical significance in the 09-11 study (13-19: HR = 0.29, P = 0.005; 09-11: HR = 0.19, P = 0.06) (tables S4 and S5). As this gene signature was seen in B cells, we asked whether the GES correlated with frequency of B cells as a fraction of peripheral blood lymphocytes in the animals within either of the studies, but observed no significant correlation.

Fig. 3 The B cell gene signature and GES associated with immune correlates of protection after vaccination.

(A and B) ADCP is the only functional immune response that associates with the protective enriched B cell gene signature in both the 09-11 (n = 10) and combined 13-19 (n = 23) studies. (C and D) Relative ranking of immune responses (blue) and GES (red) based on time to infection in the 09-11 and 13-19 studies were generated by averaging the VIP scores across 1000 repeated feature selection processes. Standard error bars of the average of the VIP scores are shown for each feature. *week 96, #week 64.

Specific genes in the B cell signature are associated with protection against SIV/SHIV

We performed a network analysis to identify relationships between the 53 enriched genes in the B cell signature and their known role and function. We analyzed these relationships based on coexpression and genetic interactions annotated in curated network databases (19). The genes in the B cell signature either directly or indirectly interact genetically and affect functions such as metabolism, cell signaling, cell differentiation, migration, proliferation, inflammation, immune response, apoptosis, and phagocytosis (Fig. 4A and data file S2). Univariate analyses of the association of the 53 genes with time to infection identified the most protective genes with HR ≤0.5 that associated with protection in both studies (Fig. 4B). Genes associating with protection in both NHP studies were TNFSF13, BHLHE40, SDCBP, and OGFRL1. To evaluate the effect of these four host genes on SIV/SHIV acquisition in a multivariate analysis, we used logistic regression with a least absolute shrinkage and selection operator (Lasso) penalty. The only genes that remained in the model and associated with reduced risk of infection in both the 09-11 and 13-19 studies were OGFRL1 and TNFSF13. Expression of TNFSF13 also correlated with the greatest number of immune responses that were specific to each study, including ADCP and Gag-specific ELISPOT in the 09-11 study and antibody-dependent complement deposition (ADCD), ADNP, IgG breadth, and gp140-specific ELISPOT in the 13-19 study (fig. S4).

Fig. 4 Functional relationships of the 53 enriched genes from the B cell gene signature.

(A) Shown is a gene interaction network derived from the protective B cell gene signature. The network edges represent coexpression (purple) and genetic interactions (green). Red nodes represent the 53 enriched genes, and cyan squares indicate interacting network genes. (B) Effect of expression of the 53 genes individually on time to infection as a continuous variable in the 09-11 (n = 10) and 13-19 (n = 23) studies. The most protective genes with HR ≤ 0.5 are shown in red, and of these the overlapping genes in the 09-11 and 13-19 studies are boxed. *P < 0.05.


Several NHP challenge studies using different vaccine strategies have shown partial protection against SIV or SHIV acquisition (3, 4, 17, 18). To date, there is only one human study that showed partial protective efficacy against HIV (1). Correlates of protection in the human RV144 study (1) included Env-specific binding antibodies, viral genetic signatures, and specific host alleles (2023). Given the partial protection observed in NHP and humans, we asked whether there were patterns of vaccine-induced host gene expression that associated with protection across vaccine regimens and platforms. Although the regimens and time points after vaccination varied between the preclinical and clinical trials, we hypothesized that the protection observed in multiple studies was due to a common gene signature and could be identified by unbiased whole-transcriptome–based analyses. We used an NGS-based transcriptome approach and identified enrichment of a gene set (16) that correlated with protection from infection in the two Ad26-based studies and other NHP studies. This signature was also protective in the human RV144 trial. That this gene set was first identified in the context of influenza vaccine immunogenicity (16), and has now been shown to associate with protective efficacy of HIV vaccination, suggests these genes may be broadly involved in responses to vaccination, although the particular changes that are protective likely differ between viruses. We hypothesize that the gene signature is vaccine induced, as it was not observed in the placebo participants in the RV144 study (1). It will, however, be important to evaluate baseline expression of this gene signature before vaccination, as such variation has demonstrated detectable effects on influenza vaccine responses when using systems biology approaches (24).

As enriched genes in this B cell signature were predictive of protection against both SIV and SHIV infection, we defined a composite GES by averaging Z scores of the expression of the 53 enriched genes for each animal in any given study. The GES was one of the top correlates of protection in the NHP studies and represents a new tool for assessing HIV vaccine candidates. This will allow testing the predictive capacity of this gene signature in other vaccine studies that are completed or ongoing (25) and may aid the empirical design of vaccines with increased efficacy. The signature we identified also furthers understanding of the immunological basis for protection conferred by the present vaccines studied. The signature correlated with functional Fc-mediated phagocytosis responses in NHP, supporting the importance of this mechanism, which was also indicated in a recent study (26).

Our gene network analyses suggest that many of the enriched genes in the signature have direct or ancillary roles in both innate and adaptive immune responses and are functionally associated. Specific genes in the signature directly associating with decreased risk of acquisition in both studies were TNFSF13, BHLHE40, SDCBP, and OGFRL1. TNFSF13 belongs to the tumor necrosis factor ligand superfamily, BHLHE40 has been shown to suppress IL10, SDCBP encodes the syntenin protein whose overexpression inhibits HIV-1 production, and OGFRL1 is a less-characterized paralog of the opioid growth factor (2729). The only gene to have an independent protective effect and correlate with maximum number of immune responses in both the NHP studies was TNFSF13, which is also known as a proliferation-inducing ligand (APRIL). TNFSF13 is a ligand for TNFRSF17/BCMA and TNFSF13B/TACI, both of which are involved in numerous B cell functions including proliferation, class switching, affinity maturation, and antibody production (30). TNFRSF17 expression has previously been described to correlate with increased magnitude of vaccine-induced antibody titers against influenza and yellow fever (16, 31, 32). Together, these data suggest that TNFSF13 signaling correlates with protective vaccine-induced responses against several viruses. TNFSF13 is typically produced by innate immune cells, but can also be induced in activated B cells (28, 33, 34). It is not likely that TNFSF13 alone is responsible for the protection observed with acquisition, as the association of the GES with decreased risk of SIV/SHIV acquisition was stronger than the single gene. Instead, the nature and function of this protein suggest that it may be a potential master regulator of the vaccine-induced protection, in concert with other genes enriched in these studies.

There are limitations to this study, including the small sample size of the 09-11 study, which we think contributed to some less significant associations than in the combined 13-19 study. Further, these analyses were limited to transcriptomics of peripheral blood cells. Other noteworthy limitations were the differences between the 09-11 and 13-19 studies including the antigen (SIV/HIV insert), challenge (SIVmac251/SHIV SF162), vaccine strategy (number/timing of immunization), adjuvant (AS01B/alum), and time points at which RNA-seq was performed (20 weeks/2 weeks after last vaccination). All these have the potential to be confounders in a cross-prediction analysis by decreasing the signal–to–background noise ratio. Despite these limitations, we were able to use the NHP RNA-seq data to identify an overlapping gene set in B cells that was associated with protection against SIV and SHIV challenge and that was then validated in a human efficacy trial of HIV-1 infection.

The B cell signature identified here also showed association with functional antibody-based responses that have previously been observed to be correlates of protection against SIV (3, 4). Prospective interrogation of this transcriptional signature would be of value in the two ongoing HIV vaccine efficacy studies, evaluating both ALVAC (HVTN 702) and Ad26-based (HVTN 705) HIV vaccine regimens. More broadly, our data demonstrate the potential to discover protective correlates using an approach that mines transcriptomic data in multiple preclinical and clinical trials. Single-gene testing methods might be too stringent for studies consisting of small sample sizes, and hence, using a gene set analysis might be more feasible for preclinical and clinical trials. In addition, screening gene sets previously observed to be associated with biological phenotypes for association with protection in our dataset avoids limitations of current mechanistic understanding and focuses on processes of known biological perturbations. The potential significance of this approach is shown by our identification of a gene signature in B cells that associates with protection against both SIV and HIV across different vaccine platforms.


Study design

For discovery analysis, cryopreserved PBMCs from 21 rhesus monkeys were selected from the vaccine arms of two studies where partial efficacy was observed, namely, the Ad26/gp140 arm of the 09-11 SIV challenge study (n = 10) (3) and the Ad26/gp140 arm of the 13-19 SHIV challenge study (n = 11) (4). We measured whole-transcriptome gene expression by RNA-seq at time points after vaccination but before challenge to identify genes or gene sets that associated with protection from infection in both arms. We performed validation studies using animals from the additional arms of the 13-19 study where efficacy was observed, including the Ad26/Ad26 + gp140 (n = 12) and Ad26/MVA + gp140 (n = 9) arms. Sample sizes were based on availability of PBMCs.

Flow cytometry

PBMCs were thawed and stained with a cocktail of antibodies including CD3 Alexa Fluor 700 (clone SP34-2, BD Pharmingen), CD4 Brilliant Violet 605 (clone OKT4, BioLegend), CD8 Brilliant Violet 785 (clone RPA-T8, BioLegend), CD14 Pacific Blue (clone M5E2, BD Pharmingen), CD16 phycoerythrin (PE)–Cy7 (clone 3G8, BD Pharmingen), CD20 fluorescein isothiocyanate (FITC) (clone 2H7, BD Pharmingen), NKG2A PE (clone Z199, Beckman Coulter), and LIVE/DEAD Aqua viability dye (Life Technologies) to differentiate lymphocyte populations. The stained cells were analyzed and sorted using a BD FACSAria II, producing pure bulk populations of CD4+ T cells, CD8+ T cells, NK cells, and B cells. The gating strategy is shown in fig. S1.

RNA extraction and sequencing

Total RNA was extracted from sorted lymphocytes using the Single Cell RNA Purification kit (Norgen Biotek Corp). NGS RNA-seq was performed using SMART-Seq technology as described previously (1012). Briefly, 2.5 ng of sample RNA input was processed using the SMARTer Stranded Total RNA-Seq Pico Input Mammalian kit (Takara Bio Inc.) per the manufacturer’s instructions, with ERCC spike-ins of exogenous control RNA (Thermo Fisher Scientific). External ERCC RNA spike-in was added to each sample to a final dilution of 1:260,000. Final library PCR amplifications used 13 cycles. Quantity and quality of amplified cDNA libraries were determined using the Qubit double-stranded DNA HS kit (Thermo Fisher Scientific) and a High Sensitivity DNA chip (Agilent). MiSeq quantitation was performed using a 50-cycle MiSeq Reagent Kit v2 and MiSeq instrument (Illumina). The final 2 nM sample dilutions were then adjusted for any deviations from expected read percentages. We performed NGS on pooled fractions with either the HiSeq 2500 (with TruSeq PE Cluster cBot v3 and 200-cycle TruSeq SBS v3 kits) or NextSeq 500 (with a 300-cycle NextSeq High Output v2 kit) instrument (all Illumina) per the manufacturer’s instructions for paired-end output targeting 50 million reads per sample.

Sequence analysis

Paired-end sequencing data was converted to fastq, and adapters were masked using bcl2fastq v2.17.1.14 software (Illumina). The raw fastq files were examined for quality using FastQC v0.11.5, and reads were trimmed using Trimmomatic v0.36 when the average quality score of a 15–base pair sliding window fell below Q30. Trimmed reads with a minimum length of 50 base pairs were retained (35). A median of 20% raw reads was discarded when using these parameters. Paired-end reads passing filter were aligned to the NHP genome assembly (Macaca mulatta, NCBI version Mmul 8.0.1) using HiSat2 v2.1.0 (36). Gene expression quantification was performed using HTSeq-count v0.9.1 (37), and a trimmed mean of M values–based normalization of the read counts was carried out using the limma and edgeR v3.18.1 packages (38, 39). Genes were filtered on the basis of expression within a cell subset if ≥50% of the samples within a comparison group had normalized read counts per million >1. On average, a total of 11,146 genes were expressed per animal.

Microarray data

Published normalized, log-transformed microarray data from two vaccine studies, DNA-SIV/ALVAC + gp120 (18) and ALVAC-SIV/gp120 (17), were downloaded from the Gene Expression Omnibus database (accession codes GSE108011 and GSE72624, respectively). For the DNA-SIV/ALVAC + gp120 study (n = 12), we analyzed microarray data at 1 week after the second boost time point. For the ALVAC-SIV/gp120 study (n = 27), we analyzed data at the time point 24 hours after the third immunization (before the last boost). Microarray data were also available for 172 vaccinated individuals from the sample set described in the RV144 immune correlates study at the time point 2 weeks after the last vaccination (20, 40). All microarray datasets used in this study were generated on the HumanHT-12 v4.0 expression BeadChip platform (Illumina). For the RV144 study, the raw expression data were normalized, log transformed, and background subtracted as described (40). The samples in each study were grouped into outcomes after challenge or infection status after immunization (1, 17, 18).

Immune response data

Functional immune response data for the 09-11 and 13-19 studies have been described previously (3, 4). Additional assays including IgG breadth and ADNP were analyzed. IgG binding activity was measured against 15 SIV antigens in the 09-11 study and 22 HIV antigens in the 13-19 study. IgG breadth was calculated by the sum of the measurements whose binding activity was above the median of the measurement across all animals in the trial. ADNP was assessed in the 13-19 study by the measurement of the uptake of antibody-opsonized, antigen-coated fluorescent beads by primary neutrophils. Biotinylated A244 gp120 was used to saturate the binding sites on 1-μm fluorescent neutravidin beads (Invitrogen). Excess antigen was removed by washing the beads, which were then incubated with NHP antibody samples for 20 min at 37°C. Leukocytes were isolated from blood collected from HIV seronegative donors after ACK lysis of red blood cells. After opsonization, the freshly isolated leukocytes were added, and the cells were incubated for 1 hour at 37°C to allow phagocytosis. The cells were then stained for CD66b to identify neutrophils and then fixed, and the extent of phagocytosis was measured by flow cytometry on a Stratedigm S1000EXi flow cytometer. The data are reported as a phagocytic score as described previously (41).

Quantitative polymerase chain reaction

Gene expression data were confirmed by qPCR. Because only small amounts of RNA were available, we used the Biomark platform (Fluidigm) for validation on a subset of genes based on assay availability and robustness. Briefly, cDNA was synthesized by reverse transcription (RT) from 2.5 ng of RNA extracted from sorted B cells using an RT preamplification reaction mix, composed of TaqMan Fast Virus 1-Step Master Mix (Applied Biosystems) and a pooled 96 target TaqMan assay mix (Thermo Fisher Scientific). RT was performed in three steps: cDNA synthesis (50°C for 10 min), denaturation (95°C for 2 min), and preamplification (18 cycles of 95°C for 15 s and 60°C for 4 min). Duplicate technical replicates of cDNA samples were assayed as described previously (42), using the GE 96x96 standard v2 protocol. The cycle threshold (Ct) values were validated and exported using the Biomark gene expression real-time PCR analysis software (Fluidigm).

Statistical analyses

Two-class differential gene expression was performed using the edgeR package (38), comparing uninfected to infected animals in the 09-11 and 13-19 discovery studies. Genes were considered significant based on the threshold of absolute fold change >1.5, P < 0.05, and q < 0.10. To reduce the stringent limitations of single gene differential expression analyses due to multiple testing corrections, normalized RNA-seq transcription profiles from different cell subsets were analyzed using the GSEA method (13). Genes were analyzed against the C2 canonical pathways, C7 immunologic signatures, and the blood transcription modules in the MSigDB (13, 15, 32). For each GSEA comparison, we tested whether gene sets were significantly associated with protection against challenge. Signatures were considered significantly protective if genes were enriched in the uninfected versus infected animals using a threshold of NES ≥1.4 and P < 0.001. Signatures derived only from primates were further downselected if they overlapped in the Ad26/gp140 arms of the 09-11 and 13-19 studies.

For further analysis, a total of 58 leading-edge genes (enriched genes) in the B cell signature from the 09-11 study were identified to associate with SIV acquisition by GSEA and were selected for downstream processing as described below. A composite GES of enriched genes in the 09-11 study was computed by averaging Z scores of the normalized gene expression for each monkey. We generated prediction models for infection status using the composite GES from 53 of the 58 enriched genes from the 09-11 study. Five of the enriched genes (ALDH3B1, CD302, RTN1, BST1, and SLC27A3) were not considered as they did not pass the edgeR filtering step in the 13-19 study. We also computed a GES for the same 53 genes in the different arms of the 13-19 study. Bootstrap optimism-corrected area under receiver operating characteristics (ROC) curves were used for internal validation of model discrimination. The performance of the classifier was assessed using AUC with 95% confidence interval (CI; an AUC of 0.5 indicates random discrimination, and 1 indicates perfect discrimination capabilities). We then assessed the performance of this classifier on the Ad26/gp140 test and Ad26/Ad26 + gp140 validation datasets of the 13-19 study. We performed a similar prediction analysis using a GES of 51 enriched genes from the 09-11 study and tested in the Ad26/MVA + gp140 arm of the 13-19 study, as two of the enriched genes (CREB5, RRAGD) did not pass the filtering step. We analyzed protection, defined as number of challenges required for infection by Cox proportional hazard regression using the GES as a continuous variable for both the 09-11 and 13-19 studies. The GES from the two arms of the 13-19 study (Ad26/gp140 and Ad26/Ad26 + gp140) were combined for this analysis and all other analyses mentioned below. All uninfected animals were censored at the last challenge. The proportionality assumption for the Cox model was acceptable on the basis of the test of independence between the scaled Schoenfeld residuals and time.

We further evaluated the gene expression B cell signature and GES by comparing them to other immune responses generated from the 09-11 and 13-19 studies. Using GSEA, we tested the effect of the B cell gene signature on the magnitude of immune response as a categorical variable based on the median values. We also used a penalized multivariable model to assess what combination of GES and other immune response variables was jointly associated with infection risk. The embedded method with elastic net regularization, which combines both L1 and L2 penalties, was used to test the relative ranking of the GES with respect to all other 15 and 11 immune measures for the 09-11 and 13-19 studies, respectively, with time to infection as outcome (43). The optimal values of the elastic net penalty α and the tuning parameter λ were selected using fivefold cross validation. We ranked candidate features using the VIP score, which reflects the magnitude of the contribution of an individual feature to the generated model after accounting for other features. We generated VIP plots of all features of the 09-11 and 13-19 studies by averaging the variable importance score from 1000 repeated feature selection processes. Time to event was analyzed by univariate Cox proportional hazard regression for all immune correlates along with the GES for both studies to test their independent effects on acquisition.

To query the 53 genes in the GES against published networks for possible interactions, we used the GeneMANIA application (v3.4) on Cytoscape (v3.2.1) with default parameters (19, 44). Networks for these 53 queried genes were predicted on the basis of the most-weighted interaction scores using default categories of association data in GeneMANIA, namely, correlations in expression and genetic interactions. The genes within these predicted networks were then manually separated into groups based on their function as ascertained from published literature. Univariate Cox proportional hazard regression was performed for all 53 enriched genes for both studies to test their effects on time to infection. Forest plots of HR with 95% CI were generated for all 53 genes from both studies to select the most significant variables in a univariate analysis. A multivariate analysis by the Lasso method was used to downselect the genes that were most associated with infection status (45). We selected the tuning parameter through a fivefold cross validation. The algorithm was run 1000 times, and genes that were repeatedly retained more than 80% of the time were selected. Correlations between immune responses and the 53 genes were investigated by Spearman correlation.

A two-sided P value of <0.05 was considered statistically significant for all statistical analyses described above. All descriptive and inferential statistical analyses were performed using R 3.4.1 GUI 1.70 build (7375) v3.0 and higher, and GraphPad Prism 7 statistical software packages (GraphPad Software).


Fig. S1. Gating strategy used for flow sorting lymphocyte populations in rhesus monkeys.

Fig. S2. Logistic regression prediction model in the Ad26/MVA + gp140 arm.

Fig. S3. qPCR validates higher GES in B cells as protective.

Fig. S4. Correlation of gene expression with magnitude of immune responses in the 09-11 and 13-19 studies.

Table S1. Total number of enriched genes in the B cell gene signature for the different trials in this study.

Table S2. Time points for different assays in the 09-11 and 13-19 studies.

Table S3. B cell gene signature enrichment associated with higher magnitude of immune responses in the 09-11 and 13-19 studies.

Table S4. Univariate analysis of immune responses and GES on time to infection in the 13-19 study.

Table S5. Univariate analysis of immune responses and GES on time to infection in the 09-11 study.

Data file S1. Enriched GSEA gene sets (NES ≥ 1.4, P < 0.001) in different vaccine regimens in the 09-11 and 13-19 studies.

Data file S2. Functional categorization of the 53 genes in the B cell gene signature by network analysis.


Acknowledgments: We thank K. May (MHRP) for performing RNA-seq for the NHP studies and N. Rahman (MHRP) for the ad hoc bioinformatics support. We acknowledge P. Dawson (EMMES Corporation) for support related to statistical analysis. The views expressed are those of the authors and should not be construed to represent the positions of the U.S. Army or the U.S. Department of Defense (DOD). Funding: This work was supported by a cooperative agreement (W81XWH-07-2-0067) between the Henry M. Jackson Foundation for the Advancement of Military Medicine Inc. and the DOD. This research was funded, in part, by the U.S. National Institute of Allergy and Infectious Diseases. D.H.B. received research funding from the NIH, DOD, Gates Foundation, amfAR, Gilead, Janssen, and Novavax. Author contributions: Author contributions are listed in alphabetical order. R.T. conceived and supervised the project. D.H.B., M.L.R., N.L.M., R.A.G., and R.T. contributed to the study rationale. A.G., B.I., C.B., P.K.E., R.A., R.T., and S.S. contributed to the data analyses and interpretation and manuscript preparation. D.L.B., G.A., M.A.E., M.C., P.K.E., R.P.S., and T.I. provided reagents and performed laboratory experiments/analyses. D.H.B., F.W., H.S., and M.G.P. provided samples and immune response data from the preclinical trials. All authors edited the manuscript. Competing interests: M.G.P., H.S., and F.W. are employees of Janssen, a pharmaceutical company of Johnson & Johnson. D.H.B. received consulting payments from Merck, IGM Biosciences, and SQZ Biotech. D.H.B. is a coinventor of HIV vaccine patents PCT/US2009/064999 “Antiviral vaccines with improved cellular immunogenicity” and PCT/US2009/060494 “Biochemically stabilized HIV-1 env trimer vaccine.” M.G.P. is a coinventor on patents PCT/US2017/049817 and PCT/US2018/051274 “Methods for inducing an immune response against human immunodeficiency virus infection in subjects undergoing antiretroviral treatment.” H.S. is a coinventor on PCT/US2017/049817. M.G.P. and H.S. are coauthors on patents PCT/US2018/043016 “Methods for safe induction of cross-clade immunity against human immunodeficiency virus infection in human” and PCT/US2015/051891 “Methods and compositions for inducing protective immunity against human immunodeficiency virus infection.” Data and materials availability: All data associated with this study are present in the paper or the Supplementary Materials. NGS sequencing data have been deposited in the European Nucleotide Archive repository ( under study accession number PRJEB31297.

Stay Connected to Science Translational Medicine

Navigate This Article