The high heritabilities of psychiatric disorders mean that if specific genetic risk factors could be identified, they would provide important scientific clues to the processes that go awry in the brain and thus inform the identification of objective biological markers of illness and suggest approaches to therapeutics. Yet because psychiatric disorders result from the action of many genes, each making a small contribution to illness, together with developmental and environmental factors, extremely large numbers of DNA samples and advanced technology and computing power are needed. To succeed in finding these important clues, the Stanley Center, a founding member of the Psychiatric Genomics Consortium, has built the world’s largest collection of human DNA samples in psychiatric research — currently just over 500,000 samples — that includes patients with schizophrenia, bipolar disorder, and other psychiatric disorders, as well as healthy comparison subjects.
The Stanley Center’s genetics program has three key objectives:
1) Find the genes that drive neuropsychiatric illnesses
We will do this by increasing the diversity of genetic discovery efforts as well as aggregating as much of the world’s existing data as possible, in order to fully capture the genetic risk for neuropsychiatric illness and ensure portability of genetic risk scores across populations.
2) Characterize the full range of effects for genetic risk factors
We incorporate information regarding biochemical consequences, biomarkers, population traits, and clinical disorders in our analyses to improve the interpretation of genetic discoveries in service of biological understanding and therapeutic development.
3) Develop the methods and tools for analyzing and sharing genomic data and results
These innovations enable the biological interpretation and therapeutic translation of genetic findings across all human populations at scale.
The continued success of the genetics program starts with sample availability. The genotyping and sequencing efforts of the past decade have demonstrated that psychiatric illnesses are highly polygenic, with thousands of risk variants of modest effect scattered across the genome. Such a genetic architecture necessitates genetic data generation and analysis at scale and, crucially, with sufficient representation from all ancestries to capture the extent of genetic diversity. Greater representation from populations of diverse ancestries will identify new genes, improve the resolution of fine-mapping, and improve the quality of polygenic risk scores for patients all over the world. The Stanley Global Neuropsychiatric Genetics Initiative (Stanley Global) aims to diversify genetic sample collection beyond the United States and Northern Europe.
Between our sample collection and our worldwide collaborations, we have identified hundreds of locations in the genome associated with schizophrenia, bipolar disorder, ADHD, and autism spectrum disorders. That, in turn, has allowed us to focus on genes that begin to illuminate the biology of these disorders, such as the C4A gene in schizophrenia (Sekar et al., 2016).
For most genome-wide significant loci nominated by analyses of common variants, the unequivocal identification of individual genes is the exception rather than the rule, as many of these loci do not lie within or near protein-coding regions. Genome-wide heritability analyses provide an additional way to nominate pathways, networks, and cell types that are relevant to disease by estimating how much of the phenotypic variation is explained by a particular class or annotation of genetic variants. As the field’s knowledge of which genes are expressed in different cell types and which genes interact with each other continues to improve, so too will the power of these analyses to nominate specific cell types, biological networks, and pathways for follow-up. Similarly, as genetic studies continue to grow in sample size, the resolution of genetic mapping will continue to improve.
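The idea of asking how much phenotypic variation a class of variants explains can be illustrated with a simple enrichment ratio: the share of heritability attributed to an annotation divided by the share of variants that carry it. The sketch below is only an illustration of that arithmetic. The function name and the input numbers are hypothetical, and it assumes the per-annotation heritability estimates have already been produced by a separate method.

```python
def heritability_enrichment(h2_annotation, h2_total, snps_annotation, snps_total):
    """Return (share of heritability, share of variants, enrichment ratio).

    All four inputs are assumed to come from an upstream analysis;
    this function only performs the final ratio calculation.
    """
    if h2_total <= 0 or snps_total <= 0 or snps_annotation <= 0:
        raise ValueError("totals and annotation variant count must be positive")
    prop_h2 = h2_annotation / h2_total          # fraction of heritability explained
    prop_snps = snps_annotation / snps_total    # fraction of variants in the annotation
    return prop_h2, prop_snps, prop_h2 / prop_snps


# Hypothetical numbers: an annotation covering 5% of variants
# that accounts for 25% of the estimated heritability.
prop_h2, prop_snps, enrichment = heritability_enrichment(
    h2_annotation=0.125, h2_total=0.5,
    snps_annotation=50_000, snps_total=1_000_000,
)
print(f"share of h2={prop_h2:.2f}, share of variants={prop_snps:.2f}, "
      f"enrichment={enrichment:.1f}")
```

An enrichment well above 1 suggests that variants in the annotation contribute more to heritability than their number alone would predict, which is the kind of signal used to prioritize cell types or pathways for follow-up.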
In parallel, whole exome sequencing studies, which explore protein-coding regions in depth, continue to expand and increase the power to discover individual genes unequivocally associated with disease. Through the SCHEMA (Schizophrenia Exome Meta-Analysis) consortium, the genetics team has made transformative progress in identifying rare damaging mutations in ten genes that have a large effect on risk of schizophrenia. The SCHEMA consortium is a global collaboration led by the Stanley Center that analyzes exome sequencing data from people with schizophrenia to find genes that contribute to risk for the disorder. The first phase of the project analyzed 24,248 sequenced cases and 97,322 controls and found rare coding variants in ten genes that confer substantial risk for schizophrenia. The manuscript describing these results can be found here.
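To give a sense of how a case-control comparison of rare variant carriers can single out a gene, the sketch below computes a one-sided Fisher's exact test on carrier counts for a single gene. This is a generic textbook test chosen for illustration, not the SCHEMA consortium's analysis pipeline. The function name and the carrier counts are hypothetical; only the cohort sizes (24,248 cases and 97,322 controls) come from the text above.

```python
from math import comb


def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact p-value for excess carriers among cases.

    Computes P(X >= case_carriers) under the hypergeometric null, where
    carriers are distributed between cases and controls at random.
    """
    if not (0 <= case_carriers <= n_cases and 0 <= control_carriers <= n_controls):
        raise ValueError("carrier counts must lie within their group sizes")
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    denom = comb(total, carriers)
    upper = min(carriers, n_cases)
    # Sum the probability of every table at least as extreme as the observed one.
    numer = sum(
        comb(n_cases, k) * comb(n_controls, carriers - k)
        for k in range(case_carriers, upper + 1)
    )
    return numer / denom


# Hypothetical carrier counts for one gene, using the cohort sizes from the text.
p_value = fisher_one_sided(case_carriers=20, n_cases=24_248,
                           control_carriers=10, n_controls=97_322)
print(f"one-sided p-value: {p_value:.3g}")
```

Because damaging variants in any single gene are so rare, even a strong excess in cases involves only a handful of carriers, which is why cohorts of this size are needed to reach convincing significance.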
The Stanley Center genetics program also contributed to the Bipolar Exome (BipEx) collaboration, which analyzed whole exome sequencing data from 13,933 individuals diagnosed with bipolar disorder, matched with 14,422 controls. When these data are combined with schizophrenia exome data, AKAP11 emerges as a definitive risk gene. The protein encoded by this gene, also known as AKAP220, is known to interact with GSK3β, the hypothesized target of lithium, one of the few treatments for bipolar disorder.
Understanding the full range of associations for relevant genetic variants can help prioritize follow-up biological interrogation and prompt novel therapeutic hypotheses. Polygenic risk score analysis can also shed light on the range of consequences and outcomes associated with genetic risk for severe mental illness as well as potentially enable patient stratification based on genetic risk factors.
Polygenic risk score analysis
Polygenic risk scores are less effective when ported across different ancestral populations, highlighting the importance of including all major ancestral groups to ensure that findings are relevant for global populations. To ensure that polygenic risk scores are beneficial to everyone, researchers at the Stanley Center developed the first principled Bayesian PRS construction method that jointly models genome-wide association study (GWAS) summary statistics from multiple populations.
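For readers unfamiliar with how a polygenic risk score is computed once per-variant weights are available, the basic scoring step is a weighted sum of risk-allele dosages. The sketch below shows only that step and is not the Bayesian multi-population method described above. The function name, variant identifiers, and weights are hypothetical.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages for one individual.

    dosages: variant ID -> dosage of the effect allele (0 to 2).
    weights: variant ID -> per-allele effect size (for example, a log odds ratio).
    Variants missing from the individual's data are skipped.
    """
    score = 0.0
    for variant_id, beta in weights.items():
        dosage = dosages.get(variant_id)
        if dosage is None:
            continue  # variant not genotyped or imputed for this individual
        if not 0.0 <= dosage <= 2.0:
            raise ValueError(f"dosage for {variant_id} must be between 0 and 2")
        score += beta * dosage
    return score


# Hypothetical weights and genotypes, for illustration only.
weights = {"variant_a": 0.5, "variant_b": -0.25, "variant_c": 1.0}
person = {"variant_a": 2, "variant_b": 1}  # variant_c is missing
print(polygenic_risk_score(person, weights))
```

The portability problem arises upstream of this step: weights estimated in one ancestral population often predict less well in another, which is why methods that jointly model summary statistics from multiple populations aim to produce better weights.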
Patient stratification initiative
Across major psychiatric disorders, there is also widespread sharing of common variant risk factors (for instance, between bipolar disorder and schizophrenia), suggesting that the disease constructs do not reflect unitary pathogenic processes. Genotype-to-phenotype analyses can provide insight into the genetic heterogeneity of a disorder, as well as the specific behavioral and cognitive outcomes most associated with different aspects of genetic risk for neuropsychiatric diseases. The Stanley Center Patient Stratification Initiative is an effort to integrate phenotypic and genetic variation among psychiatric patients more thoroughly, so that patients can be stratified more effectively for further genetic, prognostic, and therapeutic investigations. The initiative recently released a two-page instrument designed to capture phenotypic variation in patients with schizophrenia, schizoaffective disorder, and bipolar disorder that is likely to be relevant for genetic discovery and is not covered by standard diagnostic instruments, such as the SCID.
Leveraging genetic variation to develop and deploy biological assays
An emerging opportunity for genetic analysis is to leverage genetic variation to help identify the experimental assays that are most relevant and useful for interrogating pathogenic processes. With the development of human pluripotent stem cell differentiation technologies, it is now possible to create in vitro systems that can be used to model pathogenesis — but there are real challenges in identifying what models are relevant and what assays are sufficiently precise and quantitative to be useful.
The team perceives two potential approaches to advancing the understanding of the biological basis of pathogenic processes: deep investigation of high-penetrance mutations in model systems and development of population-scale assays that enable the simultaneous analysis of cells from large numbers of donors with diverse genomes. While the former model has been used in published studies, there are also potentially great undeveloped strengths in the latter model — in large part because of the highly distributed and polygenic nature of genetic results to date.
As the size and diversity of genomic and biological datasets grow, so too do the opportunities to innovate in analyzing these datasets. Similarly, sharing the results of these studies is of paramount importance to advance the field. We are committed to continuing to make the results of genomics studies accessible as soon as possible.
Furthermore, the team, in collaboration with the Finucane lab, is invested in creating novel statistical methods that allow researchers to parse the uncovered associations to gain insight into how and where they are acting to contribute to disease risk.
To meet the challenges posed by growing study sizes, the genetics team is continuing to develop more useful and powerful computational tools to handle the unprecedented scale of the genomic data being generated, and remains committed to sharing these open-source programs with the broader community. Some of these tools are listed below:
Hail, an open-source, scalable framework for exploring and analyzing genetic data, is available online.
Genome STRiP is software for analyzing genome structural variation in genome sequence data.
SCHEMA is a web-based browser of exome sequencing analysis results from meta-analysis of schizophrenia studies.
The Epi25k browser is a resource of exome sequencing analysis results for developmental and epileptic encephalopathy, genetic generalized epilepsy, and non-acquired focal epilepsy from the Epi25k collaborative.
The Autism Sequencing Consortium browser is a web-based browser of variant-level and gene-level results from the most recent Autism Sequencing Consortium exome sequencing analysis.
BipEx is a web browser delivering single variant and gene-based association results from a meta-analysis of over 14,000 cases of bipolar disorder and 14,000 controls.
UK Biobank GWAS of 4,000 phenotypes is a public release of GWAS results for over 4,000 phenotypes collected in the UK Biobank, accompanied by blog posts explaining how the analysis was conducted and code shared via GitHub.
Pan-UK Biobank is a multi-ancestry analysis of 7,221 phenotypes, across 6 continental ancestry groups, for a total of 16,119 GWAS.
Tractor is a statistical framework and software package used to facilitate the inclusion of admixed individuals in association studies by leveraging local ancestry. Tractor generates accurate ancestry-specific effect-size estimates and P values, can boost GWAS power, and improves the resolution of association signals.