The Stanley Center’s genetics program has four key objectives:

1) Find the genes that drive neuropsychiatric illnesses
2) Determine the phenotypic structure of neuropsychiatric illnesses and identify the intermediate phenotypes related to each disorder
3) Place identified genes into pathways and networks that govern pathogenesis
4) Leverage genetic variation to identify biological assays that will be effective models for disease

1) Find the genes that drive neuropsychiatric illnesses

Sample collection and data generation

The continued success of the genetics program starts with sample availability. The genotyping and sequencing efforts of the past decade have demonstrated that psychiatric illnesses are highly polygenic, with thousands of risk variants of modest effect scattered across the genome. Because each such variant shifts risk only slightly, detecting it with statistical confidence requires very large numbers of cases and controls. This genetic architecture therefore necessitates genetic data generation and analysis at scale.
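A rough sample-size calculation makes the scale argument concrete. This is a minimal sketch that assumes a standardized quantitative trait (such as liability) under an additive model. Under that model, a variant with minor allele frequency p and per-allele effect beta explains q2 = 2p(1 - p)beta^2 of the trait variance. Case-control designs for psychiatric disorders need a more involved calculation. The 5e-8 threshold is the conventional genome-wide significance level, and the function name is hypothetical.

```python
from statistics import NormalDist


def required_sample_size(maf: float, beta: float,
                         alpha: float = 5e-8, power: float = 0.8) -> float:
    """Approximate N needed to detect one variant in a single-variant test.

    Assumes a standardized quantitative trait and an additive genetic model;
    beta is the per-allele effect in trait standard deviations.
    """
    # Fraction of trait variance explained by the variant (additive model).
    q2 = 2 * maf * (1 - maf) * beta ** 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = NormalDist().inv_cdf(power)
    # The test's noncentrality parameter, N * q2 / (1 - q2), must reach
    # (z_alpha + z_power) ** 2 to achieve the requested power.
    return (z_alpha + z_power) ** 2 * (1 - q2) / q2


if __name__ == "__main__":
    print(round(required_sample_size(maf=0.3, beta=0.02)))
```

Consider a variant with minor allele frequency 0.3 and an effect of 0.02 standard deviations per allele. For that variant the function returns roughly 236,000 individuals. Halving the effect size roughly quadruples the requirement, which is why variants of modest effect demand very large samples.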

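For genetic analysis of psychiatric disorders, the genetics program pursues three primary data generation activities: array-based genotyping, whole exome sequencing (WES) and whole genome sequencing (WGS). Each approach has upsides and downsides. The primary tradeoff is between how deeply each genome is characterized and how large a sample can be studied for a given cost. Arrays assay common variants inexpensively across very large samples, WES captures rare variation in protein-coding regions, and WGS characterizes nearly the entire genome at the highest cost per sample.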

These efforts have succeeded in identifying hundreds of genome-wide significant loci across schizophrenia, bipolar disorder, ADHD, autism spectrum disorder (ASD) and other brain disorders. They have also identified more than 30 genes that drive ASD, primarily in individuals with co-morbid intellectual disability.

Computational methods

While the scale of sample collection has been key, aggregating vast numbers of samples does not in itself guarantee success, and growing study sizes bring new challenges. To meet them, the genetics team continues to develop more useful and powerful computational tools to handle the unprecedented scale of the genomic data being generated. The team also remains committed to sharing these open-source programs with the broader community. For example, Hail, an open-source, scalable framework for exploring and analyzing genetic data, is available online. Genome STRucture in Populations (Genome STRiP), a tool for analyzing genome structural variation, has been downloaded more than 2,000 times. The team is also developing novel statistical methods that help researchers interpret the associations they uncover and gain insight into how and where those variants act to contribute to disease risk.

2) Determine the phenotypic structure of psychiatric disorders and identify the intermediate phenotypes related to each disorder

Genetic analyses of common and rare variation have already revealed a number of key insights into the structure of psychiatric disorders. Across major psychiatric disorders, common-variant risk factors are widely shared (for instance, between bipolar disorder and schizophrenia). This sharing suggests that current diagnostic constructs do not each reflect a single, unitary pathogenic process.

Within ASD, for example, it has been clearly established that strong-acting de novo variants are more likely to be seen in cases with co-occurring neurodevelopmental problems, such as intellectual disability or epilepsy. Indeed, ASD-associated de novo mutations that truncate proteins are far more commonly observed in individuals with global developmental delay than in individuals with autism. Recent work, however, has shown that common-variant influences on ASD risk have different phenotypic affinities: common polygenic risk for ASD is positively associated with intelligence and educational attainment. These findings suggest that different genetic risk factors for a diagnostic outcome like ASD may confer that risk through different pathways and processes.
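The "common polygenic risk" referred to above is usually summarized per individual as a polygenic score. A polygenic score is a weighted sum of risk-allele dosages across many variants, with weights taken from GWAS effect sizes. The sketch below is a generic illustration, not the method used in the work described here. The function names are hypothetical, and real pipelines also handle variant selection, linkage disequilibrium and allele alignment. Standardizing scores within a cohort lets their association with traits such as educational attainment be expressed per standard deviation of polygenic risk.

```python
from statistics import mean, pstdev
from typing import Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def standardized_scores(genotypes: Sequence[Sequence[float]],
                        weights: Sequence[float]) -> list[float]:
    """Score every individual, then rescale to mean 0 and standard deviation 1."""
    raw = [polygenic_score(g, weights) for g in genotypes]
    mu, sd = mean(raw), pstdev(raw)
    if sd == 0:
        raise ValueError("scores do not vary across individuals")
    return [(s - mu) / sd for s in raw]
```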

Genotype-to-phenotype analyses can provide insight into the genetic heterogeneity of a disorder. They can also reveal the specific behavioral and cognitive outcomes most associated with different classes of genetic risk for neuropsychiatric disease.

3) Place identified genes into pathways and networks that govern pathogenesis

The identification of genetic risk factors is, in many ways, simply the foundation for novel insights into the biological processes driving disease. The Stanley Center genetics team sees two strategies for advancing from genetic findings to the pathways and networks that drive disease: gene-driven experimental assessment and genome-wide, heritability-based analyses.

For gene-driven experimental assessment, the evolution of the C4 story is particularly instructive. The genetic case for C4 involvement was built primarily on dose-response evidence: across a series of C4 alleles and haplotypes, schizophrenia risk rises with increasing expression of C4A. Such a dose-response curve is extremely unlikely to be spurious, and it motivated the follow-up biological characterization of C4.
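A dose-response pattern like this can be tested by regressing each allele's estimated log odds ratio on its predicted expression level, then asking whether the slope differs from zero. The sketch below uses a fixed-effect, inverse-variance weighted regression. It is a generic illustration under that assumption, not the analysis performed in the C4 work, and the function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def dose_response_trend(expression: Sequence[float], log_or: Sequence[float],
                        se: Sequence[float]) -> Tuple[float, float, float]:
    """Fit log_or ~ expression with inverse-variance weights.

    Returns (slope, standard error of slope, two-sided p-value).
    """
    if not (len(expression) == len(log_or) == len(se)):
        raise ValueError("inputs must have the same length")
    w = [1.0 / s ** 2 for s in se]  # more precise estimates get more weight
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, expression)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, log_or)) / sw
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, expression))
    if sxx == 0:
        raise ValueError("expression must vary across alleles")
    sxy = sum(wi * (x - x_bar) * (y - y_bar)
              for wi, x, y in zip(w, expression, log_or))
    slope = sxy / sxx
    slope_se = math.sqrt(1.0 / sxx)  # treats the per-allele SEs as known (fixed effect)
    z = slope / slope_se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return slope, slope_se, p
```

A positive slope with a small p-value indicates that risk increases with expression. Inverse-variance weighting gives alleles with more precisely estimated odds ratios more influence on the fitted trend.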

For most genome-wide significant loci nominated by analyses of common variants, unequivocally identifying the responsible gene is the exception rather than the rule, as many of these loci do not lie within or near protein-coding regions. In contrast, WES studies explore coding regions in depth, and as they continue to grow they gain power to discover individual genes unequivocally associated with disease. Geneticists have a responsibility to make such genes known as rapidly as possible. To that end, the team is building a WES results browser to provide global access to WES results.

Genome-wide heritability analyses offer an additional way to nominate pathways, networks, and cell types relevant to disease. They do this by estimating how much of the phenotypic variation is explained by a particular class or annotation of genetic variants. The field's knowledge of which genes are expressed in which cell types, and of which genes interact with one another, continues to improve. As it does, these analyses will gain power to discriminate among specific cell types, biological networks, and pathways for follow-up. Similarly, as genetic sample sizes continue to grow, the resolution of genetic mapping will continue to improve.
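One common way to summarize this partitioning is heritability enrichment. It is the fraction of SNP heritability attributed to an annotation (for example, variants near genes expressed in a particular cell type) divided by the fraction of SNPs in that annotation. A value above 1 means the annotation carries more heritability than its size alone would predict. Methods such as stratified LD score regression report this enrichment quantity. The sketch below assumes per-SNP heritability contributions are already available, which in practice they are not: they must be estimated jointly from GWAS summary statistics and linkage disequilibrium. The function name is hypothetical.

```python
from typing import Sequence


def heritability_enrichment(per_snp_h2: Sequence[float],
                            in_annotation: Sequence[bool]) -> float:
    """Proportion of heritability in an annotation divided by its proportion of SNPs."""
    if len(per_snp_h2) != len(in_annotation):
        raise ValueError("inputs must have the same length")
    n_annot = sum(in_annotation)
    if n_annot == 0:
        raise ValueError("annotation contains no SNPs")
    total_h2 = sum(per_snp_h2)
    annot_h2 = sum(h for h, a in zip(per_snp_h2, in_annotation) if a)
    prop_h2 = annot_h2 / total_h2
    prop_snps = n_annot / len(in_annotation)
    return prop_h2 / prop_snps


if __name__ == "__main__":
    # Toy input: 2 of 10 SNPs (20%) carry half of the heritability.
    h2 = [0.2, 0.2] + [0.05] * 8
    annot = [True, True] + [False] * 8
    print(heritability_enrichment(h2, annot))
```

In this toy input, 20% of SNPs carry half of the heritability, so the enrichment is 0.5 / 0.2 = 2.5.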

4) Leverage genetic variation to develop and identify biological assays that are relevant and effective models for disease processes

An emerging opportunity for genetic analysis is to use genetic variation to help identify the experimental assays that are most relevant and useful for interrogating pathogenic processes. Human pluripotent stem cell differentiation technologies now make it possible to create in vitro systems that model pathogenesis. There are real challenges, however, in determining which models are relevant and which assays are sufficiently precise and quantitative to be useful.

The team sees two potential approaches to advancing the understanding of the biological basis of pathogenic processes. The first is deep investigation of high-penetrance mutations in model systems. The second is development of population-scale assays that enable the simultaneous analysis of cells from large numbers of donors with diverse genomes. While the former approach has been used in published studies, the latter may offer substantial, as yet undeveloped strengths, in large part because the genetic risk identified to date is so highly distributed and polygenic.

Genetics can play a key role in validating assays by evaluating the extent to which measured phenotypes are heritable and genetically correlated with the clinical phenotypes themselves. The team hopes to help develop experimental systems in which researchers can learn simultaneously from the entire spectrum of genetic effects (common and rare, large and small) that human genome variation generates.
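For reference, the two quantities invoked here have standard definitions. Here y_1 is an assay readout and y_2 a clinical phenotype. Heritability is the fraction of a trait's phenotypic variance attributable to genetic variance. The genetic correlation between the two traits is their genetic covariance scaled by the product of their genetic standard deviations:

```latex
h^2_k = \frac{\sigma^2_{G_k}}{\sigma^2_{P_k}}, \quad k = 1, 2,
\qquad
r_g = \frac{\sigma_{G_1 G_2}}{\sqrt{\sigma^2_{G_1}\,\sigma^2_{G_2}}}
```

Here \sigma^2_{G_k} and \sigma^2_{P_k} are the genetic and phenotypic variances of trait k, and \sigma_{G_1 G_2} is the genetic covariance between the two traits. By this criterion, a stronger candidate model would be an assay phenotype with substantial h^2 whose r_g with the clinical phenotype is clearly different from zero. An assay whose variation is largely non-genetic would be a weaker candidate.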

Principal investigators

Mark Daly

Karestan Koenen

Steve McCarroll

Benjamin Neale

Aarno Palotie

Elise Robinson