Regional missense constraint improves variant deleteriousness prediction, bioRxiv, 2017-06-13
Abstract: Given increasing numbers of patients who are undergoing exome or genome sequencing, it is critical to establish tools and methods to interpret the impact of genetic variation. While the ability to predict deleteriousness for any given variant is limited, missense variants remain a particularly challenging class of variation to interpret, since they can have drastically different effects depending on both the precise location and specific amino acid substitution of the variant. In order to better evaluate missense variation, we leveraged the exome sequencing data of 60,706 individuals from the Exome Aggregation Consortium (ExAC) dataset to identify sub-genic regions that are depleted of missense variation. We further used this depletion as part of a novel missense deleteriousness metric named MPC. We applied MPC to de novo missense variants and identified a category of de novo missense variants with the same impact on neurodevelopmental disorders as truncating mutations in intolerant genes, supporting the value of incorporating regional missense constraint in variant interpretation.
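The idea of a region being "depleted of missense variation" can be illustrated with an observed-to-expected ratio. The sketch below is only an illustration under stated assumptions: the abstract does not describe how regions are delimited or how MPC is computed, so the function names, the input layout, and the 0.6 cut-off are all hypothetical.

```python
def missense_depletion(observed, expected):
    """Ratio of observed to expected missense variant counts in a region.

    Values well below 1 suggest the region carries fewer missense
    variants than expected, i.e. it may be under constraint.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


def flag_depleted_regions(regions, threshold=0.6):
    """Return names of regions whose observed/expected ratio is below threshold.

    `regions` is an iterable of (name, observed, expected) tuples.
    The default threshold is an arbitrary illustrative value, not one
    taken from the preprint.
    """
    return [
        name
        for name, observed, expected in regions
        if missense_depletion(observed, expected) < threshold
    ]
```

For example, a region with 10 observed and 40 expected missense variants has a ratio of 0.25 and would be flagged, while one with 38 observed and 40 expected would not.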
biorxiv genomics 100-200-users 2017
A single-cell anatomical blueprint for intracortical information transfer from primary visual cortex, bioRxiv, 2017-06-10
The wiring diagram of the neocortex determines how information is processed across dozens of cortical areas. Each area communicates with multiple others via extensive long-range axonal projections [1–6], but the logic of inter-area information transfer is unresolved. Specifically, the extent to which individual neurons send dedicated projections to single cortical targets or distribute their signals across multiple areas remains unclear [5,7–20]. Distinguishing between these possibilities has been challenging because axonal projections of only a few individual neurons have been reconstructed. Here we map the projection patterns of axonal arbors from 591 individual neurons in mouse primary visual cortex (V1) using two complementary methods: whole-brain fluorescence-based axonal tracing [21,22] and high-throughput DNA sequencing of genetically barcoded neurons (MAPseq) [23]. Although our results confirm the existence of dedicated projections to certain cortical areas, we find these are the exception, and that the majority of V1 neurons broadcast information to multiple cortical targets. Furthermore, broadcasting cells do not project to all targets randomly, but rather comprise subpopulations that either avoid or preferentially innervate specific subsets of cortical areas. Our data argue against a model of dedicated lines of intracortical information transfer via “one neuron – one target area” mapping. Instead, long-range communication between a sensory cortical area and its targets may be based on a principle whereby individual neurons copy information to, and potentially coordinate activity across, specific subsets of cortical areas.
biorxiv neuroscience 100-200-users 2017
Indexcov: fast coverage quality control for whole-genome sequencing, bioRxiv, 2017-06-10
Abstract: The BAM [1] and CRAM [2] formats provide a supplementary linear index that facilitates rapid access to sequence alignments in arbitrary genomic regions. Comparing consecutive entries in a BAM or CRAM index allows one to infer the number of alignment records per genomic region for use as an effective proxy of sequence depth in each genomic region. Based on these properties, we have developed indexcov, an efficient estimator of whole-genome sequencing coverage to rapidly identify samples with aberrant coverage profiles, reveal large scale chromosomal anomalies, recognize potential batch effects, and infer the sex of a sample. Indexcov is available at https://github.com/brentp/goleft under the MIT license.
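The "comparing consecutive entries" step can be sketched as follows. This is a minimal illustration, not indexcov's actual implementation: it assumes the linear-index entries of one chromosome have already been read into a list of BAM virtual file offsets (one per 16,384-base tile), and it uses the difference in compressed file position between neighbouring tiles as the depth proxy. The function name and the median scaling are assumptions made for this example.

```python
from statistics import median

TILE_SIZE = 16_384  # bases covered by each entry of a BAI linear index


def scaled_depth(virtual_offsets):
    """Estimate relative depth per tile from consecutive linear-index entries.

    Each BAM virtual offset packs the compressed-file position in its
    upper 48 bits and the within-block position in its lower 16 bits.
    The gap in compressed-file position between two neighbouring tiles
    approximates how much alignment data falls in the first tile.
    Gaps are divided by their median so a typical tile scores about 1.0.
    """
    file_positions = [offset >> 16 for offset in virtual_offsets]
    gaps = [b - a for a, b in zip(file_positions, file_positions[1:])]
    positive = [g for g in gaps if g > 0]
    if not positive:
        return [0.0] * len(gaps)
    typical = median(positive)
    return [g / typical for g in gaps]
```

Under these assumptions, a tile whose scaled value sits near 2.0 holds roughly twice as much data as a typical tile, which is the kind of signal that could point to a duplication or a coverage anomaly.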
biorxiv genomics 100-200-users 2017
Quantifying the impact of rare and ultra-rare coding variation across the phenotypic spectrum, bioRxiv, 2017-06-10
Abstract: There is a limited understanding of the impact of rare protein truncating variants across multiple phenotypes. We explore the impact of this class of variants on 13 quantitative traits and 10 diseases using whole-exome sequencing data from 100,296 individuals. Protein truncating variants in genes intolerant to this class of mutations increased risk of autism, schizophrenia, bipolar disorder, intellectual disability, and ADHD. In individuals without these disorders, there was an association with shorter height, lower education, increased hospitalization and reduced age. Gene sets implicated from GWAS did not show a significant protein truncating variant burden beyond what was captured by established Mendelian genes. In conclusion, we provide the most thorough investigation to date of the impact of rare deleterious coding variants on complex traits, suggesting widespread pleiotropic risk.
Main abbreviations: PTV = Protein Truncating Variants; PI = Protein Truncating Intolerant; PI-PTV = Protein Truncating Variant in genes that are Intolerant to Protein Truncating Variants.
biorxiv genetics 100-200-users 2017
Detecting polygenic adaptation in admixture graphs, bioRxiv, 2017-06-06
Abstract: An open question in human evolution is the importance of polygenic adaptation: adaptive changes in the mean of a multifactorial trait due to shifts in allele frequencies across many loci. In recent years, several methods have been developed to detect polygenic adaptation using loci identified in genome-wide association studies (GWAS). Though powerful, these methods suffer from limited interpretability: they can detect which sets of populations have evidence for polygenic adaptation, but are unable to reveal where in the history of multiple populations these processes occurred. To address this, we created a method to detect polygenic adaptation in an admixture graph, which is a representation of the historical divergences and admixture events relating different populations through time. We developed a Markov chain Monte Carlo (MCMC) algorithm to infer branch-specific parameters reflecting the strength of selection in each branch of a graph. Additionally, we developed a set of summary statistics that are fast to compute and can indicate which branches are most likely to have experienced polygenic adaptation. We show via simulations that this method, which we call PolyGraph, has good power to detect polygenic adaptation, and applied it to human population genomic data from around the world. We also provide evidence that variants associated with several traits, including height, educational attainment, and self-reported unibrow, have been influenced by polygenic adaptation in different populations during human evolution.
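The quantity that shifts under polygenic adaptation, the mean of a trait driven by allele frequencies at many loci, can be made concrete with a small sketch. It assumes an additive model in which a population's mean genetic value is twice the sum of effect size times allele frequency over the GWAS loci. This is a common formulation in the field, but the abstract does not state that PolyGraph uses exactly this form, and the function name and inputs below are hypothetical.

```python
def mean_genetic_value(effect_sizes, allele_frequencies):
    """Population mean genetic value under an additive model.

    effect_sizes[i] is the GWAS effect of the trait-associated allele at
    locus i, and allele_frequencies[i] is that allele's frequency in the
    population. The factor 2 accounts for diploid individuals.
    """
    if len(effect_sizes) != len(allele_frequencies):
        raise ValueError("one allele frequency is required per locus")
    return 2 * sum(b * p for b, p in zip(effect_sizes, allele_frequencies))
```

Comparing this value between populations that share the same effect sizes shows how many small, coordinated frequency shifts can add up to a noticeable difference in the trait mean, even when no single locus changes much.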
biorxiv evolutionary-biology 100-200-users 2017
Improving the value of public RNA-seq expression data by phenotype prediction, bioRxiv, 2017-06-04
Abstract: Background: Publicly available genomic data are a valuable resource for studying normal human variation and disease, but these data are often not well labeled or annotated. The lack of phenotype information for public genomic data severely limits their utility for addressing targeted biological questions. Results: We develop an in silico phenotyping approach for predicting critical missing annotation directly from genomic measurements, using well-annotated genomic and phenotypic data produced by consortia like TCGA and GTEx as training data. We apply in silico phenotyping to a set of 70,000 RNA-seq samples we recently processed on a common pipeline as part of the recount2 project (https://jhubiostatistics.shinyapps.io/recount). We use gene expression data to build and evaluate predictors for both biological phenotypes (sex, tissue, sample source) and experimental conditions (sequencing strategy). We demonstrate how these predictions can be used to study cross-sample properties of public genomic data, select genomic projects with specific characteristics, and perform downstream analyses using predicted phenotypes. The methods to perform phenotype prediction are available in the phenopredict R package (https://github.com/leekgroup/phenopredict) and the predictions for recount2 are available from the recount R package (https://bioconductor.org/packages/release/bioc/html/recount.html). Conclusion: By leveraging massive public data sets to generate a well-phenotyped set of expression data for more than 70,000 human samples, we make expression data available for use on a scale that was not previously feasible.
biorxiv bioinformatics 100-200-users 2017