Prevalence, phenotype and architecture of developmental disorders caused by de novo mutation The Deciphering Developmental Disorders Study, bioRxiv, 2016-04-21
Abstract: Individuals with severe, undiagnosed developmental disorders (DDs) are enriched for damaging de novo mutations (DNMs) in developmentally important genes. We exome sequenced 4,293 families with individuals with DDs, and meta-analysed these data with published data on 3,287 individuals with similar disorders. We show that the most significant factors influencing the diagnostic yield of de novo mutations are the sex of the affected individual, the relatedness of their parents and the age of both father and mother. We identified 94 genes enriched for damaging de novo mutation at genome-wide significance (P < 7 × 10⁻⁷), including 14 genes for which compelling data for causation was previously lacking. We have characterised the phenotypic diversity among these genetic disorders. We demonstrate that, at current cost differentials, exome sequencing has much greater power than genome sequencing for novel gene discovery in genetically heterogeneous disorders. We estimate that 42% of our cohort carry pathogenic DNMs (single nucleotide variants and indels) in coding sequences, with approximately half operating by a loss-of-function mechanism, and the remainder resulting in altered function (e.g. activating, dominant negative). We established that most haploinsufficient developmental disorders have already been identified, but that many altered-function disorders remain to be discovered. Extrapolating from the DDD cohort to the general population, we estimate that developmental disorders caused by DNMs have an average birth prevalence of 1 in 213 to 1 in 448 (0.22-0.47% of live births), depending on parental age.
Abbreviations: PTV, Protein-Truncating Variant; DNM, De Novo Mutation; DD, Developmental Disorder; DDD, Deciphering Developmental Disorders study.
biorxiv genetics 200-500-users 2016
The Prevalence of Inappropriate Image Duplication in Biomedical Research Publications, bioRxiv, 2016-04-21
ABSTRACT: Inaccurate data in scientific papers can result from honest error or intentional falsification. This study attempted to determine the percentage of published papers containing inappropriate image duplication, a specific type of inaccurate data. The images from a total of 20,621 papers in 40 scientific journals from 1995-2014 were visually screened. Overall, 3.8% of published papers contained problematic figures, with at least half exhibiting features suggestive of deliberate manipulation. The prevalence of papers with problematic images rose markedly during the past decade. Additional papers written by authors of papers with problematic images had an increased likelihood of containing problematic images as well. As this analysis focused only on one type of data, it is likely that the actual prevalence of inaccurate data in the published literature is higher. The marked variation in the frequency of problematic images among journals suggests that journal practices, such as pre-publication image screening, influence the quality of the scientific literature.
biorxiv scientific-communication-and-education 200-500-users 2016
Impact of knowledge accumulation on pathway enrichment analysis, bioRxiv, 2016-04-20
Pathway-based interpretation of gene lists is a staple of genome analysis. It depends on frequently updated gene annotation databases. We analyzed the evolution of gene annotations over the past seven years and found that the vocabulary of pathways and processes has doubled. This strongly impacts practical analysis of genes: 80% of publications we surveyed in 2015 used outdated software that only captured 20% of pathway enrichments apparent in current annotations.
biorxiv bioinformatics 200-500-users 2016
High-resolution interrogation of functional elements in the noncoding genome, bioRxiv, 2016-04-19
The noncoding genome plays a major role in gene regulation and disease yet we lack tools for rapid identification and manipulation of noncoding elements. Here, we develop a large-scale CRISPR screen employing ~18,000 sgRNAs targeting >700 kb of noncoding sequence in an unbiased manner surrounding three genes (NF1, NF2, and CUL3) involved in resistance to the BRAF inhibitor vemurafenib in the BRAF-mutant melanoma cell line A375. We identify specific noncoding locations near genes that modulate drug resistance when mutated. These sites have predictive hallmarks of noncoding function, such as physical interaction with gene promoters, evolutionary conservation and tissue-specific chromatin accessibility. At a subset of identified elements at the CUL3 locus, we show that engineered mutations lead to a loss of gene expression associated with changes in transcription factor occupancy and in long-range and local epigenetic environments, implicating these sites in gene regulation and chemotherapeutic resistance. This demonstration of an unbiased mutagenesis screen across large noncoding regions expands the potential of pooled CRISPR screens for fundamental genomic discovery and for elucidating biologically relevant mechanisms of gene regulation.
biorxiv molecular-biology 100-200-users 2016
Analysis of Shared Heritability in Common Disorders of the Brain, bioRxiv, 2016-04-17
Abstract: Disorders of the brain exhibit considerable epidemiological comorbidity and frequently share symptoms, provoking debate about the extent of their etiologic overlap. We quantified the genetic sharing of 25 brain disorders based on summary statistics from genome-wide association studies of 215,683 patients and 657,164 controls, and their relationship to 17 phenotypes from 1,191,588 individuals. Psychiatric disorders show substantial sharing of common variant risk, while neurological disorders appear more distinct from one another. We observe limited evidence of sharing between neurological and psychiatric disorders, but do identify robust sharing between disorders and several cognitive measures, as well as disorders and personality types. We also performed extensive simulations to explore how power, diagnostic misclassification and phenotypic heterogeneity affect genetic correlations. These results highlight the importance of common genetic variation as a source of risk for brain disorders and the value of heritability-based methods in understanding their etiology.
biorxiv genetics 100-200-users 2016
plasmidSPAdes: Assembling Plasmids from Whole Genome Sequencing Data, bioRxiv, 2016-04-16
ABSTRACT
Motivation: Plasmids are stably maintained extra-chromosomal genetic elements that replicate independently from the host cell's chromosomes. Although plasmids harbor biomedically important genes (such as genes involved in virulence and antibiotic resistance), there is a shortage of specialized software tools for extracting and assembling plasmid data from whole genome sequencing projects.
Results: We present the plasmidSPAdes algorithm and software tool for assembling plasmids from whole genome sequencing data and benchmark its performance on a diverse set of bacterial genomes.
Availability and implementation: plasmidSPAdes is publicly available at http://spades.bioinf.spbau.ru/plasmidSPAdes
Contact: d.antipov@spbu.ru
biorxiv bioinformatics 0-100-users 2016