The Functional False Discovery Rate with Applications to Genomics, bioRxiv, 2017-12-31
Abstract: The false discovery rate measures the proportion of false discoveries among a set of hypothesis tests called significant. This quantity is typically estimated based on p-values or test statistics. In some scenarios, there is additional information available that may be used to more accurately estimate the false discovery rate. We develop a new framework for formulating and estimating false discovery rates and q-values when an additional piece of information, which we call an “informative variable”, is available. For a given test, the informative variable provides information about the prior probability that a null hypothesis is true or the power of that particular test. The false discovery rate is then treated as a function of this informative variable. We consider two applications in genomics. Our first is a genetics of gene expression (eQTL) experiment in yeast where every pair of genetic marker and gene expression trait is tested for association. The informative variable in this case is the distance between each genetic marker and gene. Our second application is to detect differentially expressed genes in an RNA-seq study carried out in mice. The informative variable in this study is the per-gene read depth. The framework we develop is quite general, and it should be useful in a broad range of scientific applications.
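As a point of comparison for the abstract above, the following is a minimal sketch of how q-value-like quantities are commonly derived from p-values alone. It uses the standard Benjamini-Hochberg adjustment, which is not the functional method the authors propose and ignores any informative variable. The function name `bh_qvalues` and the example p-values are illustrative assumptions.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values for a list of p-values.

    Assumption: this is the classical p-value-only procedure, shown as a
    baseline; it is NOT the functional FDR method from the abstract.
    """
    m = len(pvalues)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values are
    # non-decreasing in the original p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical p-values, for illustration only.
    example = [0.01, 0.04, 0.03, 0.5]
    for p, q in zip(example, bh_qvalues(example)):
        print(f"p = {p:.3f}  adjusted = {q:.4f}")
```

A test is then called significant when its adjusted value falls below the chosen FDR level. The framework in the abstract differs in that the estimate is allowed to vary with the informative variable, for example marker-to-gene distance or per-gene read depth.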
biorxiv genomics 0-100-users 2017
Improved Aedes aegypti mosquito reference genome assembly enables biological discovery and vector control, bioRxiv, 2017-12-30
Female Aedes aegypti mosquitoes infect hundreds of millions of people each year with dangerous viral pathogens including dengue, yellow fever, Zika, and chikungunya. Progress in understanding the biology of this insect, and developing tools to fight it, has been slowed by the lack of a high-quality genome assembly. Here we combine diverse genome technologies to produce AaegL5, a dramatically improved and annotated assembly, and demonstrate how it accelerates mosquito science and control. We anchored the physical and cytogenetic maps, resolved the size and composition of the elusive sex-determining “M locus”, significantly increased the known members of the glutathione-S-transferase genes important for insecticide resistance, and doubled the number of chemosensory ionotropic receptors that guide mosquitoes to human hosts and egg-laying sites. Using high-resolution QTL and population genomic analyses, we mapped new candidates for dengue vector competence and insecticide resistance. We predict that AaegL5 will catalyse new biological insights and intervention strategies to fight this deadly arboviral vector.
biorxiv genomics 200-500-users 2017
Discovery and characterization of coding and non-coding driver mutations in more than 2,500 whole cancer genomes, bioRxiv, 2017-12-24
Abstract: Discovery of cancer drivers has traditionally focused on the identification of protein-coding genes. Here we present a comprehensive analysis of putative cancer driver mutations in both protein-coding and non-coding genomic regions across >2,500 whole cancer genomes from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium. We developed a statistically rigorous strategy for combining significance levels from multiple driver discovery methods and demonstrate that the integrated results overcome limitations of individual methods. We combined this strategy with careful filtering and applied it to protein-coding genes, promoters, untranslated regions (UTRs), distal enhancers and non-coding RNAs. These analyses redefine the landscape of non-coding driver mutations in cancer genomes, confirming a few previously reported elements and raising doubts about others, while identifying novel candidate elements across 27 cancer types. Novel recurrent events were found in the promoters or 5’ UTRs of TP53, RFTN1, RNF34, and MTG2, in the 3’ UTRs of NFKBIZ and TOB1, and in the non-coding RNA RMRP. We provide evidence that the previously reported non-coding RNAs NEAT1 and MALAT1 may be subject to a localized mutational process. Perhaps the most striking finding is the relative paucity of point mutations driving cancer in non-coding genes and regulatory elements. Though we have limited power to discover infrequent non-coding drivers in individual cohorts, combined analysis of promoters of known cancer genes shows little excess of mutations beyond TERT.
biorxiv genomics 100-200-users 2017
A single-cell catalogue of regulatory states in the ageing Drosophila brain, bioRxiv, 2017-12-22
Summary: The diversity of cell types and regulatory states in the brain, and how these change during ageing, remain largely unknown. Here, we present a single-cell transcriptome catalogue of the entire adult Drosophila melanogaster brain sampled across its lifespan. Both neurons and glia age through a process of “regulatory erosion”, characterized by a strong decline of RNA content, and accompanied by increasing transcriptional and chromatin noise. We identify more than 50 cell types by specific transcription factors and their downstream gene regulatory networks. In addition to neurotransmitter types and neuroblast lineages, we find a novel neuronal cell state driven by datilografo and prospero. This state relates to neuronal birth order, the metabolic profile, and the activity of a neuron. Our single-cell brain catalogue reveals extensive regulatory heterogeneity linked to ageing and brain function and will serve as a reference for future studies of genetic variation and disease mutations.
biorxiv genomics 0-100-users 2017
Cell “hashing” with barcoded antibodies enables multiplexing and doublet detection for single cell genomics, bioRxiv, 2017-12-22
Abstract: Despite rapid developments in single cell sequencing technology, sample-specific batch effects, detection of cell doublets, and the cost of generating massive datasets remain outstanding challenges. Here, we introduce cell “hashing”, where oligo-tagged antibodies against ubiquitously expressed surface proteins are used to uniquely label cells from distinct samples, which can be subsequently pooled. By sequencing these tags alongside the cellular transcriptome, we can assign each cell to its sample of origin, and robustly identify doublets originating from multiple samples. We demonstrate our approach by pooling eight human PBMC samples on a single run of the 10x Chromium system, substantially reducing our per-cell costs for library generation. Cell “hashing” is inspired by, and complementary to, elegant multiplexing strategies based on genetic variation, which we also leverage to validate our results. We therefore envision that our approach will help to generalize the benefits of single cell multiplexing to diverse samples and experimental designs.
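The assignment idea in the abstract above can be illustrated with a toy sketch. The abstract does not state the classification rule the authors use, so the fixed count threshold, the function name `classify_cell`, and the sample labels below are all hypothetical assumptions chosen only to show the logic: a cell positive for one tag is a singlet from that sample, and a cell positive for several tags is a cross-sample doublet.

```python
def classify_cell(tag_counts, min_count=10):
    """Classify one cell from its per-sample hashtag read counts.

    Assumption: a simple fixed threshold (`min_count`) decides whether a
    tag is "positive". This is a hypothetical rule for illustration, not
    the statistical procedure used in the preprint.
    """
    positive = sorted(
        tag for tag, count in tag_counts.items() if count >= min_count
    )
    if not positive:
        return ("negative", [])       # no tag detected above threshold
    if len(positive) == 1:
        return ("singlet", positive)  # one sample of origin
    return ("doublet", positive)      # tags from two or more samples


if __name__ == "__main__":
    # Hypothetical tag counts for three cells pooled from samples A-C.
    cells = {
        "cell_1": {"A": 120, "B": 3, "C": 1},
        "cell_2": {"A": 95, "B": 88, "C": 2},
        "cell_3": {"A": 2, "B": 0, "C": 4},
    }
    for name, counts in cells.items():
        print(name, classify_cell(counts))
```

Note that a rule of this kind can only flag doublets formed by cells from different samples; two cells from the same sample carry the same tag and would look like a singlet.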
biorxiv genomics 100-200-users 2017
Firefly genomes illuminate parallel origins of bioluminescence in beetles, bioRxiv, 2017-12-22
Abstract: Fireflies and their fascinating luminous courtships have inspired centuries of scientific study. Today firefly luciferase is widely used in biotechnology, but the evolutionary origin of their bioluminescence remains unclear. To shed light on this long-standing question, we sequenced the genomes of two firefly species that diverged over 100 million years ago: the North American Photinus pyralis and the Japanese Aquatica lateralis. We also sequenced the genome of a related click-beetle, the Caribbean Ignelater luminosus, with bioluminescent biochemistry near-identical to that of fireflies, but anatomically unique light organs, suggesting the intriguing but contentious hypothesis of parallel gains of bioluminescence. Our analyses support two independent gains of bioluminescence between fireflies and click-beetles, and provide new insights into the genes, chemical defenses, and symbionts that evolved alongside their luminous lifestyle.
One Sentence Summary: Comparative analyses of the first linkage-group-resolution genomes of fireflies and related bioluminescent beetles address long-standing questions of the origin and evolution of bioluminescence and its associated traits.
biorxiv genomics 200-500-users 2017