Insights into human genetic variation and population history from 929 diverse genomes, bioRxiv, 2019-06-28
Abstract
Genome sequences from diverse human groups are needed to understand the structure of genetic variation in our species and the history of, and relationships between, different populations. We present 929 high-coverage genome sequences from 54 diverse human populations, 26 of which are physically phased using linked-read sequencing. Analyses of these genomes reveal an excess of previously undocumented private genetic variation in southern and central Africa and in Oceania and the Americas, but an absence of fixed, private variants between major geographical regions. We also find deep and gradual population separations within Africa, contrasting population size histories between hunter-gatherer and agriculturalist groups in the last 10,000 years, a potentially major population growth episode after the peopling of the Americas, and a contrast between a single Neanderthal source population and multiple Denisovan source populations contributing to present-day humans. We also demonstrate the benefits of genome sequences over ascertained array genotypes for studying population relationships. These genome sequences are freely available as a resource with no access or analysis restrictions.
biorxiv genomics 200-500-users 2019
MitoFinder: efficient automated large-scale extraction of mitogenomic data in target enrichment phylogenomics, bioRxiv, 2019-06-28
Abstract
Thanks to the development of high-throughput sequencing technologies, target enrichment sequencing of nuclear ultraconserved DNA elements (UCEs) now routinely allows phylogenetic relationships to be inferred from thousands of genomic markers. Recently, it has been shown that mitochondrial DNA (mtDNA) is frequently sequenced alongside the targeted loci in such capture experiments. Despite its broad evolutionary interest, mtDNA is rarely assembled and used in conjunction with nuclear markers in capture-based studies. Here, we developed MitoFinder, a user-friendly bioinformatic pipeline, to efficiently assemble and annotate mitogenomic data from hundreds of UCE libraries. As a case study, we used ants (Formicidae), for which 501 UCE libraries have been sequenced whereas only 29 mitogenomes are available. We compared the efficiency of four different assemblers (IDBA-UD, MEGAHIT, MetaSPAdes, and Trinity) for assembling both UCE and mtDNA loci. Using MitoFinder, we show that metagenomic assemblers, in particular MetaSPAdes, are well suited to assembling both UCEs and mtDNA. Mitogenomic signal was successfully extracted from all 501 UCE libraries, allowing species identifications to be confirmed using COI barcoding. Moreover, our automated procedure retrieved 296 cases in which the mitochondrial genome was assembled in a single contig, thus increasing the number of available ant mitogenomes by an order of magnitude. By leveraging the power of metagenomic assemblers, MitoFinder provides an efficient tool to extract complementary mitogenomic data from UCE libraries, making it possible to test for potential mito-nuclear discordance. Our approach is potentially applicable to other sequence capture methods, transcriptomic data, and whole genome shotgun sequencing in diverse taxa.
biorxiv evolutionary-biology 0-100-users 2019
Octopi: Open configurable high-throughput imaging platform for infectious disease diagnosis in the field, bioRxiv, 2019-06-28
Abstract
Access to quantitative, robust, yet affordable diagnostic tools is necessary to reduce the global infectious disease burden. Manual microscopy has served as a bedrock of diagnostics with wide adaptability, although at the cost of tedious labor and human error. Automated robotic microscopes are poised to enable a new era of smart field microscopy, but current platforms remain cost-prohibitive and largely inflexible, especially for resource-poor and field settings. Here we present Octopi, a low-cost ($250–$500) and reconfigurable autonomous microscopy platform capable of automated slide scanning and correlated bright-field and fluorescence imaging. Being highly modular, it also provides a framework for developing new disease-specific modules. We demonstrate the power of the platform by applying it to automated detection of malaria parasites in blood smears. Specifically, we discovered a spectral shift on the order of 10 nm for DAPI-stained Plasmodium falciparum malaria parasites. This shift allowed us to detect the parasites with a low-magnification (equivalent to 10×), large-field-of-view (2.56 mm²) module. Combined with automated slide scanning, real-time computer vision, and machine learning-based classification, Octopi is able to screen more than 1.5 million red blood cells per minute for parasitemia quantification, with estimated diagnostic sensitivity and specificity exceeding 90% at a parasitemia of 50/μL and reaching 100% for parasitemia higher than 150/μL. With different modules, we further demonstrated imaging of tissue slices and sputum samples on the platform. With a cost reduction of roughly two orders of magnitude, Octopi opens up the possibility of a large robotic microscope network for improved disease diagnosis while providing an avenue for collective efforts to develop modular instruments.
One sentence summary
We developed a low-cost ($250–$500) automated imaging platform that can quantify malaria parasitemia by scanning 1.5 million red blood cells per minute.
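To put the throughput figures in perspective, the Python sketch below works through the sampling arithmetic of screening 1.5 million red blood cells per minute at the stated parasitemia thresholds. It is a back-of-the-envelope model, not the authors' detection pipeline: it assumes, beyond what the abstract states, a typical red blood cell concentration of about 5 million cells/μL and one parasite per infected cell, and the function names are hypothetical.

```python
import math

# Assumption (not from the abstract): typical human red blood cell
# concentration of roughly 5 million cells per microlitre of blood.
RBC_PER_UL = 5.0e6

# Stated in the abstract: Octopi screens more than 1.5 million RBCs per minute.
CELLS_PER_MINUTE = 1.5e6


def expected_parasitized_cells(parasites_per_ul: float, minutes: float = 1.0) -> float:
    """Expected number of parasitized RBCs scanned (hypothetical helper).

    Assumes one parasite per infected cell, so the infected fraction is
    parasites_per_ul / RBC_PER_UL.
    """
    fraction_infected = parasites_per_ul / RBC_PER_UL
    return fraction_infected * CELLS_PER_MINUTE * minutes


def prob_at_least_one(parasites_per_ul: float, minutes: float = 1.0) -> float:
    """Poisson probability that at least one parasitized cell is scanned.

    This is a sampling bound only; it ignores classifier false negatives.
    """
    lam = expected_parasitized_cells(parasites_per_ul, minutes)
    return 1.0 - math.exp(-lam)


if __name__ == "__main__":
    blood_ul = CELLS_PER_MINUTE / RBC_PER_UL
    print(f"blood volume scanned per minute: {blood_ul:.2f} uL")
    for density in (50, 150):
        lam = expected_parasitized_cells(density)
        print(f"{density}/uL -> ~{lam:.0f} parasitized cells per minute, "
              f"P(at least one) = {prob_at_least_one(density):.6f}")
```

Under these assumptions, one minute of scanning covers roughly 0.3 μL of blood and should include about 15 parasitized cells at 50/μL (about 45 at 150/μL). This suggests that, at these thresholds, the number of cells sampled is unlikely to be the main limit on sensitivity; the reported sensitivity and specificity mainly reflect classification performance, which this sketch does not model.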
biorxiv bioengineering 500+-users 2019
RADICL-seq identifies general and cell type-specific principles of genome-wide RNA-chromatin interactions, bioRxiv, 2019-06-28
Abstract
Mammalian genomes encode tens of thousands of noncoding RNAs. Most noncoding transcripts exhibit nuclear localization, and several have been shown to play a role in the regulation of gene expression and chromatin remodelling. To investigate the function of such RNAs, methods to map the genomic interaction sites of multiple transcripts at scale have been developed. However, these methods still have limitations. Here, we introduce RNA And DNA Interacting Complexes Ligated and sequenced (RADICL-seq), a technology that maps genome-wide RNA-chromatin interactions in intact nuclei. RADICL-seq is a proximity ligation-based methodology that reduces the bias toward nascent transcription while increasing genomic coverage and unique mapping rate compared to existing methods. RADICL-seq identifies distinct patterns of genome occupancy for different classes of transcripts as well as cell type-specific RNA-chromatin interactions, and highlights the role of transcription in the establishment of chromatin structure.
biorxiv genomics 0-100-users 2019
Single-cell genomic atlas of great ape cerebral organoids uncovers human-specific features of brain development, bioRxiv, 2019-06-28
Abstract
The human brain has changed dramatically since humans diverged from our closest living relatives, chimpanzees and the other great apes [1–5]. However, the genetic and developmental programs underlying this divergence are not fully understood [6–8]. Here, we have analyzed stem cell-derived cerebral organoids using single-cell transcriptomics (scRNA-seq) and accessible chromatin profiling (scATAC-seq) to explore gene regulatory changes that are specific to humans. We first analyze cell composition and reconstruct differentiation trajectories over the entire course of human cerebral organoid development, from pluripotency through neuroectoderm and neuroepithelial stages, followed by divergence into neuronal fates within the dorsal and ventral forebrain, midbrain, and hindbrain regions. We find that brain region composition varies in organoids from different iPSC lines, yet regional gene expression patterns are largely reproducible across individuals. We then analyze chimpanzee and macaque cerebral organoids and find that human neuronal development proceeds at a delayed pace relative to the other two primates. Through pseudotemporal alignment of differentiation paths, we identify human-specific gene expression resolved to distinct cell states along progenitor-to-neuron lineages in the cortex. We find that chromatin accessibility is dynamic during cortex development, and identify instances of accessibility divergence between human and chimpanzee that correlate with human-specific gene expression and genetic change. Finally, we map human-specific expression in adult prefrontal cortex using single-nucleus RNA-seq and find developmental differences that persist into adulthood, as well as cell state-specific changes that occur exclusively in the adult brain. Our data provide a temporal cell atlas of great ape forebrain development and illuminate dynamic gene regulatory features that are unique to humans.
biorxiv developmental-biology 0-100-users 2019
The mutational footprints of cancer therapies, bioRxiv, 2019-06-28
Some cancer therapies damage DNA and cause mutations in both cancer cells and healthy cells of the patient [1]. These therapy-induced mutations may underlie some of the long-term and late side effects of treatment, such as mental disabilities, organ toxicities, and secondary neoplasms. Currently, we do not know the mutation patterns and burdens caused by different cancer treatments. Here we identify the mutational signatures, or footprints, of six widely used anti-cancer therapies by studying the whole genomes of more than 3,500 metastatic tumors originating in different organs. These include previously known and new mutational signatures generated by platinum-based drugs, and a novel signature of treatment with nucleoside metabolic inhibitors. Exploiting these mutational footprints, we estimate the contribution of different treatments to the mutation burden of tumors and their risk of causing coding and likely driver mutations in the genome. In summary, the mutational footprints identified here open a window to precisely appraise the mutational risk of different cancer therapies and to understand their late side effects.
biorxiv genomics 100-200-users 2019