SAVER: Gene expression recovery for UMI-based single cell RNA sequencing, bioRxiv, 2017-05-17
Abstract
Rapid advances in massively parallel single cell RNA sequencing (scRNA-seq) are paving the way for high-resolution single cell profiling of biological samples. In most scRNA-seq studies, only a small fraction of the transcripts present in each cell are sequenced. The efficiency, that is, the proportion of transcripts in the cell that are sequenced, can be especially low in highly parallelized experiments where the number of reads allocated to each cell is small. This leads to unreliable quantification of lowly and moderately expressed genes, resulting in extremely sparse data and hindering downstream analysis. To address this challenge, we introduce SAVER (Single-cell Analysis Via Expression Recovery), an expression recovery method for scRNA-seq that borrows information across genes and cells to impute the zeros as well as to improve the expression estimates for all genes. We show, by comparison with RNA fluorescence in situ hybridization (FISH) and by data down-sampling experiments, that SAVER reliably recovers cell-specific gene expression concentrations, cross-cell gene expression distributions, and gene-to-gene and cell-to-cell correlations. This improves the power and accuracy of any downstream analysis involving genes with low to moderate expression.
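The abstract defines efficiency as the proportion of a cell's transcripts that are sequenced and validates SAVER with down-sampling experiments. One common way to mimic a lower efficiency, assumed here rather than taken from the paper, is binomial thinning: each observed UMI is kept independently with probability equal to the target efficiency. The sketch below uses hypothetical helper names and toy counts, and does not implement SAVER's recovery model itself; it only shows why low efficiency inflates sparsity. Thinning can only lower counts, so the fraction of zero entries can never go down.

```python
import random


def downsample_counts(counts, efficiency, seed=None):
    """Binomially thin UMI counts (hypothetical helper): each observed
    molecule is kept independently with probability `efficiency`."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie in [0, 1]")
    rng = random.Random(seed)
    thinned = []
    for c in counts:
        # One Bernoulli draw per molecule; the sum is Binomial(c, efficiency).
        thinned.append(sum(1 for _ in range(c) if rng.random() < efficiency))
    return thinned


def zero_fraction(counts):
    """Fraction of genes with a zero count."""
    return sum(1 for c in counts if c == 0) / len(counts)


# Toy per-gene UMI counts for a single cell (hypothetical values).
cell = [0, 1, 2, 5, 10, 40, 3, 0, 7, 1]
low = downsample_counts(cell, efficiency=0.1, seed=0)
print(zero_fraction(cell), zero_fraction(low))
```

Comparing a recovery method's output on `low` against the undown-sampled `cell` is the general shape of a down-sampling benchmark.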
biorxiv genomics 0-100-users 2017
The Beaker Phenomenon and the Genomic Transformation of Northwest Europe, bioRxiv, 2017-05-12
Bell Beaker pottery spread across western and central Europe beginning around 2750 BCE before disappearing between 2200–1800 BCE. The mechanism of its expansion is a topic of long-standing debate, with support for both cultural diffusion and human migration. We present new genome-wide ancient DNA data from 170 Neolithic, Copper Age and Bronze Age Europeans, including 100 Beaker-associated individuals. In contrast to the Corded Ware Complex, which has previously been identified as arriving in central Europe following migration from the east, we observe limited genetic affinity between Iberian and central European Beaker Complex-associated individuals, and thus exclude migration as a significant mechanism of spread between these two regions. However, human migration did have an important role in the further dissemination of the Beaker Complex, which we document most clearly in Britain using data from 80 newly reported individuals dating to 3900–1200 BCE. British Neolithic farmers were genetically similar to contemporary populations in continental Europe and in particular to Neolithic Iberians, suggesting that a portion of the farmer ancestry in Britain came from the Mediterranean rather than the Danubian route of farming expansion. Beginning with the Beaker period, and continuing through the Bronze Age, all British individuals harboured high proportions of Steppe ancestry and were genetically closely related to Beaker-associated individuals from the Lower Rhine area. We use these observations to show that the spread of the Beaker Complex to Britain was mediated by migration from the continent that replaced >90% of Britain’s Neolithic gene pool within a few hundred years, continuing the process that brought Steppe ancestry into central and northern Europe 400 years earlier.
biorxiv genomics 200-500-users 2017
A Next Generation Connectivity Map: L1000 Platform And The First 1,000,000 Profiles, bioRxiv, 2017-05-11
SUMMARY
We previously piloted the concept of a Connectivity Map (CMap), whereby genes, drugs and disease states are connected by virtue of common gene-expression signatures. Here, we report more than a 1,000-fold scale-up of the CMap as part of the NIH LINCS Consortium, made possible by a new, low-cost, high-throughput reduced representation expression profiling method that we term L1000. We show that L1000 is highly reproducible, comparable to RNA sequencing, and suitable for computational inference of the expression levels of 81% of non-measured transcripts. We further show that the expanded CMap can be used to discover the mechanism of action of small molecules, functionally annotate genetic variants of disease genes, and inform clinical trials. The 1.3 million L1000 profiles described here, as well as tools for their analysis, are available at https://clue.io.
HIGHLIGHTS
- A new gene expression profiling method, L1000, dramatically lowers cost
- The Connectivity Map database now includes 1.3 million publicly accessible L1000 perturbational profiles
- This expanded Connectivity Map facilitates discovery of small molecule mechanism of action and functional annotation of genetic variants
- The work establishes feasibility and utility of a truly comprehensive Connectivity Map
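The CMap links genes, drugs and disease states through shared gene-expression signatures. To illustrate the idea only, the sketch below scores how strongly two signatures agree with a simple signed overlap of their up- and down-regulated gene sets. This is a hypothetical stand-in, not the enrichment-based connectivity score CMap actually computes, and the gene identifiers are placeholders. A score near +1 means the reference profile mimics the query; a score near -1 means it reverses it, the pattern one would look for when screening for compounds that counteract a disease signature.

```python
def signed_overlap_score(query_up, query_down, ref_up, ref_down):
    """Hypothetical connectivity-style score in [-1, 1].

    +1: the reference moves every query gene in the same direction.
    -1: the reference moves every query gene in the opposite direction.
    Assumes each signature's up and down sets are disjoint.
    """
    q_up, q_down = set(query_up), set(query_down)
    r_up, r_down = set(ref_up), set(ref_down)
    n = len(q_up) + len(q_down)
    if n == 0:
        raise ValueError("query signature is empty")
    agree = len(q_up & r_up) + len(q_down & r_down)
    oppose = len(q_up & r_down) + len(q_down & r_up)
    return (agree - oppose) / n


# Placeholder gene IDs: a disease signature and a perturbation that reverses it.
disease_up, disease_down = ["G1", "G2"], ["G3", "G4"]
drug_up, drug_down = ["G3", "G4"], ["G1", "G2"]
print(signed_overlap_score(disease_up, disease_down, drug_up, drug_down))
```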
biorxiv genomics 0-100-users 2017
The population genomics of archaeological transition in west Iberia: Investigation of ancient substructure using imputation and haplotype-based methods, bioRxiv, 2017-05-11
Abstract
We analyse new genomic data (0.05–2.95× coverage) from 14 ancient individuals from Portugal, spanning the Middle Neolithic (4200–3500 BC) to the Middle Bronze Age (1740–1430 BC), and impute genome-wide diploid genotypes in these together with published ancient Eurasians. While discontinuity is evident in the transition to agriculture across the region, sensitive haplotype-based analyses suggest a significant degree of local hunter-gatherer contribution to later Iberian Neolithic populations. A more subtle genetic influx is also apparent in the Bronze Age, detectable from analyses including haplotype sharing with both ancient and modern genomes, D-statistics and Y-chromosome lineages. However, the limited nature of this introgression contrasts with the major Steppe migration turnovers within third-millennium northern Europe and echoes the survival of non-Indo-European language in Iberia. Changes in genomic estimates of individual height across Europe are also associated with these major cultural transitions, and ancestral components continue to correlate with modern differences in stature.
Author Summary
Recent ancient DNA work has demonstrated the significant genetic impact of mass migrations from the Steppe into Central and Northern Europe during the transition from the Neolithic to the Bronze Age. In Iberia, archaeological change at the level of material culture and funerary rituals has been reported during this period; however, the genetic impact associated with this cultural transformation has not yet been estimated. To investigate this, we sequence Neolithic and Bronze Age samples from Portugal, which we compare with other ancient and present-day individuals. Genome-wide imputation of a large dataset of ancient samples enabled the use of sensitive methods for detecting population structure and selection in ancient samples.
We revealed subtle genetic differentiation between the Portuguese Neolithic and Bronze Age samples, suggesting a markedly reduced influx into Iberia compared with other European regions. Furthermore, we predict height in ancient individuals, suggesting that stature was reduced in the Neolithic and affected by subsequent admixture. Lastly, we examine signatures of strong selection in important traits and the timing of their origins.
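The abstract cites D-statistics as one line of evidence for the Bronze Age influx. For a tree ((P1, P2), P3), O), the count-based form is D = (n_ABBA − n_BABA) / (n_ABBA + n_BABA), where ABBA sites have P2 and P3 sharing the derived allele and BABA sites have P1 and P3 sharing it; positive D means P2 shares more derived alleles with P3 than P1 does. The minimal sketch below assumes one haploid allele call per population at biallelic sites, treats any allele matching the outgroup as ancestral, and omits the block-jackknife standard error usually attached to D. The allele calls are hypothetical, and this is not the study's own pipeline.

```python
def d_statistic(sites):
    """Count-based D(P1, P2; P3, O) from biallelic sites.

    Each site is a tuple (p1, p2, p3, outgroup) of allele calls; an allele
    equal to the outgroup's is treated as ancestral (A), otherwise derived (B).
    """
    abba = baba = 0
    for p1, p2, p3, out in sites:
        pattern = "".join("A" if a == out else "B" for a in (p1, p2, p3))
        if pattern == "ABB":    # ABBA: P2 and P3 share the derived allele
            abba += 1
        elif pattern == "BAB":  # BABA: P1 and P3 share the derived allele
            baba += 1
    if abba + baba == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (abba - baba) / (abba + baba)


# Hypothetical allele calls: two ABBA sites, one BABA site, one uninformative.
sites = [("C", "T", "T", "C"), ("C", "T", "T", "C"),
         ("T", "C", "T", "C"), ("C", "C", "C", "C")]
print(d_statistic(sites))  # positive: P2 shares more derived alleles with P3
```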
biorxiv genomics 100-200-users 2017
Consequences of natural perturbations in the human plasma proteome, bioRxiv, 2017-05-06
Abstract
Proteins are the primary functional units of biology and the direct targets of most drugs, yet there is limited knowledge of the genetic factors determining inter-individual variation in protein levels. Here we reveal the genetic architecture of the human plasma proteome, testing 10.6 million DNA variants against levels of 2,994 proteins in 3,301 individuals. We identify 1,927 genetic associations with 1,478 proteins, a 4-fold increase on existing knowledge, including trans associations for 1,104 proteins. To understand consequences of perturbations in plasma protein levels, we introduce an approach that links naturally occurring genetic variation with biological, disease, and drug databases. We provide insights into pathogenesis by uncovering the molecular effects of disease-associated variants. We identify causal roles for protein biomarkers in disease through Mendelian randomization analysis. Our results reveal new drug targets, opportunities for matching existing drugs with new disease indications, and potential safety concerns for drugs under development.
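The abstract uses Mendelian randomization to infer causal roles for protein biomarkers but does not name the estimator. A minimal sketch, assuming summary statistics for independent genetic instruments (per-variant effects on the protein level, `beta_x`, and on disease, `beta_y`, with standard errors `se_y`), computes the single-variant Wald ratio and the fixed-effect inverse-variance-weighted (IVW) combination. IVW is a standard textbook estimator chosen here for illustration, not necessarily the method used in the study, and the numbers are hypothetical.

```python
import math


def wald_ratio(beta_x, beta_y, se_y):
    """Single-variant causal estimate and its first-order standard error."""
    return beta_y / beta_x, se_y / abs(beta_x)


def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect inverse-variance-weighted (IVW) MR estimate.

    beta_x: variant effects on the exposure (e.g. a plasma protein level).
    beta_y: variant effects on the outcome (e.g. disease risk).
    se_y:   standard errors of beta_y.
    """
    num = sum(bx * by / sy ** 2 for bx, by, sy in zip(beta_x, beta_y, se_y))
    den = sum(bx ** 2 / sy ** 2 for bx, sy in zip(beta_x, se_y))
    return num / den, math.sqrt(1.0 / den)


# Hypothetical summary statistics for two independent instruments.
bx, by, sy = [0.5, 0.4], [0.1, 0.08], [0.02, 0.02]
print(wald_ratio(bx[0], by[0], sy[0]))
print(ivw_estimate(bx, by, sy))
```

The IVW estimate is a weighted average of the per-variant Wald ratios with weights `beta_x**2 / se_y**2`, so when every instrument implies the same ratio (0.2 here) the combined estimate equals it.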
biorxiv genomics 100-200-users 2017
Reconstructing the Gigabase Plant Genome of Solanum pennellii using Nanopore Sequencing, bioRxiv, 2017-04-22
Recent updates in sequencing technology have made it possible to obtain gigabases of sequence data from a single flow cell. Prior to this update, nanopore sequencing technology was mainly used to analyze and assemble microbial samples (1-3). Here, we describe the generation of a comprehensive nanopore sequencing dataset with a median fragment size of 11,979 bp for the wild tomato species Solanum pennellii, which has an estimated genome size of ca. 1.0 to 1.1 Gb. We describe its assembly to a contig N50 of 2.5 Mb using a pipeline comprising Canu (4) pre-processing followed by assembly with SMARTdenovo. We show that the resulting nanopore-based de novo genome reconstruction is structurally highly similar to the reference S. pennellii LA716 genome (5) but has a high error rate, caused mostly by deletions in homopolymers. After polishing the assembly with Illumina short-read data, we obtained an error rate of <0.02% when assessed against the same Illumina data. More importantly, however, we obtained a gene completeness of 96.53%, which even slightly surpasses that of the reference S. pennellii genome (5). Taken together, our data indicate that such long-read sequencing data can be used to affordably sequence and assemble gigabase-sized diploid plant genomes.
Raw data are available at http://www.plabipd.de/portal/solanum-pennellii and have been deposited as PRJEB19787.
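The assembly above is summarised by its contig N50 (2.5 Mb). N50 is the largest length L such that contigs of length L or longer together contain at least half of the assembled bases; equivalently, sort contigs from longest to shortest and report the length at which the running total first reaches half the assembly size. The short sketch below computes it for toy contig lengths (hypothetical values, not the S. pennellii assembly).

```python
def n50(lengths):
    """Contig N50: walking contigs from longest to shortest, the length of
    the contig at which the running total first reaches half the assembly."""
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues
            return length


# Hypothetical contig lengths (bp); total 54, so half is 27.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8  (10 + 9 + 8 = 27)
```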
biorxiv genomics 0-100-users 2017