Efficient long single molecule sequencing for cost effective and accurate sequencing, haplotyping, and de novo assembly, bioRxiv, 2018-05-17

Obtaining accurate sequences from long DNA molecules is very important for genome assembly and other applications. Here we describe single tube long fragment read (stLFR), a technology that enables this at low cost. It is based on adding the same barcode sequence to sub-fragments of the original long DNA molecule (DNA co-barcoding). To achieve this efficiently, stLFR uses the surface of microbeads to create millions of miniaturized barcoding reactions in a single tube. Using a combinatorial process, up to 3.6 billion unique barcode sequences were generated on beads, enabling practically non-redundant co-barcoding with 50 million barcodes per sample. Using stLFR, we demonstrate efficient unique co-barcoding of over 8 million 20-300 kb genomic DNA fragments. Analysis of the human genome NA12878 with stLFR demonstrated high quality variant calling and phasing into contigs with an N50 of up to 34 Mb. We also demonstrate detection of complex structural variants and complete diploid de novo assembly of NA12878. These analyses were all performed using single stLFR libraries, and their construction did not significantly add to the time or cost of whole genome sequencing (WGS) library preparation. stLFR represents an easily automatable solution that enables high quality sequencing, phasing, SV detection, scaffolding, cost-effective diploid de novo genome assembly, and other long DNA sequencing applications.
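The "practically non-redundant" claim can be checked with a rough back-of-the-envelope calculation. The sketch below is not from the preprint: it assumes that each of the 50 million beads in a sample carries a barcode drawn uniformly and independently from the pool of 3.6 billion, and the function name `shared_barcode_fraction` is hypothetical.

```python
import math


def shared_barcode_fraction(pool_size: int, beads_used: int) -> float:
    """Probability that a given bead shares its barcode with at least one other bead.

    Assumption (not stated in the abstract): barcodes are drawn uniformly
    and independently from a pool of `pool_size` distinct sequences.
    """
    # P(no other bead matches) = (1 - 1/pool_size) ** (beads_used - 1).
    # log1p/expm1 keep the arithmetic accurate for very small probabilities.
    log_no_match = (beads_used - 1) * math.log1p(-1.0 / pool_size)
    return -math.expm1(log_no_match)


# Figures quoted in the abstract: 3.6 billion barcodes, 50 million per sample.
fraction = shared_barcode_fraction(3_600_000_000, 50_000_000)
print(f"Fraction of beads sharing a barcode: {fraction:.4f}")
```

Under this uniform-sampling assumption, roughly 1.4% of beads would share a barcode with another bead, so the large majority of barcodes in a sample mark a single bead.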

biorxiv genomics 0-100-users 2018

Whole-genome deep learning analysis reveals causal role of noncoding mutations in autism, bioRxiv, 2018-05-11

We address the challenge of detecting the contribution of noncoding mutations to disease with a deep-learning-based framework that predicts specific regulatory effects and deleterious disease impact of genetic variants. Applying this framework to 1,790 Autism Spectrum Disorder (ASD) simplex families reveals autism disease causality of noncoding mutations by demonstrating that ASD probands harbor transcriptional (TRDs) and post-transcriptional (RRDs) regulation-disrupting mutations of significantly higher functional impact than unaffected siblings. Importantly, we detect this significant noncoding contribution at each level, transcriptional and post-transcriptional, independently and after multiple hypothesis correction. Further analysis suggests involvement of noncoding mutations in synaptic transmission and neuronal development, and reveals a convergent genetic landscape of coding and noncoding (TRD and RRD) de novo mutations in ASD. We demonstrate that sequences carrying prioritized proband de novo mutations possess transcriptional regulatory activity and drive expression differentially, and highlight a link between noncoding mutations and IQ heterogeneity in ASD probands. Our predictive genomics framework illuminates the role of noncoding mutations in ASD, prioritizes high impact transcriptional and post-transcriptional regulatory mutations for further study, and is broadly applicable to complex human diseases.

biorxiv genomics 100-200-users 2018

crisprQTL mapping as a genome-wide association framework for cellular genetic screens, bioRxiv, 2018-05-04

Expression quantitative trait locus (eQTL) and genome-wide association studies (GWAS) are powerful paradigms for mapping the determinants of gene expression and organismal phenotypes, respectively. However, eQTL mapping and GWAS are limited in scope (to naturally occurring, common genetic variants) and resolution (by linkage disequilibrium). Here, we present crisprQTL mapping, a framework in which large numbers of CRISPR/Cas9 perturbations are introduced to each cell on an isogenic background, followed by single-cell RNA-seq (scRNA-seq). crisprQTL mapping is analogous to conventional human eQTL studies, but with individual humans replaced by individual cells; genetic variants replaced by unique combinations of ‘unlinked’ guide RNA (gRNA)-programmed perturbations per cell; and tissue-level RNA-seq of many individuals replaced by scRNA-seq of many cells. By randomly introducing gRNAs, a single population of cells can be leveraged to test for association between each perturbation and the expression of any potential target gene, analogous to how eQTL studies leverage populations of humans to test millions of genetic variants for associations with expression in a genome-wide manner. However, crisprQTL mapping is neither limited to naturally occurring, common genetic variants nor by linkage disequilibrium. As a proof-of-concept, we applied crisprQTL mapping to evaluate 1,119 candidate enhancers with no strong a priori hypothesis as to their target gene(s). Perturbations were made by a nuclease-dead Cas9 (dCas9) tethered to KRAB, and introduced at a mean ‘allele frequency’ of 1.1% into a population of 47,650 profiled human K562 cells (median of 15 gRNAs identified per cell). We tested for differential expression of all genes within 1 megabase (Mb) of each candidate enhancer, effectively evaluating 17,584 potential enhancer-target gene relationships within a single experiment.
At an empirical false discovery rate (FDR) of 10%, we identify 128 cis crisprQTLs (11%) whose targeting resulted in downregulation of 105 nearby genes. crisprQTLs were strongly enriched for proximity to their target genes (median 34.3 kilobases (Kb)) and the strength of H3K27ac, p300, and lineage-specific transcription factor (TF) ChIP-seq peaks. Our results establish the power of the eQTL mapping paradigm as applied to programmed variation in populations of cells, rather than natural variation in populations of individuals. We anticipate that crisprQTL mapping will facilitate the comprehensive elucidation of the cis-regulatory architecture of the human genome.
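The abstract describes testing every gene within 1 Mb of each candidate enhancer. As a minimal sketch of how such a list of enhancer-gene pairs might be enumerated, the example below pairs each enhancer with genes on the same chromosome whose position falls inside the window. It is an illustration only, not the authors' pipeline: the function `candidate_pairs`, the toy coordinates, and the choice to measure distance from enhancer midpoint to transcription start site are all assumptions.

```python
from collections import defaultdict

WINDOW_BP = 1_000_000  # 1 Mb window, as stated in the abstract


def candidate_pairs(enhancers, genes, window_bp=WINDOW_BP):
    """Return (enhancer_id, gene_id) pairs on the same chromosome within window_bp.

    enhancers: iterable of (enhancer_id, chrom, midpoint)
    genes:     iterable of (gene_id, chrom, tss)
    Measuring midpoint-to-TSS distance is an assumption made for illustration.
    """
    # Index genes by chromosome so each enhancer only scans its own chromosome.
    genes_by_chrom = defaultdict(list)
    for gene_id, chrom, tss in genes:
        genes_by_chrom[chrom].append((gene_id, tss))

    pairs = []
    for enh_id, chrom, midpoint in enhancers:
        for gene_id, tss in genes_by_chrom[chrom]:
            if abs(tss - midpoint) <= window_bp:
                pairs.append((enh_id, gene_id))
    return pairs


# Hypothetical toy coordinates, not taken from the preprint.
enhancers = [("enh1", "chr1", 5_000_000), ("enh2", "chr2", 1_200_000)]
genes = [
    ("geneA", "chr1", 5_034_300),  # 34.3 kb from enh1: inside the window
    ("geneB", "chr1", 6_500_000),  # 1.5 Mb from enh1: outside the window
    ("geneC", "chr2", 300_000),    # 0.9 Mb from enh2: inside the window
    ("geneD", "chr3", 1_200_000),  # different chromosome: never paired
]
pairs = candidate_pairs(enhancers, genes)
print(pairs)
```

Each resulting pair would then be one hypothesis in the differential expression test, comparing cells that carry a gRNA against the enhancer with cells that do not.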

biorxiv genomics 200-500-users 2018

 

Created with the audiences framework by Jedidiah Carlson
