On the design of CRISPR-based single cell molecular screens, bioRxiv, 2018-01-30
Several groups recently reported coupling CRISPR/Cas9 perturbations and single cell RNA-seq as a potentially powerful approach for forward genetics. Here we demonstrate that vector designs for such screens that rely on cis linkage of guides and distally located barcodes suffer from swapping of intended guide-barcode associations at rates approaching 50% due to template switching during lentivirus production, greatly reducing sensitivity. We optimize a published strategy, CROP-seq, that instead uses a Pol II transcribed copy of the sgRNA sequence itself, doubling the rate at which guides are assigned to cells to 94%. We confirm this strategy performs robustly and further explore experimental best practices for CRISPR/Cas9-based single cell molecular screens.
biorxiv genomics 100-200-users 2018
The Juicebox Assembly Tools module facilitates de novo assembly of mammalian genomes with chromosome-length scaffolds for under $1000, bioRxiv, 2018-01-29
Hi-C contact maps are valuable for genome assembly (Lieberman-Aiden, van Berkum et al. 2009; Burton et al. 2013; Dudchenko et al. 2017). Recently, we developed Juicebox, a system for the visual exploration of Hi-C data (Durand, Robinson et al. 2016), and 3D-DNA, an automated pipeline for using Hi-C data to assemble genomes (Dudchenko et al. 2017). Here, we introduce “Assembly Tools,” a new module for Juicebox, which provides a point-and-click interface for using Hi-C heatmaps to identify and correct errors in a genome assembly. Together, 3D-DNA and the Juicebox Assembly Tools greatly reduce the cost of accurately assembling complex eukaryotic genomes. To illustrate, we generated de novo assemblies with chromosome-length scaffolds for three mammals: the wombat, Vombatus ursinus (3.3Gb), the Virginia opossum, Didelphis virginiana (3.3Gb), and the raccoon, Procyon lotor (2.5Gb). The only inputs for each assembly were Illumina reads from a short insert DNA-Seq library (300 million Illumina reads, maximum read length 2x150 bases) and an in situ Hi-C library (100 million Illumina reads, maximum read length 2x150 bases), which cost <$1000.
biorxiv genomics 100-200-users 2018
Integrating single-cell RNA-Seq with spatial transcriptomics in pancreatic ductal adenocarcinoma using multimodal intersection analysis, bioRxiv, 2018-01-27
To understand tissue architecture, it is necessary to understand both which cell types are present and the physical relationships among them. Single-cell RNA-Seq (scRNA-Seq) has made significant progress towards the unbiased and systematic identification of cell populations within a tissue; however, the characterization of their spatial organization within it has been more elusive. The recently introduced ‘spatial transcriptomics’ method (ST) reveals the spatial pattern of gene expression within a tissue section at a resolution of a thousand 100 µm spots across the tissue, each capturing the transcriptomes of multiple cells. Here, we present an approach for the integration of scRNA-Seq and ST data generated from the same sample, and deploy it on primary pancreatic tumors from two patients. Applying our multimodal intersection analysis (MIA), we annotated the distinct micro-environment of each cell type identified by scRNA-Seq. We further found that subpopulations of ductal cells, macrophages, dendritic cells, and cancer cells have spatially restricted localizations across the tissue, as well as distinct co-enrichments with other cell types. Our mapping approach provides an efficient framework for the integration of the scRNA-Seq-defined subpopulation structure and the ST-defined tissue architecture in any tissue.
biorxiv cancer-biology 100-200-users 2018
Neural spiking for causal inference, bioRxiv, 2018-01-26
When a neuron is driven beyond its threshold it spikes, and the fact that it does not communicate its continuous membrane potential is usually seen as a computational liability. Here we show that this spiking mechanism allows neurons to produce an unbiased estimate of their causal influence, and a way of approximating gradient descent learning. Importantly, neither activity of upstream neurons, which act as confounders, nor downstream non-linearities bias the results. By introducing a local discontinuity with respect to their input drive, we show how spiking enables neurons to solve causal estimation and learning problems.
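The idea of using a threshold discontinuity to remove confounding can be illustrated with a toy simulation. The sketch below is not the authors' model: it assumes a single unit whose input drive also affects the reward directly (a hypothetical linear confound), and compares a naive spike-versus-silent contrast with a contrast restricted to a narrow window around the threshold. All names and parameter values (`simulate`, `effect`, `window`, and so on) are illustrative assumptions.

```python
import random


def simulate(n, effect=2.0, threshold=1.0, seed=0):
    """Draw n trials of (drive, spiked, reward) for one toy unit.

    The reward depends on the spike (the causal effect we want) and also
    directly on the drive, which plays the role of a confounder.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        drive = rng.gauss(0.0, 1.0)
        spiked = drive >= threshold
        reward = effect * spiked + 1.0 * drive + rng.gauss(0.0, 0.5)
        samples.append((drive, spiked, reward))
    return samples


def _mean(values):
    return sum(values) / len(values)


def naive_estimate(samples):
    """Mean reward on spiking trials minus mean reward on silent trials."""
    spike = [r for _, s, r in samples if s]
    silent = [r for _, s, r in samples if not s]
    return _mean(spike) - _mean(silent)


def discontinuity_estimate(samples, threshold=1.0, window=0.1):
    """Same contrast, restricted to trials whose drive is near threshold."""
    above = [r for d, _, r in samples if threshold <= d < threshold + window]
    below = [r for d, _, r in samples if threshold - window <= d < threshold]
    return _mean(above) - _mean(below)
```

Calling `discontinuity_estimate(simulate(100_000))` should land close to the simulated effect of 2.0, while `naive_estimate` is inflated by the confound. A small residual bias remains in the windowed estimate because the drive still differs slightly between the two sides of the window; shrinking `window` reduces that bias at the cost of using fewer trials.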
biorxiv neuroscience 100-200-users 2018
Genomic risk prediction of coronary artery disease in nearly 500,000 adults: implications for early screening and primary prevention, bioRxiv, 2018-01-20
Background: Coronary artery disease (CAD) has substantial heritability and a polygenic architecture; however, genomic risk scores have not yet leveraged the totality of genetic information available nor been externally tested at population-scale to show potential utility in primary prevention.
Methods: Using a meta-analytic approach to combine large-scale genome-wide and targeted genetic association data, we developed a new genomic risk score for CAD (metaGRS), consisting of 1.7 million genetic variants. We externally tested metaGRS, individually and in combination with available conventional risk factors, in 22,242 CAD cases and 460,387 non-cases from UK Biobank.
Findings: In UK Biobank, a standard deviation increase in metaGRS had a hazard ratio (HR) of 1.71 (95% CI 1.68–1.73) for CAD, greater than any other externally tested genetic risk score. Individuals in the top 20% of the metaGRS distribution had a HR of 4.17 (95% CI 3.97–4.38) compared with those in the bottom 20%. The metaGRS had higher C-index (C=0.623, 95% CI 0.615–0.631) for incident CAD than any of four conventional factors (smoking, diabetes, hypertension, and body mass index), and addition of the metaGRS to a model of conventional risk factors increased C-index by 3.7%. In individuals on lipid-lowering or anti-hypertensive medications at recruitment, metaGRS hazard for incident CAD was significantly but only partially attenuated with HR of 2.83 (95% CI 2.61–3.07) between the top and bottom 20% of the metaGRS distribution.
Interpretation: Recent genetic association studies have yielded enough information to meaningfully stratify individuals using the metaGRS for CAD risk in both early and later life, thus enabling targeted primary intervention in combination with conventional risk factors. The metaGRS effect was partially attenuated by lipid and blood pressure-lowering medication; however, other prevention strategies will be required to fully benefit from earlier genomic risk stratification.
Funding: National Health and Medical Research Council of Australia, British Heart Foundation, Australian Heart Foundation.
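As a rough illustration of how a genomic risk score and a per-standard-deviation hazard ratio fit together, the sketch below computes a score as a weighted sum of allele dosages, standardizes it against a reference population, and converts the resulting z-score into a relative hazard. It assumes a log-linear (Cox-type) relationship between the standardized score and hazard; the function names, weights, and reference scores are hypothetical and are not taken from the metaGRS itself.

```python
from statistics import mean, pstdev


def risk_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1 or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(score, reference_scores):
    """Express a score in standard deviations from the reference mean."""
    return (score - mean(reference_scores)) / pstdev(reference_scores)


def relative_hazard(z, hr_per_sd):
    """Hazard relative to an average individual, assuming the hazard
    multiplies by hr_per_sd for every standard deviation of score."""
    return hr_per_sd ** z
```

Under this assumption, with the reported HR of 1.71 per standard deviation, an individual two standard deviations above the mean would have a relative hazard of `relative_hazard(2.0, 1.71)`, that is 1.71 squared, or roughly 2.92 times that of an average individual.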
biorxiv genetics 100-200-users 2018
FAIRsharing, a cohesive community approach to the growth in standards, repositories and policies, bioRxiv, 2018-01-18
In this modern, data-driven age, governments, funders and publishers expect greater transparency and reuse of research data, as well as greater access to and preservation of the data that supports research findings. Community-developed standards, such as those for the identification and reporting of data, underpin reproducible and reusable research, aid scholarly publishing, and drive both the discovery and evolution of scientific practice. The number of these standardization efforts, driven by large organizations or at the grassroots level, has been on the rise since the early 2000s. Thousands of community-developed standards are available (across all disciplines), many of which have been created and/or implemented by several thousand data repositories. Nevertheless, their uptake by the research community has been slow and uneven. This is mainly because investigators lack incentives to follow and adopt standards. The situation is exacerbated if standards are not promptly implemented by databases, repositories and other research tools, or endorsed by infrastructures. Furthermore, the fragmentation of community efforts results in the development of arbitrarily different, incompatible standards. In turn, this leads to standards becoming rapidly obsolete in fast-evolving research areas.
As with any other digital object, standards, databases and repositories are dynamic in nature, with a ‘life cycle’ that encompasses formulation, development and maintenance; their status in this cycle may vary depending on the level of activity of the developing group or community. There is an urgent need for a service that enhances the information available on the evolving constellation of heterogeneous standards, databases and repositories, guides users in the selection of these resources, and works with developers and maintainers of these resources to foster collaboration and promote harmonization.
Such an informative and educational service is vital to reduce the knowledge gap among those involved in producing, managing, serving, curating, preserving, publishing or regulating data. A diverse set of stakeholders, representing academia, industry, funding agencies, standards organizations, infrastructure providers and scholarly publishers (both national and domain-specific as well as global and general organizations), have come together as a community, representing the core adopters, advisory board members, and/or key collaborators of the FAIRsharing resource. Here, we introduce its mission and community network. We present an evaluation of the standards landscape, focusing on those for reporting data and metadata (the most diverse and numerous of the standards) and their implementation by databases and repositories. We report on the ongoing challenge to recommend resources, and we discuss the importance of making standards invisible to the end users. We present guidelines that highlight the role each stakeholder group must play to maximize the visibility and adoption of standards, databases and repositories.
biorxiv scientific-communication-and-education 100-200-users 2018