Inference of CRISPR Edits from Sanger Trace Data, bioRxiv, 2018-01-21
Abstract: Efficient precision genome editing requires a quick, quantitative, and inexpensive assay of editing outcomes. Here we present ICE (Inference of CRISPR Edits), which enables robust analysis of CRISPR edits using Sanger data. ICE proposes potential outcomes for editing with guide RNAs (gRNAs) and then determines which are supported by the data via regression. Additionally, we develop a score called ICE-D (Discordance) that can provide information on large or unexpected edits. We empirically confirm through over 1,800 edits that the ICE algorithm is robust, reproducible, and can analyze CRISPR experiments within days after transfection. We also confirm that ICE strongly correlates with next-generation sequencing of amplicons (Amp-Seq). The ICE tool is free to use and offers several improvements over current analysis tools. For instance, ICE can analyze individual experiments as well as multiple experiments simultaneously (batch analysis). ICE can also detect a wider variety of outcomes, including multi-guide edits (multiple gRNAs per target) and edits resulting from homology-directed repair (HDR), such as knock-ins and base edits. ICE is a reliable analysis tool that can significantly expedite CRISPR editing workflows. It is available online at ice.synthego.com, and the source code is at github.com/synthego-open/ice.
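The propose-then-regress idea in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the ICE implementation: the function names are hypothetical, the "trace" is an idealized one-hot signal rather than real Sanger chromatogram data, the proposals are limited to small deletions at the cut site, and the regression is a plain non-negative least-squares fit by projected gradient descent (the abstract says only "regression").

```python
BASES = "ACGT"


def one_hot(seq):
    """Flatten a sequence into an idealized 4-channel trace (1.0 at the called base)."""
    return [1.0 if b == base else 0.0 for b in seq for base in BASES]


def propose_outcomes(wild_type, cut_site, max_del=2):
    """Hypothetical proposal step: wild type plus small deletions starting at the cut site."""
    outcomes = {"wt": wild_type}
    for d in range(1, max_del + 1):
        outcomes[f"del{d}"] = wild_type[:cut_site] + wild_type[cut_site + d:]
    return outcomes


def mix_traces(outcomes, fractions, window):
    """Build a noiseless synthetic trace as a weighted sum of outcome traces."""
    traces = {n: one_hot(outcomes[n][:window]) for n in fractions}
    return [sum(fractions[n] * traces[n][i] for n in fractions)
            for i in range(4 * window)]


def fit_nonnegative(columns, target, steps=5000, lr=0.01):
    """Projected gradient descent for min ||A w - y||^2 subject to w >= 0."""
    n = len(columns)
    w = [0.0] * n
    for _ in range(steps):
        pred = [sum(w[j] * columns[j][i] for j in range(n))
                for i in range(len(target))]
        resid = [p - t for p, t in zip(pred, target)]
        for j in range(n):
            grad = sum(r * c for r, c in zip(resid, columns[j]))
            w[j] = max(0.0, w[j] - lr * grad)  # project back onto w >= 0
    return w


def infer_contributions(wild_type, cut_site, observed, max_del=2):
    """Estimate the fraction of each proposed outcome that explains the observed trace."""
    outcomes = propose_outcomes(wild_type, cut_site, max_del)
    window = len(wild_type) - max_del  # common length shared by all proposals
    names = list(outcomes)
    columns = [one_hot(outcomes[n][:window]) for n in names]
    weights = fit_nonnegative(columns, observed)
    total = sum(weights) or 1.0
    return {n: w / total for n, w in zip(names, weights)}
```

For example, mixing the three proposals for a 20-base sequence at fractions 0.6, 0.3 and 0.1 with `mix_traces` and passing the result to `infer_contributions` should recover approximately those fractions. Real traces are noisy and need alignment and quality filtering first, which this sketch omits.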
biorxiv bioinformatics 0-100-users 2018
Genomic risk prediction of coronary artery disease in nearly 500,000 adults: implications for early screening and primary prevention, bioRxiv, 2018-01-20
Abstract: Background. Coronary artery disease (CAD) has substantial heritability and a polygenic architecture; however, genomic risk scores have not yet leveraged the totality of genetic information available nor been externally tested at population scale to show potential utility in primary prevention. Methods. Using a meta-analytic approach to combine large-scale genome-wide and targeted genetic association data, we developed a new genomic risk score for CAD (metaGRS), consisting of 1.7 million genetic variants. We externally tested metaGRS, individually and in combination with available conventional risk factors, in 22,242 CAD cases and 460,387 non-cases from UK Biobank. Findings. In UK Biobank, a standard deviation increase in metaGRS had a hazard ratio (HR) of 1.71 (95% CI 1.68–1.73) for CAD, greater than any other externally tested genetic risk score. Individuals in the top 20% of the metaGRS distribution had an HR of 4.17 (95% CI 3.97–4.38) compared with those in the bottom 20%. The metaGRS had a higher C-index (C=0.623, 95% CI 0.615–0.631) for incident CAD than any of four conventional factors (smoking, diabetes, hypertension, and body mass index), and addition of the metaGRS to a model of conventional risk factors increased the C-index by 3.7%. In individuals on lipid-lowering or anti-hypertensive medications at recruitment, metaGRS hazard for incident CAD was significantly but only partially attenuated, with an HR of 2.83 (95% CI 2.61–3.07) between the top and bottom 20% of the metaGRS distribution. Interpretation. Recent genetic association studies have yielded enough information to meaningfully stratify individuals using the metaGRS for CAD risk in both early and later life, thus enabling targeted primary intervention in combination with conventional risk factors.
The metaGRS effect was partially attenuated by lipid- and blood pressure-lowering medication; however, other prevention strategies will be required to fully benefit from earlier genomic risk stratification. Funding. National Health and Medical Research Council of Australia, British Heart Foundation, Australian Heart Foundation.
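To make the C-index figures above concrete, here is a minimal sketch of a concordance statistic for a risk score against a binary case/non-case label. It is a simplification chosen for illustration: the study reports a C-index for incident CAD, which in survival analysis also accounts for follow-up time and censoring, and this toy version ignores both. The function name and inputs are hypothetical.

```python
def concordance(scores, is_case):
    """Fraction of case/non-case pairs in which the case has the higher score.

    Ties count as half. A value of 0.5 means the score ranks no better than
    chance; 1.0 means every case outranks every non-case.
    """
    cases = [s for s, c in zip(scores, is_case) if c]
    controls = [s for s, c in zip(scores, is_case) if not c]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for a in cases:
        for b in controls:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```

On this reading, a value such as 0.623 would mean that in roughly 62% of comparable pairs the individual who developed CAD had the higher score.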
biorxiv genetics 100-200-users 2018
FAIRsharing, a cohesive community approach to the growth in standards, repositories and policies, bioRxiv, 2018-01-18
Abstract: In this modern, data-driven age, governments, funders and publishers expect greater transparency and reuse of research data, as well as greater access to and preservation of the data that supports research findings. Community-developed standards, such as those for the identification [1] and reporting [2] of data, underpin reproducible and reusable research, aid scholarly publishing, and drive both the discovery and evolution of scientific practice. The number of these standardization efforts, driven by large organizations or at the grass-roots level, has been on the rise since the early 2000s. Thousands of community-developed standards are available (across all disciplines), many of which have been created and/or implemented by several thousand data repositories. Nevertheless, their uptake by the research community has been slow and uneven. This is mainly because investigators lack incentives to follow and adopt standards. The situation is exacerbated if standards are not promptly implemented by databases, repositories and other research tools, or endorsed by infrastructures. Furthermore, the fragmentation of community efforts results in the development of arbitrarily different, incompatible standards. In turn, this leads to standards becoming rapidly obsolete in fast-evolving research areas. As with any other digital object, standards, databases and repositories are dynamic in nature, with a ‘life cycle’ that encompasses formulation, development and maintenance; their status in this cycle may vary depending on the level of activity of the developing group or community. There is an urgent need for a service that enhances the information available on the evolving constellation of heterogeneous standards, databases and repositories, guides users in the selection of these resources, and works with developers and maintainers of these resources to foster collaboration and promote harmonization.
Such an informative and educational service is vital to reduce the knowledge gap among those involved in producing, managing, serving, curating, preserving, publishing or regulating data. A diverse set of stakeholders, representing academia, industry, funding agencies, standards organizations, infrastructure providers and scholarly publishers, both national and domain-specific as well as global and general organizations, have come together as a community, representing the core adopters, advisory board members, and/or key collaborators of the FAIRsharing resource. Here, we introduce its mission and community network. We present an evaluation of the standards landscape, focusing on those for reporting data and metadata (the most diverse and numerous of the standards) and their implementation by databases and repositories. We report on the ongoing challenge to recommend resources, and we discuss the importance of making standards invisible to the end users. We present guidelines that highlight the role each stakeholder group must play to maximize the visibility and adoption of standards, databases and repositories.
biorxiv scientific-communication-and-education 100-200-users 2018
Homology Directed Repair by Cas9:Donor Co-localization in Mammalian Cells, bioRxiv, 2018-01-17
Abstract: Homology directed repair (HDR) induced by site-specific DNA double-strand breaks (DSB) with CRISPR/Cas9 is a precision gene editing approach that occurs at low frequency in comparison to indel-forming non-homologous end joining (NHEJ). In order to obtain high HDR percentages in mammalian cells, we engineered Cas9 protein fused to a high-affinity monoavidin domain to deliver biotinylated donor DNA to a DSB site. In addition, we used the cationic polymer polyethylenimine to deliver the Cas9 RNP-donor DNA complex into the cell. Combining these strategies raised HDR percentages to as much as 90% at three tested loci (CXCR4, EMX1, and TLR) in standard HEK293 cells. Our approach offers a cost-effective, simple and broadly applicable gene editing method, thereby expanding the CRISPR/Cas9 genome editing toolbox. Summary: Precision gene editing occurs at a low percentage in mammalian cells using Cas9. Colocalization of donor with Cas9MAV and PEI delivery raises HDR occurrence.
biorxiv biochemistry 0-100-users 2018
Resistance gene discovery and cloning by sequence capture and association genetics, bioRxiv, 2018-01-16
Genetic resistance is the most economic and environmentally sustainable approach for crop disease protection. Disease resistance (R) genes from wild relatives are a valuable resource for breeding resistant crops. However, introgression of R genes into crops is a lengthy process often associated with co-integration of deleterious linked genes [1, 2], and pathogens can rapidly evolve to overcome R genes when deployed singly [3]. Introducing multiple cloned R genes into crops as a stack would avoid linkage drag and delay emergence of resistance-breaking pathogen races [4]. However, current R gene cloning methods require segregating or mutant progenies [5–10], which are difficult to generate for many wild relatives due to poor agronomic traits. We exploited natural pan-genome variation in a wild diploid wheat by combining association genetics with R gene enrichment sequencing (AgRenSeq) to clone four stem rust resistance genes in <6 months. RenSeq combined with diversity panels is therefore a major advance in isolating R genes for engineering broad-spectrum resistance in crops.
biorxiv genomics 100-200-users 2018
Stability of association between Arabidopsis thaliana and Pseudomonas pathogens over evolutionary time scales, bioRxiv, 2018-01-16
Summary: Crop disease outbreaks are often associated with clonal expansions of single pathogenic lineages. To determine whether similar boom-and-bust scenarios hold for wild plant pathogens, we carried out a multi-year multi-site survey of Pseudomonas in the natural host Arabidopsis thaliana. The most common Pseudomonas lineage corresponded to a pathogenic clade present in all sites. Sequencing of 1,524 Pseudomonas genomes revealed this lineage to have diversified approximately 300,000 years ago, containing dozens of genetically distinct pathogenic sublineages. These sublineages have expanded in parallel within the same populations and are differentiated both at the level of gene content and disease phenotype. Such coexistence of diverse sublineages indicates that in contrast to crop systems, no single strain has been able to overtake these A. thaliana populations in the recent past. Our results suggest that the selective pressures acting on a plant pathogen in wild hosts may be more complex than those in agricultural systems.
biorxiv microbiology 0-100-users 2018