Platform for rapid nanobody discovery in vitro, bioRxiv, 2017-06-17
Abstract: Camelid single-domain antibody fragments (“nanobodies”) provide the remarkable specificity of antibodies within a single immunoglobulin VHH domain. This unique feature enables applications ranging from their use as biochemical tools to therapeutic agents. Virtually all nanobodies reported to date have been obtained by animal immunization, a bottleneck restricting many applications of this technology. To solve this problem, we developed a fully in vitro platform for nanobody discovery based on yeast surface display of a synthetic nanobody scaffold. This platform provides a facile and cost-effective method for rapidly isolating nanobodies targeting a diverse range of antigens. We provide a blueprint for identifying nanobodies starting from both purified and non-purified antigens, and in addition, we demonstrate application of the platform to discover rare conformationally-selective nanobodies to a lipid flippase and a G protein-coupled receptor. To facilitate broad deployment of this platform, we have made the library and all associated protocols publicly available.
biorxiv biochemistry 0-100-users 2017
Identification of a novel interspecific hybrid yeast from a metagenomic spontaneously inoculated beer sample using Hi-C, bioRxiv, 2017-06-16
Abstract: Interspecific hybridization is a common mechanism enabling genetic diversification and adaptation; however, the detection of hybrid species has been quite difficult. The identification of microbial hybrids is even more complicated, as most environmental microbes are resistant to culturing and must be studied in their native mixed communities. We have previously adapted the chromosome conformation capture method Hi-C to the assembly of genomes from mixed populations. Here, we show the method’s application in assembling genomes directly from an uncultured, mixed population from a spontaneously inoculated beer sample. Our assembly method has enabled us to deconvolute 4 bacterial and 4 yeast genomes from this sample, including a putative yeast hybrid. Downstream isolation and analysis of this hybrid confirmed that its genome consists of the genome of Pichia membranifaciens and that of another related but undescribed yeast. Our work shows that Hi-C-based metagenomic methods can overcome the limitations of traditional sequencing methods in studying complex mixtures of genomes.
biorxiv genomics 0-100-users 2017
Beyond Consensus: Embracing Heterogeneity in Curated Neuroimaging Meta-Analysis, bioRxiv, 2017-06-14
Abstract: Coordinate-based meta-analysis can provide important insights into mind-brain relationships. A popular approach for curated small-scale meta-analysis is activation likelihood estimation (ALE), which identifies brain regions consistently activated across a selected set of experiments, such as within a functional domain or mental disorder. ALE can also be utilized in meta-analytic co-activation modeling (MACM) to identify brain regions consistently co-activated with a seed region. Therefore, ALE aims to find consensus across experiments, treating heterogeneity across experiments as noise. However, heterogeneity within an ALE analysis of a functional domain might indicate the presence of functional sub-domains. Similarly, heterogeneity within a MACM analysis might indicate the involvement of a seed region in multiple co-activation patterns that are dependent on task contexts. Here, we demonstrate the use of the author-topic model to automatically determine if heterogeneities within ALE-type meta-analyses can be robustly explained by a small number of latent patterns. In the first application, the author-topic modeling of experiments involving self-generated thought (N = 179) revealed cognitive components fractionating the default network. In the second application, the author-topic model revealed that the left inferior frontal junction (IFJ) participated in multiple task-dependent co-activation patterns (N = 323). Furthermore, the author-topic model estimates compared favorably with spatial independent component analysis in both simulation and real data. Overall, the results suggest that the author-topic model is a flexible tool for exploring heterogeneity in ALE-type meta-analyses that might arise from functional sub-domains, mental disorder subtypes or task-dependent co-activation patterns. Code for this study is publicly available (https://github.com/ThomasYeoLab/CBIG/tree/master/stable_projects/meta-analysis/Ngo2019_AuthorTopic).
biorxiv neuroscience 0-100-users 2017
Graphtyper: Population-scale genotyping using pangenome graphs, bioRxiv, 2017-06-10
Abstract: A fundamental requisite for genetic studies is an accurate determination of sequence variation. While human genome sequence diversity is increasingly well characterized, there is a need for efficient ways to utilize this knowledge in sequence analysis. Here we present Graphtyper, a publicly available novel algorithm and software for discovering and genotyping sequence variants. Graphtyper realigns short-read sequence data to a pangenome, a variation-aware graph structure that encodes sequence variation within a population by representing possible haplotypes as graph paths. Our results show that Graphtyper is fast, highly scalable, and provides sensitive and accurate genotype calls. Graphtyper genotyped 89.4 million sequence variants in whole-genomes of 28,075 Icelanders using less than 100,000 CPU days, including detailed genotyping of six human leukocyte antigen (HLA) genes. We show that Graphtyper is a valuable tool in characterizing sequence variation in population-scale sequencing studies.
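The idea of representing possible haplotypes as graph paths can be illustrated with a toy example. The sketch below is not Graphtyper's algorithm or data structure: it assumes a short reference string and a hypothetical list of non-overlapping variants, enumerates every path by brute force (which is exponential in the number of variants), and reports which haplotypes are consistent with a read. The function names `haplotype_paths` and `supporting_haplotypes` are invented for this illustration.

```python
from itertools import product


def haplotype_paths(reference, variants):
    """Spell out every path through a toy variation graph.

    `variants` is a list of (position, ref_allele, alt_allele) tuples with
    0-based positions; the variants are assumed not to overlap.
    """
    segments = []  # alternating shared segments and [ref, alt] choices
    cursor = 0
    for pos, ref, alt in sorted(variants):
        if reference[pos:pos + len(ref)] != ref:
            raise ValueError(f"reference allele mismatch at position {pos}")
        segments.append([reference[cursor:pos]])  # sequence shared by all paths
        segments.append([ref, alt])               # a bubble with two branches
        cursor = pos + len(ref)
    segments.append([reference[cursor:]])
    # Each combination of branch choices is one path, i.e. one haplotype.
    return ["".join(choice) for choice in product(*segments)]


def supporting_haplotypes(read, haplotypes):
    """Return the haplotypes that contain the read as an exact substring."""
    return [hap for hap in haplotypes if read in hap]


reference = "ACGTACGTAC"
variants = [(2, "G", "T"), (6, "G", "GA")]  # one SNP and one insertion
haplotypes = haplotype_paths(reference, variants)
print(haplotypes)
print(supporting_haplotypes("CTTACGA", haplotypes))
```

In this toy case the read spans both variant sites, so it is compatible with only one of the four paths. A real genotyper would index the graph and score inexact alignments rather than enumerate paths and test substrings.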
biorxiv bioinformatics 0-100-users 2017
Highly parallel genome variant engineering with CRISPR/Cas9 in eukaryotic cells, bioRxiv, 2017-06-09
Abstract: Direct measurement of functional effects of DNA sequence variants throughout a genome is a major challenge. We developed a method that uses CRISPR/Cas9 to engineer many specific variants of interest in parallel in the budding yeast Saccharomyces cerevisiae, and to screen them for functional effects. We used the method to examine the functional consequences of premature termination codons (PTCs) at different locations within all annotated essential genes in yeast. We found that most PTCs were highly deleterious unless they occurred close to the C-terminal end and did not interrupt an annotated protein domain. Surprisingly, we discovered that some putatively essential genes are dispensable, while others have large dispensable regions. This approach can be used to profile the effects of large classes of variants in a high-throughput manner.
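The sequence-level consequence of a PTC can be sketched in a few lines. The example below only models what a premature stop does to the encoded protein; it does not represent the guide or repair-template design used in the study. It assumes the standard genetic code, and the coding sequence and the helper names `translate` and `introduce_ptc` are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def introduce_ptc(cds, codon_index, stop="TAA"):
    """Replace the codon at `codon_index` (0-based) with a stop codon."""
    start = 3 * codon_index
    return cds[:start] + stop + cds[start + 3:]


cds = "ATGGCTAAAGGTTGGTAA"         # hypothetical toy gene
mutant = introduce_ptc(cds, 3)     # stop codon replaces the fourth codon
full, truncated = translate(cds), translate(mutant)
print(full, truncated, len(truncated) / len(full))
```

The fraction of the protein retained gives a rough measure of how close the PTC lies to the C-terminal end, which is the variable the abstract relates to tolerance.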
biorxiv genomics 0-100-users 2017
Integrating long-range connectivity information into de Bruijn graphs, bioRxiv, 2017-06-09
Abstract: Motivation: The de Bruijn graph is a simple and efficient data structure that is used in many areas of sequence analysis including genome assembly, read error correction and variant calling. The data structure has a single parameter k, is straightforward to implement and is tractable for large genomes with high sequencing depth. It also enables representation of multiple samples simultaneously to facilitate comparison. However, unlike the string graph, a de Bruijn graph does not retain long range information that is inherent in the read data. For this reason, applications that rely on de Bruijn graphs can produce sub-optimal results given their input. Results: We present a novel assembly graph data structure: the Linked de Bruijn Graph (LdBG). Constructed by adding annotations on top of a de Bruijn graph, it stores long range connectivity information through the graph. We show that with error-free data it is possible to losslessly store and recover sequence from a Linked de Bruijn graph. With assembly simulations we demonstrate that the LdBG data structure outperforms both the de Bruijn graph and the String Graph Assembler (SGA). Finally we apply the LdBG to Klebsiella pneumoniae short read data to make large (12 kbp) variant calls, which we validate using PacBio sequencing data, and to characterise the genomic context of drug-resistance genes. Availability: Linked de Bruijn Graphs and associated algorithms are implemented as part of McCortex, available under the MIT license at https://github.com/mcvean/mccortex. Contact: turner.isaac@gmail.com.
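A toy example can make the role of the link annotations concrete. The sketch below is a simplified illustration, not McCortex's link format or traversal algorithm: it builds a de Bruijn graph from two hypothetical reads, records for each fork which branch a read took after passing through an earlier k-mer, and uses those records to walk through a repeated k-mer. All function names and the link representation are assumptions made for this example.

```python
from collections import defaultdict


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_graph(reads, k):
    """Plain de Bruijn graph: k-mer -> set of k-mers that follow it in a read."""
    graph = defaultdict(set)
    for read in reads:
        ks = kmers(read, k)
        for a, b in zip(ks, ks[1:]):
            graph[a].add(b)
    return graph


def build_links(reads, k, graph):
    """Annotations: (earlier k-mer, fork k-mer) -> branches taken by reads."""
    links = defaultdict(set)
    for read in reads:
        ks = kmers(read, k)
        for j in range(1, len(ks) - 1):
            if len(graph[ks[j]]) > 1:          # ks[j] is a fork
                for i in range(j):             # every k-mer seen before it
                    links[(ks[i], ks[j])].add(ks[j + 1])
    return links


def walk(start, graph, links, max_steps=1000):
    """Follow the graph from `start`, using links to resolve forks."""
    path = [start]
    for _ in range(max_steps):
        successors = graph.get(path[-1], set())
        if len(successors) == 1:
            nxt = next(iter(successors))
        elif len(successors) > 1:
            nxt = None
            # Consult the most recently visited k-mers first.
            for earlier in reversed(path[:-1]):
                choice = links.get((earlier, path[-1]), set())
                if len(choice) == 1:
                    nxt = next(iter(choice))
                    break
            if nxt is None:
                break                          # fork cannot be resolved
        else:
            break                              # dead end
        path.append(nxt)
    return path[0] + "".join(node[-1] for node in path[1:])


reads = ["ATCGCTA", "CTACGCGA"]                # the 3-mer CGC occurs in both
graph = build_graph(reads, 3)
links = build_links(reads, 3, graph)
print(walk("ATC", graph, {}))                  # plain graph: stops at the fork
print(walk("ATC", graph, links))               # with links: passes both forks
```

Without the annotations the walk halts at the repeated k-mer because both outgoing branches look equally valid; with them, the read that entered the repeat determines the exit.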
biorxiv bioinformatics 0-100-users 2017