A critical comparison of technologies for a plant genome sequencing project, bioRxiv, 2017-10-12
A high-quality genome sequence of your model organism is an essential starting point for many studies. Old clone-based methods are slow and expensive, whereas faster, cheaper short-read-only assemblies can be incomplete and highly fragmented, which limits their usefulness. The last few years have seen the introduction of many new technologies for genome assembly. These new technologies and new algorithms are typically benchmarked on microbial genomes or, if they scale appropriately, human. However, plant genomes can be much more repetitive and larger than human, and plant biology makes obtaining high-quality DNA free from contaminants difficult. Reflecting their challenging nature, we observe that plant genome assembly statistics are typically poorer than for vertebrates. Here we compare Illumina short read, PacBio long read, 10x Genomics linked reads, Dovetail Hi-C and BioNano Genomics optical maps, singly and combined, in producing high-quality long-range genome assemblies of the potato species S. verrucosum. We benchmark the assemblies for completeness and accuracy, as well as DNA, compute requirements and sequencing costs. We expect our results will be helpful to other genome projects, and that these datasets will be used in benchmarking by assembly algorithm developers.
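The abstract refers to assembly statistics without naming one. A widely used contiguity statistic is N50: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The sketch below is only an illustration of that definition; the function name `n50` and the toy length lists are hypothetical and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half the
        # total assembly size is the N50.
        if running >= half_total:
            return length


# Two hypothetical assemblies of the same total size (1000 bp).
fragmented = [100] * 10          # many short contigs
contiguous = [700, 200, 100]     # a few long contigs

print(n50(fragmented))  # 100
print(n50(contiguous))  # 700
```

Both toy assemblies contain the same number of bases, but the fragmented one has a much lower N50, which is the kind of difference a contiguity comparison between technologies would report.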
biorxiv genomics 0-100-users 2017
A cancer pharmacogenomic screen powering crowd-sourced advancement of drug combination prediction, bioRxiv, 2017-10-10
The effectiveness of most cancer targeted therapies is short lived since tumors evolve and develop resistance. Combinations of drugs offer the potential to overcome resistance; however, the number of possible combinations is vast, necessitating data-driven approaches to find optimal treatments tailored to a patient’s tumor. AstraZeneca carried out 11,576 experiments on 910 drug combinations across 85 cancer cell lines, recapitulating in vivo response profiles. These data, the largest openly available screen, were hosted by DREAM alongside deep molecular characterization from the Sanger Institute for a Challenge to computationally predict synergistic drug pairs and associated biomarkers. 160 teams participated to provide the most comprehensive methodological development and subsequent benchmarking to date. Winning methods incorporated prior knowledge of putative drug-target interactions. For >60% of drug combinations, synergy was reproducibly predicted with an accuracy matching biological replicate experiments; however, 20% of drug combinations were poorly predicted by all methods. Genomic rationales for synergy predictions were identified, including antagonism unique to combined PIK3CB/D inhibition with the ADAM17 inhibitor, where synergy is seen with other PI3K pathway inhibitors. All data, methods and code are freely available as a resource to the community.
biorxiv bioinformatics 0-100-users 2017
Assessment of batch-correction methods for scRNA-seq data with a new test metric, bioRxiv, 2017-10-10
Single-cell transcriptomics is a versatile tool for exploring heterogeneous cell populations. As with all genomics experiments, batch effects can hamper data integration and interpretation. The success of batch effect correction is often evaluated by visual inspection of dimension-reduced representations such as principal component analysis. This is inherently imprecise due to the high number of genes and non-normal distribution of gene expression. Here, we present a k-nearest neighbour batch effect test (kBET, https://github.com/theislab/kBET) to quantitatively measure batch effects. kBET is easier to interpret, more sensitive and more robust than visual evaluation and other measures of batch effects. We use kBET to assess commonly used batch regression and normalisation approaches, and quantify the extent to which they remove batch effects while preserving biological variability. Our results illustrate that batch correction based on log-transformation or scran pooling followed by ComBat reduced the batch effect while preserving structure across data sets. Finally we show that kBET can pinpoint successful data integration methods across multiple data sets, in this case from different publications all charting mouse embryonic development. This has important implications for future data integration efforts, which will be central to projects such as the Human Cell Atlas where data for the same tissue may be generated in multiple locations around the world. [Before final publication, we will upload the R package to Bioconductor]
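The abstract does not spell out how a k-nearest neighbour batch effect test works. One way to build such a test, assumed here for illustration and not a reimplementation of the kBET R package, is to compare the batch composition of each cell's neighbourhood with the global batch proportions using a Pearson chi-squared test, and to report the fraction of tested cells for which the test rejects. The function names, parameters and toy data below are hypothetical, and the sketch is restricted to two batches so that the p-value has a closed form.

```python
import math
import random


def chi2_pvalue_df1(stat):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def knn_batch_rejection_rate(points, batches, k=10, alpha=0.05, n_tests=50, seed=0):
    """Fraction of tested cells whose k-neighbourhood deviates from global batch mixing.

    Simplifying assumption of this sketch: exactly two batches (1 degree of freedom).
    """
    labels = sorted(set(batches))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two batches")
    n = len(points)
    if k > n:
        raise ValueError("k must not exceed the number of cells")
    # Expected share of each batch if cells were perfectly mixed.
    global_frac = [batches.count(lab) / n for lab in labels]
    rng = random.Random(seed)
    tested = rng.sample(range(n), min(n_tests, n))
    rejected = 0
    for i in tested:
        # Brute-force nearest neighbours by Euclidean distance
        # (the neighbourhood includes the cell itself).
        order = sorted(range(n), key=lambda j: math.dist(points[i], points[j]))
        neighbours = order[:k]
        stat = 0.0
        for lab, frac in zip(labels, global_frac):
            observed = sum(1 for j in neighbours if batches[j] == lab)
            expected = frac * k
            stat += (observed - expected) ** 2 / expected
        if chi2_pvalue_df1(stat) < alpha:
            rejected += 1
    return rejected / len(tested)


# Toy data: two batches of 20 cells each in a 2-D embedding.
batch_labels = ["a"] * 20 + ["b"] * 20
# Well mixed: every batch-a cell has a batch-b cell right next to it.
mixed = [(float(i), 0.0) for i in range(20)] + [(float(i), 0.1) for i in range(20)]
# Batch effect: the two batches form distant clusters.
separated = [(i * 0.1, 0.0) for i in range(20)] + [(100.0 + i * 0.1, 0.0) for i in range(20)]

print(knn_batch_rejection_rate(mixed, batch_labels))      # 0.0
print(knn_batch_rejection_rate(separated, batch_labels))  # 1.0
```

A rejection rate near 0 indicates neighbourhoods that mirror the global batch mix, while a rate near 1 indicates that batches occupy separate regions of the embedding. Real data would need an efficient neighbour search and support for more than two batches.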
biorxiv bioinformatics 0-100-users 2017
High-Precision Automated Reconstruction of Neurons with Flood-filling Networks, bioRxiv, 2017-10-10
Reconstruction of neural circuits from volume electron microscopy data requires the tracing of complete cells including all their neurites. Automated approaches have been developed to perform the tracing, but without costly human proofreading their error rates are too high to obtain reliable circuit diagrams. We present a method for automated segmentation that, like the majority of previous efforts, employs convolutional neural networks, but contains in addition a recurrent pathway that allows the iterative optimization and extension of the reconstructed shape of individual neural processes. We used this technique, which we call flood-filling networks, to trace neurons in a data set obtained by serial block-face electron microscopy from a male zebra finch brain. Our method achieved a mean error-free neurite path length of 1.1 mm, an order of magnitude better than previously published approaches applied to the same dataset. Only 4 mergers were observed in a neurite test set of 97 mm path length.
biorxiv neuroscience 0-100-users 2017
Predictive Coding of Novel versus Familiar Stimuli in the Primary Visual Cortex, bioRxiv, 2017-10-04
To explore theories of predictive coding, we presented mice with repeated sequences of images with novel images sparsely substituted. Under these conditions, mice could be rapidly trained to lick in response to a novel image, demonstrating a high level of performance on the first day of testing. Using 2-photon calcium imaging to record from layer 2/3 neurons in the primary visual cortex, we found that novel images evoked excess activity in the majority of neurons. When a new stimulus sequence was repeatedly presented, a majority of neurons had similarly elevated activity for the first few presentations, which then decayed to almost zero activity. The decay time of these transient responses was not fixed, but instead scaled with the length of the stimulus sequence. However, at the same time, we also found a small fraction of the neurons within the population (∼2%) that continued to respond strongly and periodically to the repeated stimulus. Decoding analysis demonstrated that both the transient and sustained responses encoded information about stimulus identity. We conclude that the layer 2/3 population uses a two-channel predictive code: a dense transient code for novel stimuli and a sparse sustained code for familiar stimuli. These results extend and unify existing theories about the nature of predictive neural codes.
biorxiv neuroscience 0-100-users 2017
Directed evolution of TurboID for efficient proximity labeling in living cells and organisms, bioRxiv, 2017-10-03
Protein interaction networks and protein compartmentation underlie every signaling process and regulatory mechanism in cells. Recently, proximity labeling (PL) has emerged as a new approach to study the spatial and interaction characteristics of proteins in living cells. However, the two enzymes commonly used for PL come with tradeoffs – BioID is slow, requiring tagging times of 18-24 hours, while APEX peroxidase uses substrates that have limited cell permeability and high toxicity. To address these problems, we used yeast display-based directed evolution to engineer two mutants of biotin ligase, TurboID and miniTurbo, with much greater catalytic efficiency than BioID, and the ability to carry out PL in cells in much shorter time windows (as little as 10 minutes) with non-toxic and easily deliverable biotin. In addition to shortening PL time by 100-fold and increasing PL yield in cell culture, TurboID enabled biotin-based PL in new settings, including yeast, Drosophila, and C. elegans.
biorxiv bioengineering 0-100-users 2017