Pan-cancer analysis of whole genomes, bioRxiv, 2017-07-13
We report the integrative analysis of more than 2,600 whole cancer genomes and their matching normal tissues across 39 distinct tumour types. By studying whole genomes we have been able to catalogue non-coding cancer driver events, study patterns of structural variation, infer tumour evolution, probe the interactions among variants in the germline genome, the tumour genome and the transcriptome, and derive an understanding of how coding and non-coding variations together contribute to driving individual patients' tumours. This work represents the most comprehensive look at cancer whole genomes to date. NOTE TO READERS: This is an incomplete draft of the marker paper for the Pan-Cancer Analysis of Whole Genomes Project, and is intended to provide the background information for a series of in-depth papers that will be posted to bioRxiv during the summer of 2017.
biorxiv cancer-biology 0-100-users 2017
Why Does the Neocortex Have Columns, A Theory of Learning the Structure of the World, bioRxiv, 2017-07-13
Neocortical regions are organized into columns and layers. Connections between layers run mostly perpendicular to the surface, suggesting a columnar functional organization. Some layers have long-range excitatory lateral connections, suggesting interactions between columns. Similar patterns of connectivity exist in all regions, but their exact role remains a mystery. In this paper, we propose a network model composed of columns and layers that performs robust object learning and recognition. Each column integrates its changing input over time to learn complete predictive models of observed objects. Excitatory lateral connections across columns allow the network to more rapidly infer objects based on the partial knowledge of adjacent columns. Because columns integrate input over time and space, the network learns models of complex objects that extend well beyond the receptive field of individual cells. Our network model introduces a new feature to cortical columns. We propose that a representation of location relative to the object being sensed is calculated within the sub-granular layers of each column. The location signal is provided as an input to the network, where it is combined with sensory data. Our model contains two layers and one or more columns. Simulations show that, using Hebbian-like learning rules, small single-column networks can learn to recognize hundreds of objects, with each object containing tens of features. Multi-column networks recognize objects with significantly fewer movements of the sensory receptors. Given the ubiquity of columnar and laminar connectivity patterns throughout the neocortex, we propose that columns and regions have more powerful recognition and modeling capabilities than previously assumed.
biorxiv neuroscience 100-200-users 2017
Text mining of 15 million full-text scientific articles, bioRxiv, 2017-07-12
Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, due to their availability. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823–2016. We describe the development in article length and publication sub-topics during these nearly 250 years. We showcase the potential of text mining by extracting published protein–protein, disease–gene, and protein subcellular associations using a named entity recognition system, and quantitatively report on their accuracy using gold standard benchmark data sets. We subsequently compare the findings to corresponding results obtained on 16.5 million abstracts included in MEDLINE and show that text mining of full-text articles consistently outperforms using abstracts only.
biorxiv bioinformatics 100-200-users 2017
The evolutionary history of 2,658 cancers, bioRxiv, 2017-07-12
Cancer develops through a process of somatic evolution. Here, we use whole-genome sequencing of 2,778 tumour samples from 2,658 donors to reconstruct the life history, evolution of mutational processes, and driver mutation sequences of 39 cancer types. The early phases of oncogenesis are driven by point mutations in a small set of driver genes, often including biallelic inactivation of tumour suppressors. Early oncogenesis is also characterised by specific copy number gains, such as trisomy 7 in glioblastoma or isochromosome 17q in medulloblastoma. By contrast, increased genomic instability, a nearly four-fold diversification of driver genes, and an acceleration of point mutation processes are features of later stages. Copy-number alterations often occur in mitotic crises leading to simultaneous gains of multiple chromosomal segments. Timing analysis suggests that driver mutations often precede diagnosis by many years, and in some cases decades, providing a window of opportunity for early cancer detection.
biorxiv cancer-biology 200-500-users 2017
Sequential regulatory activity prediction across chromosomes with convolutional neural networks, bioRxiv, 2017-07-11
Models for predicting phenotypic outcomes from genotypes have important applications to understanding genomic function and improving human health. Here, we develop a machine-learning system to predict cell type-specific epigenetic and transcriptional profiles in large mammalian genomes from DNA sequence alone. Using convolutional neural networks, this system identifies promoters and distal regulatory elements and synthesizes their content to make effective gene expression predictions. We show that model predictions for the influence of genomic variants on gene expression align well to causal variants underlying eQTLs in human populations and can be useful for generating mechanistic hypotheses to enable fine mapping of disease loci.
biorxiv genomics 0-100-users 2017
Speed breeding a powerful tool to accelerate crop research and breeding, bioRxiv, 2017-07-10
The growing human population and a changing environment have raised significant concern for global food security, with the current improvement rate of several important crops inadequate to meet future demand [1]. This slow improvement rate is attributed partly to the long generation times of crop plants. Here we present a method called ‘speed breeding’, which greatly shortens generation time and accelerates breeding and research programs. Speed breeding can be used to achieve up to 6 generations per year for spring wheat (Triticum aestivum), durum wheat (T. durum), barley (Hordeum vulgare), chickpea (Cicer arietinum), and pea (Pisum sativum) and 4 generations for canola (Brassica napus), instead of 2-3 under normal glasshouse conditions. We demonstrate that speed breeding in fully-enclosed controlled-environment growth chambers can accelerate plant development for research purposes, including phenotyping of adult plant traits, mutant studies, and transformation. The use of supplemental lighting in a glasshouse environment allows rapid generation cycling through single seed descent and potential for adaptation to larger-scale crop improvement programs. Cost-saving through LED supplemental lighting is also outlined. We envisage great potential for integrating speed breeding with other modern crop breeding technologies, including high-throughput genotyping, genome editing, and genomic selection, accelerating the rate of crop improvement.
biorxiv plant-biology 200-500-users 2017