Identification of transcriptional signatures for cell types from single-cell RNA-Seq, bioRxiv, 2018-02-13
Abstract: Single-cell RNA-Seq makes it possible to characterize the transcriptomes of cell types and identify their transcriptional signatures via differential analysis. We present a fast and accurate method for discriminating cell types that takes advantage of the large numbers of cells that are assayed. When applied to transcript compatibility counts obtained via pseudoalignment, our approach provides a quantification-free analysis of 3’ single-cell RNA-Seq that can identify previously undetectable marker genes.
biorxiv bioinformatics 100-200-users 2018
Genome-wide association analyses of risk tolerance and risky behaviors in over 1 million individuals identify hundreds of loci and shared genetic influences, bioRxiv, 2018-02-12
Abstract: Humans vary substantially in their willingness to take risks. In a combined sample of over one million individuals, we conducted genome-wide association studies (GWAS) of general risk tolerance, adventurousness, and risky behaviors in the driving, drinking, smoking, and sexual domains. We identified 611 approximately independent genetic loci associated with at least one of our phenotypes, including 124 with general risk tolerance. We report evidence of substantial shared genetic influences across general risk tolerance and risky behaviors: 72 of the 124 general risk tolerance loci contain a lead SNP for at least one of our other GWAS, and general risk tolerance is moderately to strongly genetically correlated (estimates up to 0.50) with a range of risky behaviors. Bioinformatics analyses imply that genes near general-risk-tolerance-associated SNPs are highly expressed in brain tissues and point to a role for glutamatergic and GABAergic neurotransmission. We find no evidence of enrichment for genes previously hypothesized to relate to risk tolerance.
biorxiv genetics 200-500-users 2018
Impact of genetically engineered maize on agronomic, environmental and toxicological traits: a meta-analysis of 21 years of field data, Scientific Reports, 2018-02-09
Despite the extensive cultivation of genetically engineered (GE) maize and the considerable number of scientific reports on its agro-environmental impact, the risks and benefits of GE maize are still being debated and concerns about safety remain. This meta-analysis aimed at increasing knowledge on agronomic, environmental and toxicological traits of GE maize by analyzing the peer-reviewed literature (from 1996 to 2016) on yield, grain quality, non-target organisms (NTOs), target organisms (TOs) and soil biomass decomposition. Results provided strong evidence that GE maize performed better than its near isogenic line: grain yield was 5.6 to 24.5% higher, with lower concentrations of mycotoxins (−28.8%), fumonisin (−30.6%) and trichothecenes (−36.5%). The NTOs analyzed were not affected by GE maize, except for Braconidae, represented by a parasitoid of European corn borer, the target of Lepidoptera-active Bt maize. Biogeochemical cycle parameters such as lignin content in stalks and leaves did not vary, whereas biomass decomposition was higher in GE maize. The results support the cultivation of GE maize, mainly due to enhanced grain quality and reduction of human exposure to mycotoxins. Furthermore, the reduction of the parasitoid of the target and the lack of consistent effects on other NTOs are confirmed.
scientific reports genetics 500+-users 2018
Deep Convolutional Neural Networks for Breast Cancer Histology Image Analysis, bioRxiv, 2018-02-08
Abstract: Breast cancer is one of the main causes of cancer death worldwide. Early diagnostics significantly increases the chances of correct treatment and survival, but this process is tedious and often leads to disagreement between pathologists. Computer-aided diagnosis systems have shown potential for improving diagnostic accuracy. In this work, we develop a computational approach based on deep convolutional neural networks for breast cancer histology image classification. The hematoxylin and eosin stained breast histology microscopy image dataset is provided as a part of the ICIAR 2018 Grand Challenge on Breast Cancer Histology Images. Our approach utilizes several deep neural network architectures and a gradient boosted trees classifier. For the 4-class classification task, we report 87.2% accuracy. For the 2-class classification task to detect carcinomas, we report 93.8% accuracy, AUC 97.3%, and sensitivity/specificity 96.5/88.0% at the high-sensitivity operating point. To our knowledge, this approach outperforms other common methods in automated histopathological image classification. The source code for our approach is made publicly available at https://github.com/alexander-rakhlin/ICIAR2018
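The accuracy, sensitivity, and specificity figures quoted in the abstract above are all functions of confusion-matrix counts at a chosen score threshold. The following is a minimal sketch of how such metrics and a "high-sensitivity operating point" could be computed for a binary classifier. It is not the authors' code: the function names, the threshold-selection rule (highest threshold that still meets a target sensitivity), and the toy scores are assumptions made purely for illustration.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP, FP, TN, FN when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label:
            tp += 1
        elif predicted_positive and not label:
            fp += 1
        elif not predicted_positive and label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity (TP rate) and specificity (TN rate)."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("no samples")
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


def high_sensitivity_point(scores, labels, min_sensitivity=0.95):
    """Hypothetical selection rule: scan thresholds from high to low and
    return the first one whose sensitivity reaches min_sensitivity.
    Lowering the threshold never decreases sensitivity and never increases
    specificity, so the first hit has the best specificity among qualifying
    thresholds. Returns (threshold, metrics), or None if no threshold
    qualifies."""
    for threshold in sorted(set(scores), reverse=True):
        metrics = binary_metrics(*confusion_counts(scores, labels, threshold))
        if metrics["sensitivity"] >= min_sensitivity:
            return threshold, metrics
    return None


if __name__ == "__main__":
    # Toy data, invented for illustration only (1 = carcinoma, 0 = non-carcinoma).
    toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
    toy_labels = [1, 1, 0, 1, 0, 0]
    print(high_sensitivity_point(toy_scores, toy_labels, min_sensitivity=1.0))
```

Choosing the operating point this way trades specificity for sensitivity, which is the usual motivation in a screening setting where a missed carcinoma costs more than a false alarm.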
biorxiv pathology 100-200-users 2018
Integrating Hi-C links with assembly graphs for chromosome-scale assembly, bioRxiv, 2018-02-08
Abstract: Long-read sequencing and novel long-range assays have revolutionized de novo genome assembly by automating the reconstruction of reference-quality genomes. In particular, Hi-C sequencing is becoming an economical method for generating chromosome-scale scaffolds. Despite its increasing popularity, there are limited open-source tools available. Errors, particularly inversions and fusions across chromosomes, remain more frequent than with alternative scaffolding technologies. We present a novel open-source Hi-C scaffolder that does not require an a priori estimate of chromosome number and minimizes errors by scaffolding with the assistance of an assembly graph. We demonstrate higher accuracy than the state-of-the-art methods across a variety of Hi-C library preparations and input assembly sizes. The Python and C++ code for our method is openly available at https://github.com/machinegun/SALSA
Author summary: Hi-C technology was originally proposed to study the 3D organization of a genome. Recently, it has also been applied to assemble large eukaryotic genomes into chromosome-scale scaffolds. Despite this, there are few open-source methods to generate these assemblies. Existing methods are also prone to small inversion errors due to noise in the Hi-C data. In this work, we address these challenges and develop a method named SALSA2. SALSA2 uses sequence overlap information from an assembly graph to correct inversion errors and provide accurate chromosome-scale assemblies.
biorxiv bioinformatics 100-200-users 2018
A proposal for a standardized bacterial taxonomy based on genome phylogeny, bioRxiv, 2018-01-31
Abstract: Taxonomy is a fundamental organizing principle of biology, which ideally should be based on evolutionary relationships. Microbial taxonomy has been greatly restricted by the inability to obtain most microorganisms in pure culture and, to a lesser degree, the historical use of phenotypic properties as the basis for classification. However, we are now at the point of obtaining genome sequences broadly representative of microbial diversity by using culture-independent techniques, which provide the opportunity to develop a comprehensive genome-based taxonomy. Here we propose a standardized bacterial taxonomy based on a concatenated protein phylogeny that conservatively removes polyphyletic groups and normalizes ranks based on relative evolutionary divergence. From 94,759 bacterial genomes, 99 phyla are described including six major normalized monophyletic units from the subdivision of the Proteobacteria, and amalgamation of the Candidate Phyla Radiation into the single phylum Patescibacteria. In total, 73% of taxa had one or more changes to their existing taxonomy.
biorxiv microbiology 200-500-users 2018