Health and population effects of rare gene knockouts in adult humans with related parents, bioRxiv, 2015-11-15
Complete gene knockouts are highly informative about gene function. We exome sequenced 3,222 British Pakistani-heritage adults with high parental relatedness, discovering 1,111 rare-variant homozygous likely loss of function (rhLOF) genotypes predicted to disrupt (knockout) 781 genes. Based on depletion of rhLOF genotypes, we estimate that 13.6% of knockouts are incompatible with adult life, finding on average 1.6 heterozygous recessive lethal LOF variants per adult. Linking to lifelong health records, we observed no association of rhLOF genotypes with prescription- or doctor-consultation rate, and no disease-related phenotypes in 33 of 42 individuals with rhLOF genotypes in recessive Mendelian disease genes. Phased genome sequencing of a healthy PRDM9 knockout mother, her child and controls, showed meiotic recombination sites localised away from PRDM9-dependent hotspots, demonstrating PRDM9 redundancy in humans.
biorxiv genomics 0-100-users 2015
The Genetic cost of Neanderthal introgression, bioRxiv, 2015-10-31
Approximately 2-4% of genetic material in human populations outside Africa is derived from Neanderthals who interbred with anatomically modern humans. Recent studies have shown that this Neanderthal DNA is depleted around functional genomic regions; this has been suggested to be a consequence of harmful epistatic interactions between human and Neanderthal alleles. However, using published estimates of Neanderthal inbreeding and the distribution of mutational fitness effects, we infer that Neanderthals had at least 40% lower fitness than humans on average; this increased load predicts the reduction in Neanderthal introgression around genes without the need to invoke epistasis. We also predict a residual Neanderthal mutational load in non-Africans, leading to a fitness reduction of at least 0.5%. This effect of Neanderthal admixture has been left out of previous debate on mutation load differences between Africans and non-Africans. We also show that if many deleterious mutations are recessive, the Neanderthal admixture fraction could increase over time due to the protective effect of Neanderthal haplotypes against deleterious alleles that arose recently in the human population. This might partially explain why so many organisms retain gene flow from other species and appear to derive adaptive benefits from introgression.
biorxiv evolutionary-biology 0-100-users 2015
Deep neural networks: a new framework for modelling biological vision and brain information processing, bioRxiv, 2015-10-27
Recent advances in neural network modelling have enabled major strides in computer vision and other artificial intelligence applications. Human-level visual recognition abilities are coming within reach of artificial systems. Artificial neural networks are inspired by the brain and their computations could be implemented in biological neurons. Convolutional feedforward networks, which now dominate computer vision, take further inspiration from the architecture of the primate visual hierarchy. However, the current models are designed with engineering goals and not to model brain computations. Nevertheless, initial studies comparing internal representations between these models and primate brains find surprisingly similar representational spaces. With human-level performance no longer out of reach, we are entering an exciting new era, in which we will be able to build neurobiologically faithful feedforward and recurrent computational models of how biological brains perform high-level feats of intelligence, including vision.
biorxiv neuroscience 0-100-users 2015
Mash: fast genome and metagenome distance estimation using MinHash, bioRxiv, 2015-10-27
Mash extends the MinHash dimensionality-reduction technique to include a pairwise mutation distance and P-value significance test, enabling the efficient clustering and search of massive sequence collections. Mash reduces large sequences and sequence sets to small, representative sketches, from which global mutation distances can be rapidly estimated. We demonstrate several use cases, including the clustering of all 54,118 NCBI RefSeq genomes in 33 CPU hours; real-time database search using assembled or unassembled Illumina, Pacific Biosciences, and Oxford Nanopore data; and the scalable clustering of hundreds of metagenomic samples by composition. Mash is freely released under a BSD license (https://github.com/marbl/mash).
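The sketch-then-compare idea described in this abstract can be illustrated with a minimal Python sketch. It assumes a bottom-s MinHash sketch and the distance formula D = -(1/k) ln(2j / (1 + j)), where j is the estimated Jaccard index. The function names, the BLAKE2b hash, and the small default parameters are illustrative assumptions, not Mash's implementation, and canonical (strand-independent) k-mers are left out for brevity.

```python
import hashlib
import math


def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sketch(seq, k=5, s=16):
    """Bottom-s sketch: the s smallest 64-bit hashes of the k-mers.

    BLAKE2b is an assumption made for this illustration only.
    """
    hashes = {
        int.from_bytes(
            hashlib.blake2b(km.encode(), digest_size=8).digest(), "big"
        )
        for km in kmers(seq, k)
    }
    return sorted(hashes)[:s]


def jaccard_estimate(a, b, s):
    """Estimate the Jaccard index from two bottom-s sketches.

    Take the s smallest hashes of the merged sketches and count how
    many of them occur in both input sketches.
    """
    merged = sorted(set(a) | set(b))[:s]
    shared = set(a) & set(b)
    hits = sum(1 for h in merged if h in shared)
    return hits / len(merged)


def mash_distance(j, k):
    """Convert a Jaccard estimate j into a mutation-distance estimate."""
    if j == 0:
        return 1.0  # no shared k-mers: report the maximum distance
    d = -math.log(2 * j / (1 + j)) / k
    return min(1.0, max(0.0, d))


x = sketch("ACGTACGTTAGCCGATAGCTAGCTAACG")
y = sketch("ACGTACGTTAGCCGTTAGCTAGCTAACG")  # one substitution
j = jaccard_estimate(x, y, 16)
print(j, mash_distance(j, 5))
```

Real genomes call for much larger values of k and s than these toy defaults; the small values only keep the example readable.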
biorxiv bioinformatics 0-100-users 2015
RapMap: A Rapid, Sensitive and Accurate Tool for Mapping RNA-seq Reads to Transcriptomes, bioRxiv, 2015-10-23
Motivation: The alignment of sequencing reads to a transcriptome is a common and important step in many RNA-seq analysis tasks. When aligning RNA-seq reads directly to a transcriptome (as is common in the de novo setting or when a trusted reference annotation is available), care must be taken to report the potentially large number of multi-mapping locations per read. This can pose a substantial computational burden for existing aligners, and can considerably slow downstream analysis. Results: We introduce a novel concept, quasi-mapping, and an efficient algorithm implementing this approach for mapping sequencing reads to a transcriptome. By attempting only to report the potential loci of origin of a sequencing read, and not the base-to-base alignment by which it derives from the reference, RapMap, our tool implementing quasi-mapping, is capable of mapping sequencing reads to a target transcriptome substantially faster than existing alignment tools. The algorithm we employ to implement quasi-mapping uses several efficient data structures and takes advantage of the special structure of shared sequence prevalent in transcriptomes to rapidly provide highly accurate mapping information. We demonstrate how quasi-mapping can be successfully applied to the problems of transcript-level quantification from RNA-seq reads and the clustering of contigs from de novo assembled transcriptomes into biologically meaningful groups. Availability: RapMap is implemented in C++11 and is available as open-source software, under GPL v3, at https://github.com/COMBINE-lab/RapMap. Contact: rob.patro@cs.stonybrook.edu
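The idea of reporting only candidate transcripts of origin, without a base-to-base alignment, can be illustrated with a deliberately simplified Python sketch. It uses a plain k-mer lookup table and set intersection, which is an assumption made for illustration and not RapMap's algorithm or data structures. The transcript names and sequences are hypothetical.

```python
from collections import defaultdict


def build_index(transcripts, k):
    """Map every k-mer to the set of transcripts that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def candidate_transcripts(read, index, k):
    """Return transcripts consistent with every indexed k-mer of the read.

    No alignment is computed; k-mers missing from the index (for
    example because of a sequencing error) are simply skipped.
    """
    candidates = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if not hits:
            continue
        candidates = set(hits) if candidates is None else candidates & hits
    return candidates or set()


# Two hypothetical isoforms sharing the segment ACGTTGCAAGGC.
transcripts = {
    "tx1": "ACGTTGCAAGGCTTAACCGT",
    "tx2": "GGATCCACGTTGCAAGGCAT",
}
index = build_index(transcripts, k=5)

shared_read = "CGTTGCAAGG"  # falls in the shared segment: multi-maps
unique_read = "GCTTAACC"    # occurs only in tx1
print(candidate_transcripts(shared_read, index, 5))
print(candidate_transcripts(unique_read, index, 5))
```

The read drawn from the shared segment comes back with both transcripts, which is the multi-mapping case the abstract says must be reported.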
biorxiv bioinformatics 0-100-users 2015
Basset: Learning the regulatory code of the accessible genome with deep convolutional neural networks, bioRxiv, 2015-10-06
The complex language of eukaryotic gene expression remains incompletely understood. Despite the importance suggested by many noncoding variants statistically associated with human disease, nearly all such variants have unknown mechanism. Here, we address this challenge using an approach based on a recent machine learning advance: deep convolutional neural networks (CNNs). We introduce an open source package Basset (https://github.com/davek44/Basset) to apply CNNs to learn the functional activity of DNA sequences from genomics data. We trained Basset on a compendium of accessible genomic sites mapped in 164 cell types by DNaseI-seq and demonstrate far greater predictive accuracy than previous methods. Basset predictions for the change in accessibility between variant alleles were far greater for GWAS SNPs that are likely to be causal relative to nearby SNPs in linkage disequilibrium with them. With Basset, a researcher can perform a single sequencing assay in their cell type of interest and simultaneously learn that cell’s chromatin accessibility code and annotate every mutation in the genome with its influence on present accessibility and latent potential for accessibility. Thus, Basset offers a powerful computational approach to annotate and interpret the noncoding genome.
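To make the phrase "apply CNNs to DNA sequences" concrete, the following pure-Python sketch shows the basic operation such a model builds on: one-hot encoding a sequence and sliding a single convolutional filter over it, followed by a ReLU and global max pooling. It is not Basset's architecture or code. The filter weights are hand-set to a hypothetical "TATA" pattern rather than learned, and the bias is an arbitrary choice.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T).

    Any other symbol (for example N) becomes an all-zero row.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d_relu(x, weights, bias=0.0):
    """Slide one filter over the encoded sequence and apply a ReLU.

    weights has shape (filter_width, 4); the output has one value per
    valid window position.
    """
    width = len(weights)
    out = []
    for start in range(len(x) - width + 1):
        total = bias
        for offset in range(width):
            for channel in range(4):
                total += weights[offset][channel] * x[start + offset][channel]
        out.append(max(0.0, total))
    return out


# Hypothetical filter: scores +1 per base matching "TATA". With a bias
# of -3, only a perfect four-base match gives a positive activation.
tata_filter = one_hot("TATA")
activations = conv1d_relu(one_hot("GGCTATAAGC"), tata_filter, bias=-3.0)
best = max(activations)  # global max pooling over positions
print(activations.index(best), best)
```

In a trained network the filter weights would be learned from data, and many filters and further layers would be stacked on top of this step.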
biorxiv genomics 0-100-users 2015