Analysis of protein-coding genetic variation in 60,706 humans, bioRxiv, 2015-10-31
SummaryLarge-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) sequence data for 60,706 individuals of diverse ethnicities generated as part of the Exome Aggregation Consortium (ExAC). The resulting catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We show that this catalogue can be used to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; we identify 3,230 genes with near-complete depletion of truncating variants, 72% of which have no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human “knockout” variants in protein-coding genes.
biorxiv genomics 200-500-users 2015The Genetic cost of Neanderthal introgression, bioRxiv, 2015-10-31
AbstractApproximately 2-4% of genetic material in human populations outside Africa is derived from Neanderthals who interbred with anatomically modern humans. Recent studies have shown that this Neanderthal DNA is depleted around functional genomic regions; this has been suggested to be a consequence of harmful epistatic interactions between human and Neanderthal alleles. However, using published estimates of Neanderthal inbreeding and the distribution of mutational fitness effects, we infer that Neanderthals had at least 40% lower fitness than humans on average; this increased load predicts the reduction in Neanderthal introgression around genes without the need to invoke epistasis. We also predict a residual Neanderthal mutational load in non-Africans, leading to a fitness reduction of at least 0.5%. This effect of Neanderthal admixture has been left out of previous debate on mutation load differences between Africans and non-Africans. We also show that if many deleterious mutations are recessive, the Neanderthal admixture fraction could increase over time due to the protective effect of Neanderthal haplotypes against deleterious alleles that arose recently in the human population. This might partially explain why so many organisms retain gene flow from other species and appear to derive adaptive benefits from introgression.
biorxiv evolutionary-biology 0-100-users 2015Deep neural networks a new framework for modelling biological vision and brain information processing, bioRxiv, 2015-10-27
Recent advances in neural network modelling have enabled major strides in computer vision and other artificial intelligence applications. Human-level visual recognition abilities are coming within reach of artificial systems. Artificial neural networks are inspired by the brain and their computations could be implemented in biological neurons. Convolutional feedforward networks, which now dominate computer vision, take further inspiration from the architecture of the primate visual hierarchy. However, the current models are designed with engineering goals and not to model brain computations. Nevertheless, initial studies comparing internal representations between these models and primate brains find surprisingly similar representational spaces. With human-level performance no longer out of reach, we are entering an exciting new era, in which we will be able to build neurobiologically faithful feedforward and recurrent computational models of how biological brains perform high-level feats of intelligence, including vision.
biorxiv neuroscience 0-100-users 2015Mash fast genome and metagenome distance estimation using MinHash, bioRxiv, 2015-10-27
ABSTRACTMash extends the MinHash dimensionality-reduction technique to include a pairwise mutation distance and P-value significance test, enabling the efficient clustering and search of massive sequence collections. Mash reduces large sequences and sequence sets to small, representative sketches, from which global mutation distances can be rapidly estimated. We demonstrate several use cases, including the clustering of all 54,118 NCBI RefSeq genomes in 33 CPU hours; real-time database search using assembled or unassembled Illumina, Pacific Biosciences, and Oxford Nanopore data; and the scalable clustering of hundreds of metagenomic samples by composition. Mash is freely released under a BSD license (<jatsext-link xmlnsxlink=httpwww.w3.org1999xlink ext-link-type=uri xlinkhref=httpsgithub.commarblmash>httpsgithub.commarblmash<jatsext-link>).
biorxiv bioinformatics 0-100-users 2015RapMap A Rapid, Sensitive and Accurate Tool for Mapping RNA-seq Reads to Transcriptomes, bioRxiv, 2015-10-23
AbstractMotivation The alignment of sequencing reads to a transcriptome is a common and important step in many RNA-seq analysis tasks. When aligning RNA-seq reads directly to a transcriptome (as is common in the de novo setting or when a trusted reference annotation is available), care must be taken to report the potentially large number of multi-mapping locations per read. This can pose a substantial computational burden for existing aligners, and can considerably slow downstream analysis.Results We introduce a novel concept, quasi-mapping, and an efficient algorithm implementing this approach for mapping sequencing reads to a transcriptome. By attempting only to report the potential loci of origin of a sequencing read, and not the base-to-base alignment by which it derives from the reference, RapMap— our tool implementing quasi-mapping— is capable of mapping sequencing reads to a target transcriptome substantially faster than existing alignment tools. The algorithm we employ to implement quasi-mapping uses several efficient data structures and takes advantage of the special structure of shared sequence prevalent in transcriptomes to rapidly provide highly-accurate mapping information. We demonstrate how quasi-mapping can be successfully applied to the problems of transcript-level quantification from RNA-seq reads and the clustering of contigs from de novo assembled transcriptomes into biologically-meaningful groups.AvailabilityRapMap is implemented in C++11 and is available as open-source software, under GPL v3, at <jatsext-link xmlnsxlink=httpwww.w3.org1999xlink ext-link-type=uri xlinkhref=httpsgithub.comCOMBINE-labRapMap>httpsgithub.comCOMBINE-labRapMap<jatsext-link>.Contactrob.patro@cs.stonybrook.edu
biorxiv bioinformatics 0-100-users 2015Relative Citation Ratio (RCR) A new metric that uses citation rates to measure influence at the article level, bioRxiv, 2015-10-23
AbstractDespite their recognized limitations, bibliometric assessments of scientific productivity have been widely adopted. We describe here an improved method that makes novel use of the co-citation network of each article to field-normalize the number of citations it has received. The resulting Relative Citation Ratio is article-level and field-independent, and provides an alternative to the invalid practice of using Journal Impact Factors to identify influential papers. To illustrate one application of our method, we analyzed 88,835 articles published between 2003 and 2010, and found that the National Institutes of Health awardees who authored those papers occupy relatively stable positions of influence across all disciplines. We demonstrate that the values generated by this method strongly correlate with the opinions of subject matter experts in biomedical research, and suggest that the same approach should be generally applicable to articles published in all areas of science. A beta version of iCite, our web tool for calculating Relative Citation Ratios of articles listed in PubMed, is available at <jatsext-link xmlnsxlink=httpwww.w3.org1999xlink ext-link-type=uri xlinkhref=httpsicite.od.nih.gov>httpsicite.od.nih.gov<jatsext-link>.
biorxiv scientific-communication-and-education 200-500-users 2015