Nucleotide sequence and DNaseI sensitivity are predictive of 3D chromatin architecture, bioRxiv, 2017-01-28
Abstract: Recently, Hi-C has been used to probe the 3D chromatin architecture of multiple organisms and cell types. The resulting collections of pairwise contacts across the genome have connected chromatin architecture to many cellular phenomena, including replication timing and gene regulation. However, high-resolution (10 kb or finer) contact maps remain scarce due to the expense and time required for collection. A computational method for predicting pairwise contacts without the need to run a Hi-C experiment would be invaluable in understanding the role that 3D chromatin architecture plays in genome biology. We describe Rambutan, a deep convolutional neural network that predicts Hi-C contacts at 1 kb resolution using nucleotide sequence and DNaseI assay signal as inputs. Specifically, Rambutan identifies locus pairs that engage in high-confidence contacts according to Fit-Hi-C, a previously described method for assigning statistical confidence estimates to Hi-C contacts. We first demonstrate Rambutan’s performance across chromosomes at 1 kb resolution in the GM12878 cell line. Subsequently, we measure Rambutan’s performance across six cell types. In this setting, the model achieves an area under the receiver operating characteristic curve between 0.7662 and 0.8246 and an area under the precision-recall curve between 0.3737 and 0.9008. We further demonstrate that the predicted contacts exhibit expected trends relative to histone modification ChIP-seq data, replication timing measurements, and annotations of functional elements such as promoters and enhancers. Finally, we predict Hi-C contacts for 53 human cell types and show that the predictions cluster by cellular function.
[NOTE: After our original submission we discovered an error in our calling of statistically significant contacts. Briefly, when calculating the prior probability of a contact, we used the number of contacts at a certain genomic distance in a chromosome but divided by the total number of bins in the full genome. When we corrected this mistake we noticed that the Rambutan model, as it currently stands, did not outperform simply using the GM12878 contact map that Rambutan was trained on as the predictor in other cell types. While we investigate these new results, we ask that readers treat this manuscript skeptically.]
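The denominator mismatch described in the note can be illustrated with a toy calculation. This is a sketch only, not the authors' code: the function name, the bin counts, and the contact count are all hypothetical, and the assumption that the matching denominator is the number of bins in the same chromosome is ours, since the note does not state the corrected formula.

```python
def contact_prior(contacts_at_distance: int, n_bins: int) -> float:
    """Toy prior: contacts observed at one genomic distance, per bin.

    Hypothetical helper for illustration; the real Fit-Hi-C-style prior
    used in the manuscript may be defined differently.
    """
    return contacts_at_distance / n_bins


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
chrom_bins = 1_000      # bins in the chromosome the contacts came from
genome_bins = 20_000    # bins in the full genome
contacts = 50           # contacts at one distance within that chromosome

# Numerator from one chromosome, denominator from the whole genome.
mismatched = contact_prior(contacts, genome_bins)

# Numerator and denominator from the same chromosome (our assumption).
matched = contact_prior(contacts, chrom_bins)

# The mismatched prior is smaller by the factor genome_bins / chrom_bins.
shrink_factor = matched / mismatched
print(f"mismatched={mismatched:.4f} matched={matched:.4f} "
      f"factor={shrink_factor:.1f}")
```

Under these assumptions the prior is too small by the ratio of genome bins to chromosome bins. An underestimated prior makes observed contact counts look more surprising than they are, which would affect which contacts are called significant.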
biorxiv bioinformatics 0-100-users 2017
Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types, bioRxiv, 2017-01-26
Abstract: Genetics can provide a systematic approach to discovering the tissues and cell types relevant for a complex disease or trait. Identifying these tissues and cell types is critical for following up on non-coding allelic function, developing ex-vivo models, and identifying therapeutic targets. Here, we analyze gene expression data from several sources, including the GTEx and PsychENCODE consortia, together with genome-wide association study (GWAS) summary statistics for 48 diseases and traits with an average sample size of 169,331, to identify disease-relevant tissues and cell types. We develop and apply an approach that uses stratified LD score regression to test whether disease heritability is enriched in regions surrounding genes with the highest specific expression in a given tissue. We detect tissue-specific enrichments at FDR < 5% for 34 diseases and traits across a broad range of tissues that recapitulate known biology. In our analysis of traits with observed central nervous system enrichment, we detect an enrichment of neurons over other brain cell types for several brain-related traits, enrichment of inhibitory over excitatory neurons for bipolar disorder but excitatory over inhibitory neurons for schizophrenia and body mass index, and enrichments in the cortex for schizophrenia and in the striatum for migraine. In our analysis of traits with observed immunological enrichment, we identify enrichments of T cells for asthma and eczema, B cells for primary biliary cirrhosis, and myeloid cells for Alzheimer's disease, which we validated with independent chromatin data. Our results demonstrate that our polygenic approach is a powerful way to leverage gene expression data for interpreting GWAS signal.
biorxiv genetics 0-100-users 2017
DNA-dependent RNA cleavage by the Natronobacterium gregoryi Argonaute, bioRxiv, 2017-01-21
Abstract: We show here that, unlike most other prokaryotic Argonaute (Ago) proteins, which are DNA-guided endonucleases, the Natronobacterium gregoryi-derived Ago (NgAgo) can function as a DNA-guided endoribonuclease, cleaving RNA, rather than DNA, in a targeted manner. The NgAgo protein, in complex with 5’-hydroxylated or 5’-phosphorylated oligodeoxyribonucleotides (ODNs) of variable lengths, split RNA targets into two or more fragments in vitro, suggesting its physiological role in bacteria and demonstrating a potential for degrading RNA molecules such as mRNA or lncRNA in eukaryotic cells in a targeted manner.
biorxiv biochemistry 0-100-users 2017
Modern human origins: multiregional evolution of autosomes and East Asia origin of Y and mtDNA, bioRxiv, 2017-01-19
Abstract: The neutral theory has been used as a null model for interpreting nature and produced the Recent Out of Africa model of anatomically modern humans. Recent studies, however, have established that genetic diversities are mostly at maximum saturation levels maintained by selection, therefore challenging the explanatory power of the neutral theory and rendering the present molecular model of human origins untenable. Using improved methods and public data, we have revisited human evolution and found sharing of genetic variations among racial groups to be largely a result of parallel mutations rather than recent common ancestry and admixture as commonly assumed. We derived an age of 1.86-1.92 million years for the first split in modern human populations based on autosomal diversity data. We found evidence of modern Y and mtDNA originating in East Asia and dispersing via hybridization with archaic humans. Analyses of autosomes, Y and mtDNA all suggest that Denisovan and Neanderthal were archaic Africans with Eurasian admixtures and ancestors of South Asia Negritos and Aboriginal Australians. Verifying our model, we found more ancestry of Southern Chinese from Hunan in Africans relative to other East Asian groups examined. These results suggest multiregional evolution of autosomes and replacements of archaic Y and mtDNA by modern ones originating in East Asia, thereby leading to a coherent account of modern human origins.
biorxiv evolutionary-biology 0-100-users 2017
Untangling intelligence, psychopathy, antisocial personality disorder, & conduct problems: A meta-analytic review, bioRxiv, 2017-01-18
Abstract: Substantial research has investigated the association between intelligence and psychopathic traits. The findings to date have been inconsistent and have not always considered the multi-dimensional nature of psychopathic traits. Moreover, there has been a tendency to confuse psychopathy with other closely related, clinically significant disorders. The current study represents a meta-analysis conducted to evaluate the direction and magnitude of the association of intelligence with global psychopathy, as well as its factors and facets, and related disorders (Antisocial Personality Disorder, Conduct Disorder, and Oppositional Defiant Disorder). Our analyses revealed a small, significant, negative relationship between intelligence and total psychopathy (r = -.07, p = .001). Analysis of factors and facets found differential associations, including both significant positive (e.g., interpersonal facet) and negative (e.g., affective facet) associations, further affirming that psychopathy is a multi-dimensional construct. Additionally, intelligence was negatively associated with Antisocial Personality Disorder (r = -.13, p = .001) and Conduct Disorder (r = -.11, p = .001), but positively with Oppositional Defiant Disorder (r = .06, p = .001). There was significant heterogeneity across studies for most effects, but the results of moderator analyses were inconsistent. Finally, bias analyses did not find significant evidence for publication bias or outsized effects of outliers.
biorxiv neuroscience 0-100-users 2017
Evaluation of Oxford Nanopore MinION™ Sequencing for 16S rRNA Microbiome Characterization, bioRxiv, 2017-01-13
Abstract: In this manuscript we evaluate the potential for microbiome characterization by sequencing of near-full length 16S rRNA gene region fragments using the Oxford Nanopore MinION (hereafter ‘Nanopore’) sequencing platform. We analyzed pure-culture E. coli and P. fluorescens, as well as a low-diversity mixed community sample from hydraulic fracturing produced water. Both closed and open reference operational taxonomic unit (OTU) picking failed, necessitating the direct use of sequences without OTU picking. The Ribosomal Database Project classifier against the Greengenes database was found to be the optimal annotation approach, with average pure-culture annotation accuracies of 93.8% and 82.0% at the phylum and genus levels, respectively. Comparative analysis of an environmental sample using Nanopore and Illumina MiSeq sequencing identified high taxonomic similarity when using a weighted metric (Bray-Curtis), and significantly reduced similarity when using an unweighted metric (Jaccard). These results highlight the great potential of Nanopore sequencing to analyze broad microbial community trends, and the challenge of applying Nanopore sequencing to discern rare taxa in mixed microbial communities. Finally, we observed that between-run carryover following washes on the same flowcell accounted for >10% of sequence reads, necessitating future development to either prevent carryover or filter sequences of interest (e.g. barcoding).
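The contrast between the weighted and unweighted metrics can be made concrete with a small sketch. The taxon labels and read counts below are invented for illustration and are not data from the study; the two functions are the standard Bray-Curtis and Jaccard definitions, expressed as similarities (1 minus the dissimilarity).

```python
def bray_curtis_similarity(a: dict, b: dict) -> float:
    """Abundance-weighted similarity: 2 * sum(min counts) / total counts."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return 2 * shared / total


def jaccard_similarity(a: dict, b: dict) -> float:
    """Presence/absence similarity: shared taxa / all taxa observed."""
    present_a = {t for t, n in a.items() if n > 0}
    present_b = {t for t, n in b.items() if n > 0}
    return len(present_a & present_b) / len(present_a | present_b)


# Hypothetical read counts per taxon for the same sample on two platforms.
# Both profiles share the dominant taxa but disagree on rare ones.
nanopore = {"taxon_A": 900, "taxon_B": 95, "rare_1": 5}
illumina = {"taxon_A": 880, "taxon_B": 100, "rare_2": 12, "rare_3": 8}

bc = bray_curtis_similarity(nanopore, illumina)
jc = jaccard_similarity(nanopore, illumina)
print(f"Bray-Curtis similarity: {bc:.3f}")  # dominated by abundant taxa
print(f"Jaccard similarity:     {jc:.3f}")  # every taxon counts equally
```

In this toy case the two profiles agree closely on the abundant taxa, so the weighted similarity is high, while the rare taxa that appear on only one platform pull the unweighted similarity down. This is the same qualitative pattern the abstract reports.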
biorxiv microbiology 0-100-users 2017