Ohana: detecting selection in multiple populations by modelling ancestral admixture components, bioRxiv, 2019-02-15
One of the most powerful and commonly used methods for detecting local adaptation in the genome is the identification of extreme allele frequency differences between populations. In this paper, we present a new maximum likelihood method for finding regions under positive selection. The method is based on a Gaussian approximation to allele frequency changes and it incorporates admixture between populations. The method can analyze multiple populations simultaneously and retains power to detect selection signatures specific to ancestry components that are not representative of any extant populations. We evaluate the method using simulated data and compare it to related methods based on summary statistics. We also apply it to human genomic data and identify loci with extreme genetic differentiation between major geographic groups. Many of the genes identified are previously known selected loci relating to hair pigmentation and morphology, skin and eye pigmentation. We also identify new candidate regions, including various selected loci in the Native American component of admixed Mexican-Americans. These involve diverse biological functions, like immunity, fat distribution, food intake, vision and hair development.
biorxiv bioinformatics 100-200-users 2019
Global Signal Regression Strengthens Association between Resting-State Functional Connectivity and Behavior, bioRxiv, 2019-02-14
Global signal regression (GSR) is one of the most debated preprocessing strategies for resting-state functional MRI. GSR effectively removes global artifacts driven by motion and respiration, but also discards globally distributed neural information and introduces negative correlations between certain brain regions. The vast majority of previous studies have focused on the effectiveness of GSR in removing imaging artifacts, as well as its potential biases. Given the growing interest in functional connectivity fingerprinting, here we considered the utilitarian question of whether GSR strengthens or weakens associations between resting-state functional connectivity (RSFC) and multiple behavioral measures across cognition, personality and emotion. By applying the variance component model to the Brain Genomics Superstruct Project (GSP), we found that behavioral variance explained by whole-brain RSFC increased by an average of 47% across 23 behavioral measures after GSR. In the Human Connectome Project (HCP), we found that behavioral variance explained by whole-brain RSFC increased by an average of 40% across 58 behavioral measures, when GSR was applied after ICA-FIX de-noising. To ensure generalizability, we repeated our analyses using kernel regression. GSR improved behavioral prediction accuracies by an average of 64% and 12% in the GSP and HCP datasets respectively. Importantly, the results were consistent across methods. A behavioral measure with greater RSFC-explained variance (using the variance component model) also exhibited greater prediction accuracy (using kernel regression). A behavioral measure with greater improvement in behavioral variance explained after GSR (using the variance component model) also enjoyed greater improvement in prediction accuracy after GSR (using kernel regression). Furthermore, GSR appeared to benefit task performance measures more than self-reported measures. 
Since GSR was more effective at removing motion-related and respiratory-related artifacts, GSR-related increases in variance explained and prediction accuracies were unlikely to be the result of motion-related or respiratory-related artifacts. However, it is worth emphasizing that the current study focused on whole-brain RSFC, so it remains unclear whether GSR improves RSFC-behavioral associations for specific connections or networks. Overall, our results suggest that, at least for young healthy adults, GSR strengthens the associations between RSFC and most (although not all) behavioral measures. Code for the variance component model and ridge regression can be found here: https://github.com/ThomasYeoLab/CBIG/tree/master/stable_projects/preprocessing/Li2019_GSR.
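For readers unfamiliar with the preprocessing step itself, the following is a minimal sketch of global signal regression on synthetic data. It is not the authors' pipeline: the function name `global_signal_regression`, the array layout (time points by voxels), and the simulated signals are all assumptions made for illustration. It shows the global mean time course being regressed out of every voxel, and how the average correlation between voxels drops as a result.

```python
import numpy as np


def global_signal_regression(data):
    """Regress the global mean time course out of every column.

    data: array of shape (n_timepoints, n_voxels). Hypothetical layout,
    chosen for this sketch only.
    Returns the residual time series, same shape as the input.
    """
    global_signal = data.mean(axis=1)  # mean across voxels at each time point
    # Design matrix: intercept plus the global signal.
    design = np.column_stack([np.ones_like(global_signal), global_signal])
    beta, *_ = np.linalg.lstsq(design, data, rcond=None)
    return data - design @ beta


def mean_offdiag_corr(data):
    """Average pairwise correlation between columns, excluding the diagonal."""
    corr = np.corrcoef(data, rowvar=False)
    mask = ~np.eye(corr.shape[0], dtype=bool)
    return corr[mask].mean()


# Synthetic example: one shared fluctuation plus independent noise per voxel.
rng = np.random.default_rng(0)
shared = rng.standard_normal((200, 1))
data = shared + 0.5 * rng.standard_normal((200, 50))

cleaned = global_signal_regression(data)
mean_r_before = mean_offdiag_corr(data)
mean_r_after = mean_offdiag_corr(cleaned)
print(mean_r_before, mean_r_after)
```

Because the residuals are orthogonal to the global signal, any component shared by all voxels is removed, which is also why the correlations between regions shift downward after GSR.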
biorxiv neuroscience 100-200-users 2019
Exercise twice-a-day potentiates skeletal muscle signalling responses associated with mitochondrial biogenesis in humans, which are independent of lowered muscle glycogen content, bioRxiv, 2019-02-12
Endurance exercise begun with reduced muscle glycogen stores seems to potentiate skeletal muscle protein abundance and gene expression. However, it is unknown whether these greater signalling responses are due to low muscle glycogen per se or to performing two exercise sessions in close proximity - as a first exercise session is necessary to reduce the muscle glycogen stores. In the present study, we manipulated the recovery duration between a first muscle glycogen-depleting exercise session and a second exercise session, such that the second session started with reduced muscle glycogen in both approaches but was performed either 2 h or 15 h after the first (the so-called twice-a-day and once-daily approaches, respectively). We found that exercise twice-a-day increased the nuclear abundance of transcription factor EB (TFEB) and nuclear factor of activated T cells (NFAT) and potentiated the transcription of the peroxisome proliferator-activated receptor-γ coactivator 1 alpha (PGC-1α), peroxisome proliferator-activated receptor alpha (PPARα) and peroxisome proliferator-activated receptor beta/delta (PPARβ/δ) genes, in comparison with once-daily exercise. These results suggest that the elevated molecular signalling reported with previous train-low approaches can be attributed to performing two exercise sessions in close proximity rather than to the reduced muscle glycogen content per se. The twice-a-day approach might be an effective strategy to induce adaptations related to mitochondrial biogenesis and fat oxidation.
biorxiv molecular-biology 100-200-users 2019
Full-length mRNA sequencing reveals principles of poly(A) tail length control, bioRxiv, 2019-02-12
Although mRNAs are key molecules for understanding life, there exists no method to determine the full-length sequence of endogenous mRNAs including their poly(A) tails. Moreover, although poly(A) tails can be modified in functionally important ways, there also exists no method to accurately sequence them. Here, we present FLAM-seq, a rapid and simple method for high-quality sequencing of entire mRNAs. We report a cDNA library preparation method coupled to single-molecule sequencing to perform FLAM-seq. Using human cell lines, brain organoids, and C. elegans we show that FLAM-seq delivers high-quality full-length mRNA sequences for thousands of different genes per sample. We find that (a) 3' UTR length is correlated with poly(A) tail length, (b) alternative polyadenylation sites and alternative promoters for the same gene are linked to different tail lengths, (c) tails contain a significant number of cytosines. Thus, we provide a widely useful method and fundamental insights into poly(A) tail regulation.
biorxiv systems-biology 100-200-users 2019
SyRI: identification of syntenic and rearranged regions from whole-genome assemblies, bioRxiv, 2019-02-12
We present SyRI, an efficient tool for genome-wide identification of structural rearrangements (SR) from genome graphs, which are built up from pair-wise whole-genome alignments. Instead of searching for differences, SyRI starts by finding all co-linear regions between the genomes. As all remaining regions are SRs by definition, they can be classified as inversions, translocations, or duplications based on their positions in convoluted networks of repetitive alignments. Finally, SyRI reports local variations like SNPs and indels within syntenic and rearranged regions. We show SyRI’s broad applicability to multiple species and genetically validate the presence of ~100 translocations identified in Arabidopsis.
biorxiv bioinformatics 100-200-users 2019
Transposable elements contribute to dynamic genome content in maize, bioRxiv, 2019-02-12
Transposable elements (TEs) are ubiquitous components of eukaryotic genomes and can create variation in genomic organization. The majority of maize genomes are composed of TEs. We developed an approach to define shared and variable TE insertions across genome assemblies and applied this method to four maize genomes (B73, W22, Mo17, and PH207). Among these genomes we identified 1.6 Gb of variable TE sequence representing a combination of recent TE movement and deletion of previously existing TEs. Although recent TE movement only accounted for a portion of the TE variability, we identified 4,737 TEs unique to one genome with defined insertion sites in all other genomes. Variable TEs are found for all superfamilies and are distributed across the genome, including in regions of recent shared ancestry among individuals. There are 2,380 genes annotated in the B73 genome located within variable TEs, providing evidence for the role of TEs in contributing to the substantial differences in gene content among these genotypes. The large scope of TE variation present in this limited sample of temperate maize genomes highlights the major contribution of TEs in driving variation in genome organization and gene content.
biorxiv genomics 100-200-users 2019