Genomics of a complete butterfly continent, bioRxiv, 2019-11-05
Never before have we had the luxury of choosing a continent, picking a large phylogenetic group of animals, and obtaining genomic data for its every species. Here, we sequence all 845 species of butterflies recorded from North America north of Mexico. Our comprehensive approach reveals the pattern of diversification and adaptation occurring in this phylogenetic lineage as it has spread over the continent, which cannot be seen on a sample of selected species. We observe bursts of diversification that generated taxonomic ranks subfamily, tribe, subtribe, genus, and species. The older burst around 70 Mya resulted in the butterfly subfamilies, with the major evolutionary inventions being unique phenotypic traits shaped by high positive selection and gene duplications. The recent burst around 5 Mya is caused by explosive radiation in diverse butterfly groups associated with diversification in transcription and mRNA regulation, morphogenesis, and mate selection. Rapid radiation correlates with more frequent introgression of speciation-promoting and beneficial genes among radiating species. Radiation and extinction patterns over the last 100 million years suggest the following general model of animal evolution. A population spreads over the land, adapts to various conditions through mutations, and diversifies into several species. Occasional hybridization between these species results in accumulation of beneficial alleles in one, which eventually survives, while others become extinct. Not only butterflies, but also the hominids may have followed this path.
biorxiv genomics 500+-users 2019
Architectural RNA is required for heterochromatin organization, bioRxiv, 2019-09-28
In addition to its known roles in protein synthesis and enzyme catalysis, RNA has been proposed to stabilize higher-order chromatin structure. To distinguish presumed architectural roles of RNA from other functions, we applied a ribonuclease digestion strategy to our CUT&RUN in situ chromatin profiling method (CUT&RUN.RNase). We find that depletion of RNA compromises association of the murine nucleolar protein Nucleophosmin with pericentric heterochromatin and alters the chromatin environment of CCCTC-binding factor (CTCF) bound regions. Strikingly, we find that RNA maintains the integrity of both constitutive (H3K9me3 marked) and facultative (H3K27me3 marked) heterochromatic regions as compact domains, but only moderately stabilizes euchromatin. To establish the specificity of heterochromatin stabilization by RNA, we performed CUT&RUN on cells deleted for the Firre long non-coding RNA and observed disruption of H3K27me3 domains on several chromosomes. We conclude that RNA maintains local and global chromatin organization by acting as a structural scaffold for heterochromatic domains.
biorxiv genomics 100-200-users 2019
Removing reference bias in ancient DNA data analysis by mapping to a sequence variation graph, bioRxiv, 2019-09-27
Background: During the last decade, the analysis of ancient DNA (aDNA) sequence has become a powerful tool for the study of past human populations. However, the degraded nature of aDNA means that aDNA sequencing reads are short, single-ended and frequently mutated by post-mortem chemical modifications. All these features decrease read mapping accuracy and increase reference bias, in which reads containing non-reference alleles are less likely to be mapped than those containing reference alleles. Recently, alternative approaches for read mapping and genetic variation analysis have been developed that replace the linear reference by a variation graph which includes all the alternative variants at each genetic locus. Here, we evaluate the use of variation graph software vg to avoid reference bias for ancient DNA. Results: We used vg to align multiple previously published aDNA samples to a variation graph containing 1000 Genomes Project variants, and compared these with the same data aligned with bwa to the human linear reference genome. We show that use of vg leads to a much more balanced allelic representation at polymorphic sites and better variant detection in comparison with bwa, especially in the presence of post-mortem changes, effectively removing reference bias. A recently published approach that filters bwa alignments using modified reads also removes bias, but has lower sensitivity than vg. Conclusions: Our findings demonstrate that aligning aDNA sequences to variation graphs allows recovering a higher fraction of non-reference variation and effectively mitigates the impact of reference bias in population genetics analyses using aDNA, while retaining mapping sensitivity.
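One common way to make "balanced allelic representation" concrete is to pool reads over sites known to be heterozygous and ask what fraction carry the reference allele; without reference bias the expectation is about 0.5. The sketch below illustrates that idea only. The function name and the toy read counts are assumptions made for illustration, and this is not necessarily the metric the authors used.

```python
from typing import Iterable, Tuple


def reference_allele_fraction(site_counts: Iterable[Tuple[int, int]]) -> float:
    """Pooled fraction of reads carrying the reference allele.

    site_counts holds (ref_reads, alt_reads) pairs, one per heterozygous site.
    A value near 0.5 suggests balanced mapping; values above 0.5 suggest
    that reads with the non-reference allele are being lost.
    """
    ref_total = 0
    alt_total = 0
    for ref_reads, alt_reads in site_counts:
        ref_total += ref_reads
        alt_total += alt_reads
    total = ref_total + alt_total
    if total == 0:
        raise ValueError("no reads observed at the supplied sites")
    return ref_total / total


# Hypothetical counts, invented for illustration (not data from the study).
toy_linear_alignment = [(7, 3), (6, 4), (8, 2)]
toy_graph_alignment = [(5, 5), (6, 4), (4, 6)]

linear_fraction = reference_allele_fraction(toy_linear_alignment)
graph_fraction = reference_allele_fraction(toy_graph_alignment)
```

Pooling counts across sites, rather than averaging per-site fractions, keeps low-coverage sites from dominating the estimate, which matters for sparse aDNA data.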
biorxiv genomics 100-200-users 2019
Hackflex low cost Illumina sequencing library construction for high sample counts, bioRxiv, 2019-09-23
We developed Hackflex, a low-cost method for the production of Illumina-compatible sequencing libraries that allows up to 11 times more libraries for high-throughput Illumina sequencing to be generated at a fixed cost. Quality of library preparation was tested by constructing libraries from E. coli MG1655 genomic DNA using either Hackflex, standard Nextera Flex or a variation of standard Nextera Flex in which the bead-linked transposase is diluted prior to use. We demonstrated that Hackflex can produce high quality libraries and yields a highly uniform coverage, equivalent to the standard Nextera Flex kit. Using Hackflex, we were able to achieve a per-sample reagent cost of library prep of A$8.66, which is 8.23 times lower than the standard Nextera Flex protocol at advertised retail price. An additional simple modification to the protocol enables a further price reduction, to about A$6.50 per sample, or up to 11-fold lower than the standard protocol. This method will allow researchers to construct more libraries within a given budget, thereby yielding more data and facilitating research programs where sequencing large numbers of libraries is beneficial.
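The cost figures above can be checked with a few lines of arithmetic. The sketch below takes only the three numbers stated in the abstract (A$8.66, 8.23-fold, A$6.50); the implied standard-kit price is derived from them and is not quoted in the abstract, and the A$1,000 budget is a hypothetical value chosen for illustration.

```python
# Figures stated in the abstract.
HACKFLEX_COST_AUD = 8.66        # per-sample reagent cost with Hackflex
FOLD_VS_STANDARD = 8.23         # Hackflex is this many times cheaper
MODIFIED_COST_AUD = 6.50        # approximate cost after the extra modification


def implied_standard_cost(hackflex_cost: float, fold: float) -> float:
    """Per-sample standard cost implied by the stated fold difference."""
    return hackflex_cost * fold


def libraries_within_budget(budget: float, cost_per_library: float) -> int:
    """Whole number of libraries affordable for a given reagent budget."""
    return int(budget // cost_per_library)


standard_cost = implied_standard_cost(HACKFLEX_COST_AUD, FOLD_VS_STANDARD)
modified_fold = standard_cost / MODIFIED_COST_AUD

# Hypothetical budget, for illustration only.
budget_aud = 1000.0
n_standard = libraries_within_budget(budget_aud, standard_cost)
n_hackflex = libraries_within_budget(budget_aud, HACKFLEX_COST_AUD)
n_modified = libraries_within_budget(budget_aud, MODIFIED_COST_AUD)
```

Under these assumptions the implied standard cost is about A$71.27 per sample, and dividing it by A$6.50 gives roughly 11, which is consistent with the "up to 11-fold" figure.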
biorxiv genomics 200-500-users 2019
Gene capture by transposable elements leads to epigenetic conflict, bioRxiv, 2019-09-21
Plant transposable elements (TEs) regularly capture fragments of host genes. When the host employs siRNAs to silence these TEs, siRNAs homologous to the captured regions may target both the TEs and the genes, potentially leading to their silencing. This epigenetic cross-talk establishes an intragenomic conflict: silencing the TEs comes with the potential cost of silencing the genes. If the genes are important, however, natural selection will act to maintain function by moderating the silencing response. Such moderation may advantage the TEs. Here, we examined the potential for these epigenetic conflicts by focusing on three TE families in maize: Helitrons, Pack-MULEs and Sirevirus LTR retrotransposons. We documented 1,508 TEs with fragments captured from 2,019 donor genes and characterized the epigenetic profiles of both. Consistent with epigenetic conflict, donor genes mapped more siRNAs and were more methylated than ‘free’ genes that had no evidence of exon capture. However, these patterns differed between syntelog and transposed donor genes. Syntelog genes appeared to maintain function, consistent with moderation of the epigenetic response for important genes before reaching a deleterious threshold, while transposed genes bore the signature of silencing and potential pseudogenization. Intriguingly, transposed genes were overrepresented among donor genes, suggesting a link between capture and gene movement. We also investigated the potential for TEs to gain an advantage. TEs with captured fragments were older, mapped fewer siRNAs and had lower levels of methylation than ‘free’ TEs without gene fragments, but they showed no obvious evidence of increased copy numbers. Altogether, our results demonstrate that TE capture triggers an epigenetic conflict when genes are important, contrasting the loss of function for genes that are not under strong selective constraint. The evidence for an advantage to TEs is currently less obvious.
biorxiv genomics 0-100-users 2019
A multi-tissue transcriptome analysis of human metabolites guides the interpretability of associations based on multi-SNP models for gene expression, bioRxiv, 2019-09-20
There is particular interest in transcriptome-wide association studies (TWAS) - gene-level tests based on multi-SNP predictive models of gene expression - for identifying causal genes at loci associated with complex traits. However, interpretation of TWAS associations may be complicated by divergent effects of model SNPs on trait phenotype and gene expression. We developed an iterative modelling scheme for obtaining multi-SNP models of gene expression and applied this framework to generate expression models for 43 human tissues from the Genotype-Tissue Expression (GTEx) Project. We characterized the performance of single- and multi-SNP TWAS models for identifying causal genes in GWAS data for 46 circulating metabolites. We show that (a) multi-SNP models captured more variation in expression than the top cis-eQTL (median 2-fold improvement); (b) predicted expression based on multi-SNP models was associated (FDR < 0.01) with metabolite levels for 826 unique gene-metabolite pairs, but, after step-wise conditional analyses, 90% were dominated by a single eQTL SNP; (c) amongst the 35% of associations where a SNP in the expression model was a significant cis-eQTL and metabolomic-QTL (met-QTL), 92% demonstrated colocalization between these signals, but interpretation was often complicated by incomplete overlap of QTLs in multi-SNP models; (d) using a “truth” set of causal genes at 61 met-QTLs, the sensitivity was high (67%), but the positive predictive value was low, as only 8% of TWAS associations at met-QTLs involved true causal genes. These results guide the interpretation of TWAS and highlight the need for corroborative data to provide confident assignment of causality.
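Point (d) contrasts two standard metrics: sensitivity (the share of true causal genes that TWAS recovers) and positive predictive value (the share of TWAS associations that are true causal genes). The sketch below uses the textbook definitions with invented counts, chosen only to show how a high sensitivity can coexist with a low positive predictive value; the counts are not taken from the study.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of true causal genes that were detected."""
    return true_positives / (true_positives + false_negatives)


def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Fraction of reported associations that involve a true causal gene."""
    return true_positives / (true_positives + false_positives)


# Hypothetical counts, invented for illustration (not data from the study).
tp = 4    # causal genes with a TWAS association
fn = 2    # causal genes missed by TWAS
fp = 46   # TWAS associations at non-causal genes

toy_sensitivity = sensitivity(tp, fn)
toy_ppv = positive_predictive_value(tp, fp)
```

With these toy counts most causal genes are detected, yet they make up a small minority of all associations, which is the pattern that motivates the call for corroborative data.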
biorxiv genomics 100-200-users 2019