High molecular weight DNA isolation method from diverse plant species for use with Oxford Nanopore sequencing, bioRxiv, 2019-09-27
Abstract
The ability to generate long reads on the Oxford Nanopore Technologies sequencing platform depends on isolating high molecular weight DNA that is free of impurities. For some taxa, this is relatively straightforward. For plants, however, the presence of cell walls and a diverse set of specialized metabolites, such as lignin, phenolics, alkaloids, terpenes, and flavonoids, presents significant challenges to generating DNA suitable for producing long reads. Success in generating long reads and plant genome assemblies has been reported using diverse DNA isolation methods, some of which were tailored to the target species and/or required extensive labor. To avoid the need to optimize DNA isolation for each species, we developed a taxa-independent DNA isolation method that is relatively simple and efficient. This method expands on the Oxford Nanopore Technologies protocol for high molecular weight genomic DNA from plant leaves. It uses a conventional cetyl trimethylammonium bromide (CTAB) extraction, followed by removal of impurities and short DNA fragments with commercially available kits. It yielded robust N50 read lengths and yields on Oxford Nanopore Technologies flow cells.
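N50 is the read-length summary reported above. It is the length L such that reads of length at least L together contain at least half of all sequenced bases. The function below is a minimal illustrative sketch; the name `n50` is hypothetical and is not taken from any Oxford Nanopore tool.

```python
def n50(read_lengths):
    """Return the N50 of a collection of read lengths.

    Reads are sorted from longest to shortest and their lengths accumulated;
    the N50 is the length of the read at which the running total first
    reaches at least half of all bases.
    """
    if not read_lengths:
        raise ValueError("need at least one read length")
    total = sum(read_lengths)
    running = 0
    for length in sorted(read_lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe "running >= total / 2"
            return length


# Toy read lengths: 54 bases in total, and 10 + 9 + 8 = 27 reaches half.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Because N50 weights reads by the bases they contain, a few very long reads can raise it markedly. Read count alone does not move it this way, which is why it is the usual summary of long-read library quality.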
biorxiv plant-biology 100-200-users 2019
Removing reference bias in ancient DNA data analysis by mapping to a sequence variation graph, bioRxiv, 2019-09-27
Abstract
Background: During the last decade, the analysis of ancient DNA (aDNA) sequences has become a powerful tool for studying past human populations. However, because aDNA is degraded, aDNA sequencing reads are short, single-ended, and frequently altered by post-mortem chemical modifications. All these features decrease read mapping accuracy and increase reference bias, in which reads containing non-reference alleles are less likely to be mapped than those containing reference alleles. Recently, alternative approaches for read mapping and genetic variation analysis have been developed that replace the linear reference with a variation graph, which includes all the alternative variants at each genetic locus. Here, we evaluate the use of the variation graph software vg to avoid reference bias for ancient DNA.
Results: We used vg to align multiple previously published aDNA samples to a variation graph containing 1000 Genomes Project variants. We compared these alignments with the same data aligned with bwa to the human linear reference genome. Using vg leads to a much more balanced allelic representation at polymorphic sites and better variant detection than bwa, especially in the presence of post-mortem changes, effectively removing reference bias. A recently published approach that filters bwa alignments using modified reads also removes bias, but has lower sensitivity than vg.
Conclusions: Our findings demonstrate that aligning aDNA sequences to variation graphs recovers a higher fraction of non-reference variation. It also effectively mitigates the impact of reference bias in population genetics analyses using aDNA, while retaining mapping sensitivity.
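One common way to summarize the allelic balance discussed above is the fraction of reads carrying the reference allele at sites known to be heterozygous. An unbiased mapper should give about 0.5 there, and values above 0.5 indicate reference bias. The sketch below assumes per-site reference and alternative read counts are already available, for example from a pileup. The function names are hypothetical, and this is not necessarily the exact metric used in the study.

```python
from statistics import mean


def reference_allele_fractions(ref_counts, alt_counts):
    """Fraction of reads supporting the reference allele at each heterozygous site.

    Sites with no covering reads are skipped to avoid dividing by zero.
    """
    return [
        ref / (ref + alt)
        for ref, alt in zip(ref_counts, alt_counts)
        if ref + alt > 0
    ]


def mean_reference_bias(ref_counts, alt_counts):
    """Mean excess of the reference-allele fraction over the 0.5 expected at
    heterozygous sites; positive values indicate reference bias."""
    return mean(reference_allele_fractions(ref_counts, alt_counts)) - 0.5


# Hypothetical read counts at three heterozygous sites.
ref_counts = [6, 5, 7]
alt_counts = [4, 5, 3]
print(reference_allele_fractions(ref_counts, alt_counts))
print(round(mean_reference_bias(ref_counts, alt_counts), 3))
```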
biorxiv genomics 100-200-users 2019
Massive gene presence/absence variation in the mussel genome as an adaptive strategy: first evidence of a pan-genome in Metazoa, bioRxiv, 2019-09-26
Abstract
Mussels are ecologically and economically relevant edible marine bivalves. They are highly invasive and resilient to the biotic and abiotic stressors that cause recurrent mass mortalities in other species. Here we show that the Mediterranean mussel Mytilus galloprovincialis has a complex pan-genomic architecture. It includes a core set of 45,000 genes shared by all individuals plus a surprisingly high number of dispensable genes (∼15,000). The latter are subject to presence/absence variation (PAV), i.e., they may be entirely missing in a given individual and, when present, they are frequently found as a single copy. The enrichment of dispensable genes in survival functions suggests an adaptive value for PAV. This might explain the extraordinary adaptability and invasiveness of this species. Our study reveals a pan-genome architecture unique among metazoans, previously described only in prokaryotes and in a few non-metazoan eukaryotes, which might also characterize other marine invertebrates.
Significance statement
In animals, intraspecific genomic diversity is generally thought to derive from relatively small-scale variants, such as single nucleotide polymorphisms, small indels, duplications, inversions and translocations. Large-scale structural variations, by contrast, involve the loss of genomic regions encoding protein-coding genes in some individuals (i.e., presence/absence variation, PAV). These have so far been described only in bacteria and, occasionally, in plants and fungi. Here we report the first evidence of a pan-genome in the animal kingdom, revealing that 25% of the genes of the Mediterranean mussel are subject to PAV. We show that this unique feature might have an adaptive value, due to the involvement of dispensable genes in functions related to defense and survival.
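The core/dispensable split described above can be expressed with set operations over per-individual gene sets. Core genes are present in every individual. Dispensable (PAV) genes are present in at least one individual but absent from at least one other. This is an illustrative sketch with a hypothetical function name and toy gene identifiers, not the study's annotation pipeline.

```python
def partition_pan_genome(gene_sets):
    """Return (core, dispensable) gene sets from a list of per-individual gene sets."""
    sets = [set(genes) for genes in gene_sets]
    if not sets:
        return set(), set()
    pan = set().union(*sets)        # every gene seen in any individual
    core = set.intersection(*sets)  # genes shared by all individuals
    return core, pan - core         # the remainder shows presence/absence variation


# Three hypothetical individuals.
core, dispensable = partition_pan_genome(
    [{"g1", "g2", "g3"}, {"g1", "g2", "g4"}, {"g1", "g2"}]
)
print(sorted(core), sorted(dispensable))
```

With the approximate counts given in the abstract, the dispensable share is 15,000 / (45,000 + 15,000) = 0.25. This is consistent with the 25% of genes reported as subject to PAV.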
biorxiv genetics 100-200-users 2019
An inducible genome editing system for plants, bioRxiv, 2019-09-23
Abstract
Conditional manipulation of gene expression is a key approach to investigating the primary function of a gene in a biological process. While conditional and cell-type-specific overexpression systems exist for plants, there are currently no systems available to disable a gene completely and conditionally. Here, we present a novel tool with which target genes can be knocked out efficiently and conditionally at any developmental stage. The target gene is manipulated using CRISPR-Cas9 genome editing, and conditionality is achieved with the well-established estrogen-inducible XVE system. Target genes can also be knocked out in a cell-type-specific manner. Our tool is easy to construct. It will be particularly useful for studying genes whose null alleles are non-viable or show strong developmental defects.
biorxiv plant-biology 100-200-users 2019
A multi-tissue transcriptome analysis of human metabolites guides the interpretability of associations based on multi-SNP models for gene expression, bioRxiv, 2019-09-20
Abstract
There is particular interest in transcriptome-wide association studies (TWAS), gene-level tests based on multi-SNP predictive models of gene expression, for identifying causal genes at loci associated with complex traits. However, interpretation of TWAS associations may be complicated by divergent effects of model SNPs on trait phenotype and gene expression. We developed an iterative modelling scheme for obtaining multi-SNP models of gene expression. We applied this framework to generate expression models for 43 human tissues from the Genotype-Tissue Expression (GTEx) Project. We characterized the performance of single- and multi-SNP TWAS models for identifying causal genes in GWAS data for 46 circulating metabolites. We show the following:
(a) Multi-SNP models captured more variation in expression than the top cis-eQTL (median 2-fold improvement).
(b) Predicted expression based on multi-SNP models was associated (FDR < 0.01) with metabolite levels for 826 unique gene-metabolite pairs. After step-wise conditional analyses, however, 90% were dominated by a single eQTL SNP.
(c) In 35% of associations, a SNP in the expression model was both a significant cis-eQTL and a metabolomic QTL (met-QTL). Among these, 92% showed colocalization between the two signals, but interpretation was often complicated by incomplete overlap of QTLs in multi-SNP models.
(d) Using a “truth” set of causal genes at 61 met-QTLs, sensitivity was high (67%) but positive predictive value was low: only 8% of TWAS associations at met-QTLs involved true causal genes.
These results guide the interpretation of TWAS and highlight the need for corroborative data to assign causality with confidence.
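The basic TWAS step described above can be sketched in two parts. First, predict each individual's expression as a weighted sum of allele dosages at the model SNPs. Second, correlate that prediction with the trait. The weights, dosages, metabolite levels and function names below are hypothetical. Real analyses fit the weights on a reference panel such as GTEx and compute a formal test statistic with a p-value rather than stopping at a correlation.

```python
from math import sqrt


def predict_expression(dosages, weights):
    """Predicted expression per individual: sum over model SNPs of
    weight * allele dosage (0, 1 or 2 copies of the effect allele)."""
    return [sum(w * d for w, d in zip(weights, row)) for row in dosages]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical two-SNP expression model applied to four individuals.
dosages = [[0, 1], [1, 1], [2, 0], [1, 2]]
weights = [0.5, -0.2]
predicted = predict_expression(dosages, weights)
metabolite_levels = [1.1, 1.9, 3.2, 1.4]
print(predicted, pearson(predicted, metabolite_levels))
```

A single-SNP model is the special case with one weight. Point (b) of the abstract implies that many multi-SNP associations reduce to this case after conditional analysis.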
biorxiv genomics 100-200-users 2019
Multiple testing correction over contrasts for brain imaging, bioRxiv, 2019-09-20
Abstract
The multiple testing problem arises not only when there are many voxels or vertices in an image representation of the brain. It also arises when multiple contrasts of parameter estimates (that is, hypotheses) are tested in the same general linear model. Here we argue that a correction for this multiplicity must be performed to avoid an excess of false positives. Various methods have been proposed in the literature, but few have been applied to brain imaging. We discuss and compare different methods for making such a correction in different scenarios, and show that one classical and well-known method is invalid. We argue that permutation is the best option for performing such a correction, owing to its exactness and its flexibility in handling a variety of common imaging situations.
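A common way to implement permutation-based correction across contrasts is the max-statistic approach, which controls the family-wise error rate (FWER) across contrasts. In every permutation, record the largest absolute statistic over all contrasts. Then compare each observed statistic with that null distribution of maxima. The NumPy sketch below handles a single response vector (one voxel). It assumes no nuisance variables other than an intercept, contrasts that do not test the intercept, and rows that are exchangeable when all tested effects are zero. The function names are hypothetical, and this is not necessarily the exact procedure evaluated in the paper. A full imaging analysis would also take the maximum over voxels or vertices.

```python
import numpy as np


def contrast_t_stats(y, X, C):
    """t statistic for each contrast (row of C) in the GLM y = X @ b + e."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    dof = X.shape[0] - np.linalg.matrix_rank(X)
    sigma2 = resid @ resid / dof
    cov_unscaled = np.linalg.pinv(X.T @ X)
    # Variance of each contrast estimate: sigma2 * c^T (X^T X)^+ c
    se = np.sqrt(sigma2 * np.einsum("ij,jk,ik->i", C, cov_unscaled, C))
    return (C @ b) / se


def max_t_permutation(y, X, C, n_perm=5000, seed=0):
    """Observed t statistics and FWER-adjusted p-values across contrasts,
    using the maximum |t| over contrasts in each permutation of y."""
    rng = np.random.default_rng(seed)
    t_obs = contrast_t_stats(y, X, C)
    abs_obs = np.abs(t_obs)
    exceed = np.ones(len(C))  # the unpermuted data count as one permutation
    for _ in range(n_perm - 1):
        null_max = np.max(np.abs(contrast_t_stats(rng.permutation(y), X, C)))
        exceed += null_max >= abs_obs
    return t_obs, exceed / n_perm


# Simulated data in which only x1 has an effect; three contrasts tested together.
rng = np.random.default_rng(1)
n = 30
x1, x2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 3 * x1 + 0.5 * rng.normal(size=n)
C = np.array([[0, 1, 0], [0, 0, 1], [0, 1, -1]], dtype=float)
t, p_fwer = max_t_permutation(y, X, C, n_perm=1000)
print(np.round(t, 2), p_fwer)
```

All contrasts share one null distribution of maxima, so the correlation between contrasts is handled automatically. Bonferroni-style corrections, by contrast, treat the contrasts as if they were unrelated.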
biorxiv neuroscience 100-200-users 2019