Extensive loss of cell cycle and DNA repair genes in an ancient lineage of bipolar budding yeasts, bioRxiv, 2019-02-12
Cell cycle checkpoints and DNA repair processes protect organisms from potentially lethal mutational damage. Compared to other budding yeasts in the subphylum Saccharomycotina, we noticed that a lineage in the genus Hanseniaspora exhibited very high evolutionary rates, low GC content, small genome sizes, and lower gene numbers. To better understand Hanseniaspora evolution, we analyzed 25 genomes, including 11 newly sequenced, representing 18 of 21 known species in the genus. Our phylogenomic analyses identify two Hanseniaspora lineages, the fast-evolving lineage (FEL), which began diversifying ~87 million years ago (mya), and the slow-evolving lineage (SEL), which began diversifying ~54 mya. Remarkably, both lineages lost genes associated with the cell cycle and genome integrity, but these losses were greater in the FEL. For example, all species lost the cell cycle regulator WHI5, and the FEL lost components of the spindle checkpoint pathway (e.g., MAD1, MAD2) and DNA damage checkpoint pathway (e.g., MEC3, RAD9). Similarly, both lineages lost genes involved in DNA repair pathways, including the DNA glycosylase gene MAG1, which is part of the base excision repair pathway, and the DNA photolyase gene PHR1, which is involved in pyrimidine dimer repair. Strikingly, the FEL lost 33 additional genes, including polymerases (i.e., POL4 and POL32) and telomere-associated genes (e.g., RIF1, RFA3, CDC13, PBP2). Echoing these losses, molecular evolutionary analyses reveal that, compared to the SEL, the FEL stem lineage underwent a burst of accelerated evolution, which resulted in greater mutational loads, homopolymer instabilities, and higher fractions of mutations associated with the common endogenously damaged base, 8-oxoguanine. We conclude that Hanseniaspora is an ancient lineage that has diversified and thrived, despite lacking many otherwise highly conserved cell cycle and genome integrity genes and pathways, and may represent a novel system for studying cellular life without them.
biorxiv evolutionary-biology 0-100-users 2019
Full-length mRNA sequencing reveals principles of poly(A) tail length control, bioRxiv, 2019-02-12
Although mRNAs are key molecules for understanding life, there exists no method to determine the full-length sequence of endogenous mRNAs including their poly(A) tails. Moreover, although poly(A) tails can be modified in functionally important ways, there also exists no method to accurately sequence them. Here, we present FLAM-seq, a rapid and simple method for high-quality sequencing of entire mRNAs. We report a cDNA library preparation method coupled to single-molecule sequencing to perform FLAM-seq. Using human cell lines, brain organoids, and C. elegans, we show that FLAM-seq delivers high-quality full-length mRNA sequences for thousands of different genes per sample. We find that (a) 3' UTR length is correlated with poly(A) tail length, (b) alternative polyadenylation sites and alternative promoters for the same gene are linked to different tail lengths, and (c) tails contain a significant number of cytosines. Thus, we provide a widely useful method and fundamental insights into poly(A) tail regulation.
biorxiv systems-biology 100-200-users 2019
Predictive neural processing in adult zebrafish depends on shank3b, bioRxiv, 2019-02-12
Intelligent behavior requires a comparison between the predicted and the actual consequences of behavioral actions. According to the theory of predictive processing, this comparison relies on a neuronal error signal that reflects the mismatch between an internal prediction and sensory input. Inappropriate error signals may generate pathological experiences in neuropsychiatric conditions. To examine the processing of sensorimotor prediction errors across different telencephalic brain areas we optically measured neuronal activity in head-fixed, adult zebrafish in a virtual reality. Brief perturbations of visuomotor feedback triggered distinct changes in swimming behavior and different neuronal responses. Neuronal activity reflecting sensorimotor mismatch, rather than sensory input or motor output alone, was prominent throughout multiple forebrain areas. This activity preceded and predicted the transition in motor behavior. Error signals were altered in specific forebrain regions by a mutation in the autism-related gene shank3b. Predictive processing is therefore a widespread phenomenon that may contribute to disease phenotypes.
biorxiv neuroscience 0-100-users 2019
SyRI: identification of syntenic and rearranged regions from whole-genome assemblies, bioRxiv, 2019-02-12
We present SyRI, an efficient tool for genome-wide identification of structural rearrangements (SR) from genome graphs, which are built up from pair-wise whole-genome alignments. Instead of searching for differences, SyRI starts by finding all co-linear regions between the genomes. As all remaining regions are SRs by definition, they can be classified as inversions, translocations, or duplications based on their positions in convoluted networks of repetitive alignments. Finally, SyRI reports local variations like SNPs and indels within syntenic and rearranged regions. We show SyRI’s broad applicability to multiple species and genetically validate the presence of ~100 translocations identified in Arabidopsis.
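The "co-linear first, everything else is a rearrangement" idea in this abstract can be illustrated with a small sketch. This is not SyRI's actual algorithm or data model: the `(ref_start, qry_start, length)` block layout and the function names `colinear_chain` and `split_syntenic` are assumptions made for illustration, and a simple quadratic chaining step stands in for whatever method the tool really uses.

```python
from typing import List, Tuple

# Hypothetical alignment block: (ref_start, qry_start, length).
Block = Tuple[int, int, int]


def colinear_chain(blocks: List[Block]) -> List[Block]:
    """Return the heaviest chain of blocks that increases in both genomes."""
    if not blocks:
        return []
    order = sorted(blocks)                 # by ref_start, then qry_start
    best = [b[2] for b in order]           # best chain weight ending at block i
    prev = [-1] * len(order)               # back-pointers for traceback
    for i, (r_i, q_i, len_i) in enumerate(order):
        for j in range(i):
            r_j, q_j, len_j = order[j]
            # Block j may precede block i only if it ends before i starts
            # in both the reference and the query.
            fits = r_j + len_j <= r_i and q_j + len_j <= q_i
            if fits and best[j] + len_i > best[i]:
                best[i] = best[j] + len_i
                prev[i] = j
    i = max(range(len(order)), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(order[i])
        i = prev[i]
    return chain[::-1]


def split_syntenic(blocks: List[Block]) -> Tuple[List[Block], List[Block]]:
    """Split blocks into the co-linear chain and the remaining candidates."""
    chain = colinear_chain(blocks)
    in_chain = set(chain)
    rest = [b for b in blocks if b not in in_chain]
    return chain, rest


# Made-up example: the second block maps far out of order in the query.
example = [(0, 0, 100), (100, 500, 50), (200, 100, 100), (400, 300, 100)]
syntenic, rearranged = split_syntenic(example)
```

In this toy input the out-of-order block ends up in `rearranged`; classifying such leftovers as inversions, translocations, or duplications is the part the abstract attributes to SyRI and is not attempted here.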
biorxiv bioinformatics 100-200-users 2019
Transposable elements contribute to dynamic genome content in maize, bioRxiv, 2019-02-12
Transposable elements (TEs) are ubiquitous components of eukaryotic genomes and can create variation in genomic organization. The majority of maize genomes are composed of TEs. We developed an approach to define shared and variable TE insertions across genome assemblies and applied this method to four maize genomes (B73, W22, Mo17, and PH207). Among these genomes we identified 1.6 Gb of variable TE sequence representing a combination of recent TE movement and deletion of previously existing TEs. Although recent TE movement only accounted for a portion of the TE variability, we identified 4,737 TEs unique to one genome with defined insertion sites in all other genomes. Variable TEs are found for all superfamilies and are distributed across the genome, including in regions of recent shared ancestry among individuals. There are 2,380 genes annotated in the B73 genome located within variable TEs, providing evidence for the role of TEs in contributing to the substantial differences in gene content among these genotypes. The large scope of TE variation present in this limited sample of temperate maize genomes highlights the major contribution of TEs in driving variation in genome organization and gene content.
biorxiv genomics 100-200-users 2019
Environmental DNA (eDNA) metabarcoding of pond water as a tool to survey conservation and management priority mammals, bioRxiv, 2019-02-11
1. Environmental DNA (eDNA) metabarcoding is largely used to survey aquatic communities, but can also provide data on terrestrial taxa utilising aquatic habitats. However, the entry, dispersal, and detection of terrestrial species’ DNA within waterbodies is understudied.
2. We evaluated eDNA metabarcoding of pond water for monitoring semi-aquatic, ground-dwelling, and arboreal mammals, and examined spatiotemporal variation in mammal eDNA signals using experiments in captive and wild conditions.
3. We selected nine focal species of conservation and management concern: European water vole, European otter, Eurasian beaver, European hedgehog, European badger, red deer, Eurasian lynx, red squirrel, and European pine marten. We hypothesised that eDNA signals (i.e. proportional read counts) would be stronger for semi-aquatic than terrestrial species, and at sites where mammals exhibited behaviours (e.g. swimming, urination). We tested this by sampling waterbodies in enclosures of captive focal species at specific sites where behaviours had been observed (‘directed’ sampling) and at equidistant intervals along the shoreline (‘stratified’ sampling). We then surveyed natural ponds (N = 6) where focal species were present using stratified water sampling, camera traps, and field signs. eDNA samples were metabarcoded using vertebrate-specific primers.
4. All focal species were detected in captivity. eDNA signal strength did not differ between directed and stratified samples across or within species, between species lifestyles (i.e. semi-aquatic, ground-dwelling, arboreal), or according to behaviours. Therefore, eDNA was evenly distributed within artificial waterbodies. Conversely, eDNA was unevenly distributed in natural ponds. eDNA metabarcoding, camera trapping, and field signs detected beaver, red deer, and roe deer. Badger and red fox were recorded with cameras and field signs, but not eDNA metabarcoding. However, eDNA metabarcoding detected small mammals missed by cameras and field signs, e.g. water vole. Terrestrial mammal eDNA signals were weaker and detected in fewer samples than semi-aquatic mammal eDNA signals.
5. eDNA metabarcoding has potential for inclusion in mammal monitoring schemes by enabling large-scale, multi-species distribution assessment for priority and difficult to survey species, and could provide early indication of range expansions or contractions. However, eDNA surveys need high spatiotemporal resolution and metabarcoding biases require further investigation before this tool is routinely implemented.
biorxiv molecular-biology 0-100-users 2019
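The eDNA abstract above defines signal strength as proportional read counts. A minimal sketch of that calculation for a single water sample follows. The function name `proportional_read_counts` and the read numbers are invented for illustration and are not taken from the study.

```python
from typing import Dict


def proportional_read_counts(reads: Dict[str, int]) -> Dict[str, float]:
    """Convert raw per-taxon read counts in one sample to proportions."""
    total = sum(reads.values())
    if total == 0:
        # A sample with no reads has no signal for any taxon.
        return {taxon: 0.0 for taxon in reads}
    return {taxon: count / total for taxon, count in reads.items()}


# Made-up read counts for one pond sample (not data from the study).
sample = {"Eurasian beaver": 300, "red deer": 100, "European water vole": 100}
signal = proportional_read_counts(sample)
```

Normalising by the sample total makes samples with different sequencing depths comparable, which is presumably why a proportion rather than a raw count serves as the signal.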