Long Read Annotation (LoReAn) automated eukaryotic genome annotation based on long-read cDNA sequencing, bioRxiv, 2017-12-09
Single-molecule full-length cDNA sequencing can aid genome annotation by revealing transcript structure and alternative splice-forms, yet current annotation pipelines do not incorporate such information. Here we present LoReAn (Long Read Annotation), an automated annotation pipeline that uses short- and long-read cDNA sequencing, protein evidence, and ab initio prediction to generate accurate genome annotations. Based on annotations of two fungal and two plant genomes, we show that LoReAn outperforms popular annotation pipelines by integrating single-molecule cDNA sequencing data generated on either the PacBio or MinION sequencing platform, correctly predicting gene structure and capturing genes missed by other annotation pipelines.
biorxiv bioinformatics 0-100-users 2017
LRX- and FER-dependent extracellular sensing coordinates vacuolar size for cytosol homeostasis, bioRxiv, 2017-12-09
Cellular elongation requires the defined coordination of intra- and extracellular processes. The vacuole is the largest plant organelle, and its size has a role in limiting cell expansion (Löfke et al., 2015; Scheuring et al., 2016). We reveal that the increase in vacuolar occupancy enables cellular elongation with relatively little enlargement of the cytosol. It remains, however, completely unknown how vacuolar size is coordinated with other growth-relevant processes. Intriguingly, we show that extracellular constraints affect the intracellular expansion of the vacuole. The underlying cell wall sensing mechanism requires the interaction of the extracellular leucine-rich repeat extensin (LRX) with the receptor-like kinase Feronia (FER). Our data suggest that LRX links the plasma membrane-localised FER with the cell wall, allowing this module to jointly sense and convey extracellular signals to the underlying cell. This mechanism coordinates cell wall acidification/loosening with the increase in vacuolar size, contributing to cytosol homeostasis during plant cell expansion.
biorxiv plant-biology 0-100-users 2017
Resolving the Full Spectrum of Human Genome Variation using Linked-Reads, bioRxiv, 2017-12-09
Large-scale population-based analyses, coupled with advances in technology, have demonstrated that the human genome is more diverse than originally thought. To date, this diversity has largely been uncovered using short-read whole genome sequencing. However, standard short-read approaches, used primarily because of their accuracy, throughput and cost, fail to give a complete picture of a genome. They struggle to identify large, balanced structural events, cannot access repetitive regions of the genome, and fail to resolve the human genome into its two haplotypes. Here we describe an approach that retains long-range information while harnessing the advantages of short reads. Starting from only ∼1 ng of DNA, we produce barcoded short-read libraries. Novel informatic approaches allow the barcoded short reads to be associated with their long molecules of origin, producing a new data type known as ‘Linked-Reads’. This approach allows simultaneous detection of small and large variants from a single Linked-Read library. We have previously demonstrated the utility of whole genome Linked-Reads (lrWGS) for diploid, de novo assembly of individual genomes (Weisenfeld et al. 2017). In this manuscript, we show the advantages of Linked-Reads over standard short-read approaches for reference-based analysis. We demonstrate the ability of Linked-Reads to reconstruct megabase-scale haplotypes and to recover parts of the genome that are typically inaccessible to short reads, including phenotypically important genes such as STRC, SMN1 and SMN2. We demonstrate the ability of both lrWGS and Linked-Read Whole Exome Sequencing (lrWES) to identify complex structural variations, including balanced events, single-exon deletions, and single-exon duplications. The data presented here show that Linked-Reads provide a scalable approach to comprehensive genome analysis that is not possible using short reads alone.
biorxiv genomics 0-100-users 2017
Continuous addition of progenitors forms the cardiac ventricle in zebrafish, bioRxiv, 2017-12-08
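The abstract does not describe how barcoded short reads are associated with their molecules of origin. As a rough illustration only, one common heuristic groups aligned reads by barcode and chromosome, then splits each group wherever consecutive reads lie more than a gap threshold apart. The `Read`/`Molecule` types, the `infer_molecules` function and the 50 kb `max_gap` value below are hypothetical and are not taken from the authors' software.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    barcode: str  # hypothetical per-read barcode string
    chrom: str
    pos: int      # aligned start position (bp)


class Molecule(NamedTuple):
    barcode: str
    chrom: str
    start: int    # start of the first read
    end: int      # start of the last read (approximate span)
    n_reads: int


def infer_molecules(reads, max_gap=50_000):
    """Group reads sharing a barcode and chromosome, then split wherever
    consecutive reads are more than max_gap bp apart (assumed heuristic)."""
    groups = defaultdict(list)
    for r in reads:
        groups[(r.barcode, r.chrom)].append(r.pos)

    molecules = []
    for (bc, chrom), positions in sorted(groups.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for p in positions[1:]:
            if p - prev > max_gap:
                # Gap too large: close the current molecule, start a new one.
                molecules.append(Molecule(bc, chrom, start, prev, count))
                start, count = p, 0
            prev = p
            count += 1
        molecules.append(Molecule(bc, chrom, start, prev, count))
    return molecules


# Usage: barcode AAAC yields two molecules because of the ~870 kb gap.
reads = [
    Read("AAAC", "chr1", 1_000),
    Read("AAAC", "chr1", 12_000),
    Read("AAAC", "chr1", 30_000),
    Read("AAAC", "chr1", 900_000),
    Read("GGTT", "chr1", 5_000),
]
mols = infer_molecules(reads)
```

The gap threshold trades off merging two distinct molecules that share a barcode against fragmenting one long molecule with sparse coverage. A real pipeline would also weigh mapping quality and coverage when choosing it.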
The vertebrate heart develops from several progenitor lineages. After early-differentiating first heart field (FHF) progenitors form the linear heart tube, late-differentiating second heart field (SHF) progenitors extend the atrium and ventricle and form the inflow and outflow tracts (IFT/OFT). However, the position and migration of late-differentiating progenitors during heart formation remain unclear. Here, we tracked zebrafish heart development using transgenics based on the cardiopharyngeal transcription factor gene tbx1. Live imaging uncovered a sheath of tbx1 reporter-expressing cells that continuously disseminates from the anterior lateral plate mesoderm towards the forming heart tube. High-speed imaging and optogenetic lineage tracing corroborated that the zebrafish ventricle forms through continuous addition from the undifferentiated progenitor sheath, followed by late-phase accrual of the bulbus arteriosus (BA). FGF inhibition during sheath migration reduced ventricle size and abolished BA formation, refining the window of FGF action during OFT formation. Our findings consolidate previous end-point analyses and establish zebrafish ventricle formation as a continuous process.
biorxiv developmental-biology 0-100-users 2017
The rust fungus Melampsora larici-populina expresses a conserved genetic program and distinct sets of secreted protein genes during infection of its two host plants, larch and poplar, bioRxiv, 2017-12-07
The mechanisms required for broad-spectrum or specific host colonization by plant parasites are poorly understood. Heteroecious rust fungi are a perfect illustration, requiring two alternate host plants to complete their life cycle. Melampsora larici-populina infects two taxonomically unrelated plants: larch, on which sexual reproduction is achieved, and poplar, on which clonal multiplication occurs, leading to severe epidemics in plantations. High-depth RNA sequencing was applied to three key developmental stages of M. larici-populina infection on larch: basidia, pycnia and aecia. Comparative transcriptomics of infection on the poplar and larch hosts was performed using available expression data. Secreted proteins were the only significantly over-represented category among M. larici-populina genes differentially expressed across basidia, pycnia and aecia, highlighting their probable involvement in the infection process. Comparison of fungal transcriptomes in larch and poplar revealed that a majority of rust genes are commonly expressed on the two hosts, while a fraction exhibit host-specific expression. In particular, gene families encoding small secreted proteins presented striking expression profiles that highlight probable candidate effectors specialized for each host. Our results provide valuable new information about the biological cycle of rust fungi and identify genes that may contribute to host specificity.
biorxiv microbiology 0-100-users 2017
k-mer grammar uncovers maize regulatory architecture, bioRxiv, 2017-12-06
Only a small percentage of the genome sequence is involved in regulating gene expression, but identifying this portion biochemically is expensive and laborious. In species like maize, with diverse intergenic regions and many repetitive elements, this is an especially challenging problem. Although regulatory regions are rare, they have characteristic chromatin contexts and sequence organization (the grammar) by which they can be identified. We developed a computational framework to exploit this sequence arrangement. The models learn to classify regulatory regions based on sequence features (k-mers). To do this, we borrowed two approaches from the field of natural language processing: (1) “bag-of-words”, which is commonly used to differentially weight key words in tasks like sentiment analysis, and (2) a vector-space model using word2vec (vector-k-mers), which captures semantic and linguistic relationships between words. We built “bag-of-k-mers” and “vector-k-mers” models that distinguish between regulatory and non-regulatory regions with an accuracy above 90%. Our “bag-of-k-mers” models achieved higher overall accuracy, while the “vector-k-mers” models were more useful in highlighting key groups of sequences within the regulatory regions. These models provide powerful tools to annotate regulatory regions in maize lines beyond the reference, at low cost and with high accuracy.
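To make the “bag-of-k-mers” idea concrete, the sketch below treats overlapping k-mers as the “words” of a DNA sequence, counts them while ignoring their positions, and trains a classifier on those counts. For brevity it uses a multinomial naive Bayes classifier with Laplace smoothing; this is a swapped-in technique and not necessarily the classifier or weighting scheme the authors used. The function and class names, the default k, and the toy sequences are all hypothetical.

```python
import math
from collections import Counter


def kmers(seq: str, k: int = 6) -> list[str]:
    """Split a DNA sequence into overlapping k-mers (the 'words')."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def bag_of_kmers(seq: str, k: int = 6) -> Counter:
    """Bag-of-k-mers: k-mer counts, discarding k-mer positions."""
    return Counter(kmers(seq, k))


class KmerNaiveBayes:
    """Multinomial naive Bayes over bag-of-k-mers counts (illustrative only)."""

    def __init__(self, k: int = 6, alpha: float = 1.0):
        self.k = k
        self.alpha = alpha  # Laplace smoothing pseudocount

    def fit(self, seqs: list[str], labels: list[str]) -> "KmerNaiveBayes":
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        for seq, y in zip(seqs, labels):
            self.counts[y].update(bag_of_kmers(seq, self.k))
        n = len(labels)
        # Log prior probability of each class.
        self.priors = {c: math.log(labels.count(c) / n) for c in self.classes}
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, bag: Counter, c: str) -> float:
        # Smoothed per-class k-mer probabilities, summed in log space.
        denom = self.totals[c] + self.alpha * len(self.vocab)
        return sum(n * math.log((self.counts[c][w] + self.alpha) / denom)
                   for w, n in bag.items())

    def predict(self, seq: str) -> str:
        bag = bag_of_kmers(seq, self.k)
        return max(self.classes,
                   key=lambda c: self.priors[c] + self._log_likelihood(bag, c))


# Usage with toy, made-up sequences (k=3 to keep them short).
train = ["TATAAATATAAATATA", "ATATAAATATAAAT",
         "GCGCGCGGCGCGCC", "CGCGGCGCGCGCGG"]
labels = ["regulatory", "regulatory", "background", "background"]
model = KmerNaiveBayes(k=3).fit(train, labels)
print(model.predict("AATATAAATA"))  # regulatory
print(model.predict("GCGCGCGC"))    # background
```

Swapping the classifier for logistic regression on TF-IDF-weighted counts, closer to the usual bag-of-words setup, would change only the model, not the k-mer featurization.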
biorxiv plant-biology 0-100-users 2017