OrthoFinder: phylogenetic orthology inference for comparative genomics, bioRxiv, 2018-11-08
Here, we present a major advance of the OrthoFinder method. This extends OrthoFinder’s high accuracy orthogroup inference to provide phylogenetic inference of orthologs, rooted gene trees, gene duplication events, the rooted species tree, and comparative genomic statistics. Each output is benchmarked on appropriate real or simulated datasets and, where comparable methods exist, OrthoFinder is equivalent to or outperforms these methods. Furthermore, OrthoFinder is the most accurate ortholog inference method on the Quest for Orthologs benchmark test. Finally, OrthoFinder’s comprehensive phylogenetic analysis is achieved with equivalent speed and scalability to the fastest, score-based heuristic methods. OrthoFinder is available at https://github.com/davidemms/OrthoFinder.
biorxiv bioinformatics 200-500-users 2018
Rate variation in the evolution of non-coding DNA associated with social evolution in bees, bioRxiv, 2018-11-08
The evolutionary origins of eusociality represent increases in complexity from individual to caste-based, group reproduction. These behavioral transitions have been hypothesized to go hand-in-hand with an increased ability to regulate when and where genes are expressed. Bees have convergently evolved eusociality up to five times, providing a framework to test this hypothesis. To examine potential links between putative gene regulatory elements and social evolution, we compare alignable, non-coding sequences in eleven diverse bee species, encompassing three independent origins of reproductive division of labor and two elaborations of eusocial complexity. We find that rates of evolution in a number of non-coding sequences correlate with key social transitions in bees. Interestingly, while we find little evidence for convergent rate changes associated with independent origins of social behavior, a number of molecular pathways exhibit convergent rate changes in conjunction with subsequent elaborations of social organization. We also present evidence that many novel non-coding regions may have been recruited alongside the origin of sociality in corbiculate bees; these loci could represent gene regulatory elements associated with division of labor within this group. Thus, our findings are consistent with the hypothesis that gene regulatory innovations are associated with the evolution of eusociality and illustrate how a thorough examination of both coding and non-coding sequence can provide a more complete understanding of the molecular mechanisms underlying behavioral evolution.
biorxiv evolutionary-biology 0-100-users 2018
A genome-wide algal mutant library reveals a global view of genes required for eukaryotic photosynthesis, bioRxiv, 2018-11-07
Photosynthetic organisms provide food and energy for nearly all life on Earth, yet half of their protein-coding genes remain uncharacterized [1,2]. Characterization of these genes could be greatly accelerated by new genetic resources for unicellular organisms that complement the use of multicellular plants by enabling higher-throughput studies. Here, we generated a genome-wide, indexed library of mapped insertion mutants for the flagship unicellular alga Chlamydomonas reinhardtii (Chlamydomonas hereafter). The 62,389 mutants in the library, covering 83% of nuclear, protein-coding genes, are available to the community. Each mutant contains unique DNA barcodes, allowing the collection to be screened as a pool. We leveraged this feature to perform a genome-wide survey of genes required for photosynthesis, which identified 303 candidate genes. Characterization of one of these genes, the conserved predicted phosphatase CPL3, showed it is important for accumulation of multiple photosynthetic protein complexes. Strikingly, 21 of the 43 highest-confidence genes are novel, opening new opportunities for advances in our understanding of this biogeochemically fundamental process. This library is the first genome-wide mapped mutant resource in any unicellular photosynthetic organism, and will accelerate the characterization of thousands of genes in algae, plants and animals.
biorxiv genomics 0-100-users 2018
AnnoTree: visualization and exploration of a functionally annotated microbial tree of life, bioRxiv, 2018-11-06
Bacterial genomics has revolutionized our understanding of the microbial tree of life; however, mapping and visualizing the distribution of functional traits across bacteria remains a challenge. Here, we introduce AnnoTree, an interactive, functionally annotated bacterial tree of life that integrates taxonomic, phylogenetic, and functional annotation data from nearly 24,000 bacterial genomes. AnnoTree enables visualization of millions of precomputed genome annotations across the bacterial phylogeny, thereby allowing users to explore gene distributions as well as patterns of gene gain and loss across bacteria. Using AnnoTree, we examined the phylogenomic distributions of 28,311 gene/protein families, and measured their phylogenetic conservation, patchiness, and lineage-specificity. Our analyses revealed widespread phylogenetic patchiness among bacterial gene families, reflecting the dynamic evolution of prokaryotic genomes. Genes involved in phage infection/defense, mobile elements, and antibiotic resistance dominated the list of the most patchy traits, along with numerous intriguing metabolic enzymes that appear to have undergone frequent horizontal transfer. We anticipate that AnnoTree will be a valuable resource for exploring gene histories across bacteria, and will act as a catalyst for biological and evolutionary hypothesis generation.
biorxiv bioinformatics 100-200-users 2018
Comparative analysis of sequencing technologies platforms for single-cell transcriptomics, bioRxiv, 2018-11-06
All single-cell RNA-seq protocols and technologies require library preparation prior to sequencing on a platform such as Illumina. Here, we present the first report to utilize the BGISEQ-500 platform for scRNA-seq, and compare its sensitivity and accuracy with those of Illumina sequencing. We generate a scRNA-seq resource of 468 unique single cells and 1,297 matched single cDNA samples, performing SMARTer and Smart-seq2 protocols on mESCs and K562 cells with RNA spike-ins. We sequence these libraries on both BGISEQ-500 and Illumina HiSeq platforms using single- and paired-end reads. The two platforms have comparable sensitivity and accuracy in terms of quantification of gene expression, and low technical variability. Our study provides a standardised scRNA-seq resource to benchmark new scRNA-seq library preparation protocols and sequencing platforms.
biorxiv genomics 0-100-users 2018
Exploring neighborhoods in large metagenome assembly graphs reveals hidden sequence diversity, bioRxiv, 2018-11-06
Genomes computationally inferred from large metagenomic data sets are often incomplete and may be missing functionally important content and strain variation. We introduce an information retrieval system for large metagenomic data sets that exploits the sparsity of DNA assembly graphs to efficiently extract subgraphs surrounding an inferred genome. We apply this system to recover missing content from genome bins and show that substantial genomic sequence variation is present in a real metagenome. Our software implementation is available at https://github.com/spacegraphcats/spacegraphcats under the 3-Clause BSD License.
biorxiv bioinformatics 100-200-users 2018