How to make a rodent giant: Genomic basis and tradeoffs of gigantism in the capybara, the world’s largest rodent, bioRxiv, 2018-09-23
Abstract
Gigantism is the result of one lineage within a clade evolving extremely large body size relative to its small-bodied ancestors, a phenomenon observed numerous times in animals. Theory predicts that the evolution of giants should be constrained by two tradeoffs. First, because body size is negatively correlated with population size, purifying selection is expected to be less efficient in species of large body size, leading to a genome-wide elevation of the ratio of non-synonymous to synonymous substitution rates (dN/dS) or mutation load. Second, gigantism is achieved through a higher number of cells and higher rates of cell proliferation, thus increasing the likelihood of cancer. However, the incidence of cancer in gigantic animals is lower than the theoretical expectation, a phenomenon referred to as Peto’s Paradox. To explore the genetic basis of gigantism in rodents and uncover genomic signatures of gigantism-related tradeoffs, we sequenced the genome of the capybara, the world’s largest living rodent. We found that dN/dS is elevated genome-wide in the capybara relative to other rodents, implying a higher mutation load. Conversely, a genome-wide scan for adaptive protein evolution in the capybara highlighted several genes involved in growth regulation by the insulin/insulin-like growth factor signaling (IIS) pathway. Capybara-specific gene-family expansions included a putative novel anticancer adaptation that involves T cell-mediated tumor suppression, offering a potential resolution to Peto’s Paradox in this lineage. Gene interaction network analyses also revealed that size regulators function simultaneously as growth factors and oncogenes, creating an evolutionary conflict. Based on our findings, we hypothesize that gigantism in the capybara likely involved three evolutionary steps: 1) an increase in body size by cell proliferation through the IIS pathway, 2) coupled evolution of growth-regulatory and cancer-suppression mechanisms, possibly driven by intragenomic conflict, and 3) establishment of the T cell-mediated tumor suppression pathway as an anticancer adaptation. Interestingly, increased mutation load appears to be an inevitable outcome of an increase in body size.
Author Summary
The existence of gigantic animals presents an evolutionary puzzle. Larger animals have more cells and undergo exponentially more cell divisions, so they should have enormous rates of cancer. Moreover, large animals also have smaller populations, making them vulnerable to extinction. So, how do gigantic animals such as elephants and blue whales protect themselves from cancer, and what are the consequences of evolving a large size on the ‘genetic health’ of a species? To address these questions, we sequenced the genome of the capybara, the world’s largest rodent, and performed comparative genomic analyses to identify the genes and pathways involved in growth regulation and cancer suppression. We found that the insulin-signaling pathway was involved in the evolution of gigantism in the capybara. We also found a putative novel anticancer mechanism mediated by the detection of tumors by T cells, offering a potential solution to how capybaras mitigated the tradeoff imposed by cancer. Furthermore, we show that the capybara genome harbors a higher proportion of slightly deleterious mutations relative to all other rodent genomes. Overall, this study provides insights at the genomic level into the evolution of a complex and extreme phenotype, and offers a detailed picture of how the evolution of a giant body size in the capybara has shaped its genome.
biorxiv evolutionary-biology 100-200-users 2018
Parliament2: Fast Structural Variant Calling Using Optimized Combinations of Callers, bioRxiv, 2018-09-23
Abstract
Here we present Parliament2, a structural variant caller that combines multiple best-in-class structural variant callers to create a highly accurate callset. The combined callset captures more events than any individual caller achieves independently. Parliament2 uses a call-overlap-genotype approach that is highly extensible to new methods and gives users the choice to run some or all of Breakdancer, Breakseq, CNVnator, Delly, Lumpy, and Manta. Parliament2 applies an additional parallelization framework to speed up certain callers and executes the callers in parallel, taking advantage of their different resource requirements to complete structural variant calling much faster than running the programs individually. Parliament2 is available as a Docker container, which pre-installs all required dependencies. This allows users to run any caller with easy installation and execution. The Docker container can easily be deployed in cloud or local environments and is available as an app on DNAnexus.
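The "overlap" step of a call-overlap-genotype workflow can be pictured with a small sketch. The abstract does not describe how Parliament2 merges calls, so everything below is an assumption made for illustration: the `SVCall` record, the `reciprocal_overlap` and `consensus_calls` helpers, the 50% reciprocal-overlap threshold, and the two-caller support cutoff are all hypothetical, and the genotyping step is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SVCall:
    """One structural variant call (hypothetical record, not a Parliament2 type)."""
    caller: str
    chrom: str
    start: int
    end: int
    svtype: str


def reciprocal_overlap(a: SVCall, b: SVCall) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if incompatible."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def consensus_calls(calls: List[SVCall],
                    min_overlap: float = 0.5,   # assumed threshold
                    min_callers: int = 2        # assumed support cutoff
                    ) -> List[List[SVCall]]:
    """Greedily cluster calls and keep clusters backed by enough distinct callers.

    Each call joins the first cluster whose seed (first) call it reciprocally
    overlaps by at least min_overlap; otherwise it starts a new cluster.
    """
    clusters: List[List[SVCall]] = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.start, c.end)):
        for cluster in clusters:
            if reciprocal_overlap(cluster[0], call) >= min_overlap:
                cluster.append(call)
                break
        else:
            clusters.append([call])
    return [c for c in clusters if len({x.caller for x in c}) >= min_callers]


# Toy input: the coordinates are made up for the example.
calls = [
    SVCall("manta", "chr1", 1000, 2000, "DEL"),
    SVCall("lumpy", "chr1", 1050, 1980, "DEL"),
    SVCall("delly", "chr1", 5000, 5600, "DUP"),
    SVCall("cnvnator", "chr2", 1000, 2000, "DEL"),
]
supported = consensus_calls(calls)
for cluster in supported:
    print(cluster[0].chrom, cluster[0].svtype, sorted(c.caller for c in cluster))
```

In this toy input only the two overlapping chr1 deletions are supported by more than one caller, so they form the single consensus event; the other two calls each have a single supporting caller and are dropped under the assumed cutoff.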
biorxiv bioinformatics 0-100-users 2018
Revealing multi-scale population structure in large cohorts, bioRxiv, 2018-09-23
Genetic structure in large cohorts results from technical, sampling and demographic variation. Visualisation is therefore a first step in most genomic analyses. However, existing data exploration methods struggle with unbalanced sampling and the many scales of population structure. We investigate an approach to dimension reduction of genomic data that combines principal components analysis (PCA) with uniform manifold approximation and projection (UMAP) to succinctly illustrate population structure in large cohorts and capture their relationships on local and global scales. Using data from large-scale genomic datasets, we demonstrate that PCA-UMAP effectively clusters closely related individuals while placing them in a global continuum of genetic variation. This approach reveals previously overlooked subpopulations within the American Hispanic population and fine-scale relationships between geography, genotypes, and phenotypes in the UK population. This opens new lines of investigation for demographic research and statistical genetics. Given its small computational cost, PCA-UMAP also provides a general-purpose approach to exploratory analysis in population-scale datasets.
biorxiv genomics 100-200-users 2018
Vision using multiple distinct rod opsins in deep-sea fishes, bioRxiv, 2018-09-23
Abstract
Vertebrate vision is accomplished through a set of light-sensitive photopigments, which are located in the photoreceptors of the retina and consist of a visual opsin protein bound to a chromophore. In dim light, vertebrates generally rely upon a single rod opsin (RH1) for obtaining visual information. By inspecting 101 fish genomes, we found that three deep-sea teleost lineages have independently expanded their RH1 gene repertoires. Amongst these, the silver spinyfin (Diretmus argenteus Johnson 1863) stands out as having the highest number of visual opsins known for animals to date (2 cone and 38 rod opsins). Spinyfins simultaneously express up to 14 RH1s encoding photopigments with different peak spectral sensitivities (λmax = 448-513 nm) that cover the range of the residual daylight, as well as the bioluminescence spectrum present in the deep sea. Our findings present novel molecular and functional evidence for the recurrent evolution of multiple rod opsin-based vision in vertebrates.
Short Abstract
Contrary to the single rod opsin used by most vertebrates, some fishes use multiple rod opsins for vision in the dimly lit deep sea.
biorxiv evolutionary-biology 0-100-users 2018
A Multi-Domain Task Battery Reveals Functional Boundaries in the Human Cerebellum, bioRxiv, 2018-09-21
Abstract
There is compelling evidence that the human cerebellum is engaged in a wide array of motor and cognitive tasks. A fundamental question centers on whether the cerebellum is organized into distinct functional sub-regions. To address this question, we employed a rich task battery, designed to tap into a broad range of cognitive processes. During four functional magnetic resonance imaging (fMRI) sessions, participants performed a battery of 26 diverse tasks comprising 47 unique conditions. Using the data from this multi-domain task battery (MDTB), we derived a comprehensive functional parcellation of the cerebellar cortex and evaluated it by predicting functional boundaries in a novel set of tasks. The new parcellation successfully identified distinct functional sub-regions, providing significant improvements over existing parcellations derived from task-free data. Lobular boundaries, commonly used to summarize functional data, did not coincide with functional subdivisions. This multi-domain task approach offers novel insights into the functional heterogeneity of the cerebellar cortex.
biorxiv neuroscience 100-200-users 2018
Comparison of Efficiency and Specificity of CRISPR-Associated (Cas) Nucleases in Plants: An Expanded Toolkit for Precision Genome Engineering, bioRxiv, 2018-09-21
Molecular tools adapted from bacterial CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) systems for adaptive immunity have become widely used for plant genome engineering, both to investigate gene functions and to engineer desirable traits. A number of different Cas (CRISPR-associated) nucleases are now used but, as most studies performed to date have engineered different targets using a variety of plant species and molecular tools, it has been difficult to draw conclusions about the comparative performance of different nucleases. Due to the time and effort required to regenerate engineered plants, efficiency is critical. In addition, there have been several reports of mutations at sequences with less than perfect identity to the target. While in some plant species it is possible to remove these so-called ‘off-targets’ by backcrossing to a parental line, the specificity of genome engineering tools is important when targeting specific members of closely related gene families, especially when recent paralogues are co-located in the genome and unlikely to segregate. Specificity is also important for species that take years to reach sexual maturity or that are clonally propagated. Here, we directly compare the efficiency and specificity of Cas nucleases from different bacterial species together with engineered variants of Cas9. We find that the nucleotide content correlates with efficiency and that Cas9 from Staphylococcus aureus is comparatively most efficient at inducing mutations. We also demonstrate that ‘high-fidelity’ variants of Cas9 can reduce off-target mutations in plants. We present these molecular tools as standardised DNA parts to facilitate their re-use.
biorxiv plant-biology 0-100-users 2018