genomics | audiences

Genomic analysis reveals a functional role for myocardial trabeculae in adults, bioRxiv, 2019-02-23

Since the muscular trabeculae of the heart were first described by Leonardo da Vinci in 1513, it has remained an enigma why the endocardial surfaces of the adult heart retain this complex network, with its persistence thought to be a vestige of embryonic development. For causative physiological inference we harness population genomics, image-based intermediate phenotyping and in silico modelling to determine the effect of this complex cardiovascular trait on function. Using deep learning-based image analysis we identified genetic associations with trabecular complexity in 18,097 UK Biobank participants which were replicated in an independently measured cohort of 1,129 healthy adults. Genes in these associated regions are enriched for expression in the fetal heart or vasculature and implicate loci associated with haemodynamic phenotypes and developmental pathways. A causal relationship between increasing trabecular complexity and both ventricular performance and electrical activity is supported by complementary biomechanical simulations and Mendelian randomisation studies. These findings show that myocardial trabeculae are a previously unrecognised determinant of cardiovascular physiology in adult humans.

biorxiv genomics 0-100-users 2019

Pancreas patch-seq links physiologic dysfunction in diabetes to single-cell transcriptomic phenotypes, bioRxiv, 2019-02-21

Pancreatic islet cells regulate glucose homeostasis through insulin and glucagon secretion; dysfunction of these cells leads to severe diseases like diabetes. Prior single-cell transcriptome studies have shown heterogeneous gene expression in major islet cell types; however, it remains challenging to reconcile this transcriptomic heterogeneity with observed islet cell functional variation. Here we achieved electrophysiological profiling and single-cell RNA sequencing in the same islet cell (pancreas patch-seq), thereby linking transcriptomic phenotypes to physiologic properties. We collected 1,369 cells from the pancreas of donors with or without diabetes and assessed function-gene expression networks. We identified a set of genes and pathways that drive functional heterogeneity in β-cells and used these to predict β-cell electrophysiology. We also report specific transcriptional programs that correlate with dysfunction in type 2 diabetes (T2D) and extend this approach to cryopreserved cells from donors with type 1 diabetes (T1D), generating a valuable resource for understanding islet cell heterogeneity in health and disease.

biorxiv genomics 0-100-users 2019

Transcript expression-aware annotation improves rare variant discovery and interpretation, bioRxiv, 2019-02-19

The acceleration of DNA sequencing in patients and population samples has resulted in unprecedented catalogues of human genetic variation, but the interpretation of rare genetic variants discovered using such technologies remains extremely challenging. A striking example of this challenge is the existence of disruptive variants in dosage-sensitive disease genes, even in apparently healthy individuals. Through manual curation of putative loss-of-function (pLoF) variants in haploinsufficient disease genes in the Genome Aggregation Database (gnomAD), we show that one explanation for this paradox involves alternative mRNA splicing, which allows exons of a gene to be expressed at varying levels across cell types. Currently, no annotation tool systematically incorporates this exon expression information into variant interpretation. Here, we develop a transcript-level annotation metric, the proportion expressed across transcripts (pext), which summarizes isoform quantifications for variants. We calculate this metric using 11,706 tissue samples from the Genotype-Tissue Expression project (GTEx) and show that it clearly differentiates between weakly and highly evolutionarily conserved exons, a proxy for functional importance. We demonstrate that expression-based annotation selectively filters 22.4% of falsely annotated pLoF variants found in haploinsufficient disease genes in gnomAD, while removing less than 4% of high-confidence pathogenic variants in the same genes. Finally, we apply our expression filter to the analysis of de novo variants in patients with autism spectrum disorder (ASD) and developmental disorders and intellectual disability (DDID) to show that pLoF variants in weakly expressed regions have effect sizes similar to those of synonymous variants, while pLoF variants in highly expressed exons are most strongly enriched among cases versus controls. Our annotation is fast, flexible, and generalizable, making it possible for any variant file to be annotated with any isoform expression dataset, and will be valuable for rare disease diagnosis, rare variant burden analyses in complex disorders, and curation and prioritization of variants in recall-by-genotype studies.
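
The abstract does not give the formula for pext, so the following is only a minimal sketch of one plausible reading: for a single tissue, pext is taken to be the summed expression of the transcripts on which a variant carries the annotation of interest, divided by the summed expression of all transcripts of the gene. The function name, the transcript identifiers, and the TPM values are hypothetical and chosen for illustration.

```python
import math
from typing import Dict, Set


def pext(transcript_tpm: Dict[str, float], annotated: Set[str]) -> float:
    """Proportion of a gene's expression carried by annotated transcripts.

    transcript_tpm: expression (e.g. TPM) of every transcript of the gene
                    in one tissue (assumed input format).
    annotated:      transcripts on which the variant has the annotation
                    of interest (e.g. pLoF).
    """
    total = sum(transcript_tpm.values())
    if total == 0:
        # Gene not expressed in this tissue: the proportion is undefined.
        return math.nan
    hit = sum(tpm for tx, tpm in transcript_tpm.items() if tx in annotated)
    return hit / total


# Hypothetical gene with three isoforms; the variant is pLoF only on tx2.
tpm = {"tx1": 30.0, "tx2": 10.0, "tx3": 0.0}
score = pext(tpm, {"tx2"})  # 10 / 40 = 0.25
```

Under this reading, a low score flags a variant that falls on weakly expressed isoforms, which is the kind of pLoF call the abstract describes filtering out.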

biorxiv genomics 200-500-users 2019

Comparative performance of the BGI and Illumina sequencing technology for single-cell RNA-sequencing, bioRxiv, 2019-02-17

The libraries generated by high-throughput single-cell RNA-sequencing platforms such as the Chromium from 10x Genomics require considerable amounts of sequencing, typically due to the large number of cells. The ability to use this data to address biological questions is directly impacted by the quality of the sequence data. Here we have compared the performance of the Illumina NextSeq 500 and NovaSeq 6000 against the BGI MGISEQ-2000 platform using identical single-cell libraries consisting of over 70,000 cells. Our results demonstrate a highly comparable performance between the NovaSeq 6000 and MGISEQ-2000 in sequencing quality, and cell, UMI, and gene detection. However, compared with the NextSeq 500, the MGISEQ-2000 platform performs consistently better, identifying more cells, genes, and UMIs at equalised read depth. We were able to call an additional 1,065,659 SNPs from sequence data generated by the BGI platform, enabling an additional 14% of cells to be assigned to the correct donor from a multiplexed library. However, both the NextSeq 500 and MGISEQ-2000 detected similar frequencies of gRNAs from a pooled CRISPR single-cell screen. Our study provides a benchmark for high-capacity sequencing platforms applied to high-throughput single-cell RNA-seq libraries.

biorxiv genomics 100-200-users 2019

A high-resolution, chromosome-assigned Komodo dragon genome reveals adaptations in the cardiovascular, muscular, and chemosensory systems of monitor lizards, bioRxiv, 2019-02-16

Monitor lizards are unique among ectothermic reptiles in that they have a high aerobic capacity and distinctive cardiovascular physiology which resembles that of endothermic mammals. We have sequenced the genome of the Komodo dragon (Varanus komodoensis), the largest extant monitor lizard, and present a high-resolution de novo chromosome-assigned genome assembly for V. komodoensis, generated with a hybrid approach of long-range sequencing and single-molecule physical mapping. Comparing the genome of V. komodoensis with those of related species showed evidence of positive selection in pathways related to muscle energy metabolism, cardiovascular homeostasis, and thrombosis. We also found species-specific expansions of a chemoreceptor gene family related to pheromone and kairomone sensing in V. komodoensis and several other lizard lineages. Together, these evolutionary signatures of adaptation reveal genetic underpinnings of the unique Komodo sensory, cardiovascular, and muscular systems, and suggest that selective pressure altered thrombosis genes to help Komodo dragons evade the anticoagulant effects of their own saliva. As the only sequenced monitor lizard genome, the Komodo dragon genome is an important resource for understanding the biology of this lineage and of reptiles worldwide.

biorxiv genomics 100-200-users 2019

Transposable elements contribute to dynamic genome content in maize, bioRxiv, 2019-02-12

Transposable elements (TEs) are ubiquitous components of eukaryotic genomes and can create variation in genomic organization. Most of the maize genome is composed of TEs. We developed an approach to define shared and variable TE insertions across genome assemblies and applied this method to four maize genomes (B73, W22, Mo17, and PH207). Among these genomes we identified 1.6 Gb of variable TE sequence representing a combination of recent TE movement and deletion of previously existing TEs. Although recent TE movement only accounted for a portion of the TE variability, we identified 4,737 TEs unique to one genome with defined insertion sites in all other genomes. Variable TEs are found for all superfamilies and are distributed across the genome, including in regions of recent shared ancestry among individuals. There are 2,380 genes annotated in the B73 genome located within variable TEs, providing evidence for the role of TEs in contributing to the substantial differences in gene content among these genotypes. The large scope of TE variation present in this limited sample of temperate maize genomes highlights the major contribution of TEs in driving variation in genome organization and gene content.
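
The abstract does not describe how insertions are compared across assemblies, so the following is only an illustrative sketch of the shared/variable/unique distinction it draws, not the authors' method. It assumes each TE has already been given a per-genome call of `present`, `absent` (an empty insertion site was resolved), or `unresolved`; the function name and these labels are hypothetical.

```python
from typing import Dict


def classify_te(calls: Dict[str, str]) -> str:
    """Classify one TE from its per-genome calls.

    calls maps a genome name to 'present', 'absent', or 'unresolved'
    (an assumed input format).
    """
    statuses = list(calls.values())
    n_total = len(statuses)
    n_present = statuses.count("present")
    n_absent = statuses.count("absent")

    if n_present == 0:
        return "not_present"
    if n_present == n_total:
        return "shared"
    if n_absent >= 1:
        # Unique: present in exactly one genome, with the empty site
        # resolved in every other genome.
        if n_present == 1 and n_absent == n_total - 1:
            return "unique"
        return "variable"
    # Present in some genomes, but no genome has a resolved empty site.
    return "unresolved"


# Hypothetical TE present only in B73, with empty sites defined elsewhere.
example = classify_te(
    {"B73": "present", "W22": "absent", "Mo17": "absent", "PH207": "absent"}
)  # "unique"
```

Treating `unresolved` separately from `absent` mirrors the abstract's requirement that a unique TE have defined insertion sites in all other genomes.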

biorxiv genomics 100-200-users 2019