Gene networks with transcriptional bursting recapitulate rare transient coordinated expression states in cancer, bioRxiv, 2019-07-17
SUMMARY: Non-genetic transcriptional variability at the single-cell level is a potential mechanism for therapy resistance in melanoma. Specifically, rare subpopulations of melanoma cells occupy a transient pre-resistant state characterized by coordinated high expression of several genes. Importantly, these rare cells are able to survive drug treatment and develop resistance. How might these extremely rare states arise and disappear within the population? It is unclear whether the canonical stochastic models of probabilistic transcriptional pulsing can explain this behavior, or whether it requires special, hitherto unidentified molecular mechanisms. Here we use mathematical modeling to show that a minimal network comprising transcriptional bursting and interactions between genes can give rise to rare coordinated high states. We next show that although these states occur across networks of different sizes, they depend strongly on three (out of seven) model parameters and require network connectivity to be ≤ 6. Interestingly, we find that while entry into the rare coordinated high state is initiated by a long transcriptional burst that also triggers entry of other genes, the exit from it occurs through the independent inactivation of individual genes. Finally, our model predicts that increased network connectivity can lead to transcriptionally stable states, which we verify using network inference analysis of experimental data. In sum, we demonstrate that established principles of gene regulation are sufficient to describe this new class of rare cell variability and argue for its general existence in other biological contexts.
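To make "transcriptional bursting" concrete, the following is a minimal sketch of the standard two-state (telegraph) model for a single gene, simulated with the Gillespie algorithm. It is not the model from the preprint: it omits the gene-gene interactions that the summary identifies as essential, and the function name and the four rate parameters (`k_on`, `k_off`, `k_prod`, `k_deg`) are illustrative assumptions.

```python
import random

def simulate_telegraph(k_on, k_off, k_prod, k_deg, t_end, seed=0):
    """Gillespie simulation of one gene that switches between OFF and ON.

    Hypothetical single-gene sketch: mRNA is produced only while the gene
    is ON and is degraded at a rate proportional to its copy number.
    """
    rng = random.Random(seed)
    t, gene_on, mrna = 0.0, False, 0
    times, counts = [0.0], [0]
    while True:
        # Propensities of the three possible reactions in the current state.
        switch = k_off if gene_on else k_on
        produce = k_prod if gene_on else 0.0
        degrade = k_deg * mrna
        total = switch + produce + degrade
        if total == 0:
            break  # nothing can happen any more
        # Waiting time to the next reaction is exponentially distributed.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Pick which reaction fires, weighted by its propensity.
        r = rng.random() * total
        if r < switch:
            gene_on = not gene_on
        elif r < switch + produce:
            mrna += 1
        elif mrna > 0:
            mrna -= 1
        times.append(t)
        counts.append(mrna)
    return times, counts
```

With `k_on` much smaller than `k_off`, the gene is rarely ON, so the mRNA count stays near zero and rises only in occasional bursts; a rare long ON period then yields an unusually high count.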
biorxiv systems-biology 0-100-users 2019
Human Genome Assembly in 100 Minutes, bioRxiv, 2019-07-17
Abstract: De novo genome assembly provides comprehensive, unbiased genomic information and makes it possible to gain insight into new DNA sequences not present in reference genomes. Many de novo human genomes have been published in the last few years, leveraging a combination of inexpensive short-read and single-molecule long-read technologies. As long-read DNA sequencers become more prevalent, the computational burden of generating assemblies persists as a critical factor. The most common approach to long-read assembly, using an overlap-layout-consensus (OLC) paradigm, requires all-to-all read comparisons, which scale quadratically in computational complexity with the number of reads. We assert that recent achievements in sequencing technology (i.e. accuracy of ~99% and read lengths of ~10-15k) enable a fundamentally better strategy for OLC that is effectively linear rather than quadratic. Our genome assembly implementation, Peregrine, uses sparse hierarchical minimizers (SHIMMER) to index reads, thereby avoiding the need for an all-to-all read comparison step. Peregrine can assemble 30x human PacBio CCS read datasets in less than 30 CPU hours and around 100 wall-clock minutes to a high contiguity assembly (N50 > 20Mb). The continued advance of sequencing technologies coupled with the Peregrine assembler enables routine generation of human de novo assemblies. This will allow for population scale measurements of more comprehensive genomic variations -- beyond SNPs and small indels -- as well as novel applications requiring rapid access to de novo assemblies.
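The idea of indexing reads by minimizers can be illustrated with a short sketch. The code below computes ordinary window minimizers, choosing the lexicographically smallest k-mer in every window of `w` consecutive k-mers. This is an assumption made for illustration, not Peregrine's SHIMMER scheme, which the abstract describes only as sparse and hierarchical; the function name and parameters are hypothetical.

```python
def minimizers(seq, k, w):
    """Return sorted (position, k-mer) pairs selected as window minimizers.

    Illustrative sketch only: plain lexicographic minimizers,
    not the hierarchical SHIMMER index used by Peregrine.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        # Smallest k-mer in the window; ties go to the leftmost position.
        best = min(window, key=lambda i: (kmers[i], i))
        picked.add((best, kmers[best]))
    return sorted(picked)


print(minimizers("ACGTACGTGGA", k=3, w=4))
# → [(0, 'ACG'), (4, 'ACG'), (5, 'CGT')]
```

Two reads that overlap by at least one full window select the same minimizer within the shared region, so candidate overlaps can be found by looking up shared minimizers in an index instead of comparing every pair of reads.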
biorxiv bioinformatics 100-200-users 2019
Longitudinal single cell transcriptomics reveals Krt8+ alveolar epithelial progenitors in lung regeneration, bioRxiv, 2019-07-17
Lung injury activates quiescent stem and progenitor cells to regenerate alveolar structures. The sequence and coordination of transcriptional programs during this process have largely remained elusive. Using single-cell RNA-seq, we first generated a whole-organ bird’s-eye view on cellular dynamics and cell-cell communication networks during mouse lung regeneration from ∼30,000 cells at six timepoints. We discovered an injury-specific progenitor cell state characterized by Krt8 in flat epithelial cells covering alveolar surfaces. The number of these cells peaked during fibrogenesis in independent mouse models, as well as in human acute lung injury and fibrosis. Krt8+ progenitors featured a highly distinct connectome of receptor-ligand pairs with endothelial cells, fibroblasts, and macrophages. To ‘sky dive’ into epithelial differentiation dynamics, we sequenced >30,000 sorted epithelial cells at 18 timepoints and computationally derived cell state trajectories that were validated by lineage tracing genetic reporter mice. Airway stem cells within the club cell lineage and alveolar type-2 cells underwent transcriptional convergence onto the same Krt8+ progenitor cell state, which later resolved by terminal differentiation into alveolar type-1 cells. We derived distinct transcriptional regulators as key switch points in this process and show that induction of TNF-alpha/NFkappaB, p53, and hypoxia-driven gene expression programs precedes a Sox4-, Ctnnb1-, and Wwtr1-driven switch towards alveolar type-1 cell fate. We show that epithelial cell plasticity can induce non-gradual transdifferentiation, involving intermediate progenitor cell states that may persist and promote disease if checkpoint signals for terminal differentiation are perturbed.
biorxiv systems-biology 0-100-users 2019
Nanopore direct RNA sequencing maps an Arabidopsis N6 methyladenosine epitranscriptome, bioRxiv, 2019-07-17
Abstract: Understanding genome organization and gene regulation requires insight into RNA transcription, processing and modification. We adapted nanopore direct RNA sequencing to examine RNA from a wild-type accession of the model plant Arabidopsis thaliana and a mutant defective in mRNA methylation (m6A). Here we show that m6A can be mapped in full-length mRNAs transcriptome-wide and reveal the combinatorial diversity of cap-associated transcription start sites, splicing events, poly(A) site choice and poly(A) tail length. Loss of m6A from 3′ untranslated regions is associated with decreased relative transcript abundance and defective RNA 3′ end formation. A functional consequence of disrupted m6A is a lengthening of the circadian period. We conclude that nanopore direct RNA sequencing can reveal the complexity of mRNA processing and modification in full-length single-molecule reads. These findings can refine Arabidopsis genome annotation. Further, applying this approach to less well-studied species could transform our understanding of what their genomes encode.
biorxiv molecular-biology 0-100-users 2019
Population genomics of the Viking world, bioRxiv, 2019-07-17
Abstract: The Viking maritime expansion from Scandinavia (Denmark, Norway, and Sweden) marks one of the swiftest and most far-flung cultural transformations in global history. During this time (c. 750 to 1050 CE), the Vikings reached most of western Eurasia, Greenland, and North America, and left a cultural legacy that persists to this day. To understand the genetic structure and influence of the Viking expansion, we sequenced the genomes of 442 ancient humans from across Europe and Greenland ranging from the Bronze Age (c. 2400 BC) to the early Modern period (c. 1600 CE), with particular emphasis on the Viking Age. We find that the period preceding the Viking Age was accompanied by foreign gene flow into Scandinavia from the south and east, spreading from Denmark and eastern Sweden to the rest of Scandinavia. Despite the close linguistic similarities of modern Scandinavian languages, we observe genetic structure within Scandinavia, suggesting that regional population differences were already present 1,000 years ago. We find evidence for a majority of Danish Viking presence in England, Swedish Viking presence in the Baltic, and Norwegian Viking presence in Ireland, Iceland, and Greenland. Additionally, we see substantial foreign European ancestry entering Scandinavia during the Viking Age. We also find that several of the members of the only archaeologically well-attested Viking expedition were close family members. By comparing Viking Scandinavian genomes with present-day Scandinavian genomes, we find that pigmentation-associated loci have undergone strong population differentiation during the last millennia. Finally, we are able to trace the allele frequency dynamics of positively selected loci with unprecedented detail, including the lactase persistence allele and various alleles associated with the immune response.
We conclude that the Viking diaspora was characterized by substantial foreign engagement: distinct Viking populations influenced the genomic makeup of different regions of Europe, while Scandinavia also experienced increased contact with the rest of the continent.
biorxiv genetics 200-500-users 2019
The Evolutionary History of Common Genetic Variants Influencing Human Cortical Surface Area, bioRxiv, 2019-07-17
Abstract: Structural brain changes along the lineage that led to modern Homo sapiens have contributed to our unique cognitive and social abilities. However, the evolutionarily relevant molecular variants impacting key aspects of neuroanatomy are largely unknown. Here, we integrate evolutionary annotations of the genome at diverse timescales with common variant associations from large-scale neuroimaging genetic screens in living humans, to reveal how selective pressures have shaped neocortical surface area. We show that variation within human-gained enhancers active in the developing brain is associated with global surface area as well as that of specific regions. Moreover, we find evidence of recent polygenic selection over the past 2,000 years influencing surface area of multiple cortical regions, including those involved in spoken language and visual processing.
biorxiv neuroscience 100-200-users 2019