Deep learning detects virus presence in cancer histology, bioRxiv, 2019-07-06
Abstract: Oncogenic viruses like human papillomavirus (HPV) or Epstein-Barr virus (EBV) are a major cause of human cancer. Viral oncogenesis has a direct impact on treatment decisions because virus-associated tumors can demand a lower intensity of chemotherapy and radiation or can be more susceptible to immune checkpoint inhibition. However, molecular tests for HPV and EBV are not ubiquitously available. We hypothesized that the histopathological features of virus-driven and non-virus-driven cancers are sufficiently different to be detectable by artificial intelligence (AI) through deep learning-based analysis of images from routine hematoxylin and eosin (HE) stained slides. We show that deep transfer learning can predict presence of HPV in head and neck cancer with a patient-level 3-fold cross-validated area-under-the-curve (AUC) of 0.89 [0.82; 0.94]. The same workflow was used for EBV-driven gastric cancer, achieving a cross-validated AUC of 0.80 [0.70; 0.92] and a similar performance in external validation sets. Reverse-engineering our deep neural networks, we show that the key morphological features can be made understandable to humans. This workflow could enable a fast and low-cost method to identify virus-induced cancer in clinical trials or clinical routine. At the same time, our approach for feature visualization allows pathologists to look into the black box of deep learning, enabling them to check the plausibility of computer-based image classification.
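The abstract reports a patient-level AUC, which implies that predictions made on image regions are pooled per patient before scoring. The following is a minimal sketch of that idea, not the authors' pipeline: mean pooling of tile scores, the helper names `patient_level_scores` and `auc`, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def patient_level_scores(tile_scores):
    """Average tile-level scores per patient (mean pooling is an assumption)."""
    grouped = defaultdict(list)
    for patient_id, score in tile_scores:
        grouped[patient_id].append(score)
    return {pid: sum(vals) / len(vals) for pid, vals in grouped.items()}


def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative patient count as half a win.
    """
    pos = [scores[p] for p in scores if labels[p] == 1]
    neg = [scores[p] for p in scores if labels[p] == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative patient")
    wins = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in pos
        for b in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: (patient id, predicted probability of virus-positive tile).
tiles = [
    ("p1", 0.9), ("p1", 0.7),
    ("p2", 0.2), ("p2", 0.4),
    ("p3", 0.6), ("p3", 0.8),
    ("p4", 0.7), ("p4", 0.8),
]
# Hypothetical ground truth: 1 = virus-positive tumor, 0 = virus-negative.
labels = {"p1": 1, "p2": 0, "p3": 1, "p4": 0}

print(auc(patient_level_scores(tiles), labels))
```

In a cross-validated setting, the pooled scores for each patient would come from the fold in which that patient was held out; that bookkeeping is omitted here.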
biorxiv cancer-biology 0-100-users 2019
Aging is associated with a systemic length-driven transcriptome imbalance, bioRxiv, 2019-07-04
Abstract: Aging manifests itself through a decline in organismal homeostasis and a multitude of cellular and physiological functions [1]. Efforts to identify a common basis for vertebrate aging face many challenges; for example, while there have been documented changes in the expression of many hundreds of mRNAs, the results across tissues and species have been inconsistent [2]. We therefore analyzed age-resolved transcriptomic data from 17 mouse organs and 51 human organs using unsupervised machine learning [3–5] to identify the architectural and regulatory characteristics most informative on the differential expression of genes with age. We report a hitherto unknown phenomenon, a systemic age-dependent length-driven transcriptome imbalance that for older organisms disrupts the homeostatic balance between short and long transcript molecules for mice, rats, killifishes, and humans. We also demonstrate that in a mouse model of healthy aging, length-driven transcriptome imbalance correlates with changes in expression of splicing factor proline and glutamine rich (Sfpq), which regulates transcriptional elongation according to gene length [6]. Furthermore, we demonstrate that length-driven transcriptome imbalance can be triggered by environmental hazards and pathogens. Our findings reinforce the picture of aging as a systemic homeostasis breakdown and suggest a promising explanation for why diverse insults affect multiple age-dependent phenotypes in a similar manner.
biorxiv systems-biology 100-200-users 2019
Emergence of the Ug99 lineage of the wheat stem rust pathogen through somatic hybridisation, bioRxiv, 2019-07-04
Abstract: Parasexuality contributes to diversity and adaptive evolution of haploid (monokaryotic) fungi. However, non-sexual genetic exchange mechanisms are not defined in dikaryotic fungi (containing two distinct haploid nuclei). Newly emerged strains of the wheat stem rust pathogen, Puccinia graminis f. sp. tritici (Pgt), such as Ug99, are a major threat to global food security. Here we show that Ug99 arose by somatic hybridisation and nuclear exchange between dikaryons. Fully haplotype-resolved genome assembly and DNA proximity analysis revealed that Ug99 shares one haploid nucleus genotype with a much older African lineage of Pgt, with no recombination or reassortment. Generation of genetic variation by nuclear exchange may favour the evolution of dikaryotism by providing an advantage over diploidy.
biorxiv genetics 100-200-users 2019
Linking transcriptome and chromatin accessibility in nanoliter droplets for single-cell sequencing, bioRxiv, 2019-07-04
Linked profiling of transcriptome and chromatin accessibility from single cells can provide unprecedented insights into cellular status. Here we developed a droplet-based Single-Nucleus chromatin Accessibility and mRNA Expression sequencing (SNARE-seq) assay, that we used to profile neonatal and adult mouse cerebral cortices. To demonstrate the strength of single-cell dual-omics profiling, we reconstructed transcriptome and epigenetic landscapes of cell types, uncovered lineage-specific accessible sites, and connected dynamics of promoter accessibility with transcription during neurogenesis.
biorxiv genomics 100-200-users 2019
Reconciling Dimensional and Categorical Models of Autism Heterogeneity: a Brain Connectomics & Behavioral Study, bioRxiv, 2019-07-04
Abstract
Background: Heterogeneity in autism spectrum disorder (ASD) has hindered the development of biomarkers, thus motivating subtyping efforts. Most subtyping studies divide ASD individuals into non-overlapping (categorical) subgroups. However, continuous inter-individual variation in ASD suggests the need for a dimensional approach.
Methods: A Bayesian model was employed to decompose resting-state functional connectivity (RSFC) of ASD individuals into multiple abnormal RSFC patterns, i.e., categorical subtypes henceforth referred to as “factors”. Importantly, the model allowed each individual to express one or more factors to varying degrees (dimensional subtyping). The model was applied to 306 ASD individuals (age 5.2-57 years) from two multisite repositories. Post hoc analyses associated factors with symptoms and demographics.
Results: Analyses yielded three factors with dissociable whole-brain hypo/hyper RSFC patterns. Most participants expressed multiple (categorical) factors, suggestive of a mosaic of subtypes within individuals. All factors shared abnormal RSFC involving the default network, but the directionality (hypo/hyper RSFC) differed across factors. Factor 1 was associated with core ASD symptoms, while factor 2 was associated with comorbid symptoms. Older males preferentially expressed factor 3. Factors were robust across multiple control analyses and not associated with IQ or head motion.
Conclusions: There exist at least three ASD factors with dissociable patterns of whole-brain RSFC, behaviors and demographics. Heterogeneous default network hypo/hyper RSFC across the factors might explain previously reported inconsistencies. The factors differentiated between core ASD and comorbid symptoms, a less appreciated domain of heterogeneity in ASD. These factors are co-expressed in ASD individuals to different degrees, thus reconciling categorical and dimensional perspectives of ASD heterogeneity.
biorxiv neuroscience 100-200-users 2019
A near-full-length HIV-1 genome from 1966 recovered from formalin-fixed paraffin-embedded tissue, bioRxiv, 2019-07-01
Abstract: Although estimated to have emerged in humans in Central Africa in the early 1900s, HIV-1, the main causative agent of AIDS, was only discovered in 1983. With very little direct biological data of HIV-1 from before the 1980s, far-reaching evolutionary and epidemiological inferences regarding the long pre-discovery phase of this pandemic are based on extrapolations by phylodynamic models of HIV-1 genomic sequences gathered mostly over recent decades. Here, using a very sensitive multiplex RT-PCR assay, we screened 1,652 formalin-fixed paraffin-embedded tissue specimens collected for pathology diagnostics in Kinshasa, Democratic Republic of Congo (DRC), between 1959 and 1967. We report the near-complete genome of one positive from 1966 (“DRC66”), a non-recombinant sister lineage to subtype C that constitutes the oldest HIV-1 near-full-length genome recovered to date. Root-to-tip plots showed the DRC66 sequence is not an outlier as would be expected if dating estimates from more recent genomes were systematically biased; and inclusion of the DRC66 sequence in tip-dated BEAST analyses did not significantly alter root and internal node age estimates based on post-1978 HIV-1 sequences. There was larger variation in divergence time estimates among datasets that were subsamples of the available HIV-1 genomes from 1978-2015, showing the inherent phylogenetic stochasticity across subsets of the real HIV-1 diversity. In conclusion, this unique archival HIV-1 sequence provides direct genomic insight into HIV-1 in 1960s DRC, and, as an ancient-DNA calibrator, it validates our understanding of HIV-1 evolutionary history.
Significance: Inferring the precise timing of the origin of the HIV/AIDS pandemic is of great importance because it offers insights into which factors did, or did not, facilitate the emergence of the causal virus. Previous estimates have implicated rapid development during the early 20th century in Central Africa, which wove once-isolated populations into a more continuous fabric. We recovered the first HIV-1 genome from the 1960s, and it provides direct evidence that HIV-1 molecular clock estimates spanning the last half-century are remarkably reliable. And, because this genome itself was sampled only about a half-century after the estimated origin of the pandemic, it empirically anchors this crucial inference with high confidence.
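The root-to-tip check mentioned in the abstract can be illustrated with a simple linear regression of genetic distance from the root against sampling year: the slope approximates the substitution rate, the x-intercept approximates the root date, and an archival sample is unremarkable if it falls close to the line fitted on recent genomes. The sketch below assumes ordinary least squares and uses made-up numbers; the function names and all data values are hypothetical and are not taken from the study, which relied on tip-dated BEAST analyses for its estimates.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def root_to_tip_check(years, distances, archival_year, archival_distance):
    """Fit the clock on recent samples, then compare an archival sample to it.

    Returns the estimated rate (substitutions/site/year), the x-intercept
    (estimated root year), and the archival sample's residual from the line.
    """
    slope, intercept = fit_line(years, distances)
    root_year = -intercept / slope
    expected = slope * archival_year + intercept
    return slope, root_year, archival_distance - expected


# Hypothetical root-to-tip distances for recent genomes (substitutions/site).
modern_years = [1980, 1990, 2000, 2010]
modern_dists = [0.12, 0.14, 0.16, 0.18]

# Hypothetical archival sample from 1966.
rate, root_year, residual = root_to_tip_check(
    modern_years, modern_dists, 1966, 0.095
)
print(f"rate={rate:.4f}/site/year, root year={root_year:.0f}, "
      f"archival residual={residual:+.4f}")
```

A residual that is small relative to the scatter of the recent samples would be consistent with the archival genome not being an outlier; a large residual would suggest the clock fitted on recent data extrapolates poorly.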
biorxiv evolutionary-biology 200-500-users 2019