Long live the king: chromosome-level assembly of the lion (Panthera leo) using linked-read, Hi-C, and long read data, bioRxiv, 2019-07-18
Abstract: The lion (Panthera leo) is one of the most popular and iconic feline species on the planet, yet in spite of its popularity, the last century has seen massive declines for lion populations worldwide. Genomic resources for endangered species represent an important way forward for the field of conservation, enabling high-resolution studies of demography, disease, and population dynamics. Here, we present a chromosome-level assembly for the captive African lion from the Exotic Feline Rescue Center as a resource for current and subsequent genetic work on the sole social species of the Panthera clade. Our assembly is composed of 10x Genomics Chromium data, Dovetail Hi-C, and Oxford Nanopore long-read data. Synteny is highly conserved between the lion, other Panthera genomes, and the domestic cat. We find variability in the length and levels of homozygosity across the genomes of the lion sequenced here and other previously published resequencing data, indicating contrasting histories of recent and ancient small population sizes and/or inbreeding. Demographic analyses reveal similar histories across all individuals except the Asiatic lion, which shows a more rapid decline in population size. This high-quality genome will greatly aid in the continuing research and conservation efforts for the lion.
biorxiv genomics 100-200-users 2019
Weak and uneven associations of home, neighborhood and school environments with stress hormone output across multiple time scales, bioRxiv, 2019-07-18
ABSTRACT: The progression of lifelong trajectories of socioeconomic inequalities in health and mortality begins in childhood. Dysregulation in cortisol, a stress hormone that is the primary output of the hypothalamus-pituitary-adrenal (HPA) axis, has been hypothesized to be a mechanism for how early environmental adversity compromises health. However, despite the popularity of cortisol as a biomarker for stress and adversity, little is known about whether cortisol output differs in children being raised in socioeconomically disadvantaged environments. Here, we show that there are few differences between advantaged and disadvantaged children in their cortisol output. In 8- to 14-year-old children from the population-based Texas Twin Project, we measured cortisol output at three different time-scales: (1) diurnal fluctuation in salivary cortisol (n = 400), (2) salivary cortisol reactivity and recovery after exposure to the Trier Social Stress Test (n = 444), and (3) cortisol concentration in hair (n = 1,210). These measures converged on two moderately correlated, yet distinguishable, dimensions of HPA function. We then tested differences in cortisol output across nine aspects of social disadvantage at the home (e.g., family socioeconomic status), school (e.g., average levels of academic achievement), and neighborhood (e.g., concentrated poverty). Children living in neighborhoods with higher concentrated poverty had higher diurnal cortisol output, as measured in saliva; otherwise, child cortisol output was unrelated to any other aspect of social disadvantage. Overall, we find limited support for alteration in HPA axis functioning as a general mechanism for the health consequences of socioeconomic inequality in childhood.
biorxiv physiology 100-200-users 2019
Compartment-dependent chromatin interaction dynamics revealed by liquid chromatin Hi-C, bioRxiv, 2019-07-17
SUMMARY: Chromosomes are folded so that active and inactive chromatin domains are spatially segregated. Compartmentalization is thought to occur through polymer phase/microphase separation mediated by interactions between loci of similar type. The nature and dynamics of these interactions are not known. We developed liquid chromatin Hi-C to map the stability of associations between loci. Before fixation and Hi-C, chromosomes are fragmented, removing the strong polymeric constraint to enable detection of intrinsic locus-locus interaction stabilities. Compartmentalization is stable when fragments are over 10-25 kb. Fragmenting chromatin into pieces smaller than 6 kb leads to gradual loss of genome organization. Dissolution kinetics of chromatin interactions vary for different chromatin domains. Lamin-associated domains are most stable, while interactions among speckle and polycomb-associated loci are more dynamic. Cohesin-mediated loops dissolve after fragmentation, possibly because cohesin rings slide off nearby DNA ends. Liquid chromatin Hi-C provides a genome-wide view of chromosome interaction dynamics.
Highlights:
- Liquid chromatin Hi-C detects chromatin interaction dissociation rates genome-wide
- Chromatin conformations in distinct nuclear compartments differ in stability
- Stable heterochromatic associations are major drivers of chromatin phase separation
- CTCF-CTCF loops are stabilized by encirclement of loop bases by cohesin rings
biorxiv genomics 100-200-users 2019
Human Genome Assembly in 100 Minutes, bioRxiv, 2019-07-17
Abstract: De novo genome assembly provides comprehensive, unbiased genomic information and makes it possible to gain insight into new DNA sequences not present in reference genomes. Many de novo human genomes have been published in the last few years, leveraging a combination of inexpensive short-read and single-molecule long-read technologies. As long-read DNA sequencers become more prevalent, the computational burden of generating assemblies persists as a critical factor. The most common approach to long-read assembly, using an overlap-layout-consensus (OLC) paradigm, requires all-to-all read comparisons, which scale quadratically in computational complexity with the number of reads. We assert that recent achievements in sequencing technology (i.e., accuracy ~99% and read length ~10-15k) enable a fundamentally better strategy for OLC that is effectively linear rather than quadratic. Our genome assembly implementation, Peregrine, uses sparse hierarchical minimizers (SHIMMER) to index reads, thereby avoiding the need for an all-to-all read comparison step. Peregrine can assemble 30x human PacBio CCS read datasets in less than 30 CPU hours and around 100 wall-clock minutes to a high contiguity assembly (N50 > 20Mb). The continued advance of sequencing technologies coupled with the Peregrine assembler enables routine generation of human de novo assemblies. This will allow for population scale measurements of more comprehensive genomic variations -- beyond SNPs and small indels -- as well as novel applications requiring rapid access to de novo assemblies.
biorxiv bioinformatics 100-200-users 2019
The Evolutionary History of Common Genetic Variants Influencing Human Cortical Surface Area, bioRxiv, 2019-07-17
Abstract: Structural brain changes along the lineage that led to modern Homo sapiens have contributed to our unique cognitive and social abilities. However, the evolutionarily relevant molecular variants impacting key aspects of neuroanatomy are largely unknown. Here, we integrate evolutionary annotations of the genome at diverse timescales with common variant associations from large-scale neuroimaging genetic screens in living humans, to reveal how selective pressures have shaped neocortical surface area. We show that variation within human-gained enhancers active in the developing brain is associated with global surface area as well as that of specific regions. Moreover, we find evidence of recent polygenic selection over the past 2,000 years influencing surface area of multiple cortical regions, including those involved in spoken language and visual processing.
biorxiv neuroscience 100-200-users 2019
Identification of hidden population structure in time-scaled phylogenies, bioRxiv, 2019-07-16
Abstract: Population structure influences genealogical patterns; however, data pertaining to how populations are structured are often unavailable or not directly observable. Inference of population structure is highly important in molecular epidemiology, where pathogen phylogenetics is increasingly used to infer transmission patterns and detect outbreaks. Discrepancies between observed and idealised genealogies, such as those generated by the coalescent process, can be quantified, and where significant differences occur, may reveal the action of natural selection, host population structure, or other demographic and epidemiological heterogeneities. We have developed a fast non-parametric statistical test for detection of cryptic population structure in time-scaled phylogenetic trees. The test is based on contrasting estimated phylogenies with the theoretically expected phylodynamic ordering of common ancestors in two clades within a coalescent framework. These statistical tests have also motivated the development of algorithms which can be used to quickly screen a phylogenetic tree for clades which are likely to share a distinct demographic or epidemiological history. Epidemiological applications include identification of outbreaks in vulnerable host populations or rapid expansion of genotypes with a fitness advantage. To demonstrate the utility of these methods for outbreak detection, we applied the new methods to large phylogenies reconstructed from thousands of HIV-1 partial pol sequences. This revealed the presence of clades which had grown rapidly in the recent past and were significantly concentrated in young men, suggesting recent and rapid transmission in that group. Furthermore, to demonstrate the utility of these methods for the study of antimicrobial resistance, we applied the new methods to a large phylogeny reconstructed from whole genome Neisseria gonorrhoeae sequences. We find that population structure detected using these methods closely overlaps with the appearance and expansion of mutations conferring antimicrobial resistance.
biorxiv evolutionary-biology 100-200-users 2019