Genome-wide polygenic score to identify a monogenic risk-equivalent for coronary disease, bioRxiv, 2017-11-21
Abstract: Identification of individuals at increased genetic risk for a complex disorder such as coronary disease can facilitate treatments or enhanced screening strategies. A rare monogenic mutation associated with increased cholesterol is present in ~1:250 individuals and confers an up to 4-fold increase in coronary risk when compared with non-carriers. Although individual common polymorphisms have modest predictive capacity, their cumulative impact can be aggregated into a polygenic score. Here, we develop a new, genome-wide polygenic score that aggregates information from 6.6 million common polymorphisms and show that this score can similarly identify individuals with a 4-fold increased risk for coronary disease. In >400,000 participants from UK Biobank, the score conforms to a normal distribution and those in the top 2.5% of the distribution are at 4-fold increased risk compared to the remaining 97.5%. Similar patterns are observed with genome-wide polygenic scores for two additional diseases – breast cancer and severe obesity. One Sentence Summary: A genome-wide polygenic score identifies 2.5% of the population born with a 4-fold increased risk for coronary artery disease.
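The aggregation described in this abstract can be illustrated with a minimal sketch, assuming the common additive form of a polygenic score: a weighted sum of effect-allele dosages per person, followed by a cutoff at the top 2.5% of the cohort. The function names, the simulated weights, and the toy cohort are hypothetical and are not taken from the paper, which derives its weights from 6.6 million polymorphisms.

```python
import random


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1 or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def top_fraction_cutoff(scores, fraction=0.025):
    """Return the score at or above which roughly `fraction` of the cohort falls."""
    ranked = sorted(scores, reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    return ranked[n_top - 1]


# Toy cohort: simulated genotypes and per-variant weights (assumptions, not real data).
rng = random.Random(0)
n_variants, n_people = 200, 1000
weights = [rng.gauss(0.0, 0.05) for _ in range(n_variants)]
cohort = [[rng.choice((0, 1, 2)) for _ in range(n_variants)] for _ in range(n_people)]

scores = [polygenic_score(person, weights) for person in cohort]
cutoff = top_fraction_cutoff(scores)
high_risk = [i for i, s in enumerate(scores) if s >= cutoff]
```

Here `high_risk` holds the indices of the people in the top 2.5% of the simulated score distribution. Estimating their relative risk would additionally need outcome data, which this sketch does not model.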
biorxiv genomics 100-200-users 2017
Higher-order inter-chromosomal hubs shape 3-dimensional genome organization in the nucleus, bioRxiv, 2017-11-19
Abstract: Eukaryotic genomes are packaged into a 3-dimensional structure in the nucleus of each cell. There are currently two distinct views of genome organization that are derived from different technologies. The first view, derived from genome-wide proximity ligation methods (e.g. Hi-C), suggests that genome organization is largely organized around chromosomes. The second view, derived from in situ imaging, suggests a central role for nuclear bodies. Yet, because microscopy and proximity-ligation methods measure different aspects of genome organization, these two views remain poorly reconciled and our overall understanding of how genomic DNA is organized within the nucleus remains incomplete. Here, we develop Split-Pool Recognition of Interactions by Tag Extension (SPRITE), which moves away from proximity-ligation and enables genome-wide detection of higher-order DNA interactions within the nucleus. Using SPRITE, we recapitulate known genome structures identified by Hi-C and show that the contact frequencies measured by SPRITE strongly correlate with the 3-dimensional distances measured by microscopy. In addition to known structures, SPRITE identifies two major hubs of inter-chromosomal interactions that are spatially arranged around the nucleolus and nuclear speckles, respectively. We find that the majority of genomic regions exhibit preferential spatial association relative to one of these nuclear bodies, with regions that are highly transcribed by RNA Polymerase II organizing around nuclear speckles and transcriptionally inactive and centromere-proximal regions organizing around the nucleolus. Together, our results reconcile the two distinct pictures of nuclear structure and demonstrate that nuclear bodies act as inter-chromosomal hubs that shape the overall 3-dimensional packaging of genomic DNA in the nucleus.
biorxiv genomics 100-200-users 2017
The Generation and Propagation of the Human Alpha Rhythm, bioRxiv, 2017-11-19
Abstract: The alpha rhythm is the longest studied brain oscillation and has been theorized to play a key role in cognition. Still, its physiology is poorly understood. In this study, we used micro and macro electrodes in surgical epilepsy patients to measure the intracortical and thalamic generators of the alpha rhythm during quiet wakefulness. We first found that alpha in posterior cortex propagates from higher-order anterosuperior areas towards the occipital pole, consistent with alpha effecting top-down processing. This cortical alpha leads pulvinar alpha, complicating prevailing theories of a thalamic pacemaker. Finally, alpha is dominated by currents and firing in supragranular cortical layers. Together, these results suggest that the alpha rhythm likely reflects short-range supragranular feedback which propagates from higher to lower-order cortex and cortex to thalamus. These physiological insights suggest how alpha could mediate feedback throughout the thalamocortical system.
biorxiv neuroscience 0-100-users 2017
Transcription start site analysis reveals widespread divergent transcription in D. melanogaster and core promoter-encoded enhancer activities, bioRxiv, 2017-11-19
Abstract: Mammalian gene promoters and enhancers share many properties. They are composed of a unified promoter architecture of divergent transcription initiation and gene promoters may exhibit enhancer function. However, it is currently unclear how expression strength of a regulatory element relates to its enhancer strength and if the unifying architecture is conserved across Metazoa. Here we investigate the transcription initiation landscape and its associated RNA decay in D. melanogaster. Surprisingly, we find that the majority of active gene-distal enhancers and a considerable fraction of gene promoters are divergently transcribed. We observe quantitative relationships between enhancer potential, expression level and core promoter strength, providing an explanation for indirectly related histone modifications that are reflecting expression levels. Lowly abundant unstable RNAs initiated from weak core promoters are key characteristics of gene-distal developmental enhancers, while the housekeeping enhancer strengths of gene promoters reflect their expression strengths. The different layers of regulation mediated by gene-distal enhancers and gene promoters are also reflected in chromatin interaction data. Our results suggest a unified promoter architecture of many D. melanogaster regulatory elements, that is universal across Metazoa, whose regulatory functions seem to be related to their core promoter elements.
biorxiv genomics 0-100-users 2017
Real-time analysis of nanopore-based metagenomic sequencing from orthopaedic device infection, bioRxiv, 2017-11-18
Abstract: Prosthetic joint infections are clinically difficult to diagnose and treat. Previously, we demonstrated metagenomic sequencing on an Illumina MiSeq replicates the findings of current gold standard microbiological diagnostic techniques. Nanopore sequencing offers advantages in speed of detection over MiSeq. Here, we compare direct-from-clinical-sample metagenomic Illumina sequencing with Nanopore sequencing, and report a real-time analytical pathway for Nanopore sequence data, designed for detecting bacterial composition of prosthetic joint infections. DNA was extracted from the sonication fluids of seven explanted orthopaedic devices, and additionally from two culture negative controls, and was sequenced on the Oxford Nanopore Technologies MinION platform. A specific analysis pipeline was assembled to overcome the challenges of identifying the true infecting pathogen, given high levels of host contamination and unavoidable background lab and kit contamination. The majority of DNA classified (>90%) was host contamination and discarded. Using negative control filtering thresholds, the species identified corresponded with both routine microbiological diagnosis and MiSeq results. By analysing sequences in real time, causes of infection were robustly detected within minutes from initiation of sequencing. We demonstrate initial proof of concept that metagenomic MinION sequencing can provide rapid, accurate diagnosis for prosthetic joint infections. We demonstrate a novel, scalable pipeline for real-time analysis of MinION sequence data. The high proportion of human DNA in extracts prevents full genome analysis from complete coverage, and methods to reduce this could increase genome depth and allow antimicrobial resistance profiling.
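The abstract does not spell out how its negative control filtering thresholds are defined. As a rough sketch under stated assumptions, one way to filter is to keep a species only if its read count in the sample clears a minimum and exceeds a fold-multiple of the highest count seen in any negative control. The function name, the `fold` and `min_reads` parameters, and the read counts below are hypothetical illustrations, not the authors' pipeline or data.

```python
def filter_by_negative_controls(sample_counts, control_counts_list, fold=10, min_reads=5):
    """Keep species whose sample reads exceed background seen in negative controls.

    sample_counts: dict of species -> classified read count (host reads already removed).
    control_counts_list: list of such dicts, one per negative control.
    fold, min_reads: hypothetical thresholds chosen for illustration only.
    """
    kept = {}
    for species, reads in sample_counts.items():
        # Background is the highest count for this species across all controls.
        background = max((c.get(species, 0) for c in control_counts_list), default=0)
        if reads >= min_reads and reads > fold * background:
            kept[species] = reads
    return kept


# Made-up counts for one sample and two negative controls.
sample = {"Staphylococcus aureus": 420, "Cutibacterium acnes": 12, "Escherichia coli": 3}
controls = [
    {"Cutibacterium acnes": 8},
    {"Cutibacterium acnes": 5, "Escherichia coli": 1},
]
candidates = filter_by_negative_controls(sample, controls)
```

With these made-up counts, only the species that is abundant in the sample and absent from both controls survives the filter; the two low-level species are treated as background.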
biorxiv microbiology 100-200-users 2017
Scaling accurate genetic variant discovery to tens of thousands of samples, bioRxiv, 2017-11-15
Abstract: Comprehensive disease gene discovery in both common and rare diseases will require the efficient and accurate detection of all classes of genetic variation across tens to hundreds of thousands of human samples. We describe here a novel assembly-based approach to variant calling, the GATK HaplotypeCaller (HC) and Reference Confidence Model (RCM), that determines genotype likelihoods independently per-sample but performs joint calling across all samples within a project simultaneously. We show by calling over 90,000 samples from the Exome Aggregation Consortium (ExAC) that, in contrast to other algorithms, the HC-RCM scales efficiently to very large sample sizes without loss in accuracy; and that the accuracy of indel variant calling is superior in comparison to other algorithms. More importantly, the HC-RCM produces a fully squared-off matrix of genotypes across all samples at every genomic position being investigated. The HC-RCM is a novel, scalable, assembly-based algorithm with abundant applications for population genetics and clinical studies.
biorxiv genomics 0-100-users 2017