Genetic determinants of chromatin accessibility in T cell activation across humans, bioRxiv, 2016-12-03
Abstract: Over 90% of genetic variants associated with complex human traits map to non-coding regions, but little is understood about how they modulate gene regulation in health and disease. One possible mechanism is that genetic variants affect the activity of one or more cis-regulatory elements, leading to gene expression variation in specific cell types. To identify such cases, we analyzed Assay for Transposase-Accessible Chromatin sequencing (ATAC-seq) and RNA-seq profiles from activated CD4+ T cells of up to 105 healthy donors. We found that regions of accessible chromatin (ATAC-peaks) are co-accessible at kilobase and megabase resolution, in patterns consistent with the 3D organization of chromosomes measured by in situ Hi-C in T cells. 15% of genetic variants located within ATAC-peaks affected the accessibility of the corresponding peak by disrupting binding sites for transcription factors important for T cell differentiation and activation. These ATAC quantitative trait nucleotides (ATAC-QTNs) have the largest effects on co-accessible peaks, are associated with gene expression from the same aliquot of cells, rarely affect core binding motifs, and are enriched for autoimmune disease variants. Our results provide insights into how natural genetic variants modulate cis-regulatory elements, in isolation or in concert, to influence gene expression in primary immune cells that play a key role in many human diseases.
biorxiv genomics 100-200-users 2016
SPRING: a kinetic interface for visualizing high dimensional single-cell expression data, bioRxiv, 2016-11-30
Motivation: Single-cell gene expression profiling technologies can map the cell states in a tissue or organism. As these technologies become more common, there is a need for computational tools to explore the data they produce. In particular, existing data visualization approaches are imperfect for studying continuous gene expression topologies. Results: Force-directed layouts of k-nearest-neighbor graphs can visualize continuous gene expression topologies in a manner that preserves high-dimensional relationships and allows manual exploration of different stable two-dimensional representations of the same data. We implemented an interactive web tool, called SPRING, to visualize single-cell data using force-directed graph layouts. SPRING reveals more detailed biological relationships than existing approaches when applied to branching gene expression trajectories from hematopoietic progenitor cells. Visualizations from SPRING are also more reproducible than those of stochastic visualization methods such as tSNE, a state-of-the-art tool. Availability: https://kleintools.hms.harvard.edu/tools/spring.html, https://github.com/AllonKleinLab/SPRING Contact: calebsw@gmail.com, allon_klein@hms.harvard.edu
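The idea of a force-directed layout of a k-nearest-neighbor graph can be illustrated with a minimal sketch. This is not SPRING's implementation: the function names `knn_edges` and `force_layout`, the Fruchterman-Reingold-style force model, and all parameter values are assumptions chosen for illustration, using only the Python standard library.

```python
import math
import random


def knn_edges(points, k):
    """Return undirected edges linking each point to its k nearest neighbors."""
    edges = set()
    for i, p in enumerate(points):
        # Rank all other points by Euclidean distance in the original space.
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def force_layout(n, edges, iterations=200, seed=0):
    """Fruchterman-Reingold-style 2D layout: all pairs repel, edges attract."""
    rng = random.Random(seed)  # fixed seed keeps the layout reproducible
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    ideal = 1.0 / math.sqrt(n)  # target edge length (illustrative choice)
    edge_list = sorted(edges)
    for step in range(iterations):
        disp = [[0.0, 0.0] for _ in range(n)]
        # Repulsion between every pair of nodes.
        for i in range(n):
            for j in range(i + 1, n):
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                d = math.hypot(dx, dy) or 1e-9
                push = ideal * ideal / d
                disp[i][0] += dx / d * push
                disp[i][1] += dy / d * push
                disp[j][0] -= dx / d * push
                disp[j][1] -= dy / d * push
        # Attraction along graph edges only.
        for i, j in edge_list:
            dx = pos[i][0] - pos[j][0]
            dy = pos[i][1] - pos[j][1]
            d = math.hypot(dx, dy) or 1e-9
            pull = d * d / ideal
            disp[i][0] -= dx / d * pull
            disp[i][1] -= dy / d * pull
            disp[j][0] += dx / d * pull
            disp[j][1] += dy / d * pull
        # Cap each move by a "temperature" that cools linearly toward zero.
        temp = 0.1 * (1.0 - step / iterations)
        for i in range(n):
            d = math.hypot(disp[i][0], disp[i][1]) or 1e-9
            scale = min(d, temp) / d
            pos[i][0] += disp[i][0] * scale
            pos[i][1] += disp[i][1] * scale
    return pos


if __name__ == "__main__":
    # Hypothetical "cells" in a 3-gene expression space.
    cells = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 5.0), (5.1, 5.0, 5.0)]
    graph = knn_edges(cells, k=1)
    for xy in force_layout(len(cells), graph):
        print(xy)
```

Because the only randomness is the seeded initial placement, rerunning with the same seed yields the same coordinates, whereas different seeds may settle into different stable layouts of the same graph.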
biorxiv bioinformatics 0-100-users 2016
Cryo-EM structure of haemoglobin at 3.2 Å determined with the Volta phase plate, bioRxiv, 2016-11-18
With the advent of direct electron detectors, the perspectives of cryo-electron microscopy (cryo-EM) have changed in a profound way [1]. These cameras are superior to previous detectors in coping with the intrinsically low contrast of radiation-sensitive organic materials embedded in amorphous ice, and so they have enabled the structure determination of several macromolecular assemblies to atomic or near-atomic resolution. According to one theoretical estimation, a few thousand images should suffice for calculating the structure of proteins as small as 17 kDa at 3 Å resolution [2]. In practice, however, we are still far away from this theoretical ideal. Thus far, protein complexes that have been successfully reconstructed to high resolution by single particle analysis (SPA) have molecular weights of ~100 kDa or larger [3]. Here, we report the use of the Volta phase plate in determining the structure of human haemoglobin (64 kDa) at 3.2 Å. Our results demonstrate that this method can be applied to complexes that are significantly smaller than those previously studied by conventional defocus-based approaches. Cryo-EM is now close to becoming a fast and cost-effective alternative to crystallography for high-resolution protein structure determination.
biorxiv biophysics 100-200-users 2016
Tractography-based connectomes are dominated by false-positive connections, bioRxiv, 2016-11-08
Abstract: Fiber tractography based on non-invasive diffusion imaging is at the heart of connectivity studies of the human brain. To date, the approach has not been systematically validated in ground truth studies. Based on a simulated human brain dataset with ground truth white matter tracts, we organized an open international tractography challenge, which resulted in 96 distinct submissions from 20 research groups. While most state-of-the-art algorithms reconstructed 90% of ground truth bundles to at least some extent, on average they produced four times more invalid than valid bundles. About half of the invalid bundles occurred systematically in the majority of submissions. Our results demonstrate fundamental ambiguities inherent to tract reconstruction methods based on diffusion orientation information, with critical consequences for the approach of diffusion tractography in particular and human connectivity studies in general.
biorxiv neuroscience 200-500-users 2016
Predicting Enhancer-Promoter Interaction from Genomic Sequence with Deep Neural Networks, bioRxiv, 2016-11-03
Abstract: In the human genome, distal enhancers are involved in regulating target genes through proximal promoters by forming enhancer-promoter interactions. Although recently developed high-throughput experimental approaches have allowed us to recognize potential enhancer-promoter interactions genome-wide, it is still largely unclear to what extent the sequence-level information encoded in our genome helps guide such interactions. Here we report a new computational method (named “SPEID”) using deep learning models to predict enhancer-promoter interactions based on sequence-based features only, when the locations of putative enhancers and promoters in a particular cell type are given. Our results across six different cell types demonstrate that SPEID is effective in predicting enhancer-promoter interactions as compared to state-of-the-art methods that only use information from a single cell type. As a proof-of-principle, we also applied SPEID to identify somatic non-coding mutations in melanoma samples that may have reduced enhancer-promoter interactions in tumor genomes. This work demonstrates that deep learning models can help reveal that sequence-based features alone are sufficient to reliably predict enhancer-promoter interactions genome-wide.
biorxiv bioinformatics 0-100-users 2016
WhatsHap: fast and accurate read-based phasing, bioRxiv, 2016-11-03
Abstract: Read-based phasing makes it possible to reconstruct the haplotypes of a sample purely from sequencing reads. While phasing is an important step for answering questions about population genetics and compound heterozygosity, and for aiding clinical decision making, there has been a lack of accurate, usable and standards-based software. WhatsHap is a production-ready tool for highly accurate read-based phasing. It was designed from the beginning to leverage third-generation sequencing technologies, whose long reads can span many variants and are therefore ideal for phasing. WhatsHap also works well with second-generation data, is easy to use and will phase not only SNVs, but also indels and other variants. It is unique in its ability to combine read-based with pedigree-based phasing, which further improves accuracy if multiple related samples are provided.
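To make the notion of read-based phasing concrete, here is a toy sketch. The abstract does not describe WhatsHap's algorithm, so this example assumes a common formulation, minimum error correction (MEC): pick the pair of complementary haplotypes that requires the fewest allele corrections in the reads. The brute-force search, the function names `mec_cost` and `phase`, and the example reads are all hypothetical and only practical for a handful of variant sites.

```python
from itertools import product


def mec_cost(hap, reads):
    """Number of allele corrections needed so every read fits hap or its complement."""
    total = 0
    for read in reads:
        # A read maps heterozygous site index -> observed allele (0 or 1).
        mismatches = sum(1 for site, allele in read.items() if hap[site] != allele)
        # Mismatches against the complementary haplotype are the remaining sites.
        total += min(mismatches, len(read) - mismatches)
    return total


def phase(reads, n_sites):
    """Exhaustively search haplotypes; returns (hap1, hap2, cost)."""
    best_hap, best_cost = None, None
    # Fix the first allele to 0: a haplotype and its complement describe the same phasing.
    for tail in product((0, 1), repeat=n_sites - 1):
        hap = (0,) + tail
        cost = mec_cost(hap, reads)
        if best_cost is None or cost < best_cost:
            best_hap, best_cost = hap, cost
    return best_hap, tuple(1 - a for a in best_hap), best_cost


# Hypothetical reads over four heterozygous sites; the last read carries one
# sequencing error at site 3.
reads = [
    {0: 0, 1: 1},
    {1: 1, 2: 1},
    {2: 1, 3: 0},
    {0: 1, 1: 0, 2: 0},
    {2: 0, 3: 1},
    {1: 0, 2: 0, 3: 0},
]

if __name__ == "__main__":
    print(phase(reads, 4))
```

For these reads the search recovers the haplotype pair (0, 1, 1, 0) and (1, 0, 0, 1) with a cost of 1, the single erroneous allele. The example also hints at why long reads help: each read links only the sites it spans, so reads covering more variants constrain the phasing more strongly.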
biorxiv bioinformatics 0-100-users 2016