Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals, Nature Genetics, 2018-07-20
Here we conduct a large-scale genetic association analysis of educational attainment in a sample of approximately 1.1 million individuals and identify 1,271 independent genome-wide-significant SNPs. For the SNPs taken together, we find evidence of heterogeneous effects across environments. The SNPs implicate genes involved in brain-development processes and neuron-to-neuron communication. In a separate analysis of the X chromosome, we identify 10 independent genome-wide-significant SNPs and estimate a SNP heritability of around 0.3% in both men and women, consistent with partial dosage compensation. A joint (multi-phenotype) analysis of educational attainment and three related cognitive phenotypes generates polygenic scores that explain 11–13% of the variance in educational attainment and 7–10% of the variance in cognitive performance. This prediction accuracy substantially increases the utility of polygenic scores as tools in research.
nature genetics genetics 500+-users 2018
Classification of electrophysiological and morphological types in mouse visual cortex, bioRxiv, 2018-07-18
Understanding the diversity of cell types in the brain has been an enduring challenge and requires detailed characterization of individual neurons in multiple dimensions. To profile morpho-electric properties of mammalian neurons systematically, we established a single cell characterization pipeline using standardized patch clamp recordings in brain slices and biocytin-based neuronal reconstructions. We built a publicly-accessible online database, the Allen Cell Types Database, to display these data sets. Intrinsic physiological and morphological properties were measured from over 1,800 neurons from the adult laboratory mouse visual cortex. Quantitative features were used to classify neurons into distinct types using unsupervised methods. We establish a taxonomy of morphologically- and electrophysiologically-defined cell types for this region of cortex with 17 e-types and 35 m-types, as well as an initial correspondence with previously-defined transcriptomic cell types using the same transgenic mouse lines.
biorxiv neuroscience 100-200-users 2018
Determining cellular CTCF and cohesin abundances to constrain 3D genome models, bioRxiv, 2018-07-18
Achieving a quantitative and predictive understanding of 3D genome architecture remains a major challenge, as it requires quantitative measurements of the key proteins involved. Here we report the quantification of CTCF and cohesin, two causal regulators of topologically associating domains (TADs) in mammalian cells. Extending our previous imaging studies (Hansen 2017), we estimate bounds on the density of putatively DNA loop-extruding cohesin complexes and CTCF binding site occupancy. Furthermore, co-immunoprecipitation studies of an endogenously tagged subunit (Rad21) suggest the presence of cohesin dimers and/or oligomers. Finally, based on our cell lines with accurately measured protein abundances, we report a method to conveniently determine the number of molecules of any Halo-tagged protein in the cell. We anticipate that our results and the established tool for measuring cellular protein abundances will advance a more quantitative understanding of 3D genome organization, and facilitate protein quantification, key for understanding diverse biological processes.
biorxiv biophysics 100-200-users 2018
Entomophthovirus: An insect-derived iflavirus that infects a behavior-manipulating fungal pathogen of dipterans, bioRxiv, 2018-07-18
We discovered a virus infecting Entomophthora muscae, a behavior-manipulating fungal pathogen of dipterans. The virus, which we name Entomophthovirus, is a capsid-forming, positive-strand RNA virus in the viral family Iflaviridae, whose known members almost exclusively infect insects. We show that the virus RNA is expressed at high levels in fungal cells in vitro and during in vivo infections of Drosophila melanogaster, and that virus particles are present in E. muscae. Two close relatives of the virus had been previously described as insect viruses based on the presence of viral genomes in transcriptomes assembled from RNA extracted from wild dipterans. By analyzing sequencing data from these earlier reports, we show that both dipteran samples were co-infected with E. muscae. We also find the virus in RNA sequencing data from samples of two other species of dipterans, Musca domestica and Delia radicum, known to be infected with E. muscae. These data establish that Entomophthovirus is widely, and seemingly obligately, associated with E. muscae. As other members of the Iflaviridae cause behavioral changes in insects, we speculate on the possibility that Entomophthovirus plays a role in host manipulation by E. muscae.
biorxiv microbiology 500+-users 2018
Panoramic stitching of heterogeneous single-cell transcriptomic data, bioRxiv, 2018-07-18
Researchers are generating single-cell RNA sequencing (scRNA-seq) profiles of diverse biological systems [1–4] and every cell type in the human body [5]. Leveraging this data to gain unprecedented insight into biology and disease will require assembling heterogeneous cell populations across multiple experiments, laboratories, and technologies. Although methods for scRNA-seq data integration exist [6,7], they often naively merge data sets together even when the data sets have no cell types in common, leading to results that do not correspond to real biological patterns. Here we present Scanorama, inspired by algorithms for panorama stitching, that overcomes the limitations of existing methods to enable accurate, heterogeneous scRNA-seq data set integration. Our strategy identifies and merges the shared cell types among all pairs of data sets and is orders of magnitude faster than existing techniques. We use Scanorama to combine 105,476 cells from 26 diverse scRNA-seq experiments across 9 different technologies into a single comprehensive reference, demonstrating how Scanorama can be used to obtain a more complete picture of cellular function across a wide range of scRNA-seq experiments.
biorxiv bioinformatics 100-200-users 2018
The high abortion cost of human reproduction, bioRxiv, 2018-07-18
Information from many large databases and published studies was integrated to estimate the age-specific spontaneous abortion rate in an economically developed human population. Accuracy was tested with published data from a diverse array of studies. Spontaneous abortion was found to be i) the predominant outcome of fertilization and ii) a natural and inevitable part of human reproduction at all ages. The decision to reproduce is inextricably coupled with the production of spontaneous abortions with high probability, and the decision to have a large family leads to many spontaneous abortions with virtual certainty. The lifetime number of spontaneous abortions was estimated for a “canonical” woman (constrained to have average age at marriage, first birth, inter-birth intervals, and family size) in two populations: one with and the other without effective birth control (including free access to elective abortions). Birth control was found to reduce lifetime abortions more than 6-fold.
biorxiv physiology 100-200-users 2018