The Genomic History Of Southeastern Europe, bioRxiv, 2017-05-12
Abstract: Farming was first introduced to southeastern Europe in the mid-7th millennium BCE – brought by migrants from Anatolia who settled in the region before spreading throughout Europe. To clarify the dynamics of the interaction between the first farmers and indigenous hunter-gatherers where they first met, we analyze genome-wide ancient DNA data from 223 individuals who lived in southeastern Europe and surrounding regions between 12,000 and 500 BCE. We document previously uncharacterized genetic structure, showing a West-East cline of ancestry in hunter-gatherers, and show that some Aegean farmers had ancestry from a different lineage than the northwestern Anatolian lineage that formed the overwhelming ancestry of other European farmers. We show that the first farmers of northern and western Europe passed through southeastern Europe with limited admixture with local hunter-gatherers, but that some groups mixed extensively, with relatively sex-balanced admixture compared to the male-biased hunter-gatherer admixture that prevailed later in the North and West. Southeastern Europe continued to be a nexus between East and West after farming arrived, with intermittent genetic contact from the Steppe up to 2,000 years before the migration that replaced much of northern Europe’s population.
biorxiv genetics 100-200-users 2017
The population genomics of archaeological transition in west Iberia: Investigation of ancient substructure using imputation and haplotype-based methods, bioRxiv, 2017-05-11
Abstract: We analyse new genomic data (0.05-2.95x) from 14 ancient individuals from Portugal distributed from the Middle Neolithic (4200-3500 BC) to the Middle Bronze Age (1740-1430 BC) and impute genomewide diploid genotypes in these together with published ancient Eurasians. While discontinuity is evident in the transition to agriculture across the region, sensitive haplotype-based analyses suggest a significant degree of local hunter-gatherer contribution to later Iberian Neolithic populations. A more subtle genetic influx is also apparent in the Bronze Age, detectable from analyses including haplotype sharing with both ancient and modern genomes, D-statistics and Y-chromosome lineages. However, the limited nature of this introgression contrasts with the major Steppe migration turnovers within third Millennium northern Europe and echoes the survival of non-Indo-European language in Iberia. Changes in genomic estimates of individual height across Europe are also associated with these major cultural transitions, and ancestral components continue to correlate with modern differences in stature.
Author Summary: Recent ancient DNA work has demonstrated the significant genetic impact of mass migrations from the Steppe into Central and Northern Europe during the transition from the Neolithic to the Bronze Age. In Iberia, archaeological change at the level of material culture and funerary rituals has been reported during this period; however, the genetic impact associated with this cultural transformation has not yet been estimated. In order to investigate this, we sequence Neolithic and Bronze Age samples from Portugal, which we compare to other ancient and present-day individuals. Genome-wide imputation of a large dataset of ancient samples enabled sensitive methods for detecting population structure and selection in ancient samples. We revealed subtle genetic differentiation between the Portuguese Neolithic and Bronze Age samples suggesting a markedly reduced influx in Iberia compared to other European regions. Furthermore, we predict individual height in ancients, suggesting that stature was reduced in the Neolithic and affected by subsequent admixtures. Lastly, we examine signatures of strong selection in important traits and the timing of their origins.
biorxiv genomics 100-200-users 2017
Consequences of natural perturbations in the human plasma proteome, bioRxiv, 2017-05-06
Abstract: Proteins are the primary functional units of biology and the direct targets of most drugs, yet there is limited knowledge of the genetic factors determining inter-individual variation in protein levels. Here we reveal the genetic architecture of the human plasma proteome, testing 10.6 million DNA variants against levels of 2,994 proteins in 3,301 individuals. We identify 1,927 genetic associations with 1,478 proteins, a 4-fold increase on existing knowledge, including trans associations for 1,104 proteins. To understand consequences of perturbations in plasma protein levels, we introduce an approach that links naturally occurring genetic variation with biological, disease, and drug databases. We provide insights into pathogenesis by uncovering the molecular effects of disease-associated variants. We identify causal roles for protein biomarkers in disease through Mendelian randomization analysis. Our results reveal new drug targets, opportunities for matching existing drugs with new disease indications, and potential safety concerns for drugs under development.
biorxiv genomics 100-200-users 2017
Deep Neural Networks in Computational Neuroscience, bioRxiv, 2017-05-05
Summary: The goal of computational neuroscience is to find mechanistic explanations of how the nervous system processes information to give rise to cognitive function and behaviour. At the heart of the field are its models, i.e. mathematical and computational descriptions of the system being studied, which map sensory stimuli to neural responses and/or neural to behavioural responses. These models range from simple to complex. Recently, deep neural networks (DNNs) have come to dominate several domains of artificial intelligence (AI). As the term “neural network” suggests, these models are inspired by biological brains. However, current DNNs neglect many details of biological neural networks. These simplifications contribute to their computational efficiency, enabling them to perform complex feats of intelligence, ranging from perceptual (e.g. visual object and auditory speech recognition) to cognitive tasks (e.g. machine translation), and on to motor control (e.g. playing computer games or controlling a robot arm). In addition to their ability to model complex intelligent behaviours, DNNs excel at predicting neural responses to novel sensory stimuli with accuracies well beyond any other currently available model type. DNNs can have millions of parameters, which are required to capture the domain knowledge needed for successful task performance. Contrary to the intuition that this renders them into impenetrable black boxes, the computational properties of the network units are the result of four directly manipulable elements: input statistics, network structure, functional objective, and learning algorithm. With full access to the activity and connectivity of all units, advanced visualization techniques, and analytic tools to map network representations to neural data, DNNs represent a powerful framework for building task-performing models and will drive substantial insights in computational neuroscience.
biorxiv neuroscience 100-200-users 2017
Gene annotation bias impedes biomedical research, bioRxiv, 2017-05-03
Abstract: We found tremendous inequality across gene and protein annotation resources. We observe that this bias leads biomedical researchers to focus on richly annotated genes instead of those with the strongest molecular data. We advocate for researchers to reduce these biases by pursuing data-driven hypotheses.
biorxiv bioinformatics 100-200-users 2017
Nanopore Long-Read RNAseq Reveals Widespread Transcriptional Variation Among the Surface Receptors of Individual B cells, bioRxiv, 2017-04-14
Abstract: Understanding gene regulation and function requires a genome-wide method capable of capturing both gene expression levels and isoform diversity at the single cell level. Short-read RNAseq, while the current standard for gene expression quantification, is limited in its ability to resolve complex isoforms because it fails to sequence full-length cDNA copies of RNA molecules. Here, we investigated whether RNAseq using the long-read single-molecule Oxford Nanopore MinION sequencing technology (ONT RNAseq) would be able to identify and quantify complex isoforms without sacrificing accurate gene expression quantification. After successfully benchmarking our experimental and computational approaches on a mixture of synthetic transcripts, we analyzed individual murine B1a cells using a new cellular indexing strategy. Using the Mandalorion analysis pipeline we developed, we identified thousands of unannotated transcription start and end sites, as well as hundreds of alternative splicing events in these B1a cells. We also identified hundreds of genes expressed across B1a cells that displayed multiple complex isoforms, including several B cell specific surface receptors and the antibody heavy chain (IGH) locus. Our results show that not only can we identify complex isoforms, but also quantify their expression, at the single cell level.
biorxiv genomics 100-200-users 2017