Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences, bioRxiv, 2019-04-30

Abstract: In the field of artificial intelligence, a combination of scale in data and model capacity enabled by unsupervised learning has led to major advances in representation learning and statistical generation. In biology, the anticipated growth of sequencing promises unprecedented data on natural sequence diversity. Learning the natural distribution of evolutionary protein sequence variation is a logical step toward predictive and generative modeling for biology. To this end we use unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250 million sequences spanning evolutionary diversity. The resulting model maps raw sequences to representations of biological properties without labels or prior domain knowledge. The learned representation space organizes sequences at multiple levels of biological granularity from the biochemical to proteomic levels. Unsupervised learning recovers information about protein structure: secondary structure and residue-residue contacts can be identified by linear projections from the learned representations. Training language models on full sequence diversity rather than individual protein families increases recoverable information about secondary structure. The unsupervised models can be adapted with supervision from quantitative mutagenesis data to predict variant activity. Predictions from sequences alone are comparable to results from a state-of-the-art model of mutational effects that uses evolutionary and structurally derived features.
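
The abstract's claim that structure "can be identified by linear projections from the learned representations" describes what is often called a linear probe. The sketch below is only an illustration of that idea, not the authors' code: the four-dimensional "representations" are synthetic numbers invented here, the three labels (helix, strand, coil) are an assumed labeling scheme, and softmax regression is one possible choice of linear projection.

```python
import math
import random

CLASSES = ["helix", "strand", "coil"]  # assumed 3-state labels, for illustration
DIM = 4  # toy embedding width; real learned representations are far wider

def make_toy_data(rng, per_class):
    """Synthetic stand-in for per-residue representations (not real data)."""
    features, labels = [], []
    for label in range(len(CLASSES)):
        for _ in range(per_class):
            x = [rng.gauss(0.0, 0.5) for _ in range(DIM)]
            x[label] += 3.0  # shift one coordinate so the classes are separable
            features.append(x)
            labels.append(label)
    return features, labels

def scores(weights, bias, x):
    """Linear projection: one dot product plus bias per class."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def train_linear_probe(features, labels, n_classes, lr=0.1, epochs=200):
    """Fit the probe by full-batch gradient descent on cross-entropy loss."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(n_classes)]
    bias = [0.0] * n_classes
    n = len(features)
    for _ in range(epochs):
        grad_w = [[0.0] * dim for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for x, y in zip(features, labels):
            probs = softmax(scores(weights, bias, x))
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == y else 0.0)
                grad_b[k] += err
                for j in range(dim):
                    grad_w[k][j] += err * x[j]
        for k in range(n_classes):
            bias[k] -= lr * grad_b[k] / n
            for j in range(dim):
                weights[k][j] -= lr * grad_w[k][j] / n
    return weights, bias

def predict(weights, bias, x):
    s = scores(weights, bias, x)
    return s.index(max(s))

rng = random.Random(0)  # fixed seed so the toy run is repeatable
train_x, train_y = make_toy_data(rng, per_class=60)
test_x, test_y = make_toy_data(rng, per_class=20)
weights, bias = train_linear_probe(train_x, train_y, n_classes=len(CLASSES))
correct = sum(predict(weights, bias, x) == y for x, y in zip(test_x, test_y))
accuracy = correct / len(test_y)
print(f"held-out probe accuracy: {accuracy:.2f}")
```

The point of keeping the probe linear is that any accuracy it reaches must come from information already laid out in the frozen representations, since the probe itself has very little capacity.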

biorxiv synthetic-biology 200-500-users 2019

Long-Term Exposure to Elevated Lipoprotein(a) Levels, Parental Lifespan and Risk of Mortality, bioRxiv, 2019-04-30

Abstract

Background: Elevated Lipoprotein(a) (Lp[a]) levels are associated with a broad range of atherosclerotic cardiovascular diseases (CVD). The impact of high Lp(a) levels on human longevity is, however, controversial. Our objectives were to determine whether genetically-determined Lp(a) levels are associated with parental lifespan and to assess the association between measured and genetically-determined Lp(a) levels and long-term all-cause and cardiovascular mortality.

Methods: We determined the association between a genetic risk score of 26 single nucleotide polymorphisms weighted for their impact on Lp(a) levels (wGRS) and parental lifespan (at least one long-lived parent; father still alive and older than 90 or father’s age of death ≥90 or mother still alive and older than 93 or mother’s age of death ≥93) in 139,362 participants from the UK Biobank. A total of 17,686 participants were considered as having high parental lifespan. We also investigated the association between Lp(a) levels and all-cause and cardiovascular mortality in 18,720 participants from the EPIC-Norfolk study.

Results: In the UK Biobank, increases in the wGRS (weighted for a 50 mg/dL increase in Lp(a) levels) were inversely associated with a high parental lifespan (odds ratio=0.92, 95% confidence interval [CI]=0.89-0.94, p=2.7×10⁻⁸). During the 20-year follow-up of the EPIC-Norfolk study, 5686 participants died (2412 from CVD-related causes). Compared to participants with Lp(a) levels <50 mg/dL, those with Lp(a) levels ≥50 mg/dL had an increased hazard ratio (HR) for all-cause (HR=1.17, 95% CI=1.08-1.27) and cardiovascular (HR=1.54, 95% CI=1.37-1.72) mortality. Compared to individuals with Lp(a) levels below the 50th percentile of the Lp(a) distribution (in whom event rates were 29.8% and 11.3% for all-cause and cardiovascular mortality, respectively), those with Lp(a) levels at or above the 95th percentile of the population distribution (≥70 mg/dL) had HRs of 1.22 (95% CI=1.09-1.37, event rate 37.5%) and 1.71 (95% CI=1.46-2.00, event rate 20.0%) for all-cause and cardiovascular mortality, respectively.

Conclusions: Results of this study suggest a potentially causal effect of Lp(a) on human longevity, support the use of parental lifespan as a tool to study the genetic determinants of human longevity, and provide a rationale for a trial of Lp(a)-lowering therapy in individuals with high Lp(a) levels.
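
For readers unfamiliar with a weighted genetic risk score, the usual construction is a sum of each person's effect-allele counts multiplied by a per-SNP weight. The sketch below shows that arithmetic only. The SNP names, the weights (taken here to be in mg/dL per allele) and the genotype are all hypothetical placeholders, not the 26 variants or weights used in the study, and the function name `weighted_grs` is invented for this example.

```python
def weighted_grs(dosages, weights):
    """Sum of effect-allele dosages (0, 1 or 2) times per-SNP weights."""
    missing = set(weights) - set(dosages)
    if missing:
        raise ValueError(f"missing genotypes for: {sorted(missing)}")
    return sum(weights[snp] * dosages[snp] for snp in weights)

# Hypothetical weights: assumed effect on Lp(a) in mg/dL per effect allele.
example_weights = {"snp_1": 12.0, "snp_2": 3.5, "snp_3": -1.5}
# Hypothetical genotype: number of effect alleles carried at each SNP.
example_dosages = {"snp_1": 1, "snp_2": 2, "snp_3": 0}

score = weighted_grs(example_dosages, example_weights)  # 12.0 + 7.0 + 0.0
# The abstract reports its odds ratio per 50 mg/dL, so rescale to those units.
score_per_50 = score / 50.0

print(f"wGRS = {score:.1f} mg/dL ({score_per_50:.2f} units of 50 mg/dL)")
```

Rescaling to units of 50 mg/dL is what lets an odds ratio such as the reported 0.92 be read as the change in odds for a genetically predicted 50 mg/dL increase.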

biorxiv genetics 0-100-users 2019

Structural and Functional Characterization of G Protein-Coupled Receptors with Deep Mutational Scanning, bioRxiv, 2019-04-30

Abstract: In humans, the 813 G protein-coupled receptors (GPCRs) are responsible for transducing diverse chemical stimuli to alter cell state, and are the largest class of drug targets. Their myriad structural conformations and various modes of signaling make it challenging to understand their structure and function. Here we developed a platform to characterize large libraries of GPCR variants in human cell lines with a barcoded transcriptional reporter of G-protein signal transduction. We tested 7,800 of 7,828 possible single amino acid substitutions to the beta-2 adrenergic receptor (β2AR) at four concentrations of the agonist isoproterenol. We identified residues specifically important for β2AR signaling, mutations in the human population that are potentially loss of function, and residues that modulate basal activity. Using unsupervised learning, we resolve residues critical for signaling, including all major structural motifs and molecular interfaces. We also find a previously uncharacterized structural latch spanning the first two extracellular loops that is highly conserved across Class A GPCRs and is conformationally rigid in both the inactive and active states of the receptor. More broadly, by linking deep mutational scanning with engineered transcriptional reporters, we establish a generalizable method for exploring pharmacogenomics, structure and function across broad classes of drug receptors.
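
The library size quoted in the abstract can be checked with a few lines of arithmetic. The sketch below assumes each mutated position can take any of the 19 other standard amino acids; the figure of 412 positions is inferred here by dividing 7,828 by 19 and is not stated in the abstract.

```python
AMINO_ACIDS = 20          # standard amino acid alphabet
possible = 7828           # possible single substitutions, from the abstract
tested = 7800             # substitutions actually assayed, from the abstract
concentrations = 4        # agonist concentrations, from the abstract

alternatives = AMINO_ACIDS - 1                          # 19 substitutions per position
positions, remainder = divmod(possible, alternatives)   # inferred mutated positions
missing = possible - tested                             # variants absent from the screen
coverage = tested / possible                            # fraction of the library covered
measurements = tested * concentrations                  # variant-by-concentration pairs

print(f"{positions} positions x {alternatives} substitutions = {possible}")
print(f"coverage: {coverage:.2%}, missing variants: {missing}")
print(f"variant-by-concentration measurements: {measurements}")
```

The zero remainder is what makes the 19-per-position reading plausible; it does not by itself say which positions were included.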

biorxiv molecular-biology 0-100-users 2019

Co-reviewing and ghostwriting by early career researchers in the peer review of manuscripts, bioRxiv, 2019-04-27

Abstract: The goal of this study is to shed light on the involvement of early career researchers (ECRs) during peer review of manuscripts for publication in journals. In particular, we sought to better understand how commonly ECRs contribute ideas and/or text to peer review reports when they are not the invited reviewer (“co-review”), and how commonly ECRs do not receive named credit to the journal editorial staff for these scholarly efforts (“ghostwrite”). First, we evaluated 1,952 publications in the peer-reviewed literature generated by exhaustive search terms that combined synonyms of “early career researcher” and “peer review” and found no previous studies about ECRs ghostwriting peer review reports. We then surveyed 498 researchers about their experiences with, and opinions about, co-reviewing and ghostwriting as ECRs. Three quarters of those surveyed have co-reviewed and most find it to be a beneficial (95% agree) and ethical (73% agree) form of training in peer review. Co-reviewing is the second most commonly reported form of training in peer review besides receiving reviews on one’s own papers. Half of survey respondents have ghostwritten a peer review report, despite the 4/5ths majority opinion that ghostwriting is unethical. Survey respondents report that the three major barriers to including co-reviewer names on peer review reports are a lack of communication between PIs and ECRs; a false belief that co-authorship is for manuscripts but not peer review reports; and prohibitive journal policies that are out of alignment with current practice and opinions about best practice. We therefore propose recommendations for changing this status quo, to discourage unethical ghostwriting of peer review reports and encourage quality co-reviewing experiences as normal training in peer review.

biorxiv scientific-communication-and-education 100-200-users 2019

 

Created with the audiences framework by Jedidiah Carlson
