Deep learning in bioinformatics: introduction, application, and perspective in big data era, bioRxiv, 2019-03-01
Deep learning, which is especially formidable in handling big data, has achieved great success in various fields, including bioinformatics. With the advances of the big data era in biology, it is foreseeable that deep learning will become increasingly important in the field and will be incorporated into the vast majority of analysis pipelines. In this review, we provide both an exoteric introduction to deep learning and concrete examples and implementations of its representative applications in bioinformatics. We start from the recent achievements of deep learning in the bioinformatics field, pointing out the problems that are suitable for deep learning. We then introduce deep learning in an easy-to-understand fashion, from shallow neural networks to legendary convolutional neural networks, legendary recurrent neural networks, graph neural networks, generative adversarial networks, variational autoencoders, and the most recent state-of-the-art architectures. Next, we provide eight examples, covering five bioinformatics research directions and all four kinds of data type, with the implementations written in TensorFlow and Keras. Finally, we discuss the common issues, such as overfitting and interpretability, that users will encounter when adopting deep learning methods, and provide corresponding suggestions. The implementations are freely available at https://github.com/lykaust15/Deep_learning_examples .
biorxiv bioinformatics 100-200-users 2019
The Genomic Ecosystem of Transposable Elements in Maize, bioRxiv, 2019-03-01
Transposable elements (TEs) constitute the majority of flowering plant DNA, reflecting their tremendous success in subverting, avoiding, and surviving the defenses of their host genomes to ensure their selfish replication. More than 85% of the sequence of the maize genome can be ascribed to past transposition, providing a major contribution to the structure of the genome. Evidence from individual loci has informed our understanding of how transposition has shaped the genome, and a number of individual TE insertions have been causally linked to dramatic phenotypic changes. But genome-wide analyses in maize and other taxa have frequently represented TEs as a relatively homogeneous class of fragmentary relics of past transposition, obscuring their evolutionary history and interaction with their host genome. Using an updated annotation of structurally intact TEs in the maize reference genome, we investigate the family-level ecological and evolutionary dynamics of TEs in maize. Integrating a variety of data, from descriptors of individual TEs like coding capacity, expression, and methylation, as well as similar features of the sequence they inserted into, we model the relationship between these attributes of the genomic environment and the survival of TE copies and families. Our analyses reveal a diversity of ecological strategies of TE families, each representing the evolution of a distinct ecological niche allowing survival of the TE family. In contrast to the wholesale relegation of all TEs to a single category of junk DNA, these differences generate a rich ecology of the genome, suggesting families of TEs that coexist in time and space compete and cooperate with each other. We conclude that while the impact of transposition is highly family- and context-dependent, a family-level understanding of the ecology of TEs in the genome can refine our ability to predict the role of TEs in generating genetic and phenotypic diversity.
biorxiv evolutionary-biology 100-200-users 2019
Drug screens of NGLY1 Deficiency worm and fly models reveal catecholamine, NRF2 and anti-inflammatory pathway activation as clinical approaches, bioRxiv, 2019-02-28
N-glycanase 1 (NGLY1) Deficiency is an ultra-rare and complex monogenic glycosylation disorder that affects fewer than 40 patients globally. NGLY1 Deficiency has been studied in model organisms such as yeast, worms, flies and mice. Proteasomal and mitochondrial homeostasis gene networks are controlled by the evolutionarily conserved transcriptional regulator Nrf1, whose activity requires deglycosylation by NGLY1. Hypersensitivity to the proteasome inhibitor bortezomib is a common phenotype observed in whole-animal and cellular models of NGLY1 Deficiency. Here we describe unbiased phenotypic drug screens to identify FDA-approved drugs, generally-recognized-as-safe natural products and novel chemical entities that rescue growth and development of NGLY1-deficient worm and fly larvae treated with a toxic dose of bortezomib. We used image-based larval size and number assays to screen a 2,560-member drug repurposing library and a 20,240-member lead discovery library. A total of 91 validated hit compounds from primary invertebrate screens were tested in a human cell line in an NRF2 activity assay. NRF2 is a transcriptional regulator that regulates cellular redox homeostasis, and it can compensate for loss of Nrf1. Plant-based polyphenols comprise the largest class of hit compounds and NRF2 inducers. Catecholamines and catecholamine receptor activators comprise the second largest class of hits. Steroidal and non-steroidal anti-inflammatory drugs comprise the third largest class. Only one compound was active in all assays and species: the atypical antipsychotic and dopamine receptor agonist aripiprazole. Worm and fly models of NGLY1 Deficiency validate therapeutic rationales for activation of NRF2 and anti-inflammatory pathways based on results in mice and human cell models, and suggest a novel therapeutic rationale for boosting catecholamine levels and/or signaling in the brain.
biorxiv pharmacology-and-toxicology 100-200-users 2019
Human loss-of-function variants suggest that partial LRRK2 inhibition is a safe therapeutic strategy for Parkinson’s disease, bioRxiv, 2019-02-28
Human genetic variants causing loss of function (LoF) of protein-coding genes provide natural in vivo models of gene inactivation, which are powerful indicators of gene function and the potential toxicity of therapeutic inhibitors targeting these genes. Gain of kinase function variants in LRRK2 are known to significantly increase the risk of Parkinson’s disease, suggesting that inhibition of LRRK2 kinase activity is a promising therapeutic strategy. Whilst preclinical studies in model organisms have raised some on-target toxicity concerns, the biological consequences of LRRK2 inhibition have not been well characterized in humans. Here we systematically analyse LoF variants in LRRK2 observed across 141,456 individuals sequenced in the Genome Aggregation Database (gnomAD) and over 4 million participants in the 23andMe genotyped dataset, to assess their impact at a molecular and phenotypic level. After thorough variant curation, we identify 1,358 individuals with high-confidence predicted LoF variants in LRRK2, several with experimental validation. We show that heterozygous LoF of LRRK2 reduces LRRK2 protein level by ~50% but is not associated with reduced life expectancy, or with any specific phenotype or disease state. These data suggest that therapeutics that downregulate LRRK2 levels or kinase activity by up to 50% are unlikely to have major on-target safety liabilities. Our results demonstrate the value of large scale genomic databases and phenotyping of human LoF carriers for target validation in drug discovery.
biorxiv genomics 100-200-users 2019
Human loss-of-function variants suggest that partial LRRK2 inhibition is a safe therapeutic strategy for Parkinson’s disease, bioRxiv, 2019-02-28
Human genetic variants causing loss-of-function (LoF) of protein-coding genes provide natural in vivo models of gene inactivation, which are powerful indicators of gene function and the potential toxicity of therapeutic inhibitors targeting these genes [1,2]. Gain-of-kinase-function variants in LRRK2 are known to significantly increase the risk of Parkinson’s disease [3,4], suggesting that inhibition of LRRK2 kinase activity is a promising therapeutic strategy. Whilst preclinical studies in model organisms have raised some on-target toxicity concerns [5–8], the biological consequences of LRRK2 inhibition have not been well characterized in humans. Here we systematically analyse LoF variants in LRRK2 observed across 141,456 individuals sequenced in the Genome Aggregation Database (gnomAD) [9] and over 4 million participants in the 23andMe genotyped dataset, to assess their impact at a molecular and phenotypic level. After thorough variant curation, we identify 1,358 individuals with high-confidence predicted LoF variants in LRRK2, several with experimental validation. We show that heterozygous LoF of LRRK2 reduces LRRK2 protein level by ~50% but is not associated with reduced life expectancy, or with any specific phenotype or disease state. These data suggest that therapeutics that downregulate LRRK2 levels or kinase activity by up to 50% are unlikely to have major on-target safety liabilities. Our results demonstrate the value of large scale genomic databases and phenotyping of human LoF carriers for target validation in drug discovery.
biorxiv genomics 100-200-users 2019
A positively selected, common, missense variant in FBN1 confers a 2.2 centimeter reduction of height in the Peruvian population, bioRxiv, 2019-02-26
Peruvians are among the shortest people in the world. To understand the genetic basis of short stature in Peru, we examined an ethnically diverse group of Peruvians and identified a novel, population-specific, missense variant in FBN1 (E1297G) that is significantly associated with lower height in the Peruvian population. Each copy of the minor allele (frequency = 4.7%) reduces height by 2.2 cm (4.4 cm in homozygous individuals). This is the largest effect size known for a common height-associated variant. This variant shows strong evidence of positive selection within the Peruvian population and is significantly more frequent in Native American populations from coastal regions of Peru compared to populations from the Andes or the Amazon, suggesting that short stature in Peruvians is the result of adaptation to the coastal environment.
biorxiv genomics 100-200-users 2019