DeepAD: Alzheimer’s Disease Classification via Deep Convolutional Neural Networks using MRI and fMRI, bioRxiv, 2016-08-22
Abstract: To extract patterns from neuroimaging data, various statistical methods and machine learning algorithms have been explored for the diagnosis of Alzheimer’s disease among older adults in both clinical and research applications; however, distinguishing between Alzheimer’s and healthy brain data has been challenging in older adults (age > 75) due to highly similar patterns of brain atrophy and image intensities. Recently, cutting-edge deep learning technologies have rapidly expanded into numerous fields, including medical image analysis. This paper outlines state-of-the-art deep learning-based pipelines employed to distinguish Alzheimer’s magnetic resonance imaging (MRI) and functional MRI (fMRI) from normal healthy control data for a given age group. Using these pipelines, which were executed on a GPU-based high-performance computing platform, the data were strictly and carefully preprocessed. Next, scale- and shift-invariant low- to high-level features were obtained from a high volume of training images using a convolutional neural network (CNN) architecture. In this study, fMRI data were used for the first time in deep learning applications for the purposes of medical image analysis and Alzheimer’s disease prediction. These proposed and implemented pipelines, which demonstrate a significant improvement in classification output over other studies, resulted in high and reproducible accuracy rates of 99.9% and 98.84% for the fMRI and MRI pipelines, respectively. Additionally, for clinical purposes, subject-level classification was performed, resulting in average accuracy rates of 94.32% and 97.88% for the fMRI and MRI pipelines, respectively. Finally, a decision-making algorithm designed for subject-level classification improved these rates to 97.77% for the fMRI pipeline and 100% for the MRI pipeline.
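A minimal sketch of the kind of slice-level CNN binary classifier such a pipeline could build on is shown below (PyTorch). The input shape (grayscale 64x64 slices), layer sizes, and class labels are illustrative assumptions, not the authors' exact architecture.

```python
# Sketch of a small slice-level CNN for two-class (Alzheimer's vs. control)
# classification. All sizes are illustrative assumptions.
import torch
import torch.nn as nn

class SliceCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 13 * 13, 128), nn.ReLU(),
            nn.Linear(128, 2),  # two classes: Alzheimer's vs. healthy control
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SliceCNN()
x = torch.randn(8, 1, 64, 64)  # batch of 8 hypothetical grayscale slices
logits = model(x)              # shape: (8, 2)
```

Subject-level decisions like those reported above would then aggregate many per-slice predictions per subject, e.g. by majority vote.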
biorxiv bioinformatics 0-100-users 2016

Direct determination of diploid genome sequences, bioRxiv, 2016-08-20
Abstract: Determining the genome sequence of an organism is challenging, yet fundamental to understanding its biology. Over the past decade, thousands of human genomes have been sequenced, contributing deeply to biomedical research. In the vast majority of cases, these have been analyzed by aligning sequence reads to a single reference genome, biasing the resulting analyses and, in general, failing to capture sequences novel to a given genome. Some de novo assemblies have been constructed, free of reference bias, but nearly all were constructed by merging homologous loci into single ‘consensus’ sequences, generally absent from nature. These assemblies do not correctly represent the diploid biology of an individual. In exactly two cases, true diploid de novo assemblies have been made, at great expense. One was generated using Sanger sequencing and one using thousands of clone pools. Here we demonstrate a straightforward and low-cost method for creating true diploid de novo assemblies. We make a single library from ~1 ng of high molecular weight DNA, using the 10x Genomics microfluidic platform to partition the genome. We applied this technique to seven human samples, generating low-cost HiSeq X data, then assembled these using a new ‘pushbutton’ algorithm, Supernova. Each computation took two days on a single server. Each yielded contigs longer than 100 kb, phase blocks longer than 2.5 Mb, and scaffolds longer than 15 Mb. Our method provides a scalable capability for determining the actual diploid genome sequence in a sample, opening the door to new approaches in genomic biology and medicine.
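Contig, phase-block, and scaffold lengths like those quoted above are conventionally summarized with the N50 statistic. A minimal sketch of that computation follows; the example lengths are made up, and this is not the paper's code.

```python
# N50: the length L such that sequences of length >= L together cover
# at least half of the total assembly. Minimal sketch; lengths are fabricated.
def n50(lengths):
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0

contig_lengths = [120_000, 95_000, 300_000, 40_000, 150_000]  # hypothetical
print(n50(contig_lengths))  # 150000
```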
biorxiv genomics 0-100-users 2016

DNA damage is a major cause of sequencing errors, directly confounding variant identification, bioRxiv, 2016-08-20
Abstract: Pervasive mutations in somatic cells generate a heterogeneous genomic population within an organism and may result in serious medical conditions. While cancer is the most studied disease associated with somatic variations, recent advances in single-cell and ultra-deep sequencing indicate that a number of phenotypes and pathologies are impacted by cell-specific variants. Currently, the accurate identification of low-allelic-frequency somatic variants relies on a combination of deep sequencing coverage and multiple lines of evidence for the presence of variants. However, in this study we show that false positive variants can account for more than 70% of identified somatic variations, rendering conventional detection methods inadequate for accurate determination of low allelic variants. Interestingly, these false positive variants primarily originate from mutagenic DNA damage, which directly confounds determination of genuine somatic mutations. Furthermore, we developed and validated a simple metric to measure mutagenic DNA damage, and demonstrated that mutagenic DNA damage is the leading cause of sequencing errors in widely used resources, including the 1000 Genomes Project and The Cancer Genome Atlas.
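The abstract does not spell out the metric itself. One standard signal of mutagenic damage (e.g., oxidative G>T artifacts) is strand asymmetry: damage-induced errors tend to appear preferentially on one read of a pair, while genuine variants are supported equally by both. The sketch below illustrates that idea with a simple read-1/read-2 frequency ratio; it is an assumption-laden illustration, not the authors' validated formula, and all counts are hypothetical.

```python
# Damage-induced errors are strand-asymmetric, so a simple check is to
# compare the variant-supporting rate on read-1 vs. read-2 alignments.
# Hedged sketch of the idea; not the paper's exact metric.
def imbalance_score(r1_alt, r1_total, r2_alt, r2_total):
    """Ratio of variant frequency on read 1 vs. read 2.

    A value near 1 suggests balanced support (genuine variant);
    values far from 1 suggest a damage-driven artifact.
    """
    f1 = r1_alt / r1_total
    f2 = r2_alt / r2_total
    return f1 / f2 if f2 > 0 else float("inf")

# Hypothetical G>T site, strongly read-1-biased: consistent with damage.
print(imbalance_score(r1_alt=40, r1_total=1000, r2_alt=2, r2_total=1000))  # 20.0
```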
biorxiv genomics 0-100-users 2016

Projected spread of Zika virus in the Americas, bioRxiv, 2016-07-29
Abstract: We use a data-driven global stochastic epidemic model to project past and future spread of the Zika virus (ZIKV) in the Americas. The model has high spatial and temporal resolution, and integrates real-world demographic, human mobility, socioeconomic, temperature, and vector density data. We estimate that the first introduction of ZIKV to Brazil likely occurred between August 2013 and April 2014 (90% credible interval). We provide simulated epidemic profiles of incident ZIKV infections for several countries in the Americas through February 2017. The ZIKV epidemic is characterized by slow growth and high spatial and seasonal heterogeneity, attributable to the dynamics of the mosquito vector and to the characteristics and mobility of the human populations. We project the expected timing and number of pregnancies infected with ZIKV during the first trimester, and provide estimates of microcephaly cases assuming different levels of risk as reported in empirical retrospective studies. Our approach represents an early modeling effort aimed at projecting the potential magnitude and timing of the ZIKV epidemic that might be refined as new and more accurate data from the region become available.
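The projections above come from a large data-driven stochastic metapopulation model; the toy below illustrates only its elementary ingredient, a discrete-time stochastic compartmental (SEIR) step with binomial transitions. Parameter values and population sizes are illustrative, not the paper's calibrated estimates, and the real model additionally couples many such populations through mobility and vector dynamics.

```python
# Minimal chain-binomial SEIR step: one day of stochastic transitions.
# All parameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def seir_step(S, E, I, R, beta, sigma, gamma, N):
    p_inf = 1.0 - np.exp(-beta * I / N)   # per-susceptible daily infection prob.
    new_E = rng.binomial(S, p_inf)        # S -> E (exposure)
    new_I = rng.binomial(E, sigma)        # E -> I (end of latency)
    new_R = rng.binomial(I, gamma)        # I -> R (recovery)
    return S - new_E, E + new_E - new_I, I + new_I - new_R, R + new_R

S, E, I, R = 99_990, 0, 10, 0
for day in range(120):
    S, E, I, R = seir_step(S, E, I, R, beta=0.4, sigma=1/5, gamma=1/7, N=100_000)
print(f"ever infected after 120 days: {100_000 - S}")
```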
biorxiv epidemiology 0-100-users 2016

AFNI and Clustering False Positive Rates Redux, bioRxiv, 2016-07-27
Abstract: In response to reports of an inflated false positive rate (FPR) in FMRI group analysis tools, a series of replications, investigations, and software modifications were made to address this issue. While these investigations continue, significant progress has been made in adapting AFNI to fix such problems. Two separate lines of changes have been made. First, a long-tailed model for the spatial correlation of the FMRI noise, characterized by its autocorrelation function (ACF), was developed and implemented in the 3dClustSim tool for determining the cluster-size threshold to use for a given voxel-wise threshold. Second, the 3dttest++ program was modified to perform randomization of the voxel-wise t-tests and then feed those randomized t-statistic maps into 3dClustSim directly for cluster-size threshold determination, without any spatial model for the ACF. These approaches were tested with the Beijing subset of the FCON-1000 data collection. The first approach shows markedly improved (reduced) FPRs, but in many cases they are still above the nominal 5%. The second approach shows FPRs clustered tightly about 5% across all per-voxel p-value thresholds ≤ 0.01. If t-tests from a univariate GLM are adequate for the group analysis in question, the second approach is what the AFNI group currently recommends for thresholding. If more complex per-voxel statistical analyses are required (where permutation/randomization is impracticable), then our current recommendation is to use the new ACF modeling approach coupled with a per-voxel p-threshold of 0.001 or below. Simulations were also repeated with the now infamously “buggy” version of 3dClustSim; the effect of the bug on FPRs was minimal (on the order of a few percent).
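For context, the long-tailed ACF model referenced above mixes a Gaussian with a mono-exponential tail, ACF(r) = a*exp(-r^2/(2*b^2)) + (1-a)*exp(-r/c), the form exposed by 3dClustSim's -acf parameters. The sketch below fits that shape to synthetic ACF samples; the fitting code and all parameter values are illustrative, not AFNI's implementation.

```python
# Fit the mixed Gaussian + exponential ACF shape to (synthetic) samples.
# Illustrative sketch only; data and starting values are fabricated.
import numpy as np
from scipy.optimize import curve_fit

def mixed_acf(r, a, b, c):
    return a * np.exp(-r**2 / (2 * b**2)) + (1 - a) * np.exp(-r / c)

r = np.linspace(0, 20, 40)  # radius in mm
acf_obs = mixed_acf(r, 0.5, 3.0, 8.0) \
    + np.random.default_rng(0).normal(0, 0.005, r.size)  # noisy "measurements"
(a, b, c), _ = curve_fit(mixed_acf, r, acf_obs,
                         p0=[0.5, 4.0, 10.0], bounds=(0, [1, 50, 50]))
print(f"a={a:.2f}, b={b:.1f} mm, c={c:.1f} mm")
```

The exponential tail is what distinguishes this model from the pure-Gaussian smoothness assumption blamed for the inflated FPRs.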
biorxiv neuroscience 0-100-users 2016

The druggable genome and support for target identification and validation in drug development, bioRxiv, 2016-07-27
Target identification (identifying the correct drug targets for each disease) and target validation (demonstrating the effect of target perturbation on disease biomarkers and disease endpoints) are essential steps in drug development. We showed previously that biomarker and disease-endpoint associations of single nucleotide polymorphisms (SNPs) in a gene encoding a drug target accurately depict the effect of modifying the same target with a pharmacological agent; others have shown that genomic support for a target is associated with a higher rate of drug development success. To delineate drug development (including repurposing) opportunities arising from this paradigm, we connected complex disease- and biomarker-associated loci from genome-wide association studies (GWAS) to an updated set of genes encoding druggable human proteins, to compounds with bioactivity against these targets and, where these were licensed drugs, to clinical indications. We used this set of genes to inform the design of a new genotyping array, to enable druggable genome-wide association studies for drug target selection and validation in human disease.
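Computationally, the core linking step described above is a join: GWAS loci to druggable genes to compounds and indications. A toy sketch with hypothetical in-memory tables follows; the study itself used curated GWAS catalogs and a druggable-genome annotation set, and the rows below are illustrative only.

```python
# Toy join from GWAS associations to druggable targets and compounds.
# Tables and rows are hypothetical illustrations, not the study's data.
gwas_hits = [
    {"snp": "rs123", "trait": "LDL cholesterol", "gene": "HMGCR"},
    {"snp": "rs456", "trait": "rheumatoid arthritis", "gene": "IL6R"},
    {"snp": "rs789", "trait": "height", "gene": "EXAMPLE1"},
]
druggable = {
    "HMGCR": {"compound": "statins", "indication": "hypercholesterolemia"},
    "IL6R": {"compound": "tocilizumab", "indication": "rheumatoid arthritis"},
}

# Keep associations whose mapped gene encodes a druggable target, linking
# each trait to an existing compound (candidate repurposing opportunities).
for hit in gwas_hits:
    target = druggable.get(hit["gene"])
    if target:
        print(hit["trait"], "->", hit["gene"], "->", target["compound"])
```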
biorxiv genetics 0-100-users 2016