Cytoarchitectonic similarity is a wiring principle of the human connectome, bioRxiv, 2016-08-07
Abstract
Understanding the wiring diagram of the human cerebral cortex is a fundamental challenge in neuroscience. Elemental aspects of its organization remain elusive. Here we examine which structural traits of cortical regions, particularly their cytoarchitecture and thickness, relate to the existence and strength of inter-regional connections. We use the architecture data from the classic work of von Economo and Koskinas and state-of-the-art diffusion-based connectivity data from the Human Connectome Project. Our results reveal a prominent role of the cytoarchitectonic similarity of supragranular layers for predicting the existence and strength of connections. In contrast, cortical thickness similarity was not related to the existence or strength of connections. These results are in line with findings for non-human mammalian cerebral cortices, suggesting overarching wiring principles of the mammalian cerebral cortex. The results invite hypotheses about evolutionarily conserved neurobiological mechanisms that give rise to the relation between cytoarchitecture and connectivity in the human cerebral cortex.
biorxiv neuroscience 100-200-users 2016
A tutorial on how (not) to over-interpret STRUCTURE/ADMIXTURE bar plots, bioRxiv, 2016-07-29
Abstract
Genetic clustering algorithms, implemented in popular programs such as STRUCTURE and ADMIXTURE, have been used extensively to characterise individuals and populations based on genetic data. A successful example is the reconstruction of the genetic history of African Americans, who are a product of recent admixture between highly differentiated populations. Histories can also be reconstructed using the same procedure for groups that have no admixture in their recent history, that have experienced strong recent genetic drift, or that deviate in other ways from the underlying inference model. Unfortunately, such histories can be misleading. We have implemented an approach (badMIXTURE, available at github.com/danjlawson/badMIXTURE) to assess the goodness of fit of the model using the ancestry “palettes” estimated by CHROMOPAINTER, and we apply it to both simulated data and real case studies. Combining these complementary analyses with additional methods that are designed to test specific hypotheses allows a richer and more robust analysis of recent demographic history based on genetic data.
biorxiv genetics 200-500-users 2016
Projected spread of Zika virus in the Americas, bioRxiv, 2016-07-29
Abstract
We use a data-driven global stochastic epidemic model to project past and future spread of the Zika virus (ZIKV) in the Americas. The model has high spatial and temporal resolution, and integrates real-world demographic, human mobility, socioeconomic, temperature, and vector density data. We estimate that the first introduction of ZIKV to Brazil likely occurred between August 2013 and April 2014 (90% credible interval). We provide simulated epidemic profiles of incident ZIKV infections for several countries in the Americas through February 2017. The ZIKV epidemic is characterized by slow growth and high spatial and seasonal heterogeneity, attributable to the dynamics of the mosquito vector and to the characteristics and mobility of the human populations. We project the expected timing and number of pregnancies infected with ZIKV during the first trimester, and provide estimates of microcephaly cases assuming different levels of risk as reported in empirical retrospective studies. Our approach represents an early modeling effort aimed at projecting the potential magnitude and timing of the ZIKV epidemic that might be refined as new and more accurate data from the region become available.
biorxiv epidemiology 0-100-users 2016
AFNI and Clustering False Positive Rates Redux, bioRxiv, 2016-07-27
Abstract
In response to reports of inflated false positive rates (FPR) in FMRI group analysis tools, a series of replications, investigations, and software modifications were made to address this issue. While these investigations continue, significant progress has been made in adapting AFNI to fix such problems. Two separate lines of changes have been made. First, a long-tailed model for the spatial correlation of the FMRI noise, characterized by its autocorrelation function (ACF), was developed and implemented in the 3dClustSim tool for determining the cluster-size threshold to use for a given voxel-wise threshold. Second, the 3dttest++ program was modified to randomize the voxel-wise t-tests and then feed those randomized t-statistic maps directly into 3dClustSim for cluster-size threshold determination, without any spatial model for the ACF. These approaches were tested with the Beijing subset of the FCON-1000 data collection. The first approach shows markedly improved (reduced) FPR, but in many cases it is still above the nominal 5%. The second approach shows FPRs clustered tightly about 5% across all per-voxel p-value thresholds ≤ 0.01. If t-tests from a univariate GLM are adequate for the group analysis in question, the second approach is what the AFNI group currently recommends for thresholding. If more complex per-voxel statistical analyses are required (where permutation/randomization is impracticable), then our current recommendation is to use the new ACF modeling approach coupled with a per-voxel p-threshold of 0.001 or below. Simulations were also repeated with the now infamously “buggy” version of 3dClustSim; the effect of the bug on FPRs was minimal (of the order of a few percent).
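The second approach replaces a parametric ACF model with randomization. For a one-sample test, one common form of randomization is sign flipping: under the null hypothesis of zero group mean (and assuming symmetric noise), each subject's entire map can have its sign flipped, which leaves the spatial correlation of the noise intact. The maximum cluster size across many such null t maps then gives an empirical cluster-size threshold. Below is a minimal sketch of that general idea in Python, assuming a 2D grid, face-connected clusters, and a fixed voxel-wise t threshold; all function names are hypothetical, and this is not AFNI's 3dttest++ or 3dClustSim code.

```python
import math
import random


def one_sample_t(values):
    """One-sample t statistic of `values` against a population mean of zero."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return 0.0 if var == 0 else mean / math.sqrt(var / n)


def t_map(subject_maps, signs):
    """Voxel-wise t map after multiplying each subject's whole map by its sign."""
    rows, cols = len(subject_maps[0]), len(subject_maps[0][0])
    return [[one_sample_t([s * m[r][c] for s, m in zip(signs, subject_maps)])
             for c in range(cols)]
            for r in range(rows)]


def largest_cluster(mask):
    """Size of the largest face-connected (4-neighbour) cluster of True voxels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Depth-first flood fill from an unvisited suprathreshold voxel.
            seen[r0][c0] = True
            stack, size = [(r0, c0)], 0
            while stack:
                r, c = stack.pop()
                size += 1
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        stack.append((nr, nc))
            best = max(best, size)
    return best


def cluster_size_threshold(max_sizes, alpha=0.05):
    """Smallest k such that the null probability of a max cluster >= k is <= alpha."""
    k = 1
    while sum(m >= k for m in max_sizes) / len(max_sizes) > alpha:
        k += 1
    return k


def randomization_threshold(subject_maps, t_thresh, n_iter=200, alpha=0.05, seed=0):
    """Cluster-size threshold from sign-flipped (null) t maps; no ACF model is used."""
    rng = random.Random(seed)
    max_sizes = []
    for _ in range(n_iter):
        # Flip the sign of each subject's entire map, keeping its spatial structure.
        signs = [rng.choice((-1, 1)) for _ in subject_maps]
        tmap = t_map(subject_maps, signs)
        # Two-sided voxel-wise threshold: keep voxels where |t| exceeds t_thresh.
        mask = [[abs(t) > t_thresh for t in row] for row in tmap]
        max_sizes.append(largest_cluster(mask))
    return cluster_size_threshold(max_sizes, alpha)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical null data: 20 subjects, each a 16 x 16 grid of Gaussian noise.
    maps = [[[rng.gauss(0, 1) for _ in range(16)] for _ in range(16)]
            for _ in range(20)]
    print(randomization_threshold(maps, t_thresh=2.86))
```

A threshold of 2.86 corresponds roughly to a two-sided p = 0.01 with 19 degrees of freedom (20 subjects); in practice it would be derived from the chosen per-voxel p-value. Clusters at least as large as the returned size would be declared significant at the family-wise level alpha.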
biorxiv neuroscience 0-100-users 2016
Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the mega-reads algorithm, bioRxiv, 2016-07-27
Abstract
Long sequencing reads generated by single-molecule sequencing technology offer the possibility of dramatically improving the contiguity of genome assemblies. The biggest challenge today is that long reads have relatively high error rates, currently around 15%. The high error rates make it difficult to use these data alone, particularly with highly repetitive plant genomes. Errors in the raw data can lead to insertion or deletion errors (indels) in the consensus genome sequence, which in turn create significant problems for downstream analysis; for example, a single indel may shift the reading frame and incorrectly truncate a protein sequence. Here we describe an algorithm that solves the high error rate problem by combining long, high-error reads with shorter but much more accurate Illumina sequencing reads, whose error rates average <1%. Our hybrid assembly algorithm combines these two types of reads to construct mega-reads, which are both long and accurate, and then assembles the mega-reads using the CABOG assembler, which was designed for long reads. We apply this technique to a large data set of Illumina and PacBio sequences from the species Aegilops tauschii, a large and highly repetitive plant genome that has resisted previous attempts at assembly. We show that the resulting assembled contigs are far larger than in any previous assembly, with an N50 contig size of 486,807 bp. We compare the contigs to independently produced optical maps to evaluate their large-scale accuracy, and to a set of high-quality bacterial artificial chromosome (BAC)-based assemblies to evaluate base-level accuracy.
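N50 is the usual contiguity statistic for an assembly: the length L such that contigs of length L or longer together contain at least half of all assembled bases. A minimal Python sketch of that common definition follows, assuming contig lengths are given in base pairs; the function name is hypothetical, and this is not the authors' code.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # → 8
```

Some tools use a strict "more than half" criterion instead, which changes the result when the running total lands exactly on half the assembly size, as it does in this example.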
biorxiv bioinformatics 100-200-users 2016
Massively parallel digital transcriptional profiling of single cells, bioRxiv, 2016-07-27
Abstract
Characterizing the transcriptome of individual cells is fundamental to understanding complex biological systems. We describe a droplet-based system that enables 3′ mRNA counting of up to tens of thousands of single cells per sample. Cell encapsulation in droplets takes place in ∼6 minutes, with ∼50% cell capture efficiency, for up to 8 samples at a time. The speed and efficiency allow the processing of precious samples while minimizing stress to cells. To demonstrate the system's technical performance and its applications, we collected transcriptome data from ∼¼ million single cells across 29 samples. First, we validate the sensitivity of the system and its ability to detect rare populations using cell lines and synthetic RNAs. Then, we profile 68k peripheral blood mononuclear cells (PBMCs) to demonstrate the system's ability to characterize large immune populations. Finally, we use sequence variation in the transcriptome data to determine host and donor chimerism at single-cell resolution in bone marrow mononuclear cells (BMMCs) of transplant patients. This analysis enables characterization of the complex interplay between donor and host cells and monitoring of treatment response. This high-throughput system is robust and enables characterization of diverse biological systems with single-cell mRNA analysis.
biorxiv genomics 100-200-users 2016