New synthetic-diploid benchmark for accurate variant calling evaluation, bioRxiv, 2017-11-23
Constructed from the consensus of multiple variant callers based on short-read data, existing benchmark datasets for evaluating variant calling accuracy are biased toward easy regions accessible by known algorithms. We derived a new benchmark dataset from the de novo PacBio assemblies of two human cell lines that are homozygous across the whole genome. This benchmark provides a more accurate and less biased estimate of the error rate of small variant calls in a realistic context.
biorxiv bioinformatics 100-200-users 2017
Recovery of gene haplotypes from a metagenome, bioRxiv, 2017-11-23
Abstract: Elucidation of population-level diversity of microbiomes is a significant step towards a complete understanding of the evolutionary, ecological and functional importance of microbial communities. Characterizing this diversity requires the recovery of the exact DNA sequence (haplotype) of each gene isoform from every individual present in the community. To address this, we present Hansel and Gretel, a freely available data structure and algorithm that together provide a software package for reconstructing the most likely haplotypes from metagenomes. We demonstrate recovery of haplotypes from short-read Illumina data for a bovine rumen microbiome, and verify that our predictions are 100% accurate with long-read PacBio CCS sequencing. We show that Gretel’s haplotypes can be analyzed to determine a significant difference in mutation rates between core and accessory gene families in an ovine rumen microbiome. All tools, documentation and data for evaluation are open source and available via our repository: https://github.com/samstudio8/gretel
biorxiv bioinformatics 100-200-users 2017
Spliceosome profiling visualizes the operations of a dynamic RNP in vivo at nucleotide resolution, bioRxiv, 2017-11-23
Summary: Tools to understand how the spliceosome functions in vivo have lagged behind advances in its structural biology. We describe methods to globally profile spliceosome-bound precursor, intermediates and products at nucleotide resolution. We apply these tools to three divergent yeast species that span 600 million years of evolution. The sensitivity of the approach enables detection of novel cases of non-canonical catalysis including interrupted, recursive and nested splicing. Employing statistical modeling to understand the quantitative relationships between RNA features and the data, we uncover independent roles for intron size, position and number in substrate progression through the two catalytic stages. These include species-specific inputs suggestive of spliceosome-transcriptome coevolution. Further investigations reveal ATP-dependent discard of numerous endogenous substrates at both the precursor and lariat-intermediate stages and connect discard to intron retention, a form of splicing regulation. Spliceosome profiling is a quantitative, generalizable global technology to investigate an RNP central to eukaryotic gene expression. Highlights: Measurement of spliceosome-bound precursor and intermediate in three species; Non-canonical splicing events revealed; Statistical modeling uncovers substrate features that predict catalytic efficiency; Discard of suboptimal substrates occurs in vivo and predicts intron-retained mRNAs.
biorxiv molecular-biology 0-100-users 2017
Current CRISPR gene drive systems are likely to be highly invasive in wild populations, bioRxiv, 2017-11-21
Abstract: Recent reports have suggested that CRISPR-based gene drives are unlikely to invade wild populations due to drive-resistant alleles that prevent cutting. Here we develop mathematical models based on existing empirical data to explicitly test this assumption. We show that although resistance prevents drive systems from spreading to fixation in large populations, even the least effective systems reported to date are highly invasive. Releasing a small number of organisms often causes invasion of the local population, followed by invasion of additional populations connected by very low gene flow rates. Examining the effects of mitigating factors including standing variation, inbreeding, and family size revealed that none of these prevent invasion in realistic scenarios. Highly effective drive systems are predicted to be even more invasive. Contrary to the National Academies report on gene drive, our results suggest that standard drive systems should not be developed nor field-tested in regions harboring the host organism.
biorxiv synthetic-biology 100-200-users 2017
Dynamics of the upper airway microbiome in the pathogenesis of asthma-associated persistent wheeze in preschool children, bioRxiv, 2017-11-21
Abstract: Repeated cycles of infection-associated lower airway inflammation drive the pathogenesis of persistent wheezing disease in children. Tracking these events across a birth cohort during their first five years, we demonstrate that >80% of infectious events indeed involve viral pathogens, but are accompanied by a shift in the nasopharyngeal microbiome (NPM) towards dominance by a small range of pathogenic bacterial genera. Unexpectedly, this change in NPM frequently precedes the appearance of viral pathogens and acute symptoms. In non-sensitized children these events are associated only with “transient wheeze” that resolves after age three. In contrast, in children developing early allergic sensitization, they are associated with ensuing development of persistent wheeze, which is the hallmark of the asthma phenotype. This suggests underlying pathogenic interactions between allergic sensitization and antibacterial mechanisms.
biorxiv genetics 0-100-users 2017
Eye movement-related confounds in neural decoding of visual working memory representations, bioRxiv, 2017-11-21
Abstract: The study of visual working memory (VWM) has recently seen revitalization with the emergence of new insights and theories regarding its neural underpinnings. One crucial ingredient responsible for this progress is the rise of neural decoding techniques. These techniques promise to uncover the representational contents of neural signals, as well as the underlying code and the dynamic profile thereof. Here, we aimed to contribute to the field by subjecting human volunteers to a combined VWM/imagery task, while recording and decoding their neural signals as measured by MEG. At first sight, the results seem to provide evidence for a persistent, stable representation of the memorandum throughout the delay period. However, control analyses revealed that these findings can be explained by subtle, VWM-specific eye movements. As a potential remedy, we demonstrate the use of a functional localizer, which was specifically designed to target bottom-up sensory signals and as such avoids eye movements, to train the neural decoders. This analysis revealed a sustained representation for approximately 1 second, but no longer throughout the entire delay period. We conclude by arguing for more awareness of the potentially pervasive and ubiquitous effects of eye movement-related confounds. Significance statement: Visual working memory is an important aspect of higher cognition and has been the subject of much investigation within the field of cognitive neuroscience. Over recent years, these studies have increasingly relied on the use of neural decoding techniques. Here, we show that neural decoding may be susceptible to confounds induced by stimulus-specific eye movements. Such eye movements during working memory have been reported before, and may in fact be a common phenomenon. Given the widespread use of neural decoding and the potentially contaminating effects of eye movements, we therefore believe that our results are of significant relevance for the field.
biorxiv neuroscience 0-100-users 2017