Consistent and correctable bias in metagenomic sequencing experiments, bioRxiv, 2019-02-25
Marker-gene and metagenomic sequencing have profoundly expanded our ability to measure biological communities. But the measurements they provide differ from the truth, often dramatically, because these experiments are biased towards detecting some taxa over others. This experimental bias makes the taxon or gene abundances measured by different protocols quantitatively incomparable and can lead to spurious biological conclusions. We propose a mathematical model for how bias distorts community measurements based on the properties of real experiments. We validate this model with 16S rRNA gene and shotgun metagenomics data from defined bacterial communities. Our model better fits the experimental data despite being simpler than previous models. We illustrate how our model can be used to evaluate protocols, to understand the effect of bias on downstream statistical analyses, and to measure and correct bias given suitable calibration controls. These results illuminate new avenues towards truly quantitative and reproducible metagenomics measurements.
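The abstract does not state the form of the bias model. The sketch below is a minimal illustration under an explicit assumption: each taxon has a fixed relative detection efficiency that multiplies its true proportion, and the result is renormalised to sum to 1. Under that assumption it shows how a calibration control (a mock community of known composition) could be used to estimate the efficiencies and then correct a new sample. All function names and numbers are hypothetical.

```python
"""Toy per-taxon bias: measure it on a mock community, then correct a sample.

ASSUMPTION (not stated in the abstract): measured proportion of taxon i is
proportional to true proportion times a fixed efficiency b[i].
"""
from typing import List


def closure(x: List[float]) -> List[float]:
    """Rescale non-negative values so they sum to 1."""
    total = sum(x)
    return [v / total for v in x]


def observe(actual: List[float], efficiency: List[float]) -> List[float]:
    """Apply per-taxon efficiencies to true proportions, then renormalise."""
    return closure([a * b for a, b in zip(actual, efficiency)])


def estimate_efficiency(actual: List[float], observed: List[float]) -> List[float]:
    """Estimate efficiencies from a calibration community of known composition.
    Only ratios between taxa are identifiable, so normalise to taxon 0."""
    ratios = [o / a for o, a in zip(observed, actual)]
    return [r / ratios[0] for r in ratios]


def correct(observed: List[float], efficiency: List[float]) -> List[float]:
    """Divide out the efficiencies and renormalise."""
    return closure([o / b for o, b in zip(observed, efficiency)])


if __name__ == "__main__":
    true_mock = [0.25, 0.25, 0.25, 0.25]  # hypothetical even mock community
    bias = [1.0, 4.0, 0.5, 2.0]           # hypothetical efficiencies
    b_hat = estimate_efficiency(true_mock, observe(true_mock, bias))

    sample_truth = [0.1, 0.2, 0.3, 0.4]   # hypothetical sample
    measured = observe(sample_truth, bias)
    print([round(v, 3) for v in correct(measured, b_hat)])  # → [0.1, 0.2, 0.3, 0.4]
```

Because the toy model is purely multiplicative, a single mock community with all taxa present is enough to recover the efficiency ratios exactly; real data would add noise and possibly taxa missing from the control.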
biorxiv bioinformatics 100-200-users 2019
The Linked Selection Signature of Rapid Adaptation in Temporal Genomic Data, bioRxiv, 2019-02-25
Populations can adapt over short, ecological timescales via standing genetic variation. Genomic data collected over tens of generations in both natural and lab populations is increasingly used to find selected loci underpinning such rapid adaptation. Although selection on large-effect loci may be detectable in such data, the fitness differences between individuals often have a polygenic architecture, such that selection at any one locus leads to allele frequency changes that are too subtle to distinguish from genetic drift. However, one promising signal comes from the fact that selection on polygenic traits leads to heritable fitness backgrounds with which neutral alleles can become stochastically associated. These associations perturb neutral allele frequency trajectories, creating autocovariance across generations that can be directly measured from temporal genomic data. We develop theory that predicts the magnitude of these temporal autocovariances, showing that it is determined by the level of additive genetic variation, recombination, and linkage disequilibria in a region. Furthermore, by using analytic expressions for the temporal variances and autocovariances in allele frequency, we demonstrate that one can estimate the additive genetic variation for fitness and the drift-effective population size from temporal genomic data. Finally, we show how the proportion of total variation in allele frequency change due to linked selection can be estimated from temporal data. Temporal genomic data offers strong opportunities to identify the role linked selection plays in shaping genome-wide diversity over short timescales, and can help bridge population genetic and quantitative genetic studies of adaptation.
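The key quantity above is the covariance, across loci, between allele frequency changes in different time intervals. The sketch below shows how such a temporal covariance matrix could be computed from a frequency trajectory with NumPy. It is not the paper's estimator. The "toy" population replaces linked selection with a hypothetical persistent per-locus push (a caricature of a neutral allele stuck to a fitness background), only to show that persistence produces positive off-diagonal covariances while pure Wright-Fisher drift gives values near zero.

```python
import numpy as np


def temporal_covariances(freqs: np.ndarray) -> np.ndarray:
    """freqs: (T, L) allele frequencies at T timepoints for L loci.
    Returns the (T-1, T-1) covariance matrix of frequency changes across loci:
    diagonal = variance of change per interval, off-diagonal = autocovariances."""
    deltas = np.diff(freqs, axis=0)  # (T-1, L): change in each interval
    return np.cov(deltas)            # rows (intervals) are the variables


def mean_offdiagonal(cov: np.ndarray) -> float:
    """Average of all off-diagonal entries (the temporal autocovariances)."""
    mask = ~np.eye(cov.shape[0], dtype=bool)
    return float(cov[mask].mean())


def simulate(n_loci, n_gen, pop_size, push_sd, rng):
    """Wright-Fisher binomial sampling plus a HYPOTHETICAL persistent push per
    locus (push_sd=0 gives pure neutral drift)."""
    p = rng.uniform(0.2, 0.8, n_loci)
    push = rng.normal(0.0, push_sd, n_loci)
    traj = [p]
    for _ in range(n_gen):
        expected = np.clip(p + push * p * (1.0 - p), 0.0, 1.0)
        p = rng.binomial(2 * pop_size, expected) / (2 * pop_size)
        traj.append(p)
    return np.vstack(traj)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    neutral = simulate(5000, 6, 500, 0.0, rng)
    toy = simulate(5000, 6, 500, 0.05, rng)
    print("neutral mean autocovariance:", mean_offdiagonal(temporal_covariances(neutral)))
    print("toy mean autocovariance:    ", mean_offdiagonal(temporal_covariances(toy)))
```

Sample sizes, the push model, and the seed are illustrative choices; the paper derives the expected magnitude of these autocovariances analytically rather than by simulation.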
biorxiv evolutionary-biology 100-200-users 2019
A comparison of three programming languages for a full-fledged next-generation sequencing tool, bioRxiv, 2019-02-23
Background: elPrep is an established multi-threaded framework for preparing SAM and BAM files in sequencing pipelines. To achieve good performance, its software architecture makes only a single pass through a SAM/BAM file for multiple preparation steps, and keeps sequencing data in main memory as much as possible. Similar to other SAM/BAM tools, management of heap memory is a complex task in elPrep, and it became a serious productivity bottleneck in its original implementation language during recent further development of elPrep. We therefore investigated three alternative programming languages for handling large amounts of heap objects: Go and Java, which use a concurrent, parallel garbage collector, on the one hand, and C++17, which uses reference counting, on the other. We reimplemented elPrep in all three languages and benchmarked their runtime performance and memory use. Results: The Go implementation performs best, yielding the best balance between runtime performance and memory use. While the Java benchmarks report a somewhat faster runtime than the Go benchmarks, the memory use of the Java runs is significantly higher. The C++17 benchmarks run significantly slower than both Go and Java, while using somewhat more memory than the Go runs. Our analysis shows that, in our case, concurrent, parallel garbage collection is better at managing a large heap of objects than reference counting. Conclusions: Based on our benchmark results, we selected Go as our new implementation language for elPrep, and recommend considering Go as a good candidate for developing other bioinformatics tools for processing SAM/BAM data as well.
biorxiv bioinformatics 100-200-users 2019
Ion beam subcellular tomography, bioRxiv, 2019-02-23
Multiplexed ion beam imaging (MIBI) has previously been used to profile multiple parameters in two dimensions in single cells within tissue slices. Here, a mathematical and technical framework for three-dimensional subcellular MIBI is presented. We term the approach ion beam tomography (IBT), wherein ion beam images are acquired iteratively across multiple successive scans and later compiled into a 3D format. For IBT, cells were imaged at 0.2-4 pA ion current across 1,000 axial scans. Consecutive subsets of ion beam images were binned over 3 to 20 slices (above and below) to create a resolved image, wherein binning was incremented one slice at a time to yield enhanced multi-depth data without loss of depth resolution. Algorithmic deconvolution, tailored for ion beams, was then applied to the transformed ion image series using a hybrid deblurring algorithm and an ion beam current-dependent point-spread function. Three-dimensional processing was implemented by segmentation, mesh, molecular neighborhoods, and association maps. In cultured cancer cells and tissues, IBT enabled accessible visualization of three-dimensional volumetric distributions of genomic regions, RNA transcripts, and protein factors with 65-nm lateral and 5-nm axial resolution. IBT also enabled label-free elemental mapping of cells, allowing point-of-source measurements of cellular components that are not possible for most optical microscopy targets. The IBT tools and reagents provided here should now make detailed multiparameter imaging of subcellular features at near-macromolecular resolution possible, opening new avenues for interrogating subcellular biology.
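The sliding binning step can be illustrated with a short NumPy sketch. The abstract does not give the exact window definition; here we assume that "binned over k slices (above and below)" means summing the k slices on each side of a centre slice, with the window moved one slice at a time so that every acquired scan keeps its own output slice. The function name and the use of summation (rather than averaging) are assumptions, and the deconvolution step is not modelled.

```python
import numpy as np


def sliding_axial_bin(stack: np.ndarray, half_width: int) -> np.ndarray:
    """stack: (Z, Y, X) ion-count images, one per axial scan.
    ASSUMED window: output slice z is the sum of input slices
    z - half_width .. z + half_width, truncated at the top and bottom of the
    stack. The window advances one slice at a time, so the output has the
    same number of slices as the input."""
    if half_width < 0:
        raise ValueError("half_width must be non-negative")
    n = stack.shape[0]
    # Prefix sums along z with a leading zero slice: sum over [lo, hi) is
    # csum[hi] - csum[lo].
    zero = np.zeros((1,) + stack.shape[1:], dtype=np.float64)
    csum = np.concatenate([zero, np.cumsum(stack, axis=0, dtype=np.float64)])
    idx = np.arange(n)
    lo = np.clip(idx - half_width, 0, n)
    hi = np.clip(idx + half_width + 1, 0, n)
    return csum[hi] - csum[lo]


if __name__ == "__main__":
    # Five 1x1 "images" with counts 0..4, binned one slice above and below.
    demo = np.arange(5).reshape(5, 1, 1)
    print(sliding_axial_bin(demo, 1).ravel())  # → [1. 3. 6. 9. 7.]
```

Prefix sums make each output slice cost O(1) regardless of window size, which matters when the window spans up to 20 slices on each side across 1,000 scans.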
biorxiv systems-biology 100-200-users 2019
The effect of environmental enrichment on behavioral variability depends on genotype, behavior, and type of enrichment, bioRxiv, 2019-02-22
Non-genetic individuality in behavior, also termed intragenotypic variability, has been observed across many different organisms. A potential cause of intragenotypic variability is sensitivity to minute environmental differences during development, even when major environmental parameters are kept constant. Animal enrichment paradigms often include the addition of environmental diversity, whether in the form of social interaction, novel objects, or exploratory opportunities. Enrichment could plausibly affect intragenotypic variability in opposing ways: it could cause an increase in variability due to the increase in microenvironmental variation, or a decrease in variability due to the elimination of aberrant behavior as animals are taken out of impoverished laboratory conditions. To test these possibilities, we assayed five isogenic Drosophila melanogaster lines raised in control and mild enrichment conditions, and one isogenic line under both mild and intense enrichment conditions. We compared the mean and variability of six behavioral metrics between our enriched fly populations and the laboratory housing control. We found that enrichment often caused a small increase in variability across most of our behaviors, but that the ultimate effect of enrichment on both behavioral means and variabilities was highly dependent on genotype and its interaction with the particular enrichment treatment. Our results support previous work on enrichment that presents a highly variable picture of its effects on both behavior and physiology.
biorxiv genetics 100-200-users 2019
Microfluidic protein isolation and sample preparation for high resolution cryo-EM, bioRxiv, 2019-02-21
High-resolution structural information is essential to understand protein function. Protein structure determination requires a considerable amount of protein, which can be challenging to produce and often involves harsh and lengthy procedures. In contrast, the several thousand to a few million protein particles required for structure determination by cryogenic electron microscopy (cryo-EM) can be provided by miniaturized systems. Here, we present a microfluidic method for the rapid isolation of a target protein and its direct preparation for cryo-EM. Less than one microliter of cell lysate is required as starting material to solve the atomic structure of the untagged, endogenous human 20S proteasome. Our work paves the way for high-throughput structure determination of proteins from minimal amounts of cell lysate and opens new opportunities for the isolation of sensitive, endogenous protein complexes.
biorxiv biophysics 100-200-users 2019