Phylofactorization: a graph-partitioning algorithm to identify phylogenetic scales of ecological data, bioRxiv, 2017-12-17

Abstract: The problem of pattern and scale is a central challenge in ecology. The problem of scale is central to community ecology, where functional ecological groups are aggregated and treated as a unit underlying an ecological pattern, such as the aggregation of “nitrogen fixing trees” into a total abundance of a trait underlying ecosystem physiology. With the emergence of massive community ecological datasets, from microbiomes to breeding bird surveys, there is a need to objectively identify the scales of organization pertaining to well-defined patterns in community ecological data. The phylogeny is a scaffold for identifying key phylogenetic scales associated with macroscopic patterns. Phylofactorization was developed to objectively identify phylogenetic scales underlying patterns in relative abundance data. However, many ecological data, such as presence-absences and counts, are not relative abundances, yet it is still desirable and informative to identify phylogenetic scales underlying a pattern of interest. Here, we generalize phylofactorization beyond relative abundances to a graph-partitioning algorithm for any community ecological data. Generalizing phylofactorization connects many tools from data analysis to phylogenetically informed analysis of community ecological data. Two-sample tests identify three phylogenetic factors of mammalian body mass which arose during the K-Pg extinction event, consistent with other analyses of mammalian body mass evolution. Projection of data onto coordinates defined by the phylogeny yields a phylogenetic principal components analysis which refines our understanding of the major sources of variation in the human gut microbiome. These same coordinates allow generalized additive modeling of microbes in Central Park soils and confirm that a large clade of Acidobacteria thrives in neutral soils.
Generalized linear and additive modeling of exponential family random variables can be performed by phylogenetically constrained reduced-rank regression or stepwise factor contrasts. We finish with a discussion of how phylofactorization produces an ecological species concept with a phylogenetic constraint. All of these tools can be implemented with a new R package available online.
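The abstract describes phylofactorization as a graph-partitioning algorithm in which two-sample tests pick out phylogenetic factors. Below is a minimal Python sketch of that idea. It rests on several assumptions: a toy rooted tree written as nested tuples, one trait value per species, and Welch's t statistic as the objective. Each step greedily cuts the tree edge whose split of its current bin of species gives the largest |t|. The names `phylofactor`, `clades` and `welch_t` are hypothetical, and this is not the interface of the authors' R package.

```python
import math
from statistics import mean, variance


def leaves(node):
    """Return the species (leaf labels) below a node of a nested-tuple tree."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def clades(node):
    """Yield the leaf set below every edge of the tree (one per non-root node)."""
    if isinstance(node, str):
        return
    for child in node:
        yield frozenset(leaves(child))
        yield from clades(child)


def welch_t(a, b):
    """Welch two-sample t statistic comparing the means of a and b."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:  # identical values within both groups: perfect separation or none
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / se


def phylofactor(tree, trait, n_factors):
    """Hypothetical greedy edge-cutting sketch.

    Bins are the species sets of the connected components left after earlier
    cuts. An uncut edge with clade C splits exactly one bin B into C & B and
    B - C. Returns a list of (clade_side, other_side, t) per factor.
    """
    bins = [frozenset(leaves(tree))]
    edges = list(clades(tree))
    factors = []
    for _ in range(n_factors):
        best = None
        for bin_ in bins:
            for clade in edges:
                inside, outside = clade & bin_, bin_ - clade
                if len(inside) < 2 or len(outside) < 2:
                    continue  # need >= 2 species per group to estimate a variance
                t = welch_t([trait[s] for s in inside], [trait[s] for s in outside])
                if best is None or abs(t) > abs(best[3]):
                    best = (bin_, inside, outside, t)
        if best is None:
            break  # no remaining edge splits a bin into two testable groups
        bin_, inside, outside, t = best
        bins.remove(bin_)
        bins += [inside, outside]
        factors.append((inside, outside, t))
    return factors
```

The real method supports other objectives and data types, such as regression on phylogenetic coordinates or generalized linear models. The greedy loop stays the same while the scoring function changes. Requiring at least two species per side is a simplification for the t statistic.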

biorxiv ecology 0-100-users 2017

Amplicon sequencing of the 16S-ITS-23S rRNA operon with long-read technology for improved phylogenetic classification of uncultured prokaryotes, bioRxiv, 2017-12-16

Abstract: Amplicon sequencing of the 16S rRNA gene is the predominant method to quantify microbial compositions of environmental samples and to discover previously unknown lineages. Its unique structure of interspersed conserved and variable regions is an excellent target for PCR and allows for classification of reads at all taxonomic levels. However, the relatively few phylogenetically informative sites prevent confident phylogenetic placement of novel lineages that are deep-branching relative to reference taxa. This problem is exacerbated when only short 16S rRNA gene fragments are sequenced. To resolve their placement, it is common practice to gather more informative sites by combining multiple conserved genes into concatenated datasets. This approach, however, requires genomic data, which may be obtained only through relatively expensive metagenome sequencing and computationally demanding analyses. Here we develop a protocol that amplifies a large part of the 16S and 23S rRNA genes within the rRNA operon, including the ITS region, and sequences the amplicons with PacBio long-read technology. We tested our method with a synthetic mock community and developed a read curation pipeline that reduces the overall error rate to 0.18%. Applying our method to four diverse environmental samples, we were able to capture near full-length rRNA operon amplicons from a large diversity of prokaryotes. Phylogenetic trees constructed with these sequences showed an increase in statistical support compared to trees inferred with shorter, Illumina-like sequences using only the 16S rRNA gene (250 bp). Our method is a cost-effective solution to generate high-quality, near full-length 16S and 23S rRNA gene sequences from environmental prokaryotes.

biorxiv microbiology 0-100-users 2017

Specificity of RNAi, LNA and CRISPRi as loss-of-function methods in transcriptional analysis, bioRxiv, 2017-12-16

Abstract: Loss-of-function (LOF) methods, such as RNA interference (RNAi), antisense oligonucleotides or CRISPR-based genome editing, provide unparalleled power for studying the biological function of genes of interest. When coupled with transcriptomic analyses, LOF methods allow researchers to dissect networks of transcriptional regulation. However, a major concern is nonspecific targeting, which involves depletion of transcripts other than those intended. The off-target effects of each of these common LOF methods have yet to be compared at the whole-transcriptome level. Here, we systematically and experimentally compared the nonspecific activity of RNAi, antisense oligonucleotides and CRISPR interference (CRISPRi). All three methods yielded non-negligible off-target effects in gene expression, with CRISPRi exhibiting clonal variation in the transcriptional profile. As an illustrative example, we evaluated the performance of each method for deciphering the role of a long noncoding RNA (lncRNA) with unknown function. Although all LOF methods reduced expression of the candidate lncRNA, each method yielded a different set of differentially expressed genes upon knockdown as well as a different cellular phenotype. Therefore, to definitively confirm the functional role of a transcriptional regulator, we recommend the simultaneous use of at least two different LOF methods and the inclusion of multiple, specifically designed negative controls.

biorxiv genomics 0-100-users 2017

Examining the genetic influences of educational attainment and the validity of value-added measures of progress, bioRxiv, 2017-12-15

Abstract: In this study, we estimate (i) the SNP heritability of educational attainment at three time points throughout the compulsory educational lifecourse; (ii) the SNP heritability of value-added measures of educational progress built from test data; and (iii) the extent to which value-added measures built from teacher-rated ability may be biased due to measurement error. We utilise a genome-wide approach using generalized restricted maximum likelihood (GCTA-GREML) to determine the total phenotypic variance in educational attainment and value-added measures that is attributable to common genetic variation across the genome within a sample of unrelated individuals from a UK birth cohort, the Avon Longitudinal Study of Parents and Children. Our findings suggest that the heritability of educational attainment measured using point score test data increases with age, from 47% at age 11 to 61% at age 16. We also find that genetic variation does not contribute towards value-added measures created only from educational attainment point score data, but it does contribute a small amount to measures that additionally control for background characteristics (up to 20.09% [95% CI 6.06 to 35.71] from age 11 to 14). Finally, our results show that value-added measures built from teacher-rated ability have higher heritability than those built from exam scores. Our findings suggest that the heritability of educational attainment increases through childhood and adolescence. Value-added measures based upon fine-grained point scores may be less prone to between-individual genomic differences than measures that control for students’ backgrounds, or those built from more subjective measures such as teacher-rated ability.
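Value-added measures of progress are often defined as the residual from regressing a later attainment score on an earlier one. A positive value then means more progress than the earlier score predicted. The sketch below is minimal and assumes that simple one-predictor definition. The abstract does not give the study's exact model, and controlling for background characteristics would add further covariates to the regression. `value_added` is a hypothetical helper name.

```python
from statistics import mean


def value_added(prior, later):
    """Hypothetical value-added measure: residuals of `later` after an
    ordinary least-squares fit on `prior` (one predictor, with intercept)."""
    mx, my = mean(prior), mean(later)
    sxx = sum((x - mx) ** 2 for x in prior)
    sxy = sum((x - mx) * (y - my) for x, y in zip(prior, later))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual = observed later score minus the score predicted from prior attainment
    return [y - (intercept + slope * x) for x, y in zip(prior, later)]
```

Least-squares residuals with an intercept sum to zero, so each student's value is relative to the cohort-average progress.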

biorxiv genetics 0-100-users 2017

 

Created with the audiences framework by Jedidiah Carlson
