Oxford Nanopore sequencing in a research-based undergraduate course, bioRxiv, 2017-12-01
Abstract
Background: Nanopore sequencing is a third-generation genomic sequencing method that offers real-time sequencing of DNA samples. Nanopore sequencing is an excellent tool for teaching because it involves cutting-edge sequencing methods and also helps students to develop a research mindset, where students can learn to identify and resolve problems that arise during an experiment.
Results: We, as a group of undergraduate biology students, were able to use nanopore sequencing to analyze a sample of pupfish DNA. We were able to accomplish this without computer science backgrounds and with only some basic DNA extraction training. Although there were issues, such as inconsistent results across runs, we found it useful as a research learning experience and an application of the skills we learned.
Conclusions: As students, it was exciting to be able to experience this technology firsthand and apply what we learned in the classroom. Nanopore sequencing holds potential for DNA sequencing of large fragments in real time. It allows students to be acquainted with novel technologies and the theories behind them. However, as with all new techniques, it does not have the same established support, and when students run into difficulties while using nanopore sequencing, it is often difficult to identify what went wrong.
biorxiv genomics 100-200-users 2017
Deciphering eukaryotic cis-regulatory logic with 100 million random promoters, bioRxiv, 2017-11-26
Abstract: Deciphering cis-regulation, the code by which transcription factors (TFs) interpret regulatory DNA sequence to control gene expression levels, is a long-standing challenge. Previous studies of native or engineered sequences have remained limited in scale. Here, we use random sequences as an alternative, allowing us to measure the expression output of over 100 million synthetic yeast promoters. Random sequences yield a broad range of reproducible expression levels, indicating that the fortuitous binding sites in random DNA are functional. From these data we learn models of transcriptional regulation that predict over 94% of the expression driven from independent test data and nearly 89% from sequences from yeast promoters. These models allow us to characterize the activity of TFs and their interactions with chromatin, and help refine cis-regulatory motifs. We find that strand, position, and helical face preferences of TFs are widespread and depend on interactions with neighboring chromatin. Such massive-throughput regulatory assays of random DNA provide the diverse examples necessary to learn complex models of cis-regulatory logic.
biorxiv genomics 200-500-users 2017
5-Formylcytosine controls nucleosome positioning through covalent histone-DNA interaction, bioRxiv, 2017-11-25
Nucleosomes are the basic unit of chromatin that ensure genome integrity and control access to the genetic information. The organization of nucleosomes is influenced by the underlying DNA sequence itself, transcription factors or other proteins associated with the transcriptional machinery, and chromatin remodeling complexes (1–4). Herein, we show that the naturally occurring DNA modification 5-formylcytosine (5fC) contributes to the positioning of nucleosomes. We show that the ability of 5fC to position nucleosomes in vitro is associated with the formation of covalent interactions between histone residues and 5fC in the form of Schiff bases. We demonstrate that similar interactions can occur in a cellular environment and define their specific genomic loci in mouse embryonic stem cells. Collectively, our findings identify 5fC as a determinant of nucleosomal organization in which 5fC plays a role in establishing distinct regulatory regions that are linked to gene expression. Our study provides a previously unknown molecular mechanism, involving the formation of reversible covalent bonds between chromatin and DNA, that supports a molecular linkage between DNA sequence, DNA base modification and chromatin structure.
biorxiv genomics 0-100-users 2017
Expressed Exome Capture Sequencing (EecSeq): a method for cost-effective exome sequencing for all organisms with or without genomic resources, bioRxiv, 2017-11-24
Abstract: Exome capture is an effective tool for surveying the genome for loci under selection. However, traditional methods require annotated genomic resources. Here, we present a method for creating cDNA probes from expressed mRNA, which are then used to enrich and capture genomic DNA for exon regions. This approach, called “EecSeq”, eliminates the need for costly probe design and synthesis. We tested EecSeq in the eastern oyster, Crassostrea virginica, using a controlled exposure experiment. Four adult oysters were heat shocked at 36 °C for 1 hour along with four control oysters kept at 14 °C. Stranded mRNA libraries were prepared for two individuals from each treatment and pooled. Half of the combined library was used for probe synthesis and half was sequenced to evaluate capture efficiency. Genomic DNA was extracted from all individuals, enriched via captured probes, and sequenced directly. We found that EecSeq had an average capture sensitivity of 86.8% across all known exons and had over 99.4% sensitivity for exons with detectable levels of expression in the mRNA library. For all mapped reads, over 47.9% mapped to exons and 37.0% mapped to expressed targets, which is similar to previously published exon capture studies. EecSeq displayed relatively even coverage within exons (i.e. minor “edge effects”) and even coverage across exon GC content. We discovered 5,951 SNPs with a minimum average coverage of 80X, with 3,508 SNPs appearing in exonic regions. We show that EecSeq provides comparable, if not superior, specificity and capture efficiency compared to costly, traditional methods.
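The two headline metrics in this abstract (capture sensitivity across exons, and the share of mapped reads that land on exons) can be illustrated with a small sketch. The abstract does not state the exact definitions used, so the ones below are assumptions: sensitivity is taken as the fraction of exons overlapped by at least `min_reads` reads, and the on-target fraction as the share of reads overlapping any exon. The function names and the toy coordinates are hypothetical, not taken from the paper.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def capture_metrics(exons, reads, min_reads=1):
    """Return (sensitivity, on_target_fraction) for toy interval data.

    exons, reads: lists of (chrom, start, end) tuples, half-open coordinates.
    Assumed definitions (not from the paper):
      sensitivity        = exons hit by >= min_reads reads / all exons
      on_target_fraction = reads overlapping any exon / all reads
    """
    reads_per_exon = [0] * len(exons)
    on_target_reads = 0
    for r_chrom, r_start, r_end in reads:
        hit = False
        for i, (e_chrom, e_start, e_end) in enumerate(exons):
            if r_chrom == e_chrom and overlaps(r_start, r_end, e_start, e_end):
                reads_per_exon[i] += 1
                hit = True
        on_target_reads += hit
    sensitivity = sum(c >= min_reads for c in reads_per_exon) / len(exons)
    on_target_fraction = on_target_reads / len(reads)
    return sensitivity, on_target_fraction


# Toy data: four exons on two chromosomes and five mapped reads.
exons = [("chr1", 100, 200), ("chr1", 500, 600),
         ("chr2", 50, 150), ("chr2", 400, 450)]
reads = [("chr1", 150, 250), ("chr1", 190, 290), ("chr1", 300, 400),
         ("chr2", 100, 200), ("chr2", 600, 700)]

sensitivity, on_target_fraction = capture_metrics(exons, reads)
print(f"sensitivity: {sensitivity:.1%}, on-target reads: {on_target_fraction:.1%}")
```

The nested loop is quadratic and only suitable for toy inputs; real data would normally go through an interval index or a dedicated tool.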
biorxiv genomics 0-100-users 2017
Genome-wide polygenic score to identify a monogenic risk-equivalent for coronary disease, bioRxiv, 2017-11-21
Abstract: Identification of individuals at increased genetic risk for a complex disorder such as coronary disease can facilitate treatments or enhanced screening strategies. A rare monogenic mutation associated with increased cholesterol is present in ~1:250 carriers and confers an up to 4-fold increase in coronary risk when compared with non-carriers. Although individual common polymorphisms have modest predictive capacity, their cumulative impact can be aggregated into a polygenic score. Here, we develop a new, genome-wide polygenic score that aggregates information from 6.6 million common polymorphisms and show that this score can similarly identify individuals with a 4-fold increased risk for coronary disease. In >400,000 participants from UK Biobank, the score conforms to a normal distribution and those in the top 2.5% of the distribution are at 4-fold increased risk compared to the remaining 97.5%. Similar patterns are observed with genome-wide polygenic scores for two additional diseases – breast cancer and severe obesity.
One Sentence Summary: A genome-wide polygenic score identifies 2.5% of the population born with a 4-fold increased risk for coronary artery disease.
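The aggregation step described in this abstract can be sketched as follows. This is a minimal illustration, assuming the common formulation of a polygenic score as a weighted sum of risk-allele dosages (0, 1, or 2 copies per variant); the abstract does not describe how the weights were derived. All data below are simulated, and the function names, variant count, weights, and allele frequencies are hypothetical.

```python
import random


def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1, or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def top_fraction_threshold(scores, fraction=0.025):
    """Smallest score that still falls within the top `fraction` of scores."""
    ranked = sorted(scores, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return ranked[k - 1]


# Simulated cohort (assumption: independent variants, made-up effect sizes).
random.seed(0)
n_variants, n_people = 200, 2000
weights = [random.gauss(0, 0.05) for _ in range(n_variants)]
freqs = [random.uniform(0.05, 0.5) for _ in range(n_variants)]


def simulate_person():
    # Two independent draws per variant give a dosage of 0, 1, or 2.
    return [(random.random() < f) + (random.random() < f) for f in freqs]


scores = [polygenic_score(simulate_person(), weights) for _ in range(n_people)]
threshold = top_fraction_threshold(scores)
high_risk = [s for s in scores if s >= threshold]
print(f"{len(high_risk)} of {n_people} individuals fall in the top 2.5%")
```

Because the score sums many small independent contributions, its distribution across the simulated cohort is approximately normal, which is the shape the abstract reports for UK Biobank participants.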
biorxiv genomics 100-200-users 2017
Higher-order inter-chromosomal hubs shape 3-dimensional genome organization in the nucleus, bioRxiv, 2017-11-19
Abstract: Eukaryotic genomes are packaged into a 3-dimensional structure in the nucleus of each cell. There are currently two distinct views of genome organization that are derived from different technologies. The first view, derived from genome-wide proximity ligation methods (e.g. Hi-C), suggests that genome organization is largely organized around chromosomes. The second view, derived from in situ imaging, suggests a central role for nuclear bodies. Yet, because microscopy and proximity-ligation methods measure different aspects of genome organization, these two views remain poorly reconciled and our overall understanding of how genomic DNA is organized within the nucleus remains incomplete. Here, we develop Split-Pool Recognition of Interactions by Tag Extension (SPRITE), which moves away from proximity-ligation and enables genome-wide detection of higher-order DNA interactions within the nucleus. Using SPRITE, we recapitulate known genome structures identified by Hi-C and show that the contact frequencies measured by SPRITE strongly correlate with the 3-dimensional distances measured by microscopy. In addition to known structures, SPRITE identifies two major hubs of inter-chromosomal interactions that are spatially arranged around the nucleolus and nuclear speckles, respectively. We find that the majority of genomic regions exhibit preferential spatial association relative to one of these nuclear bodies, with regions that are highly transcribed by RNA Polymerase II organizing around nuclear speckles and transcriptionally inactive and centromere-proximal regions organizing around the nucleolus. Together, our results reconcile the two distinct pictures of nuclear structure and demonstrate that nuclear bodies act as inter-chromosomal hubs that shape the overall 3-dimensional packaging of genomic DNA in the nucleus.
biorxiv genomics 100-200-users 2017