I am reanalysing a ChIP-seq dataset I downloaded from GEO. Using the nf-core ChIP-seq pipeline, I have performed QC, alligned reads and called peaks using MACS. I am now at the point where I have annotated peak files and bigwig files (which can be visualised in IGV) for each of my samples. Now, I'm a bit unsure how to proceed from this point. The specific question I would like to answer is; do my targets bind within the region of tRNA genes and how does this change with experimental treatment? I'm not interested in genome-wide binding of my targets; only within tRNA genes.
How would I go about answering this question? I have a tRNA genes bed file which I can visualise alongside my sample peaks and I can see that there are peaks within the tRNA genes but I'm confused as to how I quantify this binding/display this in terms of figures/plot.
Any help would be much appreciated!
Can you elaborate on where you get stuck? Remind me but tRNA, is it annotated on the regular chromosomes or on some extra contig? Anyway, I would do a regular differential analysis, based on a count matrix from this experiment, and then subset to the regions you are interested in after the analysis. Plots like Volcanos or MA-plots come to mind. Does this make sense?
Hi! Thank you for your response! You are absolutely right, tRNA genes are annotated throughout the genome the same as normal genes (albeit they are very repetitive stretches of DNA). I should probably have explained just how new I am to this; I'm in my first year of my PhD and this is my first foray into bioinformatics so its not specifically the fact that I'm looking at tRNA that is confusing me, but rather I would struggle looking at any area of the genome! Just to make sure I understand, you would perform a differential binding analysis (maybe with a tool like diffbind) and then using the tRNA bed file I have from USCS, I would be able to subsest to the tRNA genome at some point? And thank you for suggesting some possible plots! Again, thank you for response; this has been very helpful already!
Yes indeed, I probably would use such a package (especially when you're new to the party because there are some pitfalls if you code-up everything from scratch), and then after the whole analysis, once you get your file with the regions and its stats, subset to the tRNA. tRNA is not really my field but can you confidently map reads to them, so when you load your BAM files or BigWigs into the genome browser and inspect these loci, do you see peaks? I am asking because repetitive regions are hard to map with short Illumina reads, so it might be that they multimap and by this get discarded.
Given that tRNA genes are quite repetitive, a conventional analysis may not be suitable here because multi-mapping reads will be removed within nf-core/chipseq by default.
I done something similar for rRNA/transposon genes for the paper below but it required more of a custom analysis: https://www.life-science-alliance.org/content/4/7/e202000961/tab-article-info
You could use csaw to count windows across the genome and then test for differential binding only on windows locating to tRNA genes (see the manual, section 7.2.2 and 9.8.3).
(This would be a bit similar to e.g. using featureCounts to count your reads only at tRNA genes and then doing differential analyses, but allowing for spatial resolution)
Don't you think it makes sense to run testing on all windows, and then later subset? I could imagine that there are not many windows left if filtering for the tRNA loci before the testing. Might cause issues with normalization to use very few windows.
Yes, you're right, I expressed myself imprecisely, the csaw workflow tests first all the windows (so you normalize prior to this step so you can use all data for normalizing checking biases, etc.) and then you can group those tested windows of interest at specific regions, so that it recomputes the statistics to return a region-level FDR.
I've sometimes found this approach useful to screen e.g. sets of regions (promoters of genes, enhancer locations) with some statistical confidence when there are very small changes at the whole-genome level using all windows.