A simple question. Would you use different q-value thresholds for different peak detections in a dataset?
The ChIP-seq experiments all share the same input, and I am performing the peak detection with MACS2. I would rather use the same q-value for all experiments, but since some are noisier than others, would it be wrong to use a less stringent q-value for those? Would you say that would be "artificially" improving the detection?
I don't know if it's the ideal way to do it, but yes, I've done it. I would use the same cutoff for 2 samples I want to compare directly with each other (like protein X in treated vs untreated), because hopefully the samples you want to compare worked equally well; otherwise it's not a fair comparison. But if it's 2 separate experiments, I will sometimes use different p or q values with MACS2. What I usually do is run MACS2 on the dataset using several p and q values (e.g. q0.05, q0.01, p0.0001, p0.0005, etc.), convert all the output lists of peaks to .bed files, make .wig files from the ChIP-seq alignments, and load them all into the UCSC Genome Browser so I can see the ChIP enrichment alongside all the peaks that MACS2 called. Then I try to decide which MACS2 settings most accurately detected the peaks.
I happened to have just made a relevant image for a presentation: here's MACS2 peak calling on a ChIP-seq dataset (IP and input) using 8 different p/q values. Which is the correct value to use? Some are obviously not stringent enough and others might be too stringent. I think it's pretty subjective.
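The multi-threshold run described above could be scripted roughly like this. It is only a sketch: the file names (chip.bam, input.bam), the output directory and the genome size shortcut "hs" (human) are assumptions, not details from this dataset. The script only builds the `macs2 callpeak` command lines (using the standard -t, -c, -g, -n, --outdir, -q and -p options) so they can be inspected before anything is run.

```python
import shlex

def macs2_commands(treatment, control, genome="hs", outdir="macs2_out",
                   q_values=(0.05, 0.01), p_values=(1e-4, 5e-4)):
    """Return one `macs2 callpeak` command line per cutoff.

    File names, genome size and output directory are placeholders
    (assumptions); the cutoffs mirror the examples in the post.
    """
    cutoffs = [("-q", q) for q in q_values] + [("-p", p) for p in p_values]
    commands = []
    for flag, value in cutoffs:
        # Name each run after its cutoff, e.g. "q0.05" or "p0.0001".
        name = f"{flag.lstrip('-')}{value:g}"
        args = ["macs2", "callpeak",
                "-t", treatment, "-c", control,
                "-g", genome, "-n", name, "--outdir", outdir,
                flag, f"{value:g}"]
        commands.append(shlex.join(args))
    return commands

for cmd in macs2_commands("chip.bam", "input.bam"):
    print(cmd)
```

Each run then writes its own set of peak files into the output directory, which can be converted to .bed and loaded into the browser side by side.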
I indeed use the same strategy of trying several q-values to make my decision. I am comparing two different experiments (two antibodies) and want to assess their overlap.
If I make a Venn diagram based on the detected peaks, I obtain around 20% overlap for one mark and 90% for the other. The 90% one is very noisy. If I plot a heatmap of the two marks over the union of peaks, then "visually" I would expect a 90% overlap for both. That is why I thought of lowering the threshold for the 20% one.
Having said that, the justification for the thresholds is visual only, which I do not like that much. However, some could argue that we are investigating a signal that can be assessed visually, so this approach is valid, which is a good point. Others could argue that if I lower the threshold for both, I should keep the same proportions, and that not doing so amounts to "artificial fitting".
I feel that both arguments are valid and that there is no answer to this problem. I was hoping that somebody could bring up a point that would help me choose.
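For what it's worth, the two Venn percentages discussed above are not symmetric by construction: each one is the number of overlapping peaks divided by the size of that mark's own peak list, so a mark with many more called peaks will show a low fraction even when every peak of the other mark is covered. A minimal sketch with made-up coordinates (the peak lists below are toy data, not from these experiments):

```python
def fraction_overlapping(peaks_a, peaks_b):
    """Fraction of peaks in A that overlap at least one peak in B.

    Peaks are (chrom, start, end) tuples with half-open coordinates,
    as in BED files.
    """
    by_chrom = {}
    for chrom, start, end in peaks_b:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = 0
    for chrom, start, end in peaks_a:
        candidates = by_chrom.get(chrom, [])
        # Two half-open intervals overlap if each starts before the other ends.
        if any(start < b_end and b_start < end for b_start, b_end in candidates):
            hits += 1
    return hits / len(peaks_a) if peaks_a else 0.0

# Toy data: mark_a has five called peaks, mark_b only one.
mark_a = [("chr1", 100, 200), ("chr1", 300, 400), ("chr1", 500, 600),
          ("chr1", 700, 800), ("chr2", 100, 200)]
mark_b = [("chr1", 150, 250)]

print(fraction_overlapping(mark_a, mark_b))  # 0.2
print(fraction_overlapping(mark_b, mark_a))  # 1.0
```

Reporting both fractions together with the number of peaks called at each threshold at least makes it explicit how much of the difference comes from the threshold choice.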
Hi Mike:
A very good example of p/q thresholds. My question is maybe a bit strange: can I do no filtering at all (so that it returns all the peaks shown in red in your plot) and then do the filtering myself manually? Looking at your plot, I think the filtering cutoff is just a selection of peaks based on their intensity, right?
I want to smooth across peaks in one region (for example a gene body). I think it's improper to select only the peaks with q-value 0.05 and then smooth between them; instead, I think I should consider all the peaks in that region (significant or not). My question post is this.
Best Tian
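One possible way to do the manual filtering asked about above is to call peaks once with a lenient cutoff and then filter the resulting narrowPeak file yourself. In the narrowPeak format, column 8 holds -log10(p-value) and column 9 holds -log10(q-value), so a q-value cutoff becomes a simple numeric comparison on column 9. The sketch below assumes a tab-separated narrowPeak file; the two example lines are made up. Keep in mind that peak boundaries called at a lenient cutoff can differ from those called at a stringent one, so this may not exactly reproduce a stricter MACS2 run.

```python
import math

def filter_narrowpeak(lines, max_q=0.05):
    """Keep narrowPeak lines whose q-value is at most max_q.

    Column 9 (index 8) of narrowPeak stores -log10(q-value), so a peak
    passes when that value is >= -log10(max_q).
    """
    threshold = -math.log10(max_q)
    kept = []
    for line in lines:
        # Skip blank lines and optional track/comment headers.
        if not line.strip() or line.startswith(("#", "track")):
            continue
        fields = line.rstrip("\n").split("\t")
        if float(fields[8]) >= threshold:
            kept.append(line.rstrip("\n"))
    return kept

# Made-up example lines: peak_1 has q = 0.1, peak_2 has q = 1e-6.
example = [
    "chr1\t100\t300\tpeak_1\t50\t.\t4.2\t3.0\t1.0\t80",
    "chr1\t900\t1200\tpeak_2\t400\t.\t12.5\t9.0\t6.0\t150",
]

strict = filter_narrowpeak(example, max_q=0.05)
print([row.split("\t")[3] for row in strict])  # ['peak_2']
```

The unfiltered list stays available for anything that should use all peaks in a region, such as the smoothing across a gene body.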