Hi everyone,
I'm working with datasets where I know one of them should have more signal globally, especially in genes, and I'm struggling to compare them because most tools use the number of mapped reads as the normalization factor.
For example, to illustrate what I mean, suppose you block the transcription machinery in one condition; you should then get fewer mapped reads per transcript in that condition than in a control. (To be clear, my dataset is not RNA-Seq but another type of sequencing, and I also want to look at enrichment in intergenic regions, not only in genic ones.) In my real dataset, I expect (and actually see) more mapped reads in the untreated sample than in the treated one (this is actually my negative control sample), and I want to see enrichment over my treated sample. As far as I understand, however, peak callers like MACS2 or count-based approaches such as DESeq2 would reduce that signal.
I would like to do peak calling and also generate signal profiles. However, I think most peak callers scale down the dataset with the higher number of mapped reads to match the other one. For signal profiles I would use bamCompare or bamCoverage, but their normalization would again reduce the signal in the sample with more mapped reads.
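A minimal sketch of the concern above, using made-up per-bin counts (not real data) and a hypothetical `cpm` helper. Here the mapped-read total is approximated by the sum of the bins. When each sample is scaled by its own mapped-read total, a sample with four times more signal everywhere ends up with exactly the same normalized profile as the weaker one, so the global difference disappears.

```python
def cpm(counts, library_size):
    """Scale raw counts to counts per million reads of the given library size."""
    return [c * 1e6 / library_size for c in counts]


# Hypothetical per-bin read counts (illustration only).
untreated = [120, 80, 60, 40]  # sample expected to carry more global signal
treated = [30, 20, 15, 10]     # same shape, 4x less signal overall

# Each sample is normalized by its own mapped-read total (sum of its bins here).
untreated_cpm = cpm(untreated, sum(untreated))
treated_cpm = cpm(treated, sum(treated))

# Both profiles come out the same: the 4x global difference is gone.
print(untreated_cpm)
print(treated_cpm)
```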
In short, would it be more appropriate in this case to normalize the datasets by the total number of sequenced reads per condition?
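The arithmetic behind the two options can be sketched as follows. The library sizes and bin counts are hypothetical and `per_million` is an illustrative helper. The sketch assumes both conditions were sequenced to the same depth from comparable input, so the total number of sequenced reads is a fair common denominator. Dividing by mapped reads makes the two samples look identical, while dividing by sequenced reads keeps the 3x difference in this bin.

```python
def per_million(count, denominator):
    """Scale a raw count to a per-million value for the given denominator."""
    return count * 1e6 / denominator


# Hypothetical numbers for one genomic bin (illustration only, not real data).
samples = {
    "untreated": {"sequenced": 20_000_000, "mapped": 15_000_000, "bin_count": 1500},
    "treated": {"sequenced": 20_000_000, "mapped": 5_000_000, "bin_count": 500},
}

results = {}
for name, s in samples.items():
    results[name] = {
        # Denominator = mapped reads (the usual library-size normalization).
        "by_mapped": per_million(s["bin_count"], s["mapped"]),
        # Denominator = total sequenced reads (the alternative asked about).
        "by_sequenced": per_million(s["bin_count"], s["sequenced"]),
    }

for name, r in results.items():
    print(f"{name}: per million mapped = {r['by_mapped']:.1f}, "
          f"per million sequenced = {r['by_sequenced']:.1f}")
```

With these numbers, both samples give 100.0 per million mapped reads, but 75.0 versus 25.0 per million sequenced reads. Which denominator is appropriate depends on whether the difference in mapping rate is itself the biological signal.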
Hi! This is not RNA-Seq. That example was only to illustrate the situation.
You used the RNAseq tag, so we assumed it was RNA-seq. Be precise in your question and you will get precise answers.