TMM normalization and sex effect
7 weeks ago
francesca3 ▴ 50

Hello everyone. I have a question about TMM normalization. I'm comparing male versus female samples. Can TMM normalization be affected by the larger number of reads on chromosome X in the female group? Thanks, Francesca

edgeR diffbind rnaseq tmm • 273 views

Probably, unless the difference is extreme enough that the affected genes are trimmed away by TMM. The real question is whether the effect is significant. My guess is not really, but it is only a guess. An easy way to check would be to tabulate the counts across the X and Y chromosomes.
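To illustrate the tabulation suggested above, here is a minimal Python sketch that computes, for each sample, the fraction of counted reads falling on each chromosome. The function name, the gene names, and the counts are all hypothetical toy values, not data from this thread; with real data you would load your own count matrix and gene annotation instead.

```python
from collections import defaultdict


def fraction_by_chromosome(counts, gene_chrom):
    """Return, per sample, the fraction of counted reads on each chromosome.

    counts:     {sample: {gene: read_count}}
    gene_chrom: {gene: chromosome name}
    """
    fractions = {}
    for sample, gene_counts in counts.items():
        per_chrom = defaultdict(int)
        for gene, n in gene_counts.items():
            per_chrom[gene_chrom[gene]] += n
        total = sum(per_chrom.values())
        fractions[sample] = {c: n / total for c, n in per_chrom.items()}
    return fractions


# Hypothetical toy data: two autosomal genes and one X-linked gene.
gene_chrom = {"geneA": "chr1", "geneB": "chr2", "geneX": "chrX"}
counts = {
    "female_1": {"geneA": 500, "geneB": 300, "geneX": 200},
    "male_1": {"geneA": 540, "geneB": 360, "geneX": 100},
}

fractions = fraction_by_chromosome(counts, gene_chrom)
for sample, frac in fractions.items():
    print(sample, "chrX fraction:", round(frac["chrX"], 3))
```

If the chrX fraction differs only slightly between the groups, the effect on any library-size-based scaling is likely to be small as well.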


We exclude chromosome Y from the analysis. Looking at the raw (non-normalized) read counts on chromosome X, we can distinguish males from females. My question is whether the normalization factor calculated by TMM can be influenced by the different numbers of reads on chromosome X in males and females.

7 weeks ago
Rory Stark ★ 1.2k

It sounds like the contrast is based on sex, such that all the samples in one group are one sex and all the samples in the other group are the other sex. In that case, you would expect all of the X-chromosome signal to be different, so comparing X-chromosomal signals does not seem very interesting. As some of the sequencing reads will be dedicated to the X-chromosome, this will reduce the number of reads available for the non-X signals, and this needs to be accounted for. I think the easiest thing would be to exclude the X-chromosome reads and let the library size adjustment take care of that, but TMM should actually be able to deal with the entire set of reads. It might be interesting to try it both ways and see how different the differential analysis is, both on the X-chromosome and elsewhere.
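As a rough illustration of the "try it both ways" idea, the Python sketch below computes a simplified TMM-style scaling factor for one sample against a reference, once with X-linked genes included and once with them removed. This is an assumption-laden toy, not edgeR's `calcNormFactors`: it trims only on the log-ratios (M values) and takes an unweighted mean, whereas edgeR also trims on average abundance and applies precision weights. The function name, gene names, and counts are hypothetical.

```python
import math


def simple_tmm_factor(sample, reference, trim=0.3):
    """Simplified TMM-style factor for `sample` relative to `reference`.

    Both arguments are {gene: count} dicts. Genes with a zero count in
    either sample are skipped. Only M values are trimmed, and the mean
    is unweighted (a simplification of the real method).
    """
    n_s = sum(sample.values())      # library size of the sample
    n_r = sum(reference.values())   # library size of the reference
    genes = [g for g in sample if sample[g] > 0 and reference.get(g, 0) > 0]
    # M value: log2 ratio of library-size-scaled counts for each gene.
    m_values = sorted(
        math.log2((sample[g] / n_s) / (reference[g] / n_r)) for g in genes
    )
    k = int(len(m_values) * trim)   # number of genes trimmed from each tail
    kept = m_values[k:len(m_values) - k]
    return 2 ** (sum(kept) / len(kept))


# Hypothetical toy data: 10 autosomal genes with identical counts in both
# samples, plus 2 X-linked genes with doubled counts in the female sample.
autosomal = {f"auto{i}": 100 * i for i in range(1, 11)}
male = {**autosomal, "x1": 100, "x2": 100}
female = {**autosomal, "x1": 200, "x2": 200}

# Way 1: keep all genes and let trimming discard the X-linked outliers.
factor_all = simple_tmm_factor(female, male)

# Way 2: drop X-linked genes first, so library sizes already match.
female_no_x = {g: c for g, c in female.items() if not g.startswith("x")}
male_no_x = {g: c for g, c in male.items() if not g.startswith("x")}
factor_no_x = simple_tmm_factor(female_no_x, male_no_x)

print("factor with X genes:   ", round(factor_all, 4))
print("factor without X genes:", round(factor_no_x, 4))
```

In this toy case the X-linked genes sit in the upper tail of the M values and are trimmed, so the factor with all genes only compensates for the extra X reads in the female library size. After scaling, autosomal genes line up between the samples either way, which matches the expectation that both approaches should give similar results away from the X-chromosome.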

I also notice that you tagged both DiffBind and RNA-seq in this question, and the answer may differ depending on whether you are trying to normalize mRNA or ChIP/ATAC data. The recommended method for normalizing data in DiffBind is to use counts in background bins, while for RNA-seq you would normalize using only the reads that are counted as mapping uniquely to transcripts.