When do we need to normalize for GC-content and/or length bias in RNA-Seq reads?
9 weeks ago
pilargmarch ▴ 10

Hi! This has been a conundrum for me these past months. There are some packages like cqn (conditional quantile normalization) and EDASeq that can be used to normalize for sample-specific gene GC content and/or length biases, which can alter functional enrichment analysis results.

My question is, when is it appropriate to use these normalization techniques? I have some GSEA results that change drastically after normalizing with cqn, going from 17 to 109 significant GO terms, but I'm not really sure if it's correct to do this.

gc-content normalization RNA-seq bias edaseq
9 weeks ago

To be honest, I don't think anyone knows what is right and what is wrong here. It is a bit of a wild west out there, with everyone taking their own swing at it.

I would plot the distribution of the p-values and generate heatmaps and PCA plots, before and after normalization, to try to understand whether the process improved the data or introduced unwanted artifacts.
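As a rough illustration of the p-value check, here is a minimal Python sketch that bins p-values and prints a text histogram for a "before" and an "after" run. The function names (`pvalue_histogram`, `show`) are made up for this example, and the p-values are simulated placeholders, not real results; in practice you would load the per-gene p-values from your own DE output. The usual expectation is a roughly flat histogram with a peak near zero, so a strongly distorted shape after normalization would be a reason for suspicion.

```python
import random


def pvalue_histogram(pvalues, n_bins=10):
    """Return the fraction of p-values in each equal-width bin on [0, 1]."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    counts = [0] * n_bins
    for p in pvalues:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"p-value out of range: {p}")
        # p == 1.0 would index past the end, so clamp it into the last bin.
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return [c / len(pvalues) for c in counts]


def show(label, fractions, width=50):
    """Print a crude text histogram, one row per bin."""
    print(label)
    n_bins = len(fractions)
    for i, frac in enumerate(fractions):
        low, high = i / n_bins, (i + 1) / n_bins
        print(f"  {low:.1f}-{high:.1f} {'#' * round(frac * width)} {frac:.3f}")


if __name__ == "__main__":
    # SIMULATED placeholder data (an assumption for illustration only):
    # "before" is pure uniform noise, "after" adds a cluster of small p-values.
    rng = random.Random(0)
    before = [rng.random() for _ in range(2000)]
    after = [rng.random() for _ in range(1700)]
    after += [rng.random() * 0.05 for _ in range(300)]

    show("before normalization", pvalue_histogram(before))
    show("after normalization", pvalue_histogram(after))
```

The same binning works for any list of p-values, so it can be pointed at the raw and the cqn-normalized runs without changes.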

Try to explain the changes in terms of which genes change and how the error distribution changes, not in terms of the GO terms' enrichment. (I will admit that I am not sure whether these corrections are applied before or after the DE detection runs.)
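One way to look at the gene-level changes is to compare the sets of significant genes from the two runs directly. The sketch below is only an illustration under assumptions: the helper names (`compare_calls`, `median_gc`), the gene IDs, and the GC values are all hypothetical. It reports which genes are shared, lost, or gained, and the median GC content of each group; if the gained genes sit at clearly different GC values than the shared ones, that hints that the correction itself is driving the new calls.

```python
from statistics import median


def compare_calls(before, after):
    """Compare two collections of significant gene IDs."""
    before, after = set(before), set(after)
    union = before | after
    return {
        "shared": before & after,
        "lost": before - after,    # significant only before normalization
        "gained": after - before,  # significant only after normalization
        # Jaccard index: overlap relative to the union (1.0 if both are empty).
        "jaccard": len(before & after) / len(union) if union else 1.0,
    }


def median_gc(genes, gc_content):
    """Median GC fraction of the genes that have a GC value, else None."""
    values = [gc_content[g] for g in genes if g in gc_content]
    return median(values) if values else None


if __name__ == "__main__":
    # HYPOTHETICAL gene IDs and GC fractions, for illustration only.
    sig_before = ["geneA", "geneB", "geneC"]
    sig_after = ["geneB", "geneC", "geneD", "geneE"]
    gc = {"geneA": 0.41, "geneB": 0.48, "geneC": 0.52,
          "geneD": 0.63, "geneE": 0.66}

    result = compare_calls(sig_before, sig_after)
    for group in ("shared", "lost", "gained"):
        genes = sorted(result[group])
        print(group, genes, "median GC:", median_gc(genes, gc))
    print("Jaccard index:", result["jaccard"])
```

This says nothing about which run is correct; it only shows where the two runs disagree and whether the disagreement tracks GC content.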