Weird edgeR normalization results for one tissue
11 days ago
imetrev

Hello,

I have a question regarding edgeR normalization.

I have a three-tissue RNA-seq dataset, with both males and females sequenced for each tissue.

After normalization (TMM for read depth and composition, plus FPKM for gene length), I got similar median expression in both sexes for two tissues, but not for the third one (see below):

[Figure: gene expression distributions by sex, one row per tissue]
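For reference, a minimal sketch of how a TMM factor and gene length combine into FPKM, assuming the normalization factor has already been estimated. The effective library size is the raw library size times the TMM factor; FPKM then divides each count by the effective library size (in millions) and by gene length (in kilobases). The function name `fpkm` and its arguments are illustrative only; as far as I know, edgeR's `rpkm()` does the equivalent for a DGEList using the normalized library sizes.

```python
def fpkm(counts, lib_size, norm_factor, gene_lengths_bp):
    """FPKM for one sample, using a TMM-adjusted ("effective") library size.

    counts: read counts per gene for this sample
    lib_size: total mapped reads for this sample
    norm_factor: TMM normalization factor for this sample (assumed precomputed)
    gene_lengths_bp: gene lengths in base pairs, same order as counts
    """
    eff_lib = lib_size * norm_factor  # read depth corrected for composition
    return [
        y / (eff_lib / 1e6) / (length / 1e3)  # per million reads, per kilobase
        for y, length in zip(counts, gene_lengths_bp)
    ]
```

A smaller TMM factor shrinks the effective library size, so every gene in that sample gets a proportionally larger FPKM.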

Because the third tissue has many genes that are expressed in only one sex, I tried performing the normalization using only genes expressed in both sexes in all tissues as control genes (a sort of "housekeeping" set; the vertical line above marks the threshold between expressed and non-expressed genes).

However, even this way, the difference persists (see below):

[Figure: expression distributions after normalization on the control genes]
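The control-gene approach can be sketched as follows, assuming a simplified TMM (trim 30% of log-ratios and 5% of average log expression, then an inverse-variance weighted mean of the remaining log-ratios) and a reference sample chosen by the user. The helpers `shared_expressed` and `tmm_on_controls` are hypothetical names, not edgeR functions. Note that the full library sizes are used for the proportions, so the resulting factor can be applied to the full library size afterwards.

```python
import math


def shared_expressed(counts_by_sample, min_count=10):
    """Indices of genes with at least `min_count` reads in every sample."""
    n_genes = len(counts_by_sample[0])
    return [g for g in range(n_genes)
            if all(s[g] >= min_count for s in counts_by_sample)]


def tmm_on_controls(sample, ref, control_idx, m_trim=0.3, a_trim=0.05):
    """Simplified TMM factor of `sample` vs `ref`, estimated on control genes only."""
    n_s, n_r = sum(sample), sum(ref)  # full library sizes, not control-only sums
    rows = []
    for g in control_idx:
        y_s, y_r = sample[g], ref[g]
        if y_s == 0 or y_r == 0:
            continue  # log-ratio undefined for zero counts
        p_s, p_r = y_s / n_s, y_r / n_r
        m = math.log2(p_s / p_r)             # M: log2 fold change
        a = 0.5 * math.log2(p_s * p_r)       # A: average log2 proportion
        var = (n_s - y_s) / (n_s * y_s) + (n_r - y_r) / (n_r * y_r)
        rows.append((m, a, 1.0 / var))       # weight = inverse approx. variance
    n = len(rows)
    k_m, k_a = int(n * m_trim), int(n * a_trim)
    # Drop the most extreme genes by M and by A at both ends.
    by_m = sorted(range(n), key=lambda i: rows[i][0])
    by_a = sorted(range(n), key=lambda i: rows[i][1])
    keep = set(by_m[k_m:n - k_m]) & set(by_a[k_a:n - k_a])
    if not keep:
        raise ValueError("too few control genes left after trimming")
    num = sum(rows[i][0] * rows[i][2] for i in keep)
    den = sum(rows[i][2] for i in keep)
    return 2 ** (num / den)
```

For example, if the male sample has five extra male-specific genes that inflate its library, the factor computed on the shared genes pulls its effective library size back so the shared genes line up. edgeR additionally rescales the factors so they multiply to one; that step is omitted here. If most shared genes genuinely shift between sexes, however, no choice of factor can center all of them.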

Can someone help me understand what is going on?

Thank you, Vincent

Tags: edgeR, Normalization, RNAseq

New Nature paper: Males and females are physiologically not the same. Joke aside, which tissue is number 3? By the way, an MA-plot would be a better diagnostic plot, and I would avoid FPKM, since it is not compatible with edgeR (or any other serious testing framework).
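To make the MA-plot suggestion concrete, here is a minimal sketch of the values such a plot shows for two samples (or two group averages): M is the per-gene log2 ratio of normalized expression and A is the average log2 CPM. It assumes effective library sizes are already known and uses a plain pseudo-count to keep zero-count genes finite, which is simpler than what edgeR's own `cpm()` or `plotMD()` do. The function names are hypothetical.

```python
import math
import statistics


def ma_values(counts_a, counts_b, eff_lib_a, eff_lib_b, prior=0.5):
    """Per-gene M (log2 fold change) and A (average log2 CPM) for two samples."""
    ms, avgs = [], []
    for ya, yb in zip(counts_a, counts_b):
        la = math.log2((ya + prior) / eff_lib_a * 1e6)  # log2 CPM, sample a
        lb = math.log2((yb + prior) / eff_lib_b * 1e6)  # log2 CPM, sample b
        ms.append(la - lb)
        avgs.append((la + lb) / 2)
    return ms, avgs


def median_m(counts_a, counts_b, eff_lib_a, eff_lib_b, prior=0.5):
    """Median log2 fold change; near zero when the bulk of genes is centered."""
    ms, _ = ma_values(counts_a, counts_b, eff_lib_a, eff_lib_b, prior)
    return statistics.median(ms)
```

Plotting M against A, and marking the median M, makes a global shift between sexes visible without the gene-length adjustment that FPKM adds.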


How do the raw counts look? How many genes are "detected" in each sample, perhaps using a threshold of around 10 counts? Since tissue 3 has more sex-specific expression, perhaps the overall expression profiles are too different to be normalized the way you expect. The normalization assumes that the majority of genes do not change expression. Can you confirm that this assumption holds for this tissue?
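Both questions can be checked quickly. The sketch below counts detected genes per sample at a 10-count threshold and, as a crude proxy for the "most genes do not change" assumption, reports the share of genes whose absolute log2 fold change is below a cutoff. The cutoff of 1 and both function names are arbitrary illustrations, not a formal test.

```python
def detected_per_sample(counts_by_sample, min_count=10):
    """Number of genes with at least `min_count` reads in each sample."""
    return [sum(1 for y in sample if y >= min_count) for sample in counts_by_sample]


def fraction_stable(log2fc, max_abs_lfc=1.0):
    """Crude check: share of genes whose |log2 fold change| is below a cutoff."""
    return sum(1 for m in log2fc if abs(m) < max_abs_lfc) / len(log2fc)
```

If `fraction_stable` is well below one half for a tissue, the majority-unchanged assumption behind TMM is doubtful there.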


Thanks for the replies.

Here are MA-plots for the three tissues (TMM normalization, no FPKM, only autosomal genes included):

[Figure: MA-plots for the three tissues]

The third tissue is the gonads, so yes, I do expect (large) differences. Yet, given the literature, I must admit that I was surprised to find this shift in the median log2FC between sexes for autosomal genes.

There are between 8,000 and 10,000 genes with more than 10 counts in each tissue.

"The assumption is that the majority of genes do not change expression. Can you confirm that assumption is true for this tissue?" From the MA-plots, I would say maybe not.


"Third tissue is gonads."

Arguably the most sex-specific tissue that exists; of course this looks different.

"There are between 8,000 and 10,000 genes with more than 10 counts in each tissue."

This sounds normal. You can prefilter using filterByExpr(). In most cases I've seen, this function keeps about 15k genes (a crude ballpark estimate).
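As a rough illustration of what that prefilter does, here is a simplified approximation of the rule as I understand filterByExpr() applies it with its defaults (min.count = 10, min.total.count = 15, large.n = 10, min.prop = 0.7). A gene is kept when its CPM reaches min.count divided by the median library size (in millions) in at least as many samples as the smallest group, and its total count reaches min.total.count. In edgeR the group size is derived from the design or group factor; here `min_group_size` is a hypothetical parameter passed in directly, and normalization factors and edgeR's numerical tolerance are ignored.

```python
import statistics


def filter_by_expr_sketch(counts_by_sample, min_group_size, min_count=10,
                          min_total_count=15, large_n=10, min_prop=0.7):
    """Simplified approximation of edgeR's filterByExpr(); returns keep flags."""
    lib_sizes = [sum(s) for s in counts_by_sample]
    # CPM cutoff equivalent to `min_count` reads at the median library size.
    cpm_cutoff = min_count / statistics.median(lib_sizes) * 1e6
    # For large groups, only a proportion of the extra samples must pass.
    n_needed = min_group_size
    if n_needed > large_n:
        n_needed = large_n + (n_needed - large_n) * min_prop
    keep = []
    for g in range(len(counts_by_sample[0])):
        n_above = sum(1 for s, lib in zip(counts_by_sample, lib_sizes)
                      if s[g] / lib * 1e6 >= cpm_cutoff)
        total = sum(s[g] for s in counts_by_sample)
        keep.append(n_above >= n_needed and total >= min_total_count)
    return keep
```

Because the cutoff is expressed in CPM, the same rule adapts to libraries of different depth, which is one reason the number of retained genes varies between datasets.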

To me, everything looks fine: the normalization looks decent, and the plots at the bottom indicate large changes, which is expected given the biology.
