Question

Weird edgeR normalization results for one tissue

11 days ago

imetrev ▴ 10

Hello,

I have a question regarding edgeR normalization.

I have a three-tissue RNA-seq dataset, with males and females sequenced for each tissue.

After normalization (TMM for read depth and "composition", plus FPKM for gene length), I get similar median expression for both sexes in two tissues but not in the third (see below).

[Figure: gene expression distributions, one row per tissue]

Because the third tissue has a lot of genes that are uniquely expressed in one sex, I tried to perform the normalization using only the genes that are expressed in both sexes of all tissues as control genes (a sort of 'housekeeping genes' set; see the vertical line above for the threshold between expressed and non-expressed genes).

However, even this way, the difference persists (see below).

[Figure: gene expression distributions after normalizing on the control genes]

Can someone help me understand what is going on?

Thank you, Vincent

edgeR Normalization RNAseq • 1.2k views

ADD COMMENT • link updated 5 days ago by ATpoint 89k • written 11 days ago by imetrev ▴ 10

New Nature paper: Males and females are physiologically not the same. Joke aside, which tissue is number 3? By the way, a better diagnostic plot would be an MA-plot, and without FPKM, since FPKM is not compatible with edgeR (or any serious) testing framework.
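For anyone unsure what goes into an MA-plot, here is a rough sketch in Python/numpy (not edgeR code) of how the two axes can be computed from raw counts for a pair of samples. The function name `ma_values`, the toy counts, and the library-size-scaled pseudo-count are all assumptions made for illustration; edgeR's own plotting functions handle this internally and differ in detail.

```python
import numpy as np

def ma_values(counts_a, counts_b, prior=0.5):
    """Return (M, A) per gene for two samples of raw counts.

    M is the log2 ratio of library-size-scaled abundances (a over b),
    A is the average of the two log2 abundances.
    `prior` is a small pseudo-count (an assumption of this sketch) that is
    scaled by relative library size so that genes with zero counts in both
    samples get M = 0 instead of log2(0).
    """
    a = np.asarray(counts_a, dtype=float)
    b = np.asarray(counts_b, dtype=float)
    lib_a, lib_b = a.sum(), b.sum()
    mean_lib = 0.5 * (lib_a + lib_b)
    # Abundance per gene, with the pseudo-count proportional to library size.
    log_a = np.log2((a + prior * lib_a / mean_lib) / lib_a)
    log_b = np.log2((b + prior * lib_b / mean_lib) / lib_b)
    m = log_a - log_b
    avg = 0.5 * (log_a + log_b)
    return m, avg

# Toy counts (hypothetical): the second sample is sequenced twice as deeply,
# but no gene changes in relative abundance, so M should sit at zero.
female = np.array([100, 200, 300, 400, 0])
male = np.array([200, 400, 600, 800, 0])
m_same, a_same = ma_values(female, male)

# Two genes, each expressed in only one sex: M moves far from zero.
m_specific, _ = ma_values(np.array([10, 0]), np.array([0, 10]))
```

Plotting M against A for each tissue (M on the y-axis) shows whether the bulk of genes is centred on zero, which is the thing to check here.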

ADD REPLY • link 11 days ago by ATpoint 89k

How do the raw counts look? How many genes are 'detected' in each sample, perhaps using something like a 10-count threshold? Since tissue 3 has more sex-specific expression patterns, perhaps the overall expression is too different to be normalized the way you expect. The assumption behind the normalization is that the majority of genes do not change expression. Can you confirm that this assumption holds for this tissue?
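To make both checks concrete, here is a small Python/numpy sketch. It is a deliberately simplified, unweighted stand-in for TMM, not edgeR's implementation (the real method also trims on absolute expression and weights genes by precision). The function names, the toy counts, and the trimming fraction of 0.3 are assumptions of the sketch.

```python
import numpy as np

def detected_genes(counts, min_count=10):
    """Number of genes with at least `min_count` reads in each sample.

    `counts` is a genes x samples matrix of raw counts.
    """
    return (np.asarray(counts) >= min_count).sum(axis=0)

def trimmed_mean_factor(sample, reference, trim=0.3):
    """Simplified TMM-like scaling factor of `sample` against `reference`.

    Takes per-gene log2 ratios of library-size-scaled counts, drops the
    lowest and highest `trim` fraction, and averages what is left.
    """
    s = np.asarray(sample, dtype=float)
    r = np.asarray(reference, dtype=float)
    keep = (s > 0) & (r > 0)  # log ratios need non-zero counts in both
    m = np.log2(s[keep] / s.sum()) - np.log2(r[keep] / r.sum())
    lo, hi = np.quantile(m, [trim, 1 - trim])
    core = m[(m >= lo) & (m <= hi)]
    return 2 ** core.mean()

# Detection check on a tiny hypothetical matrix (3 genes x 2 samples).
n_detected = detected_genes(np.array([[0, 12], [10, 9], [50, 3]]))

reference = np.full(10, 100)

# Case 1: 2 of 10 genes are strongly up, the other 8 are unchanged.
minority = np.array([100] * 8 + [1000] * 2)
factor_minority = trimmed_mean_factor(minority, reference)

# Case 2: 7 of 10 genes are strongly up, so the "unchanged" genes are a minority.
majority = np.array([100] * 3 + [1000] * 7)
factor_majority = trimmed_mean_factor(majority, reference)
```

In case 1 the factor comes out at 1000/2800, which exactly undoes the library-size inflation caused by the two up-regulated genes, so the eight unchanged genes line up again. In case 2 the trimmed mean lands on the changed genes instead, so the normalization centres on them and the truly unchanged genes appear shifted. That is the situation worth ruling out for tissue 3.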

ADD REPLY • link 9 days ago by rfran010 ★ 1.7k

Thanks for the replies.

Here are MA-plots for the three tissues (TMM normalization, no FPKM, only autosomal genes included):

[Figure: MA-plots for the three tissues]

Third tissue is gonads, so yes, I do expect (large) differences. Yet, given the literature, I must admit that I was surprised to find this difference in median log2FC between sexes for autosomal genes.

There are between 8000 and 10000 genes with more than 10 counts for each tissue.

"The assumption is that the majority of genes do not change expression. Can you confirm that assumption is true for this tissue?" From the MA-plot, I would say maybe not.

ADD REPLY • link 5 days ago by imetrev ▴ 10

Third tissue is gonads,

Arguably the most sex-specific tissue that exists, so of course this looks different.

There are between 8000 and 10000 genes with more than 10 counts for each tissue,

This sounds normal. You can prefilter using filterByExpr(). As a crude ballpark estimate, about 15k genes pass this filter in most cases I've seen.

To me, everything looks perfectly fine: normalization looks decent, and the plots at the bottom indicate large changes, which is expected given the biology.

ADD REPLY • link 5 days ago by ATpoint 89k