Does it make sense to select a subset of the count matrix for DE analysis?
4.8 years ago by vw

Hi all,

I have a question: I am only interested in a subset of the genes in my raw count matrix. I ran DESeq2 twice, once on the subset and once on the complete matrix. However, the DE results from the subset are very different from the results for the same genes in the full-matrix analysis. In the subset-only run, 90% of the genes are down-regulated, but in the full-matrix run only 50% of the subset genes are down-regulated.

Is it correct to do DE analysis on the subset rather than on the complete count matrix?

Edit: I also tried limma + voom. The result is similar to DESeq2: in the subset-only run, 90% of the genes appear down-regulated, but the same genes in the complete-matrix run are only 50% down-regulated.

Tags: DESeq2, Limma, RNA-Seq

4.8 years ago by ATpoint

DESeq2 assumes that most genes are not DE, both during normalization and during dispersion estimation, so the set of genes you give it matters. Take the complete matrix and perform a standard DE analysis. What you can probably do is filter out lowly expressed genes, and things such as small RNAs that have little relevance in standard RNA-seq, after running DESeq() but before the FDR correction, to reduce the multiple-testing burden. Still, if you are not an expert (I am not), it is better to leave everything at the defaults and simply see which of your genes come out as significant.
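As a rough illustration of the "filter after DESeq(), before FDR correction" idea, here is a minimal Python sketch of Benjamini-Hochberg adjustment applied only to genes that pass an expression filter. The gene names, mean counts, p-values and the cutoff of 10 are all made up; in R you would more likely take the `pvalue` column of `results()` and call `p.adjust(..., method = "BH")` on the rows you keep.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical output of a DE run on the FULL matrix:
# gene -> (mean normalized count, raw p-value)
full_results = {
    "geneA": (850.0, 0.001),
    "geneB": (120.0, 0.02),
    "geneC": (60.0, 0.04),
    "geneD": (2.0, 0.03),   # lowly expressed
    "geneE": (0.5, 0.75),   # lowly expressed
}

min_mean = 10.0  # hypothetical expression cutoff
# Keep only genes above the cutoff, then adjust their raw p-values together.
kept = [g for g, (mean, _) in full_results.items() if mean >= min_mean]
padj = dict(zip(kept, bh_adjust([full_results[g][1] for g in kept])))
print(padj)
```

Adjusting over all five genes instead would give geneC a padj of about 0.05 rather than 0.04, which is the multiple-testing burden the filter is meant to reduce.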


Thank you so much! I forgot this point. One more question: I also ran limma + voom on my data, and it shows the same discrepancy between the subset and the complete matrix. Is that also because of this assumption?


I guess so.

4.8 years ago

You cannot do a DE analysis after removing part of the reads (i.e. the genes they align to), because RNA-seq is a relative quantification method: the number of reads mapped to gene A is only meaningful relative to the number of reads on genes B, C, etc.
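A toy example in Python (made-up counts) of what "relative" means here. Both samples get the same 1000 reads. Assume genes A to C have 100 units of RNA each in both samples, while gene X rises from 700 to 1700 units; sampling 1000 reads then gives exactly the counts below, so A to C look two-fold down on raw counts even though they did not change.

```python
import math

# Hypothetical counts for two samples sequenced to the same depth (1000 reads).
# Assumed truth: geneA-C unchanged (100 units each), geneX 700 -> 1700 units,
# so geneX takes a larger share of the fixed read budget in sample 2.
counts = {
    "geneA": (100, 50),
    "geneB": (100, 50),
    "geneC": (100, 50),
    "geneX": (700, 850),
}

def naive_log2fc(counts):
    """log2(sample2 / sample1) on raw counts, with no composition correction."""
    return {g: math.log2(c2 / c1) for g, (c1, c2) in counts.items()}

print(naive_log2fc(counts))
```

The raw counts also understate gene X: about 0.28 on the log2 scale instead of the assumed log2(1700/700), roughly 1.28.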

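DESeq2 normalization uses your whole gene set to compute the size factors (genes with a zero count in any sample are skipped, and taking the median makes the estimate robust to outlier genes). If you remove 99% of your genes before the DE analysis, your normalization is completely biased.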
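To see how much the gene set matters, here is a minimal Python sketch of median-of-ratios size factors (the idea DESeq2 uses: the exponentiated median of log count minus log geometric mean, skipping genes with zeros), computed once on a full toy matrix and once on a subset. The counts are hypothetical, with one sample per condition for simplicity; this covers only the normalization step, not DESeq2 itself.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict gene -> list of counts (one per sample).
    Genes with a zero in any sample are skipped (their geometric mean is 0).
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) == 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]

def normalized_log2fc(counts, sf):
    """log2(sample 2 / sample 1) after dividing each sample by its size factor."""
    return {g: math.log2((row[1] / sf[1]) / (row[0] / sf[0])) for g, row in counts.items()}

# Hypothetical data: six unchanged background genes and four genes of interest
# that are all higher in sample 2.
background = {f"bg{i}": [c, c] for i, c in enumerate([100, 200, 50, 300, 80, 150])}
subset = {"S1": [100, 400], "S2": [100, 300], "S3": [100, 200], "S4": [200, 300]}
full = {**background, **subset}

fc_full = normalized_log2fc(full, size_factors(full))
fc_sub = normalized_log2fc(subset, size_factors(subset))
for g in subset:
    print(g, round(fc_full[g], 2), round(fc_sub[g], 2))
```

With the unchanged background present, both size factors are 1 and all four genes come out up-regulated; normalized only against each other, S3 and S4 flip to "down". A real DESeq2 run with replicates is more involved, so this does not reproduce the 90% figure from the question, but it shows the kind of shift that subsetting can cause.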

The correct way is to run your DE analysis with DESeq2 on the full matrix, then filter your genes of interest by p-value, log fold change, names, etc.
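For the "run on everything, then filter" step, a small Python sketch on a results table exported from the full analysis. The column names follow DESeq2's `results()` output (`log2FoldChange`, `padj`); the rows, the gene list and the cutoffs are hypothetical. Rows where `padj` is missing (NA in DESeq2) are skipped.

```python
# Hypothetical rows from a DESeq2 results table of the FULL analysis.
results = [
    {"gene": "geneA", "log2FoldChange": 2.1, "padj": 0.001},
    {"gene": "geneB", "log2FoldChange": -0.4, "padj": 0.20},
    {"gene": "geneC", "log2FoldChange": -1.6, "padj": 0.01},
    {"gene": "geneD", "log2FoldChange": 1.2, "padj": None},    # NA padj
    {"gene": "geneZ", "log2FoldChange": 3.0, "padj": 0.0001},  # not of interest
]

genes_of_interest = {"geneA", "geneB", "geneC", "geneD"}  # hypothetical list
padj_cutoff, lfc_cutoff = 0.05, 1.0                       # example thresholds

def select(results, genes, padj_cutoff, lfc_cutoff):
    """Return genes of interest that pass both the padj and |log2FC| cutoffs."""
    hits = []
    for row in results:
        if row["gene"] not in genes or row["padj"] is None:
            continue
        if row["padj"] < padj_cutoff and abs(row["log2FoldChange"]) >= lfc_cutoff:
            hits.append(row["gene"])
    return hits

hits = select(results, genes_of_interest, padj_cutoff, lfc_cutoff)
print(hits)
```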


Thanks so much. I also ran limma + voom on the subset and the result is similar. Is it the same kind of issue?


DESeq2 (median-of-ratios, a.k.a. RLE, size factors, followed by a negative binomial model with dispersion estimation) and limma-voom (library sizes usually scaled with TMM factors, optionally quantile normalization, then precision-weighted linear regression) normalize data in slightly different ways, but the DE genes from both methods are usually very close. It is not really an issue with either tool: both need the whole gene set to deliver a correct normalization.

Moreover, since your 'issue' is reproducible with both tools, it cannot come from one particular normalization method; it comes from running the analysis on a subset.
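On the voom side, the subset also changes the library sizes. voom works on log2 counts per million, log2((r + 0.5) / (R + 1) × 10^6), where R defaults to the column totals of the count matrix you pass in (or the effective library sizes of a normalized DGEList). A Python sketch with made-up numbers for one gene that has identical counts in two samples:

```python
import math

def voom_log_cpm(count, lib_size):
    """log2 counts per million as voom defines it: log2((r + 0.5) / (R + 1) * 1e6)."""
    return math.log2((count + 0.5) / (lib_size + 1) * 1e6)

def log_fc(counts, libs):
    """Difference in log-CPM between sample 2 and sample 1."""
    return voom_log_cpm(counts[1], libs[1]) - voom_log_cpm(counts[0], libs[0])

# Hypothetical numbers: the gene has 500 reads in both samples.
gene_counts = (500, 500)
full_libs = (20_000_000, 20_000_000)  # column totals over ALL genes
subset_libs = (400_000, 600_000)      # column totals over the kept subset only

lfc_full = log_fc(gene_counts, full_libs)
lfc_subset = log_fc(gene_counts, subset_libs)
print(f"full matrix: {lfc_full:.3f}, subset only: {lfc_subset:.3f}")
```

The gene is unchanged, but because the other subset genes happen to have more reads in sample 2, it appears down-regulated (about -0.58 on the log2 scale) once only the subset defines the library sizes.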


After I used all genes rather than a subset, the results from DESeq2 and limma are very similar. Thanks for your help!
