Question: Does it make sense to select a subset of the count matrix for DE analysis?
vw wrote:

Hi all,

I have a question. I am only interested in a subset of the genes in my raw count matrix. I ran DESeq2 on the subset and on the complete matrix separately. However, the DE results for the subset differ greatly from the results for the same genes in the full-matrix analysis. In the subset-only analysis, 90% of the genes are down-regulated, but in the full-matrix analysis only 50% of those same genes are down-regulated.

Is it correct to run the DE analysis on the subset rather than on the complete count matrix?

Edit: I also tried limma + voom, and the result is similar to DESeq2's. In the subset analysis, 90% of the genes appear down-regulated, but in the complete count matrix only 50% of the corresponding genes do.

rna-seq limma deseq2
ATpoint wrote:

DESeq2 assumes that most genes are not differentially expressed, both during normalization and during dispersion estimation. Take the complete matrix and perform a standard DE analysis. What you can probably do is filter out lowly expressed genes and features such as small RNAs, which have little relevance in standard RNA-seq, after running DESeq() but before the FDR correction, to reduce the multiple-testing burden. Still, if you are not an expert (I am not), it is better to leave everything at the defaults and simply see which of your genes come out as significant.
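
To make the "filter after testing, before FDR correction" idea concrete, here is a minimal Python sketch rather than DESeq2 itself. It assumes you already have raw p-values from a fit on the complete matrix. The gene names, the p-values, and the `keep` list are all made up for illustration, and `bh_adjust` is a hand-written Benjamini-Hochberg helper, not a DESeq2 function. The filter should be chosen without looking at the test results.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values from a fit on the complete matrix.
raw_p = {"geneA": 0.001, "geneB": 0.02, "geneC": 0.04,
         "lowExpr1": 0.5, "smallRNA1": 0.6, "smallRNA2": 0.9}
# Hypothetical filter, chosen without looking at the test results.
keep = ["geneA", "geneB", "geneC"]

# Adjust over all genes, and separately over the kept genes only.
padj_all = dict(zip(raw_p, bh_adjust(list(raw_p.values()))))
padj_kept = dict(zip(keep, bh_adjust([raw_p[g] for g in keep])))

for g in keep:
    print(g, round(padj_all[g], 4), round(padj_kept[g], 4))
```

The raw p-values are the same in both cases because the model was fitted once on everything. Only the number of tests entering the correction changes, so the adjusted p-values of the kept genes can only get smaller or stay the same.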


Thank you so much! I forgot this point. One more question: I also ran limma + voom on my data. It produced results similar to DESeq2's in both cases. Is that also because of this assumption?

written 4 months ago by vw

I guess so.

written 4 months ago by ATpoint
Bastien Hervé wrote:

You cannot do a DE analysis after removing part of the reads (those aligned to the genes you dropped), because RNA-seq is a relative quantification tool: the number of reads mapped to gene A is relative to the number of reads mapped to genes B, C, etc.

DESeq2 normalization uses your whole read set, minus the reads falling in outlier genes, to normalize your data. If you remove 99% of your genes before the DE analysis, your normalization is completely biased.

The correct way is to run the DE analysis with DESeq2 on the full matrix, and then filter for your genes of interest based on p-value, logFC, names, etc.
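
As a rough illustration of why the normalization breaks, here is a small Python sketch of the median-of-ratios idea behind DESeq2's size factors. This is a simplified re-implementation for illustration, not DESeq2's code, and the counts and gene names are invented: six background genes are unchanged, and three genes of interest are truly halved in the treated sample.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors for a {gene: [count per sample]} table."""
    n_samples = len(next(iter(counts.values())))
    # Log geometric mean per gene, using only genes with no zero counts.
    log_geo_mean = {
        gene: sum(math.log(c) for c in row) / n_samples
        for gene, row in counts.items()
        if all(c > 0 for c in row)
    }
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(counts[g][j]) - m for g, m in log_geo_mean.items()]
        factors.append(math.exp(median(log_ratios)))
    return factors

def log2_fold_change(row, factors):
    """log2(treated / control) after dividing each count by its size factor."""
    control, treated = (c / f for c, f in zip(row, factors))
    return math.log2(treated / control)

# Hypothetical counts: [control, treated].
full = {
    "bg1": [100, 100], "bg2": [200, 200], "bg3": [50, 50],
    "bg4": [80, 80], "bg5": [300, 300], "bg6": [120, 120],
    "goi1": [400, 200], "goi2": [60, 30], "goi3": [90, 45],
}
subset = {g: row for g, row in full.items() if g.startswith("goi")}

full_sf = size_factors(full)
subset_sf = size_factors(subset)
lfc_full = log2_fold_change(full["goi1"], full_sf)
lfc_subset = log2_fold_change(full["goi1"], subset_sf)

print("size factors (full):  ", full_sf)
print("size factors (subset):", subset_sf)
print("goi1 log2FC (full):  ", round(lfc_full, 3))
print("goi1 log2FC (subset):", round(lfc_subset, 3))
```

With these toy numbers, the full table gives size factors of 1 for both samples, so the halved gene keeps a log2 fold change of -1. With only the three genes of interest, the size factors absorb the real change and the fold change shrinks to about 0. In real data the direction of the distortion depends on what the subset looks like, but the point is the same: the subset no longer provides a stable reference.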


Thanks so much. I also ran limma + voom on the subset, and the result is similar. Is it the same kind of issue?

written 4 months ago by vw

The DESeq2 (RLE and median normalization using dispersion estimation) and voom (quantile and median normalization using linear regression) normalization methods are slightly different, but the DE genes from both methods are very close. It is not what you would call an issue; both methods have to take the whole gene set into account to deliver a correct normalization.

Moreover, as your 'issue' is reproducible across both tools, it does not come from one particular normalization method.

written 4 months ago by Bastien Hervé

After I used all genes rather than a subset, the results from DESeq2 and limma are very similar. Thanks for your help!

written 4 months ago by vw