Question: Is it correct to pre-select a set of genes before performing differential expression analysis with DESeq2?
2.0 years ago by
silviajserrano40 wrote:

Is it correct to pre-select a set of genes before performing differential expression analysis with DESeq2? For example, my first comparison would be tumor vs. non-tumor tissue, which gives me a set of over 10,000 DE genes. I would then compare, for example, patients who recurred vs. patients who did not recur, using only the genes that were differentially expressed in the previous comparison (tumor vs. non-tumor).

rna-seq • 678 views
2.0 years ago by
ATpoint26k
Germany
ATpoint26k wrote:

There was a similar question recently, asking whether removing a large set of genes in order to save computational time is valid. I am not a statistician, but based purely on my (naive) understanding of DESeq2, I assume that removing a large number of genes, or in your case subsetting to certain genes, might violate the assumptions of DESeq2. In your case, the question is whether the median ratio of the chosen genes will still capture the true size relationships between the datasets (e.g. sequencing depth), as this is the basis for the normalization process. In other words, do the chosen genes allow the different samples to be scaled appropriately relative to each other?

Why don't you choose the patients of interest based on the first analysis, assign them the factor levels "recur" / "non-recur", rerun DESeq2 on the full set of genes, and then check whether your target genes come out as DE?
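To make the normalization concern concrete, here is a minimal sketch of a median-of-ratios size factor calculation in plain Python. It is a simplified re-implementation for illustration only, not DESeq2's actual code. The gene names, the counts, and the helper `size_factors` are all made up. The toy data is deliberately extreme: sample B is sequenced twice as deep as sample A, and the subset contains only genes that are truly 4x higher in B.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (simplified sketch).

    counts: dict mapping gene name -> list of raw counts, one per sample.
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c <= 0 for c in row):
            continue
        # log of the per-gene geometric mean across samples
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # per-sample median of the log ratios, back on the original scale
    return [math.exp(median(r)) for r in log_ratios]


# Hypothetical counts for samples [A, B]; B has twice the sequencing depth.
all_genes = {
    "g1": [100, 200],  # unchanged genes: B = 2 * A (depth only)
    "g2": [50, 100],
    "g3": [10, 20],
    "g4": [400, 800],
    "g5": [30, 60],
    "g6": [20, 160],   # truly 4x up in B: B = 2 * 4 * A
    "g7": [40, 320],
    "g8": [60, 480],
}
# Pre-selected subset: only the genes that differ between A and B.
subset = {g: all_genes[g] for g in ("g6", "g7", "g8")}

sf_full = size_factors(all_genes)
sf_subset = size_factors(subset)

# Estimated depth ratio B/A under each gene set.
ratio_full = sf_full[1] / sf_full[0]
ratio_subset = sf_subset[1] / sf_subset[0]
print(f"full set:  B/A = {ratio_full:.2f}")
print(f"subset:    B/A = {ratio_subset:.2f}")
```

With the full gene set the estimated depth ratio reflects the majority of unchanged genes. With the subset, the real expression difference is absorbed into the size factors, so the same genes would look unchanged after normalization. Real data will rarely be this clear-cut, but the sketch shows why the choice of genes feeding the normalization matters.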

2.0 years ago by
Sean Davis25k
National Institutes of Health, Bethesda, MD
Sean Davis25k wrote:

Pre-selecting genes for differential expression based on differential expression is generally going to be challenging to justify if there is a nested design (samples overlap between test set #1 and test set #2). If these are two different datasets, then perhaps this can be more easily justified.

From a biological point of view, it is quite plausible that genes associated with recurrence are not differentially expressed between tumor and normal, so including only those "first" differentially expressed genes in a second comparison may well lead to false negatives.
