5.2 years ago
bioinfo456 ▴ 150
My purpose of using DESeq2 is to obtain a set of differentially expressed genes for 3 different subtypes of a certain cancer and to build a classifier using the resulting genes from DESeq2 to classify between normal and the 3 cancer subtype samples.
Is there an alternative way to confirm that the DESeq2 result is appropriate for this purpose?
Differential expression and sample classification are two quite different problems. If you want to detect differentially expressed genes, then DESeq2 is OK (dude, you've been asking the same question for a fortnight; just do the experiment).
To go from differential expression all the way to a classifier, please take a look at my answer here: What is the best way to combine machine learning algorithms for feature selection such as Variable importance in Random Forest with differential expression analysis?
Although, I fear that it may be overly complex for you, i.e., if you are already struggling with the differential expression part.
Uday, in addition, please follow up with the other comments by Wouter and Sean. If you engage more, you will learn more and people will likely help you more.
I don’t feel the need to apply yet another statistical approach like stepwise regression, since DESeq2 already involves the concept of a p-value. Correct me if I’m wrong. Testing one gene at a time manually is out of the question. Do you reckon there are any other ways of doing it?
With all due respect sir, I’m actually done doing what I felt is right. At the moment, I’m clarifying things. I’ll be glad if you could help. Otherwise, kindly ignore my posts. Thanks.
Okay, that is great to hear, Uday! Please stay in close touch
Renal cell cancer is mainly divided into 3 classes (i.e., KICH, KIRP and KIRC). I used DESeq2 to identify DEGs for each class. Suppose x is the resulting set of genes for KICH, and similarly y and z for KIRP and KIRC. I eliminated all intersections between the classes (i.e., s = (x ∪ y ∪ z) - ((x ∩ y ∩ z) ∪ (x ∩ y) ∪ (y ∩ z) ∪ (z ∩ x))), which left approx. 5k genes. I have 1000 samples, which I’ve divided into 700 for training and 300 for testing. I’ve extracted the s genes from the training set and trained a classifier model. I’ve obtained certain significant results.
I’m trying to put together a paper. There is an existing paper that uses DESeq2 on the same data set. Now my question is: how do I go about writing this paper? Their methodology is completely different. They used as few as 250 DEGs and obtained a result that is comparatively lower than what I got. Please share your thoughts. Thanks.
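The set arithmetic described above keeps only the genes that are differentially expressed in exactly one subtype. A minimal Python sketch of that step, using made-up gene identifiers (the names `exclusive_genes`, `kich`, `kirp` and `kirc` are hypothetical and not from the original post):

```python
def exclusive_genes(x, y, z):
    """Return genes that appear in exactly one of the three DEG sets."""
    # The triple intersection x & y & z is already contained in the
    # pairwise intersections, so it does not need to be listed separately.
    shared = (x & y) | (y & z) | (z & x)
    return (x | y | z) - shared


# Hypothetical DEG sets, for illustration only
kich = {"G1", "G2", "G3", "G7"}
kirp = {"G2", "G4", "G7"}
kirc = {"G3", "G5", "G7"}

s = exclusive_genes(kich, kirp, kirc)
print(sorted(s))  # ['G1', 'G4', 'G5']
```

Note that this discards every gene shared by two or more subtypes, including genes that separate all tumours from normal tissue, so whether it suits a normal-vs-subtype classifier is a design choice rather than a given.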
Thanks for sharing your process. I think publication level queries are best kept off a Q&A site, seems a bit presumptuous to me.
Could you perhaps select a more descriptive title for your threads?
What is the classifier you want to build? And what samples will you be classifying, relative to the DESeq2 analysis--different samples or the same samples?
I have samples from 3 different cancer subtypes and their corresponding normals. Each sample consists of 20530 genes. Feeding all of these genes to a classifier is pointless because it doesn’t achieve reasonable classification accuracy. That is why I’m extracting only those genes that differ between normal and cancer samples using DESeq2 and then building the classifier on them. For the last part of your question, I’m not really sure whether I should be using a different set of samples for testing or the same ones. I’ll be glad if you could help.
Well, if you are going to test your classifier using genes which you selected to be differentially expressed between groups A and B, your classifier is probably going to be good at differentiating between groups A and B, because you biased it severely. That's cheating :) You need an independent set.
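One way to picture the independent-set point: split the samples first, then pick genes using the training samples only, so the held-out samples never influence which genes are chosen. The sketch below is a toy illustration under stated assumptions. The ranking by difference in group means is a simple stand-in and is not DESeq2, and `split_samples`, `select_genes` and the toy matrix are all hypothetical.

```python
import random
from statistics import mean


def split_samples(n_samples, test_fraction=0.3, seed=0):
    """Shuffle sample indices and return (train_idx, test_idx)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n_samples * test_fraction))
    return idx[n_test:], idx[:n_test]


def select_genes(expr, labels, sample_idx, k):
    """Rank genes by |mean(tumour) - mean(normal)| over sample_idx only.

    A stand-in for a real differential expression test such as DESeq2.
    """
    n_genes = len(expr[0])
    scores = []
    for g in range(n_genes):
        tumour = [expr[i][g] for i in sample_idx if labels[i] == 1]
        normal = [expr[i][g] for i in sample_idx if labels[i] == 0]
        scores.append(abs(mean(tumour) - mean(normal)))
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]


# Toy data: 10 samples x 4 genes; only gene 0 differs between the groups.
labels = [i % 2 for i in range(10)]  # 0 = normal, 1 = tumour
expr = [[10.0 * labels[i] + 0.1 * i, 1.0, 2.0, 3.0] for i in range(10)]

train_idx, test_idx = split_samples(len(expr))
selected = select_genes(expr, labels, train_idx, k=1)  # training samples only
print(selected)
```

The classifier would then be fitted on `train_idx` restricted to `selected` and scored on `test_idx`. An external cohort would be a stronger test than a held-out split of the same data set.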
And again: could you perhaps select a more descriptive title for your threads?
Haha alright sir. Will test using an independent set and get back to you. Suggest me a descriptive title yourself xD.
There is no need to be gender specific. Not everyone in science is male.
We have thousands of questions concerning differential expression analysis. What about something like "Using DESeq2 results for building a classifier", which is A LOT more specific about what you want answers to.
I’m so sorry if I have offended you. Title changed :).
For the record, I'm male. I'm not offended, but please avoid such biased assumptions in the future.
Yes, Wouter is male!