Hello, I am relatively new to RNA-Seq data analysis, so I apologize in advance if this is a novice question. I have been reading several forums on the subject and I think I have the general idea, but I would appreciate advice from the experts.
I have RNA-Seq data for multiple samples of a specific cancer. From gene expression analysis (on Illumina platforms) that the lab ran before I arrived, we learned that this cancer can be divided into 3 subgroups (1, 2, 3). Using DESeq, I would like to find the differentially expressed genes between the RNA-Seq samples belonging to subgroups 2 and 3.
I am confused about what I should treat as my 'biological replicates', or whether I should treat the analysis as having no replicates at all.
For each subgroup, I have only one sample per patient, so the setup looks like this:
Subgroup 2: Sample A, Sample B, Sample C, Sample D
Subgroup 3: Sample X, Sample Y, Sample Z, Sample F, Sample W
In this case, should I consider Samples A, B, C, D to be biological replicates of Subgroup 2, and Samples X, Y, Z, F, W to be biological replicates of Subgroup 3? Each sample in a subgroup comes from a different patient, and there is no matched control sample for any patient, so there is no pairing between my samples.
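Treating each patient's sample as a biological replicate of its subgroup amounts to a simple sample-to-condition assignment. A minimal base R sketch of that assignment, using the sample names listed above (the variable names `samples` and `subgroup` are my own, for illustration only):

```r
# One sample per patient; each sample is labelled with the subgroup it belongs to,
# so the samples within a subgroup act as biological replicates of that subgroup.
samples  <- c("A", "B", "C", "D", "X", "Y", "Z", "F", "W")
subgroup <- factor(c(rep("2", 4), rep("3", 5)), levels = c("2", "3"))
names(subgroup) <- samples

# Number of replicates per subgroup (4 for subgroup 2, 5 for subgroup 3)
table(subgroup)
```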
If I go with this scenario, do you have any advice on the DESeq parameters? Right now I am just running the defaults as they appear in the vignette.
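For reference, the default "with replicates" workflow from the DESeq vignette looks roughly like the sketch below. It assumes a hypothetical `countTable`, a genes-by-samples matrix of raw integer counts whose columns follow the order of `samples`; here it is simulated with `rnbinom` purely as a placeholder, not real data. The DESeq steps only run if the package is installed.

```r
samples  <- c("A", "B", "C", "D", "X", "Y", "Z", "F", "W")
subgroup <- factor(c(rep("2", 4), rep("3", 5)), levels = c("2", "3"))

# Placeholder counts (simulated, for illustration only): 1000 genes x 9 samples
set.seed(1)
countTable <- matrix(as.integer(rnbinom(1000 * 9, mu = 100, size = 5)),
                     nrow = 1000,
                     dimnames = list(paste0("gene", 1:1000), samples))

if (requireNamespace("DESeq", quietly = TRUE)) {
  suppressPackageStartupMessages(library(DESeq))
  cds <- newCountDataSet(countTable, subgroup)
  cds <- estimateSizeFactors(cds)
  # Package defaults: method = "pooled", sharingMode = "maximum", fitType = "parametric"
  cds <- estimateDispersions(cds)
  # Test subgroup 2 against subgroup 3; result has pval and padj columns
  res <- nbinomTest(cds, "2", "3")
}
```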
The alternative is to assume that I don't have any replicates and compare the two groups directly. In that case, the DESeq call for estimating dispersions would look like this:
cds = estimateDispersions( cds, method="blind", sharingMode="fit-only", fitType="local" )
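In context, the "no replicates" route differs from the default workflow only in the `estimateDispersions` call shown above. A sketch, again on simulated placeholder counts and with hypothetical variable names (`cdsBlind`, `resBlind`):

```r
samples  <- c("A", "B", "C", "D", "X", "Y", "Z", "F", "W")
subgroup <- factor(c(rep("2", 4), rep("3", 5)), levels = c("2", "3"))

# Placeholder counts (simulated, for illustration only)
set.seed(1)
countTable <- matrix(as.integer(rnbinom(1000 * 9, mu = 100, size = 5)),
                     nrow = 1000,
                     dimnames = list(paste0("gene", 1:1000), samples))

if (requireNamespace("DESeq", quietly = TRUE)) {
  suppressPackageStartupMessages(library(DESeq))
  cdsBlind <- newCountDataSet(countTable, subgroup)
  cdsBlind <- estimateSizeFactors(cdsBlind)
  # method = "blind": estimate dispersion from all samples, ignoring the subgroup labels
  # sharingMode = "fit-only": use only the fitted dispersion-mean curve, not per-gene estimates
  # fitType = "local": fit that curve by local regression instead of the parametric form
  cdsBlind <- estimateDispersions(cdsBlind, method = "blind",
                                  sharingMode = "fit-only", fitType = "local")
  resBlind <- nbinomTest(cdsBlind, "2", "3")
}
```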
I have tried both approaches. When I treat the data as having no replicates, I end up with far more differentially expressed genes at p < 0.1 (1380, compared with 92 when the samples are treated as replicates).
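`nbinomTest` reports both a raw p-value (`pval`) and a multiple-testing-adjusted one (`padj`), and the count at a 0.1 cutoff depends on which column is used. A small base R helper (hypothetical name `count_sig`) that counts genes below a threshold in either column, skipping genes with `NA` values:

```r
# Count genes below a significance threshold in an nbinomTest-style result table.
# `column` selects "padj" (adjusted) or "pval" (raw); NA values are not counted.
count_sig <- function(res, alpha = 0.1, column = "padj") {
  stopifnot(column %in% names(res))
  sum(res[[column]] < alpha, na.rm = TRUE)
}
```

With both result tables from the two approaches available, `count_sig(res)` and `count_sig(resBlind)` give comparable counts, provided the same column is used for both.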
Any advice would be appreciated! Thank you in advance! Deena