Hi,
I have received some RNA-seq data and have been asked to perform differential expression (DE) analysis, but I am not sure how to proceed given the design of the experiment.
The authors have 18 tissue samples from 18 patients: 6 are healthy, 6 have a (non-cancer) disease, and another 6 are healthy but under a condition that is supposed to cause the disease. They ask for DE genes between the three conditions.
I understand how to use DE software if I had 1 patient with 6 replicates per condition. I also understand how to do it if I had 6 patients with paired normal and disease samples per patient (even without replicates). But I am not sure how to proceed when I have 18 different patients, not paired, no replicates, just three groups of patients. Can I just assume that each group of 6 patients is analogous to 6 replicates for each condition and use DESeq2 or edgeR? If not, what are the alternatives?
I would be grateful for any guidance.
PS: Short story about this dataset: Both the sequencing and the original DE analysis were performed by a hired professional company. The company reported three comparisons: conditions 1vs2, 1vs3, 2vs3. For each one, they reported hundreds of DE genes that they found by pooling the results from DESeq2, edgeR and DEGSeq. The authors tried to validate the top-ranked genes sent by the company and nothing could be validated. Then they came to me. I read bad comments about DEGSeq on this website, so I tried DESeq2 and edgeR only, assuming three groups of 6 patients and comparing conditions 1vs2, 1vs3, 2vs3, just as the company did. My result was only 1 DE gene in the 1vs2 comparison and nothing else (which seems to be less attractive to the authors than a long list of candidates). I also built a PCA plot to see whether the 6 patients in each condition actually cluster, but the result is a scattered plot in which all 18 patients sit in the same cloud without any visible structure. Now I wonder: Is this experimental design good for DE analysis at all? If so, what is the right way to proceed?
Under the assumption that the conditions are the same within each group (same disease with the same cause), you do have 6 biological replicates per group, which is far more important than technical replicates! This is a pretty good setup for differential expression analysis.
As said above, yup, those are replicates.
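To make that concrete, here is a minimal sketch in plain Python (not R, and not a call into DESeq2 or edgeR) of the layout this implies: a single condition factor with three levels and six independent patients per level, plus the three pairwise comparisons. The group labels, patient IDs and the choice of reference level are made up for illustration; the thread only calls the groups conditions 1, 2 and 3.

```python
from itertools import combinations

# Hypothetical labels and IDs (assumption): the thread only names conditions 1, 2, 3.
groups = ["healthy", "at_risk", "disease"]
samples = [(f"patient_{i + 1:02d}", groups[i // 6]) for i in range(18)]

# One-factor design (~ condition): an intercept column plus one indicator
# column per non-reference group. Each patient is one biological replicate.
reference = "healthy"
levels = [g for g in groups if g != reference]
design = {
    name: [1] + [int(cond == lvl) for lvl in levels]
    for name, cond in samples
}

# Number of biological replicates per group and the pairwise comparisons.
replicates = {g: sum(cond == g for _, cond in samples) for g in groups}
contrasts = list(combinations(groups, 2))

for name, cond in samples:
    print(name, cond, design[name])
print(replicates)
print(contrasts)
```

With unpaired patients there is no patient term in the design; the condition factor alone describes the experiment, and the tools estimate dispersion from the variation between patients within each group.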
So glad you added the PS, without which we would have an endless discussion in this thread. You should specify what conditions were used for generating your results (e.g. 1 DE gene). Rather than the design, it appears that either the experiment was not well executed or RNA-seq is not the right technique to discriminate between these samples.
That sounds unusual and cruel.
Maybe OP needs to rephrase? Maybe a genotype associated with the phenotype being studied was seen but the phenotype itself is not manifested?
I meant a group of patients with a lifestyle that has been associated with the disease, but who are still healthy. I didn't mean a treatment that may potentially harm them. Poor choice of words on my part.
Rereading this, could you specify if disease-relevant tissue was used? Also, if the company used an old version of DESeq2 etc., results can indeed be different, but this is quite extreme. Have you asked for their code?
Did they combine (union) or intersect DESeq2, edgeR and DEGSeq?
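The difference matters for the length of the list. A small sketch in Python with made-up gene sets (the real per-tool results are not shown in this thread, so every gene name below is hypothetical): taking the union keeps every gene that any tool calls, while the intersection keeps only genes that all tools agree on.

```python
# Hypothetical per-tool DE calls (assumption): placeholders, not real results.
calls = {
    "DESeq2": {"GENE_A", "GENE_B", "GENE_C"},
    "edgeR": {"GENE_A", "GENE_C", "GENE_D"},
    "DEGSeq": {"GENE_A", "GENE_E", "GENE_F", "GENE_G"},
}

# Union: a gene is reported if at least one tool calls it.
union = set.union(*calls.values())

# Intersection: a gene is reported only if every tool calls it.
intersection = set.intersection(*calls.values())

print(len(union), sorted(union))
print(len(intersection), sorted(intersection))
```

A union can never be shorter than the longest single list, so it also carries along the false positives of every tool that went into it.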
I can confirm it is disease-relevant tissue. Regarding software, their report just mentions that they used edgeR, DEGSeq, DESeq (not DESeq2), and AudicS (by "AudicS" they mean this paper: Audic S, Claverie J M. The significance of digital gene expression profiles. Genome research, 1997, 7(10): 986-995). No software versions were specified in their report, but the company delivered their results in 2016, so they must have used older versions of all the software. No specific parameters for each tool were reported either, besides common p-value (0.05) and q-value (0.1) thresholds. Given the number of DE genes they found, I assume they used the union of all results. I hadn't asked for their code before, but I am asking now. Thanks for all your comments.
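For reference, here is a minimal sketch of how a q-value threshold like the 0.1 above is typically applied, assuming the Benjamini-Hochberg adjustment (the default in current DESeq2 and edgeR; the company's report does not say which method it used). The p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes (assumption, not real data).
pvals = [0.01, 0.04, 0.03, 0.5]
qvals = benjamini_hochberg(pvals)
significant = [q <= 0.1 for q in qvals]
print(qvals)
print(significant)
```

Filtering on the raw p-value alone (0.05) without the adjusted value would let through many more genes when tens of thousands are tested, which is worth checking once the company's code is available.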