4.5 years ago
Jeje119
Hi everyone. I'm new to bioinformatics. I have an experiment to find differentially expressed genes (DEGs) between a bacterial wild type and two mutants (mutant 1 and mutant 2) at two time points (6 h and 10 h). I compare log2FC (RPKM, wild type/mutant 1 or 2), and I have no replicates. The problem is that I find too many DEGs. The threshold is log2FC > 1 or log2FC < -1 (i.e. |log2FC| > 1).
6h DEGs + 10h DEGs - common DEGs (6h and 10h) = all DEGs (I do this separately for wild type vs. mutant 1 and wild type vs. mutant 2.)
Is there something wrong with my approach? I do this in R without any dedicated programs.
I would appreciate your advice. Thank you.
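The selection procedure described in the question can be sketched as follows. This is a minimal illustration in Python, with hypothetical gene names and log2FC values, and a hypothetical helper `select_degs`. Note that the threshold has to be read as |log2FC| > 1 (log2FC > 1 or log2FC < -1), and that "6h DEGs + 10h DEGs - common DEGs" is simply the size of the union of the two DEG sets.

```python
def select_degs(log2fc, threshold=1.0):
    """Return genes whose |log2FC| exceeds the threshold (log2FC > 1 or log2FC < -1)."""
    return {gene for gene, fc in log2fc.items() if abs(fc) > threshold}

# Hypothetical log2FC (wild type / mutant 1) values, for illustration only
fc_6h = {"geneA": 2.3, "geneB": -1.5, "geneC": 0.4, "geneD": 1.2}
fc_10h = {"geneA": 1.8, "geneB": 0.2, "geneC": -2.1, "geneE": 3.0}

degs_6h = select_degs(fc_6h)
degs_10h = select_degs(fc_10h)
all_degs = degs_6h | degs_10h   # union of both time points
common = degs_6h & degs_10h     # DEGs called at both time points

# The counting identity from the question: |6h| + |10h| - |common| = |union|
assert len(degs_6h) + len(degs_10h) - len(common) == len(all_degs)
print(sorted(all_degs))
```

Because this is a union, any gene that crosses the threshold at either time point is kept, so noise at either time point adds to the list. Requiring a gene to pass at both time points (the intersection) would be the stricter choice.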
You have no replicates, so you cannot compute meaningful statistics for each strain and time point. Second, filtering purely by fold change is not recommended: because of the mean-variance relationship, high fold changes often come from lowly expressed genes (please use the search function and Google to learn what this is). In a nutshell, the highest fold changes are the most likely to be false positives and are not reliable. You need replicates for a meaningful analysis. Also, RPKM is a poor choice (again, use the search function; this has been discussed many times before). You can use rlog from DESeq2 to normalize your data and then re-calculate the fold changes. This is still not reliable, but it will probably save you from the spurious enrichment of lowly expressed genes. Please search the web for suggestions on analyzing unreplicated data; there are dozens of threads on this. Having no replicates in RNA-seq is poor experimental design, and no computational method can extract reliable results from it.
Edit: As commented below, you could treat the time points as replicates, and might even use the strains as replicates, to capture the general effects of WT/mutant or time.
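A quick simulation can make the mean-variance point concrete. This is a sketch in Python (chosen only for illustration; the thread itself works in R), assuming pure Poisson counting noise and hypothetical mean counts of 3 and 1000 reads. Both "conditions" are drawn from the same distribution, so every gene passing |log2FC| > 1 is a false positive. The larger pseudocount in the last call is a crude damping device swapped in for illustration; it is not the rlog transformation from DESeq2.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n_genes = 10_000

def fraction_called(mean_count, pseudocount=1.0):
    """Fraction of null genes (no true difference) passing |log2FC| > 1.

    Both conditions are Poisson draws with the SAME mean, so every
    gene that passes the threshold is a false positive.
    """
    a = rng.poisson(mean_count, size=n_genes)
    b = rng.poisson(mean_count, size=n_genes)
    log2fc = np.log2((a + pseudocount) / (b + pseudocount))
    return float(np.mean(np.abs(log2fc) > 1))

frac_low = fraction_called(3)                      # lowly expressed genes
frac_high = fraction_called(1000)                  # highly expressed genes
frac_low_damped = fraction_called(3, pseudocount=10)

print(f"low counts:                 {frac_low:.3f}")
print(f"high counts:                {frac_high:.3f}")
print(f"low counts, pseudocount 10: {frac_low_damped:.3f}")
```

With low counts, a difference of a couple of reads is already a two-fold change, while at ~1000 reads the same relative change is far outside the counting noise. Real RNA-seq counts are overdispersed relative to Poisson, so this sketch understates the noise rather than overstating it.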
These are all very good points, but I would not be so definitive about the lack of replicates leading to "no meaningful statistics". There is actually enough data here to build a statistical model of the form expression ~ strain + time, without the interaction term (strain:time): six samples (three strains × two time points) against four coefficients (intercept, two strain effects, one time effect) leave two residual degrees of freedom. In practice, though, this kind of interaction is often of interest in such experiments. So yes, more replicates would be better, but it is still possible to do something statistically meaningful with this data.
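To make the degrees-of-freedom argument concrete, here is a per-gene least-squares sketch in Python with numpy, on hypothetical log2 expression values for a single gene; `design_matrix` and `fit_additive` are hypothetical helpers. This is not what DESeq2, edgeR or limma do (those model counts and share variance information across genes). It only shows that the additive model leaves two residual degrees of freedom, while adding the strain:time interaction leaves none.

```python
import numpy as np

# Six unreplicated samples: three strains x two time points
strains = ["WT", "WT", "mut1", "mut1", "mut2", "mut2"]
times = ["6h", "10h", "6h", "10h", "6h", "10h"]

def design_matrix(strains, times, interaction=False):
    """Treatment-coded design; the intercept is WT at 6h."""
    cols = [
        np.ones(len(strains)),
        np.array([s == "mut1" for s in strains], dtype=float),
        np.array([s == "mut2" for s in strains], dtype=float),
        np.array([t == "10h" for t in times], dtype=float),
    ]
    if interaction:
        # strain:time terms, one per mutant
        cols.append(cols[1] * cols[3])
        cols.append(cols[2] * cols[3])
    return np.column_stack(cols)

def fit_additive(y, X):
    """Ordinary least squares on log-scale expression for one gene."""
    coef, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    df_resid = X.shape[0] - rank
    resid = y - X @ coef
    # Residual variance is only estimable when degrees of freedom remain
    sigma2 = float(resid @ resid / df_resid) if df_resid > 0 else float("nan")
    return coef, df_resid, sigma2

# Hypothetical log2 normalized expression of one gene, for illustration only
y = np.array([5.1, 5.9, 3.0, 4.1, 5.4, 6.6])
X = design_matrix(strains, times)
coef, df_resid, sigma2 = fit_additive(y, X)
print("coefficients (intercept, mut1, mut2, 10h):", np.round(coef, 2))
print("residual df:", df_resid)        # 6 samples - 4 coefficients

X_int = design_matrix(strains, times, interaction=True)
_, df_int, sigma2_int = fit_additive(y, X_int)
print("residual df with interaction:", df_int)  # saturated: nothing left
```

With the interaction included, the model has as many coefficients as samples, so it fits every value exactly and no variance can be estimated; this is why the interaction is out of reach without replicates.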
You are right, I missed that two mutant strains were present. What can probably be done is to capture the general mutant effect and the time effect, provided these effects exist and are strong enough to be detected with this number of samples.
Thank you for your comment. I don't fully understand it yet, but I will keep it in mind.
Thank you for the detailed comment, but there is a reason I have to use that method. I will refer to your advice. BTW, can I do that when manually comparing log2 RPKM fold changes at two time points?