Question

Log2FC at 2 time points

0

6.1 years ago

Jeje119 • 0

Hi everyone. I'm new to bioinformatics. I have an experiment to find differentially expressed genes between a bacterial wild type and two mutants (mutant 1 and mutant 2) at two time points (6 h and 10 h). I need to compare log2 fold changes (RPKM, wild type/mutant 1 and wild type/mutant 2), and I have no replicates. The problem is that I find too many DEGs. My threshold is logFC > 1 or logFC < -1.

6h DEGs + 10h DEGs - common genes (6h, 10h) = entire DEGs (I do this separately for wild type vs. mutant 1 and wild type vs. mutant 2.)

Is there something wrong with my way of doing this (the calculation above)? I do this manually in R, without dedicated programs.

I would appreciate your advice. Thank you.

RNA-Seq R • 1.3k views

ADD COMMENT • link 6.1 years ago by Jeje119 • 0

1

> Is there something wrong with my way

You have no replicates, so you cannot do ~~meaningful~~ statistics for each strain and time point. Second, filtering purely by fold change is not recommended: high fold changes often come from lowly expressed genes because of the mean-variance relationship (please use the search function and Google to find out what this is). In a nutshell, the highest fold changes are most likely false positives and are not reliable. You need replicates to do a meaningful analysis.

Also, RPKM is a poor choice (again, use the search function; this has been discussed many times before). You can use rlog from DESeq2 to normalize your data and then recalculate the fold changes. This is still not reliable, but it will probably save you from the inflated, false fold changes of lowly expressed genes. Please search the web for suggestions on analyzing unreplicated data; there are dozens of threads on this. An RNA-seq experiment without replicates is poor experimental design, and there is no computational method that can extract reliable results from it.

Edit: As commented below, you could treat the time points as replicates, and might even use the strains as replicates, to capture the general effects of WT/mutant or time.
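
To illustrate the point about lowly expressed genes, here is a minimal Python sketch with made-up read counts. The counts, the `log2fc` helper, and the pseudocount of 10 are all assumptions for illustration and do not come from the data in this thread. The pseudocount is only a crude stand-in for the kind of shrinkage that a transformation such as rlog provides; it is not rlog itself, and it does not make unreplicated data reliable.

```python
import math


def log2fc(numerator, denominator, pseudocount=0.0):
    """log2 fold change between two expression values.

    A pseudocount > 0 pulls ratios of small values toward 0.
    """
    return math.log2((numerator + pseudocount) / (denominator + pseudocount))


# Made-up counts for two hypothetical genes (sample A vs. sample B).
raw_low = log2fc(4, 1)          # lowly expressed gene: 2.0
raw_high = log2fc(1300, 1000)   # highly expressed gene: about 0.38

# Same genes with an arbitrary pseudocount of 10 (an assumption).
shrunk_low = log2fc(4, 1, pseudocount=10)           # about 0.35
shrunk_high = log2fc(1300, 1000, pseudocount=10)    # barely changes

print(raw_low, raw_high, shrunk_low, shrunk_high)
```

With these made-up numbers, the lowly expressed gene passes a |log2FC| > 1 cut-off on a difference of three reads, while the highly expressed gene with a 300-read difference does not. Once the pseudocount is added, the lowly expressed gene no longer passes.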

ADD REPLY • link 6.1 years ago by ATpoint 90k

2

These are all very good points, but I would not be so definitive about the lack of replicates leading to "no meaningful statistics". Actually, there is enough data here to build a statistical model of the form (expression ~ strain + time), without the interaction term (strain:time). In practice, though, this kind of interaction is often of interest in such experiments. So yes, more replicates would be better, but it is still possible to do something statistically meaningful with these data.
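
As a rough sketch of why the additive model is estimable here: three strains at two time points give six samples, and the model without interaction uses four coefficients (intercept, two mutant effects, one time effect), which leaves two residual degrees of freedom. The Python below fits that model for a single gene using the closed-form least-squares solution for a balanced layout with one sample per cell. The expression values are invented and `fit_additive` is a hypothetical helper written for this illustration; in practice one would use an established package rather than this hand-rolled version.

```python
from statistics import mean

# Invented log2 expression values for ONE gene (illustration only).
log2_expr = {
    "WT":   {"6h": 5.0, "10h": 6.0},
    "mut1": {"6h": 7.1, "10h": 7.9},
    "mut2": {"6h": 5.2, "10h": 6.0},
}


def fit_additive(data, ref_strain="WT", ref_time="6h"):
    """Least-squares fit of expression ~ strain + time (no interaction)
    for a balanced design with exactly one sample per strain/time cell."""
    strains = list(data)
    times = list(data[ref_strain])
    grand = mean(data[s][t] for s in strains for t in times)
    strain_mean = {s: mean(data[s][t] for t in times) for s in strains}
    time_mean = {t: mean(data[s][t] for s in strains) for t in times}

    # In a balanced additive layout, each effect is a difference of
    # marginal means relative to the reference level.
    strain_effect = {s: strain_mean[s] - strain_mean[ref_strain]
                     for s in strains if s != ref_strain}
    time_effect = {t: time_mean[t] - time_mean[ref_time]
                   for t in times if t != ref_time}

    # Fitted value for a cell = strain mean + time mean - grand mean.
    resid_ss = sum(
        (data[s][t] - (strain_mean[s] + time_mean[t] - grand)) ** 2
        for s in strains for t in times
    )
    resid_df = (len(strains) - 1) * (len(times) - 1)
    return {
        "strain_effect": strain_effect,
        "time_effect": time_effect,
        "resid_df": resid_df,
        "resid_var": resid_ss / resid_df,
    }


fit = fit_additive(log2_expr)
print(fit)
```

With only two residual degrees of freedom, any test built on this variance estimate has very little power, and the strain:time interaction cannot be estimated at all, since adding it would use up the remaining degrees of freedom.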

ADD REPLY • link 6.1 years ago by Carlo Yague 9.0k

0

You are right, I missed the point that two strains were present. What can probably be done is to capture the general mutant effect and the time effect, provided these exist and are strong enough to be detected with the given number of samples.

ADD REPLY • link 6.1 years ago by ATpoint 90k

0

Thank you for your comment. I don't fully understand it yet, but I will keep it in mind.

ADD REPLY • link 6.1 years ago by Jeje119 • 0

0

Thank you for the detailed comment. However, there is a reason I have to use that method. I will refer to your advice. By the way, can I do that when comparing log2 RPKM fold changes manually at 2 time points?

6h DEGs + 10h DEGs - common genes (6h, 10h) = entire DEGs (I do this separately for wild type vs. mutant 1 and wild type vs. mutant 2.)
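
The counting rule above is the inclusion-exclusion formula for the size of a set union. A minimal Python sketch with hypothetical gene IDs (not taken from the actual data) shows that counting this way agrees with taking the union directly:

```python
# Hypothetical DEG lists for one comparison (e.g. wild type vs. mutant 1).
degs_6h = {"geneA", "geneB", "geneC"}
degs_10h = {"geneB", "geneC", "geneD", "geneE"}

common = degs_6h & degs_10h      # genes called at both time points
all_degs = degs_6h | degs_10h    # genes called at either time point

# "6h DEGs + 10h DEGs - common genes" counts each gene exactly once.
count_by_formula = len(degs_6h) + len(degs_10h) - len(common)

print(sorted(all_degs), count_by_formula)
```

The set arithmetic itself is consistent; it only combines the two lists and says nothing about whether the genes in them are reliably called.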

ADD REPLY • link 6.1 years ago by Jeje119 • 0