Hello, I am doing RNA-seq analysis for the first time. I have two samples, control and treatment, of a plant variety, collected at a 14-day interval. Among the differentially expressed genes, I obtained a few entries that had the same gene ID and appeared to be the same transcript. They were identical except that they differed in FPKM values and showed opposite regulation: one is up-regulated (12-fold) and the other is down-regulated (13-fold). I assume that minor variation due to error is possible, but fold changes of this size, in opposite directions, cannot be overlooked. I also don’t think that they could be different fragments of the same transcript, as they show opposite regulation. Can anyone suggest a reason for such data? Or is this merely an error?
Large log fold-changes, even as high as +90, are often observed in RNA-seq data that has been normalised to FPKM expression levels, but this is more due to the inadequacies of this normalisation strategy than anything else. For one, FPKM normalisation is performed within each sample rather than across samples, and therefore does not adequately adjust for differences in library size.
If you can obtain raw counts, my advice is to get those, and then work from those using a 'better' normalisation strategy.
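To illustrate what normalising across samples can look like, here is a minimal sketch of median-of-ratios size factors, the general idea behind DESeq-style normalisation. It is only one example of a cross-sample strategy and not necessarily the one meant above; the function name `size_factors` and the count matrix are hypothetical, and for a real analysis an established package would be the safer choice.

```python
import numpy as np

def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples matrix of raw counts."""
    counts = np.asarray(counts, dtype=float)
    # Use only genes with a non-zero count in every sample
    expressed = (counts > 0).all(axis=1)
    log_counts = np.log(counts[expressed])
    # Per-gene reference: log of the geometric mean across samples
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Per-sample factor: median ratio of that sample to the reference
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

# Hypothetical raw counts (4 genes x 2 samples); sample 2 was sequenced twice as deep
raw = np.array([[10, 20],
                [100, 200],
                [50, 100],
                [0, 7]])

sf = size_factors(raw)
normalised = raw / sf  # counts on a common scale across samples
print("size factors:", sf)
print(normalised)
```

In this toy case the three genes expressed in both samples end up with equal normalised counts, because the only difference between the samples is sequencing depth.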
Thanks Kevin. I tried another strategy, but the results do not differ much from the previous ones. I suppose removing such ambiguous data would be better.
Which was the other strategy? Have you checked for sample outliers via something like a PCA bi-plot?
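For readers unfamiliar with the outlier check, the sketch below computes PCA sample scores with plain NumPy on a hypothetical log-expression matrix. It covers only the sample-scores half of a bi-plot (no gene loadings), and it is only informative when there are more than a couple of samples. The data and the name `pca_scores` are assumptions made for illustration.

```python
import numpy as np

def pca_scores(expr, n_components=2):
    """PCA via SVD. expr is samples x genes on a log scale."""
    expr = np.asarray(expr, dtype=float)
    centred = expr - expr.mean(axis=0)  # centre each gene
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

# Hypothetical data: 6 samples x 50 genes; sample index 5 is deliberately shifted
rng = np.random.default_rng(0)
expr = rng.normal(loc=8.0, scale=0.3, size=(6, 50))
expr[5] += 5.0

scores, explained = pca_scores(expr)
suspect = int(np.argmax(np.abs(scores[:, 0])))
print("sample furthest out on PC1:", suspect)
print("variance explained by PC1, PC2:", np.round(explained, 3))
```

A sample that sits far from all the others on the first component is a candidate outlier worth inspecting before interpreting fold changes.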
I don't think these could be outliers, because several other genes have similar up- and down-regulation values. For example, I have:
TRANSCRIPT_ID    GENE_MODEL_ID   RefSeq_ID        control read count   treated read count
TCONS_00047959   XLOC_028640     XM_003535153.3   135.743              0.00891439
TCONS_00047960   XLOC_028640     XM_003535153.3   0.00996383           70.1898
I expect the TCONS IDs differ because one is generated for each distinct transcript in each experiment.
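As a quick check on the two rows in the table above, the sketch below computes the log2 fold change (treated over control) for each TCONS entry and for the two entries summed under their shared XLOC ID. The values are copied from the table as given; summing the two transcripts is an assumption made here for illustration, not something proposed in the thread.

```python
import math

# (transcript, gene locus, control, treated), copied from the table above
rows = [
    ("TCONS_00047959", "XLOC_028640", 135.743, 0.00891439),
    ("TCONS_00047960", "XLOC_028640", 0.00996383, 70.1898),
]

def log2_fc(control, treated):
    """log2 of treated / control."""
    return math.log2(treated / control)

# Fold change per transcript
transcript_lfc = {t: log2_fc(c, tr) for t, _, c, tr in rows}

# Fold change after summing both transcripts of the same locus
gene_control = sum(c for _, _, c, _ in rows)
gene_treated = sum(tr for _, _, _, tr in rows)
gene_lfc = log2_fc(gene_control, gene_treated)

for t, lfc in transcript_lfc.items():
    print(f"{t}: log2FC = {lfc:.2f}")
print(f"XLOC_028640 (summed): log2FC = {gene_lfc:.2f}")
```

The per-transcript values come out at roughly -14 and +13 on the log2 scale, while the summed value is close to -1, i.e. about a halving. Each transcript is near zero in one condition, so the extreme values are driven by tiny denominators. Whether this reflects a real switch between two transcripts of the same locus or an assembly or quantification artefact cannot be decided from these numbers alone.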
Did you find a way out of this?