I've been using the Tuxedo suite to calculate differential gene expression for Drosophila RNA-seq samples. I've noticed that Cufflinks/Cuffdiff seems to give incorrect statistics when comparing ON and OFF genes.
My pipeline for analyzing differential gene expression was: 1) align reads with TopHat, 2) estimate gene/isoform abundance with Cufflinks, 3) merge the Cufflinks output files with Cuffmerge, and 4) test for differential gene expression with Cuffdiff (from an earlier version of the Cufflinks package).
For both Cufflinks and Cuffdiff I used the options --frag-bias-correct and --multi-read-correct to improve the accuracy of the abundance estimates for the expressed genes.
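The four steps above correspond roughly to the command lines sketched below. This is only an illustration under stated assumptions: every file name, the Bowtie index prefix, the output directory names, and the ON/OFF sample labels are hypothetical placeholders, and the script only builds and prints the commands rather than running them.

```python
import shlex

# Hypothetical placeholders; substitute real paths and sample names.
GENOME_FA = "genome.fa"          # reference FASTA used for bias correction
BOWTIE_INDEX = "genome_index"    # Bowtie index prefix for TopHat
SAMPLES = {"ON": "on_reads.fastq", "OFF": "off_reads.fastq"}


def build_pipeline(samples, genome_fa, bowtie_index):
    """Return the TopHat -> Cufflinks -> Cuffmerge -> Cuffdiff commands as argument lists."""
    cmds = []
    for label, fastq in samples.items():
        # 1) Align reads with TopHat.
        cmds.append(["tophat", "-o", f"tophat_{label}", bowtie_index, fastq])
        # 2) Estimate abundances with both correction options enabled.
        cmds.append([
            "cufflinks", "-o", f"cufflinks_{label}",
            "--frag-bias-correct", genome_fa,
            "--multi-read-correct",
            f"tophat_{label}/accepted_hits.bam",
        ])
    # 3) Merge assemblies; assemblies.txt lists each sample's transcripts.gtf.
    cmds.append(["cuffmerge", "-s", genome_fa, "-o", "merged_asm", "assemblies.txt"])
    # 4) Differential expression test with the same correction options.
    bams = [f"tophat_{label}/accepted_hits.bam" for label in samples]
    cmds.append([
        "cuffdiff", "-o", "diff_out",
        "--frag-bias-correct", genome_fa,
        "--multi-read-correct",
        "-L", ",".join(samples),
        "merged_asm/merged.gtf", *bams,
    ])
    return cmds


# Print the commands so they can be reviewed before being run by hand.
for cmd in build_pipeline(SAMPLES, GENOME_FA, BOWTIE_INDEX):
    print(shlex.join(cmd))
```

Printing the commands instead of executing them makes it easy to check that the same correction options reach both Cufflinks and Cuffdiff.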
Here is an example of how this pipeline seems to miscalculate the expression between ON and OFF genes:
Why am I seeing this discrepancy in the differential gene expression between ON and OFF genes, especially for the genes that get the status "HIDATA"?
How can I resolve this discrepancy and improve the calculation of differential gene expression for these ON and OFF genes?