I followed steps similar to those in the procedure section of Trapnell et al., 2012 for an RNA-Seq analysis of Oryza sativa datasets. The problem I face is in the Cuffdiff output, where more than one FPKM value is reported for many genes, as shown below:
test_id gene_id gene locus sample_1 sample_2 status value_1 value_2 log2(fold_change) test_stat p_value q_value significant
XLOC_005901 XLOC_005901 LOC_Os01g46440 Chr1:26422826-26425093 wild mutant OK 2.48243 8.09808 1.70582 0.193218 0.35875 0.999974 no
XLOC_002129 XLOC_002129 LOC_Os01g46440 Chr1:26422826-26425093 wild mutant OK 26.4118 20.9721 -0.332716 -0.280221 0.63665 0.999974 no
XLOC_003921 XLOC_003921 LOC_Os01g03040 Chr1:1159160-1164635 wild mutant OK 72.5969 77.3011 0.09058 0.0824954 0.8934 0.999974 no
XLOC_003922 XLOC_003922 LOC_Os01g03040 Chr1:1159160-1164635 wild mutant NOTEST 0.40255 0.306853 -0.391618 0 1 1 no
I noticed that some other threads discuss similar issues, but this case is different because 1) not all genes with alternatively spliced forms report multiple FPKM values, and 2) there is a big difference between the FPKM values reported for the same gene.
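To show how widespread this is, the affected genes can be listed with a short script. This is only a minimal sketch: it assumes the Cuffdiff gene-level output is tab-separated with the column names shown above, and the function name genes_with_multiple_loci is just something made up for illustration. The sample rows are the four records from this post.

```python
import csv
import io
from collections import defaultdict


def genes_with_multiple_loci(handle):
    """Return {gene name: [(gene_id, value_1, value_2), ...]} for genes
    that appear under more than one XLOC gene_id.

    Assumes a tab-separated table with 'gene', 'gene_id', 'value_1'
    and 'value_2' columns (an assumption based on the output above).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    by_gene = defaultdict(dict)
    for row in reader:
        gene = row["gene"]
        if gene == "-":  # rows without an annotated gene name are skipped
            continue
        by_gene[gene][row["gene_id"]] = (
            float(row["value_1"]),
            float(row["value_2"]),
        )
    return {
        gene: [(xloc, v1, v2) for xloc, (v1, v2) in loci.items()]
        for gene, loci in by_gene.items()
        if len(loci) > 1
    }


# In-memory copy of the rows quoted in this post, joined with tabs.
header = ("test_id gene_id gene locus sample_1 sample_2 status value_1 "
          "value_2 log2(fold_change) test_stat p_value q_value significant")
records = [
    "XLOC_005901 XLOC_005901 LOC_Os01g46440 Chr1:26422826-26425093 wild mutant OK 2.48243 8.09808 1.70582 0.193218 0.35875 0.999974 no",
    "XLOC_002129 XLOC_002129 LOC_Os01g46440 Chr1:26422826-26425093 wild mutant OK 26.4118 20.9721 -0.332716 -0.280221 0.63665 0.999974 no",
    "XLOC_003921 XLOC_003921 LOC_Os01g03040 Chr1:1159160-1164635 wild mutant OK 72.5969 77.3011 0.09058 0.0824954 0.8934 0.999974 no",
    "XLOC_003922 XLOC_003922 LOC_Os01g03040 Chr1:1159160-1164635 wild mutant NOTEST 0.40255 0.306853 -0.391618 0 1 1 no",
]
sample = "\n".join("\t".join(line.split()) for line in [header] + records)

dupes = genes_with_multiple_loci(io.StringIO(sample))
for gene, entries in dupes.items():
    print(gene, entries)
```

On the real file, the same function could be called with an open file handle instead of the in-memory sample.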
I downloaded the GFF3 file and the genome sequence from the RGAP MSU database FTP site. I am using TopHat version 2.
Can anyone suggest what exactly is going wrong here?
Will I need to rerun the analysis with the annotation and genome index files from a source such as iGenomes?