Question: For only a few genes, Cuffdiff reports more than one FPKM value per locus. Why?
I followed steps similar to those in the procedure section of Trapnell et al., 2012 for an RNA-Seq analysis of Oryza sativa datasets. The problem I face is in the Cuffdiff output, where more than one FPKM value is reported for many genes, as below:

test_id	gene_id	gene	locus	sample_1	sample_2	status	value_1	value_2	log2(fold_change)	test_stat	p_value	q_value	significant
XLOC_005901	XLOC_005901	LOC_Os01g46440	Chr1:26422826-26425093	wild	mutant	OK	2.48243	8.09808	1.70582	0.193218	0.35875	0.999974	no
XLOC_002129	XLOC_002129	LOC_Os01g46440	Chr1:26422826-26425093	wild	mutant	OK	26.4118	20.9721	-0.332716	-0.280221	0.63665	0.999974	no

XLOC_003921	XLOC_003921	LOC_Os01g03040	Chr1:1159160-1164635	wild	mutant	OK	72.5969	77.3011	0.09058	0.0824954	0.8934	0.999974	no
XLOC_003922	XLOC_003922	LOC_Os01g03040	Chr1:1159160-1164635	wild	mutant	NOTEST	0.40255	0.306853	-0.391618	0	1	1	no

I noticed that some threads, like this one, discuss similar issues. But this case is different because 1) not all genes with alternatively spliced forms report multiple FPKM values, and 2) there is a big difference between the FPKM values reported for the same gene.

I downloaded the GFF3 file and the genome sequence from the RGAP MSU database FTP site. The TopHat version is 2.

Can anyone suggest what exactly is going wrong here?


Will I need to run the analysis again with the annotation and genome index files from a source such as iGenomes?

It could be your processor failing, so those two values might both be expression values in the dataset; or make sure you have unique IDs, so that the annotation can separate the two if they are identical, or duplicated, genes.

Oh, you have different IDs mapping to the same locus: the same chromosome and the same nucleotide positions (start and end).
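To see how widespread this is, something like the following could list every gene that appears under more than one test_id. It is only a rough sketch: it assumes the Cuffdiff table is tab-separated with the header shown in the question, and the function name `find_multi_id_genes` and the in-memory `EXAMPLE` table (a trimmed copy of the four rows from the question) are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def find_multi_id_genes(handle):
    """Return {gene: [test_id, ...]} for genes listed under more than one test_id."""
    ids_by_gene = defaultdict(list)
    # Assumption: tab-separated input with 'gene' and 'test_id' columns.
    for row in csv.DictReader(handle, delimiter="\t"):
        ids_by_gene[row["gene"]].append(row["test_id"])
    return {gene: ids for gene, ids in ids_by_gene.items() if len(ids) > 1}


# Trimmed copy of the rows posted in the question (only the columns used here).
EXAMPLE = (
    "test_id\tgene\tlocus\n"
    "XLOC_005901\tLOC_Os01g46440\tChr1:26422826-26425093\n"
    "XLOC_002129\tLOC_Os01g46440\tChr1:26422826-26425093\n"
    "XLOC_003921\tLOC_Os01g03040\tChr1:1159160-1164635\n"
    "XLOC_003922\tLOC_Os01g03040\tChr1:1159160-1164635\n"
)

duplicates = find_multi_id_genes(io.StringIO(EXAMPLE))
for gene, test_ids in duplicates.items():
    print(gene, ",".join(test_ids))
```

For a real run you would pass an open file handle for your own Cuffdiff gene-level table instead of the `io.StringIO` stand-in.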

Could a failing processor really result in such fragmentation? Do you have any previous experience with that? BTW, I noticed that using similar commands with this annotation file gave me unfragmented FPKM values previously, which means it's not a problem with the annotation file @theobromma22

It is well known that a single gene locus can give rise to more than one transcript (mRNA) via alternative splicing, so that seems to be what is happening in your case. Going by the literature, you have several options: keep one of those entries, take the average of the entries to represent a single expression value for that locus, or keep them separate. Which option you choose depends on your overall research goal.

I forgot to mention that this can be done programmatically or manually; just remember to describe how you did it in your M&M section. Also, if you go with the first option, the obvious choice is to keep the entry with the highest expression level or fold-change value.
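As a rough illustration of the programmatic route, here is a minimal sketch of the first two options (keep one entry, or average them). It assumes a tab-separated table with the `gene`, `value_1` and `value_2` columns shown in the question. The function name `collapse_by_gene`, the `how` parameter and the `EXAMPLE` table are hypothetical, and using the summed FPKM of both samples to pick the "highest" entry is my own assumption, not a standard rule.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def collapse_by_gene(handle, how="max"):
    """Collapse rows that share a gene name into one (value_1, value_2) pair.

    how="max":  keep the entry with the highest summed FPKM (assumed criterion).
    how="mean": average value_1 and value_2 across the entries.
    """
    pairs_by_gene = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        pairs_by_gene[row["gene"]].append(
            (float(row["value_1"]), float(row["value_2"]))
        )

    collapsed = {}
    for gene, pairs in pairs_by_gene.items():
        if how == "max":
            collapsed[gene] = max(pairs, key=sum)
        elif how == "mean":
            collapsed[gene] = (
                mean(p[0] for p in pairs),
                mean(p[1] for p in pairs),
            )
        else:
            raise ValueError("how must be 'max' or 'mean'")
    return collapsed


# Trimmed copy of the rows posted in the question (only the columns used here).
EXAMPLE = (
    "test_id\tgene\tvalue_1\tvalue_2\n"
    "XLOC_005901\tLOC_Os01g46440\t2.48243\t8.09808\n"
    "XLOC_002129\tLOC_Os01g46440\t26.4118\t20.9721\n"
    "XLOC_003921\tLOC_Os01g03040\t72.5969\t77.3011\n"
    "XLOC_003922\tLOC_Os01g03040\t0.40255\t0.306853\n"
)

by_max = collapse_by_gene(io.StringIO(EXAMPLE), how="max")
by_mean = collapse_by_gene(io.StringIO(EXAMPLE), how="mean")
print(by_max)
print(by_mean)
```

Note that averaging a NOTEST entry with an OK entry, as would happen for LOC_Os01g03040 here, may not be what you want, so it could be worth filtering on the status column first. Whichever rule you pick, state it in the methods.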

Thanks for the direction @theobroma22

