Question: For only a few genes, Cuffdiff reports more than one FPKM value per locus. Why?
rice.researcher (20), Korea, Republic Of, wrote 5 weeks ago:

I followed steps similar to those in the procedure section of Trapnell et al., 2012 for an RNA-seq analysis of Oryza sativa datasets. The problem I face is in the Cuffdiff output, where more than one FPKM value is reported for many genes, as below:

test_id      gene_id      gene            locus                   sample_1  sample_2  status  value_1  value_2   log2(fold_change)  test_stat  p_value  q_value   significant
XLOC_005901  XLOC_005901  LOC_Os01g46440  Chr1:26422826-26425093  wild      mutant    OK      2.48243  8.09808   1.70582            0.193218   0.35875  0.999974  no
XLOC_002129  XLOC_002129  LOC_Os01g46440  Chr1:26422826-26425093  wild      mutant    OK      26.4118  20.9721   -0.332716          -0.280221  0.63665  0.999974  no
XLOC_003921  XLOC_003921  LOC_Os01g03040  Chr1:1159160-1164635    wild      mutant    OK      72.5969  77.3011   0.09058            0.0824954  0.8934   0.999974  no
XLOC_003922  XLOC_003922  LOC_Os01g03040  Chr1:1159160-1164635    wild      mutant    NOTEST  0.40255  0.306853  -0.391618          0          1        1         no

I noticed that some threads, like this one, discuss similar issues. But this case is different because 1) not all genes with alternatively spliced forms report multiple FPKM values, and 2) there is a large difference between the FPKM values reported for the same gene.

I downloaded the GFF3 file and genome sequence from the RGAP MSU database FTP site. TopHat is version 2.

Can anyone suggest exactly what is going wrong here?

Will I need to run the analysis again with annotation and genome index files from a source such as iGenomes?

modified 5 weeks ago by theobroma22 (460) • written 5 weeks ago by rice.researcher (20)

theobroma22 (460) wrote 5 weeks ago:

It could be that your processor is failing, so those two values might both be expression values in the dataset. Or make sure you have unique IDs: the annotation could separate the two entries if they are identical, or duplicated, genes.

written 5 weeks ago by theobroma22 (460)

Oh, you have different IDs mapping to the same locus, chromosome, and nucleotide positions (start and end).
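
One way to confirm this is to group the test_id values by gene name. Below is a minimal sketch in Python, assuming a tab-separated Cuffdiff gene_exp.diff with the column names shown in the question. The inline sample keeps only a few of those columns, and the function name ids_per_gene is made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the question, trimmed to a few columns.
# For a real run, open the Cuffdiff gene_exp.diff file instead.
SAMPLE = (
    "test_id\tgene\tlocus\n"
    "XLOC_005901\tLOC_Os01g46440\tChr1:26422826-26425093\n"
    "XLOC_002129\tLOC_Os01g46440\tChr1:26422826-26425093\n"
    "XLOC_003921\tLOC_Os01g03040\tChr1:1159160-1164635\n"
    "XLOC_003922\tLOC_Os01g03040\tChr1:1159160-1164635\n"
)


def ids_per_gene(handle):
    """Return gene names that are attached to more than one test_id."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        groups[row["gene"]].append(row["test_id"])
    # Keep only genes that appear under several IDs.
    return {gene: ids for gene, ids in groups.items() if len(ids) > 1}


duplicated = ids_per_gene(io.StringIO(SAMPLE))
for gene, ids in duplicated.items():
    print(gene, ids)
```

Any gene printed here is reported under several XLOC IDs and is a candidate for the behaviour described in the question.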

written 5 weeks ago by theobroma22 (460)

Could a failing processor result in such fragmentation? Any previous experience? BTW, I noticed that similar commands with this annotation file gave me unfragmented FPKM values previously, which means it's not a problem with the annotation file. @theobroma22

modified 5 weeks ago • written 5 weeks ago by rice.researcher (20)

It is well known that a single gene locus can produce more than one transcript (mRNA) via alternative splicing, so it seems this is what is happening in your case. Based on the literature, you have several options: keep one of those entries, take the average of the entries to represent a single expression value for that locus, or treat them separately. Which option you choose depends on your overall research goal.

written 5 weeks ago by theobroma22 (460)

I forgot to mention that this can be done programmatically or manually; just remember to describe how you did it in your Materials and Methods section. Also, with the first option, you should choose the entry that has the highest expression level or fold-change value.
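
As a rough illustration of the programmatic route, here is a minimal Python sketch that collapses several entries per gene into one, either by keeping the most highly expressed entry or by averaging. The function name collapse and the rule used for "highest expression" (largest summed FPKM across the two samples) are assumptions made for this example, not something Cuffdiff provides.

```python
from collections import defaultdict
from statistics import mean

# (test_id, gene, value_1, value_2) copied from the question's output.
ROWS = [
    ("XLOC_005901", "LOC_Os01g46440", 2.48243, 8.09808),
    ("XLOC_002129", "LOC_Os01g46440", 26.4118, 20.9721),
    ("XLOC_003921", "LOC_Os01g03040", 72.5969, 77.3011),
    ("XLOC_003922", "LOC_Os01g03040", 0.40255, 0.306853),
]


def collapse(rows, how="max"):
    """Reduce multiple entries per gene to one (value_1, value_2) pair."""
    by_gene = defaultdict(list)
    for _test_id, gene, v1, v2 in rows:
        by_gene[gene].append((v1, v2))

    result = {}
    for gene, pairs in by_gene.items():
        if how == "max":
            # Assumption: "highest expression" = largest summed FPKM.
            result[gene] = max(pairs, key=lambda p: p[0] + p[1])
        elif how == "mean":
            # Average each sample's FPKM across the entries.
            result[gene] = (mean(p[0] for p in pairs),
                            mean(p[1] for p in pairs))
        else:
            raise ValueError(f"unknown method: {how}")
    return result


print(collapse(ROWS, how="max"))
print(collapse(ROWS, how="mean"))
```

Whichever rule is used, it should be the one reported in the methods, since the two choices can give quite different values for genes such as LOC_Os01g46440.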

modified 5 weeks ago • written 5 weeks ago by theobroma22 (460)

Thanks for the direction, @theobroma22.

written 4 weeks ago by rice.researcher (20)