I have 5 samples. I first ran cufflinks on each of them, then used cuffcompare to track the expression of the isoforms across my samples. However, there are some inconsistencies between what cufflinks and cuffcompare report. Here is one example. The line below is from the tracking file in the cuffcompare output. According to this line, the 4th sample does not contain this transcript (as indicated by the '-' in the 8th column).
TCONS_00141429 XLOC_036901 FAT1|ENST00000500085 = q1:CUFF.4123|ENST00000500085|100|2158083.464660|2108371.687017|2207795.242302|782.095564|2224 q2:CUFF.4066|ENST00000500085|100|2139612.460549|2083085.595472|2196139.325626|653.405609|2224 q3:CUFF.4238|ENST00000500085|100|1204540.730534|1174769.622819|1234311.838249|448.930688|2224 - q5:CUFF.4639|ENST00000500085|100|146171.993631|132793.120191|159550.867071|56.739054|2224
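For anyone who wants to scan a whole tracking file for cases like this, here is a minimal Python sketch. It assumes the fields are whitespace-delimited, with four leading columns (transcript ID, locus ID, reference ID, class code) followed by one column per sample. The function name `missing_samples` is made up for illustration and is not part of cuffcompare.

```python
def missing_samples(tracking_line, n_fixed=4):
    """Return 1-based indices of samples whose column is '-' in a tracking line.

    Assumption: the first `n_fixed` columns are transcript ID, locus ID,
    reference ID and class code; every remaining column is one sample.
    """
    fields = tracking_line.split()
    sample_cols = fields[n_fixed:]
    return [i for i, col in enumerate(sample_cols, start=1) if col == "-"]


# The tracking line quoted in the post, rebuilt as a single string.
line = (
    "TCONS_00141429 XLOC_036901 FAT1|ENST00000500085 = "
    "q1:CUFF.4123|ENST00000500085|100|2158083.464660|2108371.687017|2207795.242302|782.095564|2224 "
    "q2:CUFF.4066|ENST00000500085|100|2139612.460549|2083085.595472|2196139.325626|653.405609|2224 "
    "q3:CUFF.4238|ENST00000500085|100|1204540.730534|1174769.622819|1234311.838249|448.930688|2224 "
    "- "
    "q5:CUFF.4639|ENST00000500085|100|146171.993631|132793.120191|159550.867071|56.739054|2224"
)

print(missing_samples(line))  # [4]
```

Running this over every line of the tracking file would show whether the problem is limited to this transcript or affects many of them.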
However, when I look at the transcripts.gtf file of the 4th sample, I see that this transcript is present in this sample:
chr4 Cufflinks transcript 187508981 187518385 1000 - . gene_id "CUFF.4400"; transcript_id "ENST00000500085"; FPKM "2507950.4894674225"; frac "0.403112"; conf_lo "2449139.510190"; conf_hi "2566761.468745"; cov "1014.433533"; full_read_support "yes";
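The same check can be done on the per-sample side by pulling the attributes out of each transcript line in transcripts.gtf. This is only a rough sketch: it assumes attributes are written as `key "value";` pairs as in the line above, and the helper name `gtf_attributes` is hypothetical.

```python
import re


def gtf_attributes(gtf_line):
    """Parse the key "value"; pairs of a GTF line into a dict."""
    # Each attribute looks like: transcript_id "ENST00000500085";
    return dict(re.findall(r'(\S+) "([^"]*)";', gtf_line))


# The transcript line from sample 4's transcripts.gtf, as quoted in the post.
gtf_line = (
    'chr4 Cufflinks transcript 187508981 187518385 1000 - . '
    'gene_id "CUFF.4400"; transcript_id "ENST00000500085"; '
    'FPKM "2507950.4894674225"; frac "0.403112"; '
    'conf_lo "2449139.510190"; conf_hi "2566761.468745"; '
    'cov "1014.433533"; full_read_support "yes";'
)

attrs = gtf_attributes(gtf_line)
print(attrs["transcript_id"])   # ENST00000500085
print(float(attrs["FPKM"]) > 0)  # True: the transcript has nonzero expression here
```

Collecting the `transcript_id` values per sample and comparing them with the samples marked '-' in the tracking file would list every transcript that shows this mismatch.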
I assembled my RNA-Seq reads against a reference transcriptome and used the same reference for cuffcompare, so there shouldn't be an issue of structure mismatch (e.g. non-matching coordinates or strand for this transcript across samples).
Does anyone have an idea on what's causing this?
My cuffcompare command:
cuffcompare -o 5samples -r Homo_sapiens.GRCh37.68-chr-added.gtf -R sample1/transcripts.gtf sample2/transcripts.gtf sample3/transcripts.gtf sample4/transcripts.gtf sample5/transcripts.gtf
I am also interested in this question. The problem still exists.