Question: cuffcompare: tracking file lists multiple cufflinks transcripts per transfrag
I am trying ab initio transcriptome assembly on TopHat-aligned RNA-seq data with Cufflinks. I ran cufflinks and cuffcompare without a reference genome to obtain the union of all transcripts.

In the tracking file, I often see many different "CUFF" gene IDs listed for a single transfrag ("TCONS"). In addition, even when the "CUFF" transcript IDs are the same, the reported lengths differ between samples. This is a single line of the output:

TCONS_00000015  XLOC_000034 -   .   q1:CUFF.33|CUFF.33.1|100|4.126633|3.879204|4.374062|14.800053|7595  q2:CUFF.39|CUFF.39.1|100|5.006652|4.725912|5.287393|18.373256|7021  q3:CUFF.38|CUFF.38.1|100|17.555915|16.559479|18.552350|58.809951|2233,CUFF.37|CUFF.37.2|100|0.695648|0.507680|0.883617|2.299228|3360    q4:CUFF.48|CUFF.48.1|100|6.568168|5.949411|7.186924|21.635958|2236,CUFF.46|CUFF.46.1|100|1.057953|0.745443|1.370464|3.535247|1455,CUFF.47|CUFF.47.1|100|0.875080|0.701251|1.048909|2.838155|3642    q5:CUFF.46|CUFF.46.1|100|19.644389|18.552479|20.736298|61.979307|2244,CUFF.45|CUFF.45.1|100|1.450014|1.201338|1.698689|4.501860|3101,CUFF.44|CUFF.44.1|100|0.725197|0.474165|0.976229|2.209039|1629 q6:CUFF.45|CUFF.45.1|100|6.836163|6.522581|7.149745|25.820700|7446

My questions are:

Why are CUFF.33, CUFF.39, CUFF.38, CUFF.48, etc. all listed under the same transfrag (TCONS_00000015)?

Why does CUFF.46.1 have length 1455 in q4 but 2244 in q5? Are they different isoforms? Or is this the total bp mapping to that transcript in each sample?

If a transcript length is indicated by "-", what does this mean?

I appreciate any help!

Did you run Cufflinks on multiple samples and then merge the assemblies using cuffmerge? If so, this is expected. Cuffmerge merges the per-sample assemblies and gives you one comprehensive transcriptome. The line you posted shows which per-sample genes and transcripts were merged into a single final representative transcript, TCONS_00000015. Details about your samples would help to understand the whole scenario.
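
To see which per-sample transcripts were grouped under one TCONS, it can help to split the tracking line by sample. Below is a minimal Python sketch, not an official parser. It assumes the columns are whitespace-separated, that each sample column looks like qN:gene_id|transcript_id|...|len with multiple transcripts separated by commas, and that a bare "-" means the sample has no transcript for that transfrag. The function name parse_tracking_line is made up for illustration, and the example line is the q4 and q5 part of the line posted above.

```python
def parse_tracking_line(line):
    """Split one cuffcompare .tracking line into per-sample entries.

    Assumed layout (hypothetical helper, not part of Cufflinks):
    transfrag_id, locus_id, ref_id, class_code, then one column per sample.
    """
    fields = line.split()
    transfrag_id, locus_id = fields[0], fields[1]
    samples = {}
    for column in fields[4:]:
        if column == "-":
            # Assumption: "-" marks a sample with no matching transcript.
            continue
        label, _, payload = column.partition(":")
        entries = []
        for item in payload.split(","):
            parts = item.split("|")
            # The last "|" field is taken to be the transcript length.
            length = int(parts[-1]) if parts[-1].isdigit() else None
            entries.append((parts[0], parts[1], length))
        samples[label] = entries
    return transfrag_id, locus_id, samples


# q4 and q5 columns copied from the line in the question.
line = (
    "TCONS_00000015 XLOC_000034 - . "
    "q4:CUFF.48|CUFF.48.1|100|6.568168|5.949411|7.186924|21.635958|2236,"
    "CUFF.46|CUFF.46.1|100|1.057953|0.745443|1.370464|3.535247|1455,"
    "CUFF.47|CUFF.47.1|100|0.875080|0.701251|1.048909|2.838155|3642 "
    "q5:CUFF.46|CUFF.46.1|100|19.644389|18.552479|20.736298|61.979307|2244,"
    "CUFF.45|CUFF.45.1|100|1.450014|1.201338|1.698689|4.501860|3101,"
    "CUFF.44|CUFF.44.1|100|0.725197|0.474165|0.976229|2.209039|1629"
)

transfrag_id, locus_id, samples = parse_tracking_line(line)
for label, entries in samples.items():
    for gene_id, transcript_id, length in entries:
        print(transfrag_id, label, gene_id, transcript_id, length)
```

Printed one row per sample and transcript, the CUFF.46.1 entry under q4 (length 1455) and the CUFF.46.1 entry under q5 (length 2244) show up as separate rows from separate sample columns, which makes them easier to compare.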


