Question: How to extract CDS and protein sequences from a Cufflinks transcripts.gtf file?
6.9 years ago by
I am using TopHat2 and Cufflinks for gene/transcript identification. I mapped RNA-Seq reads to a reference genome and then ran Cufflinks to generate the transcripts.gtf file. I generated the transcript sequences with the following command:

gffread -w transcripts.fa -g Masked_for_Tophat.fa transcripts.gtf

The Cufflinks transcripts.gtf file contains no CDS information, so CDS sequences cannot be extracted from it directly. I found a tool, TransDecoder, which can predict CDS from input transcripts. Does anyone know how to generate CDS/protein sequences from a Cufflinks transcripts.gtf file?
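To confirm which feature types a given GTF actually contains, a quick count of column 3 is enough. The snippet below is only a minimal sketch: the function name `count_feature_types` and the two example records are made up for illustration, and it assumes a standard nine-column, tab-separated GTF.

```python
from collections import Counter

def count_feature_types(gtf_lines):
    """Count the values in column 3 (feature type) of GTF records."""
    counts = Counter()
    for line in gtf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # skip malformed records
        counts[fields[2]] += 1
    return counts

# Hypothetical records in the style of Cufflinks output (not real data)
example = [
    'chr1\tCufflinks\ttranscript\t100\t500\t1000\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1";',
    'chr1\tCufflinks\texon\t100\t500\t1000\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1";',
]
counts = count_feature_types(example)
print(counts)
print("CDS" in counts)
```

For a real file, pass an open file handle instead of the list, e.g. `count_feature_types(open("transcripts.gtf"))`. If no CDS features show up in the counts, the coding regions have to be predicted from the transcript sequences.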

In a separate analysis, I want to train Augustus using this mapping information, and for training Augustus I need CDS/protein sequences. I have already used Augustus for gene prediction with intron/exon hints, as mentioned here. I would appreciate your suggestions on this.


Hi @R@hul, on the TransDecoder page, there is a separate section that deals with your exact situation, i.e. converting a cufflinks.gtf file into GFF3, extracting the transcripts, finding the longest ORFs (reported both as CDS and PEP sequences) and then generating a new GFF3 which reports these coding regions in the context of the genome.

Here is a link to the relevant section: Starting from a genome-based transcript structure GTF file
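To illustrate what "finding the longest ORF" means for a single transcript, here is a simplified Python sketch. It is not TransDecoder's implementation (TransDecoder applies additional criteria when selecting coding regions); it simply scans all six frames for the longest ATG-to-stop stretch and returns it as CDS and peptide. The function names and the toy sequence are made up for illustration, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
STOPS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq):
    """Reverse-complement an upper-case DNA string (non-ACGT left as-is)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(cds):
    """Translate codon by codon; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X")
                   for i in range(0, len(cds) - 2, 3))

def longest_orf(transcript):
    """Return (cds, peptide) for the longest complete ATG-to-stop ORF."""
    best = ""
    seq = transcript.upper()
    for strand_seq in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand_seq) - 2, 3):
                codon = strand_seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon in this open stretch
                elif start is not None and codon in STOPS:
                    orf = strand_seq[start:i + 3]  # CDS including stop codon
                    if len(orf) > len(best):
                        best = orf
                    start = None
    return best, translate(best).rstrip("*")

# Toy transcript, not real data
print(longest_orf("CCATGGCTAAATGACC"))  # prints ('ATGGCTAAATGA', 'MAK')
```

For real data, use TransDecoder as described in the linked section; it also reports the ORF coordinates, which is what allows the coding regions to be mapped back onto the genome.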

Hi, I would like to know if you have figured out how to annotate a transcripts.gtf file generated by Cufflinks.

5.5 years ago by
I'm not sure there is a one-step solution to that. The PASA pipeline includes a script to extract transcripts from cufflinks.gtf, called ""

CDS/peptides can be generated from the transcripts as suggested above with TransDecoder.

Thanks, this answer helped me a lot even though my problem was slightly different. Just as a note: the output from this script includes both the transcript_id (TCONS) and the gene_id (XLOC) from the cufflinks .gtf file together in the FASTA header.
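If you need the same transcript-to-gene pairing without going through the FASTA headers, it can also be read straight from the GTF attribute column. This is a rough sketch assuming the usual `key "value";` attribute syntax; the function name and the example IDs are hypothetical, and it does not reproduce the header layout written by the script mentioned above.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')

def transcript_to_gene(gtf_lines):
    """Map each transcript_id to its gene_id using GTF column 9."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip blank or malformed records
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "transcript_id" in attrs and "gene_id" in attrs:
            mapping[attrs["transcript_id"]] = attrs["gene_id"]
    return mapping

# Hypothetical record with made-up IDs
example = [
    'chr1\tCufflinks\texon\t100\t500\t.\t+\t.\t'
    'gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1";',
]
print(transcript_to_gene(example))
```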

5.5 years ago by
Singapore, Temasek Life Sciences Laboratory
Can TransDecoder annotate the 5' UTR and 3' UTR as well?


