Question: How to extract CDS and protein sequences from a Cufflinks transcripts.gtf file?
Rahul Sharma wrote (6.9 years ago):


I am using TopHat2 and Cufflinks for gene/transcript identification. I mapped the RNA-Seq reads to the reference genome and then used Cufflinks to generate the transcripts.gtf file. I generated the transcript sequences using the following command:

gffread -w transcripts.fa -g Masked_for_Tophat.fa transcripts.gtf

Since the Cufflinks transcripts.gtf file contains no CDS information, the CDS sequences cannot be extracted from it directly. I found one tool, TransDecoder, which can predict CDS from the input transcripts. Does anyone know how to generate CDS/protein sequences from a Cufflinks transcripts.gtf file?

In another analysis I want to train Augustus using this mapping information, and training Augustus requires CDS/protein sequences. (I have already run Augustus gene prediction with intron/exon hints, as mentioned here.) I would appreciate your suggestions on this.


Tags: cds, cufflinks, rna-seq (11k views)
(question written 6.9 years ago by Rahul Sharma, modified 5.5 years ago by wanziyi89)
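
The premise that a Cufflinks transcripts.gtf carries no CDS rows is easy to check by tallying column 3 (the feature type) of the GTF. This is a minimal shell/awk sketch; the function name `count_gtf_features` is hypothetical, and it assumes a standard tab-separated, 9-column GTF. If only `transcript` and `exon` show up, gffread's `-x` (spliced CDS) and `-y` (protein) outputs have no CDS features to work from, which is why a separate ORF-prediction step such as TransDecoder is needed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: tally the feature types (GTF column 3) in a file.
# Assumes a standard tab-separated, 9-column GTF; comment lines are skipped.
count_gtf_features() {
  awk -F'\t' '
    /^#/    { next }        # skip header/comment lines
    NF >= 9 { n[$3]++ }     # count each feature type
    END     { for (f in n) print f "\t" n[f] }
  ' "$1" | sort
}

# Usage (on the Cufflinks output):
#   count_gtf_features transcripts.gtf
```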

Hi @R@hul, the TransDecoder page has a separate section that deals with exactly your situation: converting a Cufflinks GTF file into GFF3, extracting the transcript sequences, finding the longest ORFs (reported as both CDS and PEP sequences), and then generating a new GFF3 that places these coding regions in genome coordinates.

Here is a link to the relevant section: Starting from a genome-based transcript structure GTF file

(reply written 6.8 years ago by Vivek Krishnakumar)
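
The TransDecoder route described above can be sketched as a small shell function. This is a hedged sketch, not an official TransDecoder wrapper: the function name `gtf_to_cds_pep` and the `TRANSDECODER_HOME` variable are hypothetical, and the `util/` script names are the ones given in the TransDecoder documentation for the genome-based GTF workflow in recent releases. Script names have changed between versions, so check them against the `util/` directory of your installation. All outputs are written to the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around TransDecoder's genome-based GTF workflow.
# Assumes: TransDecoder.LongOrfs and TransDecoder.Predict are on PATH, and
# TRANSDECODER_HOME (hypothetical name) points at the install containing util/.
# The body runs in a subshell so `set -e` stops at the first failing step
# without changing the caller's shell options.
gtf_to_cds_pep() (
  set -euo pipefail
  gtf="$1"
  genome="$2"
  util="${TRANSDECODER_HOME:?set TRANSDECODER_HOME to the TransDecoder install}/util"

  # 1. Spliced transcript (cDNA) sequences from the genome + Cufflinks GTF
  "$util/gtf_genome_to_cdna_fasta.pl" "$gtf" "$genome" > transcripts.fasta
  # 2. Transcript structures as alignment-style GFF3 (needed in step 4)
  "$util/gtf_to_alignment_gff3.pl" "$gtf" > transcripts.gff3
  # 3. Find long ORFs, then predict the likely coding regions
  TransDecoder.LongOrfs -t transcripts.fasta
  TransDecoder.Predict -t transcripts.fasta
  # 4. Map the transcript-relative ORFs back onto genome coordinates
  "$util/cdna_alignment_orf_to_genome_orf.pl" \
    transcripts.fasta.transdecoder.gff3 \
    transcripts.gff3 \
    transcripts.fasta > transcripts.fasta.transdecoder.genome.gff3
)

# Usage:
#   TRANSDECODER_HOME=/path/to/TransDecoder gtf_to_cds_pep transcripts.gtf Masked_for_Tophat.fa
```

TransDecoder.Predict writes `transcripts.fasta.transdecoder.cds` and `transcripts.fasta.transdecoder.pep`, which are the CDS and protein sequences asked about in the question; the last step gives the same ORFs in genome coordinates.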

Hi, have you figured out how to annotate a transcripts.gtf file generated by Cufflinks?

(reply written 5.6 years ago by geek_y)
wrf wrote (5.5 years ago):

I'm not sure there is a one-step solution to that. The PASA pipeline includes a script to extract transcripts from cufflinks.gtf, called ""

CDS/peptide sequences can then be generated from the transcripts with TransDecoder, as suggested above.

(answer written 5.5 years ago by wrf)

Thanks, this answer helped me a lot even though my problem was slightly different. Just as a note: the FASTA headers in this script's output contain both the transcript_id (TCONS) and the gene_id (XLOC) from the Cufflinks .gtf file.

(reply written 3.9 years ago by Dan Powell)
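
If you only want one ID per sequence in those headers, they can be trimmed with awk. This is a minimal sketch: the function name `keep_tcons_id` is hypothetical, and it assumes the Cufflinks transcript IDs follow the usual `TCONS_` plus digits pattern. It searches the whole header line, so it does not depend on whether the TCONS or the XLOC ID comes first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reduce each FASTA header to its Cufflinks transcript ID.
# Assumes transcript IDs look like TCONS_<digits>; headers without one, and all
# sequence lines, are printed unchanged. Reads the files given as args, or stdin.
keep_tcons_id() {
  awk '
    /^>/ && match($0, /TCONS_[0-9]+/) {
      print ">" substr($0, RSTART, RLENGTH)   # header becomes ">TCONS_..."
      next
    }
    { print }
  ' "$@"
}

# Usage:
#   keep_tcons_id transcripts.fa > transcripts.ids.fa
```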
wanziyi89 (Singapore, Temasek Life Sciences Laboratory) wrote (5.5 years ago):


Can TransDecoder annotate the 5' UTR and 3' UTR as well?


(answer written 5.5 years ago by wanziyi89)
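
On the UTR question: recent TransDecoder releases typically describe each predicted gene in their GFF3 output with `five_prime_UTR`, `CDS` and `three_prime_UTR` rows (Sequence Ontology terms) wherever the ORF does not span the whole transcript. Assuming that labelling, the UTR rows can be pulled out with a column-3 filter. The function name `extract_utrs` is hypothetical; verify the feature names against your own output before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only UTR rows from a GFF3 file.
# Assumes UTRs are labelled with the SO terms five_prime_UTR / three_prime_UTR
# in column 3; all other rows (gene, mRNA, exon, CDS, comments) are dropped.
extract_utrs() {
  awk -F'\t' '$3 == "five_prime_UTR" || $3 == "three_prime_UTR"' "$@"
}

# Usage:
#   extract_utrs transcripts.fasta.transdecoder.genome.gff3 > utrs.gff3
```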


Powered by Biostar version 2.3.0