I am using TopHat2 and Cufflinks for gene/transcript identification. I mapped the RNA-Seq reads to a reference genome and then ran Cufflinks to generate the transcripts.gtf file. I generated the transcript sequences using the following command:
gffread -w transcripts.fa -g Masked_for_Tophat.fa transcripts.gtf
The Cufflinks transcripts.gtf file contains no CDS information, so the CDS sequences cannot be extracted from it directly. I found one tool, TransDecoder, which can predict CDS from input transcripts. Does anyone know how to generate CDS/protein sequences from a Cufflinks transcripts.gtf file?
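To make clear what I mean by getting a CDS out of a transcript that has no annotation, here is a minimal sketch of a naive longest-ORF search. It is only an illustration under my own assumptions: the function name `longest_orf` is made up, it looks at the forward strand only, and it simply takes the longest ATG-to-stop stretch. It is not how TransDecoder works, which as far as I understand also scores coding potential.

```python
# Naive longest-ORF finder: illustration only, NOT TransDecoder's algorithm.
# Assumption: the transcript is on the forward strand and the CDS starts at ATG.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return the longest ATG...stop ORF (stop codon included), or "" if none."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None  # index of the first ATG seen since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # look for the next ORF in this frame
    return best


if __name__ == "__main__":
    # Toy sequence, not real data.
    print(longest_orf("CCATGAAATAGCC"))
```

Running this on each record of transcripts.fa would give candidate CDS sequences, but ORFs without a stop codon and reverse-strand transcripts are ignored, so I would rather use a proper tool.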
In another analysis I want to train Augustus using this mapping information. To train Augustus, I need CDS/protein sequences. So far I have only used Augustus for gene prediction with intron/exon hints, as mentioned here. I would appreciate your suggestions on this.