GTF to FASTA including stop codon
5.9 years ago
athey.johnc ▴ 40

I am looking for a way to extract coding sequences, including their stop codons where available, from a genome and its GTF annotation. None of the tools I've come across, including TopHat's gtf_to_fasta and gffread, write out the full sequence with the stop codon.

gffread has a flag, -J, documented as "discard any mRNAs that either lack initial START codon or the terminal STOP codon, or have an in-frame stop codon (only print mRNAs with a full, valid CDS)". With -J the output does include the stop codon, but the flag also excludes any partial sequences that have no stop codon annotated.

Feeding these tools a GTF file with only CDS entries produces just the amino-acid-encoding part of the sequence (no stop codon). Feeding them a GTF with both CDS and stop_codon rows (gffread -J aside) makes them write out the coding sequence plus only the first nucleotide of the stop codon, which I don't understand either.

Are there other tools available that could do what I need?
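To make the desired behaviour concrete, here is a minimal Python sketch of it, not an existing tool. It assumes the GTF follows the usual convention that stop_codon rows are separate from, and not contained in, the CDS rows; that every row carries a transcript_id attribute; that coordinates are 1-based and inclusive; and that the genome fits in memory. The function names `read_fasta` and `cds_with_stop` are hypothetical, chosen only for illustration.

```python
import re
from collections import defaultdict

# Translation table used to complement minus-strand sequences.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse FASTA lines into {sequence_name: sequence}."""
    genome, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                genome[name] = "".join(chunks)
            # Keep only the first word of the header as the name.
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        genome[name] = "".join(chunks)
    return genome


def cds_with_stop(gtf_lines, genome):
    """Return {transcript_id: CDS sequence, plus stop codon if annotated}."""
    intervals = defaultdict(list)  # transcript_id -> [(start, end), ...]
    location = {}                  # transcript_id -> (chrom, strand)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only CDS and stop_codon rows contribute to the output.
        if len(fields) < 9 or fields[2] not in ("CDS", "stop_codon"):
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        tid = match.group(1)
        location[tid] = (fields[0], fields[6])
        intervals[tid].append((int(fields[3]), int(fields[4])))

    sequences = {}
    for tid, spans in intervals.items():
        chrom, strand = location[tid]
        # Merge overlapping or directly adjacent intervals so the stop
        # codon joins the last CDS segment without duplicated bases.
        merged = []
        for start, end in sorted(spans):
            if merged and start <= merged[-1][1] + 1:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        # GTF is 1-based inclusive; Python slices are 0-based half-open.
        seq = "".join(genome[chrom][s - 1:e] for s, e in merged)
        if strand == "-":
            seq = seq.translate(_COMPLEMENT)[::-1]  # reverse complement
        sequences[tid] = seq
    return sequences
```

Transcripts without a stop_codon row are kept and simply end at the last CDS base, which is the behaviour gffread -J does not offer. Both functions accept any iterable of lines, so open file handles for the genome FASTA and the GTF can be passed in directly.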

gtf fasta tophat gffread • 2.1k views

