GTF to FASTA including stop codon
5.0 years ago
athey.johnc ▴ 40

I am looking for a way to extract coding sequences, including their stop codons where annotated, from a genome and GTF annotations. None of the tools I've come across, including TopHat's gtf_to_fasta and gffread, writes out the full sequence with the stop codon by default.

gffread has a flag, -J, documented as "discard any mRNAs that either lack initial START codon or the terminal STOP codon, or have an in-frame stop codon (only print mRNAs with a full, valid CDS)". With it, gffread does write out the sequence with the stop codon, but it has the disadvantage of excluding any partial sequences that have no stop codon annotated.

Feeding these tools a GTF file with only CDS entries produces just the amino-acid-encoding part of the sequence (no stop codon). Feeding them a GTF with both CDS and stop_codon rows (gffread -J aside) makes them write out the coding sequence plus only the first nucleotide of the stop codon, which I don't understand either.

Are there other tools available that could do what I need?
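For illustration, the extraction described above can be sketched in a few lines of plain Python. This is a minimal sketch, not how gffread or gtf_to_fasta work: the function names `read_fasta` and `cds_with_stop` are made up for this example, and it assumes a tab-separated GTF with a `transcript_id` attribute, 1-based inclusive coordinates, `stop_codon` rows that sit next to (or inside) the last CDS segment, and a genome small enough to hold in memory. Transcripts without a `stop_codon` row are still written out, just without a stop.

```python
import re
from collections import defaultdict

# Translation table for complementing DNA (other IUPAC codes pass through unchanged).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse an iterable of FASTA lines into {sequence_name: sequence}."""
    genome, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                genome[name] = "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        genome[name] = "".join(chunks)
    return genome


def cds_with_stop(gtf_lines, genome):
    """Return {transcript_id: CDS sequence plus the annotated stop codon, if any}."""
    spans = defaultdict(list)  # transcript_id -> [(start, end)], 1-based inclusive
    where = {}                 # transcript_id -> (seqname, strand)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only CDS and stop_codon rows contribute to the output sequence.
        if len(fields) < 9 or fields[2] not in ("CDS", "stop_codon"):
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        tid = match.group(1)
        spans[tid].append((int(fields[3]), int(fields[4])))
        where[tid] = (fields[0], fields[6])

    sequences = {}
    for tid, intervals in spans.items():
        seqname, strand = where[tid]
        # Merge overlapping or directly adjacent intervals so that a stop codon
        # already contained in the CDS is not emitted twice.
        merged = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1] + 1:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        # Convert 1-based inclusive coordinates to Python slices.
        seq = "".join(genome[seqname][s - 1:e] for s, e in merged)
        if strand == "-":
            seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
        sequences[tid] = seq
    return sequences
```

Something like `cds_with_stop(open("annotation.gtf"), read_fasta(open("genome.fa")))` would then give one sequence per transcript (both file names are placeholders). Merging the intervals before slicing is a deliberate choice: it keeps the result the same whether the annotation lists the stop codon separately from the CDS or already includes it.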

gtf fasta tophat gffread • 1.9k views