I am looking for a way to extract coding sequences, including their stop codons where available, from a genome and GTF annotations. None of the tools I have come across, including TopHat's gtf_to_fasta and gffread, write out the full sequence with the stop codon. gffread has a flag -J, "discard any mRNAs that either lack initial START codon or the terminal STOP codon, or have an in-frame stop codon (only print mRNAs with a full, valid CDS)", which causes it to write out the sequence with the stop codon, but it has the disadvantage of excluding any partial sequences that do not have a stop codon annotated. Feeding these tools a GTF file with only CDS entries produces just the amino-acid-encoding part of the sequence (no stop codon). Feeding them a GTF with both CDS and stop_codon rows causes them (except gffread -J) to write out the coding sequence plus only the first nucleotide of the stop codon, which I don't understand either. Are there other tools available that could do what I need?
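To make the desired behaviour concrete, here is a minimal Python sketch of the extraction described above; it is an illustration under stated assumptions, not a replacement for a tested tool. It assumes the genome is already loaded as a dict of sequence name to sequence string, that the GTF uses 1-based inclusive coordinates with a `transcript_id` attribute, and that stop codons appear as separate `stop_codon` rows. The function names (`cds_with_stop`, `transcript_id`, `reverse_complement`) are hypothetical and do not come from any existing package.

```python
from collections import defaultdict

# Complement table for plain DNA letters (assumption: no IUPAC ambiguity codes besides N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def transcript_id(attributes):
    """Pull the transcript_id value out of a GTF attribute column, or None."""
    for field in attributes.split(";"):
        parts = field.strip().split(None, 1)
        if len(parts) == 2 and parts[0] == "transcript_id":
            return parts[1].strip('"')
    return None


def cds_with_stop(gtf_lines, genome):
    """Map transcript_id -> CDS sequence, with the stop codon where annotated.

    gtf_lines: iterable of GTF text lines.
    genome: dict of sequence name -> sequence string (hypothetical input format).
    """
    intervals = defaultdict(list)
    location = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Keep CDS and stop_codon rows only; everything else is ignored.
        if len(cols) < 9 or cols[2] not in ("CDS", "stop_codon"):
            continue
        tid = transcript_id(cols[8])
        if tid is None:
            continue
        intervals[tid].append((int(cols[3]), int(cols[4])))
        location[tid] = (cols[0], cols[6])

    sequences = {}
    for tid, ivs in intervals.items():
        chrom, strand = location[tid]
        ivs.sort()
        # Merge adjacent or overlapping intervals so the stop codon is counted
        # once, whether or not the annotation already includes it in the CDS.
        merged = [list(ivs[0])]
        for start, end in ivs[1:]:
            if start <= merged[-1][1] + 1:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        # GTF is 1-based inclusive, Python slices are 0-based half-open.
        seq = "".join(genome[chrom][s - 1:e] for s, e in merged)
        sequences[tid] = reverse_complement(seq) if strand == "-" else seq
    return sequences
```

Transcripts without a `stop_codon` row are kept and simply end at the last CDS base, so partial sequences are not discarded. The sketch does not check reading frame or validate that a stop codon is actually present.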
Question: GTF to FASTA including stop codon
2.4 years ago by
athey.johnc • 40