I am new to bioinformatics tools and was hoping this community could help clear something up for me. I need to generate a GTF file. My data are a set of complete HA genes from influenza B viruses, in FASTA format. From reading through this forum and other internet searches, my understanding is that FASTA cannot be converted directly to GTF or GFF, because those two formats record feature locations and FASTA does not (unless the coordinates happen to be in the header). I saw a post that suggested mapping the sequences to a reference sequence and generating the GTF from that mapping.
My question is: would it make sense to use the consensus sequence from my alignment as this reference? The consensus is called by simple majority rule. Because all my data are complete genes, I would think the feature locations (start codon and stop codon) would come out the same no matter which sequence in my data set I use as the reference. Is this not so?
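To make the question concrete, here is a minimal Python sketch of the kind of output I have in mind. It assumes each FASTA record is exactly one complete coding sequence on the plus strand (it starts at ATG, ends with a stop codon, and has no UTRs), so each sequence serves as its own reference and the coordinates follow from its length alone. The function names `read_fasta` and `gtf_lines`, the `manual` source label, and the `_HA` gene IDs are placeholders I made up. It follows the GTF convention that the CDS feature excludes the stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA file or any iterable of lines."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the name.
            name, chunks = (line[1:].split() or ["unnamed"])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def gtf_lines(name, seq, source="manual"):
    """Return GTF rows for one complete CDS (assumption: no UTRs, plus strand)."""
    seq = seq.upper()
    n = len(seq)
    if n < 6 or n % 3 != 0:
        raise ValueError(f"{name}: length {n} is not a whole number of codons")
    if not seq.startswith("ATG") or seq[-3:] not in STOP_CODONS:
        raise ValueError(f"{name}: does not start with ATG and end with a stop codon")
    # Hypothetical identifiers; GTF requires gene_id and transcript_id attributes.
    attrs = f'gene_id "{name}_HA"; transcript_id "{name}_HA.1";'
    features = [
        ("start_codon", 1, 3),
        ("CDS", 1, n - 3),          # GTF CDS excludes the stop codon
        ("stop_codon", n - 2, n),
    ]
    return [
        "\t".join([name, source, feat, str(start), str(end), ".", "+", "0", attrs])
        for feat, start, end in features
    ]

if __name__ == "__main__":
    demo = [">toy_seq example record", "ATGAAGGCA", "TAA"]  # toy data, not a real HA gene
    for rec_name, rec_seq in read_fasta(demo):
        print("\n".join(gtf_lines(rec_name, rec_seq)))
```

If the sequences differ in length (for example because of insertions or deletions), the stop codon coordinate would differ from one record to the next in this scheme, which is part of what I am unsure about when picking a single reference.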
I appreciate any help you can provide me!