I need to predict genes in several thousand files and then analyse the predicted proteins.
First, I used getAnnoFasta.pl to get a FASTA file of proteins.
getAnnoFasta.pl gives me a file with protein names like
>g1.t1 >g2.t1 >g3.t1 ..
However, I need to keep the DNA contig names in the protein sequence names, for example
>dnacontig1.g1 >dnacontig1.g2 >dnacontig2.g1
or
>g1.dnacontig1 >g2.dnacontig1 >g1.dnacontig2
The exact format doesn't matter; I just need the original contig name in the protein sequence name, using the quickest method.
I am thinking of using bedtools to extract the sequences from the original files and then translating them. Alternatively, I could write my own Python script to extract the sequences from the Augustus output.
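For the homemade-script option, here is a minimal sketch of what I have in mind. It rests on assumptions I have not checked against every Augustus version: that the Augustus output is tab-separated GFF with the contig name in column 1, that each transcript has a line whose feature column is `transcript` (or `mRNA`) and whose last column is either the bare transcript ID such as `g1.t1` or contains `ID=g1.t1`, and that the FASTA headers from getAnnoFasta.pl start with that same ID. The function names `transcript_to_contig` and `rename_headers` are my own, not part of Augustus.

```python
import re


def transcript_to_contig(gff_lines):
    """Map transcript IDs (e.g. 'g1.t1') to the contig name in column 1.

    Assumes tab-separated GFF lines with at least 9 columns (an assumption
    about the Augustus output, not a documented guarantee).
    """
    mapping = {}
    for line in gff_lines:
        if line.startswith("#"):
            continue  # skip comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] not in ("transcript", "mRNA"):
            continue  # only transcript-level lines are used
        attr = fields[8].strip()
        # GFF3-style attributes carry 'ID=...'; otherwise take the bare ID.
        match = re.search(r"ID=([^;]+)", attr)
        transcript_id = match.group(1) if match else attr
        mapping[transcript_id] = fields[0]
    return mapping


def rename_headers(fasta_lines, mapping):
    """Yield FASTA lines, prefixing known transcript IDs with their contig."""
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            transcript_id = parts[0] if parts else ""
            contig = mapping.get(transcript_id)
            if contig is not None:
                line = f">{contig}.{transcript_id}\n"
        yield line
```

The idea would be to build the mapping once per Augustus output file, then stream the matching protein FASTA through `rename_headers` and write the result to a new file. Headers with no match in the mapping are left unchanged.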
What is the best way? Thanks for your help.