How to automatically add the genus and species name to the headers of a multi-FASTA file after running BLASTn?
3.6 years ago

I need to automatically add the genus and species name of the best BLASTn match, together with its percentage of identity, to the headers of a multi-FASTA file. How can I do this? For example:

Before:

>xxxx|yyyy|zzzz
ATCG...
>xxxx|yxyx|zxzx
ATCG...


After:

>xxxx|yyyy|zzzz|*Genus_species*_99%
ATCG...
>xxxx|yxyx|zxzx|*Genus_species_2*_100%
ATCG...


Thanks!


Did you know that the singular form of the word species is species?


OK, thanks. I need to add the genus and species.


What does your BLAST output look like?


'6 qseqid sseqid stitle pident length evalue sstart send qlen slen'


What have you tried? If you give real workable examples, there is likely someone here that will do this for you.


What BLAST command did you use?

Species names are not that easy to get from BLAST. Choose the fields you want in your BLAST output from the list of available output fields, such as qseqid, pident and scomnames.

The species name you want could be in sscinames (Subject Scientific Name(s), separated by a ';'), scomnames (Subject Common Name(s), separated by a ';') or sblastnames (Subject Blast Name(s), separated by a ';').

Then, for each qseqid, keep only the line with the best pident.
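
A minimal Python sketch of this filtering step, assuming tab-separated output in the column order quoted earlier in the thread, with sscinames appended as an eleventh column (that extra column is an assumption, not the poster's actual command). The function name `best_hits` and the example rows are made up for illustration.

```python
import io

# Column order assumed from the -outfmt string quoted in this thread,
# plus sscinames as a hypothetical extra column for the species name.
COLUMNS = ["qseqid", "sseqid", "stitle", "pident", "length", "evalue",
           "sstart", "send", "qlen", "slen", "sscinames"]


def best_hits(handle, columns=COLUMNS):
    """Return {qseqid: row} keeping the row with the highest pident per query."""
    best = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        row = dict(zip(columns, line.split("\t")))
        current = best.get(row["qseqid"])
        # Strictly greater: on ties the first line seen for a query is kept.
        if current is None or float(row["pident"]) > float(current["pident"]):
            best[row["qseqid"]] = row
    return best


# Made-up rows, for illustration only.
example = (
    "q1\ts1\ttitle one\t97.5\t500\t1e-50\t1\t500\t500\t1200\tGenus species\n"
    "q1\ts2\ttitle two\t99.0\t480\t1e-48\t1\t480\t500\t1100\tGenus other\n"
    "q2\ts3\ttitle three\t100.0\t300\t1e-30\t1\t300\t300\t900\tGenus third\n"
)
hits = best_hits(io.StringIO(example))
for qseqid, row in hits.items():
    print(qseqid, row["sscinames"], row["pident"])
```

Note that the highest pident can come from a very short alignment, so depending on your data you may also want to filter on length or evalue before picking the best line.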

You can now use a scripting language such as Perl or Python (you can even do it with Unix tools if you want):

• Create a dictionary with qseqid as key and scomnames+pident as value
• For each record of your FASTA file:
• Check whether the id exists as a key in your dictionary; if so, change the id name
• Write the record to a new file
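
The steps above could be sketched in plain Python roughly as follows. This is only an illustration: `make_label` and `rename_headers` are hypothetical helper names, the dictionary is filled by hand here instead of from a real BLAST table, and it assumes that qseqid equals the FASTA header up to the first whitespace.

```python
def make_label(species, pident):
    """Build a label such as 'Genus_species_99%' from a name and a pident string."""
    return f"{species.replace(' ', '_')}_{float(pident):g}%"


def rename_headers(fasta_lines, labels):
    """Yield FASTA lines, appending '|label' to the id of headers found in labels."""
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Split the header into the id (first word) and an optional description.
            parts = line[1:].split(None, 1)
            seq_id = parts[0] if parts else ""
            rest = parts[1] if len(parts) > 1 else ""
            if seq_id in labels:
                line = f">{seq_id}|{labels[seq_id]}"
                if rest:
                    line += " " + rest
        yield line


# Dictionary with qseqid as key and species+pident as value (values made up).
labels = {
    "xxxx|yyyy|zzzz": make_label("Genus species", "99.0"),
    "xxxx|yxyx|zxzx": make_label("Genus species_2", "100.000"),
}

fasta = [">xxxx|yyyy|zzzz", "ATCG", ">xxxx|yxyx|zxzx", "ATCG", ">unmatched", "GGCC"]
renamed = list(rename_headers(fasta, labels))
print("\n".join(renamed))
```

With real files you would pass an open file handle as `fasta_lines` and write each yielded line, followed by a newline, to a new file. Headers without a BLAST hit are left unchanged.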

Thanks for answering! But I need help to write this script.


I could, but you have to help me by giving me the BLAST command line you used and the attribute you want as species (sscinames, scomnames, sblastnames).

If you don't know which attribute would be the best "species" for you, re-run your BLAST command with sscinames, scomnames and sblastnames added to the output format:

'6 qseqid sseqid stitle pident length evalue sstart send qlen slen sscinames scomnames sblastnames'

Then copy the first 10 lines of the BLAST output into your post.


It all depends on your reference, so if you can add that to your question it will be easier to help. If you have taxon IDs in your database, it is "fairly easy" with Python. You need to add staxid to the output and use the rankedlineage.dmp file. But like I said, we don't know your reference or where you want to get the species names from.
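
A rough sketch of that lookup in Python, under some assumptions: rankedlineage.dmp is taken to use the NCBI taxdump layout with fields separated by tab, pipe, tab, in the order tax_id, tax_name, species, genus and so on, and the species field is taken to be empty for species-level entries. The function name `load_species_names` and the truncated example rows are illustrative only, so check the readme shipped with your taxdump before relying on this.

```python
import io


def load_species_names(handle):
    """Map tax_id -> {'species': ..., 'genus': ...} from rankedlineage.dmp lines."""
    names = {}
    for line in handle:
        line = line.rstrip("\n")
        if line.endswith("\t|"):
            line = line[:-2]  # drop the end-of-row marker
        fields = [field.strip() for field in line.split("\t|\t")]
        if len(fields) < 4:
            continue  # not a lineage row
        tax_id, tax_name, species, genus = fields[:4]
        # Assumption: for species-level nodes the species column is empty
        # and tax_name already holds the species name.
        names[tax_id] = {"species": species or tax_name, "genus": genus}
    return names


def dmp_row(*fields):
    """Build one row in the assumed .dmp layout (helper for the example only)."""
    return "\t|\t".join(fields) + "\t|\n"


# Illustrative rows, truncated to the leading columns.
example = (
    dmp_row("562", "Escherichia coli", "", "Escherichia", "Enterobacteriaceae")
    + dmp_row("83333", "Escherichia coli K-12", "Escherichia coli",
              "Escherichia", "Enterobacteriaceae")
)
names = load_species_names(io.StringIO(example))

# A staxid value taken from a BLAST row would be looked up like this.
staxid = "83333"
print(names[staxid]["species"])
```

The resulting species name can then be combined with pident to build the new header, in the same way as with sscinames.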


Could you post a specific example of the input and the expected output? The description is too generic.