Question: How to automatically add the genus and species name to the headers of a multi-FASTA file after running BLASTn?
7 months ago, BetterOrWorse wrote:

I need to automatically add the genus and species name of the best BLASTn match, together with its percent identity, to the headers of a multi-FASTA file. How can I do this? For example:

Before:

>xxxx|yyyy|zzzz
ATCG...
>xxxx|yxyx|zxzx
ATCG...

After:

>xxxx|yyyy|zzzz|*Genus_species*_99%
ATCG...
>xxxx|yxyx|zxzx|*Genus_species_2*_100%
ATCG...

Thanks!

modified 5 months ago by Biostar • written 7 months ago by BetterOrWorse

Did you know that the singular form of the word species is species?

written 7 months ago by Benn

OK, thanks. I need to add the genus and species.

written 7 months ago by BetterOrWorse

What does your BLAST output look like?

written 7 months ago by Benn

'6 qseqid sseqid stitle pident length evalue sstart send qlen slen'

written 7 months ago by BetterOrWorse
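A minimal Python sketch for reading tabular output in the column order posted above (`-outfmt '6 qseqid sseqid stitle pident length evalue sstart send qlen slen'`), turning each hit line into a dict keyed by column name. The names `COLUMNS`, `NUMERIC` and `parse_outfmt6` are hypothetical helpers for illustration, not part of BLAST.

```python
# Column order copied from the -outfmt string posted above.
COLUMNS = ("qseqid sseqid stitle pident length evalue "
           "sstart send qlen slen").split()

# Columns converted from text to numbers; the rest stay strings.
NUMERIC = {"pident": float, "length": int, "evalue": float,
           "sstart": int, "send": int, "qlen": int, "slen": int}


def parse_outfmt6(lines, columns=COLUMNS):
    """Yield one dict per tab-separated BLAST hit line."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(columns, line.rstrip("\n").split("\t")))
        for name, cast in NUMERIC.items():
            if name in row:
                row[name] = cast(row[name])
        yield row
```

Splitting on tabs directly (rather than with `csv`) avoids surprises if a subject title ever contains quote characters.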

What have you tried? If you give real, working examples, someone here will likely do this for you.

written 7 months ago by Benn

What BLAST command did you use?

Species names are not that easy to get from BLAST. Choose the information you want in your BLAST output from the -outfmt format specifiers, e.g. qseqid, pident and scomnames.

The species name you want could be in sscinames (Subject Scientific Name(s), separated by a ';'), scomnames (Subject Common Name(s), separated by a ';') or sblastnames (Subject Blast Name(s), separated by a ';').

Then keep only the line with the best pident for each qseqid.
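The "best pident per qseqid" step could look like this minimal sketch, assuming tab-separated output where qseqid is column 0 and pident is column 3 (as in the format posted earlier). `best_by_pident` is a hypothetical name.

```python
def best_by_pident(lines, qcol=0, pcol=3):
    """Return {qseqid: line}, keeping the line with the highest pident.

    qcol and pcol are 0-based column indices; the defaults match
    '6 qseqid sseqid stitle pident ...'.
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        qseqid, pident = fields[qcol], float(fields[pcol])
        # Strict '>' keeps the first line seen when two hits tie.
        if qseqid not in best or pident > best[qseqid][0]:
            best[qseqid] = (pident, line)
    return {q: line for q, (_, line) in best.items()}
```

Note that the highest pident is not always the most meaningful hit: a short alignment can reach 100% identity, so you may want to filter on length or evalue first.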

You can then use a scripting language such as Perl or Python (you can even do it with Unix command-line tools if you want):

  • Read your BLAST output file
  • Create a dictionary with qseqid as key and scomnames + pident as value
  • Read your FASTA file
  • For each record of your FASTA file:
    • Check whether the ID exists as a key in your dictionary; if so, change the ID
    • Write the record to a new file
modified 7 months ago • written 7 months ago by Bastien Hervé
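The steps above could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not a tested tool: it assumes the OP's column order (qseqid, sseqid, stitle, pident, ...), that qseqid equals the first word of each FASTA header, and that stitle begins with "Genus species" (common for NCBI deflines but not guaranteed; if you add sscinames to the output, use that column instead). It stores stitle rather than scomnames because stitle is what the posted format contains. The script name `relabel.py` and all function names are hypothetical.

```python
import sys


def best_hits(blast_lines):
    """Map qseqid -> (pident, stitle) of its highest-pident hit.

    Assumes tab-separated BLAST output with the column order
    qseqid sseqid stitle pident length evalue sstart send qlen slen.
    """
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        qseqid, stitle, pident = fields[0], fields[2], float(fields[3])
        if qseqid not in best or pident > best[qseqid][0]:
            best[qseqid] = (pident, stitle)
    return best


def genus_species(stitle):
    """ASSUMPTION: the subject title starts with 'Genus species ...'."""
    words = stitle.split()
    return "_".join(words[:2]) if len(words) >= 2 else "unknown"


def relabel_fasta(fasta_lines, best):
    """Yield FASTA lines, appending |Genus_species_pident% to matched IDs."""
    for line in fasta_lines:
        if not line.startswith(">"):
            yield line  # sequence lines pass through unchanged
            continue
        parts = line[1:].rstrip("\n").split(maxsplit=1)
        if parts and parts[0] in best:
            pident, stitle = best[parts[0]]
            # :g drops trailing zeros, e.g. 100.000 -> "100"
            parts[0] = f"{parts[0]}|{genus_species(stitle)}_{pident:g}%"
        yield ">" + " ".join(parts) + "\n"


def main(blast_path, fasta_path):
    with open(blast_path) as handle:
        hits = best_hits(handle)
    with open(fasta_path) as handle:
        sys.stdout.writelines(relabel_fasta(handle, hits))


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: python relabel.py BLAST_TSV INPUT_FASTA > OUTPUT_FASTA",
              file=sys.stderr)
```

Usage: `python relabel.py blast_out.tsv input.fasta > relabeled.fasta`. Headers without a BLAST hit are written unchanged.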

Thanks for answering! But I need help writing this script.

written 7 months ago by BetterOrWorse

I could, but you have to help me by giving me the BLAST command line you used and the attribute you want as the species (sscinames, scomnames, sblastnames).

If you don't know which attribute would be the best "species" for you, re-run your BLAST command with sscinames, scomnames and sblastnames added to the output format:

'6 qseqid sseqid stitle pident length evalue sstart send qlen slen sscinames scomnames sblastnames'

Then paste the first 10 lines of the BLAST output into your post.

modified 7 months ago • written 7 months ago by Bastien Hervé
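If you go with sscinames (the 11th column, index 10, in the extended format above), a small helper can turn its value into a `Genus_species` label. This sketch assumes the field holds one or more ';'-separated scientific names and that BLAST writes `N/A` when no name is available (which can happen when the taxonomy database files are missing). `species_label` is a hypothetical name.

```python
def species_label(sscinames):
    """Turn an sscinames value into a 'Genus_species' label.

    Keeps only the first ';'-separated name and its first two words;
    returns 'unknown' for an empty value or the assumed 'N/A' placeholder.
    """
    first = sscinames.split(";")[0].strip()
    if not first or first == "N/A":
        return "unknown"
    return "_".join(first.split()[:2])
```

Taking only the first two words drops strain and subspecies detail, which matches the `Genus_species` format in the question.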

It all depends on your reference database, so if you add that to your question it will be easier to help. If you have taxids in your database, it is "fairly easy" with Python: add staxid to the output and use the rankedlineage.dmp file. But as I said, we don't know your reference or where you want to get the species names from.

written 7 months ago by gb
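A sketch of the staxid + rankedlineage.dmp approach, assuming the NCBI new_taxdump layout (fields separated by `\t|\t`, each line ending in `\t|`, columns starting tax_id, tax_name, species, genus). When the species column is empty, this sketch falls back to tax_name, which is only correct if the taxid is itself a species; taxids above species rank would need a rank check. Function names and the file path are hypothetical.

```python
def load_species_names(dmp_lines):
    """Map tax_id -> species name from rankedlineage.dmp lines.

    Assumed column order: tax_id, tax_name, species, genus, family, ...
    An empty species column falls back to tax_name (right only when
    the taxid is itself a species).
    """
    names = {}
    for line in dmp_lines:
        line = line.rstrip("\n")
        if line.endswith("\t|"):
            line = line[:-2]  # drop the trailing field terminator
        fields = line.split("\t|\t")
        if len(fields) < 3:
            continue  # not a rankedlineage record
        tax_id, tax_name, species = fields[0], fields[1], fields[2]
        names[tax_id] = species or tax_name
    return names


def species_from_taxid(staxid, names):
    """Return 'Genus_species' for a BLAST staxid value, or 'unknown'."""
    name = names.get(staxid.strip())
    return name.replace(" ", "_") if name else "unknown"


# Example (hypothetical path):
# with open("rankedlineage.dmp") as handle:
#     names = load_species_names(handle)
# label = species_from_taxid("562", names)
```

Loading the whole file into a dict uses some memory, but it makes each per-hit lookup a constant-time dictionary access.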

Could you post a specific example of the input and expected output? The description is too generic.

written 7 months ago by cpad0112