Question: Filter blastn results by kingdom
0
gravatar for user230613
2.9 years ago by
user230613260
Europe
user230613260 wrote:

I've a perl script which executes a blastn analysis. I want to parse the output of blast results attending to the kingdom of the best hit. For example, if I'm analysing arabidopsis data, I want to be able to extract those sequences with a plantae hit. And the same procedure for the other kingdoms (plantae, fungi, animalia, protista, bacteria, archaea). This is the blastn command line I'm using:

blastn -query sequences.fa -db nt -out blast_out -evalue 0.01 -outfmt '6 qseqid qlen sseqid slen length pident evalue sscinames stitle staxids sskingdoms' -max_target_seqs 1 -num_threads 25

What I've been doing until now is get hit species names, go to NCBI, extract complete taxID of those species, and in the case of "animalia", grep for "33208" number.

Any suggestion to do this automatically in perl or any script/api available for doing this task?


EDIT:


Maybe I'm not explaining correctly. Let's say I'm working with fungi X specie. I blast the sequences of that fungi against nt, and for example I get this hit for a given sequence:

seq_4459    408    gi|517322946|emb|HF679031.1|    2984819    411    85.40    8e-115    Fusarium fujikuroi IMI 58289    Fusarium fujikuroi IMI 58289 draft genome, chromosome FFUJ_chr09    1279085    Eukaryota

The hit is a fusarium hit, and as I'm working with fungi, I want to keep this line. How can I extract all possible fungi hits, but not other kindoms?

blastn bioperl perl • 1.5k views
ADD COMMENTlink modified 2.9 years ago • written 2.9 years ago by user230613260

sort both files on the keys and join http://www.computerhope.com/unix/ujoin.htm

ADD REPLYlink written 2.9 years ago by Pierre Lindenbaum115k

Thank you Pierre. I guess that I've not explained the issue correctly. Check the edit if you want, thanks.

ADD REPLYlink written 2.9 years ago by user230613260

Filter your script output to send each kingdom data to its own file. You can do this from within your script, see this.

ADD REPLYlink written 2.9 years ago by Jean-Karim Heriche17k

Thank you Jean-Karim. I guess that I've not explained the issue correctly. Check the edit if you want, thanks.

ADD REPLYlink written 2.9 years ago by user230613260
1

If I understand correctly, the problem is that you don't get the relevant taxonomic division in the output so for each taxon, you need to find out where in the taxonomic tree it is. Maybe this older post could help you.

ADD REPLYlink written 2.9 years ago by Jean-Karim Heriche17k

Yes, that post has the solution to the problem. Thank you a lot Jean. Another question, do you know the taxonomy ID for protista kingdom? I'm not able to find it in NCBI. Thanks again.

ADD REPLYlink written 2.9 years ago by user230613260

I believe that the kingdom protista is obsolete so you won't find it in a modern taxonomy tree.

ADD REPLYlink written 2.9 years ago by Jean-Karim Heriche17k

OK. I found the solution in another post, sorry, I could not find it before: Does A Script Exist That Given A Species Name Will Give You Kingdom Which The Input Belongs To?

ADD REPLYlink written 2.9 years ago by user230613260

You will have to grab all the TaxID's from fungi and then filter on them. You should be able to get those from Taxdump file from NCBI (ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/). Take a look at the Taxdump Readme file.

ADD REPLYlink written 2.9 years ago by genomax59k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1471 users visited in the last hour