I have a Perl script that runs a blastn analysis. I want to parse the BLAST output according to the kingdom of the best hit. For example, if I'm analysing Arabidopsis data, I want to be able to extract those sequences with a Plantae hit, and likewise for each of the other
kingdoms (Plantae, Fungi, Animalia, Protista, Bacteria, Archaea). This is the blastn command line I'm using:
blastn -query sequences.fa -db nt -out blast_out -evalue 0.01 -outfmt '6 qseqid qlen sseqid slen length pident evalue sscinames stitle staxids sskingdoms' -max_target_seqs 1 -num_threads 25
What I've been doing until now is to get the hit species names, go to NCBI, extract the complete taxID of those species, and, in the case of "animalia", grep for "
Any suggestions for doing this automatically in Perl, or is there a script or API available for this task?
Maybe I'm not explaining this correctly. Let's say I'm working with a fungal species X. I blast the sequences of that fungus against nt and, for example, get this hit for a given sequence:
seq_4459 408 gi|517322946|emb|HF679031.1| 2984819 411 85.40 8e-115 Fusarium fujikuroi IMI 58289 Fusarium fujikuroi IMI 58289 draft genome, chromosome FFUJ_chr09 1279085 Eukaryota
The hit is a Fusarium hit, and as I'm working with fungi, I want to keep this line. How can I extract all possible fungal hits, but not hits from other kingdoms?
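One way this might be automated is sketched below in Python, not as a definitive solution. The sskingdoms column only reports the super kingdom ("Eukaryota" in the line above), so it cannot separate fungi from plants or animals. The sketch instead takes the staxids column and walks up the NCBI taxonomy tree until it reaches the clade of interest. It assumes the BLAST output is tab-separated with staxids as the tenth column (as in the command above) and that nodes.dmp from the NCBI taxdump archive is available locally. The function names and the CLADES table are made up for illustration. Plantae is approximated by Viridiplantae and Animalia by Metazoa; Protista has no single taxon, so it would have to be handled as "Eukaryota, but none of the other three".

```python
# Illustrative NCBI taxonomy IDs for the clades of interest.
# Assumption: Plantae ~ Viridiplantae, Animalia ~ Metazoa.
CLADES = {
    "fungi": 4751,
    "plantae": 33090,
    "animalia": 33208,
    "bacteria": 2,
    "archaea": 2157,
}


def load_parents(nodes_path):
    """Read nodes.dmp (NCBI taxdump) into a {taxid: parent_taxid} dict."""
    parents = {}
    with open(nodes_path) as handle:
        for line in handle:
            # Fields in nodes.dmp are separated by "\t|\t";
            # the first two are tax_id and parent tax_id.
            fields = line.split("\t|\t")
            parents[int(fields[0])] = int(fields[1])
    return parents


def in_clade(taxid, clade_taxid, parents):
    """Walk up from taxid; return True if clade_taxid is an ancestor (or itself)."""
    seen = set()
    while taxid not in seen:
        if taxid == clade_taxid:
            return True
        seen.add(taxid)
        if taxid not in parents:
            return False  # taxid missing from the taxonomy dump
        taxid = parents[taxid]
    return False  # reached the root (its own parent) without a match


def filter_hits(lines, clade_taxid, parents, taxid_col=9):
    """Yield BLAST tabular lines whose subject taxid falls inside the clade."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= taxid_col:
            continue  # malformed or truncated line
        # staxids may hold several IDs separated by ";", or "N/A".
        taxids = [t for t in fields[taxid_col].split(";") if t.isdigit()]
        if any(in_clade(int(t), clade_taxid, parents) for t in taxids):
            yield line
```

It could then be used along these lines: call load_parents("nodes.dmp") once, open blast_out, and write every line returned by filter_hits(handle, CLADES["fungi"], parents) to a new file. Loading the whole nodes.dmp into a dictionary takes some memory but keeps each lookup to a short walk up the tree.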