I have a list of 10,000 accession IDs from BLAST in a txt file (e.g. XP_002184977.1, GBG35237.1) and I would like to get the associated taxonomy, in a tab-delimited file that corresponds to my accessions line by line. So I'm using Entrez Direct like this:
esearch -db protein -query "XP_002184977.1" \
  | elink -target taxonomy \
  | efetch -format native -mode xml \
  | xtract -pattern Taxon -block "*/Taxon" -unless Rank -equals "no rank" -tab "\n" -element Rank,TaxId,ScientificName
As a result I have different lines for each taxonomy level
superkingdom	2759	Eukaryota
clade	2698737	Sar
clade	33634	Stramenopiles
clade	2696291	Ochrophyta
phylum	2836	Bacillariophyta
class	33849	Bacillariophyceae
clade	33850	Bacillariophycidae
order	38748	Naviculales
family	38749	Phaeodactylaceae
genus	2849	Phaeodactylum
species	2850	Phaeodactylum tricornutum
However, the output is not always consistent from one hit accession to the next (sometimes there is no phylum line, which is the line that interests me). So if I add a "grep phylum", accessions without a phylum produce no line at all, and I can't concatenate my hit accessions with the results...
Any ideas on how to deal with that? It would be great if I could have the whole taxonomy, with "N.A" in the phylum column when the information is not present in the database. I have tried some other tools like taxonomizr, unsuccessfully...
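To make the desired output concrete, here is a minimal sketch of the kind of post-processing I have in mind. It is not a tested solution against NCBI. It assumes that each line from the pipeline above has the form Rank, TaxId, ScientificName separated by tabs. The helper name phylum_or_na and the accession EXAMPLE_ACC are made up for illustration. The helper always prints exactly one line per accession, with "N.A" when no phylum line is present, so the result stays aligned with the accession list.

```shell
#!/bin/sh
# phylum_or_na (hypothetical helper): reads "Rank<TAB>TaxId<TAB>ScientificName"
# lines on stdin and prints one line: accession, phylum TaxId, phylum name.
# Assumption: the fields are tab-separated. Prints N.A in both phylum
# columns when the lineage contains no phylum line.
phylum_or_na() {
    awk -F '\t' -v acc="$1" '
        $1 == "phylum" { id = $2; name = $3 }
        END {
            if (name == "") { id = "N.A"; name = "N.A" }
            printf "%s\t%s\t%s\n", acc, id, name
        }'
}

# Offline demonstration using lines taken from the output shown above.
printf 'superkingdom\t2759\tEukaryota\nphylum\t2836\tBacillariophyta\nclass\t33849\tBacillariophyceae\n' \
    | phylum_or_na XP_002184977.1

# Same helper on a lineage without a phylum line (EXAMPLE_ACC is a placeholder).
printf 'superkingdom\t2759\tEukaryota\ngenus\t2849\tPhaeodactylum\n' \
    | phylum_or_na EXAMPLE_ACC
```

In practice the printf lines would be replaced by the esearch | elink | efetch | xtract pipeline from above, run once per accession inside a loop over the txt file, with the accession passed as the argument to phylum_or_na. The same awk block could be extended to other ranks (class, order, ...) to build the whole taxonomy table.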