Given a protein family sequence alignment from PFAM, I want to get taxonomy information for each of the sequences. For example, for each sequence, I want to know whether it comes from a eukaryote or a prokaryote. How can I do this in Python, Bash, or another scriptable tool?
I've been inspecting the database_files contents, but I'm not sure how to use them. Any suggestions on what I can try?

If you get the taxonomy file from the database_files directory, it seems to be laid out as follows: the number in the first column is the NCBI taxID, the second column has the name, and the next column has the phylogeny. It is not clear how to relate this back to PFAM. @Mensur may have an idea.
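As a rough sketch of how that three-column layout could be used, the Python below reads taxID, name, and phylogeny and labels each taxID as eukaryote or prokaryote. It assumes the file is tab-separated and that the phylogeny column is a semicolon-separated lineage starting with a top-level name such as Eukaryota, Bacteria, or Archaea; neither assumption has been checked against the actual file. The function names and the two sample rows are made up for illustration.

```python
import csv
import io


def classify_lineage(lineage):
    """Return 'eukaryote', 'prokaryote', or 'unknown' for a lineage string.

    Assumes the lineage is semicolon-separated with the top-level
    group first (an assumption about the file, not a documented fact).
    """
    top = lineage.split(";")[0].strip().rstrip(".")
    if top == "Eukaryota":
        return "eukaryote"
    if top in ("Bacteria", "Archaea"):
        return "prokaryote"
    return "unknown"


def load_domains(handle):
    """Map taxID -> (name, label) from a tab-separated taxonomy table."""
    domains = {}
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        taxid, name, lineage = row[0], row[1], row[2]
        domains[taxid] = (name, classify_lineage(lineage))
    return domains


# Illustrative rows only; not copied from the real taxonomy file.
example = io.StringIO(
    "9606\tHomo sapiens\tEukaryota; Metazoa; Chordata;\n"
    "562\tEscherichia coli\tBacteria; Proteobacteria;\n"
)
domains = load_domains(example)
for taxid, (name, label) in domains.items():
    print(taxid, name, label)
```

For the real file, an open file handle (for example from gzip.open(path, "rt") if it is gzipped) could be passed in place of the StringIO object. This only covers the taxID-to-label step; linking each sequence in the alignment to a taxID is still the open part.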