How to check the completeness of a KEGG pathway module given a GenBank file
5.5 years ago
cleb

I have a question similar to the one posted here. I would like to check whether an organism has certain pathway modules, given a GenBank file with CDS information.

My approach is to create a FASTA file of the protein sequences (CDS translations) from the GenBank file and submit it to BlastKOALA, which returns a list of the KOs (KEGG Orthology identifiers) present in the organism. I can then use this list to check for particular modules, e.g. module M00010, which is defined as

Definition  K01647 (K01681,K01682) (K00031,K00030)


which translates to

K01647 AND (K01681 OR K01682) AND (K00031 OR K00030)


So if this expression evaluates to True, i.e. the required KOs are among those returned by BlastKOALA, the module is complete and therefore present in the organism.
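
The check described above can be sketched as a small recursive-descent evaluator. This is a minimal sketch, not KEGG's own module-completeness logic: it assumes the usual definition syntax where a space means AND (consecutive steps), a comma means OR (alternatives) and binds loosest, and "+" means AND (complex subunits). Optional components ("-K…" and the "--" placeholder) are deliberately not supported and raise an error. The loader `read_ko_assignments` is a hypothetical helper that assumes a two-column, tab-separated file (gene ID, KO), such as the text file BlastKOALA offers for download; check the actual layout of your results first.

```python
import re
from typing import Iterable, List, Optional, Set

# KO identifiers (K + 5 digits) and the operators handled here:
# " " = AND (steps), "," = OR (alternatives), "+" = AND (subunits), "()" = grouping.
_TOKEN = re.compile(r"K\d{5}|[(),+ ]")


def tokenize(definition: str) -> List[str]:
    # Normalise whitespace so that a remaining space always means AND between
    # two steps; spaces after "(", before ")" and around "," / "+" are dropped.
    text = re.sub(r"\s+", " ", definition.strip())
    text = re.sub(r"\(\s", "(", text)
    text = re.sub(r"\s\)", ")", text)
    text = re.sub(r"\s?([,+])\s?", r"\1", text)
    tokens, pos = [], 0
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:  # e.g. "-" (optional component) is not supported
            raise ValueError(f"unsupported syntax at {text[pos:]!r}")
        tokens.append(m.group())
        pos = m.end()
    return tokens


class _Parser:
    def __init__(self, tokens: List[str], kos: Set[str]):
        self.tokens, self.kos, self.i = tokens, kos, 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self) -> Optional[str]:
        tok = self.peek()
        self.i += 1
        return tok

    def parse_or(self) -> bool:
        # Lowest precedence: alternatives separated by ",".
        result = self.parse_and()
        while self.peek() == ",":
            self.take()
            rhs = self.parse_and()  # always parse, so all tokens are consumed
            result = result or rhs
        return result

    def parse_and(self) -> bool:
        # Steps separated by a space must all be satisfied.
        result = self.parse_plus()
        while self.peek() == " ":
            self.take()
            rhs = self.parse_plus()
            result = result and rhs
        return result

    def parse_plus(self) -> bool:
        # Complex subunits joined by "+" must all be present.
        result = self.parse_atom()
        while self.peek() == "+":
            self.take()
            rhs = self.parse_atom()
            result = result and rhs
        return result

    def parse_atom(self) -> bool:
        tok = self.take()
        if tok == "(":
            value = self.parse_or()
            if self.take() != ")":
                raise ValueError("unbalanced parentheses")
            return value
        if tok is not None and tok.startswith("K"):
            return tok in self.kos
        raise ValueError(f"unexpected token {tok!r}")


def module_complete(definition: str, kos: Iterable[str]) -> bool:
    parser = _Parser(tokenize(definition), set(kos))
    value = parser.parse_or()
    if parser.peek() is not None:
        raise ValueError(f"trailing tokens starting at {parser.peek()!r}")
    return value


def read_ko_assignments(path: str) -> Set[str]:
    # ASSUMPTION: tab-separated lines "gene_id<TAB>KO"; genes without a KO
    # have an empty or missing second column and are skipped.
    kos: Set[str] = set()
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) > 1 and fields[1].strip():
                kos.add(fields[1].strip())
    return kos
```

For example, `module_complete("K01647 (K01681,K01682) (K00031,K00030)", {"K01647", "K01682", "K00031"})` returns `True`, while dropping `K01682` from the set makes it `False`. Note that this only answers the boolean question; it inherits whatever errors are in the KO assignments themselves.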

Is this the right way to do it? I ask because I am unsure how the BlastKOALA output is generated. As far as I understand, it is based only on sequence similarity, which might cause problems: just because two sequences are similar does not mean that the corresponding proteins are the same. The risk is that I end up with many false positives, because I might conclude that a cherry is a strawberry just because both are red.

Is my concern justified? Can I decide whether a module is present or absent based on the BlastKOALA output alone, or do I need to run an additional analysis afterwards, e.g. InParanoid?

blast kegg blastkoala orthology module

BLAST implements a local alignment heuristic, so it can easily report two sequences as similar merely because they share a domain. If you're concerned about non-orthologous sequences being caught in the process, you should rely on a phylogenetic analysis (your own, or one provided by a database).
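
One crude guard against the domain-only matches described above is to require that an alignment covers most of both the query and the subject. This minimal sketch assumes BLAST+ tabular output produced with a custom column list, e.g. `-outfmt "6 qseqid sseqid pident qstart qend qlen sstart send slen evalue bitscore"`; the 0.7 coverage threshold is an arbitrary, hypothetical choice, and each HSP is scored on its own rather than merging multiple HSPs per pair. Coverage filtering reduces, but does not eliminate, non-orthologous hits, so it is no substitute for a phylogenetic analysis.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Column order ASSUMED to match the custom -outfmt "6 ..." string in the text.
FIELDS = ("qseqid", "sseqid", "pident", "qstart", "qend", "qlen",
          "sstart", "send", "slen", "evalue", "bitscore")


@dataclass
class Hit:
    qseqid: str
    sseqid: str
    bitscore: float
    qcov: float  # fraction of the query covered by this HSP
    scov: float  # fraction of the subject covered by this HSP


def parse_hits(lines: Iterable[str]) -> List[Hit]:
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        qstart, qend, qlen = int(row["qstart"]), int(row["qend"]), int(row["qlen"])
        sstart, send, slen = int(row["sstart"]), int(row["send"]), int(row["slen"])
        hits.append(Hit(
            qseqid=row["qseqid"],
            sseqid=row["sseqid"],
            bitscore=float(row["bitscore"]),
            qcov=(abs(qend - qstart) + 1) / qlen,
            scov=(abs(send - sstart) + 1) / slen,  # abs() handles reversed coords
        ))
    return hits


def full_length_hits(hits: List[Hit], min_cov: float = 0.7) -> List[Hit]:
    # Keep hits covering most of BOTH sequences; a short shared domain
    # usually covers only a fraction of one or both proteins.
    return [h for h in hits if h.qcov >= min_cov and h.scov >= min_cov]
```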

Which database would you recommend? So the way I currently infer whether a module is present is error-prone?

Identifying an orthologous group based on a single BLAST alignment is definitely error-prone. Which database of orthologs to use depends first on which organism you're working with, since it has to be represented in the database. For vertebrates, I would recommend working with Ensembl and its Compara database.
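
A common first step beyond a single alignment is a reciprocal best hit (RBH) check: a gene a in genome A and a gene b in genome B are kept as candidate orthologs only if each is the other's highest-scoring hit. InParanoid, mentioned in the question, starts from such two-way best hits; the sketch below is only that first step, not InParanoid itself and not a phylogenetic analysis. It assumes the input is (query, subject, bitscore) tuples taken from two BLAST runs (A against B and B against A); ties are resolved by keeping the first hit seen, which is a simplification.

```python
from typing import Dict, Iterable, Set, Tuple

Hit = Tuple[str, str, float]  # (query id, subject id, bitscore)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    # For each query keep the subject with the highest bitscore.
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> Set[Tuple[str, str]]:
    # A pair (a, b) survives only if b is a's best hit AND a is b's best hit.
    fwd, rev = best_hits(a_vs_b), best_hits(b_vs_a)
    return {(a, b) for a, b in fwd.items() if rev.get(b) == a}
```

In a toy case where a2's best hit is b2 but b2's best hit is a1 (for example because a1 is a closer paralog), the pair (a2, b2) is rejected even though a single a2-vs-b2 alignment looks good.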

OK, thanks! Then I still wonder what the BlastKOALA output actually means: what can this KO information be used for? I always assumed it was based on more than a single BLAST alignment, but I am not sure (that's why I asked here). Any suggestions for prokaryotes?

I think COG covers prokaryotes. A quick search also turned up:
Quartets-DB
ATGC
I don't work with prokaryotes, except for cloning plasmids :), so I can't advise on which one has the best features.