Question: Genome-Wide COG Assignment
Neo (Australia), 8.8 years ago, wrote:

Hi guys, I am working with a bacterial genome (454 sequencing) at the moment and would like to assign COG functional classifications to all 5,000 or so genes. I have used the RAST web server to annotate this genome. I wrote to NCBI about the COGnitor program, but they told me that it is no longer supported and that there is no way to do COG searches in batch mode. It would be fantastic if any of you could share your experiences with this. Thanks!

I think you could use the online CD-Search tool and select COG as the search database to get COG annotations for your genome.

written 6.6 years ago by Shuixia100120
Neilfws (Sydney, Australia), 8.8 years ago, wrote:

As far as I know, there is no easy web-based tool for COG assignment. As Michael suggested, you could fetch the PSSMs for the COG database from the FTP site and use rpsblast, or you could download the FASTA-format file myva from the COG FTP site and format it as a BLAST database yourself.
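For the myva route, the post-processing step could be sketched in Python like this. It is a minimal sketch under several assumptions not stated in the thread: that myva was formatted with "makeblastdb -in myva -dbtype prot", that your proteins were searched with blastp using the default 12-column "-outfmt 6" output, and that the companion whog file from the same COG FTP directory lists each COG as a "[X] COGnnnn name" line, followed by "Org:  protein IDs" member lines and a "_______" separator. The function names are hypothetical, and taking the single best hit is a much cruder rule than the one COGnitor uses.

```python
import re
from typing import Dict, Iterable, Tuple

# Assumed whog header layout: "[H] COG0001 <COG name>"
COG_HEADER = re.compile(r"^\[(\w+)\]\s+(COG\d{4})")


def parse_whog(lines: Iterable[str]) -> Dict[str, str]:
    """Map each member protein ID in a whog-style file to its COG ID."""
    protein_to_cog: Dict[str, str] = {}
    current_cog = None
    for line in lines:
        header = COG_HEADER.match(line)
        if header:
            current_cog = header.group(2)
        elif line.startswith("_"):
            current_cog = None  # "_______" closes the current COG block
        elif current_cog and ":" in line:
            # "  Org:  ID1 ID2" -> protein IDs after the organism code
            for protein_id in line.split(":", 1)[1].split():
                protein_to_cog[protein_id] = current_cog
    return protein_to_cog


def best_hit_cogs(blast_tab: Iterable[str], protein_to_cog: Dict[str, str],
                  e_max: float = 1e-5) -> Dict[str, str]:
    """Give each query the COG of its lowest-e-value myva hit below e_max."""
    best: Dict[str, Tuple[float, str]] = {}
    for line in blast_tab:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a default 12-column -outfmt 6 row
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        cog = protein_to_cog.get(subject)
        if cog is None or evalue >= e_max:
            continue
        if query not in best or evalue < best[query][0]:
            best[query] = (evalue, cog)
    return {query: cog for query, (_, cog) in best.items()}
```

Used as best_hit_cogs(open("hits.tsv"), parse_whog(open("whog"))), it returns a gene-to-COG dictionary; genes with no hit below e_max are simply left out.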

My impression is that NCBI lacks either the resources or the inclination to support COGs: it barely features in their A-Z resources list and is not regularly updated. You may want to look at KEGG instead. These days, annotating protein domains with, e.g., HMMER or InterPro seems to be a more popular approach than assigning a function to the entire protein sequence.

Well, KEGG provides an automatic annotation server (KAAS), and KO (KEGG Orthology) entries link to COG IDs, if you really want COGs. However, my feeling is that KEGG provides better options for functional annotation than COG, which never really seemed to be widely adopted or well supported.

written 8.8 years ago by Neilfws
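If you go the KEGG route, the KO-to-COG step Neil mentions could be sketched as follows. This assumes a KAAS-style assignment file with one "gene_id<TAB>KO" line per gene (genes without a KO have no second column) and a KO-to-COG table that you build yourself, for example from the COG links in KEGG's KO entries; ko_to_cogs and both function names are hypothetical.

```python
from typing import Dict, Iterable, List


def parse_ko_assignments(lines: Iterable[str]) -> Dict[str, str]:
    """Read "gene_id<TAB>KO" lines; genes without a KO are skipped."""
    assignments: Dict[str, str] = {}
    for line in lines:
        fields = line.strip().split("\t")
        if len(fields) >= 2 and fields[1]:
            assignments[fields[0]] = fields[1]
    return assignments


def genes_to_cogs(assignments: Dict[str, str],
                  ko_to_cogs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each gene to the COG IDs of its KO; one KO may link to several COGs."""
    return {gene: ko_to_cogs[ko]
            for gene, ko in assignments.items() if ko in ko_to_cogs}
```

Genes whose KO has no COG link in your table are dropped rather than guessed.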

Agreed, NCBI is favoring its own CD database over the alternatives. However, COG seems to have been updated on May 8th, so it should be fine. But how would you apply KEGG to this kind of problem?

written 8.8 years ago by Michael Schubert

I have been frantically searching the web for tools that let us do this. I found a paper describing the 'Augur proteomics pipeline', which claims to do it, but it is currently offline. Thanks for all your input, guys.

written 8.8 years ago by Neo

Excellent suggestions, Neil. Your points about COG are exactly right.

written 8.8 years ago by Khader Shameer

It seems a web-based tool for COG assignment would be very helpful. Does anyone know of anyone working on that? :D

written 8.5 years ago by Treylathe940
Michael Schubert (Cambridge, UK), 8.8 years ago, wrote:

As far as I know, there are two possible ways to solve this:

  1. Use an entirely automatic gene annotation pipeline. I know Augustus+ for eukaryotes; I'm sure someone can point you in the right direction for bacteria.

  2. Do gene prediction and classification separately. If I understand you correctly, you already have the predicted genes and just want to classify them automatically.

One possibility for this would be rpsblast (which I'm also currently working with; if there are alternatives, please let me know).

[...] that there is no way to do COG searches in batch mode

This is definitely not correct. For example, use rpsblast with the COG database:

  • download and install NCBI BLAST+
  • download the COG database as .smp files (PSSMs) from NCBI (cdd.tar.gz here; see the README for details)
  • create a COG-only rpsblast database (cf. this tutorial; ignore the Biopython part)
  • BLAST your predicted genes against your newly created database with the rpstblastn executable (or rpsblast, if your genes are already translated into proteins) and interpret the PSSM matches (easiest way: the best COG match with e-value < e_max is a specific hit; note the frame of the hit)
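The four steps above could be driven from Python roughly as follows. This is a sketch, not a tested pipeline: it assumes cdd.tar.gz was unpacked into one directory containing the COG .smp files and a Cog.pn list file naming them (check the CDD README for your release), and that the search is run with -outfmt "6 qseqid sseqid evalue bitscore stitle" so that the COG accession can be read from the subject title. It uses rpsblast for protein queries and rpstblastn for nucleotide queries; e_max is the threshold from the last step, and the 1e-5 default is arbitrary.

```python
import re
import subprocess
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

# COG accessions look like COG0001; we assume they appear in stitle.
COG_ID = re.compile(r"\bCOG\d{4}\b")


def build_commands(cdd_dir: str, query_fasta: str, out_tsv: str,
                   proteins: bool = True) -> List[List[str]]:
    """Return the makeprofiledb and search command lines (not run here)."""
    db = str(Path(cdd_dir) / "Cog")
    # Assumed: Cog.pn lists the COG .smp files shipped in cdd.tar.gz.
    make_db = ["makeprofiledb", "-in", str(Path(cdd_dir) / "Cog.pn"),
               "-out", db, "-dbtype", "rps"]
    # rpsblast searches protein queries; rpstblastn translates nucleotide
    # queries in all six frames before searching.
    program = "rpsblast" if proteins else "rpstblastn"
    search = [program, "-query", query_fasta, "-db", db,
              "-evalue", "0.01",
              "-outfmt", "6 qseqid sseqid evalue bitscore stitle",
              "-out", out_tsv]
    return [make_db, search]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in turn, stopping on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


def best_cog_hits(rows: Iterable[str],
                  e_max: float = 1e-5) -> Dict[str, Tuple[str, float]]:
    """Per query, keep the COG of the lowest-e-value hit with e < e_max."""
    best: Dict[str, Tuple[str, float]] = {}
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        query, evalue, title = fields[0], float(fields[2]), fields[4]
        match = COG_ID.search(title)
        if match is None or evalue >= e_max:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (match.group(0), evalue)
    return best
```

For example, run_all(build_commands("cdd", "genes.faa", "cog_hits.tsv")) followed by best_cog_hits(open("cog_hits.tsv")) gives one (COG, e-value) pair per gene that has a hit below e_max. Building the command lists separately from running them makes it easy to print and check them first.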
Carl, 8.5 years ago, wrote:

Sorry, can I ask some questions? I still don't know how to start my search. I have downloaded BLAST+ and the CDD database and read the user manual, but I just can't figure out where I should type those commands. After installing BLAST+, I just see a group of BLAST programs.... It is difficult for me to follow or understand. I don't know any programming language, but I have to figure out how to use BLAST to classify my identified results, because I don't have the time or patience to search the COG website one sequence at a time... Please, can somebody help me? Please tell me how to start my search... I tried to read the guide on NCBI, but as a student from a non-English-speaking country, I find there are too many words to read and it makes me impatient. Sorry, I have to say: compared with other databases, NCBI is very~very~very hard to understand...T^T.

Please ask this as a new question in another thread: http://biostar.stackexchange.com/questions/ask

written 8.5 years ago by Pierre Lindenbaum

The whole problem is that BLAST+ is the new version! They changed the commands we should use.

written 8.5 years ago by Carl
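As Carl found, BLAST+ changed the command-line interface: legacy BLAST ran most searches through a single blastall binary, with -p choosing the program, whereas BLAST+ ships one executable per program (blastp, blastn, rpsblast, ...) and replaces formatdb with makeblastdb. The option mapping for a few common flags can be sketched as below; translate_blastall is a hypothetical helper that knows only the flags in its two tables, and BLAST+ includes its own legacy_blast.pl script for real conversions.

```python
from typing import List

# Legacy blastall flag -> BLAST+ flag (a small subset only).
FLAG_MAP = {"-i": "-query", "-d": "-db", "-o": "-out", "-e": "-evalue",
            "-m": "-outfmt", "-a": "-num_threads"}
# Legacy -m alignment views -> BLAST+ -outfmt codes (subset).
OUTFMT_MAP = {"0": "0", "7": "5", "8": "6", "9": "7"}


def translate_blastall(args: List[str]) -> List[str]:
    """Turn a 'blastall -p PROGRAM ...' argv into a BLAST+ argv."""
    if len(args) < 3 or args[0] != "blastall":
        raise ValueError("expected a 'blastall -p PROGRAM ...' command")
    # Legacy options come in flag/value pairs after "blastall".
    opts = dict(zip(args[1::2], args[2::2]))
    program = opts.pop("-p")  # e.g. blastp, blastn, tblastn
    new_args = [program]
    for flag, value in opts.items():
        if flag not in FLAG_MAP:
            raise ValueError(f"no translation for {flag} in this sketch")
        if flag == "-m":
            value = OUTFMT_MAP[value]
        new_args += [FLAG_MAP[flag], value]
    return new_args
```

For example, "blastall -p blastp -i q.fa -d myva -e 1e-5 -m 8" becomes "blastp -query q.fa -db myva -evalue 1e-5 -outfmt 6".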
Nep, 8.3 years ago, wrote:

There is a program that does automated COG assignment: look into MEGAN. It's metagenomics software, but basically you could just run BLAST on all your sequences and import the results. It automatically extracts the COGs and gives you a chart of them for your reads (or genes, if you want to assemble them first). Good luck! (P.S. You might have to clean a few up, depending on how well annotated your BLAST hits are, but it's definitely a lot quicker to do it this way.)
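Once each gene has a COG ID, however it was obtained, a per-category summary similar in spirit to MEGAN's chart can be computed directly. This sketch assumes a COG-to-category table in the tab-separated layout of the 2014 COG release's cognames file (COG ID, category letters, name), where a COG may carry more than one one-letter category; both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def parse_cognames(lines: Iterable[str]) -> Dict[str, str]:
    """Read "COG_ID<TAB>category letters<TAB>name" rows into COG -> letters."""
    categories: Dict[str, str] = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip the header comment and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            categories[fields[0]] = fields[1]
    return categories


def category_counts(gene_to_cog: Dict[str, str],
                    categories: Dict[str, str]) -> Counter:
    """Count genes per one-letter category; a COG in several categories
    adds one to each, and COGs missing from the table are ignored."""
    counts: Counter = Counter()
    for cog in gene_to_cog.values():
        for letter in categories.get(cog, ""):
            counts[letter] += 1
    return counts
```

Printing sorted(category_counts(gene_to_cog, categories).items()) gives a plain-text version of the per-category chart.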

Sorry, but I tried MEGAN and found that it needs a BLAST result as input and just extracts the COG hit from it (if present). It definitely does not seem to be the program to use if we start with sequence data and want to assign COGs :(

written 8.2 years ago by Neo

Powered by Biostar version 2.3.0