Question: COG assignment using COGNITOR (COGsoft)
dago (Germany) wrote 4.9 years ago:

I am assigning COG categories to the proteins of newly sequenced bacterial genomes.

To do that I am using COGNITOR from the COGsoft package.

I used the protein sequences and the other files available on the NCBI FTP server to create the BLAST database and the "sequence universe" required by the program.

Following the instructions, I got stuck at the last step, where the program COGcognitor is called. This is what the manual says:

"To run COGNITOR you need a COG domain assignment file (as described in 2.10.). If your file is called COGs.csv, the following command will be used:

$ COGcognitor -i=./BLASTcogn -t=COGs.csv -q=GenQuery.p2o.csv -o=GenQuery.COG.csv # COGNITOR results in GenQuery.COG.csv"

I cannot work out what the file COGs.csv is supposed to contain. Does it have the format <protein_ID>,<COG_category>?

If so, can I use the "whog" file in the COG section of the NCBI FTP server to create COGs.csv?

Honestly, I find the instructions for the program quite hard to interpret.

 


Hi Dago,

Did you solve your problem? If so, could you share some ideas on how to assign the proteins of a newly sequenced bacterial genome to clusters?

Reply written 4.5 years ago by HG
Siva (United States) wrote 4.9 years ago:

First of all, the COG FTP files you are using are from an older version. A newer version was published very recently. The file you need to run COGcognitor is in the 'data' directory of the above-linked FTP site and is called 'cog2003-2014.csv'.

It has the following format:

<prot-id>,<genome-id>,<source-prot-id>,<source-prot-length>,<source-prot-start>,<source-prot-end>,<cluster-id>,
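
To make the column order concrete, here is a minimal Python sketch that reads lines in this layout and collects the cluster IDs per protein. It assumes only the seven named fields in the order listed above and ignores anything after them; the function name `read_cog_assignments` and the example rows are made up for illustration and are not real COG data.

```python
import csv
import io
from collections import defaultdict


def read_cog_assignments(handle):
    """Map each protein ID to the set of COG cluster IDs assigned to it."""
    prot_to_cogs = defaultdict(set)
    for row in csv.reader(handle):
        if len(row) < 7:
            continue  # skip blank or malformed lines
        prot_id = row[0].strip()      # field 1: <prot-id>
        cluster_id = row[6].strip()   # field 7: <cluster-id>
        prot_to_cogs[prot_id].add(cluster_id)
    return dict(prot_to_cogs)


# Made-up rows in the column order listed above (not real COG data).
example = io.StringIO(
    "1001,GenomeA,1001,350,1,350,COG0001,\n"
    "1002,GenomeA,1002,500,1,240,COG0002,\n"
    "1002,GenomeA,1002,500,241,500,COG0003,\n"
)
assignments = read_cog_assignments(example)
for prot_id, cogs in sorted(assignments.items()):
    print(prot_id, sorted(cogs))
```

A protein can appear on more than one line when different regions of it are assigned to different clusters, which is why the sketch stores a set per protein.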

Thanks for your answer. In that case, which COG.p2o.csv should I use?

Reply written 4.9 years ago by dago

I agree that the instructions in the Readme file are indeed difficult to follow. COG.p2o.csv is the query file you create as described in "3.2.1. Preparation of the sequence universe" in the Readme file (Readme.2012.04.txt). You create a file in the format "<prot-id>,<genome-id>". I usually use the protein GI and the taxID of my query sequences to create that file. You also need to run COGmakehash to create "BLASTcogn", as described in section 3.2.1. Then you will have everything you need to run COGcognitor.
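
As an illustration of the "<prot-id>,<genome-id>" file, here is a minimal Python sketch that builds such lines from a protein FASTA file. It is only a sketch under assumptions: the protein ID is taken to be the first whitespace-delimited token of each FASTA header, one genome ID is supplied by hand for the whole file, and the function name `fasta_to_p2o` and the example IDs are hypothetical. Whether these IDs match what the rest of the COGsoft pipeline expects depends on how the BLAST step was run.

```python
import io


def fasta_to_p2o(fasta_handle, genome_id):
    """Return one "<prot-id>,<genome-id>" line per FASTA record."""
    lines = []
    for line in fasta_handle:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        header = line[1:].strip()
        if not header:
            continue  # header with no ID, nothing to record
        prot_id = header.split()[0]  # assumption: ID is the first token
        lines.append(f"{prot_id},{genome_id}")
    return lines


# Made-up FASTA records for illustration only.
example_fasta = io.StringIO(
    ">prot0001 hypothetical protein\n"
    "MKVLAAGIVGLLLA\n"
    ">prot0002 putative transporter\n"
    "MSTNPKPQRKTKRN\n"
)
for p2o_line in fasta_to_p2o(example_fasta, "MyGenome"):
    print(p2o_line)
```

With a real file, the handle would come from `open("proteins.fasta")` (a hypothetical file name) and the returned lines would be written to GenQuery.p2o.csv.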

There are two things I don't understand. First, section 3.2.1 mentions a 'COG.p2o.csv' file, but there are no instructions on how to create it. If I had to guess, it is the list of COG IDs and their names (first and third columns) from the file "ftp://ftp.ncbi.nih.gov/pub/COG/COG2014/data/cognames2003-2014.tab". Second, the newer version of COG has 8 columns in 'cog2003-2014.csv'. The last column is new, and it is not clear what it is. I don't know whether this will break the COGsoft program, which I don't think has been updated. You might want to confirm with the authors ( cogs at ncbi.nlm.nih.gov ).
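
Before contacting the authors, it can help to check how many fields each line of the assignment file actually has. The Python sketch below only counts comma-separated fields per line; it says nothing about whether COGcognitor accepts the extra column. The function name `column_count_histogram` and the example rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def column_count_histogram(handle):
    """Count how many lines have each number of comma-separated fields."""
    counts = Counter()
    for row in csv.reader(handle):
        if row:  # csv.reader yields [] for blank lines
            counts[len(row)] += 1
    return counts


# Made-up rows: two with 8 fields, one with 7 (not real COG data).
example = io.StringIO(
    "1001,GenomeA,1001,350,1,350,COG0001,0\n"
    "1002,GenomeA,1002,500,1,240,COG0002,0\n"
    "\n"
    "1003,GenomeB,1003,410,1,410,COG0001\n"
)
histogram = column_count_histogram(example)
for n_fields, n_lines in sorted(histogram.items()):
    print(f"{n_fields} fields: {n_lines} line(s)")
```

If every line reports the same field count, the file is at least internally consistent, which is useful to mention when asking the authors about the new column.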

Reply written 4.9 years ago by Siva
514033532 (China) wrote 4.8 years ago:

I don't understand either. How do I create 'COG.p2o.csv' and GenQuery.p2o.csv? Where is the protocol? 514033532@qq.com

rotoli (European Union) wrote 4.7 years ago:

I find the Readme file really difficult to follow. Besides, I am a beginner and I want to assign COGs to my annotated genome (only one, and not from NCBI). I made a <protein-id>,<genome-id> file, but what is the first step to begin with, according to the README file? Also, the commands don't work. Any help?


Hi, Rotoli.

Did you figure out how to use COGsoft? I have a problem similar to the one you had. I want to assign COGs to my annotated bacterial genome. I finished the BLAST steps and made the tmp.p2o.csv file by concatenating the GenQuery.p2o.csv and COG.p2o.csv files. After that I made the hash.csv file using COGmakehash. But then I got stuck at the COGreadblast step, which produces empty files from my input files. What did I do wrong? Can you share your experience?
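
The concatenation step described above can be sketched in Python with some basic sanity checks added. This is not a diagnosis of the empty COGreadblast output; it only verifies that every line has the two-field "<prot-id>,<genome-id>" shape and that no protein ID occurs twice. The function name `merge_p2o`, the source labels, and the example lines are hypothetical.

```python
def merge_p2o(*sources):
    """Concatenate p2o line lists, reporting malformed lines and duplicate IDs.

    Each source is a (label, lines) pair. Flagged lines are left out of the
    merged result and described in the returned list of problems.
    """
    merged, problems, seen = [], [], set()
    for label, lines in sources:
        for number, raw in enumerate(lines, start=1):
            line = raw.strip()
            if not line:
                continue  # ignore blank lines
            fields = [field.strip() for field in line.split(",")]
            if len(fields) != 2 or not all(fields):
                problems.append(f"{label}:{number}: malformed line {line!r}")
                continue
            prot_id = fields[0]
            if prot_id in seen:
                problems.append(f"{label}:{number}: duplicate ID {prot_id}")
                continue
            seen.add(prot_id)
            merged.append(line)
    return merged, problems


# Made-up lines standing in for GenQuery.p2o.csv and COG.p2o.csv.
query_lines = ["prot0001,MyGenome\n", "prot0002,MyGenome\n"]
cog_lines = ["1001,GenomeA\n", "prot0001,GenomeB\n", "not a p2o line\n"]
merged, problems = merge_p2o(("query", query_lines), ("cog", cog_lines))
print("\n".join(merged))
print("\n".join(problems))
```

With real files, each list of lines would come from reading the file, and the merged lines would be written to tmp.p2o.csv.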

Reply written 3.8 years ago by h.botond