COG assignment using COGNITOR (COGsoft)
7.6 years ago
dago ★ 2.7k

I am assigning COG categories to the proteins of newly sequenced bacterial genomes.

To do that I am using COGNITOR from the COGsoft package.

I used the protein sequences and the other files available on the NCBI FTP server to create the BLAST database and the "sequence universe" required by the program.

Following the instructions, I got stuck at the last step, where the program COGcognitor is called. This is what the manual says:

To run COGNITOR you need a COG domain assignment file (as described in 2.10.). If your file is called COGs.csv, the following command will be used:

$ COGcognitor -i=./BLASTcogn -t=COGs.csv -q=GenQuery.p2o.csv -o=GenQuery.COG.csv # COGNITOR results in GenQuery.COG.csv


I cannot work out what the file COGs.csv should contain. Does it have the format <protein_ID>,<COG_category>?

If so, can I use the "whog" file in the COG section of the NCBI FTP server to create COGs.csv?

Honestly, I find the program's instructions quite hard to interpret.

COG Annotation genome • 11k views

I don't understand this either. How are 'COG.p2o.csv' and GenQuery.p2o.csv created? Where is the protocol? 514033532@qq.com


Hi Dago,

Did you solve your problem? If so, could you share some ideas on how to assign the proteins of a newly sequenced bacterial genome to clusters?

7.6 years ago
Siva ★ 1.8k

First of all, the COG FTP files you are using are from an older version. A newer version was published very recently. The file you need to run COGcognitor is on the above linked FTP site, under the 'data' directory, and it is called 'cog2003-2014.csv'.

It has the following format:

<prot-id>,<genome-id>,<source-prot-id>,<source-prot-length>,<source-prot-start>,<source-prot-end>,<cluster-id>,
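
To make the layout above concrete, here is a minimal Python sketch that reads rows in that seven-field format and groups protein IDs by cluster. The field names simply mirror the placeholders listed above; the sample rows are made up for illustration and are not taken from 'cog2003-2014.csv'. The reader keeps only the first seven columns, so a trailing comma or an extra column does not trip it up.

```python
import csv
import io
from collections import defaultdict

# Field names mirror the placeholders in the format line above.
FIELDS = [
    "prot_id", "genome_id", "source_prot_id", "source_prot_length",
    "source_prot_start", "source_prot_end", "cluster_id",
]

def read_domain_assignments(handle):
    """Yield one dict per row, keeping only the first seven columns."""
    for row in csv.reader(handle):
        if len(row) < len(FIELDS):
            continue  # skip blank or truncated lines
        # zip() stops after seven fields, so trailing cells are ignored
        yield dict(zip(FIELDS, (cell.strip() for cell in row)))

def proteins_per_cluster(handle):
    """Map each cluster ID to the set of protein IDs assigned to it."""
    clusters = defaultdict(set)
    for rec in read_domain_assignments(handle):
        clusters[rec["cluster_id"]].add(rec["prot_id"])
    return dict(clusters)

# Made-up rows, for illustration only.
sample = io.StringIO(
    "101,GenomeA,101,250,1,250,COG0001,\n"
    "102,GenomeA,102,400,1,180,COG0002,\n"
    "103,GenomeB,103,300,1,300,COG0001,\n"
)
clusters = proteins_per_cluster(sample)
for cluster_id in sorted(clusters):
    print(cluster_id, sorted(clusters[cluster_id]))
```

For a real file, passing an open file handle, for example `open("cog2003-2014.csv", newline="")`, in place of `sample` should work the same way.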

Thanks for your answer. In the case you suggest, which COG.p2o.csv should I use then?


I agree that the instructions in the Readme file are indeed difficult to follow. COG.p2o.csv is the query file you create as described in "3.2.1. Preparation of the sequence universe" in the Readme file (Readme.2012.04.txt). You create a file in the format <prot-id>,<genome-id>. I usually use the protein GI and the taxID of my query sequences to create that file. You also need to run COGmakehash to create "BLASTcogn" as described in section 3.21. Then you will have everything you need to run COGcognitor.

There are two things I don't understand. First, section 3.2.1 mentions the 'COG.p2o.csv' file, but there are no instructions on how to create it. If I had to guess, it is the list of COG IDs and their names (first and third columns) from the file "ftp://ftp.ncbi.nih.gov/pub/COG/COG2014/data/cognames2003-2014.tab". Second, the newer version of COG has 8 columns in 'cog2003-2014.csv'. The last column is new and it is not clear what it contains. I don't know if this will break the COGsoft program, which I don't think has been updated. You might want to confirm with the authors (cogs at ncbi.nlm.nih.gov).
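
As a rough illustration of building a <prot-id>,<genome-id> file, the Python sketch below derives one line per protein from a protein FASTA file. It assumes the protein ID is the first whitespace-delimited token of each FASTA header and that a single genome label applies to every protein; the function name `fasta_to_p2o`, the IDs, and the label "MyGenome" are all hypothetical. The IDs presumably need to match the ones that appear in your BLAST output, so adjust the header parsing if yours are formatted differently.

```python
import io

def fasta_to_p2o(fasta_handle, genome_id):
    """Return '<prot-id>,<genome-id>' lines, one per FASTA record.

    Assumption: the protein ID is the first token after '>' in the header.
    """
    lines = []
    for line in fasta_handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # ignore headers that contain only '>'
                lines.append(f"{fields[0]},{genome_id}")
    return lines

# Made-up FASTA content, for illustration only.
fasta = io.StringIO(
    ">prot0001 hypothetical protein\n"
    "MKTAYIAKQR\n"
    ">prot0002 some annotated product\n"
    "MSLNNQWERT\n"
)
p2o_lines = fasta_to_p2o(fasta, "MyGenome")
print("\n".join(p2o_lines))
```

Writing `p2o_lines` to a file such as GenQuery.p2o.csv, one entry per line, would then give a file in the format described above.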

7.4 years ago
rotoli • 0

I find the Readme file really difficult to follow. I am a beginner and I want to assign COGs to a single annotated genome of my own, which is not from NCBI. I made a <protein-id>,<genome-id> file, but which step should I start with, according to the README file? Also, the commands don't work. Any help?


Hi, Rotoli.

Did you figure out how to use COGsoft? I have a problem similar to yours. I want to assign COGs to my annotated bacterial genome. I finished the BLAST steps and made the tmp.p2o.csv file by concatenating the GenQuery.p2o.csv and COG.p2o.csv files. After that I made the hash.csv file using COGmakehash. But then I got stuck at the COGreadblast step, which produces empty files from my input files. What did I do wrong? Can you share your experience?
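
For reference, the concatenation step mentioned above can be sketched in Python as follows. This is only an illustration of merging two <prot-id>,<genome-id> lists; it is not a diagnosis of the empty COGreadblast output. Unlike a plain concatenation, this hypothetical `merge_p2o` helper keeps the first occurrence of each protein ID and reports any repeats, which is a convenience added here and not something the Readme is known to require.

```python
def merge_p2o(*sources):
    """Merge several iterables of '<prot-id>,<genome-id>' lines.

    Returns (merged_lines, duplicate_ids). Blank lines are dropped and
    only the first line seen for each protein ID is kept.
    """
    merged = []
    seen = set()
    duplicates = []
    for lines in sources:
        for line in lines:
            line = line.strip()
            if not line:
                continue
            prot_id = line.split(",", 1)[0]
            if prot_id in seen:
                duplicates.append(prot_id)
                continue
            seen.add(prot_id)
            merged.append(line)
    return merged, duplicates

# Made-up entries, for illustration only.
query_lines = ["q1,MyGenome\n", "q2,MyGenome\n"]
cog_lines = ["c1,GenomeA\n", "\n", "q2,MyGenome\n"]
merged, duplicates = merge_p2o(query_lines, cog_lines)
print(merged)
print(duplicates)
```

Open file handles can be passed directly, for example `merge_p2o(open("GenQuery.p2o.csv"), open("COG.p2o.csv"))`, and the merged lines written out as tmp.p2o.csv.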