Hey, I want to sort the genes/proteins from my genome of interest (Vibrio cholerae) into functional categories. One way is to use COGs. eggNOG is very nice, but I want to sort the entire genome.
I am working with a similar genome that is included in the 2020 COG database. Is there a way to obtain/download the COG list, with IDs and categories, for the entire genome from eggNOG or the NCBI server?
I found an example of what I would like to end up with, but it is not for my genome. It contains the COG IDs, categories and genes. If I could find the same thing for Vibrio cholerae, that would be incredible. https://ftp.ncbi.nih.gov/pub/wolf/COGs/COG0303/listcogs.txt
ex. 61 ||||||||--|||-|-|-|||||||||---|||||||||-||-|||||||||||||--||------ 48 H HemL COG0001 Glutamate-1-semialdehyde aminotransferase
There is another link that has sorted 678 proteins of the V. cholerae genome, but I need the entire genome. I manually checked some of the genes that weren't sorted into clusters by looking up their sequences in UniProt and eggNOG, and they do have COGs. https://www.research.cs.rutgers.edu/~seabee/cog/Vch.html
ex. VCA0906 [NT] COG0840 (578) Methyl-accepting chemotaxis protein
There should be a way to get the entire genome, as it is already sorted. I looked through the NCBI FTP site and was able to extract the V. cholerae (Vch) COG IDs along with the gene names. However, I'm not sure how to get the corresponding categories. https://ftp.ncbi.nih.gov/pub/COG/COG2020/data/
ex.
VC0626 COG0001
VC2644 COG0002
VC0067 COG0006
How can I get the COG category and ID for every gene in the genome?
It appears that the COG database has been updated in 2020. Unfortunately I can't get the search to work at NCBI. You may want to write to the web contact and let them know.
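In the meantime, once you have gene/COG pairs like the ones in the question, attaching categories is just a table lookup. Here is a minimal Python sketch, assuming you also have a tab-separated COG definition table whose first two columns are the COG ID and the category letters. That layout is an assumption, so check the README in the COG2020 data directory for the actual file and column order. The function names are made up for illustration, and the toy rows come from the examples quoted in the question.

```python
import csv

def load_cog_categories(def_lines):
    """Build a COG ID -> category-letters map from tab-separated lines.

    Assumed layout (verify against the real file): column 0 is the
    COG ID, column 1 is the category string, e.g. "H" or "NT".
    """
    categories = {}
    for row in csv.reader(def_lines, delimiter="\t"):
        if len(row) >= 2:
            categories[row[0]] = row[1]
    return categories

def annotate_genes(gene_cog_pairs, categories):
    """Attach a category to each (gene, COG ID) pair; "?" if the COG is not in the table."""
    return [(gene, cog, categories.get(cog, "?")) for gene, cog in gene_cog_pairs]

# Toy definition table built only from the examples quoted in the question.
DEF_TEXT = (
    "COG0001\tH\tGlutamate-1-semialdehyde aminotransferase\n"
    "COG0840\tNT\tMethyl-accepting chemotaxis protein\n"
)
categories = load_cog_categories(DEF_TEXT.splitlines())

# Gene/COG pairs as extracted from the FTP data in the question.
pairs = [("VC0626", "COG0001"), ("VC2644", "COG0002"), ("VCA0906", "COG0840")]
annotated = annotate_genes(pairs, categories)

for gene, cog, category in annotated:
    print(gene, cog, category, sep="\t")
```

COG0002 comes out as "?" here only because the toy table does not include it. With the full definition table loaded from disk (pass the open file object instead of `DEF_TEXT.splitlines()`), every COG in the release should resolve.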
Hey GenoMax!
Thank you for the reply. I just sent them an email asking about the database. Do you know of an alternative? I would love to be able to go through the entire process of BLASTing my genome against the database and retrieving the COG output for my genome, but this is beyond my skill and I cannot figure it out. Ultimately I just need a pie chart of the gene/protein functions for the genome and for my list of 300 candidate genes. That's why the genome in the 2020 database will do just fine.
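For the pie chart itself, all that is needed is a count of genes per category letter, once for the whole genome and once for the candidate list. A rough Python sketch, assuming a gene-to-category mapping has already been built. The function name is hypothetical, `VC9999` is a made-up locus tag standing in for a candidate with no COG, and counting a multi-letter category such as "NT" once under each letter is a design choice rather than an official rule.

```python
from collections import Counter

def count_categories(gene_to_category, genes=None):
    """Count genes per COG category letter.

    gene_to_category: dict of gene ID -> category string (e.g. "H", "NT").
    genes: optional subset of gene IDs (e.g. a candidate list);
           defaults to every gene in the mapping.
    Genes without a category are tallied under "unassigned".
    """
    counts = Counter()
    selected = gene_to_category if genes is None else genes
    for gene in selected:
        letters = gene_to_category.get(gene)
        if not letters:
            counts["unassigned"] += 1
            continue
        # A gene in a multi-letter category contributes to each letter.
        for letter in letters:
            counts[letter] += 1
    return counts

# Toy mapping from the two examples quoted in the question.
gene_to_category = {"VC0626": "H", "VCA0906": "NT"}

genome_counts = count_categories(gene_to_category)
candidate_counts = count_categories(gene_to_category, genes=["VCA0906", "VC9999"])

print(dict(genome_counts))
print(dict(candidate_counts))
```

The resulting counts can be handed to any plotting tool, for example as the values and labels for `matplotlib.pyplot.pie`, or pasted into a spreadsheet.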