Making a COG database: .smp criteria
8.1 years ago
sanrrone ▴ 40

Hi guys, I’m trying to set up a COG database using the 2014 updated data (from ftp://ftp.ncbi.nih.gov/pub/COG/COG2014/). As I understand it, one needs to run the proteins from the COGs against a database of the same proteins using PSI-BLAST. That gives as many .smp files as there are queries, and those files can then be passed to makeprofiledb to create an RPS-BLAST database.

So far I’ve done the following:

1. Downloaded the protein file prot2003-2014.fa.gz.
2. Created a BLAST database from the extracted file (makeblastdb).
3. Split the prot2003-2014.fa.gz multi-FASTA file into single-sequence FASTA files, so I can use each one as a PSI-BLAST query and get an individual .smp file for it.

The resulting .smp files have hits against one or more sequences in the DB. I could limit the hits by lowering the e-value threshold so that each query's .smp file has only one hit.

My question is: does it really matter whether an .smp file has hits against only one protein or against several? Put another way, what’s an acceptable e-value for PSI-BLAST? I know there’s no one-size-fits-all e-value, but customarily I would use 10E-5 for finding orthologs. In this case, if I only want to keep one hit per .smp, I need to tighten the e-value threshold to 10E-100.
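For reference, the three steps plus the later PSI-BLAST and makeprofiledb calls can be sketched roughly as below. This is only a sketch under assumptions: the file naming (`query_0000001.fa`), the helper names, and the default of 3 iterations are made up for illustration, and the flag names are the BLAST+ ones as I understand them, so check them against `psiblast -help` and `makeprofiledb -help` on your installation. The functions only build the command lists; nothing is executed here.

```python
from pathlib import Path


def split_fasta(fasta_text, out_dir):
    """Write each record of a multi-FASTA string to its own file; return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    records = []  # one list of lines per FASTA record
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            records.append([line])
        elif records and line.strip():
            records[-1].append(line)
    paths = []
    for number, lines in enumerate(records, start=1):
        # Hypothetical naming scheme, chosen only for this example.
        path = out_dir / f"query_{number:07d}.fa"
        path.write_text("\n".join(lines) + "\n")
        paths.append(path)
    return paths


def psiblast_command(query, db, evalue=1e-5, iterations=3):
    """Build (not run) a psiblast call that saves a PSSM checkpoint as .smp."""
    query = Path(query)
    return [
        "psiblast",
        "-query", str(query),
        "-db", str(db),
        "-evalue", str(evalue),            # threshold for reported hits
        "-inclusion_ethresh", str(evalue), # threshold for hits used in the PSSM
        "-num_iterations", str(iterations),
        "-out_pssm", str(query.with_suffix(".smp")),
        "-out", str(query.with_suffix(".out")),
    ]


def write_profile_list(smp_paths, pn_path):
    """Write the list of .smp files (one per line) that makeprofiledb reads."""
    pn_path = Path(pn_path)
    pn_path.write_text("".join(f"{p}\n" for p in smp_paths))
    return pn_path


def makeprofiledb_command(pn_path, out_name):
    """Build (not run) the makeprofiledb call for an RPS-BLAST database."""
    return ["makeprofiledb", "-in", str(pn_path), "-out", str(out_name),
            "-dbtype", "rps"]
```

Each returned list can be handed to `subprocess.run(cmd, check=True)` once BLAST+ is on the PATH.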

regards

blast alignment sequence gene • 2.7k views
8.1 years ago
sanrrone ▴ 40

Well, reading the forum I found a possible solution. In this paper, http://bmcevolbiol.biomedcentral.com/articles/10.1186/1471-2148-5-41, they used an e-value of 10^-5 to make a COG database. It is also true that no universal e-value exists; the right value depends on your database. So, to find a suitable e-value for your own database, you need to sample it with random queries, look at how the resulting e-values are distributed, and pick the threshold from that.
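A rough sketch of that sampling idea, assuming the sampled queries were searched with tabular output (`-outfmt 6`, where the first column is the query ID and the eleventh is the e-value). The function names and the fixed seed are just for illustration; this is not taken from the paper.

```python
import random


def sample_queries(ids, k, seed=0):
    """Pick up to k query IDs at random (seeded so the sample is repeatable)."""
    ids = list(ids)
    return random.Random(seed).sample(ids, min(k, len(ids)))


def hits_per_query(tabular_text, threshold):
    """Count, per query, the hits with e-value <= threshold in outfmt 6 text."""
    counts = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, evalue = fields[0], float(fields[10])
        counts.setdefault(query, 0)  # keep queries that end up with 0 hits
        if evalue <= threshold:
            counts[query] += 1
    return counts


def single_hit_fraction(counts):
    """Fraction of queries left with exactly one hit."""
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n == 1) / len(counts)
```

Running `hits_per_query` over a range of thresholds (say 1e-5, 1e-20, 1e-50, 1e-100) on the sampled results shows how quickly the hit counts drop, which is the distribution you would base the cutoff on.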

regards

7.5 years ago

Hi,

I am trying to do COG annotation and have got stuck at the .smp generation step. I created single-query files from the prot2003-2014.fa.gz file. Could you please explain how to generate the RPS-BLAST database step by step?

Also, how can one get the COG family numbers and IDs? Do I have to match the RPS-BLAST results against another file in ftp://ftp.ncbi.nih.gov/pub/COG/COG2014/?
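If it does come down to matching against a membership table, the lookup itself is simple. The sketch below assumes a comma-separated file in which one column holds the protein ID and another the COG ID; the default column positions (third and seventh) are my assumption about the COG2014 membership CSV and should be checked against the readme on the FTP site. It also assumes the subject IDs in the tabular RPS-BLAST output (`-outfmt 6`) are the same protein IDs used in that file.

```python
import csv
import io


def load_protein_to_cog(csv_text, protein_col=2, cog_col=6):
    """Map protein ID -> set of COG IDs from a membership CSV.

    Column indices are 0-based and are assumptions; adjust them to the
    layout described in the readme of the release you downloaded.
    """
    mapping = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) <= max(protein_col, cog_col):
            continue  # skip short or empty lines
        protein = row[protein_col].strip()
        mapping.setdefault(protein, set()).add(row[cog_col].strip())
    return mapping


def annotate_hits(tabular_text, mapping):
    """Return (query, subject, sorted COG IDs) for each outfmt 6 line."""
    results = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        results.append((query, subject, sorted(mapping.get(subject, ()))))
    return results
```

A subject that is missing from the table comes back with an empty list, which makes ID mismatches between the two files easy to spot.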

thanks


You just need to read the instructions for COGsoft (COGnitor): https://sourceforge.net/projects/cogtriangles/

PS: do not add a question to an old post. Open a new post instead.
