Question: How to assign a massive number of proteins to Pfam using R
4 months ago, zhangchi2015012290 wrote:

Hi! I'm doing bacterial pan-genome research involving thousands of genomes, and I'm trying to assign every protein in every genome to Pfam. I know there are tools like the NCBI CDD database, but I don't know how to script the search, since the website only lets you search 4000 proteins at a time. Is there an R package for this job, or any other convenient method?

Tags: pfam, pan-genome, R
4 months ago, Jean-Karim Heriche (EMBL Heidelberg, Germany) wrote:

There's the pfam_scan.pl Perl script for this. Here is a quick tutorial.
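Once pfam_scan.pl has run, its output still has to be turned into a protein-to-family table. The following is a minimal sketch of that step, assuming the usual whitespace-separated output in which `#` lines are comments, column 1 is the sequence ID, column 6 the Pfam accession and column 7 the family name; check this against the header of your own output file. The function name `parse_pfam_scan` and the example lines are made up for illustration.

```python
from collections import defaultdict


def parse_pfam_scan(lines):
    """Map each sequence ID to a list of (accession, name) pairs.

    Assumes whitespace-separated columns with the sequence ID in
    column 1, the HMM accession in column 6 and the HMM name in
    column 7 (an assumption about the pfam_scan.pl output layout).
    """
    hits = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 7:
            continue  # not a hit line in the assumed layout
        hits[fields[0]].append((fields[5], fields[6]))
    return dict(hits)


# Toy input with placeholder accessions, not real Pfam results.
example = [
    "# <seq id> <alignment start> <alignment end> ...",
    "",
    "prot1 5 120 3 122 PF00000.1 toy_a Family 1 118 268 85.3 1.2e-24 1 No_clan",
    "prot1 150 210 148 212 PF99999.2 toy_b Domain 2 60 61 40.1 3.4e-10 1 No_clan",
    "prot2 1 90 1 92 PF00000.1 toy_a Family 3 95 268 55.0 2.0e-15 1 No_clan",
]
table = parse_pfam_scan(example)
```

In practice you would pass an open file handle instead of the `example` list, since the function only needs an iterable of lines.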


Thank you for your answer. This method seems too slow for me; maybe I should cluster my proteins first.

Reply written 4 months ago by zhangchi2015012290

Have you considered parallelizing? If you cluster the sequences then you could derive a profile HMM for each cluster and use something like HHsearch to compare these to the Pfam profiles.
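One way to parallelize, sketched here under stated assumptions rather than as a tested pipeline, is to split the FASTA file into chunks and run one pfam_scan.pl process per chunk. The `-fasta`, `-dir` and `-outfile` flags are taken from the pfam_scan.pl usage text as I recall it, so verify them against your installed copy; all function names and file paths below are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_records(fasta_text):
    """Split FASTA text into records, each starting at a '>' header."""
    records, current = [], []
    for line in fasta_text.splitlines():
        if line.startswith(">") and current:
            records.append("\n".join(current))
            current = []
        if line.strip():
            current.append(line)
    if current:
        records.append("\n".join(current))
    return records


def chunk_records(records, n_chunks):
    """Distribute records round-robin; drop chunks that end up empty."""
    chunks = [records[i::n_chunks] for i in range(n_chunks)]
    return [chunk for chunk in chunks if chunk]


def build_command(fasta_path, pfam_dir, out_path):
    # Flags assumed from the pfam_scan.pl usage text; check your version.
    return ["pfam_scan.pl", "-fasta", str(fasta_path),
            "-dir", str(pfam_dir), "-outfile", str(out_path)]


def run_parallel(fasta_file, pfam_dir, work_dir, n_chunks=4):
    """Write chunk files and scan them concurrently."""
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    records = read_records(Path(fasta_file).read_text())
    commands = []
    for i, chunk in enumerate(chunk_records(records, n_chunks)):
        chunk_path = work / f"chunk_{i}.fasta"
        chunk_path.write_text("\n".join(chunk) + "\n")
        out_path = work / f"chunk_{i}.pfam.txt"
        commands.append(build_command(chunk_path, pfam_dir, out_path))
    # Threads are enough: each worker only waits on an external process.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        list(pool.map(lambda cmd: subprocess.run(cmd, check=True), commands))
```

A call such as `run_parallel("proteins.fasta", "pfam_db", "chunks", n_chunks=8)` would then leave one result file per chunk in the working directory. On a cluster, the same chunk files could be submitted as separate jobs instead.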

Reply written 4 months ago by Jean-Karim Heriche

I just realized that I can download Pfam and COG information from the IMG database, where it has already been assigned to proteins. I haven't tried your method yet, but it sounds plausible. I'll accept your answer.

Reply written 4 months ago by zhangchi2015012290