Question: What does interproscan add to gene annotation?
2.0 years ago, grayapply2009 (United States) wrote:

I am doing de novo transcriptome annotation. I have finished BLASTing against the nr database and imported all the results into Blast2GO. I'm trying to run InterProScan in Blast2GO, but it is very slow. At this rate, InterProScan is going to take me another month (the BLAST against nr has already taken me 1 month).

So I'm wondering what else InterProScan adds to my transcriptome annotation. Can I skip this step and map the GO terms directly?

It's not necessary to run InterProScan, but I would strongly recommend it. You'd get results based on protein signatures, whereas BLAST is based on sequence similarity. Running both would give you complementary information that would likely improve your annotation.
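To make "complementary" concrete, here is a minimal sketch of combining GO terms from the two sources while keeping track of which source supports each term. The function name, the dictionary layout and the toy transcript IDs are all assumptions for illustration; this is not how Blast2GO merges annotations internally.

```python
from collections import defaultdict


def merge_go_annotations(blast_go, interpro_go):
    """Union GO terms per transcript and record which source(s) support each term.

    Both inputs are hypothetical dicts: transcript ID -> set of GO term IDs.
    """
    merged = defaultdict(dict)
    for source, annotations in (("blast", blast_go), ("interpro", interpro_go)):
        for transcript_id, go_terms in annotations.items():
            for term in go_terms:
                # setdefault creates the evidence set the first time a term is seen
                merged[transcript_id].setdefault(term, set()).add(source)
    return {tid: dict(terms) for tid, terms in merged.items()}


# Toy input, made up for this example.
blast_go = {"tx1": {"GO:0005524"}, "tx2": {"GO:0003677"}}
interpro_go = {"tx1": {"GO:0005524", "GO:0004672"}, "tx3": {"GO:0016020"}}

merged = merge_go_annotations(blast_go, interpro_go)
for transcript_id in sorted(merged):
    for term in sorted(merged[transcript_id]):
        print(transcript_id, term, sorted(merged[transcript_id][term]))
```

In this toy case tx3 has no BLAST-derived term at all, so it would stay unannotated without the signature-based search, and tx1 gains a term that BLAST alone did not provide.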

I can say from experience that running it won't take nearly as long as BLASTing the nr db. It might be easier to just download InterProScan and run it separately from Blast2GO.

written 2.0 years ago by matt.sarrasin

Thank you for your reply, Matt. I've considered running InterProScan locally, but unfortunately we don't have any Linux workstations in our lab. All we have is a Mac workstation. I guess I'll have to spend another month on it.

written 2.0 years ago by grayapply2009

I've never used Blast2GO before, but InterProScan shouldn't take long because the dbs it pulls from (Pfam-A, PROSITE, etc.) aren't nearly as big (in total) as the nr db. You should also be able to choose which dbs to search, e.g. you could restrict the run to just Pfam, and that would drastically reduce your computation time.
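As a rough sketch of what restricting the search looks like outside Blast2GO, the snippet below assembles a standalone InterProScan 5 command limited to Pfam. It assumes the InterProScan 5 command-line options `-i`, `-appl`, `-f`, `-goterms`, `-iprlookup`, `-b` and `-cpu`, and it assumes protein input (e.g. predicted ORFs); the file names and the helper function are hypothetical, so check `interproscan.sh --help` for your installed version before relying on it.

```python
import shlex


def build_interproscan_command(fasta_path, applications=("Pfam",),
                               output_base="ips_out", cpus=4):
    """Return an argument list for a standalone InterProScan 5 run.

    The helper and its defaults are illustrative assumptions, not part of InterProScan.
    """
    return [
        "interproscan.sh",
        "-i", fasta_path,                  # input FASTA (assumed to be protein sequences)
        "-appl", ",".join(applications),   # restrict the run to the listed analyses
        "-f", "tsv",                       # tab-separated output
        "-goterms",                        # report GO terms for matched entries
        "-iprlookup",                      # look up the corresponding InterPro entries
        "-b", output_base,                 # base name for the output file(s)
        "-cpu", str(cpus),                 # number of cores to use
    ]


cmd = build_interproscan_command("proteins.fasta")
print(shlex.join(cmd))
# The list could then be handed to subprocess.run(cmd, check=True) on a machine
# where interproscan.sh is installed and on the PATH.
```

Building the command as a list rather than one long string keeps file names with spaces safe and makes it easy to add more analyses later by extending `applications`.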

written 2.0 years ago by matt.sarrasin

It is a great idea to pick only a few main databases for InterProScan. I noticed that the default selection for InterProScan in Blast2GO includes 17 different databases. I'm not clear on the differences among them or which ones I should use. Can you please give me some advice?

The databases in Blast2GO are: blastprodom, fprintscan, hmmpir, hmmpfam, hmmsmart, hmmtigr, profilescan, hamap, patternscan, superfamily, signalphmm, tmhmm, hmmpanther, gene3d, phobius and coils.

written 2.0 years ago by grayapply2009

Including all 17 would certainly increase the computation time. Each can independently add evidence to support an annotation, and some are self-explanatory (superfamily assigns structural superfamilies, signalp predicts signal peptides, etc.). hmmpanther can be useful if you'd like to explore the PANTHER web suite. tmhmm, gene3d, phobius and coils provide information about structure and signal sequences (transmembrane helices, CATH structural domains, transmembrane topology plus signal peptides, and coiled-coil regions, respectively).

I would say hmmpfam would be a great start, since it is likely to be more useful in downstream analyses. I make good use of Pfam domains in my pipeline. However, it's only a start. You may want to include more evidence as you go on, depending on the kinds of questions you are asking of your data, e.g. consolidating information from the signal peptide/secondary structure searches to infer the potential localization of proteins.
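For downstream use of the Pfam hits, something like the following sketch may help. It assumes the tab-separated output of InterProScan 5, where column 1 is the protein ID, column 4 the analysis name, columns 5 and 6 the signature accession and description, and columns 7 and 8 the match start and stop. The function name and the two example rows are made up for illustration, and the column layout should be verified against your own output file.

```python
import csv
import io
from collections import defaultdict


def pfam_domains_by_protein(tsv_handle):
    """Group Pfam matches by protein, ordered by start position.

    Assumes InterProScan 5 TSV columns: ID, MD5, length, analysis,
    signature accession, signature description, start, stop, ...
    """
    domains = defaultdict(list)
    reader = csv.reader(tsv_handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip short/malformed rows and hits from analyses other than Pfam.
        if len(row) < 8 or row[3] != "Pfam":
            continue
        protein_id, accession, description = row[0], row[4], row[5]
        start, stop = int(row[6]), int(row[7])
        domains[protein_id].append((accession, description, start, stop))
    for hits in domains.values():
        hits.sort(key=lambda hit: hit[2])  # order domains along the sequence
    return dict(domains)


# Two made-up rows standing in for a real output file.
example_rows = [
    ["tx1", "md5placeholder", "300", "Pfam", "PF00069", "Protein kinase domain",
     "10", "260", "1.0E-50", "T", "01-01-2015"],
    ["tx1", "md5placeholder", "300", "Coils", "Coil", "Coil",
     "270", "290", "-", "T", "01-01-2015"],
]
example_tsv = "\n".join("\t".join(row) for row in example_rows) + "\n"

result = pfam_domains_by_protein(io.StringIO(example_tsv))
print(result)
```

With a real file you would pass `open("ips_out.tsv", newline="")` (a hypothetical file name) instead of the `io.StringIO` object. The same loop can be widened to other analyses by changing the filter on the fourth column.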

written 2.0 years ago by matt.sarrasin

Great. I'll try scanning against the Pfam database. Many thanks, Matt!

written 2.0 years ago by grayapply2009