Question: What does InterProScan add to gene annotation?
Asked by grayapply2009 (United States), 4.9 years ago:

I am doing de novo transcriptome annotation. I have finished BLASTing against the nr database and imported all the results into Blast2GO. I'm trying to run InterProScan from within Blast2GO, but it is very slow. At this rate, InterProScan is going to take another month (the BLAST against nr already took one month).

So I'm wondering what InterProScan actually adds to my transcriptome annotation. Can I skip this step and map the GO terms directly?


It's not necessary to run InterProScan, but I would strongly recommend it. InterProScan gives you results based on protein signatures (domains, families, motifs), whereas BLAST is based on overall sequence similarity. Running both gives you complementary information that would likely improve your annotation.
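To make "complementary" concrete, here is a minimal sketch of comparing GO terms from the two sources for each sequence. It assumes you already have the GO terms as plain Python dicts keyed by sequence ID (for example, exported from Blast2GO and parsed from InterProScan output); the function name `merge_go` and the sample data are hypothetical, and Blast2GO does its own merging internally.

```python
from __future__ import annotations


def merge_go(blast_go: dict[str, set[str]],
             ipr_go: dict[str, set[str]]) -> dict[str, dict[str, set[str]]]:
    """Split each sequence's GO terms by which source supports them.

    blast_go: GO terms mapped from BLAST hits (hypothetical input format).
    ipr_go:   GO terms from InterProScan signatures (hypothetical input format).
    """
    merged = {}
    # Union of IDs, so sequences annotated by only one source are kept too.
    for seq_id in sorted(blast_go.keys() | ipr_go.keys()):
        b = blast_go.get(seq_id, set())
        i = ipr_go.get(seq_id, set())
        merged[seq_id] = {
            "both": b & i,            # supported by similarity and signatures
            "blast_only": b - i,      # similarity evidence only
            "interpro_only": i - b,   # signature evidence only
        }
    return merged


if __name__ == "__main__":
    # Made-up example data for illustration only.
    blast = {"contig_1": {"GO:0005524", "GO:0004672"}, "contig_2": set()}
    ipr = {"contig_1": {"GO:0004672"}, "contig_2": {"GO:0016020"},
           "contig_3": {"GO:0003677"}}
    for seq_id, parts in merge_go(blast, ipr).items():
        print(seq_id, {k: sorted(v) for k, v in parts.items()})
```

Sequences like `contig_2` and `contig_3` in the toy data, with no BLAST-derived GO terms, are exactly where the signature-based evidence adds something.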

I can say from experience that running it won't take nearly as long as BLASTing the nr db. It might be easier to just download InterProScan and run it separately from Blast2GO.

(Reply by matt.sarrasin, 4.9 years ago)

Thank you for your reply, Matt. I've considered running InterProScan locally, but unfortunately we don't have any Linux workstations in our lab; all we have is a Mac workstation. I guess I'll have to spend another month on it.

(Reply by grayapply2009, 4.9 years ago)

I've never used Blast2GO, but InterProScan shouldn't take long, because the databases it searches (Pfam-A, PROSITE, etc.) are much smaller in total than nr. You should also be able to choose which databases to search. For example, you could search only Pfam, which would drastically reduce your computation time.
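If you do run InterProScan outside Blast2GO, restricting it to Pfam is a matter of the `-appl` option. The sketch below only builds the command line; it assumes an InterProScan 5 installation (where the member database is named `Pfam` rather than Blast2GO's `hmmpfam`), and the file names are placeholders. Check `interproscan.sh --help` for the options available in your version.

```python
import shlex


def interproscan_cmd(fasta, out_tsv, applications=("Pfam",), cpu=4):
    """Build an InterProScan 5 command restricted to selected member databases.

    Assumed options: -i input FASTA, -appl comma-separated analyses,
    -f output format, -o output file, -goterms GO lookup, -cpu cores.
    """
    return [
        "interproscan.sh",
        "-i", fasta,
        "-appl", ",".join(applications),  # e.g. only Pfam instead of everything
        "-f", "TSV",
        "-o", out_tsv,
        "-goterms",                       # also report GO terms for InterPro entries
        "-cpu", str(cpu),
    ]


if __name__ == "__main__":
    cmd = interproscan_cmd("proteins.fasta", "proteins.pfam.tsv")
    print(shlex.join(cmd))
    # On a machine with InterProScan installed, you could then run:
    # import subprocess; subprocess.run(cmd, check=True)
```

Adding more analyses later (for example `("Pfam", "Phobius")`) only changes the `-appl` value, so you can start small and extend.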

(Reply by matt.sarrasin, 4.9 years ago)

It is a great idea to pick only a few main databases for InterProScan. I noticed that the default InterProScan setup in Blast2GO includes 17 different databases. I'm not clear on the differences among them or which ones I should use. Can you please give me some advice?

The databases in Blast2GO are: blastprodom, fprintscan, hmmpir, hmmpfam, hmmsmart, hmmtigr, profilescan, hamap, patternscan, superfamily, signalphmm, tmhmm, hmmpanther, gene3d, phobius and coils.

(Reply by grayapply2009, 4.9 years ago)

Including all 17 would certainly increase the computation time. Each one can independently add evidence to support an annotation, and some are self-explanatory from their names (superfamily assigns structural superfamilies, signalphmm predicts signal peptides, etc.). hmmpanther can be useful if you'd like to explore the PANTHER web suite. tmhmm, gene3d, phobius and coils provide information about structure (transmembrane helices, structural domains, coiled coils) and signal sequences.

I would say hmmpfam is a great start, since Pfam domains are likely to be the most useful in downstream analyses; I make good use of them in my own pipeline. However, it's only a start. You may want to add more evidence as you go on, depending on the questions you are asking of your data, e.g. combining the signal peptide and transmembrane/structure predictions to infer the likely localization of proteins.
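As a rough illustration of combining Pfam domains with signal peptide/transmembrane predictions, here is a sketch that reads InterProScan 5 TSV output (run with, say, `-appl Pfam,Phobius`). It assumes the usual TSV column order (protein ID first, analysis name in column 4, signature accession in column 5) and that Phobius rows use accessions such as `SIGNAL_PEPTIDE` and `TRANSMEMBRANE`; verify both against your own output. The localization rule is a deliberately crude, hypothetical heuristic, not a real localization predictor.

```python
import csv
import io
from collections import defaultdict


def summarize(tsv_text):
    """Per protein: sorted Pfam accessions plus a crude localization guess.

    Assumes InterProScan 5 TSV columns: 0 protein ID, 3 analysis,
    4 signature accession (remaining columns are ignored).
    """
    pfam = defaultdict(set)
    phobius = defaultdict(set)
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 5:
            continue  # skip blank or truncated lines
        protein, analysis, signature = row[0], row[3], row[4]
        if analysis == "Pfam":
            pfam[protein].add(signature)
        elif analysis == "Phobius":
            phobius[protein].add(signature)

    result = {}
    for protein in sorted(pfam.keys() | phobius.keys()):
        feats = phobius[protein]
        # Hypothetical rule: any TM helix -> membrane; signal peptide only -> maybe secreted.
        if "TRANSMEMBRANE" in feats:
            guess = "membrane"
        elif "SIGNAL_PEPTIDE" in feats:
            guess = "secreted?"
        else:
            guess = "no signal/TM predicted"
        result[protein] = (sorted(pfam[protein]), guess)
    return result


if __name__ == "__main__":
    # Made-up rows in the assumed column layout.
    rows = [
        ["p1", "md5", "300", "Pfam", "PF00089", "Trypsin", "30", "250", "1e-50", "T", "01-01-2020"],
        ["p1", "md5", "300", "Phobius", "SIGNAL_PEPTIDE", "Signal peptide", "1", "22", "-", "T", "01-01-2020"],
        ["p2", "md5", "400", "Phobius", "TRANSMEMBRANE", "Transmembrane region", "50", "70", "-", "T", "01-01-2020"],
    ]
    text = "\n".join("\t".join(r) for r in rows)
    for protein, (domains, guess) in summarize(text).items():
        print(protein, domains, guess)
```

A real analysis would use dedicated localization tools; the point here is only that the extra InterProScan analyses give you per-protein features you can combine with the Pfam domains.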

(Reply by matt.sarrasin, 4.9 years ago)

Great. I'll try scanning against the Pfam database. Many thanks, Matt!

(Reply by grayapply2009, 4.9 years ago)
Answer by harish, 22 months ago:

Alternatively, instead of InterProScan you can use eggNOG-mapper. It is very fast if you use their server. You submit the protein sequences and, if possible, select the taxon to get the best possible annotations.

I mostly use eggNOG myself.

You can also BLAST against UniProtKB instead of nr, which should be faster.
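A minimal sketch of a local BLAST+ search against a UniProt database, writing XML (`-outfmt 5`) so the results can be loaded into Blast2GO. It assumes BLAST+ is installed and that you have built a protein database from a UniProt FASTA file with `makeblastdb`; the database name `uniprot_sprot`, the file names, and the default thresholds are placeholders, not recommendations from this thread.

```python
import shlex


def blast_cmd(query, db, out_xml, program="blastx",
              evalue=1e-5, max_hits=20, threads=4):
    """Build a BLAST+ command that writes XML output.

    program: blastx for nucleotide transcripts, blastp for predicted proteins.
    db: name of a local protein database (placeholder, built with makeblastdb).
    """
    return [
        program,
        "-query", query,
        "-db", db,
        "-outfmt", "5",                 # 5 = BLAST XML
        "-evalue", str(evalue),
        "-max_target_seqs", str(max_hits),
        "-num_threads", str(threads),
        "-out", out_xml,
    ]


if __name__ == "__main__":
    cmd = blast_cmd("transcripts.fasta", "uniprot_sprot", "transcripts_vs_sprot.xml")
    print(shlex.join(cmd))
    # import subprocess; subprocess.run(cmd, check=True)  # run where BLAST+ is installed
```

Splitting the query FASTA into chunks and running several such commands in parallel is a common way to shorten the wall-clock time further.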

Since Blast2GO reads results in XML format, if you run the searches locally and save the output as XML, you can import it and avoid the time spent running InterProScan inside Blast2GO.
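Before importing a large XML file, it can help to check it quickly. This sketch uses Python's standard `xml.etree.ElementTree` to list the best hit and its E-value per query from BLAST XML output, relying on the standard element names (`Iteration`, `Iteration_query-def`, `Hit`, `Hit_def`, `Hsp_evalue`). The function name `top_hits` and the sample XML are hypothetical; for very large files, `ET.iterparse` would use less memory.

```python
import io
import xml.etree.ElementTree as ET


def top_hits(source):
    """Map each query to (first hit description, its first HSP E-value), or None.

    source: a file path or file-like object containing BLAST XML (-outfmt 5).
    BLAST lists hits in rank order, so the first Hit is the best one.
    """
    root = ET.parse(source).getroot()
    hits = {}
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def", default="")
        hit = iteration.find("Iteration_hits/Hit")
        if hit is None:
            hits[query] = None  # no hits for this query
            continue
        evalue = float(hit.findtext("Hit_hsps/Hsp/Hsp_evalue"))
        hits[query] = (hit.findtext("Hit_def"), evalue)
    return hits


if __name__ == "__main__":
    # Tiny made-up example in the BLAST XML layout.
    sample = """<BlastOutput><BlastOutput_iterations>
<Iteration><Iteration_query-def>contig_1</Iteration_query-def><Iteration_hits>
<Hit><Hit_def>Serine protease</Hit_def><Hit_hsps><Hsp><Hsp_evalue>1e-30</Hsp_evalue></Hsp></Hit_hsps></Hit>
</Iteration_hits></Iteration>
<Iteration><Iteration_query-def>contig_2</Iteration_query-def><Iteration_hits></Iteration_hits></Iteration>
</BlastOutput_iterations></BlastOutput>"""
    for query, best in top_hits(io.StringIO(sample)).items():
        print(query, best)
```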

Look at their API documentation as well.
