Get integrated coordinates in InterProScan output
4.4 years ago
svet.sidorov ▴ 110

Hello,

When I scan a protein sequence (for example, human TFAP2A) with a locally installed InterProScan:

./interproscan.sh -i tfap2a.fasta -f tsv --iprlookup

I get an output file with matches to domains from different member databases (in this case, PRINTS and MobiDBLite). Some of these matches correspond to InterPro entries: for example, there are three PRINTS matches corresponding to IPR013854, the AP-2 C-terminal domain. However, I cannot see the integrated coordinates of IPR013854 in my sequence, only the coordinates of the PRINTS fragments. On the other hand, I can clearly see the integrated coordinates of the IPR013854 domain in TFAP2A when I scan it using the InterProScan web service.

Could I somehow make InterProScan output integrated coordinates of IPR* domains when I run it locally on my server?

Thank you!

InterProScan integrated coordinates • 1.4k views

Off the top of my head: have you tried adding the -iprlookup option?


It's already there:) It gives only IPR* IDs, without coordinates.


right, my bad.

OK, I checked some of my output files and it does give coordinates even for the ipr-IDs.

Can you perhaps post a small excerpt of your output that shows the issue you report here? Keep in mind that the 4th column of the TSV output never reports InterPro itself; it always names the member database that produced the original match (CDD, PRINTS, ...).


Yes, sure:

sp|P05549|AP2A_HUMAN    bba57ac412d6ca5ca97fe2fe5fbfca66    437 MobiDBLite  mobidb-lite consensus disorder prediction   49  67  -   T   26-11-2019
sp|P05549|AP2A_HUMAN    bba57ac412d6ca5ca97fe2fe5fbfca66    437 PRINTS  PR01748 Transcription factor AP-2 signature 249 263 1.0E-27 T   26-11-2019  IPR013854   Transcription factor AP-2, C-terminal
sp|P05549|AP2A_HUMAN    bba57ac412d6ca5ca97fe2fe5fbfca66    437 PRINTS  PR01748 Transcription factor AP-2 signature 264 279 1.0E-27 T   26-11-2019  IPR013854   Transcription factor AP-2, C-terminal
sp|P05549|AP2A_HUMAN    bba57ac412d6ca5ca97fe2fe5fbfca66    437 PRINTS  PR01748 Transcription factor AP-2 signature 280 294 1.0E-27 T   26-11-2019  IPR013854   Transcription factor AP-2, C-terminal
sp|P05549|AP2A_HUMAN    bba57ac412d6ca5ca97fe2fe5fbfca66    437 PANTHER PTHR10812       3   437 0.0 T   26-11-2019  IPR004979   Transcription factor AP-2
sp|P05549|AP2A_HUMAN    bba57ac412d6ca5ca97fe2fe5fbfca66    437 PANTHER PTHR10812:SF8       3   437 0.0 T   26-11-2019
sp|P05549|AP2A_HUMAN    bba57ac412d6ca5ca97fe2fe5fbfca66    437 Pfam    PF03299 Transcription factor AP-2   211 405 1.7E-85 T   26-11-2019  IPR013854   Transcription factor AP-2, C-terminal

So columns 7 and 8 hold the coordinates of a match from a specific member database, not the integrated coordinates of the corresponding IPR* domain. (That said, the coordinates of the Pfam match are the same as the InterPro integrated coordinates: InterPro scan results.)


"match from a specific database, not the integrated coordinates of a corresponding IPR* domain"

Yes, indeed.

"Although, the coordinates of the Pfam match are the same as the InterPro integrated coordinates"

Correct. "IPR domains" are never larger than the largest representative from the member databases. They can be (and often are) exactly as long as one of the member database matches.

Keep in mind that InterPro only integrates/groups member database signatures under a shared ID, so it has no "domains" of its own. (That is, you cannot search with a given IPR ID, only with the signatures from the member databases.)
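If you still want one coordinate range per IPR accession from the local TSV, you can post-process the file yourself. The sketch below is not an InterProScan feature. It assumes the standard TSV layout (protein accession in column 1, start and stop in columns 7 and 8, InterPro accession in column 12) and simply takes the smallest start and largest stop over all member database matches that share an IPR accession. The function name `ipr_spans` is made up for this example, and whether the web service derives its displayed coordinates the same way is an assumption.

```python
def ipr_spans(tsv_text):
    """Map (protein, IPR accession) to (min start, max stop) over all matches."""
    spans = {}
    for line in tsv_text.splitlines():
        cols = line.split("\t")
        # Rows with no integrated entry have fewer than 12 columns,
        # or an empty / "-" value in column 12.
        if len(cols) < 12 or cols[11] in ("", "-"):
            continue
        key = (cols[0], cols[11])
        start, stop = int(cols[6]), int(cols[7])
        if key in spans:
            old_start, old_stop = spans[key]
            spans[key] = (min(old_start, start), max(old_stop, stop))
        else:
            spans[key] = (start, stop)
    return spans


# Rows taken from the excerpt posted above (tab-separated).
prot = "sp|P05549|AP2A_HUMAN"
md5 = "bba57ac412d6ca5ca97fe2fe5fbfca66"
date = "26-11-2019"
ap2_c = ["IPR013854", "Transcription factor AP-2, C-terminal"]
rows = [
    [prot, md5, "437", "MobiDBLite", "mobidb-lite", "consensus disorder prediction",
     "49", "67", "-", "T", date],
    [prot, md5, "437", "PRINTS", "PR01748", "Transcription factor AP-2 signature",
     "249", "263", "1.0E-27", "T", date] + ap2_c,
    [prot, md5, "437", "PRINTS", "PR01748", "Transcription factor AP-2 signature",
     "264", "279", "1.0E-27", "T", date] + ap2_c,
    [prot, md5, "437", "PRINTS", "PR01748", "Transcription factor AP-2 signature",
     "280", "294", "1.0E-27", "T", date] + ap2_c,
    [prot, md5, "437", "Pfam", "PF03299", "Transcription factor AP-2",
     "211", "405", "1.7E-85", "T", date] + ap2_c,
]
example_tsv = "\n".join("\t".join(r) for r in rows)

for (protein, ipr), (start, stop) in sorted(ipr_spans(example_tsv).items()):
    print(protein, ipr, start, stop)
```

For the excerpt above this gives 211-405 for IPR013854, i.e. the same range as the Pfam match, which fits the point that the integrated range is never larger than the largest member database match.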


Thank you @lieven.sterck!
