Batch-retrieving protein domain sequences from InterPro
7.3 years ago
mucmvsthlm • 0

Hi,

I would like to look into the relationships between certain annotated protein domains found in various proteins and would appreciate input on how to batch-retrieve these sequences (only the domain, not the full-length protein sequences they belong to) for subsequent alignments.

I found InterPro helpful and believe I might be on the right track, but am currently stuck.

On the overview page for a protein domain on InterPro, I found numerous proteins that carry this domain under "Proteins matched". Under "export table TSV" I get a table with the matched proteins and also information on where in these proteins the domain is located (columns "start position" and "end position").

I would like to do two things: (i) restrict this list in a species-specific manner, i.e. only list human proteins with this particular domain; (ii) retrieve the sequences of the domains (and only the domains) detailed in the table, e.g. in FASTA format for sequence alignments.
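To make the two steps concrete, here is a rough Python sketch of what I have in mind, not a tested pipeline. It assumes the exported TSV has columns named "Accession", "Species", "Start position" and "End position" (these header names are guesses on my part and may differ in the real export), that the positions are 1-based and inclusive, and that the full-length sequences have already been fetched separately into a dictionary keyed by accession. The function name `domain_fasta` and the IDs `PROT_A`/`PROT_B` are made up for illustration.

```python
import csv
import io


def domain_fasta(tsv_text, sequences, species=None,
                 acc_col="Accession", species_col="Species",
                 start_col="Start position", end_col="End position"):
    """Return FASTA text containing only the domain region of each protein.

    tsv_text  : contents of the exported table (tab-separated, with header)
    sequences : dict mapping accession -> full-length protein sequence
    species   : if given, keep only rows whose species column equals this
    The column names are assumptions; pass the real header names if they differ.
    """
    records = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Step (i): species filter
        if species is not None and row[species_col] != species:
            continue
        acc = row[acc_col]
        if acc not in sequences:
            continue  # no full-length sequence available for this protein
        start, end = int(row[start_col]), int(row[end_col])
        # Step (ii): 1-based inclusive coordinates -> Python slice
        domain = sequences[acc][start - 1:end]
        records.append(f">{acc}/{start}-{end}\n{domain}")
    return "\n".join(records) + ("\n" if records else "")


# Toy data, invented purely to show the call
example_tsv = (
    "Accession\tSpecies\tStart position\tEnd position\n"
    "PROT_A\tHomo sapiens\t3\t6\n"
    "PROT_B\tMus musculus\t2\t4\n"
)
example_seqs = {"PROT_A": "MKTAYIAKQR", "PROT_B": "MSDNELQ"}

print(domain_fasta(example_tsv, example_seqs, species="Homo sapiens"), end="")
```

With the toy data this keeps only the human row and prints residues 3 to 6 of `PROT_A` under the header `>PROT_A/3-6`. A protein that carries the domain more than once would simply appear as several rows and yield several FASTA records.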

Any help is highly appreciated. If you think I am completely on the wrong track, I am also happy to try alternative approaches.

Best, M.

Interpro protein domain sequences • 2.8k views

I have seen that before but, quite frankly, I would need a bit of an explanation of how to apply it, in particular your post. I am an absolute beginner with anything beyond looking up single proteins on UniProt, sorry.
