Question: Batch-retrieving protein domain sequences from InterPro
3.7 years ago, mucmvsthlm wrote:


I would like to look into the relationship of certain annotated protein domains in various proteins and would appreciate input on how to batch-retrieve these sequences (only of the domain, not the full-length protein sequences they belong to) for subsequent alignments.

I found InterPro helpful and believe I might be on the right track, but am currently stuck.

On the overview page for a protein domain on InterPro, I found numerous proteins that carry this domain under "Proteins matched". Using "export table TSV", I get a table with the matched proteins and also information on where in these proteins the domain is located (columns "start position" and "end position").
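For illustration, a table like this could be read with Python's standard `csv` module. This is only a minimal sketch: the "start position" and "end position" columns are the ones mentioned above, while the "accession" and "tax id" column names, the accessions, and the example rows are assumptions made up for the example and may not match the actual export.

```python
import csv
import io

# Hypothetical excerpt of an exported table. The "accession" and "tax id"
# column names and all row values are assumptions for illustration only.
EXAMPLE_TSV = (
    "accession\ttax id\tstart position\tend position\n"
    "P00001\t9606\t3\t6\n"
    "P00002\t10090\t1\t4\n"
)

def read_domain_hits(handle):
    """Return one dict per table row, with the positions converted to int."""
    reader = csv.DictReader(handle, delimiter="\t")
    hits = []
    for row in reader:
        hits.append({
            "accession": row["accession"],
            "tax_id": row["tax id"],
            "start": int(row["start position"]),
            "end": int(row["end position"]),
        })
    return hits

# For a real export, pass an open file instead of the in-memory example.
hits = read_domain_hits(io.StringIO(EXAMPLE_TSV))
print(hits[0])
```

For a downloaded file, the same function would take `open("export.tsv", newline="")`, where the file name is again only a placeholder.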

I would like to do two things: (i) restrict this list in a species-specific manner, i.e. list only human proteins with this particular domain; and (ii) retrieve the sequences of the domains (and only the domains) detailed in the table, e.g. in FASTA format for sequence alignments.
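To make the two steps concrete, here is a minimal sketch in Python, not a definitive solution. It assumes the table rows and the full-length protein sequences have already been loaded, and that the start and end positions are 1-based and inclusive. The accessions, sequences, and function name are hypothetical; 9606 is the NCBI taxonomy ID for human.

```python
HUMAN_TAX_ID = "9606"  # NCBI taxonomy ID for Homo sapiens

# Hypothetical table rows: (accession, taxonomy ID, start, end).
table_rows = [
    ("P00001", "9606", 3, 6),
    ("P00002", "10090", 1, 4),  # mouse entry, filtered out below
]

# Hypothetical full-length sequences keyed by accession.
full_sequences = {
    "P00001": "MKTAYIAKQR",
    "P00002": "MSTNPKPQRK",
}

def domain_fasta(rows, sequences, tax_id):
    """Return FASTA text with the domain region of each matching protein."""
    records = []
    for accession, row_tax_id, start, end in rows:
        if row_tax_id != tax_id:
            continue  # step (i): keep only the requested species
        # Step (ii): 1-based inclusive positions become a 0-based slice.
        domain = sequences[accession][start - 1:end]
        records.append(f">{accession}/{start}-{end}\n{domain}")
    return "\n".join(records)

fasta_text = domain_fasta(table_rows, full_sequences, HUMAN_TAX_ID)
print(fasta_text)
```

The `accession/start-end` header is just one way to keep track of which region each sequence came from; the resulting text can be saved to a file and passed to an alignment tool.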

Any help is highly appreciated. If you think I am completely on the wrong track, I am also happy to try alternatives.

Best, M.


Duplicate of How To Retrieve Human Proteins Sequence Containing A Given Domain

written 3.7 years ago by Pierre Lindenbaum

I have seen that before but, quite frankly, I would need a bit of an explanation of how to apply it, in particular your post. I am an absolute beginner with anything beyond looking up single proteins on UniProt, sorry.

written 3.7 years ago by mucmvsthlm