Question

Batch-retrieving protein domain sequences from InterPro

Asked 7.3 years ago by mucmvsthlm

Hi,

I would like to look into the relationship between certain annotated protein domains in various proteins, and would appreciate input on how to batch-retrieve these sequences (only the domain sequences, not the full-length protein sequences they belong to) for subsequent alignments.

I found InterPro helpful and believe I might be on the right track, but am currently stuck.

On the overview page for a protein domain on InterPro, I found numerous proteins carrying this domain listed under "Proteins matched". Via "export table TSV" I get a table with the matched proteins, including where in each protein the domain is situated (columns "start position" and "end position").
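A minimal Python sketch of how the exported start/end columns can be used to cut a domain out of its full-length sequence. The column names (`Accession`, `start position`, `end position`) are assumptions based on the description above; check them against the header row of your own TSV. UniProt and InterPro positions are 1-based and inclusive, while Python slices are 0-based and end-exclusive, hence the `start - 1`.

```python
import csv
import io

# Hypothetical column names: the real header row of the InterPro
# "export table TSV" may differ, so adjust these to match your file.
ACCESSION_COL = "Accession"
START_COL = "start position"
END_COL = "end position"


def read_domain_positions(tsv_text):
    """Return (accession, start, end) tuples from tab-separated text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        (row[ACCESSION_COL], int(row[START_COL]), int(row[END_COL]))
        for row in reader
    ]


def domain_subsequence(full_sequence, start, end):
    """Cut out a domain given 1-based, inclusive start/end positions.

    Residue `start` sits at Python index start - 1, and an end-exclusive
    slice must stop at index `end` to include residue `end`.
    """
    return full_sequence[start - 1:end]
```

For example, `domain_subsequence("MKTAYIAK", 3, 6)` returns `"TAYI"`, i.e. residues 3 to 6 inclusive.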

I would like to do two things: (i) restrict this list in a species-specific manner, i.e. only list human proteins with this particular domain, and (ii) retrieve the sequences of the domains (and only the domains) detailed in the table, e.g. in FASTA format for sequence alignments.
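Both goals can be sketched in a few lines of Python, assuming the full-length sequences have already been downloaded (for example as a UniProt FASTA file) and loaded into a dictionary keyed by accession. The `is_human` check and the `>accession/start-end` header style are illustrative choices, not a format required by any tool; the resulting FASTA text can then be passed to a multiple-sequence aligner.

```python
def is_human(organism):
    """Heuristic: True if an organism label looks like human (NCBI taxon 9606).

    Assumption: the organism column holds either a scientific name such as
    "Homo sapiens (Human)" or a bare taxon ID; adapt this to your table.
    """
    label = organism.strip().lower()
    return label.startswith("homo sapiens") or label == "9606"


def domains_to_fasta(records, sequences, width=60):
    """Build FASTA text containing only the domain regions of human proteins.

    records:   iterable of (accession, organism, start, end), with 1-based,
               inclusive positions as in the exported table.
    sequences: dict mapping accession -> full-length protein sequence.
    width:     number of residues per FASTA sequence line.
    """
    lines = []
    for accession, organism, start, end in records:
        if not is_human(organism):
            continue  # (i) keep only human proteins
        full = sequences.get(accession)
        if full is None:
            continue  # full-length sequence not available; skip it
        domain = full[start - 1:end]  # (ii) keep only the domain residues
        lines.append(f">{accession}/{start}-{end}")
        # Wrap the domain sequence into lines of at most `width` residues.
        lines.extend(domain[i:i + width] for i in range(0, len(domain), width))
    return "\n".join(lines) + ("\n" if lines else "")
```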

Any help is highly appreciated. If you think I am on the completely wrong track, I am also happy to try alternative ones.

Best, M.

Tags: Interpro protein domain sequences

Duplicate of How To Retrieve Human Proteins Sequence Containing A Given Domain

Comment by Pierre Lindenbaum, 7.3 years ago

I have seen that before, but quite frankly I would need some explanation of how to apply it, in particular your post. I am an absolute beginner with anything beyond looking up single proteins on UniProt, sorry.

Reply by mucmvsthlm, 7.3 years ago
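As a possible alternative to exporting tables by hand, InterPro also offers a REST API that can filter an entry's matched proteins by taxonomy. The Python sketch below assumes the URL pattern in `API_TEMPLATE` (UniProt proteins matching an InterPro entry, restricted to human, NCBI taxon 9606), the `page_size` parameter, and the JSON layout described in the comments; all of these should be checked against the current InterPro API documentation before relying on them. The resulting (accession, start, end) positions can then be used to cut the domains out of the full-length sequences as described in the question.

```python
import json
import urllib.request

# Assumed InterPro REST API path: proteins matching an InterPro entry,
# restricted to NCBI taxon 9606 (human). Verify against the API docs.
API_TEMPLATE = ("https://www.ebi.ac.uk/interpro/api/protein/uniprot/"
                "entry/interpro/{entry}/taxonomy/uniprot/9606/?page_size=200")


def fetch_page(url):
    """Download one page of JSON results (an empty body yields {})."""
    with urllib.request.urlopen(url) as response:
        body = response.read()
    return json.loads(body) if body else {}


def domain_locations(page):
    """Yield (protein accession, start, end) for each domain fragment.

    Assumed JSON layout: results -> entries -> entry_protein_locations
    -> fragments, with 1-based inclusive start/end positions.
    """
    for result in page.get("results", []):
        accession = result["metadata"]["accession"]
        for entry in result.get("entries", []):
            for location in entry.get("entry_protein_locations") or []:
                for fragment in location.get("fragments", []):
                    yield accession, fragment["start"], fragment["end"]


def all_locations(entry):
    """Follow the paginated results for one InterPro entry accession."""
    url = API_TEMPLATE.format(entry=entry)
    while url:
        page = fetch_page(url)
        yield from domain_locations(page)
        url = page.get("next")  # assumed to be None on the last page
```

Usage would look like `list(all_locations("IPR000001"))`, where `IPR000001` is only a placeholder for the InterPro accession of the domain of interest.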