I would like to look into the relationship between certain annotated protein domains in various proteins, and would appreciate input on how to batch-retrieve the sequences of these domains (only the domains themselves, not the full-length proteins they belong to) for subsequent alignments.
I found InterPro helpful and believe I might be on the right track, but am currently stuck.
On the InterPro overview page for a protein domain, the "Proteins matched" section lists numerous proteins that carry this domain. Using "export table TSV" there, I get a table of the matched proteins that also states where in each protein the domain is located (columns "start position" and "end position").
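For what it's worth, once the table is exported, the coordinates can be read programmatically. Below is a minimal sketch using only the Python standard library. It assumes one row per domain match and a taxonomy ID column; the column names "Accession" and "Tax ID" are hypothetical placeholders, so check them against the header row of your actual export. Filtering on tax ID 9606 (Homo sapiens) would cover the human-only case.

```python
import csv
import io

# Hypothetical column names: adjust to match the header row of your actual export.
ACCESSION_COL = "Accession"
TAXID_COL = "Tax ID"
START_COL = "start position"
END_COL = "end position"


def read_domain_table(tsv_text, tax_id=None):
    """Return (accession, start, end) tuples from an exported TSV.

    If tax_id is given (e.g. 9606 for human), only rows from that taxon are kept.
    Assumes one row per domain match.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    coords = []
    for row in reader:
        # Skip rows from other species when a taxon filter is requested.
        if tax_id is not None and row[TAXID_COL].strip() != str(tax_id):
            continue
        coords.append(
            (row[ACCESSION_COL].strip(), int(row[START_COL]), int(row[END_COL]))
        )
    return coords
```

For example, `read_domain_table(open("export.tsv").read(), tax_id=9606)` would return only the human matches, provided the export really contains a tax ID column.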
I would like to do two things: (i) restrict this list by species, i.e. list only the human proteins that carry this particular domain; and (ii) retrieve the sequences of the domains (and only the domains) listed in the table, e.g. in FASTA format for sequence alignments.
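Step (ii) amounts to cutting each domain out of its full-length sequence using the start/end columns. A minimal sketch, assuming the full-length sequences for the listed accessions have already been downloaded as FASTA (for example from UniProt, whose headers look like `>sp|P12345|NAME_HUMAN ...`) and that the positions are 1-based and inclusive, as is conventional for UniProt/InterPro coordinates. The function names and the `accession/start-end` header style (borrowed from Pfam) are illustrative choices, not part of any InterPro tool.

```python
def parse_fasta(text):
    """Map accession -> sequence. For UniProt-style headers (db|ACC|NAME)
    the accession is the second |-separated field; otherwise the first word."""
    seqs = {}
    acc = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            ident = line[1:].split()[0]
            parts = ident.split("|")
            acc = parts[1] if len(parts) >= 3 else ident
            seqs[acc] = ""
        elif acc is not None:
            seqs[acc] += line
    return seqs


def extract_domains(sequences, coords):
    """Cut each (accession, start, end) region, 1-based inclusive, out of its protein."""
    records = []
    for acc, start, end in coords:
        seq = sequences[acc]
        if not 1 <= start <= end <= len(seq):
            raise ValueError(f"{acc}: {start}-{end} outside 1-{len(seq)}")
        # Convert 1-based inclusive positions to a Python slice.
        records.append((f"{acc}/{start}-{end}", seq[start - 1:end]))
    return records


def to_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text, wrapping lines at `width`."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

The resulting FASTA text, containing only the domain regions, could then be passed to any multiple sequence alignment program.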
Any help is highly appreciated. If you think I am on entirely the wrong track, I am also happy to try alternative approaches.