Hi all, I am very new to bioinformatics and have just started using Biopython. I am looking for a way to extract part of each sequence from a large number of protein sequences, based on a domain. I have sequences for ~500 proteins and I know the location of the domain in question, but I need the sequence for just that domain plus about 50 residues on both sides so I can do an alignment. The solution does not need to be in Biopython. I just really need some help. Thank you.
Since you say you have the domain location, let's take the domain to be at residues 25-40.
You have ~500 protein sequences.
Write a script that:
Opens the protein sequence file.
Stores the protein sequence in a string variable 's' (get rid of any header present in the sequence file).
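Slices the required part, i.e. s[max(0, 24-50) : 40+50], where 50 is the number of residues on each side. Residues 25-40 are s[24:40] in Python's 0-based slicing, so the flank is subtracted from the start and added to the end; max(0, ...) keeps the start from going negative when the domain is within 50 residues of the start of the protein.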
Saves the slice to a file.
Repeats the above for each protein sequence using a 'for' loop.
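The steps above can be sketched roughly as follows in plain Python. This is only a sketch under some assumptions: the sequences are in a single FASTA file, the domain coordinates are 1-based and inclusive, and the helper names (read_fasta, domain_with_flanks, extract_domains) and the 'domains' dictionary mapping each sequence ID to its (start, end) are made up for illustration. In practice the domain location will probably differ per protein, which is why it is looked up by ID here.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def domain_with_flanks(seq, start, end, flank=50):
    """Return residues start..end (1-based, inclusive) plus up to
    `flank` residues on each side, truncated at the sequence ends."""
    lo = max(0, start - 1 - flank)   # clamp so the start never goes negative
    hi = min(len(seq), end + flank)  # clamp at the end of the protein
    return seq[lo:hi]


def extract_domains(fasta_in, fasta_out, domains, flank=50):
    """Write the flanked domain of every sequence listed in `domains`.

    `domains` is a hypothetical dict: sequence ID -> (start, end).
    The ID is assumed to be the first word of the FASTA header.
    """
    with open(fasta_in) as src, open(fasta_out, "w") as dst:
        for header, seq in read_fasta(src):
            seq_id = header.split()[0]
            if seq_id not in domains:
                continue  # no known domain location for this protein
            start, end = domains[seq_id]
            piece = domain_with_flanks(seq, start, end, flank)
            dst.write(f">{seq_id}\n{piece}\n")
```

For the example above, domain_with_flanks(s, 25, 40) would return s[0:90], because the domain starts fewer than 50 residues from the start of the protein. If you would rather use Biopython, Bio.SeqIO.parse can take the place of the small read_fasta helper.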
Now that you know the steps, you can easily implement this in any language you know.
I hope this is what you needed.