Help with domain sequence parsing from protein Fasta file
3.9 years ago

Hi all, I am very new to bioinformatics and have just started using Biopython. I am looking for a way to extract part of each sequence, based on the domain, from a large number of protein sequences. I have sequences for ~500 proteins and I know the location of the domain in question, but I need the sequence of just that domain plus about 50 residues on both sides so I can do an alignment. The solution does not need to be in Biopython. I really just need some help. Thank you.

3.9 years ago
vinaykusuma

Since you say you know the domain location, let's assume your domain is at residues 25-40.

You have ~500 protein sequences.

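Write code that:

  • Opens the protein sequence file.

  • Stores each protein sequence as a string in a variable 's' (dropping the FASTA header line).

  • Slices out the required part, i.e. s[max(0, 24-50) : 40+50], where 50 is the number of extra residues on each side. Python indexes from 0 and excludes the end index, so residues 25-40 are s[24:40]; the max(0, ...) stops the start from going negative, which would otherwise wrap around to the end of the string.

  • Saves the slice to a file.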

Repeat this for every protein sequence using a 'for' loop.
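A sketch of the whole loop using only the Python standard library, assuming every protein uses the same domain coordinates (25-40 as in the example) and that sequence lines may be wrapped. `read_fasta` and `extract_fixed_windows` are illustrative names and the file names in the final comment are placeholders. If you prefer Biopython, `SeqIO.parse(handle, "fasta")` can take the place of the hand-written reader.

```python
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA file, joining wrapped lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def extract_fixed_windows(in_handle: TextIO, out_handle: TextIO,
                          start: int = 25, end: int = 40, flank: int = 50) -> int:
    """Write domain start..end (1-based, inclusive) plus flanks for every record.

    Returns the number of records written.
    """
    n = 0
    for header, seq in read_fasta(in_handle):
        lo = max(0, start - 1 - flank)   # 0-based start, clipped at residue 1
        hi = min(len(seq), end + flank)  # exclusive end, clipped at the C-terminus
        out_handle.write(f">{header} window={lo + 1}-{hi}\n{seq[lo:hi]}\n")
        n += 1
    return n


# Example with placeholder file names:
# with open("proteins.fasta") as fin, open("domains_window.fasta", "w") as fout:
#     extract_fixed_windows(fin, fout)
```

Recording the actual window in each output header makes it easy to spot proteins whose window was clipped at either terminus.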

Now that you know the steps, you can implement this in any language you are comfortable with.

I hope this is what you needed.

Yes, this makes sense, thank you. However, the domain does not occupy the same location in each protein. In some it lies at residues 50-100, in others at 100-150, and so on.
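Because the position differs from protein to protein, the coordinates have to be looked up per sequence instead of being hard-coded. The sketch below assumes a hypothetical tab-separated table (called `domains.tsv` here) with one line per protein: the ID as it appears as the first word of the FASTA header, then the 1-based domain start and end. Proteins missing from the table are skipped with a message on stderr. All function and file names are illustrative.

```python
import sys
from typing import Dict, Iterable, Iterator, TextIO, Tuple


def read_domain_table(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Parse 'id<TAB>start<TAB>end' lines (1-based, inclusive) into a dict."""
    coords = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        prot_id, start, end = line.strip().split("\t")[:3]
        coords[prot_id] = (int(start), int(end))
    return coords


def read_fasta(handle: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def extract_domains(fasta: Iterable[str], coords: Dict[str, Tuple[int, int]],
                    out: TextIO, flank: int = 50) -> int:
    """Write each protein's own domain plus `flank` residues per side; return count."""
    written = 0
    for header, seq in read_fasta(fasta):
        prot_id = header.split()[0]  # ID = first word of the header line
        if prot_id not in coords:
            print(f"no domain coordinates for {prot_id}, skipped", file=sys.stderr)
            continue
        start, end = coords[prot_id]
        lo = max(0, start - 1 - flank)   # 0-based start, clipped at residue 1
        hi = min(len(seq), end + flank)  # exclusive end, clipped at the C-terminus
        out.write(f">{prot_id} domain={start}-{end} window={lo + 1}-{hi}\n{seq[lo:hi]}\n")
        written += 1
    return written


# Example with placeholder file names:
# with open("domains.tsv") as t, open("proteins.fasta") as f, open("windows.fasta", "w") as o:
#     extract_domains(f, read_domain_table(t), o)
```

If the domain boundaries come from an annotation tool's output rather than a hand-made table, only `read_domain_table` needs to change.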
