Question: Help with domain sequence parsing from protein FASTA file
6 weeks ago, Siclari.jimmy wrote:

Hi all, I am very new to bioinformatics and have just started using Biopython. I am looking for a way to extract part of each sequence, based on a domain, from a large number of protein sequences. I have sequences for ~500 proteins and I know the location of the domain in question, but I need the sequence for just that domain plus about 50 residues on both sides so I can do an alignment. The solution does not need to be in Biopython. I just really need some help. Thank you.

alignment sequence • 100 views
6 weeks ago, vnyksm wrote:

Since you say you have the domain location, let's take the location of your domain to be residues 25-40.

You have ~500 protein sequences.

Write code that:

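  • Opens the protein sequence file.

  • Stores the protein sequence in a string variable 's' (dropping any header line present in the sequence file).

  • Slices the required part, i.e. s[max(0, 24-50) : 40+50], where 50 is the number of flanking residues on each side. With 0-based, end-exclusive indexing the domain 25-40 itself is s[24:40]; the max(0, ...) keeps the start from going negative when the domain sits near the N-terminus.

  • Saves the slice to a file.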

Repeat the above process for each protein sequence using a 'for' loop.

Now that you know the steps, you can implement this in any language you know.
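
As a rough sketch of those steps in plain Python (no Biopython required): the function names `read_fasta`, `domain_with_flanks` and `extract_fixed_domain` are made up for illustration, and the domain coordinates are assumed to be 1-based and inclusive, as in the 25-40 example.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def domain_with_flanks(seq, start, end, flank=50):
    """Return residues start..end (1-based, inclusive, an assumption)
    plus up to `flank` residues on each side, clamped to the sequence ends."""
    left = max(0, start - 1 - flank)
    right = min(len(seq), end + flank)
    return seq[left:right]


def extract_fixed_domain(in_path, out_path, start, end, flank=50):
    """Write the flanked domain of every record in in_path to out_path."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for header, seq in read_fasta(fin):
            piece = domain_with_flanks(seq, start, end, flank)
            fout.write(f">{header}\n{piece}\n")
```

A call such as `extract_fixed_domain("proteins.fasta", "domains.fasta", 25, 40)` would then write one trimmed record per protein; both file names are placeholders. If you prefer Biopython, `Bio.SeqIO.parse(handle, "fasta")` could stand in for the hand-written reader.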

I hope this is what you needed.


Yes, this makes sense. Thank you. However, the domain does not occupy the same location in each protein. Sometimes it lies in residues 50-100, while in others it may be in 100-150, and so on.
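
One possible way to handle per-protein coordinates, sketched under assumptions: the domain positions are available as rows of (protein ID, start, end) with 1-based inclusive coordinates, and the protein ID is the first whitespace-separated token of each FASTA header. The names `load_domain_coords` and `slice_domains` are hypothetical.

```python
def load_domain_coords(rows):
    """Map protein ID -> (start, end) from rows of (id, start, end).

    The three-column row layout is an assumption about how the
    domain locations are stored.
    """
    return {pid: (int(start), int(end)) for pid, start, end in rows}


def slice_domains(records, coords, flank=50):
    """Yield (protein ID, flanked domain) for each (header, sequence) record.

    Records whose ID has no entry in `coords` are skipped.
    """
    for header, seq in records:
        pid = header.split()[0]  # assumed: ID is the first token of the header
        if pid not in coords:
            continue
        start, end = coords[pid]
        left = max(0, start - 1 - flank)    # clamp at the N-terminus
        right = min(len(seq), end + flank)  # clamp at the C-terminus
        yield pid, seq[left:right]
```

If the coordinates live in a tab-separated file, `csv.reader(handle, delimiter="\t")` from the standard library yields rows in the shape `load_domain_coords` expects, and the records can come from any FASTA reader that produces (header, sequence) pairs.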

Reply written 6 weeks ago by Siclari.jimmy