3.7 years ago
mhwaida258
Hi, I am working on SARS sequences. I made a multiple alignment of the sequences using MAFFT, with the output in Clustal format. My question is how I can extract the conserved sequences from the whole file with a Python script. I already use a Python script to measure the conserved regions by counting the number of stars (*), but the conserved residues themselves are in the lines above the stars. How do I express that in a Python script (if you see a *, print the character at the same position in the previous line, and so on)? Any help or suggestions? Thank you in advance.
You can do exactly that. If you have an alignment that looks like this
then you could read them in blocks of three lines and use the position information from the stars. You could walk through the line with the stars and, when you find groups of stars of a certain length (single positions probably don't make sense), extract the sequence information corresponding to the group from the sequence lines. Be careful not to trim whitespace off the 'conservation' line, otherwise the positions will be wrong.
Something like this:
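A minimal sketch of that approach in plain Python. The function name `conserved_regions`, the `min_len` threshold and the sample alignment are made up for illustration. It assumes every block ends with a conservation line, and it does not rely on a fixed number of sequences per block:

```python
import re

# Made-up two-block Clustal-style sample, for illustration only.
SAMPLE = """\
CLUSTAL format alignment by MAFFT


seq1      ATGCCGTA
seq2      ATGACGTA
seq3      ATGCCGTT
          *** ***

seq1      GGCTTA
seq2      GGCTTA
seq3      GGATTA
          ** ***
"""


def conserved_regions(clustal_text, min_len=3):
    """Return (start, end, residues) for each run of '*' of at least min_len.

    start/end are 0-based alignment columns, end exclusive.
    """
    seqs = {}               # name -> aligned sequence joined across blocks
    marks = []              # conservation symbols, one piece per block
    offset = width = None   # columns occupied by the sequence in this block
    for line in clustal_text.splitlines():
        if line.startswith("CLUSTAL"):
            continue                      # header line
        if line[:1].strip():              # sequence line: starts with a name
            name, residues = line.split()[:2]
            seqs[name] = seqs.get(name, "") + residues
            if offset is None:
                offset = line.index(residues, len(name))
                width = len(residues)
        elif offset is not None:          # first line after the sequences
            # Pad instead of stripping, so column positions stay aligned.
            marks.append(line.ljust(offset + width)[offset:offset + width])
            offset = width = None
    marks = "".join(marks)
    first = next(iter(seqs.values()), "")
    # Conserved columns are identical in all sequences, so any row will do.
    return [(m.start(), m.end(), first[m.start():m.end()])
            for m in re.finditer(r"\*{%d,}" % min_len, marks)]


for start, end, residues in conserved_regions(SAMPLE):
    print(start, end, residues)
```

Padding the conservation line with `ljust` rather than stripping it keeps the star positions lined up with the sequence columns, even if an editor has removed trailing spaces.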
It looks exactly like that.
It's a similar approach, though.
Thank you so much, your code is inspiring. Can you tell me how to iterate over the file, reading it one line at a time, to do the same?
You can iterate over the rows and columns of a multiple sequence alignment using the AlignIO module in Biopython.
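As a rough sketch of the AlignIO route (it needs Biopython installed; the helper name `conserved_runs`, the `min_len` threshold and the sample alignment are assumptions for illustration), you can test each column directly instead of parsing the star line:

```python
from io import StringIO

from Bio import AlignIO  # requires Biopython

# Made-up single-block sample, for illustration only.
SAMPLE = """\
CLUSTAL W (1.83) multiple sequence alignment

seq1      ATGCCGTA
seq2      ATGACGTA
seq3      ATGCCGTA
          *** ****
"""


def conserved_runs(alignment, min_len=3):
    """Return (start, end, residues) for runs of identical, gap-free columns."""
    runs, start = [], None
    length = alignment.get_alignment_length()
    # Go one step past the last column so a run at the very end is closed.
    for i in range(length + 1):
        conserved = (
            i < length
            and len(set(alignment[:, i])) == 1   # column i as a string
            and alignment[0, i] != "-"           # ignore all-gap columns
        )
        if conserved and start is None:
            start = i
        elif not conserved and start is not None:
            if i - start >= min_len:
                runs.append((start, i, str(alignment[0].seq[start:i])))
            start = None
    return runs


alignment = AlignIO.read(StringIO(SAMPLE), "clustal")
for start, end, residues in conserved_runs(alignment):
    print(start, end, residues)
```

For a real file, `AlignIO.read` also accepts a path, e.g. `AlignIO.read("alignment.aln", "clustal")`, where the file name is a placeholder. Biopython then joins the blocks for you, so there is no need to read the file line by line.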
It should be reasonably easy to adapt this code: A: Trim sequences based on alignment in python