Question

Classify conserved and non-conserved sequences

0

Entering edit mode

2.4 years ago

Rajavee ▴ 10

I was wondering if there is an open-source code available that allows me to group sequences from a MSA into conserved and non-conserved based on a single residue. I have an alignment and I want to isolate the sequences that do not have a glutamine at position 61 from this alignment.

conserved multiple residues Biopython sequence alignment classification Python • 884 views

ADD COMMENT • link 2.3 years ago by Rajavee ▴ 10

score 1 · Answer 1 · 2021-12-15

Biopython has a way (mentioned here) whereby you can specify a key function to sort on, see sort(self, key=None, reverse=False) under Bio.Align .

The example you want to pay attention to is under 'you could sort on the GC content of each sequence' in that section:

align1.sort(key = lambda record: GC(record.seq))

That function provided can be edited so you can sort alphabetically on a specific position, such as for the position at index #5, you'd write:

align1.sort(key = lambda record: record.seq[5])

You don't say which format your sequences are in, and so I cannot provide a full example of reading it in and accessing the individual records. There's examples of that in Biopython documentation here. I have an example here and here, with more examples to be found in the scripts in my aliignment-utilities sub-repo.
Beaware of Python indexing being zero based.

If you need a place to try out Biopython stuff, the latest release will be installed and available in sessions that come up if you go here and click launch binder.

score 1 · Answer 2 · 2022-01-03

Here's the code I wrote

#Create a list with the SeqID and Sequence records from the alignment
l= []
for i in alignment.__dict__['_records']:
    l.append([i.id,"".join(i.seq)])
df = pd.DataFrame(l)
df.columns = ['id','seq']
df

#Generating a column for analyzing the respecting aa residue

df['pos_87'] = df['sequence'].apply(lambda x:x[86])

#Checking if it is conserved across other sequences in the alignment

df['conserved'] = df['pos_87'].apply(lambda x:x=='I')