Question

finding overlapping motifs to increase length of motif


17 months ago

andrea • 0

Hi All,

I am able to find a matching motif in my sequences, and I would now like to find the overlapping motif, i.e. extend each match: after matching my motif, I want to also get the 6 amino acids that follow it. This is the code I used to find the motif:

from Bio import SeqIO
import regex

input_file = 'sequences.fasta'
fasta_sequences = SeqIO.parse(input_file, 'fasta')
for fasta in fasta_sequences:
    name, sequence = fasta.id, str(fasta.seq)
    result = regex.finditer(r"[YFWLIMVA]..[LMALVN]..[AGSTCD].[LAIVNFYMW]", sequence)
    # Search every record: this loop must sit inside the record loop
    for x in result:
        print(name, x.start(), x.end(), x.group())

The above code works well: it gives me the sequence ID, the positions and the motif. The output is below:

P1  33 41 VTLLPAADL
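For reference, match objects report 0-based, half-open coordinates: `x.start()` is the index of the first matched residue and `x.end()` is one past the last, so `sequence[x.start():x.end()]` reproduces `x.group()`. A minimal check, sketched with the standard-library `re` module (which handles this pattern the same way as `regex`) and a made-up sequence:

```python
import re

MOTIF = r"[YFWLIMVA]..[LMALVN]..[AGSTCD].[LAIVNFYMW]"

# Made-up test sequence: 3 filler residues, the motif, then 6 more residues.
sequence = "GGG" + "VTLLPAADL" + "LMAIID"

m = re.search(MOTIF, sequence)
print(m.start(), m.end(), m.group())  # prints: 3 12 VTLLPAADL

# The half-open interval [start, end) slices out exactly the matched text.
assert sequence[m.start():m.end()] == m.group()
```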

Now I would also like to get the 6 amino acids after the matched motif, so that the output looks like this:

P1 33 47 VTLLPAADLLMAIID

This is the code I tried in order to get the 6 amino acids after my match:

from Bio import SeqIO
import regex

input_file = 'sequences.fasta'
fasta_sequences = SeqIO.parse(input_file, 'fasta')
for fasta in fasta_sequences:
    name, sequence = fasta.id, str(fasta.seq)
    result = regex.finditer(r"[YFWLIMVA]..[LMALVN]..[AGSTCD].[LAIVNFYMW]", sequence)
    for x in result:
        # Only the printed end coordinate changes; x.group() is still just the motif
        print(name, x.start(), x.end() + 6, x.group())

This is the output it gives me:

# It does not extend my motif by 6 amino acids after the match:
P1  33 47 VTLLPAADL 

# My desired output is this, which includes the 6 following residues, LMAIID:
P1   33 47 VTLLPAADLLMAIID

I also tried the code below, but it raises an error.

from Bio import SeqIO
import regex

input_file = 'sequences.fasta'
fasta_sequences = SeqIO.parse(input_file, 'fasta')
for fasta in fasta_sequences:
    name, sequence = fasta.id, str(fasta.seq)
    result = regex.finditer(r"[YFWLIMVA]..[LMALVN]..[AGSTCD].[LAIVNFYMW]", sequence)
    for x in result:
        # Raises TypeError: x.group() is a str and cannot be added to the int 6
        print(name, x.start(), x.end() + 6, x.group() + 6)


You have your regex match x, but x.group() only ever contains the matched motif, not the whole FASTA record. With the extended coordinates, you need to slice the original sequence.

Comment by michael.ante ★ 3.8k, 17 months ago
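A minimal sketch of that idea, assuming a match object and the full sequence string as in the question's loop. `extend_match` is a hypothetical helper name; it slices the full sequence with the extended coordinates and clamps the end to the sequence length, so a motif near the C-terminus does not report an end coordinate past the end of the sequence (the slice itself would silently stop there anyway, but the printed number would be wrong). It uses the standard-library `re` module; `regex` behaves the same for this pattern.

```python
import re

MOTIF = r"[YFWLIMVA]..[LMALVN]..[AGSTCD].[LAIVNFYMW]"


def extend_match(sequence, match, n=6):
    """Hypothetical helper: return (start, end, subsequence) for the match plus
    up to n trailing residues. Coordinates are 0-based and half-open, like
    match.start() and match.end()."""
    start = match.start()
    end = min(match.end() + n, len(sequence))  # do not run past the sequence end
    return start, end, sequence[start:end]


# Made-up example: the motif is followed by only 4 residues, so the tail is truncated.
sequence = "GGGVTLLPAADLLMAI"
for x in re.finditer(MOTIF, sequence):
    print("P1", *extend_match(sequence, x))  # prints: P1 3 16 VTLLPAADLLMAI
```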


Thank you, Michael. How do I do that? I am still new to this; could you perhaps provide some example code showing how to do it?

Reply by andrea, 17 months ago

score 3 · Accepted Answer · 2022-11-17


17 months ago

iraun 6.2k

You have to go back to the original sequence and fetch the subsequence using the coordinates, like this (replacing the print inside the for x in result loop):

        print(name, x.start(), x.end() + 6, sequence[x.start():x.end() + 6])
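An alternative, sketched here with the standard-library `re` module (the third-party `regex` module used in the question accepts the same syntax), is to capture the trailing residues inside the pattern itself with a lookahead group `(?=(.{0,6}))`. A lookahead does not consume characters, so the residues after one motif stay available to the next `finditer` match; appending a plain `.{6}` instead would consume them and would also drop any motif with fewer than 6 residues after it. The record IDs and sequences below are made up, and a dict stands in for the `SeqIO.parse` loop.

```python
import re

# The original motif, followed by a lookahead that captures up to 6 trailing
# residues without consuming them.
PATTERN = re.compile(r"[YFWLIMVA]..[LMALVN]..[AGSTCD].[LAIVNFYMW](?=(.{0,6}))")

# Made-up records standing in for the (fasta.id, str(fasta.seq)) pairs from SeqIO.parse.
records = {
    "P1": "GGGVTLLPAADLLMAIIDKK",
    "P2": "AAVTLLPAADL",
}

for name, sequence in records.items():
    for x in PATTERN.finditer(sequence):
        motif, tail = x.group(0), x.group(1)
        end = x.end() + len(tail)  # x.end() is the end of the motif itself
        print(name, x.start(), end, motif + tail)
```

With these made-up records the loop prints `P1 3 18 VTLLPAADLLMAIID` and `P2 2 11 VTLLPAADL`; the second motif sits at the very end of its sequence, so its tail is simply empty rather than the match being lost.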