Locate a pattern in protein sequences with Biopython
6.9 years ago

Hello,

I am trying to find sequences that contain a tripeptide (RGD). The tripeptide can be followed by any amino acid except 'P'. So far I extract the RGD-containing sequences in the following way.

from Bio import SeqIO
RGD = [] 
for record in SeqIO.parse("input.fasta", "fasta"):
    rgd_count = record.seq.count('RGD')
    if rgd_count >= 1:
        RGD.append(record) 
SeqIO.write(RGD, "RGD_Proteins.fasta", "fasta")

How can I introduce a regex here so that RGD followed by any residue is accepted, but RGDP is not?

Thanks in advance.

AP

Biopython fasta pattern matching • 1.5k views
6.9 years ago

You need the re module from the Python standard library; see its documentation for the pattern syntax.
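As a minimal sketch of how re could be combined with the loop from the question: a negative lookahead, RGD(?!P), matches RGD only when the next residue is not P. The helper names has_rgd_not_p and filter_records are made up for illustration. Two assumptions to check against your intent: an RGD at the very end of a sequence counts as a hit, and a record is kept as soon as it has at least one RGD not followed by P, even if it also contains RGDP elsewhere.

```python
import re

# Negative lookahead: match "RGD" only when the next residue is not "P".
# Assumption: an RGD at the very end of a sequence also counts as a hit;
# use r"RGD[^P]" instead if a following residue must be present.
RGD_NOT_P = re.compile(r"RGD(?!P)")


def has_rgd_not_p(sequence):
    """Return True if the sequence has at least one RGD not followed by P."""
    # str() lets this accept both plain strings and Biopython Seq objects.
    return RGD_NOT_P.search(str(sequence)) is not None


def filter_records(in_path, out_path):
    """Write FASTA records with a qualifying RGD to out_path; return the count."""
    # Imported here so the pattern helper above works without Biopython installed.
    from Bio import SeqIO

    hits = [rec for rec in SeqIO.parse(in_path, "fasta") if has_rgd_not_p(rec.seq)]
    return SeqIO.write(hits, out_path, "fasta")
```

With the file names from the question, the call would be filter_records("input.fasta", "RGD_Proteins.fasta"). If you instead want to drop every record that contains RGDP anywhere, add a second check such as "RGDP" not in str(rec.seq).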
