Regular expression problem while trying to interpolate variable as pattern in Python
5.4 years ago
uxue33 ▴ 20

I'm trying to learn Python to perform some tasks in the lab.

I was trying to find a pattern in a FASTA file, passing both the pattern and the text to search as string variables. Done this way, the program doesn't find anything. However, when I write the pattern as a literal string instead of passing it through a variable, it works and finds the pattern. Could you please help me figure out what the problem is?

Here is my code:

   firstname = "header1"
   for record in SeqIO.parse("prueba_fasta.fasta", "fasta"):
        print ">" + record.id + "\n" + record.seq
        fa = strrecord.id)
        print fa
        fir = str(firstname)
        print fir
        matches = re.search (fir, fa)
        if matches:
            print ">" + record.id + "\n" + record.seq

Thanks in advance

Uxue

python biopython regex • 1.2k views

Unrelated advice: you could use print(record.format("fasta").strip()) to write the record to stdout properly, instead of the awkward print ">" + record.id + "\n" + record.seq

In addition, something is missing here: fa = strrecord.id). I guess you mean: fa = str( record.id)


I'm sorry for the mistake, this is the right version of the code:

    firstname = "header1"
    for record in SeqIO.parse("prueba_fasta.fasta", "fasta"):
        print ">" + record.id + "\n" + record.seq
        fa = str(record.id)
        print fa
        fir = str(firstname)
        print fir
        matches = re.search(fir, fa)
        if matches:
            print(record.format("fasta").strip())

Thank you!


Oh right, there is some weird auto formatting going on with str( record.id).

Which pattern exactly are you trying to match? So if something occurs in the FASTA identifier, you want to keep that record?

5.4 years ago
Medhat 9.0k

Try using matches.group().

re.search returns a match object when the pattern is found and None otherwise. Calling group() on the match object returns the matched string, so check that the result is not None before calling group().
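
To make the point above concrete, here is a minimal sketch of searching with a pattern held in a variable. The helper name find_header and the example identifiers are hypothetical and not from the original post. The use of re.escape is an added suggestion, on the assumption that the variable should be matched as literal text rather than as a regular expression.

```python
import re

def find_header(pattern_text, header, literal=True):
    """Return the matched text, or None when the header does not match.

    find_header is a hypothetical helper written for this illustration.
    """
    # re.escape makes characters such as "." or "|" match literally,
    # which matters for identifiers like "sp|P12345|NAME".
    pattern = re.escape(pattern_text) if literal else pattern_text
    match = re.search(pattern, header)
    # re.search returns None on failure, so test before calling group().
    return match.group() if match else None

firstname = "header1"  # pattern held in a variable, as in the question
print(find_header(firstname, "header1_sample"))
print(find_header(firstname, "other_sequence"))
print(find_header("sp|P1", "sp|P12345|NAME"))
```

A pattern stored in a variable behaves exactly like the same pattern written as a literal, so if the two give different results it is worth printing repr() of the variable to check for stray whitespace or unexpected characters.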


Thank you very much for your insight! I will try to follow your recommendations in order to fix this problem.

5.4 years ago

You just want to print any fasta entry where the header contains the text "header1"?

You can just do:

for record in SeqIO.parse("prueba_fasta.fasta", "fasta"):
    if str(record.id).find('header1') != -1:
        print ">" + record.id + "\n" + record.seq
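
The same substring test can be sketched without a regular expression or Biopython. In this illustration the function records_with_text and the list example_records are hypothetical stand-ins: plain (identifier, sequence) tuples take the place of the records that SeqIO.parse would yield.

```python
def records_with_text(records, text):
    """Yield (identifier, sequence) pairs whose identifier contains text."""
    for identifier, sequence in records:
        # "in" is the idiomatic substring test; it is equivalent to
        # identifier.find(text) != -1.
        if text in identifier:
            yield identifier, sequence

# Hypothetical stand-in data; with Biopython these values would come
# from record.id and str(record.seq).
example_records = [
    ("header1_geneA", "ATGCCGTA"),
    ("header2_geneB", "TTGACCA"),
    ("sample_header1", "GGCATA"),
]

for identifier, sequence in records_with_text(example_records, "header1"):
    print(">" + identifier + "\n" + sequence)
```

For a fixed piece of text like "header1", a plain substring test avoids any surprises from regex metacharacters in the search string.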

Thank you for your answer. I have already fixed the problem.
