After many years of using Perl, I am starting to learn Python. As an example, I want to perform regular expression matching on sequences extracted from a FASTA file. The FASTA file is parsed with Biopython's SeqIO module. In the following code, 're.findall' fails when searching for 'iupac' in 'seq_record.seq' and raises "TypeError: expected string or buffer". However, if 'seq_record.seq' is replaced with a string literal, e.g. 'TTAATT', a match is found.
    # biopython
    from Bio import SeqIO
    # regex library
    import re

    # file with FASTA sequence
    infile = "fasta.fa"
    # pattern to search for
    iupac = "taat"

    # look through each FASTA sequence in the file
    for seq_record in SeqIO.parse(infile, "fasta"):
        print "Sequence ID: ", seq_record.id, "; ", len(seq_record), "bp"
        print seq_record.seq
        # scan for IUPAC; re.I makes search case-insensitive
        matches = re.findall( iupac, seq_record.seq, re.I)
        if matches:
            print "Matches = ", len(matches)
Thanks for any guidance!
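A possible explanation, assuming 'seq_record.seq' is a Biopython Seq object rather than a plain Python string: 're.findall' only accepts strings (or buffers), so the object would need converting with 'str()' first. The sketch below illustrates the idea without Biopython; 'FakeSeq' and 'find_motif' are hypothetical names made up for this example, and 'FakeSeq' is only a stand-in for a non-string sequence object.

```python
import re


class FakeSeq(object):
    """Hypothetical stand-in for a sequence object that is not a str."""

    def __init__(self, data):
        self._data = data

    def __str__(self):
        return self._data


def find_motif(seq, pattern):
    """Return all case-insensitive matches of pattern in seq.

    str() turns the sequence object into a plain string, which is
    the type re.findall expects.
    """
    return re.findall(pattern, str(seq), re.I)


seq = FakeSeq("TTAATTggtaatCC")

# Passing the object itself raises TypeError, as in the question.
try:
    re.findall("taat", seq, re.I)
    raised = False
except TypeError:
    raised = True

# Converting to a string first lets the search run.
matches = find_motif(seq, "taat")
print(len(matches))
```

If that assumption holds, the equivalent change in the loop above would be to pass 'str(seq_record.seq)' to 're.findall' instead of 'seq_record.seq'.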