I need to search for a pattern within a multiple sequence alignment, allowing any number of - or . symbols to appear between the characters of the pattern. For example, I want to search for the string pattern RAGTLQYD (see bold characters) within the alignment below, and to do so I have to ignore any number of - and . symbols that appear between the characters of the pattern. I also want to print the position in the alignment where the first character of the pattern is located. So far I have this:
    from re import search, IGNORECASE
    import pandas as pd

    df1 = pd.read_csv(multiple_sequence_alignment_file, delimiter="\t")
    matchseq = pd.read_csv(file_of_patterns)  # all the patterns I want to search
    for seq in matchseq:
        if search(seq, df1, IGNORECASE):
            print(seq, df1)
This works only for patterns that have no - or . symbols between their characters. I couldn't find in the re.search documentation how to ignore certain characters during the search. Any guidance would be really helpful.
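One possible approach, as a minimal sketch rather than a definitive answer: build the regex from the pattern by placing an optional run of gap symbols between every pair of characters, then read the match position from the match object. The helper names `gapped_regex` and `find_gapped` and the sample row are hypothetical, made up for illustration. The sketch also assumes each aligned sequence is available as a plain string.

```python
import re

GAP_CHARS = "-."  # gap symbols to tolerate between pattern characters


def gapped_regex(pattern, gap_chars=GAP_CHARS):
    """Compile `pattern` so any run of gap symbols may sit between its characters."""
    # Character class matching zero or more gap symbols, e.g. [\-\.]*
    gap = "[" + re.escape(gap_chars) + "]*"
    # Escape each pattern character and join them with the gap class
    return re.compile(gap.join(re.escape(ch) for ch in pattern), re.IGNORECASE)


def find_gapped(pattern, aligned_seq):
    """Return the 0-based alignment column of the first match, or None."""
    m = gapped_regex(pattern).search(aligned_seq)
    return m.start() if m else None


# Hypothetical aligned row, not taken from the original alignment
row = "MK--RA.G-TL--QYD..LL"
print(find_gapped("RAGTLQYD", row))  # prints 4
```

The returned value is a 0-based column in the gapped alignment, so add 1 for a 1-based position. Note that `re.search` expects a string, so each aligned sequence would need to be passed in one at a time rather than the whole DataFrame.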