I made a post previously (How to loop on two fasta files for hetero-dimerization analysis) but I think I could not explain my problem clearly.
Let's assume I have two fast files as below:
>seq20 NNNNNNN >seq11 NNACCGGNN >seq2 NNNNNNNN
>seq3 AAACCGGTAAA >seq9 GGGGGGGGGGG
I want to make a loop using two files - for each sequence in file1 the loop should go through all the sequences in file2.
What I want to do is to start with the first sequence in file1 - make 5mers - save the 5mer's reverse complements in a list and look for any item in the list if exists in ANY sequence in file2. Then loop goes to the second sequence in file2 and makes a NEW list (not adding up to the previous list).
Based on the dummy sequences - one of the possible 5mers from the seq11 in the file1.fa is
ACCGG which it's reverse complement becomes
(CCGGT) - if you look at the
seq3 in the file2 you could see that
CCGGT exists in that sequence then these sequence ids should be reported.
Base on my examples only seq11 hetero-dimerizes with seq3 so the desired output should look like a table as below:
file2 seq3 seq9 file1 seq20 0 0 seq11 1 0 seq2 0 0
I have written this script but it does not give any output - also I have not been able to set the rest of the script to make the output in a table format. Any help is appreciated - Many thanks.
from Bio import SeqIO with open('test1.fa', 'r') as fw, open('test2.fa', 'r') as rv: for fw_record in SeqIO.parse(fw, 'fasta'): lst =  for i in range(len(fw_record.seq)): kmer = str(fw_record.seq[i:i + 5]) if len(kmer) == 5: C_kmer = Seq(kmer).complement() lst.append(C_kmer[::-1]) print(lst) for rv_record in SeqIO.parse(rv, 'fasta'): if any(items in rv_record.seq for items in lst): print(fw_record.id) print(fw_record.seq) print(rv_record.id) print(rv_record.seq)