Hi there, I currently have the code stated below. The output I currently have it set to write out is not what I'm looking for. However, this has been the only way I have successfully run the script. I would like to remove the sequences from a fasta file that have a hamming distance of 50 or more. I tried to use an additional for loop and write the output file with SeqIO but the way 'seqs' is currently written, it was not defined. I created an empty list and tried to append the results.
from Bio import SeqIO REFERENCE_SEQ = 'ATGC.....' def hamming_distance(seq): result = sum(b1 != b2 for b1, b2 in zip(REFERENCE_SEQ, seq)) return result if __name__ == '__main__': filename = open('test.fasta', 'r') seqs = [record for record in SeqIO.parse(filename, 'fasta')] counted = list(map(lambda x: (x, hamming_distance(x)), seqs)) results = list(filter(lambda x: x < 50, counted)) print(*results, sep='\n', file=open("output.fasta", "a"))
The current output is as follows:
(SeqRecord(seq=Seq('TCGCCCTTGCTCACCATGCTAGGTTCTCCTTCTTAATCTAGAATGCAATATACT...CGG',SingleLetterAlphabet()), id='rev_comp_consensus', name='rev_comp_consensus', description='rev_comp_consensus', dbxrefs=), 7) (SeqRecord(seq=Seq('TCGCCCTTGCTCACCATGCTAGGTTCTCCTTCTTAATCTAGAATGTAATATACT...CCG', SingleLetterAlphabet()), id='1', name='1', description='1', dbxrefs=), 7) (SeqRecord(seq=Seq('TCGCCCTTGCTCACCATGCTAGGTTCTCCTTCTTAATCTAGAATGCAATATACT...GGT', SingleLetterAlphabet()), id='2_also7', name='2_also7', description='2_also7', dbxrefs=), 7)
Please let me know if you have any ideas to
1. remove the sequences with a specified hamming distance
2. write the remaining sequences to a fasta file
Thank you so much.