My question is about filtering and curating several multi-FASTA files. For instance, I have downloaded about 150 complete genomes from NCBI belonging to influenza viruses that infect humans. Some of these sequences contain ambiguous nucleotides (W, S, K, M, Y, R, V, H, D, B, N) introduced by the sequencing process, as in the following example:
>H1N1_12
ATGCTTACTGGGTGATC
>H5N1
TTGCCRTCACCGNACTGC
>H1N1_9
CTGYNATTGCCATCGWAA
>H5N1_1
ATCTTACTCGGCGACTCC
>H5N1_5
ACTGYRATTCGCCTAKAA
Using Biopython, I would like a script that identifies these ambiguous nucleotides together with the corresponding FASTA header (in this case, >H5N1 has R and N, >H1N1_9 has Y, N and W, and >H5N1_5 has Y, R and K). If a sequence contains any ambiguous nucleotide, the sequence and its ID should be removed and the result written to a new FASTA file; otherwise the script should move on to the next sequence.
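To make the request concrete, here is a minimal sketch of what I have in mind. It assumes that "remove" means the new FASTA file keeps only the sequences without ambiguity codes. The function names and the file names in the usage note are hypothetical; `SeqIO.parse` and `SeqIO.write` are the standard Biopython calls.

```python
# IUPAC ambiguity codes listed in the question.
AMBIGUOUS = set("WSKMYRVHDBN")


def find_ambiguous(sequence):
    """Return the sorted ambiguity codes present in a nucleotide sequence."""
    return sorted(set(str(sequence).upper()) & AMBIGUOUS)


def filter_fasta(in_path, out_path):
    """Report records with ambiguity codes and write the clean ones to out_path.

    Assumption: records containing any ambiguous nucleotide are dropped
    and only the remaining records are written to the new file.
    Returns the number of records written.
    """
    # Imported here so find_ambiguous() can be used without Biopython installed.
    from Bio import SeqIO

    clean = []
    for record in SeqIO.parse(in_path, "fasta"):
        codes = find_ambiguous(record.seq)
        if codes:
            # Report the header and the ambiguity codes it contains.
            print(f"{record.id}\t{','.join(codes)}")
        else:
            clean.append(record)
    return SeqIO.write(clean, out_path, "fasta")
```

Calling `filter_fasta("genomes.fasta", "genomes_clean.fasta")` on the example above should report H5N1, H1N1_9 and H5N1_5 and keep only H1N1_12 and H5N1_1 in the output file.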