Hi everyone. I have a multi-FASTA file that contains roughly 150k sequences. The first sequence is my reference, and the rest are sequences that carry some mutations.
I'm working with single-point mutations, and I plan to find them by comparing each sequence with the reference, position by position. So I need to remove the sequences that have deletions, because a deletion shifts every position after it and the shifted bases would be counted as single-point mutations. Does anyone have an idea how to do that?
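To make the comparison concrete, this is roughly what I mean, as a minimal sketch in plain Python. The sequences are toy strings, not my real data, and the function name `point_differences` is made up for illustration. It assumes both sequences have the same length and are already in register, which is exactly what breaks when there is a deletion.

```python
def point_differences(ref, seq):
    """Return (position, ref_base, seq_base) for every mismatch (1-based)."""
    # A position-wise comparison only makes sense for equal-length sequences.
    if len(ref) != len(seq):
        raise ValueError("sequences must have the same length")
    return [
        (i + 1, r, s)
        for i, (r, s) in enumerate(zip(ref, seq))
        if r != s
    ]


reference = "ATGGCCATTGTA"    # toy reference (assumption, not real data)
substituted = "ATGGCCTTTGTA"  # same sequence with one substitution

print(point_differences(reference, substituted))  # [(7, 'A', 'T')]
```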
The deletion mutations are like this:
There is a deletion in the selected area, so the rest of the sequence is shifted. There are many sequences like that, and I don't know how to detect and remove them.
I also don't know how to align that many sequences, but if I could, that would solve my problem too.
Thanks in advance for your opinions and help.