I am a newbie in bioinformatics and am trying to learn about alignments, so apologies for my lack of vocabulary. After BLASTing a sequence, this is my output from BLASTN:
... lots of other aligns ...
AACGTATACGGATCGACTGC
AACGTATACGGATCGAC
AACGTATACGGATCGACTGC
AACGTATACGGATCGAC
AACGTATATGGATCGACTGC
AACGTATACGGATCGACTGC
AACGTATACGGATCGACTGCA
AACGTATACGGATCGACTG
AACGTATACGGATCGACTGC
AACGTATATGGATCGACTGC
AACGTATACGGATCGCTGC
AACGTATACGGATCGACTGC
AACGTATACGGATCGCTGCAA
...
That list contains the alignment strings extracted from the blastn results against the database. As you can see, the most common pattern is 20 nucleotides long, but some sequences in the results have insertions or deletions, and some have both. I've grouped the alignments for a population analysis and, if I understand correctly, the Arlequin software requires all sequences to have the same length, so I want to "fix" the alignments.
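To make the length spread concrete, here is a small Python sketch that tallies the lengths of the strings listed above. The list is only the excerpt shown in this post (not the full result set), and the names `hits` and `length_counts` are my own.

```python
from collections import Counter

# Excerpt of the extracted alignment strings shown above (not the full BLASTN output).
hits = [
    "AACGTATACGGATCGACTGC",
    "AACGTATACGGATCGAC",
    "AACGTATACGGATCGACTGC",
    "AACGTATACGGATCGAC",
    "AACGTATATGGATCGACTGC",
    "AACGTATACGGATCGACTGC",
    "AACGTATACGGATCGACTGCA",
    "AACGTATACGGATCGACTG",
    "AACGTATACGGATCGACTGC",
    "AACGTATATGGATCGACTGC",
    "AACGTATACGGATCGCTGC",
    "AACGTATACGGATCGACTGC",
    "AACGTATACGGATCGCTGCAA",
]

# Count how many strings have each length.
length_counts = Counter(len(seq) for seq in hits)

for length, count in sorted(length_counts.items()):
    print(f"length {length}: {count} sequence(s)")
```

For this excerpt, 20 is the most frequent length, with the rest spread over 17, 19 and 21.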
I can think of two possibilities:
- I missed a parameter in my NCBI blastn setup that restricts the results to alignments of a fixed length (20 in my case).
- This is a very common situation and there is software that fixes it by adding gaps so that every sequence matches the length of the longest one while preserving accuracy (though I imagine there are cases where the gaps could be placed in more than one way, giving two different possible sequences).
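To illustrate what I mean by "adding gaps" in the second option, here is a minimal Python sketch of a global pairwise alignment (the Needleman-Wunsch dynamic programming scheme) of one hit against the most common 20-nucleotide string. It is not taken from BLAST or Arlequin. The function name `global_align` and the scoring values (match +1, mismatch -1, gap -2) are assumptions I chose for the example.

```python
def global_align(ref, query, match=1, mismatch=-1, gap=-2):
    """Globally align query to ref; return (gapped_ref, gapped_query).

    Scoring values are illustrative assumptions, not BLAST defaults.
    """
    n, m = len(ref), len(query)

    def sub(i, j):
        # Score for pairing ref[i-1] with query[j-1].
        return match if ref[i - 1] == query[j - 1] else mismatch

    # score[i][j] = best score for aligning ref[:i] with query[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair two bases
                score[i - 1][j] + gap,            # gap in query
                score[i][j - 1] + gap,            # gap in ref
            )

    # Trace back from the bottom-right corner, preferring base pairs on ties.
    out_ref, out_query = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_ref.append(ref[i - 1])
            out_query.append(query[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_ref.append(ref[i - 1])
            out_query.append("-")
            i -= 1
        else:
            out_ref.append("-")
            out_query.append(query[j - 1])
            j -= 1
    return "".join(reversed(out_ref)), "".join(reversed(out_query))


reference = "AACGTATACGGATCGACTGC"   # most common string in my results
with_deletion = "AACGTATACGGATCGCTGC"  # one base shorter than the reference

gapped_ref, gapped_query = global_align(reference, with_deletion)
print(gapped_ref)
print(gapped_query)
```

With these scores the shorter hit gets a single `-` and both strings come out 20 characters long. Two caveats: a hit with an insertion puts the gap in the reference instead, so aligning each hit separately does not by itself give one common length across all sequences; and when the missing base lies inside a run of identical bases, several gap positions score equally and the traceback simply picks one, which is the ambiguity I had in mind.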