I have two sequencing runs of the same PCR product. They are slightly misaligned (one starts ~20 bp earlier and ends ~250 bp earlier), and both contain a few N's. What would be the best way to "merge" them, i.e. create an alignment and replace missing positions or N's in one sequence with bases from the other strand, where those are available?
I would prefer to do this with a Python tool or a tool with a Python interface.
I was thinking that as a last resort, I could:
- BLAST them
- subtract the alignment start positions to get the offset between them
- pad the shorter sequence with N's
- convert them to arrays
- zip the arrays
- iterate through the resulting matrix, and replace N's with the base from the other sequence
But it would be really great if a tool exists that can take care of at least some of the above steps.
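For reference, the padding, zipping and N-replacement steps above can be sketched in plain Python, assuming the two reads already start at the same position. The function name `merge_aligned` is hypothetical, and the choice to write an N wherever both reads call a base but disagree is an assumption, not something required by the problem.

```python
from itertools import zip_longest


def merge_aligned(seq_a, seq_b, unknown="N"):
    """Merge two reads that start at the same position.

    The shorter read is implicitly padded with `unknown` at its end.
    """
    merged = []
    # zip_longest pads the shorter read, so no explicit padding step is needed
    for a, b in zip_longest(seq_a.upper(), seq_b.upper(), fillvalue=unknown):
        if a == unknown:
            # Take whatever the other read has (may itself be unknown)
            merged.append(b)
        elif b == unknown or a == b:
            merged.append(a)
        else:
            # Assumption: a disagreement between two called bases stays unknown
            merged.append(unknown)
    return "".join(merged)


print(merge_aligned("ACGTNACGT", "ACNTTACGTAAG"))  # ACGTTACGTAAG
```

This only covers reads that share a start position; it does nothing about shifts or indels.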
EDIT: Actually, it turns out that they are not misaligned; one is simply shorter. I could therefore skip the BLAST and alignment steps, but it would be nice to find a solution that would work even when the sequences are a bit misaligned.
I don't just want to align them (as it turns out, they happen to be aligned already), I want to merge them into one, so as to minimize unknown base positions.
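To make the misaligned case concrete, here is a rough sketch that stands in for the BLAST step with a brute-force search over ungapped shifts, then pads and merges. It is not a real aligner: it assumes the reads differ only by an offset (no insertions or deletions), and the names `best_offset`, `merge_reads` and the `min_overlap` parameter are hypothetical. For reads with indels, a proper pairwise aligner (for example Biopython's `Bio.Align.PairwiseAligner`) would have to supply the coordinates instead.

```python
def best_offset(seq_a, seq_b, min_overlap=20, unknown="N"):
    """Return the index in seq_a where seq_b[0] lands (negative if seq_b
    starts earlier), chosen to maximise matches minus mismatches."""
    best, best_score = None, None
    for offset in range(-(len(seq_b) - 1), len(seq_a)):
        start_a, start_b = max(offset, 0), max(-offset, 0)
        overlap = min(len(seq_a) - start_a, len(seq_b) - start_b)
        if overlap < min_overlap:
            continue
        score = 0
        for a, b in zip(seq_a[start_a:start_a + overlap],
                        seq_b[start_b:start_b + overlap]):
            if unknown in (a, b):
                continue  # unknown bases neither help nor hurt
            score += 1 if a == b else -1
        if best_score is None or score > best_score:
            best, best_score = offset, score
    return best


def merge_reads(seq_a, seq_b, min_overlap=20, unknown="N"):
    """Shift the reads onto a common coordinate system and merge them."""
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    offset = best_offset(seq_a, seq_b, min_overlap, unknown)
    if offset is None:
        raise ValueError("no overlap of at least min_overlap bases found")
    # Pad the read that starts later on the left, then both on the right
    a = unknown * max(-offset, 0) + seq_a
    b = unknown * max(offset, 0) + seq_b
    length = max(len(a), len(b))
    a, b = a.ljust(length, unknown), b.ljust(length, unknown)
    merged = []
    for x, y in zip(a, b):
        if x == unknown:
            merged.append(y)
        elif y == unknown or x == y:
            merged.append(x)
        else:
            merged.append(unknown)  # assumption: conflicts stay unknown
    return "".join(merged)


# Toy example: the second read starts 2 bases into the first
print(merge_reads("TTACGTNACGGATC", "ACGTTACGGATCCA", min_overlap=5))
# TTACGTTACGGATCCA
```

The search is quadratic in read length, which should be acceptable for single Sanger-sized reads but is not meant for anything larger.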