Merge sequence files
8.9 years ago

I have two sequencing runs of the same PCR product. They are a bit misaligned (one starts ~20 bp earlier and ends ~250 bp earlier), and both contain a few Ns. What would be the best way to "merge" them, i.e. create an alignment and replace missing positions or Ns with bases from the other sequence where available?

I would prefer to do this with a Python tool or a tool with a Python interface.

I was thinking that as a last resort, I could:

  • BLAST them
  • subtract the alignment positions
  • pad the shorter sequence with N's
  • convert them to arrays
  • zip the arrays
  • iterate through the resulting matrix, and replace N's with values

But it would be really great if a tool exists that can take care of at least some of the above steps.

EDIT: Actually, it turns out that they are not misaligned; one is simply shorter. I could therefore skip the BLAST and alignment steps, but it would be nice to find a solution that also works when the sequences are a bit misaligned.
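
For the already-aligned case, the pad/zip/replace steps from the list above can be sketched in a few lines of plain Python. This is only a rough illustration: the function name `merge_reads` is made up, both reads are assumed to start at the same position, and where the two reads disagree on a known base the sketch simply writes N (that conflict rule is my assumption, not something a tool prescribes).

```python
from itertools import zip_longest


def merge_reads(read_a, read_b, unknown="N"):
    """Merge two reads that start at the same position.

    The shorter read is implicitly padded with `unknown`. At each position
    a known base wins over `unknown`; two different known bases give
    `unknown` (assumption: conflicts are left unresolved).
    """
    merged = []
    pairs = zip_longest(read_a.upper(), read_b.upper(), fillvalue=unknown)
    for a, b in pairs:
        if a == b or b == unknown:
            merged.append(a)
        elif a == unknown:
            merged.append(b)
        else:
            merged.append(unknown)
    return "".join(merged)


print(merge_reads("ACGTNNGT", "ACNTACGTTTGA"))  # ACGTACGTTTGA
```

If quality scores are available, the conflict branch is where one would pick the higher-quality base instead of writing N.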

dna sequencing genome sequence • 2.0k views
8.9 years ago

I think you just want to align them, but I'm not sure that doing that natively in Python is the way to go. You could call ClustalW from within your program, then write something to parse its output.
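
If the misalignment is only a shift (one read starts a few bases earlier, with no insertions or deletions), a real aligner may not be needed. The sketch below is not ClustalW; it swaps in the standard library's `difflib.SequenceMatcher` to find the longest exactly shared block and uses that as an anchor. The names `find_offset`, `merge_shifted` and the `min_block` threshold are hypothetical, and the no-indel assumption is a strong one: with indels you would want a proper pairwise alignment instead.

```python
from difflib import SequenceMatcher


def find_offset(read_a, read_b, min_block=6):
    """Return how many bases read_b starts after read_a (negative if before).

    Assumption: the reads differ only by a shift, so the longest exactly
    matching block is enough to anchor one read against the other.
    """
    # autojunk=False: otherwise frequent characters in long sequences
    # are treated as junk, which is useless for a four-letter alphabet.
    matcher = SequenceMatcher(None, read_a, read_b, autojunk=False)
    match = matcher.find_longest_match(0, len(read_a), 0, len(read_b))
    if match.size < min_block:
        raise ValueError("no sufficiently long shared block found")
    return match.a - match.b


def merge_shifted(read_a, read_b, unknown="N"):
    """Shift the reads onto one coordinate system, then fill in unknowns."""
    offset = find_offset(read_a, read_b)
    # Pad on the left so position i means the same base in both reads.
    a = unknown * max(-offset, 0) + read_a
    b = unknown * max(offset, 0) + read_b
    # Pad on the right so both have equal length.
    length = max(len(a), len(b))
    a, b = a.ljust(length, unknown), b.ljust(length, unknown)
    merged = []
    for x, y in zip(a, b):
        if x == y or y == unknown:
            merged.append(x)
        elif x == unknown:
            merged.append(y)
        else:
            merged.append(unknown)  # assumption: conflicts stay unresolved
    return "".join(merged)


read_a = "TTGACCGTANGCTAGGCT"     # starts 2 bases earlier, has one N
read_b = "GACCGTACGCTAGGCTNAGGA"  # extends further, has one N
print(merge_shifted(read_a, read_b))  # TTGACCGTACGCTAGGCTNAGGA
```

The one N left in the output is a position covered only by `read_b`, where `read_b` itself has an N, so there is nothing to fill it with.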


I don't just want to align them (as it turns out, they happen to be aligned already), I want to merge them into one, so as to minimize unknown base positions.
