Biostars,
I have some DNA sequences of various lengths; some are in forward orientation and some are in reverse. The reverse ones need to be flipped and complemented (i.e. reverse-complemented), but there is no way to tell up front which orientation a sequence is in.
I can use the following snippet to join two sequences by finding the first 21 bp of the second sequence in the first sequence, slicing, and joining. The problem is that I have more than two sequences (for example, 5), so it would take many iterations of this snippet to try all possible combinations. I'm also having trouble determining whether a sequence needs to be flipped. The end result I need is a single superstring that takes all 5 sequences into account.
    def joinseqs(first, second):
        """Assumes all input seqs are forward orientation."""
        # Look for the first 21 bp of `second` inside `first`.
        overlap = first.find(second[:21])
        if overlap < 0:
            return None
        # Keep `first` up to the overlap, then append all of `second`.
        return first[:overlap] + second
All help appreciated.
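On the orientation problem: since the orientation can't be known in advance, one simple option is to try the second sequence both as given and reverse-complemented, and keep whichever one matches. The sketch below is only an illustration of that idea built on the same 21 bp lookup as the snippet above. The names `reverse_complement` and `join_either_orientation` and the parameter `k` are made up for this example, and it assumes plain A/C/G/T strings with no ambiguity codes.

```python
# Translation table for complementing plain DNA (assumes only A, C, G, T).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def join_either_orientation(first, second, k=21):
    """Join `second` onto `first`, trying `second` in both orientations.

    The first k bases of each candidate are searched for in `first`.
    Returns the joined sequence, or None if neither orientation matches.
    """
    for candidate in (second, reverse_complement(second)):
        pos = first.find(candidate[:k])
        if pos >= 0:
            # Keep `first` up to the match, then append the whole candidate.
            return first[:pos] + candidate
    return None
```

Like the original snippet, this only checks the first `k` bases and does not verify that the match continues to the end of `first`, so a short `k` can produce false joins.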
Hi, I have written a pure Python package called pydna that implements a sequence assembly algorithm based on graph theory. Look at the pydna.Assembly class. It was recently published in BMC Bioinformatics.
Hope this helps / Bjorn
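For only a handful of sequences, a plain greedy merge may also be enough. To be clear, the sketch below is not pydna's graph-based algorithm and does not use its API. It is a generic greedy suffix/prefix overlap merge that tries both orientations of every pair and repeatedly joins the pair with the longest overlap. The function names and the `min_overlap` parameter are made up for this example, and it assumes plain A/C/G/T strings.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def overlap_length(left, right, min_overlap):
    """Length of the longest suffix of `left` equal to a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(seqs, min_overlap=21):
    """Greedily merge sequences by longest overlap, in either orientation.

    Returns the list of contigs that remain when no pair overlaps by at
    least `min_overlap` bases (a single contig if everything joins up).
    """
    contigs = list(seqs)
    while len(contigs) > 1:
        best = (0, None, None, None)  # (overlap size, i, j, merged sequence)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                # Try every orientation of both sequences.
                for left in (a, reverse_complement(a)):
                    for right in (b, reverse_complement(b)):
                        size = overlap_length(left, right, min_overlap)
                        if size > best[0]:
                            best = (size, i, j, left + right[size:])
        size, i, j, merged = best
        if size == 0:
            break  # nothing left that overlaps well enough
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)
    return contigs
```

The result can come out as either strand, since there is no way to tell which strand is "forward" from the overlaps alone. Greedy merging can also be misled by repeats, and a sequence fully contained in the middle of another is left as a separate contig, so for anything beyond a few clean sequences a proper assembler is the safer choice.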