Question

Align the oligos with the fasta sequences

9.1 years ago

pabbathi.ranjit • 0

I am new to Python and am having a hard time figuring out how to do this. I have FASTA sequences and a few oligo sequences. I would like to align the oligos with the FASTA sequences and find the percentage similarity of each oligo to each FASTA sequence. Can anyone help me with this?

e.g.

fasta sequences

>seq1
cttatatggtaaccgaagcacttcgcccgtataaaaatcatctaaatatgcactttgttt
caaatgtcgatggt
>seq2
tgtttactggcgaaaaaatcaatcgtacagaaaatcgtgccgtgctacatactgcacttc
gcaa

oligos

>x1
ttaacatctgcagcaaaatc
>x2
aaattggggggataccttaa

alignment • 3.1k views

ADD COMMENT • link updated 23 months ago by Ram 43k • written 9.1 years ago by pabbathi.ranjit • 0

Ram · Answer 1 · 2015-03-27

9.1 years ago

JC 13k

Why are you using Python? You can do that with blast/blat/fasta/many_other_aligners. If this is homework, then you need to look into regular expressions (RegEx) or how local/global alignment works.

Thank you for the prompt reply. I understand that blast, fasta and many other tools are available for this purpose. I posted this question because I wanted to learn how to do this in Python and got stuck.

ADD REPLY • link 9.1 years ago by pabbathi.ranjit • 0

It's not a trivial problem. If you want to do it yourself as a programming exercise, your best bet is to re-implement Needleman-Wunsch or a similar string alignment algorithm.

ADD REPLY • link updated 23 months ago by Ram 43k • written 9.1 years ago by Brian Bushnell 20k
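
For anyone trying the exercise suggested above, here is a minimal Python sketch of a Needleman-Wunsch-style dynamic program. It is not taken from any of the posters. It assumes a "semi-global" variant (the whole oligo must be aligned, but unaligned ends of the longer sequence are not penalized), a simple +1/-1/-1 scoring scheme, and a percent identity defined as matches divided by alignment columns. The function name oligo_identity and all of its parameters are made up for illustration.

```python
def oligo_identity(oligo, target, match=1, mismatch=-1, gap=-1):
    """Percent identity of the best semi-global alignment of oligo to target.

    Assumptions (not from the thread): linear gap penalty, free end gaps
    in the target only, identity = matches / alignment columns * 100.
    """
    oligo, target = oligo.upper(), target.upper()
    n, m = len(oligo), len(target)

    def sub(i, j):
        # Substitution score for oligo[i-1] against target[j-1].
        return match if oligo[i - 1] == target[j - 1] else mismatch

    # score[i][j] = best score for oligo[:i] aligned so that it ends at target[:j].
    # Row 0 stays at 0, so skipping a prefix of the target costs nothing.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # match or mismatch
                score[i - 1][j] + gap,            # oligo base against a gap
                score[i][j - 1] + gap,            # target base against a gap
            )

    # Skipping a suffix of the target is also free: start the traceback
    # from the best-scoring cell in the last row.
    i = n
    j = max(range(m + 1), key=lambda col: score[n][col])
    matches = columns = 0
    while i > 0:
        if j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            matches += oligo[i - 1] == target[j - 1]
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return 100.0 * matches / columns if columns else 0.0


if __name__ == "__main__":
    seq1 = ("cttatatggtaaccgaagcacttcgcccgtataaaaatcatctaaatatgcactttgttt"
            "caaatgtcgatggt")
    print(round(oligo_identity("ttaacatctgcagcaaaatc", seq1), 1))
```

This runs in time proportional to len(oligo) * len(target), which is fine for a handful of short sequences but far slower than the dedicated aligners mentioned above. It also checks only the given strand, so the reverse complement of each oligo would need to be tested separately.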

Ram · Answer 2 · 2015-03-27

There's a program in BBTools that will do this. The sequences need to be in fasta format, and all letters need to be uppercase. You can convert to uppercase with reformat:

reformat.sh in=file.fasta out=upper.fasta touppercase

Then run msa:

msa.sh in=upper.fasta out=mapped.sam literal=TTAACATCTGCAGCAAAATC

It will produce exactly one output line per fasta sequence, so you need to run it once per oligo. You can give it a comma-delimited list of oligos instead, in which case there will still be one output line per sequence, but it will be for the oligo that matched best. The output lines will have at the end "YI:f:" followed by a number, indicating the identity. For example, YI:f:97.3 would indicate 97.3% identity.
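
Since the original question was about Python, the identity values described above could be pulled out of the SAM output with a few lines like the following. This is only a sketch: it assumes the standard tab-delimited SAM layout (header lines start with "@", optional tags begin at the 12th column) and that the tag appears as "YI:f:<number>" as stated above. The helper names parse_identity and read_identities and the sample line are invented for illustration and are not real msa.sh output.

```python
def parse_identity(sam_line):
    """Return (first column, identity) for one SAM alignment line.

    identity is None when the line carries no YI:f: tag.
    """
    fields = sam_line.rstrip("\n").split("\t")
    # In SAM, the first 11 columns are mandatory; optional tags follow.
    for field in fields[11:]:
        if field.startswith("YI:f:"):
            return fields[0], float(field[len("YI:f:"):])
    return fields[0], None


def read_identities(path):
    """Collect (name, identity) pairs from a SAM file, skipping @ headers."""
    results = []
    with open(path) as handle:
        for line in handle:
            if line.startswith("@") or not line.strip():
                continue
            results.append(parse_identity(line))
    return results


if __name__ == "__main__":
    # Made-up example line, only to show the expected shape of the input.
    sample = "x1\t0\tseq1\t5\t40\t20M\t*\t0\t0\tTTAACATCTGCAGCAAAATC\t*\tYI:f:97.3"
    print(parse_identity(sample))
```

Running read_identities once on each per-oligo output file (for example "mapped.sam" from the command above) would give one identity value per FASTA sequence for that oligo.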