Align the oligos with the fasta sequences
9.1 years ago

I am new to Python and am having a hard time figuring out how to do this. I have FASTA sequences and a few oligo sequences. I would like to align the oligos with the FASTA sequences and find the percentage similarity of each oligo to each FASTA sequence. Can anyone help me with this?

e.g.

fasta sequences

>seq1
cttatatggtaaccgaagcacttcgcccgtataaaaatcatctaaatatgcactttgttt
caaatgtcgatggt
>seq2
tgtttactggcgaaaaaatcaatcgtacagaaaatcgtgccgtgctacatactgcacttc
gcaa

oligos

>x1
ttaacatctgcagcaaaatc
>x2
aaattggggggataccttaa
9.1 years ago
JC 13k

Why are you using Python? You can do that with blast/blat/fasta/many_other_aligners. If this is homework, then you need to look into regular expressions or how local/global alignment works.


Thank you for the prompt reply. I understand that blast, fasta and many other tools are available for this purpose. I posted this question because I wanted to learn how to do this in Python and got stuck.


It's not a trivial problem. If you want to do it yourself as a programming exercise, your best bet is to re-implement Needleman-Wunsch or a similar string alignment algorithm.
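As a starting point for that exercise, here is a minimal sketch of a Needleman-Wunsch variant in plain Python. It is a semi-global ("fitting") alignment rather than the textbook global one: the whole oligo must be aligned, but the unused ends of the longer sequence are not penalized, which seems closer to what the question asks. The function names (`read_fasta`, `fit_align`), the scoring values (match=1, mismatch=-1, gap=-2) and the definition of identity (matches divided by alignment columns) are all assumptions made for illustration, not a standard.

```python
def read_fasta(text):
    """Parse FASTA text into {name: sequence}, joining wrapped lines."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def fit_align(oligo, target, match=1, mismatch=-1, gap=-2):
    """Align all of `oligo` to the best-matching region of `target`.

    Returns (percent_identity, aligned_oligo, aligned_target), where
    identity = matching columns / total alignment columns * 100.
    """
    oligo, target = oligo.upper(), target.upper()
    n, m = len(oligo), len(target)

    def sub(i, j):
        # Substitution score for oligo[i-1] against target[j-1].
        return match if oligo[i - 1] == target[j - 1] else mismatch

    # score[i][j]: best score for oligo[:i] aligned so that it ends at target[:j].
    # Row 0 stays at 0, so skipping a prefix of the target is free.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # base against base
                score[i - 1][j] + gap,            # oligo base against a gap
                score[i][j - 1] + gap,            # target base against a gap
            )

    # Skipping a suffix of the target is free too: take the best end column.
    j = max(range(m + 1), key=lambda k: score[n][k])
    i = n
    a, b = [], []
    while i > 0:  # trace back until the whole oligo is consumed
        s = score[i][j]
        if j > 0 and s == score[i - 1][j - 1] + sub(i, j):
            a.append(oligo[i - 1]); b.append(target[j - 1]); i -= 1; j -= 1
        elif s == score[i - 1][j] + gap:
            a.append(oligo[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(target[j - 1]); j -= 1
    a.reverse(); b.reverse()
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a), "".join(a), "".join(b)


# The sequences from the question.
targets = read_fasta(""">seq1
cttatatggtaaccgaagcacttcgcccgtataaaaatcatctaaatatgcactttgttt
caaatgtcgatggt
>seq2
tgtttactggcgaaaaaatcaatcgtacagaaaatcgtgccgtgctacatactgcacttc
gcaa
""")
oligos = read_fasta(">x1\nttaacatctgcagcaaaatc\n>x2\naaattggggggataccttaa\n")

for oname, oseq in oligos.items():
    for tname, tseq in targets.items():
        pid, _, _ = fit_align(oseq, tseq)
        print(f"{oname} vs {tname}: {pid:.1f}% identity")
```

This only looks at the forward strand and runs in time proportional to oligo length times sequence length, so it is fine for a handful of short sequences but not a replacement for a real aligner. Changing the scoring values will change which alignment is considered best and therefore the reported identity.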

9.1 years ago

There's a program in BBTools that will do this. The sequences need to be in fasta format, and all letters need to be uppercase. You can convert to uppercase with reformat:

reformat.sh in=file.fasta out=upper.fasta touppercase

Then run msa:

msa.sh in=upper.fasta out=mapped.sam literal=TTAACATCTGCAGCAAAATC

It will produce exactly one output line per fasta sequence, so you need to run it once per oligo. You can give it a comma-delimited list of oligos instead, in which case there will still be one output line per sequence, but it will be for the oligo that matched best. The output lines will have at the end "YI:f:" followed by a number, indicating the identity. For example, YI:f:97.3 would indicate 97.3% identity.
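Since the question was about Python, here is a small sketch for pulling those identity values out of the SAM file afterwards. It relies only on what is described above (a "YI:f:" tag followed by a number) plus the general SAM convention of tab-separated columns and "@" header lines. The function name `parse_identities` and the example lines are hypothetical, and the first column is returned as-is as a label; which name msa.sh writes there is not confirmed here.

```python
def parse_identities(sam_lines):
    """Return (first_column, identity) for every line carrying a YI:f: tag."""
    results = []
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip SAM header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        for field in fields:
            if field.startswith("YI:f:"):
                results.append((fields[0], float(field[len("YI:f:"):])))
                break
    return results


# Hypothetical lines shaped like SAM records; not real msa.sh output.
example = [
    "@HD\tVN:1.4",
    "rec1\t0\tref\t12\t40\t20M\t*\t0\t0\tTTAACATCTGCAGCAAAATC\t*\tYI:f:97.3",
    "rec2\t0\tref\t5\t40\t20M\t*\t0\t0\tTTAACATCTGCAGCAAAATC\t*\tYI:f:85.0",
]
print(parse_identities(example))
```

For a real run, the same function can be given an open file handle, e.g. `with open("mapped.sam") as fh: rows = parse_identities(fh)`, repeated once per oligo's output file.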
