Question

Query Two Files To The Program Fasta (Performs Pairwise Alignments)

0

Entering edit mode

12.6 years ago

martinkara7 ▴ 20

Hello, I am rather very new to programming and am trying to design a perl script. I have two very large fasta files(one file conain 4000 sequences and the other 3500). I want to conduct a pairwise alignment on the the files with every possible combination. I am using the software FASTA to conduct the pairwise so i can tailor the outfile. The thing is query the file against each other to do this. I am having great difficult with the loops and submit to FASTA part. Below is how I am trying to do it but the loops are being very difficult. Could any one advise on this or offer alternative idea i could try please

Open file 1
While reading file 1
  Extract 1st sequence to align
  Open file 2
  While reading file 2
    Extract seq from file2
     submit to FASTA
  Close file 2
Close file 1
 system call FASTA

fasta alignment • 3.5k views

ADD COMMENT • link updated 12.6 years ago by Andrzej Zielezinski 11k • written 12.6 years ago by martinkara7 ▴ 20

0

Entering edit mode

Why are the loops difficult to you? Post your code to see if it could be improved.

Do you need reciprocal combinations? I mean, if you already compared A vs B, do you need B vs A too?

Why do you need to do one by one, it's simple to use a multifasta as target and query, then parse the output to obtain the pairs information.

ADD REPLY • link 12.6 years ago by JC 13k

0

Entering edit mode

I think you should add this as an answer. I suspect the OP was not aware that one could use the multifasta for both target and query.

ADD REPLY • link 12.6 years ago by Istvan Albert 102k

0

Entering edit mode

I was expecting some additional info, but I wil add that part as answer.

ADD REPLY • link 12.6 years ago by JC 13k

score 5 · Answer 1 · 2012-12-05

You can do the whole thing (align all possible non-redundant pairs) in Python (assuming you have BioPython installed)::

from itertools import product
from Bio import SeqIO
from Bio import pairwise2

seqs1 = SeqIO.to_dict(SeqIO.parse(open('file1.fasta'),'fasta'))
seqs2 = SeqIO.to_dict(SeqIO.parse(open('file2.fasta'),'fasta'))

for sr1, sr2 in product(seqs1,seqs2):
    for a in pairwise2.align.localxx(str(seqs1[sr1].seq), str(seqs2[sr2].seq)):
        print format_alignment(*a)

score 1 · Answer 2 · 2012-12-05

1

Entering edit mode

12.6 years ago

JC 13k

Why do you need to do one by one, it's simple to use a multifasta as target and query, then parse the output to obtain the pairs information.

ADD COMMENT • link 12.6 years ago by JC 13k