Question: Query Two Files To The Program Fasta (Performs Pairwise Alignments)
0
gravatar for martinkara7
7.6 years ago by
martinkara720
martinkara720 wrote:

Hello, I am rather very new to programming and am trying to design a perl script. I have two very large fasta files(one file conain 4000 sequences and the other 3500). I want to conduct a pairwise alignment on the the files with every possible combination. I am using the software FASTA to conduct the pairwise so i can tailor the outfile. The thing is query the file against each other to do this. I am having great difficult with the loops and submit to FASTA part. Below is how I am trying to do it but the loops are being very difficult. Could any one advise on this or offer alternative idea i could try please

Open file 1
While reading file 1
  Extract 1st sequence to align
  Open file 2
  While reading file 2
    Extract seq from file2
     submit to FASTA
  Close file 2
Close file 1
 system call FASTA
fasta alignment • 1.9k views
ADD COMMENTlink modified 7.6 years ago by a.zielezinski9.2k • written 7.6 years ago by martinkara720

Why are the loops difficult to you? Post your code to see if it could be improved.

Do you need reciprocal combinations? I mean, if you already compared A vs B, do you need B vs A too?

Why do you need to do one by one, it's simple to use a multifasta as target and query, then parse the output to obtain the pairs information.

ADD REPLYlink modified 7.6 years ago • written 7.6 years ago by JC10k

I think you should add this as an answer. I suspect the OP was not aware that one could use the multifasta for both target and query.

ADD REPLYlink written 7.6 years ago by Istvan Albert ♦♦ 84k

I was expecting some additional info, but I wil add that part as answer.

ADD REPLYlink written 7.6 years ago by JC10k
5
gravatar for a.zielezinski
7.6 years ago by
a.zielezinski9.2k
a.zielezinski9.2k wrote:

You can do the whole thing (align all possible non-redundant pairs) in Python (assuming you have BioPython installed)::

from itertools import product
from Bio import SeqIO
from Bio import pairwise2

seqs1 = SeqIO.to_dict(SeqIO.parse(open('file1.fasta'),'fasta'))
seqs2 = SeqIO.to_dict(SeqIO.parse(open('file2.fasta'),'fasta'))

for sr1, sr2 in product(seqs1,seqs2):
    for a in pairwise2.align.localxx(str(seqs1[sr1].seq), str(seqs2[sr2].seq)):
        print format_alignment(*a)
ADD COMMENTlink modified 7.6 years ago • written 7.6 years ago by a.zielezinski9.2k
1
gravatar for JC
7.6 years ago by
JC10k
Mexico
JC10k wrote:

Why do you need to do one by one, it's simple to use a multifasta as target and query, then parse the output to obtain the pairs information.

ADD COMMENTlink written 7.6 years ago by JC10k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1495 users visited in the last hour