Question: Pattern Alignment And Report Identity
gravatar for abhiwash
7.0 years ago by
abhiwash0 wrote:

I have 2 fasta files with sequence's in it.i want to align the sequences in second file to first file and report identity

for example:





i want to take the file2 sequences and look in file1 and print the matching with mismatched base in uppercase and identity in csv format



here what i did so far

import sys
import os,csv
from Bio import SeqIO
from itertools import *
from optparse import OptionParser

parser = OptionParser()
parser.add_option("-m", "--mismatch_threshold", dest="mismatch_threshold", default = 2,
          help="This is the number of differences you'll allow between the   actualread and your sequence of interest.  Default is 2")
(options, args) = parser.parse_args()

if len(sys.argv) != 4:
  print "Usage : python <file1> <file2> <fout>"

f1 = open(sys.argv[1],'r')
f2 = open(sys.argv[2],'r')
fout = open(sys.argv[3],'w')
writer = csv.writer(fout)

def long(f1):
    for record in SeqIO.parse(f1,'fasta'):
        header =
        sequence = record.seq   
   yield [header, sequence]

def short(f2):
    for record in SeqIO.parse(f2,'fasta'):
        head =
        seq = record.seq  
    return seq

def alignment(sequence,seq,mismatch_threshold):
    l1 = len(sequence)
    l2 = len(seq)
    alignment = []
    for i in range(0,min(l1,l2)):
        if sequence[i] == seq[i]:
            mismatch = sum( c1 != c2 for c1,c2 in zip(sequence,seq))                                    
                if mismatch <= mismatch_threshold:                           
    k = 0
l = 0
for read in alignment:
    for letter in read:
        if letter == isupper():
            if letter == alignment[0].seq[j]:
                l +=1
        k += 1
    k = 0
    length = seq
    percent = 100*l/len(seq)
    #print percent
    yield percent

longsequences = long(open(sys.argv[1],'r'))
shortsequences = short(open(sys.argv[2],'r'))
align = alignment(longsequences,shortsequences,options.mismatch_threshold)

for name in head:
    writer.writerow(( name +'_alignment' , name + '_identity'))
for s in align:
# print to csv file

i am confused and have problem in reading the the sequences in the def function getting this error

what is the best of looping the sequences of file2 and aligning them to sequences in file1 and find an alignment with mismatches?

File "", line 34, in alignment
    l1 = len(sequence)
TypeError: object of type 'generator' has no len()
python alignment biopython • 2.1k views
ADD COMMENTlink modified 4.0 years ago by Biostar ♦♦ 20 • written 7.0 years ago by abhiwash0

Looks like you're re-inventing the wheel here?!? Is there any particular reason, given you're using BioPython already, that you're not using the alignment functions already supplied?

from Bio import pairwise2
ADD REPLYlink modified 7.0 years ago • written 7.0 years ago by george.ry1.1k
gravatar for Peter
7.0 years ago by
Scotland, UK
Peter5.9k wrote:

Some problems/suggestions:

def long(f1):
    for record in SeqIO.parse(f1,'fasta'):
        header =
        sequence = record.seq   
   yield [header, sequence]

If you don't need/want the SeqRecord object from SeqIO, just use SimpleFastaParser instead? It will be faster.

More generally, you don't seem to be looping properly to compare all against all (the short function looks at only the final sequence in the second file). I would expect to see a nested pair of for loops here.

P.S. Have you tried EMBOSS tool needleall? That or one of the other EMBOSS pairwise alignment tools might do what you want.

Update: Regarding the double loop idea, I'd expect something like this:

for record1 in SeqIO.parse(file1, "fasta"):
    for record2 in SeqIO.parse(file2, "fasta"):
        print("Now compare %s from file one against %s from file two..." %,

If file1 had 10 sequences, this would mean looping over file2 10 times. If file2 had 5 sequences, this would mean doing 50 comparisons in total.

ADD COMMENTlink modified 7.0 years ago • written 7.0 years ago by Peter5.9k

can i get help in looping the short sequences to long sequences,as i new to this nested loops @Peter

ADD REPLYlink written 7.0 years ago by abhiwash0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1409 users visited in the last hour