compare two list of nucleotide sequences
1
0
Entering edit mode
3.0 years ago
karthic ▴ 110

Dear all,

I have a list of nucleotide fasta sequences which has to be compared to another list of nucleotide sequences and output which two sequences are an exact match without any gaps or mismatches.

Eg.

List A

>query1
sequence
>query2
sequnce
>query3
sequence


List B

    >subject1
sequence
>sub2
sequence
>sub3
sequence


Example output

query2 == sub3


I want to know, if the solution already exists, if not how shall I approach it

Thanks KK

blast bowtie mummer sequence • 1.3k views
3
Entering edit mode
3.0 years ago
Joe 19k

You can do this easily with BioPython.

This is some code just off the top of my head so may not be 100% correct, but should get you started.

import sys
from Bio import SeqIO

GenesA = list(SeqIO.parse(sys.argv[1], ‘fasta’))
GenesB = list(SeqIO.parse(sys.argv[2], ‘fasta’))

matches = []
for i in GenesA:
for j in GenesB:
if i.seq == j.seq:
matches.append(i)

for seq in matches:
print seq.format('fasta')


This isn’t the fastest or most elegant way to do it, but it is safe because it doesn’t assume your lists are the same length or anything.

1
Entering edit mode

Hello jrj.healey ,

but this would print out all sequences from GenesA that have a match in GenesB, wouldn't it. If I understood OP correct, (s)he want's a table o ids which have the same sequence. So your code should look something like this:

import sys
from Bio import SeqIO

GenesA = list(SeqIO.parse(sys.argv[1], ‘fasta’))
GenesB = list(SeqIO.parse(sys.argv[2], ‘fasta’))

matches = []
for i in GenesA:
for j in GenesB:
if i.seq == j.seq:
matches.appendi.id+" == "+j.id)

for m in matches:
print(m)


(Also just off the top of my head)

fin swimmer

Edit: There seems to be a bug in biostars. It removes the open brace here matches.append(.

0
Entering edit mode

Right you are, I didn’t read that bit properly.

OP will have access to the whole SeqRecord object inside matches and can print it out however they wish (I originally had the printing lines commented out because I wasn’t sure what output was desired (e.g. to a file or to screen etc)).

0
Entering edit mode

Hi fin swimmer

I have noticed that and added that open brace, thank you for the solution.

Thanks KK

0
Entering edit mode

Dear Jrj Healey and fin swimmer,

Thank you for the solution and the code. It is working and the results are just as I expected.

Thank you very much KK

0
Entering edit mode

Ok great, you're welcome.

Be sure to accept the answer (the check mark on the left of the post) if this is resolved for you.

0
Entering edit mode

Hi jrj healey,

Thanks for the code, I will start with that and let you know.

Thanks KK