Comparing two BLAST output files with biopython
6.4 years ago

Hi folks,

I have been trying to work out how to compare two or more BLAST output files with Biopython and either (a) remove hits where the output sequence ID is the same between files (i.e. hits that are common to multiple BLAST searches), or (b) keep only those hits that are common to those files. Anyone got any advice on how to script this?

Thanks. :)

blast biopython • 1.5k views

If you don't mind not using Python, a lot of this hard work has already been done for you by command-line tools like diff. You'd need to sort both files the same way first, but after that it's easy to output just the relevant lines.

Even better, I use icdiff (https://github.com/jeffkaufman/icdiff), which colourises the output in an intelligent way.

If you want to start doing comparisons based on numerical fields, though (e.g. keep all lines that differ but have an E-value < 0.1), then I would go with Sej's suggestion.

Just to throw the cat amongst the pigeons too: if you're only interested in string comparisons, you can do similar manipulations with Python's csv module, which is in the standard library.
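
For what it's worth, here is a rough sketch of the csv route. It assumes tab-separated BLAST output with the subject ID in the second column; the helper names (subject_ids, split_hits) and the sample rows are made up for illustration, and in practice you would pass open file handles instead of io.StringIO.

```python
import csv
import io


def subject_ids(handle):
    """Return the set of subject IDs (second column) in tabular BLAST output."""
    return {row[1] for row in csv.reader(handle, delimiter="\t") if len(row) > 1}


def split_hits(handle, other_ids):
    """Split rows into (common, unique) by whether the subject ID is in other_ids."""
    common, unique = [], []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        (common if row[1] in other_ids else unique).append(row)
    return common, unique


# Tiny made-up inputs standing in for two real result files.
blast_a = "q1\tsubjA\t98.5\nq1\tsubjB\t91.0\nq2\tsubjC\t88.2\n"
blast_b = "q7\tsubjB\t95.1\nq8\tsubjD\t80.0\n"

ids_b = subject_ids(io.StringIO(blast_b))
common, unique = split_hits(io.StringIO(blast_a), ids_b)
print([row[1] for row in common])  # subject IDs also hit in file B
print([row[1] for row in unique])  # subject IDs found only in file A
```

The "common" list covers case (b) from the question and the "unique" list covers case (a). Everything is compared as strings, so this is not the place for E-value thresholds.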

6.4 years ago
Sej Modha 5.3k

I'd go for BLAST tabular output (-outfmt 6 in BLAST+; the legacy blastall equivalent was -m 8), as it's easy to parse, and then use pandas in Python to compare the two files as dataframes.
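
Something along these lines might work as a starting point. It is only a sketch: it assumes both files use the 12 default columns of BLAST+ tabular output, the read_blast helper and the sample rows are invented for illustration, and matching on sseqid alone is one possible definition of a "common" hit.

```python
import io

import pandas as pd

# Default column order of BLAST+ tabular output.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def read_blast(source):
    """Load tabular BLAST output; lines starting with '#' are skipped."""
    return pd.read_csv(source, sep="\t", names=COLUMNS, comment="#")


# Made-up rows standing in for two real files (pass file paths in practice).
a_text = (
    "q1\tsubjA\t98.5\t100\t1\t0\t1\t100\t1\t100\t1e-50\t180\n"
    "q1\tsubjB\t91.0\t100\t9\t0\t1\t100\t1\t100\t1e-30\t120\n"
    "q2\tsubjC\t70.0\t50\t15\t0\t1\t50\t1\t50\t0.5\t30\n"
)
b_text = (
    "q7\tsubjB\t95.1\t100\t5\t0\t1\t100\t1\t100\t1e-40\t150\n"
    "q8\tsubjD\t80.0\t60\t12\t0\t1\t60\t1\t60\t1e-5\t60\n"
)
a = read_blast(io.StringIO(a_text))
b = read_blast(io.StringIO(b_text))

# Boolean mask: True where a hit in A has a subject ID that also appears in B.
shared = a["sseqid"].isin(b["sseqid"])
common_a = a[shared]    # option (b): keep only the common hits
unique_a = a[~shared]   # option (a): drop the common hits

# Numeric filtering on top, e.g. unique hits with an E-value below 0.1.
significant_unique = unique_a[unique_a["evalue"] < 0.1]
print(significant_unique[["qseqid", "sseqid", "evalue"]])
```

For more than two files, the same isin mask can be applied against each of the other dataframes in turn.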
