Comparing Blast M8 Outputs
10.3 years ago
oussumenten ▴ 40

Hi,

I want to compare BLAST outputs in m8 format to find out whether any queries in separate output files share the same subject hit in the NCBI nr database.

For example, given the 2 BLAST outputs in m8 format below (only the query and subject columns are shown), I want to find that PFD1 and GHT3 have the same hit (Hsp90):

BLAST Output 1:

PFD1 Hsp90

PFD2 Gan80

PFD5 Kan38

BLAST Output 2:

GHT1 Lsg70

GHT2 Jkl78

GHT3 Hsp90

GHT6 Odf45

Any help, suggestions or recommendations will be greatly appreciated. Also let me know if you have any questions.

blast • 5.8k views
10.3 years ago

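Sort both files on the 2nd column, then join on the 2nd column. The delimiter must be a real tab character: GNU sort and join reject a quoted '\t' as a multi-character delimiter, so in bash write it as $'\t' (or type a literal tab).

join -t $'\t' -1 2 -2 2 <(sort -t $'\t' -k2,2 output1.tsv) <(sort -t $'\t' -k2,2 output2.tsv)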
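
The same join on the subject column can also be sketched in Python, which may be convenient when the m8 files contain several HSP rows for the same query/subject pair. This is only a minimal sketch, not part of the original answer: the function names `read_pairs` and `shared_hits` are hypothetical, and it assumes that column 1 holds the query ID, column 2 holds the subject ID, and neither contains whitespace.

```python
from collections import defaultdict


def read_pairs(lines):
    """Collect the distinct (query, subject) pairs from m8 lines."""
    pairs = set()
    for line in lines:
        fields = line.split()
        # Skip blank lines and comment lines starting with '#'.
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        pairs.add((fields[0], fields[1]))
    return pairs


def shared_hits(pairs1, pairs2):
    """Inner join of two pair sets on the subject column."""
    queries_by_subject = defaultdict(list)
    for query, subject in pairs2:
        queries_by_subject[subject].append(query)
    joined = []
    for query1, subject in pairs1:
        for query2 in queries_by_subject.get(subject, []):
            joined.append((subject, query1, query2))
    return sorted(joined)


# The two example outputs from the question, as tab-separated lines.
output1 = ["PFD1\tHsp90", "PFD2\tGan80", "PFD5\tKan38"]
output2 = ["GHT1\tLsg70", "GHT2\tJkl78", "GHT3\tHsp90", "GHT6\tOdf45"]

for subject, query1, query2 in shared_hits(read_pairs(output1), read_pairs(output2)):
    print(subject, query1, query2, sep="\t")
```

To run it on real files, pass an open file handle instead of a list, for example `read_pairs(open("output1.tsv"))`.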

Thanks. This will provide me with a starting point.

10.3 years ago
Pavel Senin ★ 1.9k

Let's do this in R:

# Keep only the query (column 1) and subject (column 2) of each m8 file
table1 = read.table("~/tmp/test1.out", header=FALSE)[, 1:2]
table2 = read.table("~/tmp/test2.out", header=FALSE)[, 1:2]
names(table1) = c("gene","hit")
names(table2) = names(table1)
# Inner join on the subject column
merge(table1, table2, by="hit")

output:

> merge(table1, table2, by="hit")
    hit gene.x gene.y
1 Hsp90   PFD1   GHT3

Thanks. This will provide me with a starting point.

10.1 years ago
Prakki Rama ★ 2.7k

Did this using Perl.

cat o1.txt o2.txt >o3.txt  ## combined both outputs

SCRIPT

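use strict;
use warnings;

# Map each subject hit (column 2) to the queries (column 1) that hit it.
open(my $fh, '<', 'o3.txt') or die "Cannot open o3.txt: $!";
my %HoA;
print "________INPUT_DATA_________\n";
while (my $line = <$fh>)
{
    print $line;
    my @array = split ' ', $line;
    next if @array < 2;    # skip blank or malformed lines
    # Key on the subject column only, so full 12-column m8 rows also work.
    push @{ $HoA{$array[1]} }, $array[0];
}
close $fh;
print "___________RESULT_________\n";
foreach my $k (sort keys %HoA)
{
    print "$k\t@{$HoA{$k}}\n";
}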

OUTPUT:

________INPUT_DATA_________
PFD1 Hsp90
PFD2 Gan80
PFD5 Kan38
GHT1 Lsg70
GHT2 Jkl78
GHT3 Hsp90
GHT6 Odf45
___________RESULT_________
Gan80    PFD2
Hsp90    PFD1 GHT3
Jkl78    GHT2
Kan38    PFD5
Lsg70    GHT1
Odf45    GHT6
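
The result above lists every subject, including those hit by a single query, and after concatenation it no longer records which file a query came from. As a hedged variation rather than a replacement for the script, the grouping could track the source file and keep only subjects hit from at least two files. The function name `subjects_shared_across_files` and the in-memory `files` dictionary are hypothetical, used here only for illustration with the sample data from the question.

```python
from collections import defaultdict


def subjects_shared_across_files(files):
    """Group queries by subject and source file.

    `files` maps a label (e.g. a file name) to an iterable of m8 lines.
    Returns {subject: {label: sorted queries}} for subjects seen in 2+ files.
    """
    hits = defaultdict(lambda: defaultdict(set))
    for label, lines in files.items():
        for line in lines:
            fields = line.split()
            # Skip blank lines and comment lines starting with '#'.
            if len(fields) < 2 or fields[0].startswith("#"):
                continue
            query, subject = fields[0], fields[1]
            hits[subject][label].add(query)
    return {
        subject: {label: sorted(queries) for label, queries in by_file.items()}
        for subject, by_file in hits.items()
        if len(by_file) >= 2
    }


files = {
    "o1.txt": ["PFD1 Hsp90", "PFD2 Gan80", "PFD5 Kan38"],
    "o2.txt": ["GHT1 Lsg70", "GHT2 Jkl78", "GHT3 Hsp90", "GHT6 Odf45"],
}

for subject, by_file in sorted(subjects_shared_across_files(files).items()):
    print(subject, by_file)
# prints: Hsp90 {'o1.txt': ['PFD1'], 'o2.txt': ['GHT3']}
```

Because the labels are arbitrary dictionary keys, the same sketch extends to more than two BLAST outputs without changes.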