Question: How Can I Sort Blast Output Based On E-Value?
2
gravatar for biostar
5.7 years ago by
biostar150
biostar150 wrote:

Hello: I have a blast output with about 50000 sequence list. The problem I have is that I need to sort those results based on the smallest e-value to greater e-value. Could you guys please suggest me anything that I can resort to? Thanks! The blast output looks like this:

    Tx_332323_c0_seq1    110815944    RecName: Full=Gag-Pol polyprotein; AltName: Full=Pr160Gag-Pol; Contains: RecName: Full=Matrix protein p17; Short=MA; Contains: RecName: Full=Capsid protein p24; Short=CA; Contains: RecName: Full=Spacer peptide p2; Contains: RecName: Full=Nucleocapsid protein p7; Short=NC; Contains: RecName: Full=Transframe peptide; Short=TF; Contains: RecName: Full=p6-pol; Short=p6*; Contains: RecName: Full=Protease; AltName: Full=PR; AltName: Full=Retropepsin; Contains: RecName: Full=Reverse transcriptase/ribonuclease H; AltName: Full=Exoribonuclease H; AltName: Full=p66 RT; Contains: RecName: Full=p51 RT; Contains: RecName: Full=p15; Contains: RecName: Full=Integrase; Short=IN    1446    5    457    107    1028    1138    4e-06    66    35.00    -3/0
Tv_333332_c0_seq2    206558251    RecName: Full=Zinc finger MYM-type protein 5    656    8    41    796    405    615    6e-14    92    27.63    2/0
Tv_144391_c0_seq1    75057844    RecName: Full=Nitrogen permease regulator 2-like protein; Short=NPR2-like protein; AltName: Full=Tumor suppressor candidate 4    380    4    1410    304    7    380    6e-118    65    48.27    -3/0
Ty_116400_c0_seq1    73920872    RecName: Full=Longitudinals lacking protein, isoforms F/I/K/T    970    1    29    190    902    956    4e-09    33    45.45    2/0
Ty_144400_c0_seq2    73920872    RecName: Full=Longitudinals lacking protein, isoforms F/I/K/T    970    1    29    190    902    956    2e-08    20    45.45    2/0
Tx_444402_c0_seq1    74834619    RecName: Full=Cathepsin L-like proteinase; Flags: Precursor    324    10    57    1034    1    323    9e-75    91    42.09    3/0
blast output sort • 4.3k views
ADD COMMENTlink modified 5.7 years ago by Michael Dondrup46k • written 5.7 years ago by biostar150
2

you know, there is a linux tool named 'sort': It sorts.

ADD REPLYlink modified 5.7 years ago • written 5.7 years ago by Pierre Lindenbaum122k
1

Thanks, but I tried that and gives faulty result.

ADD REPLYlink modified 5.7 years ago • written 5.7 years ago by biostar150
2

Try this way: sort -k 10,10g input -o output.

ADD REPLYlink modified 5.7 years ago • written 5.7 years ago by a.zielezinski8.8k
1

Also, there is always the manual way...

ADD REPLYlink written 5.7 years ago by PoGibas4.8k
4
gravatar for PoGibas
5.7 years ago by
PoGibas4.8k
Vilnius
PoGibas4.8k wrote:

As Pierre suggested you can try sort (take a look at -g option (for general numerical value)).

With a given example this works:

awk '{print $(NF-3),$0}' INPUT | sort -gk1  
# Used awk as original INPUT needs a little bit munging
ADD COMMENTlink modified 5.7 years ago • written 5.7 years ago by PoGibas4.8k
1

Thanks Pgibas! The awk command works as needed!

ADD REPLYlink modified 5.7 years ago • written 5.7 years ago by biostar150
3
gravatar for Michael Dondrup
5.7 years ago by
Bergen, Norway
Michael Dondrup46k wrote:

Don't sort by e-value, sort by score instead! The reason is that there can be many more ties in e-values due to numerical accuracy (e-value is exponentially decreasing wrt score) and database sizes. Often, one will for example get many alignments with e-value of 0 but different scores. Based on the numerical value of e-value these will be ranked randomly. Except for ties yielding random ranking, sorting by bit-score of HSPs of the same query and subject, will yield the same ranking of HSPs as sorting by e-values.

ADD COMMENTlink modified 5.7 years ago • written 5.7 years ago by Michael Dondrup46k

Thanks Mike, good to know that as well.

ADD REPLYlink written 5.7 years ago by biostar150
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1082 users visited in the last hour