Question

How to write a script to extract the lowest E-value for each gene?

6.7 years ago

Anny ▴ 30

Hi all,

I have a file containing BLAST hits for each gene, one hit per line, as follows:

ID  E-value
g1  0.0
g1  1e-20
g1  1e-5
g2  1e-5
g2  1e-13
g3  0.01

I would like to extract the lowest E-value for each gene. How can I write a script to do that?

Thank you!

Alexie

linux script • 1.8k views

ADD COMMENT • link updated 6.7 years ago by venu 7.1k • written 6.7 years ago by Anny ▴ 30

Why not just rerun BLAST with -max_target_seqs 1?

ADD REPLY • link 6.7 years ago by st.ph.n ★ 2.7k

Actually, the file is a merge of results generated by different methods. I want to check which method generally gives the best or most reliable results.

ADD REPLY • link 6.7 years ago by Anny ▴ 30

my data: blast_scores.txt

ID  E-value
g1  0.0
g1  1e-20
g1  1e-5
g2  1e-5
g2  1e-13
g3  0.01

code:

$ datamash -H -g 1 min 2 < blast_scores.txt

output:

$ datamash -H -g 1 min 2 < blast_scores.txt 
GroupBy(ID) min(E-value)
g1  0
g2  1e-13
g3  0.01

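datamash is a command-line tool from GNU. By default it expects tab-separated fields and input already sorted by the grouping column; add -W for whitespace-separated files and -s if the file is not sorted by ID.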
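If datamash is not installed, the same group-by-minimum can be sketched in Python with itertools.groupby. This is only an illustration: the function name min_evalue_per_gene, the header on the first line, and the whitespace-separated columns are assumptions, not anything datamash itself provides. groupby merges only consecutive rows with the same key, so the sketch sorts by ID first.

```python
from itertools import groupby


def min_evalue_per_gene(lines):
    """Return (gene, lowest E-value string) pairs from 'ID E-value' rows.

    Assumes (hypothetically) a header on the first line and
    whitespace-separated columns.
    """
    rows = [line.split() for line in lines[1:] if line.strip()]
    # groupby only merges consecutive equal keys, so sort by gene ID first.
    rows.sort(key=lambda row: row[0])
    result = []
    for gene, group in groupby(rows, key=lambda row: row[0]):
        # Compare as floats, but keep the original text of the winner.
        best = min((row[1] for row in group), key=float)
        result.append((gene, best))
    return result


if __name__ == "__main__":
    data = ["ID  E-value", "g1  0.0", "g1  1e-20", "g1  1e-5",
            "g2  1e-5", "g2  1e-13", "g3  0.01"]
    for gene, evalue in min_evalue_per_gene(data):
        print(gene, evalue, sep="\t")
```

Keeping the E-value as its original string (rather than printing the float) avoids reformatting values such as 1e-13.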

ADD REPLY • link 6.7 years ago by cpad0112 21k

score 1 · Answer 1 · 2017-07-31

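This assumes the input file is tab-delimited; if it is not, reformat the file or adjust the split in the code below. Save the code as sort_eval.py and run it as python3 sort_eval.py file.txt

#!/usr/bin/env python3
import sys
from collections import defaultdict

# Map each gene ID to the list of E-values (kept as strings) seen for it.
evalue = defaultdict(list)

with open(sys.argv[1], 'r') as f:
    next(f)  # skip the header line
    for line in f:
        fields = line.strip().split('\t')
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        evalue[fields[0]].append(fields[1])

for gene in sorted(evalue):
    # Compare numerically: a plain string sort would rank '2e-30' after '1e-5'.
    print(gene, min(evalue[gene], key=float), sep='\t')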

Output:

g1      0.0
g2      1e-13
g3      0.01
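One thing to watch when picking the minimum: E-values read from a text file are strings, and strings compare character by character rather than by magnitude. The short sketch below shows the difference on two made-up values (they are not from the question's file); the helper name lowest_evalue is hypothetical.

```python
def lowest_evalue(values):
    """Return the numerically smallest E-value, keeping its original text."""
    return min(values, key=float)


if __name__ == "__main__":
    hits = ["1e-5", "2e-30"]    # made-up values for illustration only
    print(sorted(hits)[0])      # string order picks "1e-5"
    print(lowest_evalue(hits))  # numeric order picks "2e-30"
```

On the sample data in the question both orderings happen to agree, so the problem only shows up on files where the leading digits differ.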

score 1 · Answer 2 · 2017-07-31

Something like this

show contents | remove header | sort based on first column followed by second | keep rows based on first occurrence of first column

code

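cat test.test | grep -v '^ID' | sort -k1,1 -k2,2g | awk '!a[$1]++'

result

g1 0.0
g2 1e-13
g3 0.01

Note the g in -k2,2g: sort's general-numeric mode understands scientific notation such as 1e-13, whereas -n reads only the leading digits. Anchoring the pattern as '^ID' removes only the header line rather than every line that contains "ID".

P.S. You can pass the input file directly to grep, i.e. grep -v '^ID' test.test | ...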
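For comparison, the four pipeline stages described above can be written as a rough Python sketch: drop the header, sort by ID and then by numeric E-value, and keep the first row seen for each ID. The function name first_hit_per_gene and the whitespace-separated layout are assumptions made for illustration.

```python
def first_hit_per_gene(lines):
    """Drop the header, sort by ID then E-value, keep the first row per ID."""
    rows = [line.split() for line in lines[1:] if line.strip()]
    # Numeric sort on the E-value column; float() parses values like 1e-13.
    rows.sort(key=lambda row: (row[0], float(row[1])))
    seen = set()
    kept = []
    for row in rows:
        if row[0] not in seen:  # same idea as awk '!a[$1]++'
            seen.add(row[0])
            kept.append(row)    # the whole row is kept, extra columns included
    return kept


if __name__ == "__main__":
    data = ["ID  E-value", "g1  0.0", "g1  1e-20", "g1  1e-5",
            "g2  1e-5", "g2  1e-13", "g3  0.01"]
    for row in first_hit_per_gene(data):
        print(*row)
```

Because each winning row is kept whole, any additional columns in the merged file (for instance a column naming the method that produced the hit, if one exists) would come along with the lowest E-value.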