Python for a BLASTed file
8.1 years ago
Kevin_Smith ▴ 10

I have a BLAST output file in tabular format with comment lines. I need to find the total number of hits. I would appreciate any help with a Python script.

blast • 2.2k views

Please familiarize yourself with the exact meaning of the columns in your BLAST output file. The main problem with BLAST output is deciding whether a hit (a row in your table) is relevant for your project. For example, the second row in your table indicates a short sequence repeat within 'sequence_A'. Is this relevant or not?

8.1 years ago

Here's your Python script:

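from itertools import groupby

queries_no = 0
hits_no = 0
with open('BLASTED.txt') as fh, open('BLASTED.txt.out', 'w') as oh:
    # Drop comment lines ('#') and blank lines before grouping.
    rows = (l.split() for l in fh if l.strip() and not l.startswith('#'))
    # Rows for the same query are consecutive, so groupby needs no sorting.
    for qid, hsps in groupby(rows, key=lambda r: r[0]):
        # Count each subject ID (column 2) once, however many HSPs it has.
        hits = len({r[1] for r in hsps})
        hits_no += hits
        queries_no += 1
        oh.write('{0}\t{1}\n'.format(qid, hits))

print('Total queries  :', queries_no)
print('Total hits     :', hits_no)
if queries_no:  # avoid dividing by zero when the file has no hit rows
    print('Average hits   :', hits_no / queries_no)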
8.1 years ago
Michael 54k

The average number of hits is of very limited value, especially if the number of hits to display was restricted in the blast search.

Doing this in practice on tabular output is very simple: each line represents a hit, so you can count the occurrences of each unique query ID (sequence_A := 5) and divide the total count by the number of query IDs (1).
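
A minimal sketch of that counting, assuming tab-separated rows with the query ID in the first column. The helper names `rows_per_query` and `average_rows` and the sample rows are made up for illustration; only the ID `sequence_A` comes from the thread.

```python
from collections import Counter

def rows_per_query(lines):
    """Count tabular BLAST rows per query ID, skipping comments and blanks."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith('#'):
            continue
        # Assumption: the query ID is the first tab-separated field.
        counts[line.split('\t')[0]] += 1
    return counts

def average_rows(counts):
    """Total rows divided by the number of distinct query IDs."""
    return sum(counts.values()) / len(counts) if counts else 0.0

# Hypothetical rows: five lines for one query, as in the example above.
example = ["# comment line\n"] + [
    "sequence_A\tsubject_{0}\n".format(i) for i in range(5)
]
counts = rows_per_query(example)
print(counts["sequence_A"], average_rows(counts))
```

Note that this counts rows, so a subject with several alignments to the same query is counted more than once.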

8.1 years ago
Kevin_Smith ▴ 10

Thank you very much, a.zielezinski. The script works perfectly!

8.1 years ago
Kevin_Smith ▴ 10

I need to count only one hit per subject sequence, all using the BLASTED.txt file.
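
One possible way to keep a single row per subject is to retain only the best-scoring alignment for each (query, subject) pair. This is a sketch, not a definitive implementation: it assumes the default 12-column tabular layout, where the bit score is the last column, and the function name `best_hsp_per_subject` and the sample rows are hypothetical. If the file was produced with custom columns, `score_col` must be adjusted.

```python
def best_hsp_per_subject(lines, score_col=11):
    """Keep the highest-scoring row for each (query, subject) pair."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith('#'):
            continue
        fields = line.rstrip('\n').split('\t')
        key = (fields[0], fields[1])          # (query ID, subject ID)
        score = float(fields[score_col])      # assumed to be the bit score
        if key not in best or score > best[key][0]:
            best[key] = (score, fields)
    return {key: fields for key, (score, fields) in best.items()}

# Hypothetical rows: two alignments to subj1, one to subj2.
rows = [
    "# comment line\n",
    "sequence_A\tsubj1\t98.5\t100\t1\t0\t1\t100\t1\t100\t1e-50\t180\n",
    "sequence_A\tsubj1\t90.0\t40\t4\t0\t120\t159\t200\t239\t1e-08\t55.4\n",
    "sequence_A\tsubj2\t95.0\t80\t4\t0\t1\t80\t10\t89\t1e-30\t120\n",
]
hits = best_hsp_per_subject(rows)
print(len(hits))
```

The number of keys in the result is then the number of hits with each subject counted once per query.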
