How to compute the number of unique protein hits found in a blastx search
8.9 years ago
seta ★ 1.9k

Dear all,

Could anybody please let me know how to calculate the number of unique proteins hit in a blastx search, or the number of unique contigs hit by proteins in a tblastn search, and how to create a list of those unique sequences? Sorry if this is a very basic question, but I want to make sure. Thanks.

blast RNA-Seq alignment • 1.8k views

How to parse the blast results will depend on the output format.


The output format is tabular (-outfmt 6)

8.9 years ago
5heikki 11k

If you used tabular output, this is as simple as applying sort.

E.g., how many of my query sequences got a hit?

cut -f1 outputFile | sort -u | wc -l

How many subjects did my query seqs hit?

cut -f2 outputFile | sort -u | wc -l

The first command works when each query ID is unique, which is not the case with blastx at least, where the same query ID can appear on several lines. To handle that, you could also cut the fields that specify where the alignment starts and ends, but it's complicated because you would still likely have multiple hits that differ slightly in their start and end coordinates. I circumvent such problems by using actual protein prediction algorithms (like Prodigal) and applying blastp. It's way faster this way too.
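As a rough sketch of the counting above, the snippet below builds a small made-up table in the default -outfmt 6 column order (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and counts unique queries, unique subjects, and unique query/subject pairs. The last count keeps only the highest-bitscore line per query, which is just one possible way to sidestep near-duplicate alignments and is an assumption on my part, not something blast does for you. All IDs and values are hypothetical.

```shell
# Hypothetical sample data in default -outfmt 6 column order.
blast_out=$(mktemp)
{
  printf 'contig1\tprotA\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-50\t200\n'
  printf 'contig1\tprotA\t88.0\t98\t12\t0\t7\t300\t3\t100\t1e-45\t185\n'
  printf 'contig1\tprotB\t75.0\t80\t20\t0\t900\t1139\t1\t80\t1e-20\t95\n'
  printf 'contig2\tprotA\t92.0\t100\t8\t0\t1\t300\t1\t100\t1e-55\t210\n'
  printf 'contig3\tprotC\t60.0\t50\t20\t0\t10\t159\t5\t54\t1e-08\t55\n'
} > "$blast_out"

tab=$(printf '\t')

# Unique query IDs (column 1) and unique subject IDs (column 2).
# tr strips the padding some wc implementations add.
n_queries=$(cut -f1 "$blast_out" | sort -u | wc -l | tr -d ' ')
n_subjects=$(cut -f2 "$blast_out" | sort -u | wc -l | tr -d ' ')

# Unique query/subject pairs: collapses alignments of the same pair
# that differ only in their coordinates.
n_pairs=$(cut -f1,2 "$blast_out" | sort -u | wc -l | tr -d ' ')

# Assumption: keep only the best-scoring line per query (bitscore, column 12),
# then count the distinct subjects among those best hits.
n_best_subjects=$(sort -t "$tab" -k1,1 -k12,12nr "$blast_out" \
  | awk -F'\t' '!seen[$1]++' \
  | cut -f2 | sort -u | wc -l | tr -d ' ')

echo "queries with a hit:      $n_queries"
echo "subjects hit:            $n_subjects"
echo "query/subject pairs:     $n_pairs"
echo "subjects among best hits: $n_best_subjects"
```

Replace the temporary file with your own outputFile; if you ran blast with a custom column list, the field numbers will differ.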


Thanks, friend. However, I would like to have both the number and the list of sequences with unique hits for further analysis. Could you please help me get them?

If you mean the IDs of the contigs that your queries hit, just drop the last pipe (| wc -l), so the sorted unique IDs are printed instead of counted.
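A minimal sketch of that, assuming default -outfmt 6 columns and using a made-up three-line table and hypothetical file names: without the final wc -l, the unique IDs can be redirected to a file and reused later.

```shell
# Hypothetical sample data (query ID in column 1, subject ID in column 2).
blast_out=$(mktemp)
{
  printf 'contig1\tprotA\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-50\t200\n'
  printf 'contig1\tprotB\t75.0\t80\t20\t0\t900\t1139\t1\t80\t1e-20\t95\n'
  printf 'contig2\tprotA\t92.0\t100\t8\t0\t1\t300\t1\t100\t1e-55\t210\n'
} > "$blast_out"

out_dir=$(mktemp -d)

# Same pipelines as for counting, minus the final "wc -l":
# the sorted unique IDs go to a file, one per line.
cut -f1 "$blast_out" | sort -u > "$out_dir/queries_with_hits.txt"
cut -f2 "$blast_out" | sort -u > "$out_dir/subjects_hit.txt"

cat "$out_dir/subjects_hit.txt"
```

If the subjects come from a BLAST database that was built with -parse_seqids, a list like this can probably be passed to blastdbcmd via -entry_batch to pull out the sequences themselves; the database name would be your own.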