Counting number of hits of blastn output
1
0
Entering edit mode
3.0 years ago
Clovering ▴ 20

Hi all, I am working with makeblastdb and blastn for the first time and apologize for what is probably a basic question. I recently indexed and blasted against my own database, all of which are the sequences from the same species.

module load BLAST+/2.9.0-foss-2018a
blastn -query my.fasta -db my_databse -out  -evalue 0.0001 -num_threads 4


I am now interested in counting the number of hits in my samples that match the sequences in my blast database. Any advice on the best methods to do this would be fantastic! Thanks in advance for any help. Cheers

makeblastdb blastn database index • 3.3k views
0
Entering edit mode

counting the number of hits in my samples that match the sequences in my blast database

Are you looking for some kind of summary? Since you only blasted against your own sequences/database the hits you see are what you are interested in.

0
Entering edit mode

Thank you, yes, I am looking for some sort of summary of how many hits there were for each sequence in my database. Do you have any suggestions on how to best get counts of how many reads in my sample file blasted to each of the sequences in my blast file? Thanks!

0
Entering edit mode
3.0 years ago

At first sight (but I assume it might be simply a typo) this command will not work as the -out parameter needs an argument (== file to write result to)

As for your specific question: I advise to use a different output format than the default, use -outfmt 6 (or 7 , == tabular output formats, check blast manual for details) which can easily be parsed into summaries you want.

Specifically: run the following command on the tabular output format: cut -f1 <blast result file> | sort -u | wc -l , this will print a number which represent the number of your query sequences that match your database

0
Entering edit mode

cut -f1 <blast result file> | sort | uniq -c


This will give you the number of hits each query sequence had.

0
Entering edit mode

Thank you for the insight! This is helpful, but I am actually most interested in how many of the quary sequences blasted to each of the reference/database sequences individually. Do you happen to have a suggestion on best way to get these numbers?

Maybe it is best to create a custom reference file for alignment and then use more of a DE analysis approach? Thanks for the help!

1
Entering edit mode

Can you do the above command on the column containing the database sequence itself? I don't think a query sequence would hit a database sequence twice, so that should let you see how many times the database sequence was hit, with the number of unique hits representing the number of query sequences.

0
Entering edit mode

Thank you, you are correct and this worked wonderfully!