Question: Counting number of hits of blastn output
0
gravatar for caranlove
12 months ago by
caranlove10
caranlove10 wrote:

Hi all, I am working with makeblastdb and blastn for the first time and apologize for what is probably a basic question. I recently indexed and blasted against my own database, all of which are the sequences from the same species.

module load BLAST+/2.9.0-foss-2018a
blastn -query my.fasta -db my_databse -out  -evalue 0.0001 -num_threads 4

I am now interested in counting the number of hits in my samples that match the sequences in my blast database. Any advice on the best methods to do this would be fantastic! Thanks in advance for any help. Cheers

ADD COMMENTlink modified 12 months ago by lieven.sterck8.5k • written 12 months ago by caranlove10

counting the number of hits in my samples that match the sequences in my blast database

Are you looking for some kind of summary? Since you only blasted against your own sequences/database the hits you see are what you are interested in.

ADD REPLYlink modified 12 months ago • written 12 months ago by genomax89k

Thank you, yes, I am looking for some sort of summary of how many hits there were for each sequence in my database. Do you have any suggestions on how to best get counts of how many reads in my sample file blasted to each of the sequences in my blast file? Thanks!

ADD REPLYlink modified 12 months ago • written 12 months ago by caranlove10
0
gravatar for lieven.sterck
12 months ago by
lieven.sterck8.5k
VIB, Ghent, Belgium
lieven.sterck8.5k wrote:

At first sight (but I assume it might be simply a typo) this command will not work as the -out parameter needs an argument (== file to write result to)

As for your specific question: I advise to use a different output format than the default, use -outfmt 6 (or 7 , == tabular output formats, check blast manual for details) which can easily be parsed into summaries you want.

Specifically: run the following command on the tabular output format: cut -f1 <blast result file> | sort -u | wc -l , this will print a number which represent the number of your query sequences that match your database

ADD COMMENTlink written 12 months ago by lieven.sterck8.5k

Adding to this comment,

cut -f1 <blast result file> | sort | uniq -c

This will give you the number of hits each query sequence had.

ADD REPLYlink written 12 months ago by Giovanni.madrigal12170

Thank you for the insight! This is helpful, but I am actually most interested in how many of the quary sequences blasted to each of the reference/database sequences individually. Do you happen to have a suggestion on best way to get these numbers?

Maybe it is best to create a custom reference file for alignment and then use more of a DE analysis approach? Thanks for the help!

ADD REPLYlink modified 12 months ago • written 12 months ago by caranlove10
1

Can you do the above command on the column containing the database sequence itself? I don't think a query sequence would hit a database sequence twice, so that should let you see how many times the database sequence was hit, with the number of unique hits representing the number of query sequences.

ADD REPLYlink written 12 months ago by Giovanni.madrigal12170

Thank you, you are correct and this worked wonderfully!

ADD REPLYlink written 11 months ago by caranlove10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1482 users visited in the last hour