Question: Counting number of hits of blastn output
0
gravatar for caranlove
7 weeks ago by
caranlove0
caranlove0 wrote:

Hi all, I am working with makeblastdb and blastn for the first time and apologize for what is probably a basic question. I recently indexed and blasted against my own database, all of which are the sequences from the same species.

module load BLAST+/2.9.0-foss-2018a
blastn -query my.fasta -db my_databse -out  -evalue 0.0001 -num_threads 4

I am now interested in counting the number of hits in my samples that match the sequences in my blast database. Any advice on the best methods to do this would be fantastic! Thanks in advance for any help. Cheers

ADD COMMENTlink modified 7 weeks ago by lieven.sterck6.2k • written 7 weeks ago by caranlove0

counting the number of hits in my samples that match the sequences in my blast database

Are you looking for some kind of summary? Since you only blasted against your own sequences/database the hits you see are what you are interested in.

ADD REPLYlink modified 7 weeks ago • written 7 weeks ago by genomax74k

Thank you, yes, I am looking for some sort of summary of how many hits there were for each sequence in my database. Do you have any suggestions on how to best get counts of how many reads in my sample file blasted to each of the sequences in my blast file? Thanks!

ADD REPLYlink modified 7 weeks ago • written 7 weeks ago by caranlove0
0
gravatar for lieven.sterck
7 weeks ago by
lieven.sterck6.2k
VIB, Ghent, Belgium
lieven.sterck6.2k wrote:

At first sight (but I assume it might be simply a typo) this command will not work as the -out parameter needs an argument (== file to write result to)

As for your specific question: I advise to use a different output format than the default, use -outfmt 6 (or 7 , == tabular output formats, check blast manual for details) which can easily be parsed into summaries you want.

Specifically: run the following command on the tabular output format: cut -f1 <blast result file> | sort -u | wc -l , this will print a number which represent the number of your query sequences that match your database

ADD COMMENTlink written 7 weeks ago by lieven.sterck6.2k

Adding to this comment,

cut -f1 <blast result file> | sort | uniq -c

This will give you the number of hits each query sequence had.

ADD REPLYlink written 7 weeks ago by Giovanni.madrigal12110

Thank you for the insight! This is helpful, but I am actually most interested in how many of the quary sequences blasted to each of the reference/database sequences individually. Do you happen to have a suggestion on best way to get these numbers?

Maybe it is best to create a custom reference file for alignment and then use more of a DE analysis approach? Thanks for the help!

ADD REPLYlink modified 7 weeks ago • written 7 weeks ago by caranlove0
1

Can you do the above command on the column containing the database sequence itself? I don't think a query sequence would hit a database sequence twice, so that should let you see how many times the database sequence was hit, with the number of unique hits representing the number of query sequences.

ADD REPLYlink written 6 weeks ago by Giovanni.madrigal12110

Thank you, you are correct and this worked wonderfully!

ADD REPLYlink written 20 days ago by caranlove0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1252 users visited in the last hour