How to interpret blastn results output
7 weeks ago


I am trying to align a shotgun metagenomics dataset to the NCBI eukaryotic reference database using blastn, in order to assess diet from fecal samples of black bears. I am new to this. I have the blastn output in tabular format (outfmt 6). From this, I extracted the unique queries and the unique subject sequences to check how often the queries hit the exact same spot in the database. I used the following commands:

for f in blastn_out_nt/*; do cut -f 1 "$f" | sort -u | wc -l >> query; done  # unique query IDs (column 1) per file

for f in blastn_out_nt/*; do cut -f 2,9,10 "$f" | sort -u | wc -l >> unique_subjects; done  # unique (sseqid, sstart, send) per file; sort -u on the full triple so non-adjacent duplicates are also collapsed
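
To keep the two counts side by side for each file, they can also be computed in one pass per file. This is only a minimal sketch, assuming the default outfmt 6 column order (column 1 = qseqid, column 2 = sseqid, columns 9 and 10 = sstart and send); the function name `count_unique` is hypothetical and just for illustration:

```shell
#!/usr/bin/env bash
# Sketch: per-file counts of unique queries and unique subject positions.
# Assumes default outfmt 6 columns: 1 = qseqid, 2 = sseqid, 9 = sstart, 10 = send.
# count_unique is a hypothetical helper name, not part of BLAST+.
count_unique() {
    local f=$1 queries subjects
    # Number of distinct query IDs in this file
    queries=$(cut -f 1 "$f" | sort -u | wc -l)
    # Number of distinct (subject, start, end) triples in this file
    subjects=$(cut -f 2,9,10 "$f" | sort -u | wc -l)
    # Print: file name, unique queries, unique subject positions, subjects/queries
    awk -v f="$(basename "$f")" -v q="$queries" -v s="$subjects" \
        'BEGIN { printf "%s\t%d\t%d\t%.3f\n", f, q, s, (q ? s / q : 0) }'
}
```

It could then be called in a loop such as `for f in blastn_out_nt/*; do count_unique "$f"; done > unique_counts.tsv`, which gives one row per file instead of two separate count files (the output file name is also just an example).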

The output looks like:

[Image: per-file counts of unique queries and unique subjects]

Does a lower ratio of unique subjects to unique queries in the blastn results potentially indicate that the input FASTA sequences were redundant (PCR duplicates), because they hit the same database entry? I appreciate your kind help!

