I'm confused about a very basic matter and would appreciate some clarification. I did a de novo transcriptome assembly for a non-model organism and then ran blastx. I summarised part of the blastx output using these commands:
cut -f1 blast_output.txt | sort -u | wc -l
(which shows how many query sequences got a hit) and
cut -f2 blast_output.txt | sort -u | wc -l
(which shows how many distinct subjects my query sequences hit). These numbers were 36725 and 16542, respectively, for one of my assemblies, which has 57210 sequences. Is this usual? Please be patient with me and tell me how to present the results: can I say that 36725 out of 57210 sequences have been annotated? Also, please explain the source of the difference between the two numbers (36725 and 16542). One reason would be that more than one contig got the same hit, am I right? I'm quite concerned about this, so please share what you know, even if the issue seems simple or stupid to you.
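
To check whether several contigs really share the same hit, a rough sketch like the one below might help. It assumes the file is tab-separated (as the `cut -f1` / `cut -f2` commands imply) with the query ID in column 1 and the subject ID in column 2. The function name `blast_summary` and the output labels are made up for illustration.

```shell
# Hypothetical helper: summarise a tabular blastx result.
# Assumption: tab-separated file, query ID in column 1, subject ID in column 2.
# Usage: blast_summary <blast_file> <total_sequences_in_assembly>
blast_summary() {
    awk -F'\t' -v total="$2" '
        # Count each (query, subject) pair once, even if it appears on several lines
        !seen[$1 FS $2]++ { q[$1] = 1; nq[$2]++ }
        END {
            for (k in q) queries++
            for (s in nq) { subjects++; if (nq[s] > 1) shared++ }
            printf "queries_with_hit\t%d\n", queries
            printf "distinct_subjects\t%d\n", subjects
            printf "subjects_hit_by_multiple_queries\t%d\n", shared
            printf "percent_queries_with_hit\t%.1f\n", 100 * queries / total
        }' "$1"
}
```

It would be called as `blast_summary blast_output.txt 57210`. A non-zero count on the third line would mean that at least some subjects are hit by more than one contig, which would account for at least part of the gap between the two numbers.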