I have 344,964 sequences and ran blastx against the nr database, but I got 350,252 hits. There must be some redundancy in the results. How do I explain this redundancy, and can I remove it from my results?
This is a misunderstanding of "non-redundant". The term means that the database does not contain two or more 100% identical sequences under different headers. Instead, identical sequences are collapsed into a single entry that keeps the headers and GIs of all the identical entries.
This does not guarantee a single hit per query, nor should it. Think of highly conserved proteins: they will have many hits across the database, but those hits are not redundant. Your result contains on average about one hit per query (350,252 / 344,964 ≈ 1.02), so there might be queries with no hits and some with many. How you should treat multiple hits depends on your application. For most types of applications keeping only the best hit is not recommended.
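To see how the hits are actually distributed, you can count the lines per query in the tabular output. This is only a minimal sketch: it assumes the results were written with -outfmt 6 (tab-separated, query ID in the first column), and the function names and the example data are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for BLAST tabular output (-outfmt 6).
# Assumption: column 1 holds the query ID, as in the default outfmt 6 layout.

# Print "query_id<TAB>number_of_hit_lines" for every query that has hits.
hits_per_query() {
    awk -F'\t' '{ n[$1]++ } END { for (q in n) print q "\t" n[q] }' "$1" | sort
}

# Print how many distinct queries have at least one hit line.
count_queries_with_hits() {
    cut -f1 "$1" | sort -u | awk 'END { print NR }'
}

# Tiny made-up example so the functions can be tried without running BLAST.
example=$(mktemp)
printf 'q1\tsubjA\nq1\tsubjB\nq2\tsubjC\n' > "$example"
hits_per_query "$example"
count_queries_with_hits "$example"
rm -f "$example"
```

Comparing the second number with the number of sequences in your query file (for example `grep -c '^>'` on the FASTA file) tells you how many queries had no hit at all.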
This works. The options in the command are:
$1: your file with the query sequences in FASTA format
-db: the database to search
-out: the result file, in text format
-evalue: the E-value cutoff; only hits with an E-value at or below it are reported, in this case 1e-3
-outfmt 6: tabular format; you can leave it out to get the default output
-max_target_seqs 1: keep just one target sequence per query, in this case only if its E-value is 1e-3 or better
blastx -query $1 -db data_base.fasta -out blastx_result.txt -evalue 1e-3 -outfmt 6 -max_target_seqs 1
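If the tabular file still has more than one line for a query (a single target can contribute several alignments, each on its own line), one possible way to reduce it afterwards is to keep the highest-scoring line per query. This is a sketch under assumptions, not part of BLAST: it assumes the default -outfmt 6 column order, with the query ID in column 1 and the bit score in column 12, and the function name and example rows are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical post-processing helper for BLAST tabular output (-outfmt 6).
# Assumption: default column order, query ID in column 1, bit score in column 12.

# Keep, for each query, the line with the highest bit score.
# On ties the line that appears first in the file is kept.
best_hit_per_query() {
    awk -F'\t' '
        !($1 in best) || $12 + 0 > score[$1] { best[$1] = $0; score[$1] = $12 + 0 }
        END { for (q in best) print best[q] }
    ' "$1" | sort
}

# Tiny made-up example with three hit lines for two queries.
example=$(mktemp)
printf 'q1\tsA\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-50\t180\n'  > "$example"
printf 'q1\tsB\t95.0\t100\t5\t0\t1\t300\t1\t100\t1e-60\t200\n'  >> "$example"
printf 'q2\tsC\t80.0\t50\t10\t0\t1\t150\t1\t50\t1e-10\t60.5\n'  >> "$example"
best_hit_per_query "$example"
rm -f "$example"
```

Whether discarding the other hits is appropriate depends on what you want to do with the results.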