My copy of the nr database contains about 148 million sequences, according to blastdbcmd run with the -info option.
However, when I try to extract the accession numbers with the following command:
blastdbcmd -db nr -entry all -outfmt "%a"
I get far more than 148 million accessions. My guess is that this happens because nr is non-redundant: when several GenBank entries have exactly the same sequence, the database stores that sequence once and attaches all of their accessions to it.
What is the best way to get only one accession per sequence using blastdbcmd? I don't care which accession it is, but the number of accessions returned by blastdbcmd should match the number of sequences in the database. How do I do this?
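One possible approach, offered as a sketch rather than a verified solution: print the ordinal ID (`%o`) next to each accession and keep only the first accession seen for each ordinal ID. This assumes that `%o` is reported once per accession line and that lines belonging to the same sequence come out next to each other when `-entry all` is used. The function name `first_accession_per_oid` and the output file name are made up for illustration.

```shell
# Keep only the first accession for each ordinal ID (OID).
# Input on stdin: lines of "<OID> <accession>".
# Assumption: lines sharing an OID are adjacent in the input.
first_accession_per_oid() {
    # NR == 1 handles the first line explicitly, so OID 0 is not
    # mistaken for the uninitialised value of prev.
    awk 'NR == 1 || $1 != prev { print $2 } { prev = $1 }'
}

# Intended usage (needs a local nr database, so it is left commented out):
# blastdbcmd -db nr -entry all -outfmt "%o %a" \
#     | first_accession_per_oid > one_accession_per_sequence.txt
```

If the assumptions hold, the line count of the output file should equal the sequence count reported by `blastdbcmd -db nr -info`, which is an easy way to check the result.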