I looked at the BLAST database FTP directory, ftp://ftp.ncbi.nlm.nih.gov/blast/db/. There are 18 nt files, each smaller than 800 MB, while refseq_genome has 83 files, most of which are larger than 800 MB. This suggests that refseq_genome is much larger than the nt database. However, when I looked up the definition of nt at http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml, it says the nt database includes: All GenBank + RefSeq Nucleotides + EMBL + DDBJ + PDB sequences (excluding HTGS0,1,2, EST, GSS, STS, PAT, WGS). No longer "non-redundant".
My questions are:
1. As I understand it, RefSeq Nucleotides should include both refseq_genome and refseq_rna, so refseq_genome should be a subset of nt and therefore smaller. Why is refseq_genome alone much larger than the whole nt database?
2. I took one accession number, NZ_AARG01000001.1, from the RefSeq bacterial genomes and ran blastn with it against both the nt and refseq_genome databases. Against nt, the search took a few seconds and returned fewer than 10 hits. Against refseq_genome, it took more than 10 minutes and returned more than 100 hits, all with accession numbers beginning with NZ. I then looked up the NZ prefix and found that it denotes projects that are not yet complete. So is the difference between nt and refseq_genome that nt does not include NZ records?
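
To double-check the "all hits begin with NZ" observation, rather than eyeballing the list, something like the following Python sketch could tally the accession prefixes of the hits. It assumes RefSeq-style accessions consist of two capital letters, an underscore, and then the identifier. The function names and the sample list are my own illustration and not part of any BLAST tool; only the first accession comes from my actual search.

```python
from collections import Counter


def accession_prefix(accession: str) -> str:
    """Return the two-letter RefSeq-style prefix (e.g. 'NZ'), or 'other'."""
    head, sep, _ = accession.partition("_")
    # Assumption: RefSeq accessions look like two capital letters + '_' + identifier.
    if sep and len(head) == 2 and head.isalpha() and head.isupper():
        return head
    return "other"


def tally_prefixes(accessions):
    """Count how many accessions fall under each prefix."""
    return Counter(accession_prefix(a) for a in accessions)


if __name__ == "__main__":
    # Placeholder hit list: only the first entry is a real accession from my search;
    # the other two are made-up strings used to exercise both branches.
    hits = ["NZ_AARG01000001.1", "NZ_PLACEHOLDER01.1", "PLACEHOLDER02.1"]
    for prefix, count in tally_prefixes(hits).items():
        print(prefix, count)
```

Running this over the accession column of each result set would show directly whether the nt hits contain any NZ records at all.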