Question: How to retrieve Blast nr database dimensions?
Macspider • 3.0k wrote:
Hi guys, sorry if I look dumb (I do feel that way now).
I have been going through the info pages, citations and database metrics of BLAST since this morning because I need to find the following information:
- Number of sequences in the non-redundant protein database (nr)
- Number of sequences in the non-redundant nucleotide database (nt)
I am a little desperate. I don't think that the number of sequences in NCBI-Genbank and the number of sequences in NCBI-Protein are the same as in nt and nr, because Genbank and Protein probably contain redundant entries. Am I wrong? Do you know where I could get such numbers?
EDIT: A workaround was suggested to me elsewhere, so I am posting it for future readers. The database folder contains this information in an alias file:
- nt.nal for the nucleotide database (nt)
- nr.pal for the protein database (nr)
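For anyone who wants to pull the numbers out programmatically, here is a minimal Python sketch. It assumes the alias file is plain text made of `KEY value` lines with `#` comments, and that it carries `NSEQ` (number of sequences) and `LENGTH` (total residues or bases) entries; check your own copy, since not every alias file has to include them. The function names and the example path are made up for illustration.

```python
from pathlib import Path


def parse_alias_text(text):
    """Collect 'KEY value' lines from alias-file text, skipping '#' comments."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        fields[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return fields


def dimensions(fields):
    """Return (sequence count, total length); None where a field is absent.

    Assumes the keys are named NSEQ and LENGTH, as described above.
    """
    nseq = fields.get("NSEQ")
    length = fields.get("LENGTH")
    return (int(nseq) if nseq else None, int(length) if length else None)


def read_alias(path):
    """Read an alias file such as nt.nal or nr.pal from the database folder."""
    return dimensions(parse_alias_text(Path(path).read_text()))


if __name__ == "__main__":
    # Hypothetical location; point this at your own database folder.
    alias = Path("blastdb/nr.pal")
    if alias.exists():
        print(read_alias(alias))
```

If the BLAST+ tools are installed, `blastdbcmd -db nr -info` should report the same kind of summary without having to read the file by hand.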