What sequences are in the precomputed human_genomic database?
4.6 years ago
nkinney06 ▴ 90

I recently installed BLAST and downloaded the precomputed human_genomic.*tar.gz database available here:

ftp://ftp.ncbi.nlm.nih.gov/blast/db/

I tested my installation with the following FASTA file:

cat test_query.fa 
>chr13:83987454-83987503
GCTGGGTGGTCAGCGCTGGTTCCATGGGCAGTAATGATTTCCTCTGTTTT

When I BLAST this query against my local database, I see the expected hit on the primary assembly but also many additional hits:

>NC_000013.11 Homo sapiens chromosome 13, GRCh38.p7 Primary Assembly <- matches my test query
Query  1         GCTGGGTGGTCAGCGCTGGTTCCATGGGCAGTAATGATTTCCTCTGTTTT  50
Sbjct  83987454  GCTGGGTGGTCAGCGCTGGTTCCATGGGCAGTAATGATTTCCTCTGTTTT  83987503

>NT_024524.15 Homo sapiens chromosome 13 genomic scaffold, GRCh38.p7 Primary 
Query  1         GCTGGGTGGTCAGCGCTGGTTCCATGGGCAGTAATGATTTCCTCTGTTTT  50
Sbjct  65579348  GCTGGGTGGTCAGCGCTGGTTCCATGGGCAGTAATGATTTCCTCTGTTTT  65579397

>GL583019.1 Homo sapiens unplaced genomic scaffold scaffold_39, whole genome 
Query  1       GCTGGGTGGTCAGCGCTGGTTCCATGGGCAGTAATGATTTCCTCTGTTTT  50
Sbjct  731735  GCTGGGTGGTCAGCGCTGGTTCCATGGGCAGTAATGATTTCCTCTGTTTT  731686

>Lots more results...

My question is: where do all the additional sequences in this BLAST database come from?

I have looked at the README (available at ftp://ftp.ncbi.nlm.nih.gov/blast/db/README) but the information there is not very thorough. Is there a complete list of what's in this database? Thanks!

blast • 992 views
4.6 years ago
GenoMax 125k

See if this helps:

[Screenshot from NCBI's human genome BLAST page; image not available]


This is better than the README file, but when I run BLAST it says:

Effective search space used: 1344767968614
  Database: NCBI genome chromosomes - human
    Posted date:  Jul 19, 2017  11:08 PM
  Number of letters in database: 64,036,671,579
  Number of sequences in database:  3,505

Perhaps the database also includes some older assemblies and unplaced contigs?
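
A quick sanity check on that guess, as a rough sketch: the snippet below compares the database size reported above with the size of a single human assembly. The figure of about 3.1 Gbp for one haploid assembly is an assumption added here for illustration, not something taken from the BLAST output.

```shell
# Figures reported by BLAST for the human_genomic database (from the output above).
db_letters=64036671579
db_sequences=3505

# ASSUMPTION: approximate length of one haploid human assembly, about 3.1 Gbp.
assembly_bp=3100000000

# Integer division: how many assembly-sized genomes would fit in the database.
genome_equivalents=$(( db_letters / assembly_bp ))

# Average length of a sequence in the database, in letters.
mean_length=$(( db_letters / db_sequences ))

echo "approx. genome equivalents: $genome_equivalents"
echo "mean sequence length (bp): $mean_length"
```

Under that assumption the database holds far more sequence than one assembly, which is consistent with it containing more than just the GRCh38 primary assembly.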


Take a look to see what is included using this command:

blastdbcmd -db human_genomic -entry all -outfmt "%i %t"

That said, NCBI offers something different on its human genome BLAST page, which is where I captured the screenshot above.
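
With 3,505 entries, the raw listing is long, so it may help to tally it by accession prefix. The following is a minimal sketch rather than an official NCBI recipe: `summarize_accessions` is a hypothetical helper name, and it assumes the listing is produced with `%a` (accession) in place of `%i`, so that the first field on each line is a plain accession such as NC_000013.11 or GL583019.1.

```shell
# summarize_accessions (hypothetical helper): reads "accession title" lines
# on stdin and prints "count prefix" pairs, most frequent prefix first.
summarize_accessions() {
    awk '{
        acc = $1
        if (acc ~ /^[A-Z][A-Z]_/) {
            # RefSeq-style accession, e.g. NC_, NT_, NW_: keep the 3-char prefix.
            prefix = substr(acc, 1, 3)
        } else {
            # GenBank-style accession, e.g. GL583019.1: keep the leading letters.
            prefix = acc
            sub(/[0-9].*$/, "", prefix)
        }
        count[prefix]++
    }
    END { for (p in count) print count[p], p }' | sort -k1,1nr -k2,2
}

# Usage, assuming blastdbcmd is on PATH and the database can be found
# (for example via the BLASTDB environment variable):
#   blastdbcmd -db human_genomic -entry all -outfmt "%a %t" | summarize_accessions
```

Each prefix can then be inspected separately, for example by piping the same listing through `grep '^NT_'` to see only the genomic scaffolds.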
