Question: cannot make BLAST database using makeblastdb command line tool
b10hazard30 wrote:

I'm trying to build a custom database with the makeblastdb command-line tool from a large FASTA file (about 3.0 GB). Here is my command:

makeblastdb -in /Users/myname/custom_hg19.fa -dbtype nucl -title hg19 -out /Users/myname/blast_dbs/hg19

The result is:

Building a new DB, current time: 11/14/2019 15:48:40
New DB name:   /Users/myname/genome_references/blast_dbs/hg19
New DB title:  hg19
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1073741824B
Adding sequences from FASTA; added 1 sequences in 0.00405002 seconds.

Only one sequence? Also, the resulting files are nowhere near the size they should be. Any ideas what I'm doing wrong?

blastn makeblastdb • 133 views
written 21 days ago by b10hazard30
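
For a rough idea of how big the database should be, here is a minimal sketch. It assumes that BLAST nucleotide databases pack sequence data at roughly 2 bits per base (about 4 bases per byte), so the sequence files should total very roughly a quarter of the base count, not counting headers and index files. The function names are hypothetical, and the result is only a ballpark figure.

```shell
# Count sequence characters in a FASTA file, skipping header lines
# (those starting with '>') and ignoring line breaks and spaces.
count_bases() {
    grep -v '^>' "$1" | tr -d '\n\r ' | wc -c | tr -d ' '
}

# Rough size in bytes of the packed sequence data, ASSUMING ~4 bases per byte.
estimate_packed_bytes() {
    echo $(( $1 / 4 ))
}

# Example usage with the file name from the question:
# estimate_packed_bytes "$(count_bases /Users/myname/custom_hg19.fa)"
```

If the files written by makeblastdb are orders of magnitude smaller than this estimate, most of the input was probably not read.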

Are you sure you have all the chromosomes there? Try grep -c ">" custom_hg19.fa

written 21 days ago by Asaf6.5k
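
The header check suggested above can be made slightly stricter. This is a minimal sketch: it anchors the pattern to the start of the line so that only header lines are counted, and it also lists the identifiers so duplicates are easy to spot. The function names are made up for illustration; the file name is the one from the question.

```shell
# Count FASTA records: header lines are those beginning with '>'.
count_fasta_records() {
    grep -c '^>' "$1"
}

# List sequence identifiers: the text after '>' up to the first whitespace.
list_fasta_ids() {
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}'
}

# Example usage:
# count_fasta_records custom_hg19.fa
# list_fasta_ids custom_hg19.fa | sort | uniq -d   # prints any duplicated IDs
```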

I ran this command and it outputs 46 as the count. If I rerun it as grep ">" custom_hg19.fa, I get:

>chr1
>chr2
>chr3
>chr4
>chr5
>chr6
>chr7
>chr8
>chr9
>chr10
>chr11
>chr12
>chr13
>chr14
>chr15
>chr16
>chr17
>chr18
>chr19
>chr20
>chr21
>chr22
>chrX
>chrY
>chrM
>chr1_gl000191_random
>chr1_gl000192_random
>chr4_gl000193_random
>chr4_gl000194_random
>chr7_gl000195_random
>chr8_gl000196_random
>chr8_gl000197_random
>chr9_gl000198_random
>chr9_gl000199_random
>chr9_gl000200_random
>chr9_gl000201_random
>chr11_gl000202_random
>chr17_gl000203_random
>chr17_gl000204_random
>chr17_gl000205_random
>chr17_gl000206_random
>chr18_gl000207_random
>chr19_gl000208_random
>chr19_gl000209_random
>chr21_gl000210_random
modified 20 days ago • written 20 days ago by b10hazard30

Weird. Is it BLAST+ v2.10.0 by any chance?

written 20 days ago by Asaf6.5k
$ makeblastdb -version
makeblastdb: 2.2.18+
Package: blast 2.2.18, build Oct 14 2008 16:26:16
written 20 days ago by b10hazard30

Dear,

Check your input FASTA file, and also try adding a DB name. What is your FASTA file size?

written 21 days ago by archana.bioinfo87180

The FASTA file size is ~3.0 GB. By DB name, did you mean the -out argument? I tried -out /path/to/db/dbname and also -out dbname, and neither worked.

modified 20 days ago • written 20 days ago by b10hazard30

Hi dear,

Try a small file first and see whether you can build a BLAST database from it. If that works, cross-check your full FASTA file again: look for empty FASTA headers and for empty lines in the file.

Hopefully these checks will help.

written 17 days ago by archana.bioinfo87180
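
The checks suggested above could be sketched as a few shell functions. This is only an illustration: the function names and the subset_db output name are hypothetical, and the makeblastdb call uses just the -in, -dbtype and -out options already shown in the question.

```shell
# Count blank (empty or whitespace-only) lines in a FASTA file.
count_blank_lines() {
    grep -c '^[[:space:]]*$' "$1"
}

# Count header lines that contain nothing after the '>'.
count_empty_headers() {
    grep -c '^>[[:space:]]*$' "$1"
}

# Print the first N records of a FASTA file (N is the second argument).
make_subset() {
    awk -v n="$2" '/^>/ { c++ } c > n { exit } { print }' "$1"
}

# Example usage: test makeblastdb on a two-record subset first.
# count_blank_lines custom_hg19.fa
# count_empty_headers custom_hg19.fa
# make_subset custom_hg19.fa 2 > subset.fa
# makeblastdb -in subset.fa -dbtype nucl -out subset_db
```

If the subset builds with the expected number of sequences but the full file does not, the problem is more likely in the file or in how the tool handles its size than in the command line.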

If this is one chromosome, you should only have one sequence in the FASTA file.

You can count the number of sequences in your FASTA file via grep '>' custom_hg19.fa | wc -l

written 20 days ago by konkelzach10

Yup, did that (see previous comment)

written 20 days ago by b10hazard30
Asaf6.5k wrote:

My only advice is to upgrade to BLAST+ 2.9.0.

written 20 days ago by Asaf6.5k

That worked. But the system version I'm trying to mimic is still 2.2.18, which is why I was using that version to begin with. Looks like I'll have to talk to my system admin... Thanks for the help!

written 20 days ago by b10hazard30
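
To check which version a given machine has before building a database, something like the following sketch might help. It parses the output format of makeblastdb -version shown earlier in this thread and compares it against a minimum. The function names are hypothetical, and the 2.9.0 threshold is simply the version suggested in the answer above, not a documented fix version.

```shell
# Extract the version number from `makeblastdb -version` output on stdin,
# e.g. "makeblastdb: 2.2.18+" gives "2.2.18".
parse_blast_version() {
    awk '/^makeblastdb:/ { sub(/\+$/, "", $2); print $2 }'
}

# Succeed (exit 0) if dotted version $1 is at least dotted version $2.
version_at_least() {
    awk -v have="$1" -v want="$2" 'BEGIN {
        nh = split(have, h, "."); nw = split(want, w, ".")
        n = (nh > nw) ? nh : nw
        for (i = 1; i <= n; i++) {
            a = h[i] + 0; b = w[i] + 0   # missing fields compare as 0
            if (a > b) exit 0
            if (a < b) exit 1
        }
        exit 0
    }'
}

# Example usage:
# v=$(makeblastdb -version | parse_blast_version)
# version_at_least "$v" 2.9.0 || echo "makeblastdb $v is older than 2.9.0"
```

The comparison is numeric per field, so 2.10.0 is treated as newer than 2.9.0, which a plain string comparison would get wrong.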