makeblastdb with fasta error: BLAST options error
1
1
Entering edit mode
4.2 years ago

Hello! I downloaded 3 reference fasta files(IGHV, IGHD, IGHJ) on IMGT.

COMMAND like: makeblastdb –in IGHV.fasta –dbtype nucl –out IGHVdb –parse_seqids

It runs successfully when i use IGHD.fasta.

IGHD.fasta:

>J00256|IGHJ4*01|Homo sapiens|F|J-REGION|1912..1959|48 nt|3| | | | |48+0=48| | |
actactttgactactggggccaaggaaccctggtcaccgtctcctcag
>X86355|IGHJ4*02|Homo sapiens|F|J-REGION|1480..1527|48 nt|3| | | | |48+0=48| | |
actactttgactactggggccagggaaccctggtcaccgtctcctcag
>M25625|IGHJ4*03|Homo sapiens|F|J-REGION|446..493|48 nt|3| | | | |48+0=48| | |
gctactttgactactggggccaagggaccctggtcaccgtctcctcag

But when i use IGHJ.fasta, it shows: BLAST options error: IGHJ.fasta does not match input format type, default input type is FASTA.

IGHJ.fasta:

>X97051|IGHD1-1*01|Homo sapiens|F|D-REGION|33714..33730|17 nt|1| | | | |17+0=17| | |
ggtacaactggaacgac
>X13972|IGHD1-14*01|Homo sapiens|ORF|D-REGION|14518..14534|17 nt|1| | | | |17+0=17| | |
ggtataaccggaaccac
>X97051|IGHD1-20*01|Homo sapiens|F|D-REGION|62015..62031|17 nt|1| | | | |17+0=17| | |
ggtataactggaacgac

I compared IGHJ.fasta with IGHD.fasta. I didn't find obvious format difference.

I searched the fasta format online and had a try. I kept the sequence but changed the name.

So IGHJ.fasta became:

>seq1
ggtacaactggaacgac
>seq2
ggtataaccggaaccac
>seq3
ggtataactggaacgac

And it works!

So i think there must be some mistake in the sequence name, i don't know how to figure it.

Thanks for your patience!

sequence • 1.7k views
ADD COMMENT
0
Entering edit mode

Using only the first sequence entry seems already enough to reproduce the problem.

ADD REPLY
0
Entering edit mode
4.2 years ago
Michael 54k

Think you hit a bug in makeblastdb, congratulations ;)

In conclusion, the bug seems to be triggered by the ratio of long def-line/short sequence, and only if the last character in the def-line is |.

So, either update Blast or remove some trailing empty | delimited fields. That should do the trick.

Throws the error:

>X97051|IGHD1-1*01|Homo sapiens|F|D-REGION|33714..33730|17 nt|1| | | | |17+0=17| | |
gtacaactggaacgagggggggg


$ makeblastdb -in IGHJ.fasta -dbtype nucl -out IGHFdb -parse_seqids

Building a new DB, current time: 02/18/2020 11:13:55
New DB name:   ~/IGHFdb
New DB title:  IGHJ.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
BLAST options error: IGHJ.fasta does not match input format type, default input type is FASTA

Works fine, only difference is the leading T!

>X97051|IGHD1-1*01|Homo sapiens|F|D-REGION|33714..33730|17 nt|1| | | | |17+0=17| | |
Tgtacaactggaacgagggggggg

This also works, just missing the last |:

>X97051|IGHD1-1*01|Homo sapiens|F|D-REGION|33714..33730|17 nt|1| | | | |17+0=17| |
gtacaactggaacgagggggggg

Version:

makeblastdb: 2.7.1+
Package: blast 2.7.1, build Sep 20 2018 02:20:26

Identical behavior with the latest bioconda version:

makeblastdb: 2.9.0+
Package: blast 2.9.0, build May 31 2019 20:53:30

And also LATEST:

makeblastdb: 2.10.0+
Package: blast 2.10.0, build Dec  3 2019 18:03:18
ADD COMMENT
0
Entering edit mode

if you don't really need it (though I would personally also always add it ) you could avoid this behaviour by omitting the -parse_seqids parameter

but this is definitely a patch approach rather than a solid solution

ADD REPLY
0
Entering edit mode

removing -parse_seqids doesn't fix this special case.

ADD REPLY
0
Entering edit mode

noted, thx for looking into it Michael Dondrup

curious to hear what the NCBI blast team has to say for it ;)

ADD REPLY
1
Entering edit mode

They'll say: update to latest version of Blast first.

ADD REPLY
0
Entering edit mode

Just checked it, the behavior is exactly the same with version 2.9.0. Think it is time to tell NCBI support. Just point them to this post as a reference.

ADD REPLY
0
Entering edit mode

Blast is now v. 2.10.0 so please upgrade :-)

ADD REPLY
0
Entering edit mode

Yeah, but it's not available via bioconda. So, not today. Honestly, I don't think they have touched that piece of code for ages.

ADD REPLY
0
Entering edit mode

Ok, so got it, the same result for 2.10.0+

ADD REPLY
0
Entering edit mode

Thanks a lot! So there is some bug. However, missing the last | doesn't work for me. Weird!(Version: 2.9.0) I make the name shorter:

X97051|IGHD1-1*01|Homo sapiens|F|D-REGION|33714..33730|17 nt|1| |

And it works! So before NCBI fix this case, i think i need change all the names. : (

ADD REPLY

Login before adding your answer.

Traffic: 2010 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6