Question

makeblastdb.exe Crashes when using a cleaned FASTA file

6.7 years ago

friedrichlab • 0

Using bedtools2, I extracted the CDS FASTA from a GFF3 file and its reference FASTA. I then attempted to create a local database with the BLAST+ command-line 'makeblastdb' command. This failed because the file contained a large number of duplicated sequences. I then ran the FASTA through the sequence cleaner Python script found here:

http://biopython.org/wiki/Sequence_Cleaner

Now, with my newly cleaned FASTA, I am again attempting to create a custom database. In the Windows Command Prompt, running the command:

makeblastdb -in clear_transcriptome.fa -out clear_transcriptome -dbtype nucl -parse_seqids

Causes a window to open saying "makeblastdb.exe has stopped working".

I have tried a fresh install of the BLAST+ command-line tools, and I can successfully build a database from a different FASTA file (one that has worked in the past), but the crash still occurs with this file.

Additionally, running the exact same command WITHOUT the '-parse_seqids' option successfully builds a database, but not one that I can blast against (tblastn returns "No alias or index file found for nucleotide database").

For reference, here is the format of the first few sequences in my clear_transcriptome.fa file (the ellipses are only to save space, as this post is long enough already):

>exon::Scaffold2376:18278-18883

NNNNNNNNNNNNNNNN ...

>gene::Scaffold1190:58965-85903_mRNA::Scaffold1190:58965-85903

AGAAGGTGCAGGGCTG ...

>exon::Scaffold2694:84739-84921_CDS::Scaffold2694:84739-84921

ATGAAGTTGAACGTTATA ...

>exon::Scaffold50:750576-750666_CDS::Scaffold50:750576-750666_exon::Scaffold50:750576-750666_CDS::Scaffold50:750576-750666_exon::Scaffold50:750576-750666_CDS::Scaffold50:750576-750666_exon::Scaffold50:750576-750666_CDS::Scaffold50:750576-750666

GAGCAGCACTCAGTAGAA ...

I will admit the headers are not the cleanest, but that is a result of the sequence cleaner Python script.

Any idea as to what about the sequence ids is causing the crashes? Could it be the excessive length of some of them? If so, why would that cause this issue? If any more information is needed let me know.

Thanks in advance.

blast+ sequence blast command line • 1.7k views

score 1 · Answer 1 · 2017-08-09

6.7 years ago

GenoMax 141k

"Could it be the excessive length of some of them?"

That could be one of the issues. You could shorten the headers using one of the approaches from this thread: Fasta header trimming
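As one possible way to do the trimming, here is a minimal sketch in plain Python (no Biopython required) that replaces every header with a short sequential ID and keeps the original headers in a tab-separated lookup file. The function name `shorten_headers`, the `seq` prefix, and the file names in the usage line are made up for illustration. It also assumes, without having confirmed it, that the long, repetitive IDs are what `-parse_seqids` is choking on.

```python
import sys


def shorten_headers(lines, prefix="seq", width=6):
    """Replace each FASTA header with a short, unique, sequential ID.

    Returns (renamed_lines, mapping), where mapping is a list of
    (new_id, original_header) tuples in file order.
    The prefix and width are arbitrary choices for this sketch.
    """
    renamed = []
    mapping = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # Number records from 1, zero-padded so the IDs sort nicely.
            new_id = f"{prefix}{len(mapping) + 1:0{width}d}"
            mapping.append((new_id, line[1:]))
            renamed.append(f">{new_id}")
        elif line:
            # Keep sequence lines unchanged; drop empty lines.
            renamed.append(line)
    return renamed, mapping


def main(in_path, out_path, map_path):
    with open(in_path) as handle:
        renamed, mapping = shorten_headers(handle)
    with open(out_path, "w") as out:
        out.write("\n".join(renamed) + "\n")
    # Lookup table so hits can be traced back to the original headers.
    with open(map_path, "w") as out:
        for new_id, original in mapping:
            out.write(f"{new_id}\t{original}\n")


if __name__ == "__main__":
    if len(sys.argv) == 4:
        main(sys.argv[1], sys.argv[2], sys.argv[3])
    else:
        print("usage: shorten_headers.py IN.fa OUT.fa MAP.tsv")
```

Saved as, say, shorten_headers.py, it could be run as `python shorten_headers.py clear_transcriptome.fa short_ids.fa id_map.tsv`, after which you would retry the original command with `-in short_ids.fa`. Renaming every record, rather than only the long ones, keeps the IDs unique and uniform, at the cost of needing the lookup file to interpret the hits.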