Question: Error with makeblastdb from UniProt fasta file: no sequences were found
15 months ago by
Madrid, Spain
mafernandez wrote:

Hello there

I am trying to create a custom BLAST database so that I can run a local BLAST against some of my own protein sequences.

First, I retrieved the sequences by ID with the UniProt Retrieve/ID mapping tool. Then, I downloaded the output in FASTA (canonical) format.

With this file, I tried to run 'makeblastdb' with the BLAST 2.8.1+ standalone executable from the command line on Windows 10 (the build I downloaded from the FTP site is labelled 'Windows 7'):

makeblastdb -in prots.fasta -parse_seqids -blastdb_version 5 -dbtype prot -out prots

and I always get the same error:

No volumes were created because no sequences were found

I have tried several variations of the command line, such as dropping '-parse_seqids' or including/excluding the '-out' option. I have also tried several changes to my FASTA file, such as replacing '|' with an underscore, replacing spaces with underscores, and shortening the IDs, and I always got the same error.

What intrigues me is that the command worked when the input FASTA file contained only one sequence, although I got only two output files: '.pdb' and '.pdb-lock'.

Any idea what is going wrong? How can the problem occur with more than one sequence in the file but not with a single sequence?

I have searched many different forums and did not find anything similar.

Thanks a lot

written 15 months ago by mafernandez

Is there any chance you could move this off Windows and onto Unix? If you have Windows 10, you could install Windows Subsystem for Linux.

BLAST v2.8.1 has new functionality to limit BLAST searches to sequence IDs (if your IDs are standard accessions), taxIDs, etc. Is that something you can use?

 -seqidlist <String>
   Restrict search of database to list of SeqIDs
written 15 months ago by genomax

Thanks genomax.

I think the option you suggest is not suitable for my issue, since my problem is in creating the database itself, not in searching within it.

written 15 months ago by mafernandez

The option I was suggesting will use nt/nr from NCBI itself and limit the search to the IDs you specify.
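As a rough illustration of that suggestion, here is a minimal Python sketch that writes an ID list (one ID per line) and assembles a `blastp` command using `-seqidlist`. It assumes a local copy of `nr` is available to BLAST, and the file names (`query.fasta`, `ids.txt`) and the placeholder IDs are hypothetical; the IDs have to match identifiers that actually exist in the database being searched.

```python
from pathlib import Path


def write_seqid_list(ids, path):
    """Write one sequence ID per line, dropping blanks and duplicates."""
    unique = list(dict.fromkeys(i.strip() for i in ids if i.strip()))
    # ASCII encoding is an assumption: accessions are plain ASCII strings.
    Path(path).write_text("\n".join(unique) + "\n", encoding="ascii")
    return unique


def build_blastp_cmd(query, db, seqid_file):
    """Assemble a blastp call restricted to the IDs listed in seqid_file."""
    return ["blastp", "-query", query, "-db", db, "-seqidlist", seqid_file]


if __name__ == "__main__":
    # Placeholder IDs for illustration only; replace with real accessions.
    write_seqid_list(["ID_ONE", "ID_TWO", "ID_ONE"], "ids.txt")
    print(" ".join(build_blastp_cmd("query.fasta", "nr", "ids.txt")))
```

The printed command can be pasted into a terminal once the query file and database are in place.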

written 15 months ago by genomax

Got it. I will try to do it that way.

However, with this solution I will skip the creation of my own database, won't I? So, if one of my sequences is not in the NCBI databases (or has no close match there), I will not be able to search against it, am I right?

I suppose this option also works on protein databases.

On the other hand, I wonder if I have a formatting problem in the FASTA file... But I cannot find what it is!
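For anyone wanting to rule out file-level formatting problems, a small Python sketch like the one below can report a few properties that sometimes confuse FASTA parsers: byte-order marks, non-ASCII bytes, line-ending style, and the number of header lines. The function name `inspect_fasta` is hypothetical, and none of these properties was confirmed as the cause of the error in this thread; it is only a quick sanity check.

```python
from pathlib import Path


def inspect_fasta(path):
    """Return basic facts about a FASTA file that can trip up parsers."""
    raw = Path(path).read_bytes()
    report = {
        # Byte-order marks are invisible in most editors.
        "has_utf8_bom": raw.startswith(b"\xef\xbb\xbf"),
        "has_utf16_bom": raw[:2] in (b"\xff\xfe", b"\xfe\xff"),
        # Sequence files are normally plain ASCII.
        "non_ascii_bytes": sum(b > 127 for b in raw),
        "crlf_line_endings": raw.count(b"\r\n"),
        "bare_cr_line_endings": raw.count(b"\r") - raw.count(b"\r\n"),
    }
    # Normalise line endings before counting records.
    lines = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n").split(b"\n")
    report["header_lines"] = sum(line.startswith(b">") for line in lines)
    report["first_line_is_header"] = lines[0].startswith(b">")
    return report


if __name__ == "__main__":
    # 'prots.fasta' is the input file name used in the question.
    if Path("prots.fasta").exists():
        for key, value in inspect_fasta("prots.fasta").items():
            print(f"{key}: {value}")
```

If `header_lines` matches the number of sequences you expect and the other values look unremarkable, the file itself is probably not the issue.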

written 15 months ago by mafernandez

It looks like the problem is the attempt to create v5 database indexes. If you take out -blastdb_version 5, the command seems to work on Windows.
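A minimal Python sketch of that workaround is shown below: it assembles the `makeblastdb` call from the question with `-blastdb_version` left out unless explicitly requested. The helper name `build_makeblastdb_cmd` is hypothetical; the flags and file names are the ones used in the original command, and the call is only executed if `makeblastdb` is on the PATH and the input file exists.

```python
import shutil
import subprocess
from pathlib import Path


def build_makeblastdb_cmd(fasta, out, blastdb_version=None, parse_seqids=True):
    """Assemble a makeblastdb call for a protein database.

    -blastdb_version is added only when a version is given, so the
    default produces the command without the v5 option.
    """
    cmd = ["makeblastdb", "-in", fasta, "-dbtype", "prot", "-out", out]
    if parse_seqids:
        cmd.append("-parse_seqids")
    if blastdb_version is not None:
        cmd += ["-blastdb_version", str(blastdb_version)]
    return cmd


if __name__ == "__main__":
    cmd = build_makeblastdb_cmd("prots.fasta", "prots")
    print(" ".join(cmd))
    # Run only when the tool and the input file are actually present.
    if shutil.which("makeblastdb") and Path("prots.fasta").exists():
        subprocess.run(cmd, check=True)
```

Passing `blastdb_version=5` reproduces the original command, which makes it easy to compare the two behaviours on the same input.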

modified 15 months ago • written 15 months ago by genomax

Seems to work!

Thanks a lot. I will continue with the downstream analyses in Genome Workbench, and if I run into any other issue I will post it here.

Cheers!

written 15 months ago by mafernandez

There appears to be something odd about the v5 database format. I wonder if there is a specific requirement for FASTA headers that is not clearly documented. Glad to hear the old format works and can be used in your case.

written 15 months ago by genomax