Blastp stopped
10 months ago

Hi everyone, I'm new to this and I have a problem. I was running blastp with the following command line:

./blastp -query longest.pep -db uniprop.pep -num_threads 4 -max_target_seqs 1 -evalue 1e-5 -outfmt 6 -out blastpc.latifolia.cvs


It ran for about 10 hours and then stopped. My query has about 280 thousand amino acid sequences (longest ORFs) from a de novo transcriptome, and my database has more than half a million proteins, but the output file has only about 55 thousand matches.

Why didn't it report matches for all the queries, and why did it stop at about 50 thousand?

I'm using a Lenovo desktop.

Thanks.

blastp • 967 views

Is this on a personal machine (i.e., are you the only user)? Did you run out of disk space where the output was being written? Since 55K sequences are in the output, the search was clearly working (so you must have enough RAM to run it).


Yes, I'm the only user. My machine has 16 GB of RAM and an Intel® Core™ i7-10700 CPU @ 2.90GHz × 16.


Run df -h . in the directory you are running BLAST from. I am a bit concerned by the ./blastp, which suggests you are running the query in the same directory where BLAST is installed. That is probably not a good idea, although I doubt it is the reason BLAST stopped.
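
A minimal sketch of that disk-space check, assuming the BLAST output is written to the current directory. The helper name free_kb is made up for illustration; df -P and -k are standard POSIX options.

```shell
#!/bin/sh
# Print the free space (in kilobytes) on the filesystem that holds a directory.
# -P forces one output line per filesystem, -k reports 1024-byte blocks.
# free_kb is a hypothetical helper name, not part of BLAST.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

out_dir="."   # assumption: the BLAST output file is written here
echo "Free space in $out_dir: $(free_kb "$out_dir") KB"
df -h "$out_dir"   # the human-readable view suggested above
```

If the available space is at or near zero, the output file was probably truncated when the disk filled up.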


Also, what does "BLAST stopped" mean? Has it not produced more output for a while? Possibly it is just processing output in chunks and will continue after a while. Blasting 280k sequences might take much longer than 10 hours, so it might be worth just waiting.
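
One rough way to see how far a run got is to look up the last query ID in the tabular output and find its position in the query FASTA. This is only a sketch: it assumes BLAST reports queries in the order they appear in the query file, and it gives a lower bound, because later queries without hits leave no trace in -outfmt 6. The function name blast_progress is made up for illustration.

```shell
#!/bin/sh
# Estimate how far a tabular (-outfmt 6) BLAST run has progressed.
# Assumption: BLAST reports queries in the order they appear in the query FASTA.
# Prints "position/total", where position is the rank of the last reported query.
blast_progress() {
    query_fasta=$1
    hits_table=$2
    # The first column of the last output line is the last query that produced a hit.
    last_query=$(tail -n 1 "$hits_table" | cut -f 1)
    total=$(grep -c '^>' "$query_fasta")
    # Find that ID among the FASTA headers (ID = text after '>' up to the first space).
    position=$(grep '^>' "$query_fasta" |
        awk -v id="$last_query" '{ sub(/^>/, "", $1); if ($1 == id) { print NR; exit } }')
    echo "${position:-0}/$total"
}

# File names taken from the command in the question; skipped if they are absent.
if [ -f longest.pep ] && [ -f blastpc.latifolia.cvs ]; then
    blast_progress longest.pep blastpc.latifolia.cvs
fi
```

A position close to the total suggests the search reached the end of the query file rather than dying partway through.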


Everything stopped; it only produced one file.


The way you are running the search, you are only going to get one file. You will need to use -outfmt 7 to include queries that did not produce any hits. So the 55K entries you are seeing are likely the queries that actually produced a hit. Your search may well have completed in 10 h.
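
To check this on the existing files, one could compare the number of query sequences with the number of distinct query IDs in the tabular output. A small sketch follows; the two function names are made up for illustration. Note that -outfmt 6 writes one line per alignment (HSP), so the raw line count can be higher than the number of queries with a hit.

```shell
#!/bin/sh
# Count sequences in a FASTA file (one header line starting with '>' per sequence).
count_queries() {
    grep -c '^>' "$1"
}

# Count distinct query IDs (column 1) in a tabular -outfmt 6 file.
# A query can appear on several lines when it has more than one alignment.
count_queries_with_hits() {
    cut -f 1 "$1" | sort -u | wc -l | tr -d ' '
}

# File names taken from the command in the question; skipped if they are absent.
if [ -f longest.pep ] && [ -f blastpc.latifolia.cvs ]; then
    echo "Queries in FASTA:  $(count_queries longest.pep)"
    echo "Queries with hits: $(count_queries_with_hits blastpc.latifolia.cvs)"
fi
```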


I agree, looks like everything went just "fine" and the process finished. Use -outfmt 7 or 0 to see all queries. 55k out of 280k seems rather low. I would try blastx instead, it will take longer but the result should be more robust to frameshifts and fragmented transcripts than a longest-ORF approach. If you want to reduce runtime, you could reduce the assembly to just the longest isoform per gene (unigene). In fact, I would prefer to use longest isoform -> blastx over all -> translation -> blastp any time.
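
If the search is rerun with -outfmt 7, the comment lines can be used to count the queries without hits. This sketch assumes the usual BLAST+ layout, where every query gets a "# Query: ..." line and hitless queries get a "# 0 hits found" line. The file name blastp_outfmt7.tsv and the function names are hypothetical.

```shell
#!/bin/sh
# Count all queries reported in a -outfmt 7 file (one "# Query:" line each).
count_all_queries() {
    grep -c '^# Query: ' "$1" || true   # grep -c still prints 0 when nothing matches
}

# Count queries for which BLAST reported no hits.
count_no_hit_queries() {
    grep -c '^# 0 hits found' "$1" || true
}

out7="blastp_outfmt7.tsv"   # hypothetical name for a rerun with -outfmt 7
if [ -f "$out7" ]; then
    echo "Queries processed:    $(count_all_queries "$out7")"
    echo "Queries without hits: $(count_no_hit_queries "$out7")"
fi
```

If the number of processed queries matches the number of sequences in the query file, the run completed.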


I'm going to try your recommendations.


Please confirm whether the above was true, i.e., that the job did finish properly.


I ran blastx with the transcriptome against UniProt and got 97k hits.


That is an improvement indeed. Do you think you could blast against all NR too?