Context: I'm running a bi-directional best BLAST hit analysis between a transcriptome assembly (around 49,000 sequences) and the KEGG ag protein dataset. This means I have to run blastx, then tblastn, and compare the results.
I started with blastx and noticed after an hour or so that my output file was not growing. This seemed strange, so I wrote a simple script to monitor BLAST and record how long each sequence takes to be analyzed (the script copies one sequence to a temporary file, runs BLAST on it, and writes the output to another file).
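A minimal sketch of that kind of per-sequence monitor, for illustration only and not the exact script used here. The function names (`read_fasta`, `run_timed`, `monitor`), the 600-second timeout, and the reduced set of blastx options are all assumptions:

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line and header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def run_timed(cmd, timeout):
    """Run cmd and return (elapsed seconds, stdout), with stdout None on timeout."""
    start = time.perf_counter()
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        out = result.stdout
    except subprocess.TimeoutExpired:
        out = None  # the child process is killed once the timeout expires
    return time.perf_counter() - start, out


def monitor(fasta, db, out_path, timeout=600):
    """BLAST each query separately; return the headers of queries that timed out."""
    stuck = []
    with open(out_path, "w") as out, tempfile.TemporaryDirectory() as tmp:
        query = Path(tmp) / "query.fasta"
        for header, seq in read_fasta(fasta):
            query.write_text(f">{header}\n{seq}\n")
            # Assumed, reduced option set; with no -out, blastx prints to stdout.
            cmd = ["blastx", "-query", str(query), "-db", db,
                   "-outfmt", "6", "-evalue", "1e-10"]
            elapsed, hits = run_timed(cmd, timeout)
            if hits is None:
                stuck.append(header)
            else:
                out.write(hits)
            print(f"{header}\t{elapsed:.1f}s", file=sys.stderr)
    return stuck
```

Calling `monitor("Out2RefSeq.fasta", "db/KEGG_agProteins", "per_query_hits.txt")` (the output file name is hypothetical) would log one timing line per query and return the headers of the sequences that hit the timeout, which makes the "stuck" queries easy to collect and inspect separately.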
As I suspected, some sequences got "stuck" and did not return results even after about 6 hours of computation. The problem does not seem to be in the sequences themselves, since running the same sequences against the same database with the NCBI BLAST service gives results in seconds.
I'm currently running tblastn on a server and I have the same issue. The difference is that when I try to use the NCBI BLAST service, I get a 502 Bad Gateway error.
How can I solve this? Ideally, I would like to rely only on my local BLAST installation.
Here is the code I used:
    ## blastx
    makeblastdb -in KEGG_agProteins.fasta -out db/KEGG_agProteins -dbtype prot
    blastx -query Out2RefSeq.fasta -db db/KEGG_agProteins -outfmt 6 -out output/BLASTxout_idio2_agKO.txt -max_target_seqs 1 -max_hsps 1 -use_sw_tback -evalue 1e-10 -best_hit_score_edge 0.05 -best_hit_overhang 0.25 -num_threads 4

    ## tblastn
    makeblastdb -in C:\Users\Faculdade\Desktop\Dissertação\Dados_Illumina\expFolhas_inOut_Joana\RefSeq_outsideLeaves\Out2RefSeq\Out2RefSeq.fasta -out db\OutLeafrefseq -dbtype nucl
    tblastn -query KEGG_agProteins.fasta -db db/OutRefSeq -outfmt 6 -out output/tBLASTn_agKO_OuLeafRefSeq.txt -max_target_seqs 1 -max_hsps 1 -use_sw_tback -evalue 1e-10 -best_hit_score_edge 0.05 -best_hit_overhang 0.25 -num_threads 4