Hi all, I am facing some difficulties in blasting my de novo assembled unigenes.
I have about 85,000 unigenes and was planning to BLAST them against the preformatted nt database from the NCBI FTP site.
I used this command to download the database and check whether it is up to date:
# update_blastdb.pl --passive --decompress nt
To BLAST my query (query3.fa, which has only 4 sequences) against the 90+ GB nt database, I used:
# blastn -db nt -query query3.fa -task blastn -dust no -outfmt "6 delim=, qacc stitle sacc evalue bitscore qcovus pident" -max_target_seqs 1 -num_threads 4 -out results.txt
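For clarity, the -outfmt string above writes one comma-separated line per hit with seven columns (qacc, stitle, sacc, evalue, bitscore, qcovus, pident). Below is a minimal sketch of how results.txt could be read back, assuming that layout. Subject titles can themselves contain commas, so the sketch takes the first field as qacc, the last five fields from the end, and joins whatever is left as the title. The helper name parse_hits and the example line are made up for illustration only.

```shell
# Sketch: read back the comma-delimited output of
#   -outfmt "6 delim=, qacc stitle sacc evalue bitscore qcovus pident"
# parse_hits is a hypothetical helper name, not a BLAST+ tool.
# Because stitle can contain commas, qacc is taken from the first field,
# the last five fields are taken from the end, and the rest is the title.
parse_hits() {
    awk -F',' '{
        title = $2
        for (i = 3; i <= NF - 5; i++) title = title "," $i
        print $1 "\t" title "\t" $(NF-4) "\t" $(NF-3) "\t" $(NF-2) "\t" $(NF-1) "\t" $NF
    }' "$1"
}

# Made-up example line (not a real hit), only to show the behaviour.
demo=$(mktemp)
printf 'contig_1,Example organism gene X, partial cds,AB000000.1,1e-20,100,95,98.5\n' > "$demo"
parse_hits "$demo"
rm -f "$demo"
```

On the real output this would be called as parse_hits results.txt, which prints the same seven columns separated by tabs.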
My CPU is an Intel i5-7300HQ, which has 4 cores and 4 threads, and I have 8 GB of RAM.
However, blasting just these 4 sequences took about 30 minutes, and my full set is about 85,000 sequences. At this rate it would probably take well over a year to BLAST all of them.
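For reference, this is the back-of-the-envelope calculation behind that estimate. It assumes the runtime scales linearly with the number of query sequences, which is only a guess based on a single test run.

```shell
# Linear extrapolation of the runtime from the 4-sequence test run.
# Assumption: every batch of 4 queries takes about as long as the test batch.
total_seqs=85000        # unigenes in the full query set
batch_size=4            # sequences in query3.fa
minutes_per_batch=30    # observed wall-clock time for that batch

total_minutes=$(( total_seqs / batch_size * minutes_per_batch ))
total_days=$(( total_minutes / 60 / 24 ))   # integer division, rounds down

echo "${total_minutes} minutes, roughly ${total_days} days"
```

That works out to 637500 minutes, or roughly 442 days of continuous running.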
Is there no way to speed this up other than using a more powerful CPU?
Would reformatting my query file, or even using the FASTA version of the nt database, help?
This is what my query file looks like (I have already deleted a large portion of the sequences to show them here):
>H42_1_(paired,_trimmed_pairs)_contig_1_consensus
CATCACCTCCAAGATCCGGCTTGTGAATTCAACTTGTCGCCCGGAGGCTTCCCAAATTCT
TAGACTGCGCGCCTGCCTAAGCCAGCTACCTAACAATATACCACTCTCATTGCACTCAAT
GATGTCTGCAGAGTCGGCGCGCTG
>H42_1_(paired,_trimmed_pairs)_contig_2_consensus
GCAGAACCGAGCTTCAAGCTCCAAGATCCGGCTTTTGAATTCAACTTGTCGCCTGGAGGC
TTCCCAAATTCTTAGACTGCGCGCCTGCCTGAGCCAGCTACTTAACAATATACCACCCCC
ATTGAACTCAATGATGTCTCAATCGAACGTGTAAGGCTTGGAGCTTGGAGCTTGAAGCTC
GGTTC
>H42_1_(paired,_trimmed_pairs)_contig_3_consensus
GAGGAATATGAATCCGGATAACAATATTACAATGATGCGATGTTTAACTGCTACTGCCTC
TTAACTATCAACGTCTACATAC
>H42_1_(paired,_trimmed_pairs)_contig_4_consensus
ACCGCCGGATGGGTCTGCAGAGAGGTTAACGAAAGTCGGTGCGGAGACGCCTTTCTCGCC
GCCGATA
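In case it helps with checking the input, here is a small sketch that counts the records in a FASTA file and prints the length of each sequence. It assumes a plain multi-line FASTA like the one above. The helper name fasta_summary and the tiny demo file are made up for illustration and are not my real data.

```shell
# Sketch: count the records in a FASTA file and print each sequence length.
# fasta_summary is a hypothetical helper name, not part of BLAST+.
fasta_summary() {
    awk '
        # Header line: flush the previous record, then start a new one.
        /^>/ { if (name != "") print name "\t" len
               name = substr($0, 2); len = 0; n++; next }
        # Sequence line: add its length to the current record.
             { len += length($0) }
        END  { if (name != "") print name "\t" len
               print "records\t" (n + 0) }
    ' "$1"
}

# Small made-up example file, only to show the behaviour.
demo=$(mktemp)
printf '>seq_a\nACGT\nAC\n>seq_b\nGGG\n' > "$demo"
fasta_summary "$demo"
rm -f "$demo"
```

On the real query this would be called as fasta_summary query3.fa. It prints one tab-separated line per record followed by the total record count.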
Thank you very much in advance!