Question: How do I resolve the different number of hits from tblastn when I use the -parse_seqids option in the makeblastdb command?
carina2817 wrote:

Hello,

I am running tblastn and getting a different number of hits depending on whether I add the -parse_seqids option to makeblastdb. I initially ran the following command to make the database:

makeblastdb -in GCA_003024985.1_Erow_1.0_genomic_Euperipatoides_rowelli.fna -dbtype nucl

Then I ran tblastn with the following command:

tblastn -db GCA_003024985.1_Erow_1.0_genomic_Euperipatoides_rowelli.fna -query S_cerevisiae_all_prot_uniq_join.fa -out S_cerevisiae_all_prot_E_rowelli_tblastn_ful_test_original.out -outfmt '6 qseqid qgi qlen sseqid bitscore length pident qcovs evalue qstart qend sstart send qseq sseq' -num_threads 20

The output has 194 hits.

I then tried to retrieve the DNA sequences of the tblastn hits to run a reciprocal BLAST test, and I got an error, which I solved by adding the -parse_seqids option to makeblastdb [https://github.com/lindenb/jvarkit/issues/134]:

makeblastdb -in GCA_003024985.1_Erow_1.0_genomic_Euperipatoides_rowelli.fna -dbtype nucl -out blastdb_E_rowelli -parse_seqids

I then ran tblastn again with this new database:

tblastn -db blastdb_E_rowelli -query S_cerevisiae_all_prot_uniq_join.fa -out S_cerevisiae_all_prot_E_rowelli_tblastn_ful_test.out -outfmt '6 qseqid qgi qlen sseqid bitscore length pident qcovs evalue qstart qend sstart send qseq sseq' -num_threads 20

This time I am getting 195 hits. I am working with many genomes and I have this problem with some of them (the biggest difference is 20 hits). Do you have any idea how to correct this or which output I should select?

Thanks.

Paola

blast tblastn makeblastdb • 268 views
written 4 months ago by carina2817 (modified 3 months ago)
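Before choosing between the two outputs, it may help to see exactly which hit differs. The following is a minimal sketch, not part of the original post: the function names `hit_keys` and `diff_hits` are hypothetical, and the column numbers assume the 15-column `-outfmt` string used in the commands above (qseqid is column 1, sseqid column 4, qstart/qend/sstart/send columns 10 to 13).

```shell
# Reduce a tabular BLAST output to a sortable key per hit:
# qseqid, sseqid, qstart, qend, sstart, send (columns 1, 4, 10-13
# of the -outfmt string used in the question).
hit_keys() {
    cut -f1,4,10-13 "$1" | LC_ALL=C sort
}

# Print hits present in only one of two outputs.
# "<" marks hits found only in the first file, ">" only in the second.
diff_hits() {
    keys_a=$(mktemp)
    keys_b=$(mktemp)
    hit_keys "$1" > "$keys_a"
    hit_keys "$2" > "$keys_b"
    LC_ALL=C comm -23 "$keys_a" "$keys_b" | sed 's/^/< /'
    LC_ALL=C comm -13 "$keys_a" "$keys_b" | sed 's/^/> /'
    rm -f "$keys_a" "$keys_b"
}
```

It could be called as `diff_hits S_cerevisiae_all_prot_E_rowelli_tblastn_ful_test_original.out S_cerevisiae_all_prot_E_rowelli_tblastn_ful_test.out`. Depending on the BLAST+ version, sseqid may be printed differently for databases built with and without -parse_seqids; if every line is reported as different, dropping column 4 from the key (`cut -f1,10-13`) is one possible workaround.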

Can you confirm that makeblastdb -parse_seqids does not give a warning like "duplicate accessions found" or something similar?

written 4 months ago by gb (modified 4 months ago)

I don't get any warning message.

written 4 months ago by carina2817
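Duplicate identifiers can also be checked directly in the FASTA file, independently of any makeblastdb message. This is a minimal sketch: the function name `duplicate_ids` is hypothetical, and it assumes the sequence identifier is the first whitespace-delimited token of each header line.

```shell
# List sequence identifiers that occur more than once in a FASTA file.
# The identifier is taken as the first token after ">" on each header line.
duplicate_ids() {
    awk '/^>/ {print substr($1, 2)}' "$1" | LC_ALL=C sort | uniq -d
}
```

For example, `duplicate_ids GCA_003024985.1_Erow_1.0_genomic_Euperipatoides_rowelli.fna` prints nothing when all identifiers are unique.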
carina2817 wrote:

It turns out the problem was not the -parse_seqids option. I discovered that every time I ran BLAST with some of the genomes I am using, the results file had a different number of hits, and this happens (with some genomes) in all BLAST versions after 2.3.0. I sent an e-mail to BLAST support; they told me that this is a new bug and that they would file a report. The problem occurs when using many processors: using 1 processor does not produce it, and the results should presumably still be consistent at some number of processors above 1.

written 3 months ago by carina2817
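One way to check whether a given genome is affected is to repeat the same search at a fixed thread count and compare the outputs. The sketch below is an illustration, not the procedure used in the post: the function names `repeat_tblastn` and `compare_hit_counts`, the output file naming scheme, and the shortened `-outfmt` string are all assumptions. Only tblastn options that already appear in the commands above are used.

```shell
# Run the same tblastn search several times at one thread count.
# Arguments: database, query FASTA, threads, number of repeats, output prefix.
repeat_tblastn() {
    db=$1; query=$2; threads=$3; repeats=$4; prefix=$5
    i=1
    while [ "$i" -le "$repeats" ]; do
        tblastn -db "$db" -query "$query" \
            -outfmt '6 qseqid sseqid qstart qend sstart send evalue bitscore' \
            -num_threads "$threads" \
            -out "${prefix}.t${threads}.run${i}.out"
        i=$((i + 1))
    done
}

# Print the hit count of each output file, then a final line saying
# whether all files contain the same number of hits.
compare_hit_counts() {
    for f in "$@"; do
        printf '%s\t%s\n' "$(wc -l < "$f" | tr -d ' ')" "$f"
    done
    distinct=$(for f in "$@"; do wc -l < "$f" | tr -d ' '; done | sort -u | wc -l | tr -d ' ')
    if [ "$distinct" -eq 1 ]; then
        echo "consistent"
    else
        echo "inconsistent"
    fi
}
```

For instance, `repeat_tblastn blastdb_E_rowelli S_cerevisiae_all_prot_uniq_join.fa 20 3 check` followed by `compare_hit_counts check.t20.run*.out` would compare three 20-thread runs, and the same pair of calls with 1 thread gives the single-processor baseline. Equal hit counts do not prove the outputs are identical, so a full comparison of the files is still worthwhile when the counts agree.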

Oh! Thanks for updating your post.

written 3 months ago by gb