Question: Duplicated console-version BLAST results
1
gravatar for crimsontabaq
3.8 years ago by
crimsontabaq50
Russia, Kazan
crimsontabaq50 wrote:

Hi there!

I'm blasting (blastx2.2.28) my assembly to the annotated refseq. Below - the command line I used:

blastx -query assembly.fa -out Blast -db BlastCustomDB -num_threads 8 -evalue 1e-6 -best_hit_score_edge 0.05 -best_hit_overhang 0.25 -max_hsps_per_subject 1 -max_target_seqs 1 -outfmt 6

I specified quantity of HighcoringSegmentPairs (1), but still BLAST output contains two or more HSP for some queries.

Here's the output example (it goes like query ID, subject ID, identity %, length of HSP, mismatches, gap opens ... E-value, bit score):

 TR60-c0_g1_i1  EKV52009.1  37.36   174 57  7   ...         2e-14   74.3
 TR60-c0_g1_i1  EKV52009.1  64.37   87  30  1   ...         6e-10   59.7
 TR61-c0_g1_i1  EKV51611.1  49.41   253 121 2   ...         4e-47    161
 TR64-c0_g1_i1  EKV51289.1  60.00   175 62  1   ...         9e-66    208

After examination with 'uniq' command it became clear that all duplicated results for a query each time were aligned to a single gene in different position (what is exactly HPSs).

There's the way just to remove multiple HSP, but maybe it's possible to change some parameters in blast run to get the right single-hit-single-HSP output directly?

Thanks in advance!

blast alignment • 1.4k views
ADD COMMENTlink modified 3.2 years ago • written 3.8 years ago by crimsontabaq50

Please see this related question and answer: Why Is Blast Creating Duplicates In My Output Files?! including the comment not to use tabular blast output. Does it solve your problem?

ADD REPLYlink written 3.8 years ago by Michael Dondrup47k

Unfortunately, no. Blast failed to run with the same parameters to create ASN.1 output, so I removed all parameters but -max_target_seqs, -num_threads, -evalue. I've got the same output (with duplicates) after converting ASN.1 to tabular.

ADD REPLYlink modified 3.8 years ago • written 3.8 years ago by crimsontabaq50

Why did blast fail with that output format? Anyway, could you post the sequences TR60-c0_g1_i1 and EKV52009.1 so we can check it out?

ADD REPLYlink written 3.8 years ago by Michael Dondrup47k

This is the text of the error (when I ran it with parameters which I've used for tabular output):

BLAST options error: err:tried to set option (130) and value (1), line (537).

TR60-c0_g1_i1:

AATCGGGTCTTGGGTATGGTACGTTACAAGAATTTCAAGAGATATACTTTATACAGAAGA GCTATTCACAATACAACTTGAGATATATACAATAAACAGTATGCACAATGCAATAATCTA TACAACAAAGCTGATGTCAGAACCCAAAAAATTGCTGAACGAGTGCAAGCAAGCCAAAAA AAAAAGAAAGGAGAGTGATTAGTATGATGGTATGTATACAGTCCTCGTCCTCGTTTGCTC TAGTAAAGAAACACAGATGGCATGGCGAAACATCATGAAGGCGCGTCAAAATCATCGCTG TCGTCGTTCATCCGGCGTCGCTTAGCTGGTCCCGGCTGGAGGAGAGAGTCTGCAGACACG ATGCCTCTAGATTCGAGGACGTCGCGAGACACTTTGGCTGGTCCATCTGCAGGCCTTTTA GACCTAACTCGAGTTCTGCGCAAAGCGCCGAAGGCCTGAGCTTTGATCACCGCCAGAGTA CCCGACAGCGGAGCCACGCCAGGCGAAGGTGAGCGAGAGGAAGGCGCTGTTATCCGACGT GGACGAGCCTGGTGCTGACTGGAGTAGGAAGAATCGCCACCATCGCTGCCGTGCTGAGAC CCTTGCTGAGAATCATTCTCACTTCCTGCGGTACCGCTTGTTATCGGGGGAAGGAGTGGA AGTAGGCTGGTACGGTCCAAAATAGCATCCATGCGTCGATTGTTTCCACCAAGTCTGTAG CGAACTCCCCCGTCGTTTCCTGAGCTTGAGGCGATGGCGGAAATATCTGCAGTGCCACCA TGAAGAGAGATAACCTCAGCGCCGTGACTCTGTGTCGTTTGACGACGATACTCAAATTCT GGTCGATCCTGTGCGGCACCCTGCGCGTCGTTGAAGGCAAAAGGAAAAGGGGAAGTCAAA GCGGTCGTAGATGAAGAGGGTGGCGTAGATGAAGAAGGAGAAATAGATCGACTGTTTGGG AAGACGTATGGAACGACCGTGGGTTCTAAATTGTTCAAAGGTACTTTTTGGTCATCGGGA TATCGATTGCCAATGAGATTCATATCGGATGCGGAGGTGAGTGTCGGTGATTCAACAAAA TGCGGCGAATTGGAGGAGTGGCTACTGGTCTGTGTCCAAGAAGTATATGGTTGATTCTGG TGATTAAATGAGACTGAAGGTGAGGCAGGATACTGATTAAAACCCAGACTGTTATCAGGG TATTGTGTGCTGTTGAACGTCGTAGACGGCGGGGACGGGTAAGAGGGAAACTCTGATTCG GTCTCCGAATTTTGTCCTCGCTTTGGCCAAGACGCTCTCCAAAAGTTTTCTTGATTTCGT AGTGCTTGTCGCAGCTGATTGTTCTCTTGCTGTAATTCCTGAGCCTCGGTAGTCGCTTCC CGGCATGAGTCCTGAAGTACTCGGACAACTGATTCTAGGTTCACGACCGTCTCCTCAAGA GTTGTAATGTAATTTGTGCGACGAGCTCGAAAAGCCGCTTGAGCATCAGCATTCTTTTTT TTCCTTAGTGCAATATCGGTCATTCGAGAAGGATTCTGTTCATCCGACTTGGTCGATTGG TTTGTCGTCGTCGTCGAAGAGGACATGGCCACAGTGGCGATGGGGAAAGTGGGATCCTAC GAATTGCCAATAGCGGAGATCTAGAAGAGATTTAAAACACTTTGCATGAAATGCCACGAA AAGAATTCGAAGTAGGGCACGAGGTCA

EKV52009.1:

MPKSPRASPSFATVKSTPDNPVSRLSDIALRKKKNADAQAAFRARRANYIATLEETVTSLESV VLQLQDSWRESRTELQDARQELLRLQHHCRERDRFWRDLWENRRGGPTPDADDIASSPSFS PVHVHPNGLGSQIQNSQLGPYTLDGMTYRTSEDTPSQPPYAGQQDFSTTVPNPLPFNDGDVS GDASHCLGQRVEKMTYVPRFPNDDSKLALGNFEATSFTFSSDRFTDSRSLSPTSTPSSSSSTS IPSASYPYTFPDPAFTQDTFRRPSHGAEMTLHGGTADVSVIPGADAIRYRLGNRQPMSMTDRP MLPQTRPGSANESQHERGSSDAAPALSCTVAVIKAQAFGALRRTRTRTKKSTETASRVAHDVL EARGIGIGSRQRASEDDEF

ADD REPLYlink modified 3.8 years ago • written 3.8 years ago by crimsontabaq50

Does this happen only in a tabular output?

ADD REPLYlink modified 3.8 years ago • written 3.8 years ago by a.zielezinski9.2k
1

Dobry wieczór! I tried xml and ASN.1 (which is I converted to tabular again - and it gave me the same duplicated results), but it's difficult to me to read these formats, so I can't say if it's the same thing happens with other formats.


Just checked .xml output of blastx2.2.28, same duplication appeared.

ADD REPLYlink modified 3.8 years ago • written 3.8 years ago by crimsontabaq50

dobryj večer! see my answer below :)

ADD REPLYlink written 3.8 years ago by a.zielezinski9.2k
3
gravatar for a.zielezinski
3.8 years ago by
a.zielezinski9.2k
a.zielezinski9.2k wrote:

In NCBI BLAST+ versions below 2.2.29, the -max_hsps_per_subject parameter does not work correctly. This parameter was further replaced with -max_hsps, which works fine.

Download the latest NCBI BLAST+ (from NCBI FTP) and you will be fine:

blastx -query assembly.fasta -db blastdb.fa -max_hsps 1 -outfmt 6

Output:

TR60-c0_g1_i1   EKV52009.1  37.356  174 57  7   809 288 273 394 1.32e-18    74.3
ADD COMMENTlink written 3.8 years ago by a.zielezinski9.2k
1

It works! with latest blast:

blastx -db BlastDB -query assembly.fa  -max_target_seqs 1 -max_hsps 1 -num_threads 8 -evalue 1e-6 -outfmt 6

This job gave me little bit more results than blast-2.2.28 output sorted with 'uniq' command, yet without any double-hsps. Dziękuję!

ADD REPLYlink written 3.8 years ago by crimsontabaq50
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 790 users visited in the last hour