Question: blastp word_size parameter seems to be ignored - BLASTP 2.4.0+
3.3 years ago, matthys.potgieter (South Africa) wrote:

Hi Everyone.

I am trying to BLAST many short peptide sequences against a protein database. I am looking for nearly exact matches, so I selected a word_size of 4.

However, there are some HSP matches where the longest stretch of consecutive amino acids that are identical between query and subject is 3. Please can someone clarify why this is, as I thought every HSP should contain at least one stretch of identical consecutive amino acids at least as long as the word_size.

Here is my command that demonstrates this:

blastp -query testPeptide.fasta -matrix PAM30 -outfmt 5 -word_size 4 -subject test.fasta

Query:

>peptide
FTDFQGGV

Subject:

>S507_scaffold13_size114854|S507_scaffold13_size114854_recno_56.0|(+)20770:21546
WVVVDRGVDRGARRAAGSGMQLRPPSGVLHAGAGTAQPVGSAPLAVLITGHDLEPIAAQV
TGLAELDRLAKHPGAARPPIGHVPDCPHRAGSPDLAGGDDTGGVVQQGAQRTGRCRRGAQ
RRRNDAKTQHARSRRREFEHITPRDRHMPQGTTKTTTVTLVSVVTDASHWQNTCMRPYRH
RCGLGQAASPCDHYYGVIAYAPNGAMGKIVAPPHSRPGGYRRIRTLRRLSCKVLSNFTNY
HGGVRRSRPLAEPGRATS


<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.dtd">
<BlastOutput>
  <BlastOutput_program>blastp</BlastOutput_program>
  <BlastOutput_version>BLASTP 2.4.0+</BlastOutput_version>
  <BlastOutput_reference>Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
  <BlastOutput_db></BlastOutput_db>
  <BlastOutput_query-ID>Query_1</BlastOutput_query-ID>
  <BlastOutput_query-def>peptide &lt;unknown description&gt;</BlastOutput_query-def>
  <BlastOutput_query-len>8</BlastOutput_query-len>
  <BlastOutput_param>
    <Parameters>
      <Parameters_matrix>PAM30</Parameters_matrix>
      <Parameters_expect>10</Parameters_expect>
      <Parameters_gap-open>9</Parameters_gap-open>
      <Parameters_gap-extend>1</Parameters_gap-extend>
      <Parameters_filter>F</Parameters_filter>
    </Parameters>
  </BlastOutput_param>
<BlastOutput_iterations>
<Iteration>
  <Iteration_iter-num>1</Iteration_iter-num>
  <Iteration_query-ID>Query_1</Iteration_query-ID>
  <Iteration_query-def>peptide &lt;unknown description&gt;</Iteration_query-def>
  <Iteration_query-len>8</Iteration_query-len>
<Iteration_hits>
<Hit>
<Hit_num>1</Hit_num>
  <Hit_id>S507_scaffold13_size114854|S507_scaffold13_size114854_recno_56.0|(+)20770:21546</Hit_id>
  <Hit_def>S507_scaffold13_size114854|S507_scaffold13_size114854_recno_56.0|(+)20770:21546 Six_Frame_ORF</Hit_def>
  <Hit_accession>Subject_1</Hit_accession>
  <Hit_len>258</Hit_len>
  <Hit_hsps>
    <Hsp>
      <Hsp_num>1</Hsp_num>
      <Hsp_bit-score>20.5747</Hsp_bit-score>
      <Hsp_score>41</Hsp_score>
      <Hsp_evalue>0.000156356</Hsp_evalue>
      <Hsp_query-from>1</Hsp_query-from>
      <Hsp_query-to>8</Hsp_query-to>
      <Hsp_hit-from>237</Hsp_hit-from>
      <Hsp_hit-to>244</Hsp_hit-to>
      <Hsp_query-frame>0</Hsp_query-frame>
      <Hsp_hit-frame>0</Hsp_hit-frame>
      <Hsp_identity>5</Hsp_identity>
      <Hsp_positive>8</Hsp_positive>
      <Hsp_gaps>0</Hsp_gaps>
      <Hsp_align-len>8</Hsp_align-len>
      <Hsp_qseq>FTDFQGGV</Hsp_qseq>
      <Hsp_hseq>FTNYHGGV</Hsp_hseq>
      <Hsp_midline>FT+++GGV</Hsp_midline>
    </Hsp>
  </Hit_hsps>
</Hit>
</Iteration_hits>
  <Iteration_stat>
    <Statistics>
      <Statistics_db-num>0</Statistics_db-num>
      <Statistics_db-len>0</Statistics_db-len>
      <Statistics_hsp-len>0</Statistics_hsp-len>
      <Statistics_eff-space>2064</Statistics_eff-space>
      <Statistics_kappa>0.11</Statistics_kappa>
      <Statistics_lambda>0.294</Statistics_lambda>
      <Statistics_entropy>0.61</Statistics_entropy>
    </Statistics>
  </Iteration_stat>
</Iteration>
</BlastOutput_iterations>
</BlastOutput>
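
For anyone who wants to pull the alignment out of this XML rather than read it by eye, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The `HSP_XML` string is a trimmed copy of the `<Hsp>` element above, and `read_hsp` is a hypothetical helper name chosen for illustration; neither is part of BLAST.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <Hsp> element from the output above; only the
# fields used here are kept.
HSP_XML = """
<Hsp>
  <Hsp_num>1</Hsp_num>
  <Hsp_identity>5</Hsp_identity>
  <Hsp_positive>8</Hsp_positive>
  <Hsp_align-len>8</Hsp_align-len>
  <Hsp_qseq>FTDFQGGV</Hsp_qseq>
  <Hsp_hseq>FTNYHGGV</Hsp_hseq>
  <Hsp_midline>FT+++GGV</Hsp_midline>
</Hsp>
"""


def read_hsp(xml_text):
    """Return the alignment strings and counts of a single <Hsp> element."""
    hsp = ET.fromstring(xml_text)
    return {
        "qseq": hsp.findtext("Hsp_qseq"),
        "hseq": hsp.findtext("Hsp_hseq"),
        "midline": hsp.findtext("Hsp_midline"),
        "identity": int(hsp.findtext("Hsp_identity")),
        "positive": int(hsp.findtext("Hsp_positive")),
    }


hsp = read_hsp(HSP_XML)
# Print query, midline and subject stacked, as in a pairwise view.
print(hsp["qseq"])
print(hsp["midline"])
print(hsp["hseq"])
```

For a complete output file, iterating over `ET.parse(path).iter("Hsp")` should give one such element per HSP, assuming the file is saved as well-formed XML.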

More specifically:

FTDFQGGV

FT+++GGV

FTNYHGGV
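
The observation can be checked mechanically. This is a small sketch that counts the longest stretch of consecutive identical residues in the aligned pair and compares it with the requested word_size; `longest_identical_run` is a hypothetical helper written for this post, not a BLAST function.

```python
def longest_identical_run(qseq, hseq):
    """Length of the longest stretch of consecutive identical aligned residues."""
    best = current = 0
    for q, h in zip(qseq, hseq):
        if q == h and q != "-":  # a gap column never counts as identical
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


qseq = "FTDFQGGV"   # Hsp_qseq from the output above
hseq = "FTNYHGGV"   # Hsp_hseq from the output above
word_size = 4       # value passed with -word_size

run = longest_identical_run(qseq, hseq)
print(run)               # 3 (the "GGV" stretch)
print(run >= word_size)  # False
```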

Any advice would be much appreciated!

Kind regards

Thys


Not a solution to this particular problem, but you could try adding -task blastp-short to the blastp command, as described on the NCBI BLAST help page.

Comment written 3.3 years ago by microfuge
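
A minimal sketch of what that suggestion looks like when the command is assembled from a script. The file names and the other options are taken from the command in the question; `build_blastp_command` is a hypothetical helper, and whether `-task blastp-short` changes the hit reported above has not been tested here.

```python
import shlex


def build_blastp_command(query, subject, task=None, word_size=4, matrix="PAM30"):
    """Assemble the blastp argument list from the question, optionally with -task."""
    cmd = [
        "blastp",
        "-query", query,
        "-subject", subject,
        "-matrix", matrix,
        "-outfmt", "5",
        "-word_size", str(word_size),
    ]
    if task is not None:
        cmd += ["-task", task]
    return cmd


cmd = build_blastp_command("testPeptide.fasta", "test.fasta", task="blastp-short")
# Show the command line that would be run; pass cmd to subprocess.run to execute it.
print(shlex.join(cmd))
```

Building the arguments as a list keeps each option and its value separate, so the same list can be handed to `subprocess.run(cmd)` without shell quoting issues on a machine where blastp is installed.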