Question

Is there a way to find the percent similarity just like percent identity in BLAST?

Asked 8.5 years ago by Neha shri

I am using standalone BLAST (version 2.2.26) with a query sequence and a locally created database. The sequences in the database share 65 percent identity with the query.

Rather than by identity, I want the database hits to be sorted on the basis of similarity, like A= (V,L,I,M) and not A=A. Hope I am making myself clear. I would really appreciate any help. Thank you in advance.

sequence blast • 12k views

Question by Neha shri, 8.5 years ago (updated 20 months ago by Ram)

"A= (V,L,I,M) and not A=A. Hope I am making myself clear."

You're not.

Reply by 5heikki, 8.5 years ago

Sorry for not being clear. For identity, the program counts only exact matches: for example, if alanine (A) is aligned with alanine, it is a match; otherwise it is not. In the other case, if alanine is substituted by another hydrophobic residue (e.g. V, L, I, M), that would also count as a match, since these residues share similar properties. Is there a way to find those matches (the latter case) as a percentage?

Reply by Neha shri, 8.5 years ago
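
The idea described in the post above can be sketched in a few lines of Python. This is a minimal sketch, not a BLAST feature: it assumes you already have the two aligned strings (for example from the qseq and sseq output columns), and the residue groups are a hypothetical choice, with the first group taken from the question's A = (V, L, I, M) example. A column counts as similar if the residues are identical or share a group; both percentages use the alignment length, gap columns included, as the denominator (also an assumption).

```python
# Minimal sketch: percent identity and percent "similarity" from two aligned
# sequences. A column is similar if the residues are identical OR fall in the
# same user-defined group. The groups are hypothetical, not a BLAST setting.

GROUPS = [
    set("AVLIM"),  # hydrophobic group from the question: A = (V, L, I, M)
    set("FWY"),    # aromatic
    set("KRH"),    # basic
    set("DE"),     # acidic
    set("STNQ"),   # polar, uncharged
]


def same_group(a: str, b: str) -> bool:
    """True if residues a and b share at least one group."""
    return any(a in g and b in g for g in GROUPS)


def percent_identity_and_similarity(qseq: str, sseq: str) -> tuple[float, float]:
    """Return (percent identity, percent similarity) over the alignment length.

    qseq and sseq are aligned strings of equal length with '-' for gaps.
    Gap columns count in the denominator but never as matches.
    """
    if len(qseq) != len(sseq) or not qseq:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = similar = 0
    for a, b in zip(qseq.upper(), sseq.upper()):
        if a == "-" or b == "-":
            continue  # gaps are never matches
        if a == b:
            identical += 1
            similar += 1  # identities are a subset of similar pairs
        elif same_group(a, b):
            similar += 1
    n = len(qseq)
    return 100.0 * identical / n, 100.0 * similar / n


if __name__ == "__main__":
    # Toy alignment, made up for illustration.
    print(percent_identity_and_similarity("AKLDE-F", "VKIDQ-Y"))
```

Because the groups are plain Python sets, other definitions of similarity (for example splitting the hydrophobic group) can be tried without rerunning BLAST.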

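Yes, there is: the ppos format specifier (percentage of positive-scoring matches), available with the tabular output formats 6, 7 and 10. Which substitutions count as positive is determined by the substitution matrix, so the matrix you use affects these numbers.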

-outfmt <String>
   alignment view options:
     0 = pairwise,
     1 = query-anchored showing identities,
     2 = query-anchored no identities,
     3 = flat query-anchored, show identities,
     4 = flat query-anchored, no identities,
     5 = XML Blast output,
     6 = tabular,
     7 = tabular with comment lines,
     8 = Text ASN.1,
     9 = Binary ASN.1,
    10 = Comma-separated values,
    11 = BLAST archive format (ASN.1)

   Options 6, 7, and 10 can be additionally configured to produce
   a custom format specified by space delimited format specifiers.
   The supported format specifiers are:
           qseqid means Query Seq-id
              qgi means Query GI
             qacc means Query accesion
          qaccver means Query accesion.version
             qlen means Query sequence length
           sseqid means Subject Seq-id
        sallseqid means All subject Seq-id(s), separated by a ';'
              sgi means Subject GI
           sallgi means All subject GIs
             sacc means Subject accession
          saccver means Subject accession.version
          sallacc means All subject accessions
             slen means Subject sequence length
           qstart means Start of alignment in query
             qend means End of alignment in query
           sstart means Start of alignment in subject
             send means End of alignment in subject
             qseq means Aligned part of query sequence
             sseq means Aligned part of subject sequence
           evalue means Expect value
         bitscore means Bit score
            score means Raw score
           length means Alignment length
           pident means Percentage of identical matches
           nident means Number of identical matches
         mismatch means Number of mismatches
         positive means Number of positive-scoring matches
          gapopen means Number of gap openings
             gaps means Total number of gaps
-->             ppos means Percentage of positive-scoring matches
           frames means Query and subject frames separated by a '/'
           qframe means Query frame
           sframe means Subject frame
             btop means Blast traceback operations (BTOP)
          staxids means Subject Taxonomy ID(s), separated by a ';'
        sscinames means Subject Scientific Name(s), separated by a ';'
        scomnames means Subject Common Name(s), separated by a ';'
       sblastnames means Subject Blast Name(s), separated by a ';'
                (in alphabetical order)
       sskingdoms means Subject Super Kingdom(s), separated by a ';'
                (in alphabetical order)
           stitle means Subject Title
       salltitles means All Subject Title(s), separated by a '<>'
          sstrand means Subject Strand
            qcovs means Query Coverage Per Subject
          qcovhsp means Query Coverage Per HSP
   When not provided, the default value is:
   'qseqid sseqid pident length mismatch gapopen qstart qend sstart send
   evalue bitscore', which is equivalent to the keyword 'std'
   Default = `0'

Reply by 5heikki, 8.5 years ago (updated 4.4 years ago by Ram)
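
Once ppos is in the output, ranking the database hits by similarity instead of identity is a simple sort. A minimal Python sketch, assuming the search was run with BLAST+ blastp and a custom tabular format such as blastp -query query.fa -db mydb -outfmt "6 qseqid sseqid pident ppos positive length" -out hits.tsv (the specifiers come from the list above; this particular column selection and the file and database names are illustrative assumptions):

```python
import csv
import io

# Column order must match the -outfmt "6 ..." string used for the search
# (assumed here: qseqid sseqid pident ppos positive length).
COLUMNS = ["qseqid", "sseqid", "pident", "ppos", "positive", "length"]


def read_hits(handle):
    """Parse BLAST tab-separated output into a list of dicts with numeric fields."""
    hits = []
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0].startswith("#"):  # skip blanks and outfmt 7 comment lines
            continue
        hit = dict(zip(COLUMNS, row))
        for key in ("pident", "ppos"):
            hit[key] = float(hit[key])
        for key in ("positive", "length"):
            hit[key] = int(hit[key])
        hits.append(hit)
    return hits


def rank_by_similarity(hits):
    """Sort hits by percentage of positive-scoring matches, highest first."""
    return sorted(hits, key=lambda h: h["ppos"], reverse=True)


if __name__ == "__main__":
    # Made-up example lines, for illustration only.
    example = (
        "query1\tsubjA\t65.00\t78.00\t78\t100\n"
        "query1\tsubjB\t66.00\t74.00\t74\t100\n"
    )
    for hit in rank_by_similarity(read_hits(io.StringIO(example))):
        print(hit["sseqid"], hit["pident"], hit["ppos"])
```

For a real results file, open it with open("hits.tsv", newline="") and pass the handle to read_hits.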

Thank you so much.

Reply by Neha shri, 8.5 years ago

Sir, could you please clarify what positive-scoring matches mean? I searched a bit but couldn't find anything. I would be grateful.

Reply by Neha shri, 8.5 years ago (updated 4.4 years ago by Ram)

Check these slides (7th page).

Reply by 5heikki, 8.5 years ago
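
Briefly: a positive-scoring match is an aligned residue pair whose score in the substitution matrix is greater than zero, so identities are included in the count, and ppos reports that count as a percentage. For blastp the default matrix is BLOSUM62. The Python sketch below hard-codes only the BLOSUM62 entries among A, V, L, I and M (an excerpt, not the full matrix) to make one consequence visible: A aligned with V scores 0 and A aligned with L, I or M scores -1, so none of these count as positives, whereas pairs such as I/L, I/V and L/M do. In other words, ppos follows the matrix, not the A = (V, L, I, M) grouping from the question.

```python
# Excerpt of BLOSUM62 for the five hydrophobic residues discussed in the
# thread (A, V, L, I, M). The full matrix covers all 20 amino acids plus
# ambiguity codes; only one triangle is stored since it is symmetric.
BLOSUM62_EXCERPT = {
    ("A", "A"): 4, ("A", "V"): 0, ("A", "L"): -1, ("A", "I"): -1, ("A", "M"): -1,
    ("V", "V"): 4, ("V", "L"): 1, ("V", "I"): 3, ("V", "M"): 1,
    ("L", "L"): 4, ("L", "I"): 2, ("L", "M"): 2,
    ("I", "I"): 4, ("I", "M"): 1,
    ("M", "M"): 5,
}


def score(a: str, b: str) -> int:
    """Look up a substitution score in either order (the matrix is symmetric)."""
    if (a, b) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(a, b)]
    if (b, a) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(b, a)]
    raise KeyError(f"pair {a}/{b} is not in this excerpt")


def count_positives(qseq: str, sseq: str) -> int:
    """Count aligned columns whose substitution score is > 0 (gap columns skipped)."""
    return sum(
        1
        for a, b in zip(qseq, sseq)
        if a != "-" and b != "-" and score(a, b) > 0
    )


if __name__ == "__main__":
    for a, b in [("A", "V"), ("A", "L"), ("I", "L"), ("I", "V"), ("L", "M")]:
        s = score(a, b)
        print(a, b, s, "positive" if s > 0 else "not positive")
```

If the A = (V, L, I, M) grouping is what matters, the matrix can be changed with blastp's -matrix option, or similarity can be recomputed from qseq and sseq with your own rule.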

Link is not working.

Reply by Neha shri, 8.5 years ago

Should work now: it was http vs https :)

Reply by 5heikki, 8.5 years ago

Thank you :)

Reply by Neha shri, 8.5 years ago

Ram · Answer 1 · 2015-10-12

Answered 8.5 years ago by cpad0112

I guess you are talking about blastp. Identical residues are a subset of similar residues. I am not sure whether standalone BLAST allows you to do that.

Answer by cpad0112, 8.5 years ago

Yes, it's blastp I was talking about. Is there any other way to do it? I tried many online servers, but they did not help me much.

Reply by Neha shri, 8.5 years ago (updated 4.4 years ago by Ram)

My understanding is that the OP was asking how to extract the similar residues (excluding identical ones) from the alignment, not just their percentage and/or number.

Reply by cpad0112, 8.5 years ago

Yes, you are absolutely right. But if the extraction could also give some score or percentage, that would be even more helpful. My objective is to find the conservative substitutions (replacements by a similar amino acid) in the sequences: for example, how many residues in each database sequence have undergone such a substitution, and how similar each sequence still is to the query.

Reply by Neha shri, 8.5 years ago (updated 4.4 years ago by Ram)
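
For cpad0112's reading of the question (pulling out the similar but non-identical residues themselves), here is a minimal Python sketch that walks the aligned qseq and sseq strings, together with qstart, from BLAST tabular output and lists each conservative substitution with its query position. What counts as "similar" is an assumption: is_similar below is a hypothetical group-based rule, and a substitution-matrix test (score > 0) could be passed in instead through the similar parameter. Dividing the number of substitutions by the alignment length gives a per-hit percentage.

```python
from typing import Callable, List, Tuple

# Hypothetical residue groups; replace with your own definition of "similar".
GROUPS = [set("AVLIM"), set("FWY"), set("KRH"), set("DE"), set("STNQ")]


def is_similar(a: str, b: str) -> bool:
    """Hypothetical similarity rule: different residues in the same group."""
    return a != b and any(a in g and b in g for g in GROUPS)


def conservative_substitutions(
    qseq: str,
    sseq: str,
    qstart: int,
    similar: Callable[[str, str], bool] = is_similar,
) -> List[Tuple[int, str, str]]:
    """Return (query position, query residue, subject residue) for every
    aligned column where the residues differ but are judged similar.

    Positions are 1-based query coordinates starting from qstart; columns
    with a gap in the query do not advance the query position.
    """
    found = []
    pos = qstart
    for a, b in zip(qseq.upper(), sseq.upper()):
        if a != "-" and b != "-" and a != b and similar(a, b):
            found.append((pos, a, b))
        if a != "-":
            pos += 1
    return found


if __name__ == "__main__":
    # Made-up aligned fragment starting at query position 10.
    qseq, sseq = "AK-LDEF", "VKRIDQY"
    subs = conservative_substitutions(qseq, sseq, qstart=10)
    print(subs)
    print(f"{100 * len(subs) / len(qseq):.1f}% of alignment columns")
```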