Question: Using BLAST to calculate all-pair sequence similarity score
15 months ago, convalescence21 wrote:

I'm trying to calculate all-pairs protein sequence similarity scores with blastp, to use later as a feature for my machine learning model, so I need a similarity score for every pair of sequences in my protein set. With a local DB, blastp only reports the bit score of matches whose E-value is below a certain threshold, so I raised the E-value threshold to a very large number (e.g. 10E10) to get all pairwise similarity scores. The current DB size is 10,000, and with an E-value threshold of 10E10 blastp still reports only 500 matches, the lowest with a bit score of 22.

My questions are:
1) Can I ignore protein sequences that are not reported as a match and just treat their similarity score as 0 (or some value below the lowest reported bit score)?
2) Or is there a way to calculate the similarity score (bit score) for the pairs that are still not reported, even with a very large E-value threshold (10E10)?

To be precise, I'm running blastp locally with the command 'blastp -evalue 10E10 -query {} -out {} -db {} -outfmt {} -num_threads {}'.
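To make option 1 concrete, here is a minimal Python sketch of how the reported hits could be turned into a dense all-pairs matrix, with every unreported pair filled with a default score. It assumes the output was written with '-outfmt 6' and its default column order (query ID first, subject ID second, bit score twelfth); the function name bitscore_matrix, the IDs P1 to P3, and the example rows are made up for illustration. Whether 0 is an appropriate fill value for the downstream model is exactly the open question, so the default is left as a parameter. One thing that may be worth checking separately: BLAST+ caps the number of reported subjects per query with '-max_target_seqs', whose default is 500, so that option rather than the E-value may be what limits the match count.

```python
def bitscore_matrix(lines, ids, default=0.0):
    """Build an all-pairs bit-score matrix from BLAST tabular lines.

    Assumes '-outfmt 6' with default columns, so the query ID is
    field 0, the subject ID is field 1 and the bit score is field 11.
    Pairs that BLAST did not report keep the `default` value.
    """
    index = {seq_id: i for i, seq_id in enumerate(ids)}
    best = {}  # (query index, subject index) -> highest bit score seen

    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, bits = fields[0], fields[1], float(fields[11])
        if query not in index or subject not in index:
            continue  # ignore IDs outside the set of interest
        key = (index[query], index[subject])
        # A pair can have several HSPs; keep the highest-scoring one.
        if key not in best or bits > best[key]:
            best[key] = bits

    n = len(ids)
    matrix = [[default] * n for _ in range(n)]
    for (i, j), bits in best.items():
        matrix[i][j] = bits
    return matrix


# Made-up example rows in outfmt 6 layout (not real BLAST output).
example = [
    "P1\tP1\t100.0\t50\t0\t0\t1\t50\t1\t50\t1e-30\t105.0",
    "P1\tP2\t40.0\t45\t27\t0\t3\t47\t5\t49\t2e-05\t35.4",
    "P1\tP2\t30.0\t20\t14\t0\t1\t20\t1\t20\t5.0\t22.0",
]
m = bitscore_matrix(example, ["P1", "P2", "P3"])
```

Note that the matrix is not forced to be symmetric: the score for query P1 against subject P2 is stored separately from P2 against P1, since BLAST can report different scores (or no hit) in each direction.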

Thank you.
