Using BLAST to calculate all-pair sequence similarity scores
6.0 years ago

I'm trying to calculate all-pair protein sequence similarity scores with blastp, to use later as a feature for my machine learning model. I want a similarity score for every sequence-to-sequence pair in my protein sequences. With a local DB, blastp only reports the bit-score of matches whose e-value is below a certain threshold, so I raised the e-value threshold to a very large number (e.g. 10E10) to get all pair-to-pair similarity scores. The current DB size is 10,000, and with an e-value threshold of 10E10, blast reports only 500 matches, with a lowest bit-score of 22.

My questions are:
1) Can I ignore protein sequences that are 'not found as a match' and just treat their similarity score as 0 (or some value below the lowest reported bit-score)?
2) Or is there a way to get a similarity score (bit-score) for the pairs that are still not reported, even with the e-value threshold set very high (10E10)?

To be precise, I'm running blastp locally with the command 'blastp -evalue 10E10 -query {} -out {} -db {} -outfmt {} -num_threads {}'.
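To make option 1 concrete, here is a minimal sketch of turning blastp output into an all-pair score table in which unreported pairs get a fallback value. It assumes the output was written in tabular form (-outfmt 6 with the default columns, where the 12th column is the bit-score). The function name `bitscore_matrix`, the `missing_score` parameter, and the example IDs are hypothetical, and taking the best score over both directions of a pair is one possible choice, not something blastp does itself.

```python
def bitscore_matrix(tabular_lines, ids, missing_score=0.0):
    """Build a symmetric all-pair bit-score table from BLAST tabular rows.

    Assumes default -outfmt 6 columns: qseqid, sseqid, ..., bitscore (12th).
    Pairs that BLAST did not report are filled with `missing_score`.
    """
    index = {seq_id: i for i, seq_id in enumerate(ids)}
    best = {}  # (i, j) with i <= j  ->  best bit-score seen for that pair

    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in index or subject not in index:
            continue  # ignore IDs outside the set of interest
        i, j = sorted((index[query], index[subject]))
        # Keep the highest-scoring hit per pair, in either direction.
        best[(i, j)] = max(best.get((i, j), bitscore), bitscore)

    n = len(ids)
    matrix = [[missing_score] * n for _ in range(n)]
    for (i, j), score in best.items():
        matrix[i][j] = score
        matrix[j][i] = score
    return matrix


# Hypothetical example rows (tab-separated, default outfmt 6 column order).
rows = [
    "P1\tP2\t45.0\t100\t55\t0\t1\t100\t1\t100\t1e-20\t80.5",
    "P2\tP1\t45.0\t100\t55\t0\t1\t100\t1\t100\t1e-20\t79.0",
    "P1\tP1\t100.0\t100\t0\t0\t1\t100\t1\t100\t1e-60\t210.0",
]
ids = ["P1", "P2", "P3"]
matrix = bitscore_matrix(rows, ids)
for seq_id, row in zip(ids, matrix):
    print(seq_id, row)
```

Here P3 never appears in the output, so every pair involving it falls back to `missing_score`. Whether 0 or some value just below the lowest reported bit-score is the better fallback is exactly the open question above.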

Thank you.

blast protein sequence sequence similarity • 1.6k views