I'm trying to compute all-vs-all protein sequence similarity scores with blastp, to use later as a feature for my machine learning model. In other words, I need a similarity score for every sequence-to-sequence pair in my set of protein sequences. When run against a local DB, blastp only reports the bit score of matches whose E-value is below a certain threshold, so I raised the E-value threshold to a very large number (e.g. 10E10) to try to get a similarity score for every pair. The current DB size is 10,000, and even with an E-value threshold of 10E10, blastp reports only 500 matches, the lowest of which has a bit score of 22.
My questions are: 1) Can I ignore the protein sequences that are not reported as a match and simply treat their similarity score as 0 (or some value below the lowest reported bit score)? 2) Or is there a way to compute the similarity score (bit score) for the pairs that are still not reported, even with the E-value threshold set this high (10E10)?
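To make option 1 concrete, here is a rough Python sketch of what I mean by filling in the unreported pairs with a default value. It rests on assumptions that are not settled above: that the output is the default 12-column tabular format (`-outfmt 6`, with the query ID in column 1, the subject ID in column 2 and the bit score in column 12), that the best score is kept when a pair appears more than once, and that the matrix is forced to be symmetric. The function name `bitscore_matrix` and the sequence IDs are made up for illustration.

```python
def bitscore_matrix(tabular_text, ids, default=0.0):
    """Build an all-vs-all bit-score matrix from BLAST tabular output.

    Assumption: default 12-column `-outfmt 6` layout
    (column 1 = query ID, column 2 = subject ID, column 12 = bit score).
    Pairs that BLAST did not report keep the `default` value.
    """
    index = {seq_id: i for i, seq_id in enumerate(ids)}
    best = {}
    for line in tabular_text.splitlines():
        # Skip blank lines and comment lines (e.g. from `-outfmt 7`).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject, score = fields[0], fields[1], float(fields[11])
        if query not in index or subject not in index:
            continue
        # Assumption: treat (A, B) and (B, A) as the same pair and keep
        # the highest bit score seen for it.
        key = tuple(sorted((index[query], index[subject])))
        best[key] = max(best.get(key, score), score)

    n = len(ids)
    matrix = [[default] * n for _ in range(n)]
    for (i, j), score in best.items():
        matrix[i][j] = score
        matrix[j][i] = score
    return matrix


# Made-up example: seqC never appears in the output, so its row stays at 0.0.
example = (
    "seqA\tseqA\t100.0\t50\t0\t0\t1\t50\t1\t50\t1e-30\t105\n"
    "seqA\tseqB\t40.0\t30\t18\t0\t1\t30\t5\t34\t2.5\t22.3\n"
)
matrix = bitscore_matrix(example, ["seqA", "seqB", "seqC"])
```

Passing a different `default` (for example a value just below the lowest reported bit score) would cover the second variant of option 1.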
To be precise, I'm running blastp locally with the command 'blastp -evalue 10E10 -query {} -out {} -db {} -outfmt {} -num_threads {}'
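In case it helps, this is roughly how the placeholders in that command get filled in from Python. It is a simplified sketch, not my exact script: the helper name `build_blastp_command`, the file names, the database name, the `6` passed to `-outfmt` and the thread count are all placeholder assumptions. Only the flags themselves come from the command above.

```python
import shlex


def build_blastp_command(query, out, db, outfmt="6", evalue="10E10", num_threads=4):
    """Return the blastp call as an argument list (hypothetical helper)."""
    return [
        "blastp",
        "-evalue", str(evalue),
        "-query", query,
        "-out", out,
        "-db", db,
        "-outfmt", str(outfmt),
        "-num_threads", str(num_threads),
    ]


# Placeholder file and database names, for illustration only.
command = build_blastp_command("proteins.fasta", "hits.tsv", "proteins_db")
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` on a machine where blastp is installed.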
Thank you.