First of all, keep in mind that BLAST hits are HSPs (high-scoring segment pairs), which may cover only part of the query and subject sequences rather than their full lengths. So here I am only talking about the matched parts of your Q and S sequences.
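To see how much of the query a single HSP actually covers, something like the following rough sketch can help. It assumes you have the qstart, qend and qlen columns from a tabular BLAST output; the function name query_coverage is just made up for illustration.

```python
def query_coverage(qstart: int, qend: int, qlen: int) -> float:
    """Fraction of the query covered by one HSP.

    Assumes 1-based, inclusive coordinates, as in BLAST tabular output.
    """
    if qlen <= 0:
        raise ValueError("qlen must be positive")
    # On the minus strand the start can be larger than the end, so sort first.
    low, high = sorted((qstart, qend))
    return (high - low + 1) / qlen


# Made-up example: an HSP spanning query positions 101..250 of a 600-residue query.
print(query_coverage(101, 250, 600))  # 0.25
```

A hit with a high pident but a coverage like this only tells you about a quarter of the query.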
Identity: "the average identity of percentage 35%" is meaningless because BLAST hits are independent. For example, take a BlastP with two hits: protein 1 against protein 2 with pident 55%, and protein 1 against protein 3 with pident 15%. This says that protein 1 is, with high confidence, a homolog of protein 2, but you must be more cautious about the homology between protein 1 and protein 3. Keep in mind that proteins are made of 20 different AAs, so if you align two unrelated protein sequences (or any other random AA sequences) of any length, you will get roughly 5% identity by chance (for DNA and RNA sequences the random identity is roughly 25%, since those are made of four different bases: A, T, C, G). There is another parameter in BlastP, ppos, which is based on similarity: ppos = pident + (the percentage of similar but not identical AA matches). Overall, I think two AA sequences with pident higher than 20% and ppos higher than 30% are close enough to be called homologs. For NA sequences, I think a pident of 40% and above is OK.
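The numbers above can be written down as a small sketch. The chance identity is simply 1 / (alphabet size), which assumes all residues are equally frequent (a simplification). The cutoffs are only my rules of thumb from this answer, not standard values, and the function names and the ppos values in the example are made up for illustration.

```python
def random_identity(alphabet_size: int) -> float:
    """Expected percent identity between two unrelated sequences,
    assuming every residue is equally frequent (a simplification)."""
    return 100.0 / alphabet_size


def looks_homologous(pident, ppos=None, molecule="protein"):
    """Apply the rule-of-thumb cutoffs from this answer (not standard values)."""
    if molecule == "protein":
        # Proteins: pident above 20% and ppos above 30%.
        return ppos is not None and pident > 20.0 and ppos > 30.0
    # Nucleic acids: pident of 40% and above.
    return pident >= 40.0


print(random_identity(20))  # 5.0  (amino acids)
print(random_identity(4))   # 25.0 (nucleotides)

# The two hits from the example; the ppos values here are invented.
print(looks_homologous(55.0, ppos=70.0))  # True  (protein 1 vs protein 2)
print(looks_homologous(15.0, ppos=28.0))  # False (protein 1 vs protein 3)
```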
E-value (what BLAST reports in place of a p-value): depends on query and DB lengths, but I think an E-value lower than 10^-5 shows a relation.
BitScore: depends strongly on query length. Compare the bitscore with your qlen; I think if the bitscore of a hit is 0.7 of qlen or greater, the query and subject are close enough.
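Both cutoffs (the 10^-5 significance cutoff and bitscore at least 0.7 of qlen) could be applied to a tabular BLAST output roughly like this. The sketch assumes the search was run with -outfmt "6 qseqid sseqid pident ppos evalue bitscore qlen"; that column order, the function name filter_hits and the two example rows are assumptions for illustration, not real results.

```python
import csv
import io

# Assumed column order, matching the -outfmt string mentioned above.
FIELDS = ["qseqid", "sseqid", "pident", "ppos", "evalue", "bitscore", "qlen"]


def filter_hits(handle, max_evalue=1e-5, min_bits_per_residue=0.7):
    """Return (query, subject) pairs that pass both rule-of-thumb cutoffs."""
    kept = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        evalue = float(hit["evalue"])
        bitscore = float(hit["bitscore"])
        qlen = int(hit["qlen"])
        if evalue < max_evalue and bitscore >= min_bits_per_residue * qlen:
            kept.append((hit["qseqid"], hit["sseqid"]))
    return kept


# Two invented rows, only to show the expected file layout.
example = (
    "prot1\tprot2\t55.0\t70.0\t1e-60\t210\t250\n"
    "prot1\tprot3\t15.0\t28.0\t0.5\t24.3\t250\n"
)
print(filter_hits(io.StringIO(example)))  # [('prot1', 'prot2')]
```

For a real file you would pass open("hits.tsv") (a hypothetical file name) instead of the StringIO object.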
modified 2.3 years ago
2.3 years ago by
utsafar • 70