2.2 years ago

ricardoguerreiro2121

Hello,

A simple question: what is the sequence identity between two sequences when one is much longer than the other?

Example:

```
seq1: -------------------AGTGTGAAAAAGGT----------------
seq2: ATATATGCGCATGGTAATAAGTGTGAAAAAGGTTATATGCGCATAAGGT
```

The shorter sequence matches a substring of the longer one exactly. Do they have 100% identity? Or rather something like 30%, since seq1 covers only about 30% of seq2?

The reason why I ask this is that I am filtering an alignment of two assemblies of the same genome (with nucmer/mumer) and I can filter out aligned contigs based on identity.

Thank you,

Ricardo

I would say that if you look at seq1, it has 100% identity over 100% of its length; if you look at seq2, it has 100% identity over about 30% of its length. It's a matter of point of view.

I would say seq1 is 100% identical to seq2, while seq2 is only about 30% identical to seq1.
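To make the two viewpoints concrete, here is a minimal Python sketch that separates identity (over the aligned region) from coverage (the fraction of each sequence that is aligned). The helper `identity_and_coverage` is hypothetical and only handles the exact-substring case from the question; it is not how nucmer computes its values.

```python
# The two sequences from the question, with alignment gaps removed.
seq1 = "AGTGTGAAAAAGGT"
seq2 = "ATATATGCGCATGGTAATAAGTGTGAAAAAGGTTATATGCGCATAAGGT"


def identity_and_coverage(query, ref):
    """Hypothetical helper: assumes query is an exact substring of ref."""
    start = ref.find(query)
    if start == -1:
        raise ValueError("query is not an exact substring of ref")
    aligned_len = len(query)
    matches = aligned_len  # exact substring, so every aligned base matches
    return {
        "identity_over_aligned_region": 100.0 * matches / aligned_len,
        "query_coverage": 100.0 * aligned_len / len(query),
        "ref_coverage": 100.0 * aligned_len / len(ref),
    }


result = identity_and_coverage(seq1, seq2)
for name, value in result.items():
    print(f"{name}: {value:.1f}%")
```

With these sequences the aligned region is 14 bases out of 49 in seq2, so the "30%" in the question is really a coverage of about 28.6%, while the identity over the aligned bases is 100%.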

Unfortunately, this depends heavily on how you look at it.

This is a relevant blog post: https://lh3.github.io/2018/11/25/on-the-definition-of-sequence-identity

Great, that's it, thanks! It depends on what is the query and what is the reference. Thanks! (If you write it as an answer instead of a comment I'll accept it)

It also depends on whether you use global or local alignment.
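A rough sketch of that difference, using the gapped alignment from the question: a global-style identity counts every alignment column, including the terminal gaps, whereas a local-style identity only counts the columns of the matching region. The function `percent_identity` and its `count_end_gaps` flag are illustrative names, not part of any aligner, and real tools may define the denominator differently.

```python
# Gapped rows from the question; row_1 is rebuilt from the gap counts
# (19 leading and 16 trailing gaps) so both rows have 49 columns.
row_1 = "-" * 19 + "AGTGTGAAAAAGGT" + "-" * 16
row_2 = "ATATATGCGCATGGTAATAAGTGTGAAAAAGGTTATATGCGCATAAGGT"


def percent_identity(row_a, row_b, count_end_gaps):
    """Matches divided by alignment columns, with or without terminal gaps."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    columns = list(zip(row_a, row_b))
    if not count_end_gaps:
        # Keep only the span between the first and last gap-free column.
        ungapped = [i for i, (a, b) in enumerate(columns) if a != "-" and b != "-"]
        columns = columns[ungapped[0]:ungapped[-1] + 1]
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


global_style = percent_identity(row_1, row_2, count_end_gaps=True)
local_style = percent_identity(row_1, row_2, count_end_gaps=False)
print(f"global-style identity: {global_style:.1f}%")
print(f"local-style identity:  {local_style:.1f}%")
```

Here the global-style value is 14 matches over 49 columns and the local-style value is 14 matches over 14 columns, which reproduces the two numbers discussed in the thread.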