Question: Sequence identity between sequences with different lengths


ricardoguerreiro2121 wrote:

Hello,

A simple question: what is the sequence identity between two sequences when one is much longer than the other?

Example:

```
seq1: -------------------AGTGTGAAAAAGGT----------------
seq2: ATATATGCGCATGGTAATAAGTGTGAAAAAGGTTATATGCGCATAAGGT
```

The shorter sequence matches a substring of the longer one perfectly. Do they have 100% identity? Or rather something like 30%, since seq1 covers only about 30% of seq2?

The reason why I ask this is that I am filtering an alignment of two assemblies of the same genome (with nucmer/mumer) and I can filter out aligned contigs based on identity.

Thank you,

Ricardo

modified 22 months ago by Bastien Hervé • written 22 months ago by ricardoguerreiro2121
I would say that, from the point of view of seq1, it has 100% identity over 100% of its length, while from the point of view of seq2, it has 100% identity over only 30% of its length. It's a matter of point of view.

I would say seq1 is 100% identical to seq2, while seq2 is only 30% identical to seq1.

Unfortunately, it depends heavily on how you look at this.
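To make the two viewpoints concrete, here is a minimal Python sketch that takes the gapped alignment from the question and reports identity over the aligned columns alongside the fraction of each sequence that is matched. The function name `identity_views` and the dictionary keys are made up for illustration; they are not taken from nucmer/mumer or any other tool.

```python
def identity_views(aln_a, aln_b, gap="-"):
    """Summarise a pairwise alignment given as two equal-length gapped strings."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned strings must have the same length")

    pairs = list(zip(aln_a, aln_b))
    # Columns where both sequences have a residue (no gap on either side).
    aligned_cols = sum(1 for a, b in pairs if a != gap and b != gap)
    # Columns where both residues are present and equal.
    matches = sum(1 for a, b in pairs if a != gap and a == b)
    # Ungapped lengths of the two sequences.
    len_a = sum(1 for a in aln_a if a != gap)
    len_b = sum(1 for b in aln_b if b != gap)

    return {
        "identity_over_aligned": matches / aligned_cols if aligned_cols else 0.0,
        "fraction_of_a_matched": matches / len_a if len_a else 0.0,
        "fraction_of_b_matched": matches / len_b if len_b else 0.0,
    }


# The alignment from the question: seq1 is 14 nt, seq2 is 49 nt.
seq1 = "-" * 19 + "AGTGTGAAAAAGGT" + "-" * 16
seq2 = "ATATATGCGCATGGTAATAAGTGTGAAAAAGGTTATATGCGCATAAGGT"

views = identity_views(seq1, seq2)
for name, value in views.items():
    print(f"{name}: {value:.3f}")
# identity_over_aligned: 1.000
# fraction_of_a_matched: 1.000
# fraction_of_b_matched: 0.286
```

The identity over the aligned region is 100%, but only 14 of the 49 bases of seq2 (about 29%, the "30%" above) take part in it, so when filtering it probably makes sense to look at identity and coverage together rather than identity alone.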

This is a relevant blog post: https://lh3.github.io/2018/11/25/on-the-definition-of-sequence-identity

Great, that's it, thanks! It depends on which sequence is the query and which is the reference. (If you write it as an answer instead of a comment, I'll accept it.)

It also depends on whether you use global or local alignment.
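As a rough illustration of that point, the sketch below aligns the two sequences from the question with a toy dynamic-programming aligner in both modes and computes identity as matches divided by alignment columns. The scoring scheme (match +1, mismatch -1, linear gap -1), the function names `align` and `column_identity`, and the choice to count end gaps in global mode are all assumptions made for this example; real aligners use different scoring and may treat end gaps differently.

```python
def align(a, b, mode="global", match=1, mismatch=-1, gap=-1):
    """Toy pairwise aligner: Needleman-Wunsch ("global") or Smith-Waterman ("local")."""
    n, m = len(a), len(b)
    local = mode == "local"
    score = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global mode penalises leading gaps; local mode starts from zero everywhere.
        for i in range(1, n + 1):
            score[i][0] = i * gap
        for j in range(1, m + 1):
            score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = max(score[i - 1][j - 1] + sub(i, j),
                    score[i - 1][j] + gap,
                    score[i][j - 1] + gap)
            if local:
                s = max(s, 0)
                if s > best:
                    best, best_pos = s, (i, j)
            score[i][j] = s

    # Traceback: from the best cell (local) or the bottom-right corner (global).
    i, j = best_pos if local else (n, m)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if local and score[i][j] == 0:
            break
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def column_identity(x, y):
    """Matches divided by the number of alignment columns (gap columns included)."""
    matches = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return matches / len(x) if x else 0.0


short_seq = "AGTGTGAAAAAGGT"
long_seq = "ATATATGCGCATGGTAATAAGTGTGAAAAAGGTTATATGCGCATAAGGT"

global_id = column_identity(*align(short_seq, long_seq, mode="global"))
local_id = column_identity(*align(short_seq, long_seq, mode="local"))
print(f"global: {global_id:.3f}  local: {local_id:.3f}")
# global: 0.286  local: 1.000
```

Under these assumptions the global alignment has to pay for the 35 unaligned bases of the longer sequence, which pulls the identity down to 14/49, while the local alignment only reports the perfectly matching 14-base stretch.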
