Sequence identity between sequences with different lengths

5.2 years ago

ricardoguerreiro2121 ▴ 80

Hello,

A simple question: what is the sequence identity between two sequences when one is much longer than the other?

Example:

seq1:  -------------------AGTGTGAAAAAGGT----------------
seq2:  ATATATGCGCATGGTAATAAGTGTGAAAAAGGTTATATGCGCATAAGGT

The shorter sequence matches a subsequence of the longer one exactly. Do they have 100% identity? Or rather something like 30%, since seq1 covers only about 30% of seq2?

I ask because I am filtering an alignment of two assemblies of the same genome (made with nucmer/MUMmer), and I can filter out aligned contigs based on identity.

Thank you,
Ricardo

sequence identity alignment filter mummer • 1.7k views

updated 5.2 years ago by Bastien Hervé 5.3k • written 5.2 years ago by ricardoguerreiro2121 ▴ 80

I would say that if you look at seq1, it has 100% identity over 100% of its length; if you look at seq2, it has 100% identity over 30% of its length. It's a matter of point of view.
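To make the two viewpoints concrete, here is a minimal sketch that separates identity over the aligned columns from the fraction of each sequence that is covered. It assumes a gapped pairwise alignment written with `-` for gaps, as in the example from the question, and it uses a gap-excluded identity (only columns where both sequences have a base are counted). The function name `identity_and_coverage` is made up for illustration and is not part of MUMmer or any other tool.

```python
def identity_and_coverage(aln_a: str, aln_b: str, gap: str = "-") -> dict:
    """Return identity over aligned columns and coverage of each sequence (in %)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned strings must have the same length")

    # Columns where both sequences have a base (gap columns are ignored).
    aligned = [(a, b) for a, b in zip(aln_a, aln_b) if a != gap and b != gap]
    matches = sum(a == b for a, b in aligned)

    # Ungapped lengths of the two original sequences.
    len_a = sum(c != gap for c in aln_a)
    len_b = sum(c != gap for c in aln_b)

    def pct(num: int, den: int) -> float:
        return 100 * num / den if den else 0.0

    return {
        "identity_aligned": pct(matches, len(aligned)),
        "coverage_a": pct(len(aligned), len_a),
        "coverage_b": pct(len(aligned), len_b),
    }


seq1 = "-------------------AGTGTGAAAAAGGT----------------"
seq2 = "ATATATGCGCATGGTAATAAGTGTGAAAAAGGTTATATGCGCATAAGGT"
result = identity_and_coverage(seq1, seq2)
print(result)
```

For the example above, the identity over the aligned part is 100%, seq1 is fully covered, and seq2 is covered at 14/49, roughly 29%, which is the "about 30%" figure.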

5.2 years ago by Bastien Hervé 5.3k

I would say seq1 is 100% identical to seq2, while seq2 is only 30% identical to seq1.

Unfortunately, it depends heavily on how you look at it.

5.2 years ago by lieven.sterck 15k

This is a relevant blog post: https://lh3.github.io/2018/11/25/on-the-definition-of-sequence-identity

5.2 years ago by WouterDeCoster 47k

Great, that's it, thanks! It depends on which sequence is the query and which is the reference. Thanks! (If you write it as an answer instead of a comment, I'll accept it.)
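For the filtering use case, one way to avoid the ambiguity is to filter on identity and on query coverage together. The sketch below assumes each alignment has already been parsed into a record; the `AlignedHit` fields and the threshold values are hypothetical and do not correspond to nucmer's actual output format.

```python
from dataclasses import dataclass


@dataclass
class AlignedHit:
    query_id: str       # name of the query contig (hypothetical field)
    identity: float     # percent identity over the aligned region
    aligned_len: int    # number of query bases inside the alignment
    query_len: int      # full length of the query contig


def passes(hit: AlignedHit, min_identity: float = 95.0,
           min_query_cov: float = 90.0) -> bool:
    """Keep a hit only if it is both similar enough and covers enough of the query."""
    query_cov = 100 * hit.aligned_len / hit.query_len
    return hit.identity >= min_identity and query_cov >= min_query_cov


hits = [
    AlignedHit("contigA", 100.0, 14, 14),  # fully covered query
    AlignedHit("contigB", 100.0, 14, 49),  # only about 29% of the query aligned
]
kept = [h.query_id for h in hits if passes(h)]
print(kept)
```

Both hits have 100% identity over the aligned region, but only the first one also clears the coverage threshold, so the choice of query and reference changes what gets filtered.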