Question: Sequence identity between sequences with different lengths
ricardoguerreiro2121 (Germany) wrote, 22 months ago:

Hello,

A simple question: what is the sequence identity between two sequences when one is much longer than the other?

Example:

seq1:  -------------------AGTGTGAAAAAGGT----------------
seq2:  ATATATGCGCATGGTAATAAGTGTGAAAAAGGTTATATGCGCATAAGGT

The shorter sequence matches a stretch of the longer one exactly. Do they have 100% identity? Or rather something like 30%, since seq1 covers only about 30% of seq2?
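The two candidate numbers come from dividing the same match count by two different lengths. A minimal sketch, assuming seq1 occurs in seq2 as an exact, ungapped substring (true for this example, not in general):

```python
# Ungapped sequences from the example above
seq1 = "AGTGTGAAAAAGGT"
seq2 = "ATATATGCGCATGGTAATAAGTGTGAAAAAGGTTATATGCGCATAAGGT"

# Assumption: seq1 is an exact substring of seq2, so no real alignment is needed
start = seq2.find(seq1)
if start == -1:
    raise ValueError("seq1 is not an exact substring of seq2")

# Count matching bases between seq1 and the stretch of seq2 it lines up with
matches = sum(a == b for a, b in zip(seq1, seq2[start:start + len(seq1)]))

pct_of_seq1 = 100.0 * matches / len(seq1)  # normalised by the short sequence
pct_of_seq2 = 100.0 * matches / len(seq2)  # normalised by the long sequence
print(f"{matches} matches: {pct_of_seq1:.1f}% of seq1, {pct_of_seq2:.1f}% of seq2")
```

This prints `14 matches: 100.0% of seq1, 28.6% of seq2`, so the "about 30%" is more precisely 14/49, roughly 28.6%.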

I ask because I am filtering an alignment of two assemblies of the same genome (made with nucmer/MUMmer), and I can filter out aligned contigs based on identity.

Thank you,
Ricardo

Modified 22 months ago by Bastien Hervé; written 22 months ago by ricardoguerreiro2121

I would have said that, from seq1's point of view, it has 100% identity over 100% of its length; from seq2's point of view, it has 100% identity over 30% of its length. It's a matter of point of view.

Written 22 months ago by Bastien Hervé
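For the nucmer filtering use case, this suggests treating identity and coverage as separate criteria. A minimal sketch; the `ContigAlignment` record and the `keep` function are hypothetical (not a MUMmer output format or API), and the thresholds are arbitrary examples:

```python
from dataclasses import dataclass


@dataclass
class ContigAlignment:
    """Hypothetical summary of one alignment; field names are illustrative only."""
    ref_len: int       # full length of the reference contig
    qry_len: int       # full length of the query contig
    ref_aligned: int   # reference bases spanned by the alignment
    qry_aligned: int   # query bases spanned by the alignment
    identity: float    # percent identity over the aligned region only

    @property
    def ref_coverage(self) -> float:
        return 100.0 * self.ref_aligned / self.ref_len

    @property
    def qry_coverage(self) -> float:
        return 100.0 * self.qry_aligned / self.qry_len


def keep(aln: ContigAlignment, min_identity: float,
         min_qry_cov: float = 0.0, min_ref_cov: float = 0.0) -> bool:
    """Keep an alignment only if identity AND the requested coverages all pass."""
    return (aln.identity >= min_identity
            and aln.qry_coverage >= min_qry_cov
            and aln.ref_coverage >= min_ref_cov)


# The pair from the question: seq1 (14 bp) as query, seq2 (49 bp) as reference
pair = ContigAlignment(ref_len=49, qry_len=14, ref_aligned=14, qry_aligned=14, identity=100.0)
print(keep(pair, min_identity=95.0, min_qry_cov=90.0))                    # True
print(keep(pair, min_identity=95.0, min_qry_cov=90.0, min_ref_cov=90.0))  # False
```

Whether to require coverage of the query, the reference, or both is exactly the point-of-view choice discussed in this thread.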

I would say seq1 is 100% identical to seq2, while seq2 is only 30% identical to seq1.

Unfortunately, it depends heavily on how you look at it.

Written 22 months ago by lieven.sterck

This is a relevant blog post: https://lh3.github.io/2018/11/25/on-the-definition-of-sequence-identity

Written 22 months ago by WouterDeCoster
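The linked post compares several definitions. As I read it, two of them are BLAST identity (matches divided by all alignment columns, gaps included) and gap-compressed identity (each run of consecutive gaps counted as a single difference). A sketch computing both for the question's example, padded into an end-to-end alignment; counting the leading and trailing overhangs as gap opens is an assumption of this sketch:

```python
seq1 = "AGTGTGAAAAAGGT"
seq2 = "ATATATGCGCATGGTAATAAGTGTGAAAAAGGTTATATGCGCATAAGGT"

# Rebuild the end-to-end alignment from the question by padding seq1 with gaps
start = seq2.find(seq1)
row1 = "-" * start + seq1 + "-" * (len(seq2) - start - len(seq1))
row2 = seq2


def identity_measures(row1: str, row2: str) -> tuple[float, float]:
    """Return (BLAST identity, gap-compressed identity) for two aligned rows."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = gap_opens = 0
    prev_gap1 = prev_gap2 = False
    for x, y in zip(row1, row2):
        gap1, gap2 = x == "-", y == "-"
        # A run of consecutive gap characters in one row counts as one gap open
        gap_opens += int(gap1 and not prev_gap1) + int(gap2 and not prev_gap2)
        if not gap1 and not gap2:
            if x == y:
                matches += 1
            else:
                mismatches += 1
        prev_gap1, prev_gap2 = gap1, gap2
    blast = matches / len(row1)
    gap_compressed = matches / (matches + mismatches + gap_opens)
    return blast, gap_compressed


blast, gc = identity_measures(row1, row2)
print(f"BLAST identity: {100 * blast:.1f}%, gap-compressed identity: {100 * gc:.1f}%")
```

This prints 28.6% and 87.5%: the same alignment yields very different numbers depending on which definition is used.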

Great, that's it! It depends on which sequence is the query and which is the reference. (If you post this as an answer instead of a comment, I'll accept it.) Thanks!

Written 22 months ago by ricardoguerreiro2121

It also depends on whether you use global or local alignment.

Written 22 months ago by Benn
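To illustrate the global vs local point, here is a toy Needleman-Wunsch (global) and Smith-Waterman (local) aligner with linear gap penalties. The scores (match +1, mismatch -1, gap -2) are arbitrary assumptions, and this is not how nucmer aligns internally; it only shows how the alignment mode changes the denominator:

```python
seq1 = "AGTGTGAAAAAGGT"
seq2 = "ATATATGCGCATGGTAATAAGTGTGAAAAAGGTTATATGCGCATAAGGT"


def align(a: str, b: str, local: bool, match: int = 1,
          mismatch: int = -1, gap: int = -2) -> tuple[int, int]:
    """Return (matches, alignment columns) of one optimal alignment of a and b.

    local=False: Needleman-Wunsch (global, end gaps penalised).
    local=True:  Smith-Waterman (local, cell scores floored at 0).
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global alignment: leading gaps cost `gap` per base
        for i in range(1, n + 1):
            H[i][0] = i * gap
        for j in range(1, m + 1):
            H[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            h = max(H[i - 1][j - 1] + sub(i, j), H[i - 1][j] + gap, H[i][j - 1] + gap)
            if local:
                h = max(h, 0)
                if h > best:
                    best, best_pos = h, (i, j)
            H[i][j] = h

    # Traceback from the bottom-right corner (global) or the best cell (local)
    i, j = best_pos if local else (n, m)
    matches = columns = 0
    while i > 0 or j > 0:
        if local and H[i][j] == 0:
            break  # reached the start of the local alignment
        if i > 0 and j > 0 and H[i][j] == H[i - 1][j - 1] + sub(i, j):
            matches += int(a[i - 1] == b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and H[i][j] == H[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return matches, columns


for mode, is_local in (("global", False), ("local", True)):
    m_count, n_cols = align(seq1, seq2, is_local)
    print(f"{mode}: {m_count}/{n_cols} = {100 * m_count / n_cols:.1f}% identity")
```

The global alignment must carry the 35 unmatched bases of seq2 as gap columns (14/49, about 28.6%), whereas the local alignment stops at the matching block (14/14, 100.0%).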