Question: Interpret genome alignment results
el9700410 wrote:

Hi all!

I assembled two different genomes and wanted to see how similar they are at both the nucleotide and protein levels, so I aligned their nucleotide sequences and their translated nucleotide sequences. Here are the results I obtained:

Nucleotide identity = 90%
Protein identity = 57%

How would one make sense of this high nucleotide yet low protein identity? I have been doing a lot of reading, and it seems that if the species are close it's better to compare the DNA sequences, and I believe these two species should be fairly close. However, I am still confused as to why the values would differ so much.

Thanks for your input!

written 4 weeks ago by el9700410

There are lots of reasons for this, and all else being equal this is to be expected.

You need to clarify whether these are DNA sequences of individual genes, whole genomes, or something else.

written 4 weeks ago by Joe15k

Sorry, I should have clarified. Whole genomes!

written 4 weeks ago by el9700410

It doesn't make any sense to translate a whole genome, and consequently even less sense to align or compare the resulting translations.

written 4 weeks ago by Joe15k

Exactly!! Only translate and compare protein-coding regions. In non-coding regions, DNA similarity can be high, but when ERRONEOUSLY translated, the "protein" sequences can come from different frames and therefore show very low similarity. Again, only translate and compare protein-coding regions.
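
A minimal sketch of that point, under loud assumptions: the toy genome, the CDS coordinates, and the helper names below are made up for illustration, only the forward strand is handled, and the standard genetic code is assumed. Translating just the annotated coding region recovers the protein, while translating the whole sequence from its first base reads the gene out of frame.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate DNA codon by codon from its first base; '*' marks a stop."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Hypothetical toy "genome": 10 non-coding bases, one small gene, 10 more bases.
upstream = "GGTACCTTGA"
cds = "ATGGCTAAACTGTAA"
downstream = "CGCGTTAACC"
genome = upstream + cds + downstream

# Hypothetical annotation: 0-based start, exclusive end, forward strand only.
cds_start, cds_end = len(upstream), len(upstream) + len(cds)

cds_protein = translate(genome[cds_start:cds_end])
whole_translation = translate(genome)

print("CDS only     :", cds_protein)
print("whole genome :", whole_translation)
```

Because the gene starts 10 bases in (not a multiple of 3), the whole-sequence translation never reads it in the correct frame. In a real analysis the coordinates would come from a gene annotation rather than being hard-coded.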

written 4 weeks ago by Cupton70
Austria/Vienna
michael.ante3.5k wrote:

Hi,

Small changes at the nucleotide level can lead to drastic changes at the protein level. In a worst-case scenario, a mutation in a gene's 5' region introduces a frameshift, which leads to a totally different product. In such a case you'll have nearly 100% identity at the nucleotide level but nearly none for the protein.
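
The frameshift case can be illustrated with a small sketch. Everything here is an assumption for illustration: the gene is invented, the standard genetic code is used, and `difflib.SequenceMatcher.ratio()` serves only as a crude, gap-tolerant similarity score standing in for a real aligner (it is not a true percent identity).

```python
from difflib import SequenceMatcher
from itertools import product

# Standard genetic code (NCBI translation table 1), bases in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate DNA codon by codon from its first base; '*' marks a stop."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def similarity(a, b):
    """Crude gap-tolerant similarity in [0, 1]; a stand-in for alignment identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


# Hypothetical gene and a copy with a single base inserted near the 5' end.
gene = "ATGGCTAAACTGGAAGGTCGTTCTGACATCCAGTGGTAA"
mutant = gene[:4] + "C" + gene[4:]

nt_similarity = similarity(gene, mutant)
protein_similarity = similarity(translate(gene), translate(mutant))

print("wild-type protein :", translate(gene))
print("mutant protein    :", translate(mutant))
print(f"nucleotide similarity: {nt_similarity:.2f}")
print(f"protein similarity   : {protein_similarity:.2f}")
```

A single inserted base leaves the DNA almost unchanged, but every codon downstream of the insertion is read in a different frame, so the translated product diverges from that point on.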

Depending on your species, you have more or less "junk DNA": intergenic regions, introns, etc. These non-coding regions can increase the overall nucleotide identity, but not that of the proteins.

Cheers,

Michael

written 4 weeks ago by michael.ante3.5k

Thank you, that makes sense. But how about less extreme scenarios? For example, if the third position of a codon is mutated, it could have no effect on the protein sequence.
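
To put a rough number on that intuition, here is a small sketch that counts, for each codon position, what fraction of single-base substitutions leave the amino acid unchanged. It assumes the standard genetic code, skips stop codons as starting points, and weights every codon and substitution equally, which real genomes do not; the function name is made up for illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_fraction(position):
    """Fraction of single-base changes at a codon position (0, 1 or 2)
    that keep the encoded amino acid, over all sense codons."""
    same = total = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":  # skip stop codons as the starting codon
            continue
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            same += CODON_TABLE[mutant] == aa
    return same / total


fractions = [synonymous_fraction(p) for p in range(3)]
for number, fraction in enumerate(fractions, start=1):
    print(f"codon position {number}: {fraction:.0%} of substitutions are synonymous")
```

Such silent changes lower nucleotide identity while leaving the protein untouched, which is the opposite direction from the frameshift case.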

written 4 weeks ago by el9700410