Gaps in protein-coding sequences allowed?
6.4 years ago
dieter • 0

Hi to all, I have a short question: After aligning protein coding nucleotide sequences I sometimes find gaps within a codon. Are these gaps always sequencing errors, or can a protein coding nucleotide sequence actually have gaps in the codons? All the best Dieter

MSA protein alignment gaps • 2.7k views

When aligning two sequences, you're trying to maximize some measure of similarity between them. If you allow them, gaps can sometimes improve the quality of the alignment. How to interpret these gaps depends on the context. For example, when comparing sequences from different species, you may attribute the gaps to evolution.
I am not sure what you mean by "can a protein coding nucleotide sequence actually have gaps in the codons". Nucleic acids don't have physical gaps if that's what you mean.
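To illustrate how allowing gaps can raise an alignment score, here is a minimal sketch of global alignment scoring (Needleman-Wunsch with a linear gap penalty). The scoring values (match=1, mismatch=-1, gap=-2) and the two toy sequences are assumptions chosen for illustration only; real aligners use more elaborate scoring schemes.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score with a linear gap penalty (illustrative values)."""
    # Row 0: aligning a prefix of b against nothing costs one gap per base.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Either pair the two bases, or put a gap in one of the sequences.
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def ungapped_score(a, b, match=1, mismatch=-1):
    """Score of a position-by-position comparison of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("ungapped comparison needs equal-length sequences")
    return sum(match if x == y else mismatch for x, y in zip(a, b))


# Hypothetical toy sequences: seq_b is seq_a shifted by one base.
seq_a = "ACGTACGT"
seq_b = "CGTACGTA"
print(ungapped_score(seq_a, seq_b))
print(global_alignment_score(seq_a, seq_b))
```

With these toy inputs the gapped score is higher than the ungapped one, because two gaps bring seven bases back into register. The gaps are a product of the scoring, so they still need a biological interpretation.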


Hello, Thank you.

When aligning two sequences, you're trying to maximize some measure of similarity between them. If you allow them, gaps can sometimes improve the quality of the alignment. How to interpret these gaps depends on the context. For example, when comparing sequences from different species, you may attribute the gaps to evolution.

Yes, I knew that.

I am not sure what you mean by "can a protein coding nucleotide sequence actually have gaps in the codons". Nucleic acids don't have physical gaps if that's what you mean.

OK, I think I need to explain in more detail. A protein-coding nucleotide sequence (DNA) is a sequence that can be transcribed into RNA, and this RNA can be translated into a protein. Such a nucleotide sequence always consists of "codons"; a codon is a set of three nucleotides. You can read more about it here: https://en.wikipedia.org/wiki/Coding_region

Nucleic acids don't have physical gaps if that's what you mean.

No, that's not what I meant. A better explanation of what I meant follows below (sorry, I'm very much a beginner).

All the best Dieter


Did you discover introns?

I'm pretty sure that Jean-Karim Heriche knows about codons, DNA and transcription. I'm less sure about your biological understanding, or you need to explain it again, because it doesn't make sense to me.


Hi, Thank you,

Did you discover introns?

No, introns need to be excluded from a protein-coding region. You only use the exons.

I'm pretty sure that Jean-Karim Heriche knows about codons, dna and transcription.

Ah, OK - I'm very sorry for explaining it. Wait - I will prepare a picture to explain what I mean. All the best Dieter


Hi, OK, I hope this explains my question. Here is a typical "CDS alignment". These are exons only, already "cleaned": LINK In the middle you see a single gap of the kind I sometimes find. My question is: is this a real gap that truly belongs to the sequence of that species, or is it just an error that happened during sequencing? Thanks for your answers. All the best Dieter


It depends on the context but a single nucleotide gap in a column otherwise very conserved and in a sequence that's otherwise almost identical to the others could indeed suggest a sequencing error.
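One way to screen a CDS alignment for such suspicious positions is to look for internal gap runs whose length is not a multiple of three, since those break the reading frame. The sketch below assumes the alignment is available as plain gapped strings using "-" as the gap character; the function name `frame_breaking_gaps` and the three example sequences are hypothetical.

```python
import re


def frame_breaking_gaps(aligned_seq):
    """Return (start, length) of each internal gap run whose length is not a multiple of 3."""
    # Leading and trailing gaps usually just mean the sequence is shorter, so skip them.
    offset = len(aligned_seq) - len(aligned_seq.lstrip("-"))
    core = aligned_seq.strip("-")
    return [
        (m.start() + offset, len(m.group()))
        for m in re.finditer(r"-+", core)
        if len(m.group()) % 3 != 0
    ]


# Hypothetical aligned coding sequences (exons only).
alignment = {
    "species_1": "ATGGCCAAATTTGGG",
    "species_2": "ATGGCC-AATTTGGG",  # single-nucleotide gap: breaks the frame
    "species_3": "ATGGCC---TTTGGG",  # three-nucleotide gap: a whole codon is missing
}

for name, seq in alignment.items():
    print(name, frame_breaking_gaps(seq))
```

A hit from this check is only a flag for closer inspection (for example of the read data or chromatogram). It cannot by itself tell a sequencing error from a real frameshift.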

6.4 years ago
dieter • 0

Thank you very much Jean-Karim. You wrote that it depends on the context, which means some protein-coding sequences can contain single gaps in codons and some cannot, right? So how can I find out whether a particular region can contain gaps or not? In many studies I have the beta-tub gene (exon 5 to 6) and the ef-1alpha (domain I-III); would these exons allow single gaps? (These are only two examples; the main question is as above.) All the best Dieter


Please use the 'add reply' button to reply to a comment. This keeps the discussion organized.
It's possible that a coding sequence is missing nucleotides compared to related sequences, e.g. pseudogenes. However a single mutation causing a frameshift or an inactive protein in a highly conserved sequence looks like an unlikely event although the sequence could be coming from a recent duplication. If the context (biological or otherwise) doesn't offer any indication to the contrary then I wouldn't rule out a sequencing error.
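To see why a single missing nucleotide is so disruptive, the following sketch translates a short coding sequence before and after deleting one base. It assumes the standard genetic code (NCBI translation table 1); the example sequence and the deletion position are made up for illustration.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate complete codons from position 0; a trailing partial codon is ignored."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical coding sequence: ATG GCC AAA TTT GGG TAA
original = "ATGGCCAAATTTGGGTAA"
# Delete one nucleotide (0-based index 6) to mimic a single-base gap.
deleted = original[:6] + original[7:]

print(translate(original))  # MAKFG*
print(translate(deleted))   # MANLG
```

Every codon downstream of the deletion is read in a shifted frame, and in this toy case the stop codon is lost as well. This is why a single-nucleotide gap in an otherwise conserved, functional gene is hard to explain biologically.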


Thank you Jean-Karim, that answers my question. All the best Dieter
