Length(UCSC/ensGene) % 3 != 0
11.7 years ago

Before I send this question to the UCSC mailing list: is it me, or is something else wrong?

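I've noticed that some (not all) protein-coding records in the hg18 UCSC ensGene.txt table have a coding sequence whose length is not a multiple of 3 (length % 3 != 0). In the example below, the CDS is in upper case and the UTRs are in lower case: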

>ENST00000383614
ccccagacgccgacgatggggtcATGGCGCCCCGAACCCTCCTCCTGCTG
CTCTCGGGGACCCTGGCCCTGGCCGAGACCTGGGCGGCCCCCCCCAAGAC
ACACGTGACCCacccccctctctgaacatgaggcataa

echo -n ATGGCGCCCCGAACCCTCCTCCTGCTGCTCTCGGGGACCCTGGCCCTGGCCGAGACCTGGGCGGCCCCCCCCAAGACACACGTGACCC | wc -c
88


But 88 % 3 = 1, so this CDS is not a whole number of codons.
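The same check can be sketched in Python: join the sequence lines of the FASTA record above, keep only the upper-case bases (taken here to be the CDS, since the lower-case flanks are the UTRs), and report the length and its remainder modulo 3. The function name is just for illustration.

```python
# Record from the question: UTRs in lower case, CDS in upper case.
FASTA = """>ENST00000383614
ccccagacgccgacgatggggtcATGGCGCCCCGAACCCTCCTCCTGCTG
CTCTCGGGGACCCTGGCCCTGGCCGAGACCTGGGCGGCCCCCCCCAAGAC
ACACGTGACCCacccccctctctgaacatgaggcataa"""


def cds_from_case_masked_fasta(text):
    """Join the sequence lines of one FASTA record and keep the upper-case bases."""
    seq = "".join(line.strip() for line in text.splitlines()
                  if line and not line.startswith(">"))
    return "".join(base for base in seq if base.isupper())


cds = cds_from_case_masked_fasta(FASTA)
print(len(cds), len(cds) % 3)  # prints: 88 1
```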

Is this an error on UCSC's side, or am I missing something?
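To see how many ensGene.txt records are affected, the CDS length can also be computed from the table itself. This Python sketch assumes the genePred column layout of the UCSC download, including a leading `bin` column (pass `has_bin=False` if your copy lacks it); the helper names are hypothetical. Because UCSC coordinates are 0-based and half-open, each exon's overlap with the CDS is simply end minus start, with no +1.

```python
import csv


def parse_coords(field):
    """Parse a UCSC comma-separated coordinate list such as "100,250,"."""
    return [int(x) for x in field.split(",") if x]


def cds_length(exon_starts, exon_ends, cds_start, cds_end):
    """Count CDS bases across exons.

    UCSC coordinates are 0-based and half-open, so each overlap
    [max(start, cdsStart), min(end, cdsEnd)) holds exactly end - start bases.
    """
    return sum(max(0, min(e, cds_end) - max(s, cds_start))
               for s, e in zip(exon_starts, exon_ends))


def frame_problems(rows, has_bin=True):
    """Yield (name, cds_len) for coding records whose CDS length % 3 != 0.

    Assumes the genePred column order used by ensGene.txt:
    [bin,] name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd,
    exonCount, exonStarts, exonEnds, ...
    """
    off = 1 if has_bin else 0
    for row in rows:
        cds_start, cds_end = int(row[off + 5]), int(row[off + 6])
        if cds_start == cds_end:  # non-coding: genePred stores an empty CDS
            continue
        n = cds_length(parse_coords(row[off + 8]), parse_coords(row[off + 9]),
                       cds_start, cds_end)
        if n % 3:
            yield row[off], n


def report(path, has_bin=True):
    """Print name, CDS length and remainder for each suspicious record."""
    with open(path, newline="") as fh:
        rows = csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
        for name, n in frame_problems(rows, has_bin):
            print(f"{name}\t{n}\t{n % 3}")
```

Calling `report("ensGene.txt")` (path assumed) prints one line per flagged record. Some flagged transcripts may be annotated as having an incomplete CDS (`cdsStartStat`/`cdsEndStat` = `incmpl`), which is worth checking before assuming an error.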


The sequence you posted has a stop codon at position 87 of the nucleotide sequence (84 if you start counting from 0).


Yes, it was an error. UCSC replied: "We have determined that the data as originally incorporated into the track was strangely annotated and Ensembl has since corrected the error. The track on our side will be updated (and this data corrected) at the next update."

11.7 years ago

It may be an error in the annotation; there are many, I can assure you. A while ago, the Ensembl maintainers made a gene I was studying disappear by merging its transcript with another gene.

Notice that the sequence you posted has a stop codon at position 87 of the nucleotide sequence (84 if you start counting from 0).
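Stop-codon positions are easy to misreport when 0-based and 1-based numbering get mixed. This Python sketch (hypothetical helper name) lists every TAA, TAG and TGA in the three forward reading frames and prints both numberings.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_stops(seq):
    """Return (frame, start0) for each stop codon in the three forward frames.

    start0 is the 0-based index of the codon's first base; add 1 for 1-based.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        # Step through complete codons only; a trailing partial codon is skipped.
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                hits.append((frame, i))
    return hits


# CDS from the question (upper-case part of ENST00000383614).
cds = ("ATGGCGCCCCGAACCCTCCTCCTGCTG"
       "CTCTCGGGGACCCTGGCCCTGGCCGAGACCTGGGCGGCCCCCCCCAAGAC"
       "ACACGTGACCC")

for frame, start0 in find_stops(cds):
    print(f"frame {frame}: {cds[start0:start0 + 3]} at 0-based {start0} (1-based {start0 + 1})")
# prints: frame 1: TGA at 0-based 82 (1-based 83)
```

On the 88-base upper-case CDS alone it finds one TGA, in frame 1 rather than the annotated frame 0; scanning the full transcript including the lower-case UTRs, or Ensembl's 87-bp version, will give different positions.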

By the way, the sequence you posted belongs to an MHC chain, a gene well known for its variability and for producing many transcripts.


Agreed: this is a highly variable region with several transcripts and a pseudogene.

11.7 years ago
Neilfws

At ensembl.org, the same transcript has a length of 87 bp and a slightly different 3' sequence. I wonder whether this is related to UCSC using zero-based start coordinates (i.e., the first base is numbered 0).
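Zero-based starts alone should not change a length as long as one convention is used consistently. UCSC stores intervals as 0-based, half-open (length = end - start), while Ensembl reports 1-based, closed coordinates (length = end - start + 1). An off-by-one appears only when the two are mixed, and it then accumulates once per exon. This Python sketch uses hypothetical exon coordinates to show the correct conversion next to the mixed-up calculation.

```python
def len_half_open(start0, end):
    """Length of a UCSC/BED interval: 0-based start, exclusive end."""
    return end - start0


def len_closed(start1, end1):
    """Length of an Ensembl/GFF-style interval: 1-based start, inclusive end."""
    return end1 - start1 + 1


def ucsc_to_one_based(start0, end):
    """Convert 0-based half-open to 1-based closed: only the start moves."""
    return start0 + 1, end


# Hypothetical CDS pieces in UCSC (0-based, half-open) coordinates: 40 + 50 bases.
exons = [(0, 40), (60, 110)]

correct = sum(len_half_open(s, e) for s, e in exons)
converted = sum(len_closed(*ucsc_to_one_based(s, e)) for s, e in exons)
# Bug to avoid: applying the closed-interval "+1" formula to UCSC coordinates.
mixed = sum(len_closed(s, e) for s, e in exons)

print(correct, converted, mixed)  # prints: 90 90 92
```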


I forgot to add that this comes from the latest Ensembl release, whereas your data are from hg18.