What factors cause amino acid sequence lengths to differ for the same gene product between the UniProt and Ensembl databases?
4.8 years ago
jmfriedman7 ▴ 10

Although the amino acid sequence lengths match between the UniProt and Ensembl databases for most gene products in the human proteome, there are a few thousand gene products for which the lengths differ, some apparently by very large amounts. (Files: Ensembl: Homo_sapiens.GRCh38.pep.all.fa; UniProt: UP000005640_9606.fasta.gz. A few genes with alias names were renamed to match the current standard list.)

I am trying to decide which of the two sequence lengths is more likely to be reliable, and I was wondering whether there are literature references that detail the factors causing such differences between the two databases. Since this is such a fine point, I imagine the details are buried somewhere.

Maybe I should just ignore genes for which there are large differences.

Out of 3937 proteins for which the sequence lengths differ between the databases, the UniProt sequence is longer for 343 of them and the Ensembl sequence is longer for 3594 of them.
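For anyone who wants to reproduce this kind of tally, here is a minimal Python sketch of one way to do it. It is not necessarily how the numbers above were obtained. It assumes that Ensembl peptide headers carry a `gene_symbol:` field and that UniProt headers carry a `GN=` field, and, since a file can contain more than one sequence per gene symbol, it assumes the longest sequence per gene is the one to compare. The function names are only illustrative.

```python
import re
from collections import defaultdict

# Assumed header conventions (check them against your own files).
ENSEMBL_SYMBOL = r"gene_symbol:(\S+)"
UNIPROT_SYMBOL = r"\bGN=(\S+)"


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def lengths_by_gene(lines, pattern):
    """Map gene symbol -> set of sequence lengths found under that symbol."""
    lengths = defaultdict(set)
    for header, seq in read_fasta(lines):
        match = re.search(pattern, header)
        if match:
            # Drop a trailing stop-codon marker, if any, before measuring.
            lengths[match.group(1)].add(len(seq.rstrip("*")))
    return lengths


def compare_lengths(ensembl_lines, uniprot_lines):
    """Tally genes by which database has the longer (longest) sequence."""
    ens = lengths_by_gene(ensembl_lines, ENSEMBL_SYMBOL)
    uni = lengths_by_gene(uniprot_lines, UNIPROT_SYMBOL)
    tally = {"same": 0, "uniprot_longer": 0, "ensembl_longer": 0}
    differing = {}
    for gene in sorted(ens.keys() & uni.keys()):
        e_len, u_len = max(ens[gene]), max(uni[gene])
        if e_len == u_len:
            tally["same"] += 1
            continue
        key = "uniprot_longer" if u_len > e_len else "ensembl_longer"
        tally[key] += 1
        differing[gene] = (u_len, e_len)  # (UniProt, Ensembl)
    return tally, differing
```

To run it on the files, pass open file handles, for example `open("Homo_sapiens.GRCh38.pep.all.fa")` and `gzip.open("UP000005640_9606.fasta.gz", "rt")`. Keeping the full set of lengths per gene in `lengths_by_gene` also makes it easy to swap the "longest sequence" rule for a different one.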

As mentioned, the amino acid sequence lengths are actually identical for most of the proteins, but here is a small portion of the list of 3937 with differing lengths (gene symbol, UniProt length marked U, Ensembl length marked E):

ACSF3 576 U 646 E; ACSL5 683 U 739 E; ACSL6 697 U 722 E; ACSS2 701 U 714 E; ACTN1 892 U 930 E; ACTN3 901 U 944 E; ACTR2 394 U 399 E; ACTR3C 210 U 222 E; ACVR1B 505 U 546 E; ACVRL1 503 U 517 E; ACYP1 99 U 129 E; ACYP2 99 U 172 E; ADAD2 583 U 665 E; ADAM20 726 U 776 E; ADAM7 754 U 776 E; ADAMTS14 1223 U 1226 E; ADAMTS19 1207 U 1213 E; ADAMTSL2 951 U 1060 E; ADAMTSL4 1074 U 1097 E; ADAMTSL5 481 U 471 E; ADAP1 374 U 385 E; ADAP2 381 U 387 E; ADAT3 351 U 367 E; ADCK1 530 U 523 E; ADCY3 1144 U 1145 E; ADCYAP1R1 468 U 524 E; ADD1 737 U 768 E; ADGRA1 560 U 1279 E; ADGRD1 874 U 906 E; ADGRD2 963 U 972 E; ADGRE2 823 U 831 E; ADGRF4 695 U 752 E; ADGRG6 1221 U 1250 E; ADGRL2 1459 U 1474 E; ADGRL3 1447 U 1580 E; ADH4 380 U 399 E; ADH6 368 U 375 E; ADH7 386 U 394 E; ADIG 80 U 197 E; ADM5 389 U 153 E;
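To see which entries differ the most, a list in this `GENE n U m E;` format can be parsed and ranked by absolute difference. This is only a rough sketch: the function names are made up for illustration, and the regular expression assumes gene symbols consist of letters, digits, dots, underscores, or hyphens.

```python
import re

# One entry looks like "ACSF3 576 U 646 E" (UniProt length, then Ensembl length).
ENTRY = re.compile(r"([\w.-]+)\s+(\d+)\s+U\s+(\d+)\s+E")


def parse_length_list(text):
    """Return (gene, uniprot_len, ensembl_len) tuples from the list text."""
    return [(gene, int(u), int(e)) for gene, u, e in ENTRY.findall(text)]


def rank_by_difference(entries):
    """Sort entries by absolute length difference, largest first."""
    return sorted(entries, key=lambda rec: abs(rec[2] - rec[1]), reverse=True)


if __name__ == "__main__":
    sample = "ACSF3 576 U 646 E; ADCY3 1144 U 1145 E; ADGRA1 560 U 1279 E; ADM5 389 U 153 E;"
    for gene, u_len, e_len in rank_by_difference(parse_length_list(sample)):
        print(gene, u_len, e_len, e_len - u_len)
```

A ranking like this makes it straightforward to set a cutoff if the genes with the largest differences are to be set aside.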
