I'm trying to reproduce work from a paper. I have expression values from multiple cell lines, identified by gene symbols taken from the GENCODE v10 annotation.
In my script I assume that all transcripts belonging to one gene symbol lie on the same chromosome. However, there seem to be exceptions (e.g. DHRSX; see the table below).
There are two slightly different IDs for the same gene symbol. Does anyone know what the R in the ID stands for? The two genes (?) with the same symbol have roughly the same genomic position and the same strand, but are on different chromosomes.
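To list every gene symbol that breaks the one-chromosome-per-symbol assumption, together with the gene IDs involved, something like the following Python sketch might help. It assumes a GENCODE-style GTF in which gene-level records have `gene` in the feature column and carry `gene_id` and `gene_name` attributes. The function names, the toy records and the file name are hypothetical and only for illustration.

```python
import gzip
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set, Tuple

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def symbols_on_multiple_chromosomes(
    lines: Iterable[str],
) -> Dict[str, Set[Tuple[str, str]]]:
    """Return {gene_name: {(chrom, gene_id), ...}} for every gene_name
    whose gene-level records lie on more than one chromosome."""
    locations: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for line in lines:
        if line.startswith("#"):
            continue  # skip GTF header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only gene-level records are considered
        attrs = dict(ATTR_RE.findall(fields[8]))
        name = attrs.get("gene_name")
        if name is None:
            continue
        locations[name].add((fields[0], attrs.get("gene_id", "")))
    # Keep only symbols seen on two or more distinct chromosomes.
    return {
        name: locs
        for name, locs in locations.items()
        if len({chrom for chrom, _ in locs}) > 1
    }


def read_gtf(path: str) -> Iterator[str]:
    """Yield lines from a plain or gzip-compressed GTF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from handle


if __name__ == "__main__":
    # Made-up toy records, not real GENCODE data.
    toy_gtf = [
        'chrX\tTOY\tgene\t100\t200\t.\t+\t.\tgene_id "TOYGENE1"; gene_name "SYMBOL_A";\n',
        'chrY\tTOY\tgene\t100\t200\t.\t+\t.\tgene_id "TOYGENE1_Y"; gene_name "SYMBOL_A";\n',
        'chr1\tTOY\tgene\t500\t900\t.\t-\t.\tgene_id "TOYGENE2"; gene_name "SYMBOL_B";\n',
    ]
    for symbol, locs in sorted(symbols_on_multiple_chromosomes(toy_gtf).items()):
        print(symbol, sorted(locs))
    # On a real file (hypothetical path):
    # symbols_on_multiple_chromosomes(read_gtf("gencode.v10.annotation.gtf.gz"))
```

Grouping on gene-level records rather than transcripts keeps the check cheap, and reporting the `(chromosome, gene_id)` pairs makes it easy to see whether the "duplicate" symbols differ only in their IDs.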
Is this an annotation error, or is there a rationale for annotating it this way?
Thanks for the very helpful explanation. Will keep in mind that images are probably not the best representation for tabular data :)