Hello everybody,

I am working with annotated gene expression databases and software from different sources that I often need to analyze together, for example microarray, RNA-seq, and iTRAQ data. Since gene annotation can change over time and between sources, I figured I need to update the annotation of all the data I work with, and I eventually found an R package that seems to suit my needs: mygene. However, I haven't found an appropriate way to deal with two particular situations:

1- Sometimes one gene symbol gets updated to more than one symbol
2- Sometimes two gene symbols get updated to the same symbol
My current approach is to check for duplicated gene symbols in the updated annotations and, for each duplicated symbol, keep only the entry with the lowest gene ID and discard the rest. However, I have no solid basis for that decision. How do other people deal with similar situations?
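To make the approach concrete, here is a minimal sketch in Python using only the standard library. The rows, symbols, gene IDs, and function names are all made up for illustration, and the code does not call mygene. It assumes the updated annotation is available as (original symbol, updated symbol, gene ID) rows. The filter only addresses situation 2 (several symbols collapsing to one); situation 1 is only reported, not resolved.

```python
from collections import defaultdict

# Hypothetical rows: (original_symbol, updated_symbol, gene_id).
# These values are invented for illustration, not real annotations.
rows = [
    ("OLD1", "NEW1", 101),
    ("OLD2", "NEW2A", 202),  # OLD2 maps to two updated symbols (one-to-many)
    ("OLD2", "NEW2B", 150),
    ("OLD3", "NEW4", 310),   # OLD3 and OLD4 both map to NEW4 (many-to-one)
    ("OLD4", "NEW4", 305),
]


def keep_lowest_gene_id(rows):
    """For each updated symbol, keep only the row with the lowest gene ID."""
    best = {}
    for original, updated, gene_id in rows:
        if updated not in best or gene_id < best[updated][2]:
            best[updated] = (original, updated, gene_id)
    # Sort by updated symbol so the result is deterministic.
    return sorted(best.values(), key=lambda row: row[1])


def ambiguous_mappings(rows):
    """Report one-to-many and many-to-one symbol mappings without resolving them."""
    by_original = defaultdict(set)
    by_updated = defaultdict(set)
    for original, updated, _ in rows:
        by_original[original].add(updated)
        by_updated[updated].add(original)
    one_to_many = {k: sorted(v) for k, v in by_original.items() if len(v) > 1}
    many_to_one = {k: sorted(v) for k, v in by_updated.items() if len(v) > 1}
    return one_to_many, many_to_one


deduplicated = keep_lowest_gene_id(rows)
one_to_many, many_to_one = ambiguous_mappings(rows)
```

On these made-up rows the filter keeps OLD4 for NEW4 (305 is lower than 310) and drops OLD3, while OLD2 is reported as one-to-many and left untouched. Keeping the report separate from the filter at least makes it visible how many genes the lowest-ID rule actually affects.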
If you have genome build/coordinate information available, why not use a trusted external annotation source such as NCBI, GENCODE (for human/mouse/rat only), or Ensembl instead and avoid these types of issues?