8 months ago
Pac314
Hi, I am trying to annotate my list of gene IDs, some of which have multiple loci, e.g.:
  refseq_mrna hgnc_symbol   gene_biotype chromosome_name start_position end_position
7   NM_000076      CDKN1C protein_coding  HSCHR11_1_CTG7         115392       118091
8   NM_000076      CDKN1C protein_coding              11        2883213      2885775
What is the recommended practice for collapsing an annotation when the same gene appears more than once because it also maps to an alternative locus?
Where did this annotation come from?
CDKN1C seems to be annotated only at one location. https://www.ncbi.nlm.nih.gov/gene/1028/ and https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:1786
Thanks for your reply. I obtained this annotation using the biomaRt R library:
Several gene IDs in my list are reported under more than one chromosome name, as in the example above.
You might look for a canonical gene isoform. UCSC may have some useful advice: https://genome.ucsc.edu/FAQ/FAQgenes.html#singledownload
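If the extra rows all sit on names like HSCHR11_1_CTG7, which look like Ensembl alternate-sequence or patch scaffolds rather than primary chromosomes, one simple option is to keep only the rows on standard chromosomes. Below is a minimal sketch of that filter in plain Python, not the biomaRt workflow itself. The helper name `keep_primary_loci` and the set `PRIMARY_CHROMOSOMES` are made up for illustration, and the set assumes Ensembl-style human chromosome names (1-22, X, Y, MT); the two rows are the ones posted in the question.

```python
# Assumption: Ensembl-style names for the human primary assembly.
PRIMARY_CHROMOSOMES = {str(n) for n in range(1, 23)} | {"X", "Y", "MT"}


def keep_primary_loci(rows):
    """Return only the rows whose chromosome_name is a primary chromosome.

    Hypothetical helper for illustration; `rows` is a list of dicts keyed
    by the column names from the annotation table.
    """
    return [row for row in rows if row["chromosome_name"] in PRIMARY_CHROMOSOMES]


# The two CDKN1C rows from the question.
annotation = [
    {"refseq_mrna": "NM_000076", "hgnc_symbol": "CDKN1C",
     "gene_biotype": "protein_coding", "chromosome_name": "HSCHR11_1_CTG7",
     "start_position": 115392, "end_position": 118091},
    {"refseq_mrna": "NM_000076", "hgnc_symbol": "CDKN1C",
     "gene_biotype": "protein_coding", "chromosome_name": "11",
     "start_position": 2883213, "end_position": 2885775},
]

primary_only = keep_primary_loci(annotation)
for row in primary_only:
    print(row["hgnc_symbol"], row["chromosome_name"],
          row["start_position"], row["end_position"])
# prints: CDKN1C 11 2883213 2885775
```

Note that this drops any gene that is annotated only on an alternate scaffold, so it is worth checking how many IDs disappear after filtering before relying on it.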