I am a beginner in RNA-Seq. I am running Cufflinks for a human cell transcriptome analysis. In the end, cummeRbund gave me the UCSC gene IDs of the differentially expressed genes, not the gene symbols, so I converted the gene IDs to gene symbols with the UCSC Genome Browser. My questions are:
I submitted 2900 gene IDs, and it gave me about 3000 gene_id-gene_symbol pairs. About 100 new gene IDs were added. What's the reason for this?
My downstream analysis does not allow duplicate gene symbols. What should I do about the duplicates? I searched Biostars and found that different gene IDs corresponding to one common gene symbol are different haplotypes of the gene. Should I just add up the expression values that share the same gene symbol?
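
To make both questions concrete, here is a minimal Python sketch of what I mean. It only illustrates the idea and is not a recommended method. The function names (`diagnose_mapping`, `sum_by_symbol`), the gene IDs, the symbols, and the expression values are all made up for illustration. It assumes the conversion result is available as a list of (gene_id, gene_symbol) pairs and the expression values as a dictionary keyed by gene ID.

```python
from collections import Counter, defaultdict


def diagnose_mapping(submitted_ids, pairs):
    """Summarize why the mapping has more rows than submitted gene IDs."""
    submitted = set(submitted_ids)
    # Count how many output rows each gene ID appears in.
    id_counts = Counter(gene_id for gene_id, _ in pairs)
    return {
        # IDs that occur in more than one output row.
        "ids_with_multiple_rows": sorted(g for g, n in id_counts.items() if n > 1),
        # IDs in the output that were never submitted.
        "ids_not_submitted": sorted(g for g in id_counts if g not in submitted),
        # Submitted IDs that did not come back at all.
        "ids_without_symbol": sorted(submitted - set(id_counts)),
    }


def sum_by_symbol(pairs, expression):
    """Add up expression values of all gene IDs that share a gene symbol."""
    totals = defaultdict(float)
    seen = set()
    for gene_id, symbol in pairs:
        # Skip exact duplicate rows and IDs without an expression value.
        if (gene_id, symbol) in seen or gene_id not in expression:
            continue
        seen.add((gene_id, symbol))
        totals[symbol] += expression[gene_id]
    return dict(totals)


# Hypothetical example data, not real UCSC IDs or symbols.
submitted_ids = ["id1", "id2", "id3", "id4"]
pairs = [
    ("id1", "GENE_A"),
    ("id2", "GENE_A"),  # second ID with the same symbol
    ("id3", "GENE_B"),
    ("id3", "GENE_C"),  # one ID returned with two symbols
    ("id1", "GENE_A"),  # exact duplicate row
]
expression = {"id1": 1.0, "id2": 2.0, "id3": 4.0}

report = diagnose_mapping(submitted_ids, pairs)
collapsed = sum_by_symbol(pairs, expression)
print(report)
print(collapsed)
```

The first function is how I would check where the extra rows come from (repeated IDs versus IDs I never submitted). The second is the "just add them up" option from my question. Note that in this sketch an ID that maps to two symbols contributes its value to both, which may or may not be what one wants.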