Hi,
I am a beginner in RNA-Seq. I am running Cufflinks for a human cell transcriptome analysis. In the end, cummeRbund gave me the UCSC gene IDs of the differentially expressed genes, not the gene symbols, so I converted the gene IDs to gene symbols using the UCSC Genome Browser. My question is:
I submitted 2900 gene IDs and got back about 3000 gene_id/gene_symbol pairs. About 100 new gene IDs were added. What is the reason for this? My downstream analysis does not allow duplicate gene symbols. What should I do about the duplicates? I searched Biostars and found that different gene IDs corresponding to one common gene symbol are different haplotypes of the gene. Should I just add up the expression values of the IDs that share a gene symbol?
Thanks.
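A minimal sketch of both checks, assuming the conversion output has been loaded as a list of (gene_id, gene_symbol) tuples and the cummeRbund values as a dict keyed by gene ID (the loading step, the IDs `uc_A`/`uc_B`/`uc_C` and the symbols `GENE1`/`GENE2` are hypothetical placeholders). `find_multi_mapped` shows which IDs map to several symbols and which symbols map to several IDs, which is what inflates 2900 IDs into about 3000 pairs; `collapse_by_symbol` does the "add up the values per symbol" option asked about above.

```python
from collections import defaultdict


def find_multi_mapped(pairs):
    """Return (IDs linked to >1 symbol, symbols linked to >1 ID)."""
    id_to_symbols = defaultdict(set)
    symbol_to_ids = defaultdict(set)
    for gene_id, symbol in pairs:
        id_to_symbols[gene_id].add(symbol)
        symbol_to_ids[symbol].add(gene_id)
    multi_ids = {g: s for g, s in id_to_symbols.items() if len(s) > 1}
    multi_symbols = {s: g for s, g in symbol_to_ids.items() if len(g) > 1}
    return multi_ids, multi_symbols


def collapse_by_symbol(pairs, expression):
    """Sum the expression values of all gene IDs sharing a symbol.

    Duplicate (gene_id, symbol) rows are counted once; IDs that are
    missing from `expression` are skipped.
    """
    totals = defaultdict(float)
    for gene_id, symbol in set(pairs):
        if gene_id in expression:
            totals[symbol] += expression[gene_id]
    return dict(totals)


# Hypothetical example data, not taken from the thread.
pairs = [("uc_A", "GENE1"), ("uc_B", "GENE1"), ("uc_C", "GENE2")]
expression = {"uc_A": 2.0, "uc_B": 3.0, "uc_C": 1.5}
print(find_multi_mapped(pairs))
print(collapse_by_symbol(pairs, expression))
```

Summing is only one choice (keeping the highest-expressed ID per symbol is another), and any gene ID mapped to several symbols gets counted under each of them, so it is worth inspecting `find_multi_mapped` before collapsing. Note also that this only merges values; the cuffdiff test statistics are not recomputed for the merged symbols.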
Would it be possible to post an example gene_id with multiple gene_symbols?
Sure. But I think it should be 'gene symbol with multiple gene IDs'.
Here are two examples:
Could they not be transcript variants (isoforms) of the same gene?
But I did a gene-level differential expression analysis in cuffdiff and cummeRbund.
Here is my hg19 GTF file format:
No gene symbol within it. Is that correct?
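One way to check what the GTF actually carries is to list every attribute key in column 9. A minimal sketch, assuming a standard tab-separated GTF with `key "value";` attributes (the file name `genes.gtf` is a hypothetical placeholder). As far as I know, a GTF exported from the UCSC Table Browser for hg19 often has only `gene_id` and `transcript_id`, and Cufflinks fills its `gene_short_name` column from a `gene_name` attribute only when the reference GTF provides one; treat both points as assumptions to verify against your own file.

```python
import re

# Matches each key "value" pair in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def attribute_keys(gtf_lines):
    """Collect every attribute key that appears in column 9 of a GTF."""
    keys = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        keys.update(key for key, _ in ATTR_RE.findall(fields[8]))
    return keys


if __name__ == "__main__":
    with open("genes.gtf") as fh:  # hypothetical path
        keys = attribute_keys(fh)
    print(sorted(keys))
    print("has gene_name:", "gene_name" in keys)
```

If `gene_name` is missing, the symbols have to come from a separate ID-to-symbol table, as was done here with the UCSC Genome Browser.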