Question: Duplicate gene symbols after converting UCSC gene IDs
gndy90 (China) wrote, 4.0 years ago:

Hi,

I am a beginner in RNA-seq. I am using Cufflinks to analyze the transcriptome of human cells. At the end, cummeRbund gave me the UCSC gene IDs of the differentially expressed genes, not the gene symbols, so I converted the gene IDs to gene symbols with the UCSC Genome Browser. My questions are:

  1. I submitted 2,900 gene IDs and got back about 3,000 gene_id/gene_symbol pairs, so about 100 extra gene IDs were added. What is the reason for this?

  2. My downstream analysis does not allow duplicate gene symbols. What should I do about the duplicates? Searching Biostars, I found posts saying that different gene IDs corresponding to the same gene symbol are different haplotypes of the gene. Should I just add up the expression values that share a gene symbol?
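If adding up values per symbol is the route taken, a minimal Python sketch could look like the following. It assumes the conversion output has been loaded into a dict from UCSC ID to gene symbol and the expression values into a dict from UCSC ID to value; both structures are assumptions for illustration, and the values in the demo are made up. Summing is reasonable for abundance values such as per-transcript FPKM, but it does not make sense for fold changes or p-values from the differential expression test.

```python
from collections import defaultdict


def sum_by_symbol(id_to_symbol, expression):
    """Collapse per-ID expression values into one value per gene symbol.

    id_to_symbol: dict mapping a UCSC ID (e.g. "uc001ajr.3") to a gene symbol.
    expression:   dict mapping a UCSC ID to an abundance value (e.g. FPKM).
    IDs without a symbol are kept under their own ID rather than dropped.
    """
    totals = defaultdict(float)
    for uc_id, value in expression.items():
        symbol = id_to_symbol.get(uc_id, uc_id)
        totals[symbol] += value
    return dict(totals)


if __name__ == "__main__":
    # IDs and symbols from the examples in this thread; the FPKM values are made up.
    id_to_symbol = {"uc001ajr.3": "TNFRSF14", "uc001ajt.1": "TNFRSF14",
                    "uc001aju.3": "FAM213B"}
    expression = {"uc001ajr.3": 2.0, "uc001ajt.1": 3.5, "uc001aju.3": 1.25}
    print(sum_by_symbol(id_to_symbol, expression))
    # {'TNFRSF14': 5.5, 'FAM213B': 1.25}
```

Keeping unmapped IDs under their own name, instead of silently dropping them, makes it easy to spot rows the conversion missed.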

Thanks.

Question modified 17 months ago by RamRS.

Would it be possible to post an example gene_id with multiple gene_symbols?

Comment by geek_y, 4.0 years ago.

Sure, but I think it should be 'gene symbol with multiple gene IDs'.

Here are two examples:

gene id       gene symbol
uc001ajr.3    TNFRSF14
uc001ajt.1    TNFRSF14
uc001aju.3    FAM213B
uc001ajw.2    FAM213B
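To find every case like these in the full conversion result, and to see which returned IDs were not in the submitted list, a small Python sketch could group the returned pairs in both directions. It assumes the pairs have already been read into a list of (gene_id, gene_symbol) tuples and the submitted IDs into a list; the function names are hypothetical.

```python
from collections import defaultdict


def one_to_many(pairs):
    """Group (gene_id, gene_symbol) pairs in both directions.

    Returns (ids_per_symbol, symbols_per_id), each keeping only the keys
    that map to more than one value.
    """
    ids_per_symbol = defaultdict(set)
    symbols_per_id = defaultdict(set)
    for gene_id, symbol in pairs:
        ids_per_symbol[symbol].add(gene_id)
        symbols_per_id[gene_id].add(symbol)
    dup_symbols = {s: sorted(ids) for s, ids in ids_per_symbol.items() if len(ids) > 1}
    multi_ids = {g: sorted(syms) for g, syms in symbols_per_id.items() if len(syms) > 1}
    return dup_symbols, multi_ids


def new_ids(pairs, submitted):
    """IDs that appear in the conversion output but were not submitted."""
    return sorted({gene_id for gene_id, _ in pairs} - set(submitted))


if __name__ == "__main__":
    # The four pairs posted in this thread.
    pairs = [("uc001ajr.3", "TNFRSF14"), ("uc001ajt.1", "TNFRSF14"),
             ("uc001aju.3", "FAM213B"), ("uc001ajw.2", "FAM213B")]
    dup_symbols, multi_ids = one_to_many(pairs)
    print(dup_symbols)
    # {'TNFRSF14': ['uc001ajr.3', 'uc001ajt.1'], 'FAM213B': ['uc001aju.3', 'uc001ajw.2']}
    print(multi_ids)
    # {}
```

Running `new_ids` against the original list of 2,900 IDs would list exactly which of the roughly 100 extra IDs were added, which is a first step toward explaining where they came from.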
Comment by gndy90, 4.0 years ago (modified 17 months ago by RamRS).

Could they not be transcript variants (isoforms) of the same gene?

Comment by andrew.j.skelton73, 4.0 years ago.

But I did gene-level differential expression analysis with cuffdiff and cummeRbund.

Here is an excerpt from my hg19 GTF file:

chr1 hg19_knownGene exon 11874 12227 0.000000 + . gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
chr1 hg19_knownGene exon 12613 12721 0.000000 + . gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
chr1 hg19_knownGene exon 13221 14409 0.000000 + . gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
chr1 hg19_knownGene exon 11874 12227 0.000000 + . gene_id "uc010nxr.1"; transcript_id "uc010nxr.1";
chr1 hg19_knownGene exon 12646 12697 0.000000 + . gene_id "uc010nxr.1"; transcript_id "uc010nxr.1";
chr1 hg19_knownGene exon 13221 14409 0.000000 + . gene_id "uc010nxr.1"; transcript_id "uc010nxr.1";

There is no gene symbol in it. Is that correct?
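To check this over the whole file rather than by eye, a short Python sketch can parse the attribute column of each line and count the records where gene_id equals transcript_id and the records that carry a `gene_name` attribute. `gene_name` is the attribute some other annotation sources (for example Ensembl GTFs) use for the symbol; treating its absence as "no gene symbol" is an assumption of this sketch.

```python
import re

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(gtf_line):
    """Return the attributes of one GTF line as a dict."""
    # split(None, 8) splits on any run of whitespace, so it also handles the
    # space-separated lines pasted in this thread (real GTF files use tabs).
    fields = gtf_line.split(None, 8)
    return dict(ATTR_RE.findall(fields[8]))


def summarize_gtf(lines):
    """Return (records, records with gene_id == transcript_id, records with gene_name)."""
    total = same_id = with_name = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        attrs = parse_attributes(line)
        total += 1
        if "gene_id" in attrs and attrs["gene_id"] == attrs.get("transcript_id"):
            same_id += 1
        if "gene_name" in attrs:
            with_name += 1
    return total, same_id, with_name


if __name__ == "__main__":
    lines = [
        'chr1 hg19_knownGene exon 11874 12227 0.000000 + . gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";',
        'chr1 hg19_knownGene exon 11874 12227 0.000000 + . gene_id "uc010nxr.1"; transcript_id "uc010nxr.1";',
    ]
    print(summarize_gtf(lines))  # (2, 2, 0)
```

If the middle number equals the first for the whole file, every "gene" in the annotation is a single transcript.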

Comment by gndy90, 4.0 years ago (modified 17 months ago by RamRS).
geek_y (Barcelona) wrote, 4.0 years ago:

In this particular example, the gene_id is the same as the transcript_id. The gene_id attribute is mandatory in the GTF format, so UCSC simply reused the transcript_id as the gene_id. The IDs you see are therefore different transcripts of the same gene, and your "gene-level" results are effectively transcript-level.
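Since Cufflinks tools group transcripts into genes by their gene_id attribute, one way to get real gene-level groupings is to rewrite gene_id from a transcript-to-gene mapping before running cuffdiff. A minimal Python sketch of that rewrite is below. Where the mapping comes from is an assumption: it could be built, for instance, from UCSC's kgXref table, which links known-gene IDs to gene symbols, but here it is a hypothetical hard-coded dict with a placeholder gene name `GENE_A`.

```python
import re

# \b keeps these from matching inside longer keys such as ref_gene_id.
GENE_ID_RE = re.compile(r'\bgene_id "([^"]*)"')
TRANSCRIPT_ID_RE = re.compile(r'\btranscript_id "([^"]*)"')


def regroup_line(line, tx_to_gene):
    """Rewrite gene_id using a transcript -> gene mapping; unmapped lines are returned unchanged."""
    match = TRANSCRIPT_ID_RE.search(line)
    if match is None or match.group(1) not in tx_to_gene:
        return line
    new_attr = 'gene_id "{}"'.format(tx_to_gene[match.group(1)])
    # A function replacement stops re from interpreting backslashes in the value.
    return GENE_ID_RE.sub(lambda _m: new_attr, line, count=1)


if __name__ == "__main__":
    # Hypothetical mapping: both transcripts assigned to a placeholder gene "GENE_A".
    tx_to_gene = {"uc001aaa.3": "GENE_A", "uc010nxr.1": "GENE_A"}
    line = ('chr1\thg19_knownGene\texon\t11874\t12227\t0.000000\t+\t.\t'
            'gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";')
    print(regroup_line(line, tx_to_gene))
```

Using gene symbols as gene_id merges every transcript that shares a symbol, which could also merge distinct loci that happen to share a name; a locus-level identifier in the mapping would avoid that.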


Solved! I had been using an incomplete GTF file exported from the Table Browser. Thank you!

Comment by gndy90, 4.0 years ago.