How to convert UCSC ID to gene symbol
5.6 years ago

Hi all!

I am doing RNA-seq analysis using STAR and hg19 from UCSC. I noticed that the UCSC GTF file contains only UCSC IDs and no gene symbols. Does anyone know how to convert UCSC IDs to gene symbols after I finish my analysis? I tried DAVID and HGNC but couldn't find any matches.

Would you recommend using the GENCODE genome and GTF instead? I also noticed that GENCODE release 19 only provides the genome sequence for all regions and the GTF for all regions. I want to focus on the primary chromosomes, so the scaffolds may not be helpful. What if I use the all-regions genome together with a GTF restricted to the primary chromosomes? Would that cause any issues?

Also, is there any difference between hg19 from UCSC and from GENCODE?

Thanks a lot!

RNA-Seq • 11k views
5.6 years ago
jotan ★ 1.2k

That's a lot of questions. I'll answer the first one for you.

Use the UCSC Table Browser:

Under "group", select "Genes and Gene Prediction Tracks".

Next to "identifiers (names/accessions)", paste or upload your list of IDs.

Under "output format", select "selected fields from primary and related tables".

Click "get output".

Check the fields you need. Make sure you check "Gene Symbol".


Make sure you set the track to "UCSC Genes", and if you are only interested in IDs, set the table to kgXref.
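Once the kgXref fields have been exported, the lookup can also be done locally. This is a minimal sketch, assuming the export was saved as a tab-separated file with the UCSC ID in column 1 and the gene symbol in column 2, and that header lines start with `#`. The function name `annotate_ids` and the file names are hypothetical.

```shell
# annotate_ids is a hypothetical helper:
#   $1 = Table Browser export (tab-separated: kgID, geneSymbol)
#   $2 = file with one UCSC ID per line
annotate_ids() {
    awk -F '\t' -v OFS='\t' '
        NR == FNR {                      # first file: the exported table
            if ($0 !~ /^#/) symbol[$1] = $2
            next
        }
        {                                # second file: the IDs to annotate
            print $1, (($1 in symbol) ? symbol[$1] : "NA")
        }
    ' "$1" "$2"
}
```

Calling `annotate_ids kgXref_export.tsv my_ids.txt` would print each ID followed by its symbol, with `NA` for IDs that are not in the export.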

5.6 years ago

Using the UCSC public MySQL server:

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'select * from kgXref limit 3\G'
*************************** 1. row ***************************
       kgID: uc001aaa.3
       mRNA: NR_046018
       spID: 
spDisplayID: 
 geneSymbol: DDX11L1
     refseq: NR_046018
    protAcc: 
description: Homo sapiens DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1 (DDX11L1), non-coding RNA.
    rfamAcc: 
   tRnaName: 
*************************** 2. row ***************************
       kgID: uc010nxr.1
       mRNA: AM992878
       spID: 
spDisplayID: 
 geneSymbol: DDX11L1
     refseq: 
    protAcc: 
description: Homo sapiens DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1 (DDX11L1), non-coding RNA.
    rfamAcc: 
   tRnaName: 
*************************** 3. row ***************************
       kgID: uc010nxq.1
       mRNA: AM992880
       spID: B7ZGX9
spDisplayID: B7ZGX9_HUMAN
 geneSymbol: DDX11L1
     refseq: 
    protAcc: 
description: Homo sapiens DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1 (DDX11L1), non-coding RNA.
    rfamAcc: 
   tRnaName: 

Here is the thing. Say I have a list of genes of interest as UCSC IDs. How can I convert them to gene names? (Typing them all into one SQL command is not practical.) I would like the result in three columns, something like this:

UCSCID GeneSymbol mRNAID

uc001aaa.3 DDX11L1 NR_046018

...

Or would you recommend writing a shell script that runs multiple SQL queries, one entry at a time, and stores the output in the format I expect?

Thanks!
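One possible way to avoid both typing the IDs by hand and querying one entry at a time is to build a single `IN (...)` query from a file of IDs. The sketch below assumes one UCSC ID per line; the function names `build_kgxref_query` and `ucsc_to_symbol` are hypothetical helpers, not part of any UCSC tool. It reuses the host, user, database, table and column names from the `kgXref` query above.

```shell
# Build one kgXref query covering every ID in a file (one UCSC ID per line).
build_kgxref_query() {
    ids_file=$1
    # Strip carriage returns, keep only lines that look like plain
    # identifiers, wrap each in single quotes, then join with commas.
    in_list=$(tr -d '\r' < "$ids_file" \
        | grep -E '^[A-Za-z0-9_.]+$' \
        | sed "s/.*/'&'/" \
        | paste -sd, -)
    printf 'select kgID, geneSymbol, mRNA from kgXref where kgID in (%s)\n' "$in_list"
}

# Run the query against the public UCSC server.
# -B prints tab-separated rows and -N suppresses the header line.
ucsc_to_symbol() {
    mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -B -N \
        -e "$(build_kgxref_query "$1")"
}
```

With the IDs in a file such as `ids.txt` (a placeholder name), `ucsc_to_symbol ids.txt > id_to_symbol.tsv` should give three tab-separated columns: UCSC ID, gene symbol and mRNA ID. An empty ID file produces an invalid `in ()` clause, so check that the file has content first.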
