Question

I need to download a list of all human genes with their respective Ensembl gene name | transcription start site ..

1

10.2 years ago

cherifbenhamda ▴ 10

Hello,

I'm new here and I need help, please!

I'm currently using the UCSC Table Browser and I get something like this:

#hg38.knownGene.name  hg38.knownGene.chrom  hg38.knownGene.strand  hg38.knownGene.txStart  hg38.knownGene.txEnd  hg38.kgXref.geneSymbol
uc001aaa.3            chr1                  +                      11873                   14409                 DDX11L1

I'd like to know whether it's possible to include the Ensembl gene name in the UCSC table, or to get it by another method.

Thank you in advance

Cherif

ChIP-Seq gene • 13k views

ADD COMMENT • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by cherifbenhamda ▴ 10

1

Hi! Have you tried BioMart?

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by RafaelMP ▴ 120

0

I'm about to try it (so far I don't know how to proceed :/)

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by cherifbenhamda ▴ 10

Ram · Accepted Answer · 2014-08-27

5

10.2 years ago

RafaelMP ▴ 120

Introduction to BioMart

Mining data

Data Mining in Ensembl

ADD COMMENT • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by RafaelMP ▴ 120

0

Thank you for your help!

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by cherifbenhamda ▴ 10

4

10.2 years ago

Ashutosh Pandey 12k

Unfortunately, the UCSC Table Browser doesn't have an Ensembl gene track for hg38. It does have one for hg19, though, and the command below should work for you.

mysql \
  --user=genome \
  -N \
  --host=genome-mysql.cse.ucsc.edu \
  -A \
  -D hg19 \
  -e "select ensGene.name, name2, chrom, strand, txStart, txEnd, value from ensGene, ensemblToGeneName where ensGene.name = ensemblToGeneName.name" > \
  output.txt

You can try the same command with hg38, but you will have to choose another gene model, such as RefSeq or UCSC genes.
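
Since the original question asks for transcription start sites, note that txStart is the TSS only for transcripts on the + strand; on the - strand the TSS sits at txEnd. Below is a minimal Python sketch, assuming output.txt is the tab-separated, header-less file written by the command above, with columns in the order of the select statement. The names `COLUMNS`, `tss_from_row` and `read_tss` are made up for illustration, and the second sample row is a made-up minus-strand transcript. UCSC tables store 0-based, half-open coordinates, so the sketch reports a 1-based TSS.

```python
import csv
import io

# Column order matches the select statement in the mysql command above.
COLUMNS = ["transcript_id", "gene_id", "chrom", "strand", "txStart", "txEnd", "symbol"]


def tss_from_row(row):
    """Return the 1-based TSS for one transcript row (a dict keyed by COLUMNS)."""
    if row["strand"] == "+":
        # txStart is 0-based, so add 1 to get a 1-based position.
        return int(row["txStart"]) + 1
    # On the minus strand transcription starts at the right-hand end;
    # txEnd (half-open) already equals the 1-based coordinate of the last base.
    return int(row["txEnd"])


def read_tss(handle):
    """Yield (transcript_id, gene_id, symbol, chrom, strand, tss) tuples."""
    for fields in csv.reader(handle, delimiter="\t"):
        row = dict(zip(COLUMNS, fields))
        yield (row["transcript_id"], row["gene_id"], row["symbol"],
               row["chrom"], row["strand"], tss_from_row(row))


# First row: values shown elsewhere in this thread.
# Second row: hypothetical minus-strand transcript, only to exercise that branch.
SAMPLE = (
    "ENST00000263174\tENSG00000099260\tchr1\t+\t100111498\t100160097\tPALMD\n"
    "ENST_EXAMPLE\tENSG_EXAMPLE\tchr1\t-\t1000\t2000\tEXAMPLE\n"
)

tss_rows = list(read_tss(io.StringIO(SAMPLE)))
for record in tss_rows:
    print("\t".join(str(x) for x in record))
```

To run it on the real file, pass `open("output.txt", newline="")` to `read_tss` instead of the in-memory sample.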

ADD COMMENT • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by Ashutosh Pandey 12k

0

Thank you very much!!! Yep it works!!

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by cherifbenhamda ▴ 10

4

10.2 years ago

EagleEye 7.6k

Follow these steps: http://kandurilab.org/bioinformatics/biostars/UCSC_tableBrowser_annotation.pdf

ADD COMMENT • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by EagleEye 7.6k

3

10.2 years ago

Mitch Bekritsky ★ 1.3k

Here's an easy way to do it from UCSC's table browser:

In the table browser, select Ensembl Genes as your track
Under output format, choose "selected fields from primary and related tables"
Add your output file name, if you want one (otherwise, it will print to the browser)
Click "get output"
On the next page, you will get to choose your fields.
Under linked tables, check ensemblToGeneName, then press "Allow selection from checked tables"
The page will refresh, and you should have a new table called hg19.ensemblToGeneName
Check off name, chrom, strand, txStart, txEnd, and name2 in hg19.ensGene (or any fields you'd like)
In hg19.ensemblToGeneName, check "value", which has the description "alternate gene name"
Press "get output"

If you did it right, you should get a table that looks a bit like this (I took this chunk from chr1:100,000,000-150,000,000):

#hg19.ensGene.name  hg19.ensGene.chrom  hg19.ensGene.strand  hg19.ensGene.txStart  hg19.ensGene.txEnd  hg19.ensGene.name2  hg19.ensemblToGeneName.value
ENST00000263174     chr1                +                    100111498             100160097           ENSG00000099260     PALMD
ENST00000605497     chr1                +                    100111748             100155633           ENSG00000099260     PALMD
ENST00000605613     chr1                +                    100133135             100135379           ENSG00000099260     PALMD
ENST00000496843     chr1                +                    100148821             100160097           ENSG00000099260     PALMD
ENST00000434734     chr1                +                    100163797             100164734           ENSG00000223656     HMGB3P10
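
Checking ensemblToGeneName as a linked table amounts to joining the two tables on the transcript ID (the name column in both). Here is a small Python sketch of that join using an in-memory SQLite database loaded with the five rows above. The table layouts are simplified assumptions (the real ensGene table has more columns), and nothing here talks to the UCSC server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified stand-ins for the two UCSC tables (assumed subset of columns).
conn.executescript("""
    CREATE TABLE ensGene (name TEXT, chrom TEXT, strand TEXT,
                          txStart INTEGER, txEnd INTEGER, name2 TEXT);
    CREATE TABLE ensemblToGeneName (name TEXT, value TEXT);
""")

conn.executemany(
    "INSERT INTO ensGene VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("ENST00000263174", "chr1", "+", 100111498, 100160097, "ENSG00000099260"),
        ("ENST00000605497", "chr1", "+", 100111748, 100155633, "ENSG00000099260"),
        ("ENST00000605613", "chr1", "+", 100133135, 100135379, "ENSG00000099260"),
        ("ENST00000496843", "chr1", "+", 100148821, 100160097, "ENSG00000099260"),
        ("ENST00000434734", "chr1", "+", 100163797, 100164734, "ENSG00000223656"),
    ],
)
conn.executemany(
    "INSERT INTO ensemblToGeneName VALUES (?, ?)",
    [
        ("ENST00000263174", "PALMD"),
        ("ENST00000605497", "PALMD"),
        ("ENST00000605613", "PALMD"),
        ("ENST00000496843", "PALMD"),
        ("ENST00000434734", "HMGB3P10"),
    ],
)

# Join on the transcript ID, which both tables call "name".
joined = conn.execute("""
    SELECT g.name, g.chrom, g.strand, g.txStart, g.txEnd, g.name2, x.value
    FROM ensGene AS g
    JOIN ensemblToGeneName AS x ON g.name = x.name
    ORDER BY g.txStart
""").fetchall()

for row in joined:
    print("\t".join(str(field) for field in row))
```

Each printed line carries the transcript ID, its coordinates, the Ensembl gene ID (name2) and the gene symbol (value), in the same column order as the table browser output.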

ADD COMMENT • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by Mitch Bekritsky ★ 1.3k

0

Ashutosh noticed what I missed -- hg38 does not have Ensembl annotations in UCSC yet. Is there a reason you're choosing hg38 and not hg19?

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by Mitch Bekritsky ★ 1.3k

0

Actually, I can use hg19 for now, but I may need this for hg38 soon.

PS: Thanks a lot! Yes, I got it.

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by cherifbenhamda ▴ 10

0

Just one more question, please.

When I select Ensembl Genes as my track, I get 204941 entries with their respective txStart and so on,

but when I select UCSC Genes as my track, I get only 82961, which is less than half.

Is that normal?

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by cherifbenhamda ▴ 10

1

Those are two different genome annotations:

The UCSC gene track is described here. It is a set of genes taken from RefSeq, GenBank, CCDS, Rfam, and the tRNA genes track.

I couldn't find a similarly clear description for Ensembl, but this is a good start. It seems they rely on deposited mRNAs and protein sequences in public databases. That might mean that their curation is a bit more relaxed than RefSeq, CCDS, etc.

FWIW, whenever I do annotation, I've generally relied on CCDS and RefSeq.

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by Mitch Bekritsky ★ 1.3k

1

Silly me! Here is the paper describing the Ensembl annotation pipeline. I only skimmed it, but it is an automated gene annotation pipeline that includes gene predictions, which may explain the higher number of transcripts in the Ensembl table.
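
One more thing worth checking when comparing the two totals: judging by the sample output earlier in this thread, each row is a transcript (ENST... in the name column), so several rows can belong to one gene. A rough Python sketch that counts rows versus distinct gene IDs, using the five hg19 rows shown earlier; the function name `transcripts_per_gene` is made up, and the column positions assume the same field order as that output.

```python
from collections import defaultdict

# The five sample rows from the table browser output shown earlier in the thread.
ROWS = (
    "ENST00000263174\tchr1\t+\t100111498\t100160097\tENSG00000099260\tPALMD\n"
    "ENST00000605497\tchr1\t+\t100111748\t100155633\tENSG00000099260\tPALMD\n"
    "ENST00000605613\tchr1\t+\t100133135\t100135379\tENSG00000099260\tPALMD\n"
    "ENST00000496843\tchr1\t+\t100148821\t100160097\tENSG00000099260\tPALMD\n"
    "ENST00000434734\tchr1\t+\t100163797\t100164734\tENSG00000223656\tHMGB3P10\n"
)


def transcripts_per_gene(lines):
    """Group transcript IDs (column 1) by gene ID (column 6), skipping header lines."""
    per_gene = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        transcript_id, gene_id = fields[0], fields[5]
        per_gene[gene_id].append(transcript_id)
    return per_gene


per_gene = transcripts_per_gene(ROWS.splitlines())
n_transcripts = sum(len(ids) for ids in per_gene.values())
print(f"{n_transcripts} transcript rows, {len(per_gene)} distinct genes")
```

Running the same function over the full download (for example `transcripts_per_gene(open("output.txt"))`, assuming that file has the same column order) gives the number of distinct genes behind the row count.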

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by Mitch Bekritsky ★ 1.3k

0

I understand! Thank you, Mitch!!

ADD REPLY • link 10.2 years ago by cherifbenhamda ▴ 10

0

It's my pleasure!

ADD REPLY • link 10.2 years ago by Mitch Bekritsky ★ 1.3k