Question

Microarray Probe IDs to ENSEMBL ID

0

4.6 years ago

Scott McKay ▴ 30

I am converting a list of microarray probe IDs to ENSEMBL IDs with biomaRt. Looking at the results in Excel, I see the same probe ID mapped to two different ENSEMBL IDs. Any idea what this means? I thought ENSEMBL IDs were based on genes, not splice variants or isoforms. Thanks!

microarray r ensembl gene high-throughput • 3.0k views

ADD COMMENT • link updated 4.6 years ago by Ben_Ensembl ★ 2.4k • written 4.6 years ago by Scott McKay ▴ 30

score 0 · Answer 1 · 2019-09-28

0

4.6 years ago

Ben_Ensembl ★ 2.4k

Hi Scott McKay,

In the annotation process, Ensembl annotates transcripts and groups them together to form genes. We use stable IDs for all of our annotated transcripts and genes (and other annotated features too). We use ENSG# IDs for human genes, and ENST# IDs for human transcripts. There is further documentation about the stable IDs in Ensembl on the following pages: [1] http://www.ensembl.org/info/genome/stable_ids/index.html [2] http://www.ensembl.org/info/genome/stable_ids/prefixes.html

Then, Ensembl maps microarray probes to the individual transcripts of a gene: http://www.ensembl.org/info/genome/microarray_probe_set_mapping.html

Best wishes,

Ben Ensembl Helpdesk

ADD COMMENT • link 4.6 years ago by Ben_Ensembl ★ 2.4k

0

Hi Ben,

biomaRt gave me probe IDs mapped to multiple ENSG# IDs; that is, one probe had been assigned to multiple genes. I would understand if one gene ID had multiple probe IDs, but that does not seem to be the case here. Any insight on this?

ADD REPLY • link 4.6 years ago by Scott McKay ▴ 30

0

Hi,

Could you share your biomaRt query?

ADD REPLY • link 4.6 years ago by Ben_Ensembl ★ 2.4k

0

Hi,

Sorry for the late reply. The query was as follows:

    library(biomaRt)
    ensembl = useMart("ensembl", dataset="hsapiens_gene_ensembl")
    probeids = read.table(file.choose())  ## single-column txt file of probe ids
    getBM(attributes=c('affy_hugene_1_0_st_v1', 'ensembl_gene_id'), filters = 'affy_hugene_1_0_st_v1', values = probeids, mart = ensembl)

It returns a list of probe IDs and Ensembl gene IDs, but some probe IDs are listed twice, with 2 DIFFERENT gene IDs.

ADD REPLY • link 4.6 years ago by Scott McKay ▴ 30
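
To see how many probe IDs are affected, one option is to count the distinct gene IDs per probe in the table that getBM() returns. The sketch below is only an illustration in base R: the data frame `res` is a hypothetical stand-in for the real query result, and the probe and gene IDs in it are invented.

```r
# Hypothetical stand-in for the data frame returned by getBM();
# the probe IDs and gene IDs are invented for illustration only.
res <- data.frame(
  affy_hugene_1_0_st_v1 = c("7900001", "7900001", "7900002", "7900003"),
  ensembl_gene_id = c("ENSG00000000001", "ENSG00000000002",
                      "ENSG00000000003", "ENSG00000000004"),
  stringsAsFactors = FALSE
)

# Count the number of distinct gene IDs reported for each probe ID.
genes_per_probe <- tapply(
  res$ensembl_gene_id,
  res$affy_hugene_1_0_st_v1,
  function(x) length(unique(x))
)

# Probe IDs that are reported with more than one gene ID.
multi_mapped <- names(genes_per_probe[genes_per_probe > 1])
print(multi_mapped)
```

With a real result, `res` would simply be the object returned by the getBM() call, and the column names would match the requested attributes.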

0

Hi,

No problem. Individual probes are approx. 25 bp in length and are grouped together to form probesets. An individual probe can be used in different probesets to assess the expression of more than one gene, since its 25 bp sequence can map to more than one region in the genome.

Best wishes

Ben

ADD REPLY • link 4.6 years ago by Ben_Ensembl ★ 2.4k
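
Given that one probe ID can legitimately map to more than one gene, a downstream table usually has to handle these rows explicitly. The base R sketch below shows two possible treatments: collapsing all gene IDs for a probe into one row, or keeping only probes with a single gene. It rests on assumptions: the column names `probe_id` and `gene_id` and all IDs are hypothetical, and the thread does not say which treatment suits a given analysis.

```r
# Hypothetical probe-to-gene table; column names and IDs are invented.
res <- data.frame(
  probe_id = c("p1", "p1", "p2", "p3"),
  gene_id  = c("ENSG00000000002", "ENSG00000000001",
               "ENSG00000000003", "ENSG00000000004"),
  stringsAsFactors = FALSE
)

# Drop exact duplicate rows so each probe/gene pair is counted once.
res <- unique(res)

# Option 1: one row per probe, with all its gene IDs joined by ";".
collapsed <- aggregate(
  gene_id ~ probe_id,
  data = res,
  FUN = function(x) paste(sort(unique(x)), collapse = ";")
)

# Option 2: keep only probes that map to exactly one gene.
n_genes <- table(res$probe_id)
unambiguous <- res[res$probe_id %in% names(n_genes[n_genes == 1]), ]

print(collapsed)
print(unambiguous)
```

Option 1 keeps every probe but yields composite gene labels. Option 2 yields a strict one-to-one table at the cost of discarding the multi-mapped probes.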