Question: Correspondence between "old" microarray probe IDs and gene names
4 months ago, guillaume.rbt wrote:

Hi all,

I'm working on a 2006 public microarray dataset ( ).

I've reanalysed the data to get differentially expressed transcripts, and now I'm trying to test pathway enrichment for some gene sets.

The problem I'm facing is that I need to identify which probes on the chip correspond to the genes in the sets I want to test. The chip probes are annotated with old EMBL transcript IDs (most of the IDs look like AAXXXXXX, AIXXXXXX, HXXXXX, NXXXXX, RXXXXX or TXXXXX, where each X is a digit; for example, I know that "AI375736" corresponds to the CD28 gene).

I'm not sure how to find the correspondence between the genes I want to study and these transcript IDs.

If anyone has any advice on how to do that, it would be very helpful.

Many thanks


The array is quite old indeed. There are mappings to what appear to be gene descriptions, here:

Check the Excel files.

The arrays are Agilent but do not appear to be supported in biomaRt. However, I note that these IDs that you list are likely GenBank accession IDs and not probe names.
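Before attempting any mapping, it can help to check how many entries in the annotation column actually have the shape of an accession. The following is a minimal Python sketch, assuming only the two shapes mentioned in the question (one letter plus five digits, or two letters plus six digits). The function names, the ID "H12345" and the probe-style name "A_23_P100001" are hypothetical and serve only as test input.

```python
import re

# Assumption: only the two ID shapes described in the question are accepted,
# i.e. 1 letter + 5 digits (HXXXXX) or 2 letters + 6 digits (AIXXXXXX).
ACCESSION_RE = re.compile(r"(?:[A-Z]\d{5}|[A-Z]{2}\d{6})")


def looks_like_accession(identifier: str) -> bool:
    """Return True if the ID matches one of the two assumed accession shapes."""
    return ACCESSION_RE.fullmatch(identifier.strip().upper()) is not None


def split_ids(identifiers):
    """Separate accession-like IDs from everything else (blanks, probe names)."""
    accession_like, other = [], []
    for identifier in identifiers:
        if looks_like_accession(identifier):
            accession_like.append(identifier)
        else:
            other.append(identifier)
    return accession_like, other


# "AI375736" and "AI092544" come from this thread; the rest are hypothetical.
ids = ["AI375736", "AI092544", "H12345", "A_23_P100001", ""]
accession_like, other = split_ids(ids)
print(accession_like)
print(other)
```

This only checks the shape of each ID; it does not confirm that the accession exists in any database.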

written 4 months ago by Kevin Blighe

Thank you for your response. I found the IDs in those files; the exact name of the column is "Reporter Database Entry[embl]", so they are indeed not probe names.

modified 4 months ago • written 4 months ago by guillaume.rbt

You may try to map them with this code, in that case:

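library(biomaRt)

# EMBL/GenBank accessions taken from the array annotation
ids <- c("AI375736", "AI092544")

mart <- useMart("ENSEMBL_MART_ENSEMBL")
mart <- useDataset("hsapiens_gene_ensembl", mart)

# Look the accessions up via the "embl" filter
getBM(
  mart = mart,
  attributes = c("protein_id", "embl", "ensembl_gene_id", "gene_biotype", "external_gene_name"),
  filters = "embl",
  values = ids)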

I tried but failed. Some may map, though. Otherwise you may consider eUtils to map these to gene symbols.

written 4 months ago by Kevin Blighe

Thank you very much for trying. I will check other IDs to see if it works for them.

written 4 months ago by guillaume.rbt
4 months ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

Using the UCSC public MySQL server (your example ID is an EST):

$ mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -P 3306 -D hg38 -e 'select distinct E.qName,E.tName,E.tStart,E.tEnd,K.name,K.name2,K.txStart,K.txEnd from all_est as E,wgEncodeGencodeBasicV28 as K where E.qName="AI375736" and K.chrom=E.tName and NOT( K.txEnd < E.tStart || E.tEnd < K.txStart) ;'
| qName    | tName | tStart    | tEnd      | name              | name2 | txStart   | txEnd     |
| AI375736 | chr2  | 203735217 | 203735676 | ENST00000374481.7 | CD28  | 203706474 | 203738910 |
| AI375736 | chr2  | 203735217 | 203735676 | ENST00000324106.8 | CD28  | 203706547 | 203738912 |
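The WHERE clause in the query above keeps a transcript when it is on the same chromosome as the EST alignment and the two intervals are not disjoint. As a minimal Python sketch of that same test, using the coordinates from the output above (the function and variable names are hypothetical):

```python
def overlaps(est_start, est_end, tx_start, tx_end):
    """Mirror of the SQL condition NOT(txEnd < tStart OR tEnd < txStart)."""
    return not (tx_end < est_start or est_end < tx_start)


# EST alignment row from the query output: (qName, tName, tStart, tEnd)
est = ("AI375736", "chr2", 203735217, 203735676)

# Transcript rows from the query output: (name, name2, chrom, txStart, txEnd)
transcripts = [
    ("ENST00000374481.7", "CD28", "chr2", 203706474, 203738910),
    ("ENST00000324106.8", "CD28", "chr2", 203706547, 203738912),
]

# Collect the gene symbols of all transcripts overlapping the EST alignment.
genes = sorted(
    {
        name2
        for _, name2, chrom, tx_start, tx_end in transcripts
        if chrom == est[1] and overlaps(est[2], est[3], tx_start, tx_end)
    }
)
print(genes)  # ['CD28']
```

Note that the query, as written, does not compare strands, so an EST that overlaps several genes, or a gene on the opposite strand, will return more than one gene name and may need manual review.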

Great, thanks for the tip!

written 4 months ago by guillaume.rbt