Question: What do people do with probes that don't have any GENE SYMBOL in annotation databases?
arronar (Austria) wrote, 12 months ago:

Hello.

I'm doing some microarray analysis and noticed that some probe IDs don't have any GENE SYMBOL. Am I supposed to delete those probes or keep them?

This is how I add annotation for an Affymetrix HGU133plus2 array after running RMA:

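library(hgu133plus2.db)  # annotation package for the HGU133plus2 array
probes <- row.names(expressions)
# mget() returns a list; keep the first value per probe set so both vectors stay aligned with 'probes'
Symbols <- sapply(mget(probes, hgu133plus2SYMBOL, ifnotfound = NA), `[`, 1)
Entrez_IDs <- sapply(mget(probes, hgu133plus2ENTREZID, ifnotfound = NA), `[`, 1)
# data.frame() keeps the expression values numeric (cbind() with a matrix would coerce them to character)
expressions <- data.frame(probes, Symbols, Entrez_IDs, expressions, stringsAsFactors = FALSE)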

Is something wrong with my code, or is this behavior expected? What do you do with these NA genes?
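
A small illustration with made-up data of one thing that can go wrong in the code above, and of one way to handle the NA probes. If any probe set maps to more than one symbol, mget() returns several symbols for it and unlist() then yields more values than there are probes, so the columns no longer line up with the probes; cbind() on a numeric matrix also turns every value into character. The list sym_list below is a hypothetical stand-in for the mget() result, and the expression values are invented. Keeping the first symbol per probe set and building a data.frame avoids both problems, and splitting off the NA rows keeps them available instead of deleting them outright.

```r
# Hypothetical stand-in for mget(probes, hgu133plus2SYMBOL, ifnotfound = NA):
# one list element per probe set, possibly with several symbols or NA
sym_list <- list(
  probe_A = "GENE1",
  probe_B = c("GENE2", "GENE3"),  # probe set mapping to two genes
  probe_C = NA                    # probe set with no symbol
)

# unlist() flattens multi-symbol elements, so the result is longer than the probe list
flat <- unlist(sym_list)
length(flat)  # 4 values for 3 probe sets

# Keeping the first symbol per probe set preserves exactly one entry per probe
first_sym <- vapply(sym_list, function(s) as.character(s[1]), character(1))

# Toy expression matrix (invented values) with probe IDs as row names
expr <- matrix(c(5.1, 7.3, 2.2, 6.0, 8.1, 2.5), nrow = 3,
               dimnames = list(names(sym_list), c("sample1", "sample2")))

# data.frame() keeps the sample columns numeric
annotated <- data.frame(probe = rownames(expr),
                        symbol = unname(first_sym[rownames(expr)]),
                        expr, stringsAsFactors = FALSE)

# Split instead of silently deleting: NA-symbol probe sets are kept aside for re-annotation
has_symbol <- !is.na(annotated$symbol)
with_symbol <- annotated[has_symbol, ]
without_symbol <- annotated[!has_symbol, ]
```

Whether to drop the NA probe sets from downstream analysis is a judgment call; keeping them in a separate object means they can still be re-annotated later.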

Maxime Lamontagne (Québec) wrote, 12 months ago:

It's not unusual with older arrays. Because gene annotation and the reference sequence have changed, some probe sets may be outdated (their probes no longer map to any gene in the new genome build). To find a gene for such a probe set, you need to BLAT all the probes in the probe set and check where they map.


It would be helpful if you could explain the procedure in more detail, because I'm not very experienced with this kind of task. Thank you very much.

(reply by arronar, 12 months ago)

You could get the probe sequences from the available Affymetrix annotation files and then paste them into a BLAT tool, such as the one on the UCSC Genome Browser, to find where the probes map in the latest genome build.
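
A minimal sketch of turning one probe set's probes into FASTA text that can be pasted into a BLAT web form. It assumes a probe table shaped like the one in the Bioconductor hgu133plus2probe package, with columns sequence, x, y and Probe.Set.Name; treat those column names as an assumption and check them with names() on your own table. The function probeset_to_fasta() and the toy table (IDs and sequences) are hypothetical.

```r
# Build FASTA lines (header, sequence, header, sequence, ...) for one probe set.
# Assumed columns: sequence, x, y, Probe.Set.Name (layout of the hgu133plus2probe table).
probeset_to_fasta <- function(probe_tab, probeset) {
  rows <- probe_tab[probe_tab$Probe.Set.Name == probeset, ]
  if (nrow(rows) == 0) stop("probe set not found: ", probeset)
  # x/y array coordinates make each header unique within the probe set
  headers <- paste0(">", probeset, "_x", rows$x, "_y", rows$y)
  # rbind() pairs each header with its sequence; as.vector() reads the matrix column by column
  as.vector(rbind(headers, rows$sequence))
}

# Hypothetical probe table, for illustration only (made-up 25-mers)
toy <- data.frame(
  sequence = c("ACGTACGTACGTACGTACGTACGTA",
               "TTGCAATTGCAATTGCAATTGCAAT",
               "GGGCCCGGGCCCGGGCCCGGGCCCG"),
  x = c(10, 20, 30),
  y = c(5, 6, 7),
  Probe.Set.Name = c("toy_set_1", "toy_set_1", "toy_set_2"),
  stringsAsFactors = FALSE
)

fasta <- probeset_to_fasta(toy, "toy_set_1")
writeLines(fasta)                  # print, then copy into the BLAT form
# writeLines(fasta, "probes.fa")   # or write to a file instead
```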

(reply by Ahill, 12 months ago)

I think this paper does exactly what you want:

Framework for reanalysis of publicly available Affymetrix® GeneChip® data sets based on functional regions of interest

https://www.biorxiv.org/content/biorxiv/early/2017/04/11/126573.full.pdf

(reply by Maxime Lamontagne, 12 months ago)

About 12,000 probe sets had no gene symbol, so I needed an automated way to retrieve symbols for them instead of looking them up one by one. I found bioDBnet, where you can set the input to Affy ID and the output to gene symbol, and wrote these lines in R to retrieve the results (for anyone else who needs this):

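library(jsonlite)  # provides fromJSON()
results <- NULL    # grows by one data frame per probe set
for (probe in missed_probes) {  # missed_probes: probe set IDs without a symbol
  q <- paste0("https://biodbnet-abcc.ncifcrf.gov/webServices/rest.php/biodbnetRestApi.json?method=db2db&input=affyid&inputValues=", probe, "&outputs=genesymbol&taxonId=9606&format=row")
  results <- rbind(results, fromJSON(txt = q))
}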
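
With about 12,000 probe sets, a single failed request aborts the loop above, and growing results with rbind() on every iteration gets slower as the table grows. Below is a sketch of a more fault-tolerant variant that reuses the same bioDBnet URL. The helper names biodbnet_url() and query_all() are hypothetical, and the fetch argument exists only so the function can be tried without network access; by default it calls jsonlite::fromJSON, as the loop does.

```r
# Build the bioDBnet db2db query URL for one Affymetrix probe set ID (same URL as in the loop)
biodbnet_url <- function(probe) {
  paste0("https://biodbnet-abcc.ncifcrf.gov/webServices/rest.php/biodbnetRestApi.json",
         "?method=db2db&input=affyid&inputValues=", probe,
         "&outputs=genesymbol&taxonId=9606&format=row")
}

# Query every probe set; a failed request is reported and returns NULL
# instead of stopping the whole run. 'fetch' is evaluated only when used,
# so jsonlite is needed only if the default is kept.
query_all <- function(probes, fetch = jsonlite::fromJSON) {
  pieces <- lapply(probes, function(p) {
    tryCatch(fetch(biodbnet_url(p)), error = function(e) {
      message("request failed for ", p, ": ", conditionMessage(e))
      NULL
    })
  })
  do.call(rbind, pieces)  # bind once at the end; NULL entries are dropped
}
```

Usage would be results <- query_all(missed_probes); the messages then show which probe sets need to be retried.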

The problem now is that many probe sets have more than one gene symbol, so maybe I'll have to pick one at random to keep. :-p
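
Instead of picking a symbol at random, one option is to keep every distinct symbol in a single string, joined with " /// " (the separator Affymetrix uses in its own annotation files), and flag those probe sets as ambiguous. This is a sketch on a hypothetical long-format table with columns probe and symbol; the actual column names of the bioDBnet results may differ and would need to be mapped first.

```r
# Hypothetical long-format table: one row per (probe set, symbol) pair
hits <- data.frame(
  probe  = c("set_1", "set_1", "set_2", "set_3", "set_3"),
  symbol = c("GENE1", "GENE2", "GENE3", "GENE4", "GENE4"),
  stringsAsFactors = FALSE
)

# One row per probe set, keeping every distinct symbol joined with " /// "
collapsed <- aggregate(symbol ~ probe, data = hits,
                       FUN = function(s) paste(sort(unique(s)), collapse = " /// "))

# Flag probe sets that still map to several genes instead of choosing one at random
collapsed$ambiguous <- grepl(" /// ", collapsed$symbol, fixed = TRUE)
collapsed
```

Ambiguous probe sets can then be excluded from gene-level summaries or resolved case by case, for example by BLAT.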

EDIT: I also found these sites that do the same conversion:

http://biit.cs.ut.ee/gprofiler/gconvert.cgi

http://idmap.genestimuli.org/

(reply by arronar, 12 months ago)
Hussain Ather (National Institutes of Health, Bethesda, MD) wrote, 12 months ago:

You probably want to get the current annotated reference (perhaps from Bioconductor) and map the sequence of each probe to it. It's possible that the gene list you were using is out of date.


I'm already using the latest version of that package:

Package hgu133plus2.db version 3.2.3

(reply by arronar, 12 months ago)