Question: Getting Probeset Sequence Information From A Custom CDF?
Sam wrote (9.2 years ago):

Hello, I downloaded a custom CDF file for the Affymetrix U133 Plus 2.0 array. I am trying to get the probe sequences for a particular gene's probeset from this file. Can anyone help me do this? Looking in the file, I see some fields called cbase, pbase and tbase. Is that where the information is?

[EDIT: text below moved here from answer]

I am actually using one of those remapped CDFs for the U133 Plus 2.0, so what I most need is the probe-level information: the sequences of the probes that actually make up my new probesets. I suppose this is a tougher task than I anticipated.


The problem with custom CDFs is that their contents vary, because they are customised. Can you post a link to the custom CDF download location, so we can look at it?

(comment by Neilfws, 9.2 years ago)
David Quigley (San Francisco) wrote (9.2 years ago):

The CDF does not contain probe sequences. You can download them from the Affymetrix web site: go to Support (free registration required), then select Annotation Files for the platform you want. Probe sequences are provided as a downloadable FASTA file; the one I think you want is

http://www.affymetrix.com/analysis/downloads/data/HG-U133_Plus_2.probe_fasta.zip

As long as you can pull the probeset IDs (e.g. "1007_s_at") out of your file, you should be able to match them against this FASTA file. The probes have identifiers of the form:

probe:HG-U133A_2:1007_s_at:416:177; Interrogation_Position=3330; Antisense;
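As a rough sketch of that matching step, the Python below splits a header of this form into its parts. It assumes the layout `probe:<array>:<probeset>:<x>:<y>` followed by semicolon-separated annotations, as in the example identifier; `parse_probe_header` is a hypothetical helper, not part of any Affymetrix tool.

```python
def parse_probe_header(header):
    """Split an Affymetrix-style probe FASTA header into its fields.

    Assumed layout: 'probe:<array>:<probeset>:<x>:<y>; key=value; strand;'
    """
    header = header.lstrip(">").strip()
    # Semicolon-separated parts, dropping the empty piece after the last ';'
    fields = [part.strip() for part in header.split(";") if part.strip()]
    # First part: 'probe:<array>:<probeset>:<x>:<y>'
    _, array, probeset, x, y = fields[0].split(":")
    record = {"array": array, "probeset": probeset, "x": int(x), "y": int(y)}
    # Remaining parts are either 'key=value' pairs or a bare strand label
    for part in fields[1:]:
        if "=" in part:
            key, value = part.split("=", 1)
            record[key] = value
        else:
            record["strand"] = part
    return record


rec = parse_probe_header(
    ">probe:HG-U133A_2:1007_s_at:416:177; Interrogation_Position=3330; Antisense;"
)
print(rec["probeset"], rec["x"], rec["y"], rec["strand"])
```

Keying the records on `probeset` then lets you look up all probes for the IDs pulled out of your CDF.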


I think they want the Plus 2.0 file, at http://www.affymetrix.com/Auth/analysis/downloads/data/HG-U133_Plus_2.probe_fasta.zip .

(reply by Neilfws, 9.2 years ago)

Thanks, I noticed that immediately after posting. The link in the original answer is now correct.

(reply by David Quigley, 9.2 years ago)
Neilfws (Sydney, Australia) wrote (9.2 years ago):

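In general, CDFs do not contain sequence information unless they have been customised to include a SEQUENCE field. A CDF (chip description file) maps each probe, identified by its (X,Y) coordinates on the chip, to the probeset it belongs to. CBASE, PBASE and TBASE are single-base fields, not the probe sequence: PBASE and TBASE give the probe base and the target base at the probe's interrogation position (the central, 13th base of a 25-mer expression probe), which is what distinguishes perfect-match from mismatch probes.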
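Because PBASE and TBASE each hold only one base, the most they can tell you about a cell is whether it is a perfect-match (PM) or mismatch (MM) probe, never its sequence. A minimal sketch follows. It assumes the usual convention that a PM probe carries the Watson-Crick complement of the target base at that position, while its MM partner carries the target base itself. `probe_type` is a hypothetical name.

```python
# Watson-Crick complements of the four DNA bases
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def probe_type(pbase, tbase):
    """Classify a CDF cell as PM or MM from its PBASE and TBASE values.

    Assumption: PM if the probe base complements the target base,
    MM if the two bases are identical; anything else is 'unknown'.
    """
    pbase, tbase = pbase.upper(), tbase.upper()
    if COMPLEMENT.get(tbase) == pbase:
        return "PM"
    if pbase == tbase:
        return "MM"
    return "unknown"


print(probe_type("A", "T"))  # PM
print(probe_type("T", "T"))  # MM
```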

To get probe sequences for the U133 Plus 2.0 array, go to the Affymetrix product page for that array, where you can download them either as a FASTA file or as a tab-delimited file. You'll need to create an account and/or log in first.
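If you take the tab-delimited file, grouping the sequences by probeset needs only a few lines of Python. This sketch assumes columns named `Probe Set Name` and `Probe Sequence`. Check the header of the file you actually download and pass different names if they differ. `sequences_by_probeset` is a hypothetical helper, and the demo rows are dummy placeholders, not real probe data.

```python
import csv
import io
from collections import defaultdict


def sequences_by_probeset(handle, probeset_col="Probe Set Name",
                          sequence_col="Probe Sequence"):
    """Group probe sequences by probeset ID from a tab-delimited probe file.

    The default column names are assumptions about the file layout.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        groups[row[probeset_col]].append(row[sequence_col])
    return dict(groups)


# Dummy input in the assumed layout (placeholder IDs and sequences)
demo = io.StringIO(
    "Probe Set Name\tProbe X\tProbe Y\tProbe Sequence\n"
    "example_at\t1\t2\tACGTACGT\n"
    "example_at\t3\t4\tTTTTGGGG\n"
)
print(sequences_by_probeset(demo))
```

For the real file, open it with `open(path, newline="")` and pass the handle in place of `demo`.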

Even if your CDF is customised, its probeset IDs should match those in the original product file. If you want to get the probeset IDs for a particular gene, you can use BioMart, either via the web or with the Bioconductor biomaRt package. Here is some sample R code to find the probesets for the gene HOXB13:

library(biomaRt)
# Connect to the Ensembl BioMart, human gene dataset
mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
# Look up U133 Plus 2.0 probesets annotated to HOXB13
results <- getBM(attributes = c("ensembl_gene_id", "hgnc_symbol",
                                "affy_hg_u133_plus_2"),
                 filters = "hgnc_symbol",
                 values = "HOXB13", mart = mart)
results
#   ensembl_gene_id hgnc_symbol affy_hg_u133_plus_2
# 1 ENSG00000159184      HOXB13           230105_at
# 2 ENSG00000159184      HOXB13           209844_at
From there, you can go back to your FASTA file and pull out the probe sequences for those probesets.
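A minimal Python sketch of that last step reads a probe FASTA file and keeps the sequences whose header names one of the wanted probesets. It assumes the probeset ID is the third colon-separated field of the header, as in the Affymetrix `probe:<array>:<probeset>:<x>:<y>; ...` layout. The function names and the dummy records are illustrative only.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def probes_for(handle, wanted):
    """Return {probeset: [sequences]} for the probesets listed in `wanted`."""
    wanted = set(wanted)
    hits = {}
    for header, seq in read_fasta(handle):
        parts = header.split(";")[0].split(":")
        if len(parts) < 3:
            continue  # header not in the assumed layout; skip it
        probeset = parts[2]
        if probeset in wanted:
            hits.setdefault(probeset, []).append(seq)
    return hits


# Dummy records in the assumed header layout (placeholder sequences)
demo = io.StringIO(
    ">probe:HG-U133_Plus_2:209844_at:1:1; Antisense;\nACGTACGT\n"
    ">probe:HG-U133_Plus_2:1007_s_at:2:2; Antisense;\nTTTTGGGG\n"
    ">probe:HG-U133_Plus_2:230105_at:3:3; Antisense;\nCCCCAAAA\n"
)
print(probes_for(demo, ["230105_at", "209844_at"]))
```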

Will (United States) wrote (9.2 years ago):

I'm not sure that information is actually contained in the CDF file. My understanding has always been that the CDF only keeps track of which probes belong to each probeset. If your custom CDF is in GEO, the record often has a link to the sequences. If you got it from some other website, you'll have to root around there.
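For a remapped CDF, one way to recover the sequences behind each new probeset is to join on probe coordinates. This works because the CDF records each probe's (X, Y) position, and the stock probe files also identify every sequence by X and Y. The Python below is a rough sketch under three assumptions: the custom CDF uses the same (X, Y) grid as the stock array, you have already extracted the custom CDF into a {probeset: [(x, y), ...]} dict, and you have built a {(x, y): sequence} dict from the stock probe file. The names and values are hypothetical.

```python
def sequences_for_custom_probesets(custom_cdf, seq_by_xy):
    """Attach sequences to the probes of each custom probeset.

    custom_cdf: {custom probeset ID: [(x, y), ...]} from the custom CDF.
    seq_by_xy:  {(x, y): sequence} from the stock probe file.
    Probes whose coordinates are missing from seq_by_xy are skipped.
    """
    result = {}
    for probeset, cells in custom_cdf.items():
        result[probeset] = [seq_by_xy[xy] for xy in cells if xy in seq_by_xy]
    return result


# Hypothetical inputs: dummy coordinates and placeholder sequences
custom_cdf = {"ENSG_demo_at": [(416, 177), (10, 20)]}
seq_by_xy = {(416, 177): "ACGTACGT", (10, 20): "GGGGCCCC", (5, 5): "TTTTAAAA"}
print(sequences_for_custom_probesets(custom_cdf, seq_by_xy))
```

Skipping unmatched coordinates silently keeps the sketch short. In practice, you may want to count those misses, because many of them would suggest the two files do not share a coordinate grid.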

Karolis wrote (7.7 years ago):

You can also get probe sequence information for stock CDFs this way:

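# Install the probe package with BiocManager, the current Bioconductor installer
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("hgu133plus2probe")
library(hgu133plus2probe)
head(hgu133plus2probe)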

For a custom CDF, you have to download and install the matching probe package. For example: http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/14.1.0/ensg.download/hgu133plus2hsensgprobe_14.1.0.tar.gz, which is listed on http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/14.1.0/ensg.asp

Then install the probe package from the command line:

R CMD INSTALL hgu133plus2hsensgprobe_14.1.0.tar.gz

Then run these commands in R:

library(hgu133plus2hsensgprobe)
head(hgu133plus2hsensgprobe)