Question

Fold Change Differences Between MBNI's Custom CDF Probeset and Affymetrix Original Probesets (NetAffx Annotation)

13.3 years ago

Lei Huang ▴ 10

Hello, I'm not sure whether anybody has addressed this issue here before. I'm using the custom CDF to analyze gene expression differences between two phenotypes in 3 individual tumor datasets (hgu133a, raw .CEL files). Preprocessing is done with rma().

I have an interesting observation about the fold changes between these two phenotypes. Take ESR1 expression as an example. At an FDR of 0.05, if I use the custom CDF file, this gene (probeset 2099_at, ENTREZG) is differentially expressed between the two phenotypes (using the limma package in Bioconductor). If I use the original Affymetrix CDF file, where 9 probesets are mapped to ESR1, two of the probesets (205225_at and 211235_s_at) are differentially expressed.

Here comes the part that I am not sure how to interpret. From the literature I learned that ESR1 has large fold changes between the phenotypes. But looking at the log2 fold changes listed below, the probeset from the custom CDF shows a much lower fold change than probeset 205225_at from the Affymetrix CDF. I understand that fold changes in microarray experiments are not as accurate as those from qPCR, but the difference in scale confuses me. Which fold change should I trust? I'd like to hear your take on this issue. Thanks a lot!

NetAffy

affyId dataset1 dataset2 dataset3
205225_at 4.040 3.580 4.130
211233_x_at 0.835 0.437 0.691
211234_x_at 0.656 0.378 0.517
211235_s_at 1.034 0.580 1.013
211627_x_at -0.008 0.069 0.012
215551_at 0.108 0.086 0.040
215552_s_at 0.802 0.582 0.989
217190_x_at 0.001 0.044 0.003
217163_at 0.293 0.209 0.301

Custom CDF

probesetId dataset1 dataset2 dataset3
2099_at 1.015 0.646 1.123
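
To make the difference in scale concrete, the log2 fold changes can be converted to linear ratios. The sketch below is only an arithmetic illustration in Python (not part of the original R analysis); it uses the dataset1 values from the tables above, and the dictionary labels and the `to_linear` helper are names made up for this example.

```python
# log2 fold changes for dataset1, copied from the tables above.
# The labels are illustrative; only the numbers come from the post.
log2_fc = {
    "205225_at (Affymetrix CDF)": 4.040,
    "211235_s_at (Affymetrix CDF)": 1.034,
    "2099_at (custom CDF)": 1.015,
}


def to_linear(log2_value):
    """Convert a log2 fold change into a linear-scale ratio."""
    return 2.0 ** log2_value


linear_fc = {name: to_linear(value) for name, value in log2_fc.items()}

for name, ratio in linear_fc.items():
    print(f"{name}: log2 FC = {log2_fc[name]:.3f}, linear FC = {ratio:.2f}")
```

On these numbers, 205225_at corresponds to roughly a 16-fold difference, while 2099_at and 211235_s_at both correspond to roughly a 2-fold difference.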

microarray affymetrix • 3.9k views

score 3 · Answer 1 · 2011-01-11

13.3 years ago

User 59 13k

You should probably trust the fold-change that you verify by qRT-PCR.

The thing is you can't expect the custom CDF to give you the same results as the original CDF. They might be similar in the main, but because you're using the custom CDF, you would need to know which probes were used in the re-annotated CDF probeset, and whether they genuinely matched the original CDF probeset.

It also seems likely that rma() operating on the same chip with 2 different underlying CDFs will give you different results, regardless of whether the probes making up the probesets are the same in each case.

The fact that only 2 of the 9 probesets from the original CDF are differentially expressed between your conditions suggests that using the remapped CDF file is probably a good bet anyway...

Is 2099_at really that different from 211235_s_at anyway?

I think your best bet for seeing whether 205225_at is any good as a probeset would be to take the probes and BLAT them against hg19. To put your mind at ease, check that they match what you think they match and that the matches are unique.
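
A minimal sketch of that uniqueness check, assuming the BLAT results have been saved in PSL format with the header lines removed. The column positions follow the standard 21-column PSL layout; the probe names, coordinates, and helper functions (`make_row`, `parse_psl_line`, `classify_probes`) are hypothetical and exist only for this illustration.

```python
from collections import defaultdict

PSL_COLUMNS = 21  # standard PSL has 21 tab-separated fields


def make_row(matches, mismatches, q_name, t_name, t_start, q_size=25):
    """Build one fake ungapped PSL row (illustrative values only)."""
    fields = [
        matches, mismatches, 0, 0,        # matches, misMatches, repMatches, nCount
        0, 0, 0, 0,                       # query/target insert counts and bases
        "+", q_name, q_size, 0, q_size,   # strand, qName, qSize, qStart, qEnd
        t_name, 171115067,                # tName, tSize (placeholder value)
        t_start, t_start + q_size,        # tStart, tEnd
        1, f"{q_size},", "0,", f"{t_start},",  # blockCount, blockSizes, qStarts, tStarts
    ]
    return "\t".join(str(f) for f in fields)


def parse_psl_line(line):
    """Pull out the PSL fields needed to judge a probe alignment."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < PSL_COLUMNS:
        raise ValueError(f"expected {PSL_COLUMNS} PSL fields, got {len(fields)}")
    return {
        "matches": int(fields[0]),
        "mismatches": int(fields[1]),
        "rep_matches": int(fields[2]),
        "q_name": fields[9],
        "q_size": int(fields[10]),
        "t_name": fields[13],
        "t_start": int(fields[15]),
        "block_count": int(fields[17]),
    }


def classify_probes(psl_lines, probe_names):
    """Label each probe by its number of perfect, ungapped genomic hits."""
    hits = defaultdict(list)
    for line in psl_lines:
        if not line.strip():
            continue
        rec = parse_psl_line(line)
        full_length = rec["matches"] + rec["rep_matches"] == rec["q_size"]
        if full_length and rec["mismatches"] == 0 and rec["block_count"] == 1:
            hits[rec["q_name"]].append((rec["t_name"], rec["t_start"]))
    labels = {}
    for name in probe_names:
        n = len(hits[name])
        labels[name] = "unique" if n == 1 else "no perfect hit" if n == 0 else "multi-mapping"
    return labels


# Hypothetical BLAT output for three 25-mer probes.
sample_psl = [
    make_row(25, 0, "probe_1", "chr6", 1000),
    make_row(25, 0, "probe_2", "chr6", 2000),
    make_row(25, 0, "probe_2", "chrX", 5000),
    make_row(24, 1, "probe_3", "chr6", 3000),
]
status = classify_probes(sample_psl, ["probe_1", "probe_2", "probe_3"])
print(status)
```

Requiring a single ungapped block is a deliberate simplification: a probe that spans an exon junction aligns to the genome in two blocks and would be reported here as having no perfect hit, so such probes need a separate look.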

Thanks Daniel! The fold change of 2099_at from the custom CDF is actually similar to that of 211235_s_at from NetAffy, as you can see. I'll take a closer look at the probes as you suggested. Thanks again! -Lei

Reply • 13.3 years ago by Lei Huang ▴ 10

Following Daniel's suggestion, I extracted the probe sequences for 205225_at from NetAffy and the probe sequences for 2099_at from the custom CDF file, then BLATed them against hg19. They all map 100% to the genomic sequence of ESR1. The 11 probes of 205225_at map to the far 3' end, while the 46 probe sequences for 2099_at (including the 11 probes above) are spread across the genomic region of the gene. The latter may better represent the gene as a whole.

Reply • 13.3 years ago by Lei Huang ▴ 10

Of course, the probes being spread over the length of the gene, rather than concentrated at the 3' end, might leave you at the mercy of a) splice variants and b) RNA degradation, both of which might skew the results. A quick glance suggests ESR1 has all kinds of alternatively spliced isoforms and a number of different promoters, so are you sure which isoforms you're picking up across the length of the gene with those probesets?

Reply • 13.3 years ago by User 59 13k
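
One way to see how probe composition alone could shrink a probeset-level fold change is a toy calculation. The sketch below is not RMA: rma() summarizes with a median polish over a probe-by-array matrix, and a plain median across probes is used here only as a crude stand-in for robust summarization. The probe counts (11 and 46) come from the thread, but every probe-level value is invented, and the assumption that the non-3' probes respond more weakly is exactly that, an assumption the thread does not establish.

```python
from statistics import mean, median

# Probe counts reported in the thread.
N_THREE_PRIME = 11   # probes in 205225_at, all at the 3' end
N_TOTAL = 46         # probes in 2099_at, including the 11 above

# Invented per-probe log2 differences between the two phenotypes.
# ASSUMPTION: 3' probes respond strongly, the remaining probes weakly.
three_prime_probes = [4.0] * N_THREE_PRIME
other_probes = [0.8] * (N_TOTAL - N_THREE_PRIME)

narrow_probeset = three_prime_probes                 # stands in for 205225_at
broad_probeset = three_prime_probes + other_probes   # stands in for 2099_at

summary = {
    "narrow_median": median(narrow_probeset),
    "broad_median": median(broad_probeset),
    "broad_mean": mean(broad_probeset),
}

for label, value in summary.items():
    print(f"{label}: {value:.2f}")
```

Under these made-up numbers the broad probeset's summary is dominated by the 35 weaker probes, so its fold change lands far below that of the 3'-only probeset even though both contain the same 11 strongly responding probes. Whether the real probes behave this way would need checking at the probe level.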

score 0 · Answer 2 · 2011-01-12

13.3 years ago

Lei Huang • 0

Hi Daniel, those probes map to four different ESR1 isoforms, so the measured gene expression is a consensus of those isoforms on the HGU133A chip. Correct me if I'm wrong. Thanks again for the very helpful answer!