Batch effect in Axiom Human Origins Array
6.8 years ago
Simo ▴ 50

I'm working with an Axiom Human Origins Array. The samples were genotyped on two separate plates, and the PCA clearly separates the two batches, so I suspect a batch effect. I'd like to normalize the array data and redo the analysis. So far I've been trying dChip, but it rejects the CDF file I provided with "Not a CDF file or not in text format" (I downloaded it from the Affymetrix web page, and I can't see what's wrong with it). Any suggestions would be helpful, not only on dChip but also on how to deal with the batch effect and how to normalize the data.

Thank you

batcheffect array affymetrix • 2.4k views

Hi F.R., I have the exact same problem. Could you please briefly summarize how you dealt with it? I would appreciate it a lot!

6.8 years ago
Ar ★ 1.1k

I have a couple of questions. Is the dataset public? If so, do you mind sharing the GEO ID? I think I can help you analyze it. Also, if you don't know whether the samples differ in condition or genotype, how can you tell that the batch effect will play any role in the analysis? It is possible that the non-biological variation does not confound the results. I would recommend looking at the experimental design to see whether it makes sense.

Regarding the analysis, I would recommend using R rather than dChip. R has many packages that are useful for processing microarray datasets. Here is what I would do:

1. Use read.celfiles from the oligo package to read the CEL files.
2. Use rma from the oligo package to do the normalization (rma from the affy package expects data read with affy, not with oligo).
3. Get the normalized data using the exprs function from the Biobase package.
4. Draw box plots and PCA plots to check whether the data are normalized and whether the samples cluster by genotype.
5. If they do not cluster by genotype, use ComBat (if you know the batch) or sva (if you don't know the batch).
6. Repeat step 4 to check whether the ComBat- or sva-corrected dataset now clusters by genotype.
7. Continue the downstream analysis in limma using the batch-corrected matrix.
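
The before/after check in steps 4 to 6 can be illustrated with a small sketch. This is not ComBat or sva: it is plain per-batch mean centering, a much simpler location-only adjustment, written in Python only for illustration. The function names and the toy values are hypothetical, and it assumes the normalized values have already been exported as plain numbers. Note that centering also removes any real difference between the plates, so it is only meaningful when the plates are expected to be comparable.

```python
from statistics import mean


def center_by_batch(values, batches):
    """Shift each batch so its mean matches the overall mean of one feature.

    values  -- measurements for one feature, one number per sample
    batches -- batch label for each sample, in the same order as values
    """
    overall = mean(values)
    batch_means = {
        b: mean(v for v, lab in zip(values, batches) if lab == b)
        for b in set(batches)
    }
    return [v - batch_means[b] + overall for v, b in zip(values, batches)]


def batch_gap(values, batches):
    """Largest difference between any two batch means for one feature."""
    means = [
        mean(v for v, lab in zip(values, batches) if lab == b)
        for b in set(batches)
    ]
    return max(means) - min(means)


# Hypothetical toy data: one feature, six samples, two plates.
plates = ["P1", "P1", "P1", "P2", "P2", "P2"]
feature = [7.9, 8.1, 8.0, 9.0, 9.2, 9.1]

corrected = center_by_batch(feature, plates)
print("gap before centering:", round(batch_gap(feature, plates), 3))
print("gap after centering: ", round(batch_gap(corrected, plates), 3))
```

If the gap between plates shrinks but the samples still do not cluster as expected, the remaining structure is probably not a simple shift, and a model-based method such as ComBat is the more appropriate tool.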

Unfortunately I cannot share the data.

Regarding the batch effect, I'm analyzing samples that come from the same area, so they should cluster together. Since the split in the PCA exactly matches the two plates, I can only suppose that something is affecting the raw data (e.g. a batch effect).

Furthermore, the Axiom technology does not seem to be supported by several R packages, so I can't run the analyses with them. I'm trying the APT tools from Affymetrix instead. I'm currently normalizing the CEL files; then I can rerun the genotyping and see whether anything has changed.

Unfortunately (my own limitation, I guess), getting the new PCA is not quick. The genotype data from this array are a mess and need to be post-corrected for several issues.

I'd like to know how I can check the raw data to be sure that I've fixed the "problem" before moving further.

I really thank you for your suggestions and time.
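
One possible way to look at the raw data before genotyping is to summarize each sample by a single number and compare the plates. The sketch below is only an illustration in Python under stated assumptions: the log2 probe intensities are assumed to have been exported from the CEL files into plain lists by some other tool, and the function names, the toy values and the idea of using the per-sample median are hypothetical choices, not part of any Affymetrix workflow.

```python
from statistics import mean, median, stdev


def sample_medians(intensities):
    """Median log2 intensity per sample (one list of probe values per sample)."""
    return [median(probes) for probes in intensities]


def plate_shift(medians, plates, a, b):
    """Difference between the mean sample median of plates a and b,
    scaled by the pooled standard deviation of the two plates.
    Needs at least two samples per plate and some spread within the plates."""
    xa = [m for m, p in zip(medians, plates) if p == a]
    xb = [m for m, p in zip(medians, plates) if p == b]
    pooled_var = (
        (len(xa) - 1) * stdev(xa) ** 2 + (len(xb) - 1) * stdev(xb) ** 2
    ) / (len(xa) + len(xb) - 2)
    return (mean(xa) - mean(xb)) / pooled_var ** 0.5


# Hypothetical toy data: four samples, five probes each, two plates.
plates = ["P1", "P1", "P2", "P2"]
intensities = [
    [8.0, 8.2, 7.9, 8.1, 8.3],
    [8.1, 8.0, 8.4, 8.2, 8.3],
    [9.0, 9.1, 8.9, 9.3, 9.2],
    [9.2, 9.0, 9.4, 9.3, 9.5],
]

medians = sample_medians(intensities)
print("per-sample medians:", medians)
print("scaled plate shift:", round(plate_shift(medians, plates, "P1", "P2"), 2))
```

A scaled shift far from zero before normalization and close to zero afterwards would suggest that the overall intensity difference between plates has been removed, although it says nothing about probe-specific effects.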

6.8 years ago
ablanchetcohen ★ 1.2k

The ComBat function, available in the R package sva, is the most commonly used method to remove batch effects. https://bioconductor.org/packages/release/bioc/html/sva.html

Batch correction only works if you have common conditions between the batch groups. If the two batches only have samples belonging to different conditions, there is no means of distinguishing the batch effect from the biological differences between the conditions.
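
A quick way to see whether a design allows batch correction at all is to list which conditions occur in which batch. The following is a minimal sketch in Python, assuming a sample sheet with one batch label and one condition label per sample; the function names and the example labels are hypothetical.

```python
from collections import defaultdict


def conditions_per_batch(batches, conditions):
    """Map each batch label to the set of conditions observed in it."""
    result = defaultdict(set)
    for batch, condition in zip(batches, conditions):
        result[batch].add(condition)
    return dict(result)


def shared_conditions(batches, conditions):
    """Conditions that occur in every batch (empty set if there are none)."""
    groups = list(conditions_per_batch(batches, conditions).values())
    return set.intersection(*groups) if groups else set()


# Hypothetical sample sheets for two plates.
batches = ["P1", "P1", "P2", "P2"]
balanced = ["case", "control", "case", "control"]
confounded = ["case", "case", "control", "control"]

print("balanced design shares:  ", shared_conditions(batches, balanced))
print("confounded design shares:", shared_conditions(batches, confounded))
```

An empty result means batch and condition are completely confounded, so any correction would remove the biological difference together with the batch effect.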


Thank you.

I don't know whether there are common conditions between the batch groups. The only thing I can see is that my PCA splits the 2 batches perfectly, which should not be possible, so I suppose it's due to a batch effect.

I'm trying to manage the data in R, but there must be some issue with the CEL and CDF files: neither is recognized as what it is. The official software, Axiom Analysis Suite 1.1 from Affymetrix, reads the CEL files without any problem. I suppose that with this array the CEL files are in a format that is not supported by the dedicated R packages.

However, I'll try to figure it out.

Thank you very much for your support.