Batch correction between cancer only data and normal only data
11 days ago
Eren • 0

I am working with methylation data for which Beta values are already available. My normal samples were assembled by combining several GSE datasets, and I can batch-correct those using ComBat. However, my cancer samples all come from a single large dataset that contains no normal samples. I want to compare cancer and normal, but first I want to know whether I should apply batch correction, because the two groups do not come from the same source. The problem is that the cancer source contains only cancer samples and the normal sources contain only normal samples, so functions such as ComBat do not work.

Is it necessary to remove batch effects in such a situation? Currently, my cancer and normal samples are well separated in the t-SNE plot, while the normal samples do not show any source-dependent clustering.

If necessary, how can I do it in such a situation?

methylation batch-effect • 203 views
10 days ago

In general you need to do batch correction when samples come from different sources, BUT you cannot do batch correction when no condition is shared between the sources. That means there is no robust way to do what you want to do (compare cancer and normal from different sources).
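To make "no overlapping samples" concrete: a design is confounded in this sense when every batch contains only one condition. A minimal sketch of that check follows. The batch labels and the `is_confounded` helper are made up for illustration and are not part of ComBat or any other package.

```python
from collections import defaultdict


def is_confounded(samples):
    """Return True if no batch contains more than one condition.

    `samples` is a list of (batch, condition) pairs. When every batch holds a
    single condition, the condition can be predicted exactly from the batch,
    so the two effects cannot be separated.
    """
    conditions_per_batch = defaultdict(set)
    for batch, condition in samples:
        conditions_per_batch[batch].add(condition)
    return all(len(conds) == 1 for conds in conditions_per_batch.values())


# Hypothetical layout matching the question: normals from several GSE
# series, cancers from one separate source.
question_design = [
    ("GSE_A", "normal"),
    ("GSE_B", "normal"),
    ("cancer_source", "cancer"),
    ("cancer_source", "cancer"),
]

# Hypothetical layout where one batch contains both conditions.
overlapping_design = [
    ("batch_1", "normal"),
    ("batch_1", "cancer"),
    ("batch_2", "normal"),
]

print(is_confounded(question_design))
print(is_confounded(overlapping_design))
```

The first layout is confounded and the second is not, because `batch_1` gives a within-batch comparison of cancer and normal.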

You can think of the difference between the Beta values in two samples as the sum of the difference due to the condition and the difference due to batch effect. You can calculate the size of the batch effect by looking at the difference between samples that should have no difference due to condition (it's a little more complex than this in reality, not least because ComBat has to estimate the batch effects as well as remove them, but it'll do for this explanation).
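The additive picture above can be sketched with toy numbers. Everything here (the effect sizes, the baseline, the `beta` helper) is invented for illustration and ignores noise entirely. It is not how ComBat works internally, only a demonstration of why shared conditions are needed.

```python
# Toy values, chosen only for illustration.
CONDITION_EFFECT = 0.20  # true cancer-minus-normal difference
BATCH_EFFECT = 0.10      # true shift of batch 2 relative to batch 1
BASELINE = 0.30          # Beta value of a normal sample in batch 1


def beta(is_cancer, in_batch2):
    """Noise-free Beta value under a purely additive model."""
    return BASELINE + CONDITION_EFFECT * is_cancer + BATCH_EFFECT * in_batch2


# Overlapping design: normals exist in both batches, so the batch effect
# can be read off from samples that share a condition.
est_batch = beta(0, 1) - beta(0, 0)
est_condition = (beta(1, 1) - est_batch) - beta(0, 0)

# Confounded design: normals only in batch 1, cancers only in batch 2.
# Only the total difference is observable.
observed_diff = beta(1, 1) - beta(0, 0)

# Very different (condition, batch) splits reproduce that total equally well.
candidate_splits = [(0.30, 0.00), (0.20, 0.10), (0.00, 0.30)]
all_fit = all(abs((c + b) - observed_diff) < 1e-9 for c, b in candidate_splits)

print(round(est_batch, 2), round(est_condition, 2))
print(round(observed_diff, 2), all_fit)
```

In the confounded case the data alone cannot distinguish "all condition" from "all batch", which is the reason correction tools refuse to run.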

What can you do then? If this were RNA-seq, I'd probably tell you that you can't do the analysis (you'll find plenty of questions on this site where I say exactly this in similar situations), but I don't know enough about the statistical properties of methylation data to be so confident. You could check that most sites don't change and that there is no systematic difference between the datasets. However, this assumes that any difference in bias across the datasets is independent of the methylation site (i.e. the same biases apply to all locations).
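A rough sketch of that sanity check, assuming you already have the per-site mean Beta value for each dataset in the same site order. The `summarize_site_differences` helper, the 0.05 tolerance and the example numbers are all arbitrary choices for illustration, not established thresholds.

```python
from statistics import median


def summarize_site_differences(means_a, means_b, tolerance=0.05):
    """Summarize per-site differences in mean Beta between two datasets.

    Returns the median difference (b minus a) and the fraction of sites whose
    absolute difference is within `tolerance`.
    """
    if len(means_a) != len(means_b):
        raise ValueError("both datasets must cover the same sites")
    diffs = [b - a for a, b in zip(means_a, means_b)]
    unchanged = sum(1 for d in diffs if abs(d) <= tolerance)
    return {
        "median_diff": median(diffs),
        "fraction_unchanged": unchanged / len(diffs),
    }


# Made-up per-site means for five CpG sites.
normal_means = [0.10, 0.50, 0.80, 0.20, 0.90]
cancer_means = [0.11, 0.49, 0.80, 0.60, 0.91]

summary = summarize_site_differences(normal_means, cancer_means)
print(summary)
```

A median difference near zero with most sites unchanged is reassuring but not proof. A clear global shift cannot be assigned to batch or to biology from these data, and a site-specific bias would not show up in this summary at all.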

