Question: How to remove batch effects from copy number segment means
Hi there,

I am wondering how to remove batch effects from the segmented_scna data downloaded from the TCGA PANCANA project. The data format looks like this:

Sample  Chromosome  Start   End Num_Probes  Segment_Mean
TCGA-KL-8323-11A-01D-2308-01    1   3218610 104558357   58272   0.0026
TCGA-KL-8323-11A-01D-2308-01    1   104561488   104573702   10  -0.6372
TCGA-KL-8323-11A-01D-2308-01    1   104579877   179610058   27754   0.0041
TCGA-KL-8323-11A-01D-2308-01    1   179621932   179622081   2   -1.6956
TCGA-KL-8323-11A-01D-2308-01    1   179623244   247813706   43114   0.0043
TCGA-KL-8323-11A-01D-2308-01    2   484222  242476062   131310  0.006
TCGA-KL-8323-11A-01D-2308-01    3   2212571 197538677   106379  0.0022
TCGA-KL-8323-11A-01D-2308-01    4   1053934 71781186    38527   0.0048
TCGA-KL-8323-11A-01D-2308-01    4   71781554    71782247    2   -2.2184

I am trying to remove batch effects across tumor types, but I am not sure whether the Segment_Mean values can be treated like gene expression values and corrected with ComBat.

If not, could anyone give me some suggestions? Many thanks in advance!
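One practical obstacle to treating Segment_Mean "like gene expression": ComBat-style tools expect a features × samples matrix with the same features in every sample, whereas segment boundaries differ from sample to sample. Below is a minimal sketch, assuming inclusive Start/End coordinates, of one common way to get a shared feature set: averaging each sample's segments over fixed genomic bins, weighted by base-pair overlap. The function names, the 1 Mb default bin size and the overlap weighting are illustrative choices, not part of any TCGA or ComBat API.

```python
from collections import defaultdict


def read_segments(lines):
    """Parse whitespace-separated rows with the columns
    Sample, Chromosome, Start, End, Num_Probes, Segment_Mean."""
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "Sample":  # skip blank lines and the header
            continue
        sample, chrom, start, end, _num_probes, seg_mean = fields
        yield sample, chrom, int(start), int(end), float(seg_mean)


def segments_to_bins(segments, bin_size=1_000_000):
    """Average each sample's Segment_Mean over fixed genomic bins.

    Assumes inclusive Start/End coordinates. Each segment contributes to a
    bin in proportion to the number of bases it overlaps. Returns
    {(chrom, bin_start): {sample: weighted mean}}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    weights = defaultdict(lambda: defaultdict(float))
    for sample, chrom, start, end, seg_mean in segments:
        first_bin = (start // bin_size) * bin_size
        for bin_start in range(first_bin, end + 1, bin_size):
            bin_end = bin_start + bin_size - 1
            overlap = min(end, bin_end) - max(start, bin_start) + 1
            if overlap > 0:
                key = (str(chrom), bin_start)
                sums[key][sample] += seg_mean * overlap
                weights[key][sample] += overlap
    return {key: {s: sums[key][s] / weights[key][s] for s in sums[key]} for key in sums}


def complete_matrix(binned, samples):
    """Keep only bins observed in every sample (batch-correction tools
    generally expect no missing values). Returns (bin keys, rows)."""
    keys = sorted(k for k, per_sample in binned.items() if all(s in per_sample for s in samples))
    return keys, [[binned[k][s] for s in samples] for k in keys]
```

This only covers the reshaping step. Producing a matrix ComBat will accept says nothing about whether a batch correction is justified for these data.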

Why do you believe there is a batch effect?

Because it is a pan-cancer analysis, there may be a batch effect.

If you are unsure whether a batch effect exists in the first place, you should not blindly assume that one does. Doing so could lead you to over-adjust your data, to the point of removing whatever clinically interesting signal it contains. Copy number profiles genuinely differ among cancer types, and also with the grade and stage of each cancer. How would you distinguish technical from biological variability in this context?
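One way to start answering that question is to look at candidate technical variables and check whether they are confounded with cancer type. TCGA aliquot barcodes encode the plate and the analysis center, so these can be pulled out directly. The sketch below is a hypothetical helper, not an official TCGA tool. It assumes full-length aliquot barcodes like the one in the question, and that cancer type comes from the clinical annotation (it is not part of the barcode).

```python
from collections import Counter, defaultdict


def parse_tcga_barcode(barcode):
    """Split a full TCGA aliquot barcode into its named fields."""
    _project, tss, participant, sample_vial, portion_analyte, plate, center = barcode.split("-")
    return {
        "tss": tss,                             # tissue source site
        "participant": participant,
        "sample_type": int(sample_vial[:2]),    # 01-09 tumour, 10-19 normal
        "vial": sample_vial[2:],
        "portion": portion_analyte[:2],
        "analyte": portion_analyte[2:],         # e.g. D = DNA
        "plate": plate,
        "center": center,
    }


def is_tumor(info):
    """TCGA sample type codes 01-09 denote tumour samples."""
    return 1 <= info["sample_type"] <= 9


def plates_per_cancer_type(samples):
    """samples: iterable of (barcode, cancer_type). Returns {plate: Counter of cancer types}."""
    table = defaultdict(Counter)
    for barcode, cancer_type in samples:
        table[parse_tcga_barcode(barcode)["plate"]][cancer_type] += 1
    return table


def confounded_plates(table):
    """Plates holding a single cancer type: a plate effect there cannot be
    separated from a tumour-type effect."""
    return sorted(plate for plate, counts in table.items() if len(counts) == 1)
```

If most plates turn out to hold a single cancer type, plate and tumour type are confounded and no batch-correction method can tell them apart. Note also that the sample in the question, `TCGA-KL-8323-11A-01D-2308-01`, has sample type 11, i.e. a normal rather than a tumour sample, which is one more biological variable to keep separate from any technical one.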

You could simply include 'CancerType' as a covariate in whatever statistical model you are fitting, and proceed from there.
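To make the covariate suggestion concrete, here is a minimal plain-Python sketch of what "include CancerType as a covariate" means in an ordinary linear model: cancer type enters as dummy variables, so the effect of interest is estimated within cancer types rather than across them. The names `group` (a hypothetical 0/1 variable of interest) and the per-sample copy number value are illustrative assumptions. In practice you would use an established package, e.g. R's `lm(value ~ group + CancerType)` or a statsmodels formula, rather than hand-rolled normal equations.

```python
def design_matrix(groups, cancer_types):
    """Intercept + group + one dummy per non-reference cancer type."""
    levels = sorted(set(cancer_types))
    others = levels[1:]  # levels[0] is the reference category
    rows = [
        [1.0, float(g)] + [1.0 if ct == lvl else 0.0 for lvl in others]
        for g, ct in zip(groups, cancer_types)
    ]
    names = ["intercept", "group"] + [f"CancerType[{lvl}]" for lvl in others]
    return rows, names


def solve(a, b):
    """Solve a square linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            # Happens e.g. when group is perfectly confounded with cancer type.
            raise ValueError("singular design: effects cannot be separated")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(x, y):
    """Least-squares coefficients from the normal equations X'X beta = X'y."""
    p = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)
```

The same logic as for batch effects applies here: if the variable of interest is perfectly confounded with cancer type, the design is singular and the model cannot separate the two, which the sketch reports as a `ValueError`.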

