ComBat on TCGA RNAseq data generates all NAN
1
0
Entering edit mode
3.8 years ago
mforde84 ★ 1.3k

Hi,

I have FPKM-UQ data from COAD-TCGA.

I generated an expression set of this data using:

> edata = log2(data + 1)
> edata[1:2,1:2]
      X01240896.3f3f.4bf9.9799.55c87bfacf36
1                                  8.540967
10                                13.528968
100                               16.296422
1000                              13.658199
10000                             15.143788
      X01ad5016.f691.4bca.82a0.910429d8d25b
1                                  9.886537
10                                16.719682
100                               17.212312
1000                              10.317842
10000                             13.166767

Row name is entrezid, and column names are case_ids.

The batch data provided by TCGA which indicates a total N = 42

> omics[1:5,]
                                      snf    boo order batch
X01240896.3f3f.4bf9.9799.55c87bfacf36   3 192807     1    22
X01ad5016.f691.4bca.82a0.910429d8d25b   2  82213     2     2
X01f493d4.229d.47a6.baa8.32a342c65d01   1 A08907     3    29
X022f39e9.57ee.4b2b.8b3a.8929e3d69a37   2 A16S13     4    32
X02f9668c.71e6.485f.88b1.b37dc8bdd2ab   3 102113     5     7
> batch = omics$batch
> batch
  [1] 22  2 29 32  7  1  8  8  7 41 26 14 26 26  1  5 22  5 15 18 28 12 12  8 19
 [26] 13 38  5 27  7 41  1 22 26 28 22 33 14 13  8 10 13 11 17  5 38  5  7  7 31
 [51] 22 42 11  8  7 17 15 17 12 17  2 19  2 12 18 14 22 11 21 15 17 17 33 15 19
 [76] 15  6  1 10 11  2 28 36 17  9 15 12 17 15 10  6  8 13 25 11 15  1 13 24  8
[101] 13 15 17  8  2 22 17 11 13 37 19 38 13 19  5 16  5  5 22 35 35 19 13 17 22
[126]  7  8  8 41 39 14  6 19 18 38 22 14 30 19 24 11 14 13 16  7 13  8 22 12  8
[151] 22 17  2 40 13 35 15  5 24 19 41 22 22 19 17  8 13 33 10 17 29 12  9 11 24
[176] 27  5 14  7 25 13 35  9 18 16 13 31 25 18 10 35 18 14 33  5 19 17  8 11 11
[201] 42  7 10  8 39 19  8  7 11 19 31 35 28 36 13 37 10 38  3 18 13 34 39 13  3
[226] 40 31 17 19 26  7 19 12 22 19 17 12 17 13 12 28 33 37 17 35 18 40 13 20 35
[251] 17  6 15 14 12 27  5 41 31 41 18 21 19 17 40  8 41  2  5 15  5 15 15  5 39
[276]  1 22 33 13 17  7 11 15  3  7 38 15  7 14 14 22 31 14 18  3 19  7  9  6  5
[301] 22  8 11  1 13 10 19  8 14 15  6 32  1  2 19  4  5 13 18 42 11 17  8 36 19
[326] 15 22 28 15  8 11  8 15 38 19 39  7 11 42 23  5  1 22 17 35 13 40 25  7 41
[351] 38  5  8 19 31  6 38  8 36  4 15 15  7 12 22 38 29 35 15  2 10  2 19 15 25
[376]  5 39 30 19 18 15 10 10 33  1 19 32 19 19 15 20 15  7 22 35 22 30 14 12 16
[401] 21 13 15 18  7  5  6 31 12  7 15 32 33 18 14 15 15 26 31  1 14 19 17 18  1
[426]  5 10 32

Unfortunately, when I run the ComBat analysis it generates all NANs, and no values:

> modcombat = model.matrix(~1, data=omics)
> combat_edata = ComBat(dat=edata, batch=batch, mod=modcombat, par.prior=TRUE, prior.plots=FALSE)
> combat_edata[1:5,1:2]
      X01240896.3f3f.4bf9.9799.55c87bfacf36
1                                       NaN
10                                      NaN
100                                     NaN
1000                                    NaN
10000                                   NaN
      X01ad5016.f691.4bca.82a0.910429d8d25b
1                                       NaN
10                                      NaN
100                                     NaN
1000                                    NaN
10000                                   NaN

Any suggestions?

ComBat tcga • 1.8k views
ADD COMMENT
0
Entering edit mode
3.8 years ago
mforde84 ★ 1.3k

I think I might know what the problem is.

I plotted the log2 corrected values, and the distribution may not be normal because of the 0 values present in RNASeq data. If I understand correctly, this data is actually missing, whereas with my current analysis it is viewed by R as an actual value of 0. So either the missing data needs to be imputed, or the data needs to be scaled and normalized to mean of 0, then subsequently log2 transformed.

> data <- standardNormalization(data)
> data <- log2(data + 1)
> combat_data = ComBat(dat=data, batch=batch, mod=modcombat, par.prior=TRUE, prior.plots=FALSE)
> combat_data[1:5,1:2]
          X01240896.3f3f.4bf9.9799.55c87bfacf36
100507661                           -0.13857428
57103                               -0.06700506
22838                                0.16295096
55567                               -0.13579991
6147                                 1.64990772
          X01ad5016.f691.4bca.82a0.910429d8d25b
100507661                           -0.13789534
57103                               -0.06210232
22838                                0.01546224
55567                               -0.13443289
6147                                 1.48473072
ADD COMMENT
0
Entering edit mode

Hello,

Where did you get the batch information from?

ADD REPLY

Login before adding your answer.

Traffic: 2849 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6