Question: ComBat on TCGA RNAseq data generates all NAN
0
gravatar for mforde84
22 months ago by
mforde841.2k
mforde841.2k wrote:

Hi,

I have FPKM-UQ data from COAD-TCGA.

I generated an expression set of this data using:

> edata = log2(data + 1)
> edata[1:2,1:2]
      X01240896.3f3f.4bf9.9799.55c87bfacf36
1                                  8.540967
10                                13.528968
100                               16.296422
1000                              13.658199
10000                             15.143788
      X01ad5016.f691.4bca.82a0.910429d8d25b
1                                  9.886537
10                                16.719682
100                               17.212312
1000                              10.317842
10000                             13.166767

Row name is entrezid, and column names are case_ids.

The batch data provided by TCGA which indicates a total N = 42

> omics[1:5,]
                                      snf    boo order batch
X01240896.3f3f.4bf9.9799.55c87bfacf36   3 192807     1    22
X01ad5016.f691.4bca.82a0.910429d8d25b   2  82213     2     2
X01f493d4.229d.47a6.baa8.32a342c65d01   1 A08907     3    29
X022f39e9.57ee.4b2b.8b3a.8929e3d69a37   2 A16S13     4    32
X02f9668c.71e6.485f.88b1.b37dc8bdd2ab   3 102113     5     7
> batch = omics$batch
> batch
  [1] 22  2 29 32  7  1  8  8  7 41 26 14 26 26  1  5 22  5 15 18 28 12 12  8 19
 [26] 13 38  5 27  7 41  1 22 26 28 22 33 14 13  8 10 13 11 17  5 38  5  7  7 31
 [51] 22 42 11  8  7 17 15 17 12 17  2 19  2 12 18 14 22 11 21 15 17 17 33 15 19
 [76] 15  6  1 10 11  2 28 36 17  9 15 12 17 15 10  6  8 13 25 11 15  1 13 24  8
[101] 13 15 17  8  2 22 17 11 13 37 19 38 13 19  5 16  5  5 22 35 35 19 13 17 22
[126]  7  8  8 41 39 14  6 19 18 38 22 14 30 19 24 11 14 13 16  7 13  8 22 12  8
[151] 22 17  2 40 13 35 15  5 24 19 41 22 22 19 17  8 13 33 10 17 29 12  9 11 24
[176] 27  5 14  7 25 13 35  9 18 16 13 31 25 18 10 35 18 14 33  5 19 17  8 11 11
[201] 42  7 10  8 39 19  8  7 11 19 31 35 28 36 13 37 10 38  3 18 13 34 39 13  3
[226] 40 31 17 19 26  7 19 12 22 19 17 12 17 13 12 28 33 37 17 35 18 40 13 20 35
[251] 17  6 15 14 12 27  5 41 31 41 18 21 19 17 40  8 41  2  5 15  5 15 15  5 39
[276]  1 22 33 13 17  7 11 15  3  7 38 15  7 14 14 22 31 14 18  3 19  7  9  6  5
[301] 22  8 11  1 13 10 19  8 14 15  6 32  1  2 19  4  5 13 18 42 11 17  8 36 19
[326] 15 22 28 15  8 11  8 15 38 19 39  7 11 42 23  5  1 22 17 35 13 40 25  7 41
[351] 38  5  8 19 31  6 38  8 36  4 15 15  7 12 22 38 29 35 15  2 10  2 19 15 25
[376]  5 39 30 19 18 15 10 10 33  1 19 32 19 19 15 20 15  7 22 35 22 30 14 12 16
[401] 21 13 15 18  7  5  6 31 12  7 15 32 33 18 14 15 15 26 31  1 14 19 17 18  1
[426]  5 10 32

Unfortunately, when I run the ComBat analysis it generates all NANs, and no values:

> modcombat = model.matrix(~1, data=omics)
> combat_edata = ComBat(dat=edata, batch=batch, mod=modcombat, par.prior=TRUE, prior.plots=FALSE)
> combat_edata[1:5,1:2]
      X01240896.3f3f.4bf9.9799.55c87bfacf36
1                                       NaN
10                                      NaN
100                                     NaN
1000                                    NaN
10000                                   NaN
      X01ad5016.f691.4bca.82a0.910429d8d25b
1                                       NaN
10                                      NaN
100                                     NaN
1000                                    NaN
10000                                   NaN

Any suggestions?

combat tcga • 961 views
ADD COMMENTlink modified 5 months ago by coyoung0 • written 22 months ago by mforde841.2k
0
gravatar for mforde84
22 months ago by
mforde841.2k
mforde841.2k wrote:

I think I might know what the problem is.

I plotted the log2 corrected values, and the distribution may not be normal because of the 0 values present in RNASeq data. If I understand correctly, this data is actually missing, whereas with my current analysis it is viewed by R as an actual value of 0. So either the missing data needs to be imputed, or the data needs to be scaled and normalized to mean of 0, then subsequently log2 transformed.

> data <- standardNormalization(data)
> data <- log2(data + 1)
> combat_data = ComBat(dat=data, batch=batch, mod=modcombat, par.prior=TRUE, prior.plots=FALSE)
> combat_data[1:5,1:2]
          X01240896.3f3f.4bf9.9799.55c87bfacf36
100507661                           -0.13857428
57103                               -0.06700506
22838                                0.16295096
55567                               -0.13579991
6147                                 1.64990772
          X01ad5016.f691.4bca.82a0.910429d8d25b
100507661                           -0.13789534
57103                               -0.06210232
22838                                0.01546224
55567                               -0.13443289
6147                                 1.48473072
ADD COMMENTlink written 22 months ago by mforde841.2k

Hello,

Where did you get the batch information from?

ADD REPLYlink written 5 months ago by coyoung0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1666 users visited in the last hour