I have a question about the TCGA 450K methylation data.
The Level 2 data has values for every probe, but in the Level 3 beta values many probes are NA (e.g., cg00000108 and cg00000109).
Level 2:
Composite Element REF  Methylated_Intensity  Unmethylated_Intensity  Detection_P_value
cg00000029             2488.00579881129      2281.3142892634         0
cg00000108             8943.62421381116      336.745081332759        0
cg00000109             3827.0493383932       219.47270455192         0
cg00000165             263.820225926362      2355.4623873349         0
cg00000236             3733.92206994152      722.124674419151        0
Level 3:
Composite Element REF  Beta_value         Gene_Symbol  Chromosome  Genomic_Coordinate
cg00000029             0.521668865344633  RBL2         16          53468112
cg00000108             NA                 C3orf35      3           37459206
cg00000109             NA                 FNDC3B       3           171916037
cg00000165             0.100722321673368               1           91194674
cg00000236             0.837944995677383  VDAC3        8           42263294
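As a sanity check on the two tables, here is a minimal Python sketch that recomputes beta from the Level 2 intensities. It assumes the plain formula beta = M / (M + U) with no offset, which is my assumption and not something I found documented for the TCGA pipeline; the names `level2`, `level3`, and `beta_value` are only for illustration. For the three non-NA probes this reproduces the Level 3 values to within rounding, and it gives ordinary finite values for the two NA probes, so the intensities themselves do not look unusable.

```python
import math

# Level 2 rows copied from the post: probe -> (methylated, unmethylated)
level2 = {
    "cg00000029": (2488.00579881129, 2281.3142892634),
    "cg00000108": (8943.62421381116, 336.745081332759),
    "cg00000109": (3827.0493383932, 219.47270455192),
    "cg00000165": (263.820225926362, 2355.4623873349),
    "cg00000236": (3733.92206994152, 722.124674419151),
}

# Level 3 beta values copied from the post; None stands for NA
level3 = {
    "cg00000029": 0.521668865344633,
    "cg00000108": None,
    "cg00000109": None,
    "cg00000165": 0.100722321673368,
    "cg00000236": 0.837944995677383,
}


def beta_value(meth, unmeth):
    """Beta as M / (M + U), with no offset (assumed, not confirmed)."""
    total = meth + unmeth
    return meth / total if total > 0 else math.nan


for probe, (meth, unmeth) in level2.items():
    computed = beta_value(meth, unmeth)
    reported = level3[probe]
    if reported is None:
        status = "NA in Level 3"
    else:
        status = "matches" if abs(computed - reported) < 1e-6 else "differs"
    print(f"{probe}\tcomputed={computed:.6f}\t{status}")
```

So whatever sets cg00000108 and cg00000109 to NA seems to happen after the beta calculation, as a masking step.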
There are a lot of NAs, and I wonder why.
I thought these probes were filtered out because of their detection p-values, but when I downloaded the IDAT files and calculated the detection p-values myself, they were all below 0.01. So the detection p-value cannot be the reason.
Additionally, they are not on chrX/Y, they are not SNP probes, and they are not cross-reactive probes.
There are ~90k NAs per sample, almost 1/5 of the 450k probes.
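For anyone who wants to repeat the check, this is roughly how the tally can be done in Python. It is a minimal sketch on the five example probes only; the dictionaries and the helper `summarize_na` are hypothetical names, and the 0.01 cutoff is simply the threshold I used above, not a documented TCGA setting.

```python
NA_TOKEN = "NA"
P_CUTOFF = 0.01  # threshold used in this post, not a confirmed TCGA setting

# Level 3 beta values as they appear in the file (strings, "NA" for missing)
level3_beta = {
    "cg00000029": "0.521668865344633",
    "cg00000108": "NA",
    "cg00000109": "NA",
    "cg00000165": "0.100722321673368",
    "cg00000236": "0.837944995677383",
}

# Level 2 detection p-values for the same probes
level2_detp = {
    "cg00000029": 0.0,
    "cg00000108": 0.0,
    "cg00000109": 0.0,
    "cg00000165": 0.0,
    "cg00000236": 0.0,
}


def summarize_na(beta_by_probe, detp_by_probe, cutoff=P_CUTOFF):
    """Return (NA probes, NA probes that pass detection, NA fraction)."""
    na_probes = [p for p, b in beta_by_probe.items() if b == NA_TOKEN]
    # A probe with no detection p-value is treated as failing (p = 1.0)
    unexplained = [p for p in na_probes if detp_by_probe.get(p, 1.0) < cutoff]
    fraction = len(na_probes) / len(beta_by_probe) if beta_by_probe else 0.0
    return na_probes, unexplained, fraction


na_probes, unexplained, fraction = summarize_na(level3_beta, level2_detp)
print(f"NA probes: {len(na_probes)} ({fraction:.1%} of probes shown)")
print(f"NA despite detection p < {P_CUTOFF}: {unexplained}")
```

On a full sample the same function would be applied to all probes in the Level 3 file; the `unexplained` list holds the probes I am asking about.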
Why are there so many NAs in the 450K methylation beta data?
Also, does anyone know how the data were normalized from the raw IDAT files?
I searched hard but couldn't find an answer.