Filtering methylation probe ID from multiple probes
16 months ago
1769mkc ★ 1.2k

For downstream analysis I am trying to use methylation 450k data (the M-values). The data looks something like this:

 dput(methyl[1:10,1:3])
    structure(list(Symbol = c("A4GALT", "A4GALT", "A4GALT", "A4GALT", 
    "A4GALT", "A4GALT", "A4GALT", "A4GALT", "A4GALT", "A4GALT"), 
        `TCGA-AB-2856` = c(-0.69571999396859, 6.59452651543373, -3.31241269267196, 
        -2.27831006586008, -6.67087214612625, -4.07354075597074, 
        -6.72587345772808, -3.99270962745257, 6.30759056557904, 4.35275426216806
        ), `TCGA-AB-2849` = c(-2.1258506029936, 6.29288805154616, 
        -0.989789351415863, -1.87695599373517, -6.29710435957612, 
        -0.552953206195101, -6.39859496846795, 1.81668629311401, 
        6.2635495345495, 3.91415116195022)), row.names = c(NA, -10L
    ), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x000002708c1367e0>)

Here we can see that a single gene has multiple probes. How do I filter or merge them?

Is it correct to merge them or average them?

methylation • 718 views

Each probe should correspond to one CpG. I do not think it would be a good idea to merge them as it stands, because they are located in different genomic regions. Maybe, if you find an appropriate annotation, you can average them over the same CpG island or regulatory region, but the gold standard in methylation array analysis at the gene/region scale remains finding differentially methylated regions (DMRs) between your conditions.
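As a rough illustration of averaging over an annotated region, here is a minimal base R sketch. The `region` column and its labels (`island_1`, `island_2`) are hypothetical stand-ins for a real probe annotation, and the M-values are rounded from the first six rows of the question's data.

```r
# Toy data: M-values for six probes of one gene in two samples.
# The `region` labels are hypothetical; a real analysis would take them
# from an array annotation (e.g. CpG island membership per probe).
methyl <- data.frame(
  Symbol = rep("A4GALT", 6),
  region = c("island_1", "island_1", "island_1",
             "island_2", "island_2", "island_2"),
  `TCGA-AB-2856` = c(-0.70, 6.59, -3.31, -2.28, -6.67, -4.07),
  `TCGA-AB-2849` = c(-2.13, 6.29, -0.99, -1.88, -6.30, -0.55),
  check.names = FALSE
)

# Every column that is not an annotation column is treated as a sample.
sample_cols <- setdiff(names(methyl), c("Symbol", "region"))

# Average the M-values of probes that share both gene and region.
region_means <- aggregate(
  methyl[sample_cols],
  by = list(Symbol = methyl$Symbol, region = methyl$region),
  FUN = mean
)
print(region_means)
```

This gives one row per gene and region rather than one row per gene, so probes from different regions are never pooled. Whether a plain mean is the right summary is a judgment call that depends on the annotation and the downstream analysis.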

Thank you. I did that, actually, but many of them do not come up or are not in common with my RNA-seq data. I wanted to use the Multi-Omics Factor Analysis framework, which perhaps requires both datasets to have the same dimensions. Can you suggest a way to do that without merging or filtering?

I'm just curious how they assigned gene names, which are unique, in the TCGA LAML methylation data. This data doesn't have probe IDs but rather genes, which are already mapped, I guess.

I have never used MOFA, so unfortunately I cannot help you on this point, but from what I see it seems possible to use datasets that have different feature dimensions as long as the number of samples is the same. Indeed, it is really interesting how they summed up the methylation per gene. Maybe they took the average (I cannot find any information on that), but that is not a good approach to me, since with microarrays genes are not equally covered. It is even more surprising that for other datasets from TCGA, LinkedOmics seems to provide methylation per CpG.
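To make the point about dimensions concrete, here is a minimal base R sketch. It is not MOFA code and has not been checked against MOFA itself; it only assumes each data type is held as a matrix with features in rows and samples in columns. The feature names (`probe_1`, `gene_1`, ...) and all values are made up; the sample IDs are the two from the question.

```r
# Toy views with features in rows and samples in columns.
# All values and feature names are made up for illustration.
samples <- c("TCGA-AB-2856", "TCGA-AB-2849")
methylation <- matrix(
  c(-0.70, 6.59, -3.31, -2.13, 6.29, -0.99),
  nrow = 3,
  dimnames = list(c("probe_1", "probe_2", "probe_3"), samples)
)
rna <- matrix(
  c(5.2, 1.1, 4.8, 0.9),
  nrow = 2,
  dimnames = list(c("gene_1", "gene_2"), rev(samples))  # different sample order
)

# Keep only shared samples and put them in the same order in every view.
shared <- intersect(colnames(methylation), colnames(rna))
views <- list(
  methylation = methylation[, shared, drop = FALSE],
  rna = rna[, shared, drop = FALSE]
)

# Feature counts differ between views; sample columns match.
print(sapply(views, dim))
stopifnot(identical(colnames(views$methylation), colnames(views$rna)))
```

The two views keep their own feature sets (three probes, two genes) and are only matched on sample IDs, so no per-gene merging of probes is needed just to line the datasets up.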

"dimension as long as the number of samples are the same" Thank you, now I can give it a try. Yes, my samples are the same in both datasets.

"It is even more surprising that for other datasets from TCGA, LinkedOmics seems to provide methylation per CpG" Yes, I'm not sure how they did it, and thank you for clearing up my confusion about the MOFA part.
