I want to adjust for covariates (age, BMI, etc.) in RRBS data and then identify differentially methylated regions (DMRs) between control and treated conditions.
I have 50 Bismark coverage files (50 samples: 25 control, 25 treated). The data set is huge and computationally difficult to handle: each sample file has around 30 to 40 million CpGs.
I am looking for a simpler method that lets me handle this data on a PC and identify DMRs in the treated condition while adjusting for covariates.
So far I know of packages like DSS, but I don't think it can adjust for 2-3 covariates, and it is also computationally demanding. methylKit can handle covariates according to its manual, but I think even that would be too demanding for the resources I have: a normal laptop with 8 GB RAM. [I tried loading all 50 Bismark files in R and creating a BSseq object using the DSS package, but my system crashed after a while.]
Hence, I am looking for less resource-intensive methods that can handle this type of data and adjust for covariates.
This is the format of the Bismark files (per sample) I have.
- Does anyone have ideas on how to adjust for covariates using simpler methods that can be run on a normal PC/laptop?
- I recently learned that edgeR can be used, but I am not sure how to generate the input CpG matrix. Can someone suggest how to create a matrix (30-40 million rows x 50 columns) from the Bismark files I have?
- For methods like limma, what kind of input is required?
Does it take a CpG matrix? If yes, what values should the matrix contain for DMR analysis: count values (raw methylated read counts) or methylation levels (methylation proportions)?
For limma, I cannot find any manual covering RRBS data. Is limma a good option for handling RRBS data?
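To make the matrix question concrete, here is a rough sketch of what I imagine the input could look like. It is only an assumption on my side, not a tested pipeline. It assumes the usual six tab-separated Bismark coverage columns (chromosome, start, end, methylation percentage, methylated count, unmethylated count), and every function name and threshold in it (`read_cov`, `build_count_matrix`, `min_cov`, `min_samples`, the M-value offset) is hypothetical. The idea is to drop low-coverage CpGs before building anything, so that the table is much smaller than 30-40 million rows.

```python
import gzip
import math
from collections import defaultdict


def open_cov(path):
    """Open a coverage file as text, handling .gz transparently."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def read_cov(path, min_cov=10):
    """Yield ((chrom, pos), meth, unmeth) for sites with at least min_cov reads.

    Assumed columns: chrom, start, end, methylation %, methylated, unmethylated.
    """
    with open_cov(path) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 6:
                continue  # skip blank or malformed lines
            meth, unmeth = int(fields[4]), int(fields[5])
            if meth + unmeth >= min_cov:
                yield (fields[0], int(fields[1])), meth, unmeth


def build_count_matrix(paths, min_cov=10, min_samples=None):
    """Return (sites, matrix) with two columns per sample: methylated, unmethylated.

    A site is kept only if it passes min_cov in at least min_samples samples
    (default: all samples). Samples that fail min_cov at a kept site get 0, 0.
    """
    n = len(paths)
    if min_samples is None:
        min_samples = n

    # Pass 1: count in how many samples each site is sufficiently covered.
    seen = defaultdict(int)
    for path in paths:
        for site, _, _ in read_cov(path, min_cov):
            seen[site] += 1
    sites = sorted(s for s, k in seen.items() if k >= min_samples)
    del seen
    row_of = {site: i for i, site in enumerate(sites)}

    # Pass 2: fill in the counts for the retained sites only.
    matrix = [[0] * (2 * n) for _ in sites]
    for j, path in enumerate(paths):
        for site, meth, unmeth in read_cov(path, min_cov):
            i = row_of.get(site)
            if i is not None:
                matrix[i][2 * j] = meth
                matrix[i][2 * j + 1] = unmeth
    return sites, matrix


def methylation_proportion(meth, unmeth):
    """Methylation level in [0, 1], or None when the site has no reads."""
    total = meth + unmeth
    return meth / total if total else None


def m_value(meth, unmeth, offset=1.0):
    """log2 ratio of methylated to unmethylated reads; offset avoids log(0)."""
    return math.log2((meth + offset) / (unmeth + offset))
```

Is a count table like this (a methylated and an unmethylated column per sample) roughly the right shape of input for edgeR, and would limma instead expect the proportions or M-values? I realise plain Python lists may still be too heavy for 8 GB if many CpGs survive the filter, so processing one chromosome at a time or using a more compact array type might be needed.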
I am quite new to RRBS data and am looking for any feasible option. I am familiar with RNA-seq pipelines using edgeR and limma, so I wanted to know whether I can use these to handle RRBS Bismark files in a computationally feasible way. Could anyone please advise me on how to go about this analysis? All I need is to adjust for confounding factors and identify DMRs in the treated condition, so that the results reflect only the treatment and not any confounders. I have only these 50 Bismark files to begin with, and I am struggling computationally. Please help me out.
Thanks!