I'm fairly new to bioinformatics and data processing, but until now I've been able to find solutions to most of my problems from other members of this community. I am following an R tutorial for the QSEA package for Methylated DNA Immunoprecipitation (MeDIP) data analysis, and realized that it uses a previously generated dataset to calibrate the normalization of the data.
The dataset is composed of TCGA 450K methylation data, where the beta values are averaged over 500bp windows across all samples (a combination of lung cancer tissue and adjacent normals). My understanding is that windows with a mean beta value of 0.8 or greater in both tumour and adjacent normal samples are kept and used to calibrate the normalization of your samples of interest.
I've spent the last week looking at several ways to obtain TCGA 450K data specific to my cancer of interest (Head and Neck, i.e. HNSC). I now have the level 3 data downloaded, as well as other files obtained through R packages such as TCGAbiolinks and RTCGAToolbox.
I've looked at the dataset used for calibration within the QSEA tutorial, which looks something like this:
I've tried to reproduce the same result, albeit with TCGA HNSC data instead of lung, but I cannot figure out how to 1) produce ranges covering 500bp windows instead of individual probe locations, and 2) separate tumour samples from normals and then take the mean beta value within each group. I've played around with the packages mentioned above, as well as minfi, GenomicRanges, and SummarizedExperiment, and while I'm able to produce a GRanges object, it fails to meet the above criteria.
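To make the goal concrete, here is a rough sketch of the computation as I understand it, written in plain Python rather than R only to pin down the logic. It assumes that the sample type is the two-digit code in the fourth field of the TCGA barcode (01 to 09 tumour, 10 to 19 normal), that windows are fixed, non-overlapping 500bp bins starting at position 1, and that all probe-by-sample values in a window are pooled before averaging. The probe IDs, positions, barcodes, and function names are made up for illustration, and this may not match how the QSEA authors built their table.

```python
from collections import defaultdict
from statistics import mean

WINDOW = 500  # window width in bp (assumed to match the calibration set)

def sample_group(barcode):
    """Classify a TCGA barcode by the sample-type code in its 4th field."""
    code = int(barcode.split("-")[3][:2])
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    return "other"

def window_start(pos, width=WINDOW):
    """1-based start of the fixed-width window containing pos."""
    return ((pos - 1) // width) * width + 1

def window_means(probes, betas):
    """probes: {probe_id: (chrom, pos)}; betas: {probe_id: {barcode: beta}}.
    Returns {(chrom, start, end): {"tumor": mean, "normal": mean}}."""
    pooled = defaultdict(lambda: defaultdict(list))
    for probe_id, (chrom, pos) in probes.items():
        start = window_start(pos)
        key = (chrom, start, start + WINDOW - 1)
        for barcode, beta in betas.get(probe_id, {}).items():
            if beta is None:  # skip missing measurements
                continue
            pooled[key][sample_group(barcode)].append(beta)
    result = {}
    for key, groups in pooled.items():
        # Keep only windows that have data for both groups.
        if groups["tumor"] and groups["normal"]:
            result[key] = {g: mean(groups[g]) for g in ("tumor", "normal")}
    return result

def high_in_both(means, cutoff=0.8):
    """Windows whose mean beta is >= cutoff in tumor AND normal samples."""
    return {k: v for k, v in means.items()
            if v["tumor"] >= cutoff and v["normal"] >= cutoff}

# Hypothetical toy input: three probes, one tumor and one normal sample.
probes = {"cg_a": ("chr1", 120), "cg_b": ("chr1", 480), "cg_c": ("chr1", 501)}
betas = {
    "cg_a": {"TCGA-AA-0001-01A": 0.90, "TCGA-AA-0001-11A": 0.85},
    "cg_b": {"TCGA-AA-0001-01A": 0.94, "TCGA-AA-0001-11A": 0.95},
    "cg_c": {"TCGA-AA-0001-01A": 0.30, "TCGA-AA-0001-11A": 0.90},
}
means = window_means(probes, betas)
calibration = high_in_both(means)
print(sorted(calibration))
```

In this toy input the first two probes fall into the same window and are averaged together, while the third lands in the next window and is dropped by the 0.8 filter because its tumour value is low. What I'm missing is the equivalent of these steps on real HNSC data using GRanges objects.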
For those who often work with TCGA 450K methylation data: could you suggest an approach, or point me to any 450K data processing software or R packages that I may be overlooking?
Here is the link to the QSEA R package for those interested, specifically under section 3.2.3: https://bioconductor.org/packages/release/bioc/vignettes/qsea/inst/doc/qsea_tutorial.html
I'd really appreciate any assistance with this.