I'm a little confused about how to interpret the methylation data provided by TCGA.
I'm trying to get a general idea of the methylation state of the promoter of the gene MGMT. Where I'm running into difficulty is aggregating the data across the multiple elements reported for a single gene.
For example, each level 3 file has the following column headers:
Composite Element REF, Start, End, Gene_Symbol, Gene_Type
I can easily pull out the methylation sites for my target gene by grepping for the gene symbol.
grep MGMT level-3-methyl-data.txt > hits.txt;
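One caveat with the plain grep is that it matches "MGMT" anywhere on the line, including inside longer symbols or in other columns. Below is a minimal Python sketch of a stricter filter that only looks at the Gene_Symbol column. It assumes the file is tab-delimited with a header row, and that Gene_Symbol may hold several symbols separated by semicolons; both are assumptions on my part, as are the function name and the made-up probe rows used for the demo.

```python
import csv
import io


def rows_for_gene(handle, gene, column="Gene_Symbol", sep=";"):
    """Yield rows whose gene-symbol field lists `gene` as an exact entry.

    Assumes a tab-delimited table with a header row; `sep` is the assumed
    separator between multiple symbols in one field.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        symbols = (row.get(column) or "").split(sep)
        if gene in symbols:
            yield row


# Tiny made-up table for illustration only (not real TCGA data).
demo = io.StringIO(
    "Composite Element REF\tStart\tEnd\tGene_Symbol\tGene_Type\n"
    "probe_a\t100\t101\tMGMT\tprotein_coding\n"
    "probe_b\t200\t201\tMGMT;OTHER\tprotein_coding\n"
    "probe_c\t300\t301\tXMGMT2\tprotein_coding\n"
)

hits = list(rows_for_gene(demo, "MGMT"))
print([row["Composite Element REF"] for row in hits])
```

On a real file, the same function could be called with `open("level-3-methyl-data.txt", newline="")` in place of the in-memory demo table. Note that probe_c is skipped here even though a plain grep for MGMT would have kept it.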
When I review the data, it reports the methylation status of single base pairs, which may or may not lie within the associated gene's promoter region. Should I integrate this data into a more comprehensive view of methylation at the MGMT promoter and, if that is reasonable, how? I'm not sure this is even a sensible idea, so if I'm thinking about this the wrong way, please just let me know.