I have a list of genes and want to test whether the expression level of these genes correlates with their DNA methylation level. I want to test this hypothesis in the TCGA breast cancer data. Below are the two approaches I am considering.
- Extract the expression and methylation level for each gene in my list. Expression is defined as RPKM from the RNA-seq data, and methylation level is taken from probes in the promoter region of the gene (from -3 kb to +500 bp relative to the TSS; if there are multiple probes in this region, I would average their values to get the final methylation level for the gene).
- Calculate the correlation between the two (e.g. the Pearson correlation coefficient). If the P-value is significant, I can say that there is a significant correlation between expression and methylation.
- Calculate a Z-score of gene expression for each gene, defined as (value - mean of normal samples) / (SD of normal samples).
- Calculate a Z-score of methylation level for each gene, defined the same way: (value - mean of normal samples) / (SD of normal samples). The probes are again taken from -3 kb to +500 bp relative to the TSS; if there are multiple probes in this region, I would average their values first and then calculate the Z-score.
- Calculate the correlation coefficient between the two Z-scores, as described above.
Which of the two approaches (correlating the raw values, or correlating the Z-scores relative to normal samples) would be better? Any suggestions are welcome.
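To make the comparison concrete, here is a minimal sketch of both approaches for a single gene, written in plain Python on synthetic data. Everything in it is an assumption for illustration: the function names (`pearson`, `gene_methylation`, `z_vs_normal`), the sample counts, and the simulated values are hypothetical and are not TCGA data. In a real analysis something like `scipy.stats.pearsonr` would be the usual choice, since it also returns a P-value.

```python
import math
import random
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def gene_methylation(probe_values):
    """Average the promoter probes of one gene, sample by sample.

    probe_values: one list per probe, each holding one value per sample.
    """
    return [mean(per_sample) for per_sample in zip(*probe_values)]


def z_vs_normal(tumor, normal):
    """Z-score of tumor values relative to the normal samples."""
    m, s = mean(normal), stdev(normal)
    return [(v - m) / s for v in tumor]


# Synthetic data for ONE hypothetical gene with 3 promoter probes.
random.seed(0)
n_tumor, n_normal = 50, 10
tumor_probes = [[random.uniform(0.1, 0.9) for _ in range(n_tumor)] for _ in range(3)]
normal_probes = [[random.uniform(0.1, 0.5) for _ in range(n_normal)] for _ in range(3)]

tumor_meth = gene_methylation(tumor_probes)
normal_meth = gene_methylation(normal_probes)

# Simulated expression that decreases with methylation, plus noise.
tumor_expr = [10 - 8 * m + random.gauss(0, 0.5) for m in tumor_meth]
normal_expr = [10 - 8 * m + random.gauss(0, 0.5) for m in normal_meth]

# Approach 1: correlate the raw values across tumor samples.
r_raw = pearson(tumor_expr, tumor_meth)

# Approach 2: correlate the Z-scores (relative to normal) across tumor samples.
r_z = pearson(z_vs_normal(tumor_expr, normal_expr),
              z_vs_normal(tumor_meth, normal_meth))

print(f"raw r = {r_raw:.4f}, z-score r = {r_z:.4f}")
```

One thing the sketch makes visible: for a single gene, subtracting the normal mean and dividing by the normal SD is a linear rescaling with a positive scale factor, and the Pearson coefficient does not change under such a rescaling. So when the correlation is computed per gene across samples, both approaches should give the same coefficient. The Z-score would only change the result if values from different genes were pooled into one correlation.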