I'm a beginner in bioinformatics and I have some questions about how to integrate biological data from RNA-Seq and DNA methylation (Illumina 450k).
I have done some research on the internet and found several articles describing data integration based on concatenation, but I have had difficulty reproducing their experiments in order to learn how to manipulate the data.
I would like to integrate the RNA-Seq data with the DNA methylation data. I imagine the integration is done at the gene level, using the GeneRef ID. However, in the methylation data each sample contains multiple probes measuring methylation levels for the genes, so there are cases where several probes map to the same gene. Below is an example of the DNA methylation data:
Header: 6005486023_R04C02, IlmnID, CHR, UCSC_RefGene_Name, UCSC_RefGene_Group, Relation_to_UCSC_CpG_Island
Data: 0.075187176583887, cg00000029, 16, RBL2, TSS1500, N_Shore
What should I do to obtain the methylation level of a gene that has several probes? Is it correct to calculate the average of these probes? What is the right technique to get the methylation level for each gene?
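To make the question concrete, here is a minimal sketch in Python (pandas) of the averaging I have in mind. Only the cg00000029 row comes from my data; the other probe IDs, the GENE_X name, the expression values and the helper name gene_level_methylation are made up for illustration. It also assumes that UCSC_RefGene_Name can list several gene names separated by ";". I am not claiming this is the right method, only showing what I mean by averaging the probes and then joining on the gene name.

```python
import pandas as pd

# Toy probe-level table for ONE sample. Only the first row is real;
# the other probes and values are hypothetical.
probes = pd.DataFrame({
    "IlmnID": ["cg00000029", "cg_hypothetical_1", "cg_hypothetical_2"],
    "UCSC_RefGene_Name": ["RBL2", "RBL2;RBL2", "GENE_X"],
    "beta": [0.075187176583887, 0.125, 0.5],
})


def gene_level_methylation(df, gene_col="UCSC_RefGene_Name", value_col="beta"):
    """Average probe values per gene (hypothetical helper)."""
    # Assumption: one probe may be annotated to several genes, separated by ";"
    long = df.assign(gene=df[gene_col].str.split(";")).explode("gene")
    # Count each probe only once per gene, even if the gene name is repeated
    long = long.drop_duplicates(subset=["IlmnID", "gene"])
    # Probes without a gene annotation (NaN) are dropped by groupby
    return long.groupby("gene")[value_col].mean()


gene_beta = gene_level_methylation(probes)

# Hypothetical gene-level expression values for the same sample
expr = pd.Series({"RBL2": 8.5, "GENE_X": 3.25}, name="expression")

# Integration by gene name: keep only genes present in both data types
integrated = pd.concat([gene_beta.rename("methylation"), expr], axis=1, join="inner")
print(integrated)
```

Here every probe of a gene gets the same weight, regardless of UCSC_RefGene_Group (for example TSS1500) or its relation to the CpG island, and that is exactly the part I am unsure about.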
I would like to generate a co-expression network using only tumor data. Does this make sense?
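By co-expression network I mean something like the sketch below: compute the pairwise Pearson correlation between genes across the tumor samples and keep the gene pairs whose absolute correlation is above a cutoff. The expression matrix, the gene names, the function name coexpression_edges and the 0.8 threshold are all made up for illustration; I do not know whether this is an appropriate method or cutoff.

```python
import pandas as pd

# Made-up expression matrix: rows = tumor samples, columns = genes
tumor_expr = pd.DataFrame({
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [4.0, 1.0, 3.0, 2.0],
})


def coexpression_edges(expr, threshold=0.8):
    """Return (gene1, gene2, r) for pairs with |Pearson r| >= threshold."""
    corr = expr.corr(method="pearson")  # gene-by-gene correlation matrix
    genes = list(corr.columns)
    edges = []
    # Visit each unordered gene pair once (upper triangle of the matrix)
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            r = corr.iloc[i, j]
            if abs(r) >= threshold:
                edges.append((genes[i], genes[j], float(r)))
    return edges


edges = coexpression_edges(tumor_expr)
print(edges)
```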
Can anyone help me, please?
Thank you very much.