Question: How to compare methylation array data with methylation MeDIP-seq data?
rdbcasillas11 (United States) wrote 5.9 years ago:


I want to compare methylation data generated in one study by two techniques, the 450k array and MeDIP-seq. The file formats differ. The array data is a txt file with a CpG probe ID and its methylation value. The MeDIP-seq data is in BED format, with genomic windows (chromosome, start, end) and the corresponding methylation score. I am new to this field and not sure how to compare the two, since their columns differ.

Here are the files:

Medip seq :

Array :

Can anyone see how it would be possible to compare the two?

Thank you.

modified 5.9 years ago by Devon Ryan • written 5.9 years ago by rdbcasillas11

Could you elaborate a little on what you mean by "compare"? Do you mean that one is the control (e.g. 450k) and the other your condition (e.g. MeDIP), and you want to see the changes between the two?

What's your goal? What would you like to find out from the data?


written 5.9 years ago by TriS

Both datasets are from the same individual and report methylation levels. One is indexed by cg probe IDs, while the other is indexed by chromosome coordinates.

Sample of a txt file from the 450k array:

    ID_REF        VALUE        Detection.Pval
    cg00000029    0.533466     0
    cg00000108    0.9221188    0

VALUE is the methylation level, from 0 (unmethylated) to 1 (very highly methylated).

Sample of a BED file from the MeDIP-seq data:

    chr1    1       1000    0.000852090112432486
    chr1    501     1500    0.0005609473955776
    chr1    1001    2000    0
    chr1    1501    2500    0
    chr1    2001    3000    0

I want to compare the data generated by the two techniques and see how well the methylation values correlate, i.e. how much they match or differ.


My concern is how to find the chromosome or genome location of each CpG probe in the txt file.

modified 5.9 years ago • written 5.9 years ago by rdbcasillas11
Devon Ryan (Freiburg, Germany) wrote 5.9 years ago:

N.B., the R packages I reference below are available on Bioconductor.

I happen to have been comparing some human 450k samples to mouse RRBS samples recently, so perhaps I can provide some guidance.

library(FDb.InfiniumMethylation.hg19)  # assumed: the package that provides get450k()
anno <- get450k()

The anno GRanges object now contains a convenient annotation of each probe on the array and the CpG that it should be informative about. I can also recommend the minfi package, which facilitates a lot of the raw file processing.

Regarding incorporating the MeDIP-seq dataset, that's a little tougher. One possibility is to simply correlate the methylation in the two files. This can be conveniently done by creating a GRanges object from the MeDIP-seq file and then using findOverlaps() to get overlapping CpGs from the 450k array. You would then need to get average (or median) methylation of the probes for each of the overlapped regions and then you can plot/calculate the correlation. How well they're correlated remains to be seen. I've generally been unimpressed with MeDIP-seq (we do mostly bisulfite sequencing).
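
The correlation approach described above can be sketched in plain Python, as a stand-in for the GRanges/findOverlaps() route. This is only an illustration under stated assumptions: the probe coordinates (which in practice would come from the array annotation), the window scores, and the function names are all made up for the example. Each probe is assigned to every window that contains it (so overlapping windows are fine), the array values are averaged per window, and a Pearson correlation is computed against the MeDIP-seq score.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from math import sqrt


def average_probes_per_window(windows, probes):
    """Pair each window score with the mean array value of the probes inside it.

    windows: list of (chrom, start, end, score), 1-based inclusive coordinates.
    probes:  list of (chrom, pos, beta).
    Windows that contain no probe are skipped.
    """
    positions = defaultdict(list)
    betas = defaultdict(list)
    for chrom, pos, beta in sorted(probes):  # sorted by chrom, then position
        positions[chrom].append(pos)
        betas[chrom].append(beta)

    pairs = []
    for chrom, start, end, score in windows:
        pos = positions.get(chrom, [])
        lo = bisect_left(pos, start)   # first probe at or after the window start
        hi = bisect_right(pos, end)    # one past the last probe at or before the end
        if hi > lo:
            values = betas[chrom][lo:hi]
            pairs.append((score, sum(values) / len(values)))
    return pairs


def pearson(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sd_x = sqrt(sum((x - mean_x) ** 2 for x, _ in pairs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for _, y in pairs))
    return cov / (sd_x * sd_y)


# Hypothetical toy data, not taken from the files in the question.
windows = [
    ("chr1", 1, 1000, 0.2),
    ("chr1", 1001, 2000, 0.4),
    ("chr1", 2001, 3000, 0.6),
    ("chr2", 1, 1000, 0.9),   # no probe falls here, so it is skipped
]
probes = [
    ("chr1", 150, 0.1),
    ("chr1", 900, 0.3),
    ("chr1", 1500, 0.5),
    ("chr1", 2500, 0.7),
    ("chr1", 2999, 0.9),
]

pairs = average_probes_per_window(windows, probes)
r = pearson(pairs)
print(pairs, r)
```

Replacing the mean with a median only changes the line that summarises `values`. Note that MeDIP-seq scores and array values are on different scales, so a rank-based correlation may be worth considering as well.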

written 5.9 years ago by Devon Ryan

I have data_1, a txt file in mm9 coordinates with the columns chr (chromosome), stable_id, start, end and methylation.

I have data_2, a bigWig file in mm10 coordinates with the columns seqnames, ranges, strand and methylation score (over 10 million rows).

I need to compare data_1$start and data_1$end with data_2$ranges and compute the average methylation score and the number of CpG islands.

These are the steps I followed, which I believe is the long way round.

  1. Converted data_1 to the 'chrN:start-end' format and exported it as a CSV file.
  2. Uploaded this CSV file to the UCSC Genome Browser LiftOver tool and converted it from mm9 to mm10. The output was a BED file.
  3. Replaced the start and end columns of data_1 with the new coordinates from the lifted-over BED file.
  4. Compared the start and end of data_1 with data_2. This is where I am stuck: it takes a very long time to process in R. Is there a simpler way than the one I followed? I am new to the field, so please explain in steps.
modified 4.9 years ago • written 4.9 years ago by startup_biostar
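
One way to make step 4 tractable is to avoid comparing every region with every row. The following is a minimal sketch, not a tested pipeline, and it rests on several assumptions: both tables are already on mm10 coordinates, coordinates are 1-based and inclusive, and the scored intervals of data_2 do not overlap one another within a chromosome (as is the case for bigWig). All names and values are hypothetical. The scored intervals are sorted once, and each region is then resolved with two binary searches. In R, findOverlaps() from GenomicRanges, mentioned in the answer above, serves the same purpose.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_index(scored_intervals):
    """Group (chrom, start, end, score) rows by chromosome, sorted by start."""
    index = defaultdict(lambda: {"starts": [], "ends": [], "scores": []})
    for chrom, start, end, score in sorted(scored_intervals):
        entry = index[chrom]
        entry["starts"].append(start)
        entry["ends"].append(end)
        entry["scores"].append(score)
    return index


def summarize_region(index, chrom, start, end):
    """Return (number of overlapping intervals, mean score) for one region.

    Relies on the intervals being non-overlapping, so that both the start
    and the end lists are sorted and can be searched with bisect.
    """
    entry = index.get(chrom)
    if entry is None:
        return 0, None
    lo = bisect_left(entry["ends"], start)    # first interval ending at or after region start
    hi = bisect_right(entry["starts"], end)   # one past the last interval starting at or before region end
    scores = entry["scores"][lo:hi]
    if not scores:
        return 0, None
    return len(scores), sum(scores) / len(scores)


# Hypothetical toy data standing in for data_2 (scores) and data_1 (regions).
data_2 = [
    ("chr1", 1, 100, 0.2),
    ("chr1", 101, 200, 0.4),
    ("chr1", 201, 300, 0.9),
    ("chr2", 1, 100, 0.5),
]
regions = [
    ("chr1", 50, 150),    # overlaps the first two intervals
    ("chr1", 250, 400),   # overlaps the third interval only
    ("chr3", 1, 10),      # chromosome absent from data_2
]

index = build_index(data_2)
results = [summarize_region(index, *region) for region in regions]
print(results)
```

Sorting costs O(n log n) once, after which each region needs only two lookups plus the size of its own overlap, rather than a scan of all 10 million rows.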