Ideal Way Of Calculating DNA Methylation Of A Region (E.g. Exon/Promoter)
10.1 years ago
skm770 ▴ 150

Hi

I am trying to use a public dataset that does not have coverage information:

chr1 10468 10470 0.895333
chr1 10470 10472 0.895967
chr1 10483 10485 0.99393

What is the best/ideal way to calculate region-level methylation information?

There are certain tools like roimethstat that I use in methpipe but they require coverage information.

So I was wondering what would be the ideal way to get a normalized value for a region (say, an exon). Should I just add all the values and divide by the total number of CpG positions, or by the total length of the region?


Where does this data come from? It looks to me like perhaps it should be organized as follows:

 52090 chr1 249208638 249208758 + 0.00000000
 52091 chr1 249210801 249212769 + 0.05332656
 52092 chr1 249210801 249212772 + 0.05324544
 52093 chr1 249211478 249211600 + 0.07317073
 52094 chr1 249211478 249214145 + 0.03260870
 52095 chr1 249230780 249231242 + 0.00000000

If this is the case, it looks like you already have percentage methylation values (last column - all regions listed would be unmethylated) for segments.

There might be an arbitrary ID in the first column, or possibly a sample ID (the 2nd and 3rd, as well as the 4th and 5th, segments appear to be strongly overlapping). This would make a difference in the interpretation.

If you were working with raw data, the best tool would depend upon the technology being used.

If you are working with processed data, I think you would want to look for a tool that operates on genomic intervals (for example, one that looks for overlap between your regions and a set of promoter locations).


I apologize, I had mistakenly posted the processed values that I calculated, which are basically the average over a region. I have updated the question with the correct data.


The regions in the updated example look small - each is 3 bp. If they are all like this, you could take the middle nucleotide (such as 10469 for the first row) and reformat the data for tools that analyze percentage methylation. For example, I think you can make it look like the .bed file needed for methylKit. COHCAP also accepts plain percentage methylation values (in fact, that is the only thing it works with directly), but it is meant for targeted BS-Seq, so you would also need to provide a pre-defined list of regions.
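A minimal sketch of the midpoint conversion described above, assuming each input line is whitespace-separated `chrom start end value` and that the value is a methylation fraction between 0 and 1. The function name `to_single_position` is hypothetical, and the returned tuple is not any tool's required input format; check the methylKit or COHCAP documentation for the exact columns they expect.

```python
def to_single_position(line):
    """Convert 'chrom start end fraction' to (chrom, midpoint, percent)."""
    chrom, start, end, value = line.split()
    start, end = int(start), int(end)
    midpoint = (start + end) // 2      # e.g. 10468 and 10470 -> 10469
    percent = float(value) * 100.0     # assumes value is a 0-1 fraction
    return chrom, midpoint, percent


rows = [
    "chr1 10468 10470 0.895333",
    "chr1 10470 10472 0.895967",
    "chr1 10483 10485 0.99393",
]
for row in rows:
    print(to_single_position(row))
```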

10.1 years ago

If that's all you have access to then the best you can do is calculate the average. The alternative would be to see if you can get the original reads, which you could then map and derive more useful metrics (I'm not familiar with roimethstat, but I assume it does some sort of weighted average).

Edit: BTW, you could read that into R, make it a GRanges object, use findOverlaps(..., select="all") against your regions, split() the result into a GRangesList accordingly, and then lapply a function to average the appropriate columns (just put the scores into the "score" column and you can simply use mean(score(GRangesObject))).
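The same idea can be sketched without Bioconductor. The following is a plain-Python illustration, not the GRanges workflow itself: it assigns each CpG interval to every region it overlaps and averages the scores per region. The region names and coordinates are made up for the example, the coordinates are treated as half-open (BED-style), which is an assumption about the data, and the nested scan is only suitable for small inputs.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def region_means(cpgs, regions):
    """Average CpG scores per region.

    cpgs:    list of (chrom, start, end, score)
    regions: list of (name, chrom, start, end)
    Regions with no overlapping CpG are omitted rather than reported as 0.
    """
    scores = defaultdict(list)
    for name, r_chrom, r_start, r_end in regions:
        for c_chrom, c_start, c_end, score in cpgs:
            if c_chrom == r_chrom and overlaps(c_start, c_end, r_start, r_end):
                scores[name].append(score)
    return {name: sum(vals) / len(vals) for name, vals in scores.items()}


cpgs = [
    ("chr1", 10468, 10470, 0.895333),
    ("chr1", 10470, 10472, 0.895967),
    ("chr1", 10483, 10485, 0.99393),
]
# Hypothetical regions, for illustration only.
regions = [
    ("regionA", "chr1", 10460, 10475),
    ("regionB", "chr1", 10480, 10490),
    ("regionC", "chr1", 20000, 20100),  # no CpGs -> absent from the result
]
print(region_means(cpgs, regions))
```

Leaving out regions with no covered CpG, instead of reporting them as 0, keeps "no data" distinct from "unmethylated".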


I was thinking that, rather than taking the average over a region, I would divide the sum by the total number of CpGs recognized in the hg19 genome in that region. Would that be the right thing to do with this data set?


I guess it depends on what you want to do with the resulting data. Normally you're interested in some metric describing a region that's relatively unaffected by missing values, which dividing the sum of the scores by the number of known CpGs wouldn't yield (imagine comparing two regions of equal methylation where one just had some CpGs with no information... you'd see a spurious difference).
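To make the spurious difference concrete, here is a small numeric sketch with made-up values: two regions each contain four known CpGs, all methylated at 0.8, but the second region has scores for only two of them. The function names are hypothetical.

```python
def mean_over_observed(scores):
    """Sum of scores divided by the number of CpGs that have a score."""
    return sum(scores) / len(scores)


def sum_over_known(scores, n_known_cpgs):
    """Sum of scores divided by all CpGs annotated in the region."""
    return sum(scores) / n_known_cpgs


n_known = 4                       # CpGs annotated in each region (made up)
region1 = [0.8, 0.8, 0.8, 0.8]    # all four CpGs have a score
region2 = [0.8, 0.8]              # two CpGs have no information

for name, scores in (("region1", region1), ("region2", region2)):
    print(name, mean_over_observed(scores), sum_over_known(scores, n_known))
```

The mean over observed CpGs comes out at about 0.8 for both regions, while dividing by the known CpG count gives about 0.4 for the second region even though its measured methylation is identical.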
