Retrieve all the differentially methylated cpg's in a region
2
1
Entering edit mode
8.9 years ago

I have a very large file containing differentially methylated cytosines and another file containing annotations of genes including CDS etc. I need to get a graph like this

I wanted to use genomic ranges for it so that I can then find out how many of them are >5kb away from TSS and so that I can do other manipulations.

But GenomicRanges does not accept sites with same start and end as is the case with differentially methylated cytosines. So I was wondering if somebody has done it using GenomicRanges.

Another way I can do it is use binary search ( using file:sorted:seek in perl) but I am looking for a cookbook solution if it is available.

Thanks

methylation GenomicRanges R • 2.4k views
0
Entering edit mode

Can anyone explain how to get the fold enrichment in the bar plot and the p-value.

0
Entering edit mode

The fold-enrichment there is just the number of DmC/total mC in a given region divided by that in CDS. At least that's how it appears, though I suspect that the figure legend would actually say. I would presume that the p-values are from a Fisher's test.

0
Entering edit mode
8.9 years ago
Xingyu Yang ▴ 280

You can sort both of them first by coordinate and chromosome and them compare these two files parallelly. This idea is similar to "merge sort". I'm not sure if there exist a software to do so.

The order of time complexity is o(nlog(n)) for sorting and o(n) for merging.

To use binary search, you still need to sort them first. And it might be difficult to consider overlapping genes.

Another optimization you can do is to separate them by chromosome and repeat all the work above for each chromosome.

1
Entering edit mode

You might be interested in reading about BEDOPS, which has tools to do fast sorting and set operations. These can be done from within R via system() calls.

0
Entering edit mode

Thanks! I'll have a try.

0
Entering edit mode
8.9 years ago

GenomicRanges is perfectly happy with ranges sharing the same start/end coordinate:

gr <- GRanges(seqnames="chr1", range=IRanges(start=c(1), end=c(1)))


You might post some example data, the command you're using, and the resulting error message.