Running methylKit on Hadoop for DNA methylation analysis
7.4 years ago

Hi,

Can someone tell me whether there is a way to run the methylKit R package on Hadoop?

Please give me some idea of how to proceed.

This is the original code:

library(methylKit)
file.list=list( "new_sample1.txt","new_sample2.txt","n_sample3.txt")
coverage.col=6,strand.col=3,freqC.col=5 ))
getMethylationStats(myobj[[1]],plot=F,both.strands=F)
pdf("sample1_statistics.pdf")
getMethylationStats(myobj[[1]],plot=T,both.strands=F)
dev.off()
getMethylationStats(myobj[[2]],plot=F,both.strands=F)
pdf("sample2_statistics.pdf")
getMethylationStats(myobj[[2]],plot=T,both.strands=F)
dev.off()
getMethylationStats(myobj[[3]],plot=F,both.strands=F)
pdf("sample3_statistics.pdf")
getMethylationStats(myobj[[3]],plot=T,both.strands=F)
dev.off()
library("graphics")
pdf("sample1_coverage.pdf")
getCoverageStats(myobj[[1]], plot = T, both.strands = F)
dev.off()
pdf("sample2_coverage.pdf")
getCoverageStats(myobj[[2]], plot = T, both.strands = F)
dev.off()
pdf("sample3_coverage.pdf")
getCoverageStats(myobj[[3]], plot = T, both.strands = F)
dev.off()
meth=unite(myobj, destrand=FALSE)
pdf("correlation.pdf")
getCorrelation(meth,plot=T)
dev.off()
pdf("cluster.pdf")
clusterSamples(meth, dist="correlation",method="ward", plot=TRUE)
dev.off()
hc <- clusterSamples(meth, dist = "correlation", method = "ward",plot = FALSE)
pdf("pca.pdf")
PCASamples(meth, screeplot = TRUE)
PCASamples(meth)
dev.off()
myDiff=calculateDiffMeth(meth)
write.table(myDiff, "mydiff.txt", sep='\t')
myDiff25p.hyper <- get.methylDiff(myDiff,difference=25,qvalue=0.01,type="hyper")
myDiff25p.hyper
write.table(myDiff25p.hyper,"hyper_methylated.txt",sep='\t')
myDiff25p.hypo <- get.methylDiff(myDiff,difference=25,qvalue=0.01,type="hypo")
myDiff25p.hypo
write.table(myDiff25p.hypo,"hypo_methylated.txt",sep='\t')
myDiff25p <- get.methylDiff(myDiff,difference=25,qvalue=0.01)
myDiff25p
write.table(myDiff25p,"differentialy_methylated.txt",sep='\t')
diffMethPerChr(myDiff,plot=FALSE,qvalue.cutoff=0.01,meth.cutoff=25)
pdf("diffMethPerChr.pdf")
diffMethPerChr(myDiff,plot=TRUE,qvalue.cutoff=0.01,meth.cutoff=25)
dev.off()
gene.obj <- read.transcript.features(system.file("extdata","refseq.hg18.bed.txt", package = "methylKit"))
write.table(gene.obj,"gene_obj.txt", sep='\t')
annotate.WithGenicParts(myDiff25p, gene.obj)
cpg.obj <- read.feature.flank(system.file("extdata","cpgi.hg18.bed.txt", package = "methylKit"),feature.flank.name = c("CpGi","shores"))
write.table(cpg.obj,"cpg_obj.txt", sep='\t')
diffCpGann <- annotate.WithFeature.Flank(myDiff25p,cpg.obj$CpGi, cpg.obj$shores, feature.name = "CpGi",flank.name = "shores")
write.table(diffCpGann,"diffCpCann.txt", sep='\t')
diffCpGann
promoters <- regionCounts(myobj, gene.obj$promoters)
write.table(promoters,"promoters.txt", sep='\t')
diffAnn <- annotate.WithGenicParts(myDiff25p, gene.obj)
diffAnn
write.table(getAssociationWithTSS(diffAnn),"diff_ann.txt", sep='\t')
getTargetAnnotationStats(diffAnn, percentage = TRUE,precedence = TRUE)
pdf("piechart1.pdf")
plotTargetAnnotation(diffAnn, precedence = TRUE, main ="differential methylation annotation")
dev.off()
pdf("piechart2.pdf")
plotTargetAnnotation(diffCpGann, col = c("green","gray", "white"), main = "differential methylation annotation")
dev.off()
getFeatsWithTargetsStats(diffAnn, percentage = TRUE)


Thanks, Shalini.

7.4 years ago

There's no native support for Hadoop in methylKit. You'd need to either modify the code to use Hadoop (likely via RHadoop) or, more simply, use Rmpi or something along those lines to run the statistics across regions in parallel (you'd probably have to modify the code to do this too).
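
To make the "run statistics across regions in parallel" idea a bit more concrete, here is a minimal sketch in base R. It is not methylKit code, and it uses neither Rmpi nor RHadoop: the `parallel` package that ships with R stands in for them. The `sites` table, its column names, and the `test_chunk` helper are all made up for illustration. The sketch splits sites by chromosome, runs a Fisher's exact test per site within each chunk, and then combines the results, which is roughly the split/apply/combine shape an MPI or Hadoop version would also need.

```r
library(parallel)  # ships with R; used here as a stand-in for Rmpi/RHadoop

set.seed(1)
# Hypothetical toy data: methylated (numCs) and unmethylated (numTs) read
# counts per site for one control sample (1) and one treated sample (2).
sites <- data.frame(
  chr    = rep(c("chr1", "chr2", "chr3"), each = 4),
  start  = rep(seq(100, 400, by = 100), times = 3),
  numCs1 = rpois(12, 20), numTs1 = rpois(12, 10),
  numCs2 = rpois(12, 10), numTs2 = rpois(12, 20)
)

# Work done on one chunk: a Fisher's exact test for every site in it.
test_chunk <- function(chunk) {
  chunk$pvalue <- vapply(seq_len(nrow(chunk)), function(i) {
    counts <- matrix(c(chunk$numCs1[i], chunk$numTs1[i],
                       chunk$numCs2[i], chunk$numTs2[i]), nrow = 2)
    fisher.test(counts)$p.value
  }, numeric(1))
  chunk
}

# "Map" input: one independent chunk per chromosome.
chunks <- split(sites, sites$chr)

# Run the chunks in two forked worker processes.
results <- mclapply(chunks, test_chunk, mc.cores = 2)

# "Reduce": stitch the chunks back together, then correct for multiple
# testing over all sites at once (this step needs the complete result).
combined <- do.call(rbind, results)
combined$qvalue <- p.adjust(combined$pvalue, method = "BH")
print(combined)
```

Note that `mclapply` relies on forking, so on Windows `mc.cores` has to be 1. The multiple-testing correction is deliberately done after the chunks are recombined, since it depends on the p-values from every chunk.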

Thanks for the information, Devon. I will look into RHadoop and Rmpi. My concern is that this methylKit script takes three text files as input and produces a bunch of files with graphs. Will that cause any problems if I run it on Hadoop (RHadoop)?

Err, well, you have to rewrite the package to use Hadoop anyway, so just rewrite it so that the file input and graph output work too.

Thanks, Devon. I will try it.

Hi Devon,

I was trying to rewrite it, but I am wondering: my R program takes three text files as input, uses the methylKit library, and produces multiple outputs (mostly PDFs containing charts and graphs). Will this process be affected if I use Hadoop?

You'll be reimplementing things to use Hadoop-aware methods in R, so it shouldn't be a problem.
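
One way to picture why the PDF output need not be a problem: keep the plotting out of the parallel step. The sketch below is an assumption about how such a rewrite could be organised, not how methylKit or RHadoop actually does it. The `samples` list is invented data standing in for the three input files, and `summarise_sample` is a hypothetical helper. The workers only return numbers (histogram counts), and the driver process draws the PDFs afterwards from those numbers.

```r
library(parallel)  # ships with R; stands in for a Hadoop-aware backend

# Hypothetical stand-in for the three input files: percent methylation
# per site for each sample.
set.seed(42)
samples <- list(
  sample1 = runif(1000, 0, 100),
  sample2 = runif(1000, 0, 100),
  sample3 = runif(1000, 0, 100)
)

# Worker step: compute histogram counts only. No graphics device is
# opened here, so it does not matter where the worker runs.
summarise_sample <- function(pct) {
  hist(pct, breaks = seq(0, 100, by = 10), plot = FALSE)$counts
}

counts <- mclapply(samples, summarise_sample, mc.cores = 2)

# Driver step: draw one PDF per sample from the collected counts.
out_dir <- tempdir()
bin_labels <- paste0(seq(0, 90, by = 10), "-", seq(10, 100, by = 10))
for (name in names(counts)) {
  pdf(file.path(out_dir, paste0(name, "_statistics.pdf")))
  barplot(counts[[name]], names.arg = bin_labels, las = 2,
          main = paste("% methylation,", name),
          xlab = "% methylation per site", ylab = "number of sites")
  dev.off()
}
```

The point of the split is that files written by a worker generally end up on that worker's local disk, so returning plain data and plotting in one place avoids having to collect PDFs from many nodes.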

Reimplementing in what sense? Do I need to change the whole code?

Probably not.

Can you please explain the steps I need to take to achieve this? It would really help me a lot.

I'd have to go through the entirety of the code-base to do that. You're going to have to figure this all out on your own.