Question: Methylkit using hadoop for DNA analysis
0
gravatar for shalini.ravishankar
5.8 years ago by
United States
shalini.ravishankar10 wrote:

Hi,

Can someone tell me is there a way to do Methylkit ( https://code.google.com/p/methylkit/ ) R Package on Hadoop ???

Please give me some idea on how to proceed this.

This is the Code (Original) :

library(methylKit)
file.list=list( "new_sample1.txt","new_sample2.txt","n_sample3.txt")
myobj=read(file.list,sample.id=list("test1","test2","ctrl1"),assembly="hg19",treatment=c(1,1,0),context="CpG", pipeline=list(fraction=TRUE,chr.col=1,start.col=2,end.col=2,
coverage.col=6,strand.col=3,freqC.col=5 ))
getMethylationStats(myobj[[1]],plot=F,both.strands=F)
pdf("sample1_statistics.pdf")
getMethylationStats(myobj[[1]],plot=T,both.strands=F)
dev.off()
getMethylationStats(myobj[[2]],plot=F,both.strands=F)
pdf("sample2_statistics.pdf")
getMethylationStats(myobj[[2]],plot=T,both.strands=F)
dev.off()
getCoverageStats(myobj[[3]],plot=F,both.strands=F)
pdf("sample3_statistics.pdf")
getMethylationStats(myobj[[3]],plot=T,both.strands=F)
dev.off()
library("graphics")
pdf("sample1_coverage.pdf")
getCoverageStats(myobj[[1]], plot = T, both.strands = F)
dev.off()
pdf("sample2_coverage.pdf")
getCoverageStats(myobj[[2]], plot = T, both.strands = F)
dev.off()
pdf("sample3_coverage.pdf")
getCoverageStats(myobj[[3]], plot = T, both.strands = F)
dev.off()
meth=unite(myobj, destrand=FALSE)
pdf("correlation.pdf")
getCorrelation(meth,plot=T)
dev.off()
pdf("cluster.pdf")
clusterSamples(meth, dist="correlation",method="ward", plot=TRUE)
dev.off()
hc <- clusterSamples(meth, dist = "correlation", method = "ward",plot = FALSE)
pdf("pca.pdf")
PCASamples(meth, screeplot = TRUE)
PCASamples(meth)
myDiff=calculateDiffMeth(meth)
write.table(myDiff, "mydiff.txt", sep='\t')
myDiff25p.hyper <-get.methylDiff(myDiff,differenc=25,qvalue=0.01,type="hyper")
myDiff25p.hyper
write.table(myDiff25p.hyper,"hyper_methylated.txt",sep='\t')
myDiff25p.hypo <-get.methylDiff(myDiff,differenc=25,qvalue=0.01,type="hypo")
myDiff25p.hypo
write.table(myDiff25p.hypo,"hypo_methylated.txt",sep='\t')
myDiff25p <-get.methylDiff(myDiff,differenc=25,qvalue=0.01)
myDiff25p
write.table(myDiff25p,"differentialy_methylated.txt",sep='\t')
diffMethPerChr(myDiff,plot=FALSE,qvalue.cutoff=0.01,meth.cutoff=25)
pdf("diffMethPerChr.pdf")
diffMethPerChr(myDiff,plot=TRUE,qvalue.cutoff=0.01,meth.cutoff=25)
dev.off()
gene.obj <- read.transcript.features(system.file("extdata","refseq.hg18.bed.txt", package = "methylKit"))
write.table(gene.obj,"gene_obj.txt", sep='\t')
annotate.WithGenicParts(myDiff25p, gene.obj)
cpg.obj <- read.feature.flank(system.file("extdata","cpgi.hg18.bed.txt", package = "methylKit"),feature.flank.name = c("CpGi","shores"))
write.table(cpg.obj,"cpg_obj.txt", sep='\t')
diffCpGann <- annotate.WithFeature.Flank(myDiff25p,cpg.obj$CpGi, cpg.obj$shores, feature.name = "CpGi",flank.name = "shores")
write.table(diffCpGann,"diffCpCann.txt", sep='\t')
diffCpGann
promoters <- regionCounts(myobj, gene.obj$promoters)
head(promoters[[1]])
write.table(promoters,"promoters.txt", sep='\t')
diffAnn <- annotate.WithGenicParts(myDiff25p, gene.obj)
head(getAssociationWithTSS(diffAnn))
diffAnn
write.table(getAssociationWithTSS(diffAnn),"diff_ann.txt", sep='\t')
getTargetAnnotationStats(diffAnn, percentage = TRUE,precedence = TRUE)
pdf("piechart1.pdf")
plotTargetAnnotation(diffAnn, precedence = TRUE, main ="differential methylation annotation")
dev.off()
pdf("piechart2.pdf")
plotTargetAnnotation(diffCpGann, col = c("green","gray", "white"), main = "differential methylation annotation")
dev.off()
getFeatsWithTargetsStats(diffAnn, percentage = TRUE)

Thanks, Shalini.

methylkit R dna hadoop • 2.1k views
ADD COMMENTlink modified 5.8 years ago by _r_am31k • written 5.8 years ago by shalini.ravishankar10
0
gravatar for Devon Ryan
5.8 years ago by
Devon Ryan97k
Freiburg, Germany
Devon Ryan97k wrote:

There's no native support for hadoop in methylkit. You'd need to either modify the code to use hadoop (likely via Rhadoop) or more simply by using Rmpi or something along those lines to run statistics across regions in parallel (you'd probably have to modify the code to do this too).

ADD COMMENTlink written 5.8 years ago by Devon Ryan97k

Thanks for information Devon. I will look into the Rhadoop and Rmpi. My concern is, this Methylkit takes input as 3 text files and produces a bunch of files with graphs. Will it create any problem if I use it on Hadoop (RHadoop) ??

ADD REPLYlink written 5.8 years ago by shalini.ravishankar10

Err, well you have to rewrite the package so it'll use hadoop anyway, so just rewrite it so that that will work too.

ADD REPLYlink written 5.8 years ago by Devon Ryan97k

thanks devon. I will try it.

ADD REPLYlink written 5.8 years ago by shalini.ravishankar10

Hi Devon,

I was trying to re write but thn I am wondering that my R program takes 3 text files as input and uses a library called methylkit and produces multiple output (mostly PDFs which has charts/graphs) Will this process be affected if I use Hadoop ??

ADD REPLYlink written 5.8 years ago by shalini.ravishankar10

You'll be reimplementing to use hadoop-aware methods in R, so then it shouldn't be a problem.

ADD REPLYlink written 5.8 years ago by Devon Ryan97k

Re implementing in sense ,Do I need to change the whole code ?

ADD REPLYlink written 5.8 years ago by shalini.ravishankar10

Probably not.

ADD REPLYlink written 5.8 years ago by Devon Ryan97k

Can you please explain what do I need to do like steps to achieve this ? It would really help me a lot

ADD REPLYlink written 5.8 years ago by shalini.ravishankar10

I'd have to go through the entirety of the code-base to do that. You're going to have to figure this all out on your own.

ADD REPLYlink written 5.8 years ago by Devon Ryan97k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1186 users visited in the last hour