Question

MeDip-seq data interpretation + analysis / MEDIPS, MACS2

0

Entering edit mode

9.6 years ago

florian.noack • 0

Hello everybody,

Since three weeks I try to make sense out of my (h)medip-seq data (I am an absolute beginner in deep sequencing data analysis and bioinformatics in general) and I hope you guys can give me some tips.

Sample: in vivo material of three linage related cell types

Method: comperative (h) MeDip-seq ( Li Tan et al. 2013) with 300 ng of each of the three celltypes (in total 900ng DNA) using the Diagnode MeDip and hMeDip kits.

First of all I don't know if my data look alright since I have really narrow peaks (reads are extended to 250 bp) which align quite often only to on strand (hemimethylated DNA or non-CpG methylation), (http://i.imgur.com/1TiUPTQ.png). I tried already to download published raw medip-seq data to compare them with my data but without success. Most times the data were already processed and therefor i couldnt compare them with my data. Is this narrow peak pattern normal or should I see more broader peaks (MeDip QC and Sequencing QC seem all okay)?

The second problem is to analyse the data and find differential methylated peaks/regions). I used already the MEDIPS package (see code below) but these gives me around 10 peaks without duplicates whereas I get around 500 peaks with duplicates (adj=T).

`library("MEDIPS") library("BSgenome.Mmusculus.UCSC.mm10") library("edgeR")

bcv <- 0.01 counts <- matrix( rnbinom(40,size=1/bcv^2,mu=10), 20,2) y <- DGEList(counts=counts, group=1:2) et <- exactTest(y, dispersion=bcv^2) medipoN="/home/crtd_c...MeDip_Output.bam" medipoR="/home/crtd_c ...2MeDip_Output.bam" BSgenome="BSgenome.Mmusculus.UCSC.mm10" uniq=FALSE extend=100 shift=0 ws=100 chr.select=NULL

N_Output= MEDIPS.createSet(file = medipoN, BSgenome = BSgenome, extend = extend, shift = shift, uniq = uniq, window_size = ws, chr.select = chr.select) R_Output= MEDIPS.createSet(file = medipoR, BSgenome = BSgenome, extend = extend, shift = shift, uniq = uniq, window_size = ws, chr.select = chr.select)

mr.edgeR.MeDip = MEDIPS.meth(MSet1 = N_Output, MSet2 = R_Output, p.adj = "hochberg",diff.method = "edgeR", prob.method = "poisson", MeDIP = F, CNV = F, type = "rpkm", minRowSum = 1)

mr.edgeR.Medip.s = MEDIPS.selectSig(results = mr.edgeR.MeDip, p.value = 0.2, adj = T, ratio = NULL, bg.counts = 30, CNV = F)


However if I visualize the called regions in seqmonk they don't look really interesting whereas I find easily other regions which look more differential methylated. So I don't really trust the package but may used some wrong settings? Can I use adj=F (but i think then i will get to much false positive peaks) ?

I proceed then with MACS2 to call broad peaks in all three samples, merged them and count the reads of the peaks using bedtools.

`macs2 callpeak -t /home/crtd_c ... MeDip_Output.bam -c /home/crtd_c ... MeDip_Input.bam --broad -f BAM -g mm --bw 250 -n PP_MeDip_MACS2` -> for each of the three celltypes

merge all three with "cat" function

bedtools merge -i /home/crtd_csortedmacs2peaks.bed -n -scores sum& -d 100 > mergedMACS2peaks.bed bedtools multicov -bams /home/crtd_c...PP_MeDip_Output.bam /home/crtd_c...DP_MeDip_Output.bam /home/crtd_c...N_MeDip_Output.bam -bed /home/crtd_c...mergedMACS2peaks.bed > MeDippeaks.bed -D -q 30 ```

I sorted all peaks out which have no more than 20 reads in one of the three samples ( sorted <-subset(mydata, ((V6>20)|(V7>20)|(V8>20))))and calculated the log2 ratio between the three celltypes (so at the end I had a excelsheet with the genomic coordinates and the three log2 ratios). However now is the question can I do it like this and if yes, how can I define the significant changes of the log2 ratios (simply +1/-1 fold change ?).

However I am happy about any other suggestions regarding the data analysis. May there is a tool out there which I didn't try so far.

Thanks a lot!

ChIP-Seq • 7.8k views

ADD COMMENT • link updated 2.3 years ago by Ram 43k • written 9.6 years ago by florian.noack • 0

Ram · Answer 1 · 2014-09-18

1

Entering edit mode

9.6 years ago

Ming Tommy Tang ★ 3.9k

I have post related to MeDIP-seq

You may find it helpful.

ADD COMMENT • link updated 2.3 years ago by Ram 43k • written 9.6 years ago by Ming Tommy Tang ★ 3.9k

0

Entering edit mode

Thanks a lot, I will try diffreps as well as the MACS2 bdgcmp option.

ADD REPLY • link updated 2.3 years ago by Ram 43k • written 9.6 years ago by florian.noack • 0

0

Entering edit mode

Hi Florian,

How was your experience with the MACS2 bdgcmp option? I was trying to compare a Tumor and Normal sample ( both having CONTROL Sample) and after running MACS2 bdgcmp option, I am only getting 1-2 DMR peaks ? Is that anything obvious that I am missing?

Thanks

ADD REPLY • link 4.8 years ago by always_learning ★ 1.1k