Question: Merging Peaks from macs2
0
gravatar for dimitrischat
16 months ago by
dimitrischat90
dimitrischat90 wrote:

After macs2, you get files such as .xls. I have 6 .xls files (0h,3h,6h each of them have 2 replicates) and i want to merge them so i can use them (and the bam files) into CSAW for Differential Binding Analysis of ChIP-seq peaksets. So my workflow is like this:

I open the xls file, delete first rows ( until chr,start,end,length,pileup,-log10(pvalue),fold_enrichment,-log10(qvalue),name) and Save it as .csv then rename it as .bed. After that I use this command (which I found here: merge chipseq peaks with bedtools/other tool ) :

awk '{print $0"\t","nm6k27_rep2"NR}' nm6k27_rep2_macs2.bed > nm6k27_rep2_new.bed

in that was every peak has next to it a name which shows where it came from.

Then i am trying to merge them by using this :

cat nm0k27_rep1_macs2.bed nm0k27_rep2_macs2.bed \
    nm3k27_rep1_macs2.bed nm3k27_rep2_macs2.bed \
    nm6k27_rep1_macs2.bed nm6k27_rep2_macs2.bed | \
    sort -k1,1 -k2,2n | mergeBed -i stdin \
    > locations.bed

or

cat nm0k27_rep1_macs2.bed nm0k27_rep2_macs2.bed \
    nm3k27_rep1_macs2.bed nm3k27_rep2_macs2.bed \
    nm6k27_rep1_macs2.bed nm6k27_rep2_macs2.bed | \
    sort -k1,1 -k2,2n | mergeBed -i stdin -o collapse -c 4

but both result an error:

ERROR: file stdin has non positional records, which are only valid for the groupBy tool.

Any suggestions?

chip-seq • 1.2k views
ADD COMMENTlink modified 16 months ago by RamRS25k • written 16 months ago by dimitrischat90
1

Hello dimitrischat,

please take care of formating your post in a way that makes it more easily, e.g. by using paragraphs.

Please use also the formatting bar (especially the code option) to present your post better. I've done it for you this time.
code_formatting

Furthermore it would be helpful if you show us the first lines of your csv and bed file.

Thank you!

fin swimmer

ADD REPLYlink modified 16 months ago • written 16 months ago by finswimmer13k

Sorry..will do!

chr1    713706  714672  967 10.18   13.67773    8.27315 11.37243    macs2hisat2_peak_1   nm0k27_rep12
chr1    824823  825324  502 3.73    4.20165 3.50376 2.17421 macs2hisat2_peak_2   nm0k27_rep13

i now used the bedtools sort as @Devon Ryan said and then used what @ATpoint said. Now i got a merged.bed file with :

chr1    9968    10562
chr1    51536   51712
chr1    603212  604780
chr1    710221  711759
chr1    712948  714845
chr1    726856  727105

is this correct now ? For me to use it in CSAW?

thanks!

Edit: Personal email address removed by mod

ADD REPLYlink modified 16 months ago by finswimmer13k • written 16 months ago by dimitrischat90

Though you can use called peaks with CSAW, they kind of recommend that you don't. It's possible to just feed it the BAM files and still get results, considering it's a window-based method anyway.

ADD REPLYlink written 16 months ago by jared.andrews074.3k

They kind of recommend that you dont? What do they reccomend?

ADD REPLYlink written 16 months ago by dimitrischat90
1

Please refer to the manual. It is outstandingly extensive, and given that you are probably rather new in the ChIP-seq field, it provides a lot of knowledge that one should have prior to diving into the analysis. I strongly recommend to spend some quality time on it ;-)

ADD REPLYlink written 16 months ago by ATpoint28k
1

I recommend reading their user guide, it provides a good breakdown of how the package works and should help you run it properly.

If you really want to use a consensus peakset, you will probably want to use DiffBind instead.

edit: @ninja'd by @ATpoint.

ADD REPLYlink modified 16 months ago • written 16 months ago by jared.andrews074.3k

CSAW works great with consensus peak sets and is far less prone to breaking than DiffBind.

ADD REPLYlink written 16 months ago by Devon Ryan93k

Huh, I had many more issues with csaw than DiffBind, but to each their own.

ADD REPLYlink written 16 months ago by jared.andrews074.3k
1

First rule of bioinformatics, never use Excel things ;-D There should be narrowPeak files from the peak calling. With these, simply do:

cat *narrowPeak | cut -f1-3 | sort -k1,1 -k2,2n | bedtools merge -i - > merged.bed
ADD REPLYlink written 16 months ago by ATpoint28k

there are broadPeak and gappedPeak along with the excel thingy.. :D which one should i use?

ADD REPLYlink written 16 months ago by dimitrischat90
1

I just noticed that you aim to use csaw. The idea of this package is to omit peak calling categorically, and use the slidign window approach. Is there a reason you do it anyway?

EDIT: @jared.andrews07 just asked the same thing.

ADD REPLYlink modified 16 months ago • written 16 months ago by ATpoint28k
1
gravatar for Devon Ryan
16 months ago by
Devon Ryan93k
Freiburg, Germany
Devon Ryan93k wrote:

You're looking for bedtools sort.

ADD COMMENTlink written 16 months ago by Devon Ryan93k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2043 users visited in the last hour