After macs2, you get files such as .xls. I have 6 .xls files (0h,3h,6h each of them have 2 replicates) and i want to merge them so i can use them (and the bam files) into CSAW for Differential Binding Analysis of ChIP-seq peaksets. So my workflow is like this:
I open the xls file, delete first rows ( until chr,start,end,length,pileup,-log10(pvalue),fold_enrichment,-log10(qvalue),name) and Save it as .csv then rename it as .bed. After that I use this command (which I found here: merge chipseq peaks with bedtools/other tool ) :
awk '{print $0"\t","nm6k27_rep2"NR}' nm6k27_rep2_macs2.bed > nm6k27_rep2_new.bed
in that was every peak has next to it a name which shows where it came from.
Then i am trying to merge them by using this :
cat nm0k27_rep1_macs2.bed nm0k27_rep2_macs2.bed \
nm3k27_rep1_macs2.bed nm3k27_rep2_macs2.bed \
nm6k27_rep1_macs2.bed nm6k27_rep2_macs2.bed | \
sort -k1,1 -k2,2n | mergeBed -i stdin \
> locations.bed
or
cat nm0k27_rep1_macs2.bed nm0k27_rep2_macs2.bed \
nm3k27_rep1_macs2.bed nm3k27_rep2_macs2.bed \
nm6k27_rep1_macs2.bed nm6k27_rep2_macs2.bed | \
sort -k1,1 -k2,2n | mergeBed -i stdin -o collapse -c 4
but both result an error:
ERROR: file stdin has non positional records, which are only valid for the groupBy tool.
Any suggestions?
Hello dimitrischat,
please take care of formating your post in a way that makes it more easily, e.g. by using paragraphs.
Please use also the formatting bar (especially the
code
option) to present your post better. I've done it for you this time.Furthermore it would be helpful if you show us the first lines of your csv and bed file.
Thank you!
fin swimmer
Sorry..will do!
i now used the bedtools sort as @Devon Ryan said and then used what @ATpoint said. Now i got a merged.bed file with :
is this correct now ? For me to use it in CSAW?
thanks!
Edit: Personal email address removed by mod
Though you can use called peaks with CSAW, they kind of recommend that you don't. It's possible to just feed it the
BAM
files and still get results, considering it's a window-based method anyway.They kind of recommend that you dont? What do they reccomend?
Please refer to the manual. It is outstandingly extensive, and given that you are probably rather new in the ChIP-seq field, it provides a lot of knowledge that one should have prior to diving into the analysis. I strongly recommend to spend some quality time on it ;-)
I recommend reading their user guide, it provides a good breakdown of how the package works and should help you run it properly.
If you really want to use a consensus peakset, you will probably want to use DiffBind instead.
edit: @ninja'd by @ATpoint.
CSAW works great with consensus peak sets and is far less prone to breaking than DiffBind.
Huh, I had many more issues with csaw than DiffBind, but to each their own.
First rule of bioinformatics, never use Excel things ;-D There should be narrowPeak files from the peak calling. With these, simply do:
there are broadPeak and gappedPeak along with the excel thingy.. :D which one should i use?
I just noticed that you aim to use csaw. The idea of this package is to omit peak calling categorically, and use the slidign window approach. Is there a reason you do it anyway?
EDIT: @jared.andrews07 just asked the same thing.