I have some ChIP-seq data for transcription factor binding in Arabidopsis thaliana (the model plant). The data are paired-end, with two replicates and a control (total input). I have trimmed and aligned the reads and now have sorted, indexed .BAM files (or .BED files). Reads are 100 bp each, with average DNA fragment sizes of 300-450 bp depending on the sample.
Viewing the reads in IGV, I can see some regions (for two genes that we think are targets of the TF) that are highly enriched across the whole gene (rather than just the promoter region), as well as various bits of noise where both the ChIP and the Input control have large peaks.
When I try using MACS, I get a huge list of peaks that includes those two genes. But when I look at the other "peaks" in IGV, the coverage plots are almost exactly the same shape in the ChIP and the Input. They are sometimes different heights (presumably due to differences in read count), but on visual inspection they look almost identical. My call to MACS is something like:
macs -t TF_3ul_P_sorted.bam -c TF_Input_P_sorted.bam -f BAM -g 111755668 -n TF_3ul -B -s 100 -S --bw=350
I've been looking for peak-calling algorithms that are designed for paired-end reads, and I'm struggling. A lot of the options only take paired-end data in ELAND format, whatever that is. Others I can't manage to install: I'm using a Windows 7 machine with a VirtualBox VM running Ubuntu, and my Linux skills are fairly basic, which causes problems when installing some of the tools I find. Still others only work on human/mouse data, not Arabidopsis, which makes them useless to me.
Can anyone suggest a peak-calling algorithm that takes paired-end data and successfully removes peaks that are the same shape in the Input control sample?
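To make "the same shape in the Input" concrete, here is a rough Python sketch of the comparison I am doing by eye in IGV: scale the binned coverage of a region to counts per million in each sample, then report the ChIP/Input fold enrichment and the correlation between the two profiles. The function names (`scale_to_depth`, `pearson`, `compare_region`), the pseudocount, and the toy numbers are all made up for illustration; this is not how MACS or any other peak caller works internally.

```python
from math import sqrt


def scale_to_depth(coverage, total_reads, per=1_000_000):
    """Convert raw per-bin read counts to counts per `per` mapped reads."""
    return [c * per / total_reads for c in coverage]


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def compare_region(chip_bins, input_bins, chip_total, input_total, pseudocount=1.0):
    """Return (fold_enrichment, shape_correlation) for one candidate region.

    chip_bins / input_bins: read counts in consecutive bins across the region.
    chip_total / input_total: total mapped reads in each library.
    The pseudocount (an arbitrary choice here) avoids division by zero.
    """
    chip = scale_to_depth(chip_bins, chip_total)
    inp = scale_to_depth(input_bins, input_total)
    fold = (sum(chip) + pseudocount) / (sum(inp) + pseudocount)
    return fold, pearson(chip, inp)


# Toy example (made-up numbers): the ChIP library is twice as deep as the
# Input, and the region has the same profile in both samples.
fold, shape_r = compare_region(
    chip_bins=[2, 4, 8, 4, 2],
    input_bins=[1, 2, 4, 2, 1],
    chip_total=2_000_000,
    input_total=1_000_000,
)
print(f"fold enrichment = {fold:.2f}, shape correlation = {shape_r:.2f}")
```

The regions I would like removed are the ones where, after depth scaling, the fold enrichment is close to 1 and the two profiles are highly correlated, like the toy region above.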