Question: Using bedtools intersect on multiple bed files to retain peaks in 2 out of 3 files
camerond wrote:

I have 4 standard bed files: one large general file and three smaller files.

I'd like to run bedtools intersect to retain the peaks in the general file that overlap with peaks in at least two of the three smaller files, keeping only the boundaries of the general file peaks. This is the general command I'm looking at:

bedtools intersect -wa -a general.bed -b file1.bed file2.bed file3.bed

However, the flags for restricting the output based on the files passed to -b relate only to the proportion of overlap in base pairs between the files (i.e. using -F -f -r etc.). I don't see a way to restrict the general file based on overlaps with peaks contained in a set number of files sent to -b - I hope this makes sense.

I have considered creating a consensus peak set of the smaller files first using Diffbind retaining peaks contained in 2/3 files, then intersecting this consensus peak file with the general file, but I fear this method may drop/miss peaks that I'd like to retain.

Any suggestions regarding whether I can do this using bedtools, or an alternative method would be greatly appreciated.

bed file intersect bedtools • 420 views
modified 3 months ago by jrmerritt6 • written 9 months ago by camerond
benformatics wrote:

It can be done in a pairwise manner, no?

g: general

f: file

  • g-f1 overlaps, then resulting file overlaps with f2 --> overlaps1
  • g-f1 overlaps, then resulting file overlaps with f3 --> overlaps2
  • g-f2 overlaps, then resulting file overlaps with f3 --> overlaps3

Then take the unique peaks present after merging the three resulting files: overlaps1/2/3.

written 9 months ago by benformatics

@benformatics Yes - I was hoping there was something streamlined for this but perhaps it doesn't exist. Many thanks for the suggestion.

written 9 months ago by camerond
camerond wrote:

As suggested above by @benformatics, I used the following in Snakemake:

rule bedtools_intersect_pairwise_method:
    input:
        general="general_file.bed",
        rep1="file1.bed",
        rep2="file2.bed",
        rep3="file3.bed"
    output:
        "FINAL_pairwise.bed"
    shell:
        """
        bedtools intersect -wa -a {input.general} -b {input.rep1} > gen_rep1_ovrlps.bed;
        bedtools intersect -wa -a {input.general} -b {input.rep2} > gen_rep2_ovrlps.bed;
        bedtools intersect -wa -a gen_rep1_ovrlps.bed -b {input.rep2} > dual_ovrlps_1.bed;
        bedtools intersect -wa -a gen_rep1_ovrlps.bed -b {input.rep3} > dual_ovrlps_2.bed;
        bedtools intersect -wa -a gen_rep2_ovrlps.bed -b {input.rep3} > dual_ovrlps_3.bed;
        cat dual_ovrlps_1.bed dual_ovrlps_2.bed dual_ovrlps_3.bed > all_ovrlps.bed;
        sort-bed all_ovrlps.bed > all_ovrlps.srtd.bed;
        bedtools merge -i all_ovrlps.srtd.bed > {output};
        rm *ovrlps*;
        """
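
A more streamlined variant of the pairwise idea, sketched under a few assumptions: run one `bedtools intersect -u` per replicate so that each general peak is reported at most once per file, then count how many of the per-replicate lists each peak appears in. The function names `hits_per_replicate` and `keep_supported` and the `hits_N.bed` file names are hypothetical. Because the lines come unchanged from the general file, the general peak boundaries are preserved and no `bedtools merge` step is needed.

```shell
# Hypothetical helper: for each replicate, write the general peaks that
# overlap it. -u reports each -a entry once if it has any overlap in -b.
hits_per_replicate() {
    general=$1; shift
    i=0
    for rep in "$@"; do
        i=$((i + 1))
        bedtools intersect -u -a "$general" -b "$rep" > "hits_${i}.bed"
    done
}

# Hypothetical helper: print lines present in at least $1 of the given
# hit files. Assumes each hit file lists a peak at most once, which holds
# for -u output when the general file has no duplicate lines.
keep_supported() {
    min=$1; shift
    cat "$@" \
        | sort \
        | uniq -c \
        | awk -v min="$min" '$1 >= min { sub(/^ *[0-9]+ /, ""); print }' \
        | sort -k1,1 -k2,2n
}

# Usage (not run here):
#   hits_per_replicate general.bed file1.bed file2.bed file3.bed
#   keep_supported 2 hits_1.bed hits_2.bed hits_3.bed > FINAL_pairwise.bed
```

The threshold is a parameter, so the same two helpers would cover "at least 3 of 5" without writing out every pairwise combination.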
written 9 months ago by camerond
jrmerritt6 wrote:

Use bedtools intersect. Here I'm using the narrowPeak files from MACS2, but this can be adapted to any file that has chr, st, end as the first three columns.

Concatenate the narrowPeak files, coordinate sort, then merge peaks within 10 bp:

cat ${SampleName}_1_peaks.narrowPeak ${SampleName}_2_peaks.narrowPeak ${SampleName}_3_peaks.narrowPeak > ${SampleName}_123
sort -k1,1 -k2,2n -k3,3n ${SampleName}_123 > ${SampleName}_sort
cat ${SampleName}_sort | awk '{print $1"\t"$2"\t"$3"\t"$7}' > ${SampleName}_sort.bed
bedtools merge -d 10 -c 4 -o mean -i ${SampleName}_sort.bed > ${SampleName}_mergedpeaks.bed

Write out a bed file showing the number of replicate peaks that support each peak in the summary file ${SampleName}_mergedpeaks.bed:

bedtools intersect -wa -c \
    -a ${SampleName}_mergedpeaks.bed \
    -b ${SampleName}_1_peaks.narrowPeak ${SampleName}_2_peaks.narrowPeak ${SampleName}_3_peaks.narrowPeak \
    -sorted \
    -F 1.0 > ${SampleName}_mergedpeak_replicates.bed

To double check your results, you could also write out all of the peaks in each file that overlap with the merged peak files:

bedtools intersect -wa -wb \
    -a ${SampleName}_mergedpeaks.bed \
    -b ${SampleName}_1_peaks.narrowPeak ${SampleName}_2_peaks.narrowPeak ${SampleName}_3_peaks.narrowPeak \
    -sorted \
    -filenames \
    -F 1.0 > ${SampleName}_mergedpeak_verbose_replicates.bed

Filter ${SampleName}_mergedpeak_replicates.bed file for peaks that have greater than or equal to 2 samples, sort, then remove the column with the number of replicates:

awk '$5 >= 2 {print}' ${SampleName}_mergedpeak_replicates.bed > ${SampleName}_peak_replicates_filter.bed
sort -k1,1 -k2,2n -k3,3n ${SampleName}_peak_replicates_filter.bed > ${SampleName}_peak_replicates_sort.bed
cat ${SampleName}_peak_replicates_sort.bed | awk '{print $1"\t"$2"\t"$3}' > ${SampleName}_peak_replicates.bed
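
One caveat with the `-c` count: when several files are passed to `-b`, it counts overlapping peaks across all of them, so two peaks from the same replicate inside one merged peak also give a count of 2. A minimal sketch of counting distinct replicates instead, assuming the `-wa -wb -filenames` output has the four merged-peak columns followed by the replicate file name in column 5 (the function name `peaks_with_min_replicates` is hypothetical):

```shell
# Hypothetical helper: read `bedtools intersect -wa -wb -filenames` output
# ($2), with the merged peak in columns 1-4 and the replicate file name in
# column 5, and print chr, start, end for peaks hit by at least $1
# distinct replicate files.
peaks_with_min_replicates() {
    min=$1
    awk -F'\t' -v OFS='\t' -v min="$min" '
        {
            peak = $1 OFS $2 OFS $3
            if (!((peak, $5) in seen)) {   # first hit from this replicate
                seen[peak, $5] = 1
                n[peak]++
            }
        }
        END {
            for (peak in n) if (n[peak] >= min) print peak
        }
    ' "$2" | sort -k1,1 -k2,2n -k3,3n
}

# Usage (not run here):
#   peaks_with_min_replicates 2 ${SampleName}_mergedpeak_verbose_replicates.bed \
#       > ${SampleName}_peak_replicates.bed
```

It is worth checking a few lines of the verbose file first to confirm that the file name really sits in column 5 for your bedtools version.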
written 3 months ago by jrmerritt6