Question: Finding ChIP-seq overlaps with BED files
morovatunc (Turkey), 3.7 years ago, wrote:

Hello,

I have posted here before about finding overlaps, and I have reached a point where I am very confused. I have tried several methods for finding overlaps (bedtools multiinter, bedops and bedmap), but none of them seems logical to me. Please help me find a way to compute these overlaps.

My data consist of 20 BED files (13 tumour, 7 normal). What I want to know:

1) Overlapping peaks of both datasets.

2) Overlaps ranging from unique (n=1) up to n=13 for tumour, or n=7 for normal, samples.

3) bedtools multiinter does this pretty well. However, I realised that it reports false-positive overlaps (e.g. 2 bp regions of overlap, which make no sense).

4) With bedtools intersectBed, I would have to run every combination of samples, which is an enormous number of combinations and confuses me a lot.

Can someone who has done this before help me out? It shouldn't be that hard, should it?

Thank you very much

Tunc

Tags: bedmap, bedops, chip-seq, bedtools
(written 3.7 years ago by morovatunc; modified 7 months ago)

"2bp region of overlap which makes no sense" --> why does it make no sense? 1bp overlap is still an overlap if you do not set a minimum number of bp

(written 3.7 years ago by TriS)
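To make TriS's point concrete: BED intervals are 0-based and half-open, so two intervals on the same chromosome overlap by max(0, min(end1, end2) - max(start1, start2)) bp, and a "minimum overlap" rule is just a threshold on that number. A minimal POSIX sh sketch; overlap_bp and passes_min_overlap are hypothetical helper names, not bedtools or BEDOPS commands:

```shell
# overlap_bp START1 END1 START2 END2
# Prints the overlap length in bp of two BED intervals on the same chromosome.
# BED coordinates are 0-based, half-open: [start, end).
overlap_bp() {
  s=$(( $1 > $3 ? $1 : $3 ))   # later of the two starts
  e=$(( $2 < $4 ? $2 : $4 ))   # earlier of the two ends
  echo $(( e > s ? e - s : 0 ))
}

# passes_min_overlap MIN START1 END1 START2 END2
# Exit status 0 only if the two intervals overlap by at least MIN bp.
passes_min_overlap() {
  min=$1; shift
  [ "$(overlap_bp "$@")" -ge "$min" ]
}

overlap_bp 100 200 198 300   # 2 bp: a real, if tiny, overlap
overlap_bp 100 200 200 300   # 0 bp: half-open intervals that only touch
passes_min_overlap 50 100 200 198 300 || echo "2 bp overlap rejected by a 50 bp minimum"
```

The real tools expose such thresholds directly, for example bedtools intersect -f (minimum overlap as a fraction of A) or bedmap's --fraction-ref, --fraction-map and --fraction-both.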

 

Tool:    bedtools multiinter (aka multiIntersectBed)
Version: v2.24.0
Summary: Identifies common intervals among multiple
         BED/GFF/VCF files.

Usage:   bedtools multiinter [OPTIONS] -i FILE1 FILE2 .. FILEn
         Requires that each interval file is sorted by chrom/start.

Options:
    -cluster        Invoke Ryan Layers's clustering algorithm.

    -header         Print a header line.
                    (chrom/start/end + names of each file).

    -names          A list of names (one/file) to describe each file in -i.
                    These names will be printed in the header line.

    -g              Use genome file to calculate empty regions.
                    - STRING.

    -empty          Report empty regions (i.e., start/end intervals w/o
                    values in all files).
                    - Requires the '-g FILE' parameter.

    -filler TEXT    Use TEXT when representing intervals having no value.
                    - Default is '0', but you can use 'N/A' or any text.

    -examples       Show detailed usage examples.

Error: missing file names (-i) to combine.

This is the help for multiinter. How do I specify a minimum overlap there? Thank you
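The help output above lists no minimum-overlap option for multiinter. One workaround, sketched here under assumptions rather than taken from the bedtools docs, is to post-filter its output: each row covers a stretch over which the set of covering files does not change, so the row length (end minus start) is the length of that shared stretch. The function name filter_multiinter and the file names in the usage comment are hypothetical, and the column layout (chrom, start, end, file count, file list, then one flag per file) is assumed from multiinter's default output.

```shell
# filter_multiinter MIN_BP
# Reads `bedtools multiinter` output on stdin and keeps only rows whose
# interval (column 3 minus column 2) is at least MIN_BP long.
filter_multiinter() {
  awk -F'\t' -v min="$1" '($3 - $2) >= min'
}

# Hypothetical usage on real data:
#   bedtools multiinter -i t01.bed t02.bed t03.bed | filter_multiinter 50 > shared_50bp.bed
# Toy rows standing in for multiinter output (a 2 bp segment and a 148 bp one):
printf 'chr1\t100\t102\t2\t1,2\t1\t1\t0\nchr1\t102\t250\t3\t1,2,3\t1\t1\t1\n' \
  | filter_multiinter 50
```

Because multiinter cuts the genome wherever the set of covering files changes, this filter judges each segment on its own: two files can share a long region that is split across several short rows, so treat it as a rough noise filter rather than a true per-pair minimum overlap.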

(written 3.7 years ago by morovatunc)

Alex Reynolds (Seattle, WA USA), 3.7 years ago, wrote:

1) Overlapping peaks of both datasets.

First, make sure that your peak, tumour and normal BED files are sorted (if they are not already), e.g.:

$ sort-bed tumour01.unknown_sort_state.bed > tumour01.bed

Repeat sorting for the remaining peak, tumour and normal BED files, as needed. You only have to sort once, at the beginning.
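If you have many files, a small loop saves retyping. A minimal POSIX sh sketch, assuming your unsorted inputs follow the *.unknown_sort_state.bed naming used above; the helper name sort_bed_file is hypothetical, and the fallback to plain sort (C locale, chromosome, then start, then end) is an assumption about matching sort-bed's order, so prefer sort-bed whenever it is installed.

```shell
# sort_bed_file IN.bed
# Writes IN.bed to stdout in sorted order. Uses BEDOPS sort-bed if it is on
# PATH; otherwise falls back to sort by chromosome (lexicographic, C locale),
# then start and end (numeric). The fallback is assumed, not guaranteed,
# to match sort-bed's ordering for ordinary BED files.
sort_bed_file() {
  if command -v sort-bed >/dev/null 2>&1; then
    sort-bed "$1"
  else
    LC_ALL=C sort -k1,1 -k2,2n -k3,3n "$1"
  fi
}

# Sort every *.unknown_sort_state.bed file in the current directory once.
for f in ./*.unknown_sort_state.bed; do
  [ -e "$f" ] || continue        # glob matched nothing: skip
  sort_bed_file "$f" > "${f%.unknown_sort_state.bed}.bed"
done
```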

Take the multiset union of your tumour BED files with bedops, and pipe that unioned set to a second bedops command to find peaks that overlap at least one element in the pooled tumour set:

$ bedops --everything tumour01.bed tumour02.bed ... tumour13.bed | bedops --element-of 1 peaks.bed - > peaks_overlapping_tumour_sets.bed

Or at least one normal element:

$ bedops --everything normal01.bed normal02.bed ... normal07.bed | bedops --element-of 1 peaks.bed - > peaks_overlapping_normal_sets.bed

Or at least one element from either category:

$ bedops --everything tumour01.bed tumour02.bed ... tumour13.bed \
    normal01.bed normal02.bed ... normal07.bed \
    | bedops --element-of 1 peaks.bed - \
    > peaks_overlapping_tumour_and_normal_sets.bed
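To see what --element-of 1 reports, here is a deliberately naive awk equivalent: it prints each line of the first file that overlaps at least 1 bp (half-open BED coordinates, same chromosome) of any line in the second file, without saying which element it hit. It is quadratic and holds the second file in memory, so it only illustrates the semantics and is not a replacement for bedops on sorted input; naive_element_of and the toy file names are hypothetical.

```shell
# naive_element_of REF.bed MAP.bed
# Prints each REF line overlapping >= 1 bp of any MAP interval (illustration only).
naive_element_of() {
  awk -F'\t' -v mapfile="$2" '
    FILENAME == mapfile { chr[++n] = $1; beg[n] = $2 + 0; fin[n] = $3 + 0; next }
    {
      for (i = 1; i <= n; i++)
        if ($1 == chr[i] && $2 + 0 < fin[i] && beg[i] < $3 + 0) { print; break }
    }' "$2" "$1"
}

# Toy data: peakC only abuts the pooled interval at 200, so it is not reported.
tmp=$(mktemp -d)
printf 'chr1\t100\t200\tpeakA\nchr1\t500\t600\tpeakB\nchr2\t100\t200\tpeakC\n' > "$tmp/peaks.bed"
printf 'chr1\t150\t160\nchr2\t200\t300\n' > "$tmp/pooled.bed"
naive_element_of "$tmp/peaks.bed" "$tmp/pooled.bed"   # prints only the peakA line
rm -r "$tmp"
```

Here pooled.bed plays the role of the --everything output that the pipes above feed to bedops through "-".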

If you're trying to do something else, please clarify the kind of set operation or association that you want to do.

For example, do you need to know which tumour or normal subset overlaps a particular peak? The bedmap tool can help here, but you need to preprocess your tumour and normal element subsets first. Feel free to follow up.

2) Overlaps ranging from unique (n=1) up to n=13 for tumour, or n=7 for normal, samples.

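You can generalize this approach to find elements common to all N subsets. When bedmap is given a single input, it uses that input as both the reference and the map set, so each element's count includes the element itself. For example, for N=13, where A.bed through N.bed are your 13 tumour element sets: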

$ N=13
$ bedops --everything A.bed B.bed C.bed ... N.bed \
    | bedmap --count --echo --delim '\t' - \
    | uniq \
    | awk -vN=${N} '$1==N' \
    | cut -f2- \
    > common_to_all_N_tumour_subsets.bed

You can adapt this approach to N-1 (12) subsets, N-2 (11) subsets, and so on, by changing the awk test:

$ N=13
$ bedops --everything A.bed B.bed C.bed ... N.bed \
    | bedmap --count --echo --delim '\t' - \
    | uniq \
    | awk -vN=${N} '$1==(N-1)' \
    | cut -f2- \
    > common_to_N_minus_1_tumour_subsets.bed

You would repeat this for N=7 for your seven normal set files.
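Because the bedops | bedmap | uniq part of the pipeline does not depend on the threshold, one option is to run it once, save the counted output, and split it into every k from 1 to N in a single awk pass instead of re-running the pipeline for each k. A sketch under that assumption; split_by_count, tumour_counts.txt and the output naming scheme are hypothetical:

```shell
# split_by_count N LABEL < counted.txt
# Input: tab-separated "count<TAB>BED fields..." lines, i.e. saved output of
#   bedops --everything A.bed B.bed ... N.bed \
#     | bedmap --count --echo --delim '\t' - | uniq > tumour_counts.txt
# Writes the BED fields of each line with count k (1 <= k <= N) to
# common_to_<k>_of_<N>_<LABEL>.bed in the current directory.
# Counts above N (possible when one file contributes several overlapping
# elements) are skipped, just as the '$1==N' test would skip them.
split_by_count() {
  awk -F'\t' -v OFS='\t' -v N="$1" -v label="$2" '
    $1 >= 1 && $1 <= N {
      out = "common_to_" $1 "_of_" N "_" label ".bed"
      line = $2
      for (i = 3; i <= NF; i++) line = line OFS $i   # drop the count column
      print line > out
    }'
}

# Hypothetical usage:
#   split_by_count 13 tumour < tumour_counts.txt
#   split_by_count 7 normal < normal_counts.txt
```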

Once you have files common_to_*.bed that you need, you can use bedops or bedmap with each of them to do overlap or association tests with peaks, e.g.:

$ bedmap --echo --echo-map peaks.bed common_to_all_N_tumour_subsets.bed > common_tumour_elements_that_overlap_each_peak.bed
(written 3.7 years ago by Alex Reynolds; modified 3.7 years ago)

Dear Alex, thank you for your detailed answer. I followed the protocol at http://bedops.readthedocs.org/en/latest/content/usage-examples/multiple-inputs.html#multiple-inputs

which gave me the peaks within each group. I guess it will give me the same results. However, I didn't understand the part where we compare both groups.

Should I merge all peak files into a single BED file and run the line below?

$ bedmap --count --echo --delim '\t' all_bed_files.bed

Also, you used bedops --element-of 1 to find overlaps, but I used bedmap. Would there be a significant difference?

Thank you very much for your patience in helping me.

(written 3.7 years ago by morovatunc)

Can you explain what you mean by "compare both groups"? Do you want to compare the peak-overlaps-with-tumour set against the peak-overlaps-with-normal set?

To answer your second question, bedops --element-of 1 just reports an overlap. It won't tell you the associated element that overlaps. To report that association (or "map") you would use bedmap.

(written 3.7 years ago by Alex Reynolds)

Alex, exactly as you said: I want to compare normal vs tumour. I plan to do this by putting all of them into the same BED file and then running bedmap --count --echo on it. Would you suggest another way?

(written 3.7 years ago by morovatunc)

Perhaps you want the following:

$ bedmap --echo --count --fraction-both 0.5 peaks.bed tumours.bed > peaks_with_counts_of_overlapping_tumours.bed
$ bedmap --echo --count --fraction-both 0.5 peaks.bed normals.bed > peaks_with_counts_of_overlapping_normals.bed

You might also count the number of overlaps in common:

$ bedmap --echo --count --fraction-both 0.5 peaks.bed <(bedops --everything tumours.bed normals.bed) > peaks_with_counts_of_overlapping_tumours_and_normals.bed

From these three count numbers, you can build a two-set Venn or Euler diagram of overlap events: The number of overlaps unique to tumours, the number of overlaps unique to normals, and the number of overlaps common to both tumours and normals.
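One way to turn the per-peak counts into Venn regions is to count peaks rather than overlap events (an assumption about what you want to tabulate) and read the tumour and normal outputs side by side. Both come from the same peaks.bed, so line i of each refers to the same peak, and with bedmap's default '|' delimiter the count is the text after the last '|' on each line. At this level the pooled third count adds nothing new, since for each peak it is the sum of the other two. classify_peaks is a hypothetical name:

```shell
# classify_peaks TUMOUR_COUNTS NORMAL_COUNTS
# Each input line looks like "<peak BED fields>|<count>" (bedmap default delimiter),
# one line per peak, in the same peak order in both files.
# Prints how many peaks fall in each region of a two-set Venn diagram.
classify_peaks() {
  awk -v tfile="$1" '
    { c = $0; sub(/.*[|]/, "", c); c += 0 }   # count = text after the last "|"
    FILENAME == tfile { t[FNR] = c; next }    # first file: remember tumour counts
    {
      if (t[FNR] > 0 && c > 0) both++
      else if (t[FNR] > 0) tonly++
      else if (c > 0) nonly++
      else none++
    }
    END {
      printf "tumour_only\t%d\nnormal_only\t%d\nboth\t%d\nneither\t%d\n", tonly, nonly, both, none
    }' "$1" "$2"
}

# Hypothetical usage with the files produced above:
#   classify_peaks peaks_with_counts_of_overlapping_tumours.bed \
#                  peaks_with_counts_of_overlapping_normals.bed
```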

This first pass is a fairly naive approach. You may want to think about normalization with this approach, since a 13-tissue set will likely have more elements than a 7-tissue set, and, by chance, the number of overlap events you get with tumours could be overrepresented by virtue of simply having more elements to start with. You might use bedops to count how many elements are common within the 13 tumour sets, and separately with the 7 normal sets, to determine how to normalize counts of both tumour and normal together.

In any case, please note the use of --fraction-both 0.5 with bedmap, which requires the overlap to cover at least half of the peak element's region and at least half of the tumour or normal element's region. This avoids counting an event as "common" where a tumour element only overlaps one side of the peak and a normal element only overlaps the other side. Because every counted element must cover at least half of the peak, any two counted elements also share part of the peak with each other (the only edge case is two elements that each cover exactly one half and merely abut).

If this isn't clear, draw out three generic intervals on a line and enumerate the different ways overlap events can occur between the three intervals.
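Following that suggestion, here is a small sketch that checks the --fraction-both idea for single intervals: the overlap must be at least FRAC of the peak and at least FRAC of the tumour or normal element. It uses awk for the fractional arithmetic; passes_fraction_both and report are hypothetical helpers, and bedmap's own boundary and rounding behaviour may differ in edge cases.

```shell
# passes_fraction_both FRAC REF_START REF_END MAP_START MAP_END
# Exit status 0 if the overlap is >= FRAC of the reference (peak) interval
# AND >= FRAC of the map (tumour/normal) interval. Half-open BED coordinates.
passes_fraction_both() {
  awk -v f="$1" -v rs="$2" -v re="$3" -v ms="$4" -v me="$5" 'BEGIN {
    f += 0; rs += 0; re += 0; ms += 0; me += 0
    s = (rs > ms) ? rs : ms
    e = (re < me) ? re : me
    ov = (e > s) ? e - s : 0
    exit !(ov > 0 && ov >= f * (re - rs) && ov >= f * (me - ms))
  }'
}

# Test one element against the peak [100,200) at 0.5 and print the verdict.
report() {
  if passes_fraction_both 0.5 100 200 "$1" "$2"; then r=pass; else r=fail; fi
  echo "element [$1,$2) vs peak [100,200): $r"
}
report 100 130   # left edge only: fail
report 170 200   # right edge only: fail
report 100 160   # covers 60% of the peak: pass
report 0 1000    # swallows the peak, but only 10% of itself overlaps: fail
```

The first two elements each touch the peak but never each other, so a plain 1 bp rule would count both as "common"; the last check shows why the "both" part matters when an element is much larger than the peak.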

(written 3.7 years ago by Alex Reynolds; modified 3.7 years ago)

Alex,

Thank you for your answer. It solved my problem, and this is exactly what I wanted. But I have one last question. When I run this:

$ bedmap --count --echo --echo-map-id-uniq --mean --fraction-both 0.95 --delim "\t" bedops_merge_normalall.bed > answer1.bed.txt

Let me explain with an example:

Say we have 5 regions that all overlap one another. bedmap reports the overlaps of each region against the others, which creates duplicates, and these duplicates may skew the calculations. So my question is: how can I get rid of these duplicates? What I did was keep only the unique mean values. Since they are numbers with 4 decimal places, I think keeping only the unique values won't cause a big problem?

region -> overlapping regions -> mean
A -> B,C,D,E -> 15
B -> A,C,D,E -> 15
C -> A,B,D,E -> 15
D -> A,B,C,E -> 15
E -> A,B,C,D -> 15

(written 3.7 years ago by morovatunc)
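Keeping only unique mean values can silently drop an unrelated cluster that happens to share its mean with another one. A sketch of a different approach (not from this thread): deduplicate on the set of regions instead, i.e. each region plus its overlapping regions, sorted, so A -> B,C,D,E and B -> A,C,D,E collapse into one row. The three-column tab-separated input mirrors the example table and is an assumption; bedmap's --echo-map-id-uniq output separates IDs with ';' by default, hence the separator argument. dedup_groups is a hypothetical name.

```shell
# dedup_groups [SEP] < table.tsv
# Input (assumed layout): region<TAB>SEP-separated overlapping regions<TAB>mean
# Keeps the first row of each group, where a group is the sorted set of the
# region plus its overlapping regions. SEP defaults to ",".
dedup_groups() {
  awk -F'\t' -v sep="${1:-,}" '
    {
      n = split($2, ids, sep)
      ids[++n] = $1                      # include the region itself
      for (i = 2; i <= n; i++) {         # insertion sort, portable awk
        v = ids[i] ""
        for (j = i - 1; j >= 1 && (ids[j] "") > v; j--) ids[j + 1] = ids[j]
        ids[j + 1] = v
      }
      key = ids[1]
      for (i = 2; i <= n; i++) key = key SUBSEP ids[i]
      if (!seen[key]++) print            # first row of each group wins
    }'
}

# Toy rows: one five-region cluster plus an unrelated pair with the same mean.
printf 'A\tB,C,D,E\t15\nB\tA,C,D,E\t15\nC\tA,B,D,E\t15\nF\tG\t15\nG\tF\t15\n' \
  | dedup_groups
```

This only collapses rows whose regions all overlap one another. In a chain where A overlaps B and B overlaps C but A does not overlap C, the rows describe different sets and are all kept; merging the regions first, for example with bedops --merge, is one way to define clusters in that case.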
geek_y (Barcelona), 3.7 years ago, wrote:

See the existing thread "Bedtools Compare Multiple Bed Files?"

(written 3.7 years ago by geek_y)

I honestly read that thread 20 times. As I mentioned in my 3rd point, the multiinter approach causes problems such as false-positive overlaps. And as I mentioned in my 4th point, since I have so many files, I asked about alternative methods. I did not start this thread without reading the existing threads. I am aware that intersectBed, bedops and bedmap are possible ways to solve this.

(written 3.7 years ago by morovatunc)

morovatunc (Turkey), 7 months ago, wrote:

For those who have not found an answer: HOMER's mergePeaks tool does exactly what I want.

The documentation at this link is pretty self-explanatory:

http://homer.ucsd.edu/homer/ngs/mergePeaks.html

However, heads up: the author does not seem to respond to reports of problems with the software.

(written 7 months ago by morovatunc)

"based on my experience"

(written 7 months ago by morovatunc)