Question: Merge a BED file to generate non-overlapping entries
renuka.pasupuleti1910 (Germany), 3.4 years ago, wrote:

Hi all,

I have a BED file that is the output of two BED files, but it contains some overlapping regions.

For example:

chr1 1234 1240
chr1 1234 1245
chr1 1235 1245

I want to find all the overlaps, and I also want each region to appear only once in my BED file. Can someone help me, please?

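For the first part of the question (listing every pair of overlapping entries), here is a minimal Python sketch. It assumes the BED lines are already parsed into (chrom, start, end) tuples with BED's 0-based, half-open coordinates; `find_overlaps` is a hypothetical helper, not part of bedtools or BEDOPS.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style 0-based half-open


def find_overlaps(records: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return every pair of intervals that share at least one base."""
    recs = sorted(records, key=lambda r: (r[0], r[1]))
    pairs = []
    for i, (chrom, start, end) in enumerate(recs):
        for other in recs[i + 1:]:
            # Sorted input: once a later record is on another chromosome or starts
            # at/after this record's end, no later record can overlap it either.
            if other[0] != chrom or other[1] >= end:
                break
            pairs.append(((chrom, start, end), other))
    return pairs


if __name__ == "__main__":
    example = [("chr1", 1234, 1240), ("chr1", 1234, 1245), ("chr1", 1235, 1245)]
    for a, b in find_overlaps(example):
        print(a, "overlaps", b)
```

All three example lines overlap one another, so the sketch reports three pairs.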

Do you want to remove duplicate lines or overlapping regions?

(written 3.4 years ago by geek_y)

There are no duplicates in that example.

(written 3.4 years ago by Manu Prestat)

I think she means overlapping reads/regions. I'll edit the question.

(written 3.4 years ago by Sukhdeep Singh)
michael.ante (Austria/Vienna), 3.4 years ago, wrote:

Hi,

Your example shows overlapping features, which are not duplicates. If you want to merge those features, try:

$ mergeBed -i test.bed
chr1    1234    1245

Cheers,

Michael

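As a rough illustration of what mergeBed does here, the following Python sketch (not the bedtools source) sorts by chromosome and start, then sweeps once and extends the current interval while the next one overlaps or touches it. The bedtools merge documentation says overlapping and book-ended features are merged by default, which is why the sketch uses `<=`. `merge_bed` is a hypothetical name.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def merge_bed(records: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or book-ended intervals into non-overlapping ones."""
    merged: List[Interval] = []
    # Sort by chromosome, then start (the same idea as `sort -k1,1 -k2,2n`)
    for chrom, start, end in sorted(records, key=lambda r: (r[0], r[1])):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or touches the current merged interval: extend it
            _, m_start, m_end = merged[-1]
            merged[-1] = (chrom, m_start, max(m_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


if __name__ == "__main__":
    example = [("chr1", 1234, 1240), ("chr1", 1234, 1245), ("chr1", 1235, 1245)]
    print(merge_bed(example))  # [('chr1', 1234, 1245)], matching mergeBed above
```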

Hi Michael,

Since I am new to this field, I was mistaken. When I run mergeBed, I get the following error:

ERROR: input file: (hipo_agi4w) is not sorted by chrom then start.
       The start coordinate at line 6 is less than the start at line 5

Can you suggest something?

(written 3.4 years ago by renuka.pasupuleti1910)

Sort the BED file and then run mergeBed:

sort -k1,1 -k2,2n input.bed | mergeBed > output.bed

(written 3.4 years ago by geek_y)
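The error above comes from mergeBed requiring input sorted by chromosome, then by start. As a hedged sketch, a similar check can be written in Python to locate the offending line. `first_unsorted_line` is a hypothetical helper, and the exact rules bedtools applies may differ in detail.

```python
from typing import Optional, Sequence, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end)


def first_unsorted_line(records: Sequence[Interval]) -> Optional[int]:
    """Return the 1-based line number of the first record that breaks
    chrom-then-start ordering, or None if the input is sorted."""
    seen_chroms = set()
    prev_chrom, prev_start = None, None
    for lineno, (chrom, start, _end) in enumerate(records, start=1):
        if chrom != prev_chrom:
            if chrom in seen_chroms:
                return lineno  # chromosome reappears after a different one
            seen_chroms.add(chrom)
        elif start < prev_start:
            return lineno  # start is less than the start on the previous line
        prev_chrom, prev_start = chrom, start
    return None
```

Sorting the records with the key `(chrom, start)`, which is what `sort -k1,1 -k2,2n` does on a BED file, makes the check return None.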

Thanks a lot, Gautham, it worked very well. I just have a few doubts. The actual procedure started this way.

I have four BED files and created the target BED file by intersecting all of them. Then I ran coverageBed with a BAM file against this target file. After this step, when I did some calculations on the low-coverage regions, some overlapping regions showed up. My colleague uses the same procedure, but with each of the four files as a separate target file, and she got very good, clean results, while mine show some overlaps. Do you have any idea where in intersectBed or coverageBed this could happen?

Thanking you,

Renuka Pasupuleti

Masters in Bioinformatics

Saarland University

(written 3.4 years ago by renuka.pasupuleti1910)

Post your code; any parameter can change the output.

(written 3.4 years ago by Sukhdeep Singh)

For the intersection:

intersectBed -a a.bed -b b.bed | intersectBed -a stdin -b c.bed | intersectBed -a stdin -b d.bed > e.bed

coverageBed -d -abam a.bam -b e.bed > output

To get the regions below a certain depth threshold, I used this Perl script:

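#!/usr/bin/perl
use strict;
use warnings;

# Arguments:
# 1. input: output of coverageBed -d (BAM file against the target BED file)
# 2. output file
# 3. depth threshold
# 4. gap length allowed
my ($input, $output, $thresh, $gap) = @ARGV;

open(my $in,  '<', $input)  or die "Cannot open $input: $!";
open(my $out, '>', $output) or die "Cannot open $output: $!";

my $flag = 0;          # 1 while inside a low-coverage region
my $flag_start = 0;    # start of the current low-coverage region
my $flag_end = 0;
my $gap_length = 0;    # above-threshold bases seen since the last low base
my ($chr_up, $start_up, $end_up) = ('', 0, 0);   # previous target interval

while (my $line = <$in>) {
    chomp $line;
    my @items = split /\t/, $line;
    my ($chr, $start, $end) = @items[0 .. 2];
    my $pos   = $items[-2];   # 1-based position within the target interval
    my $depth = $items[-1];   # read depth at that position

    # A new target interval begins: close a region still open in the previous one.
    # Chromosome names are strings, so compare them with 'ne', not '!='.
    # Compute the end BEFORE resetting $gap_length.
    if (($chr ne $chr_up || $start != $start_up) && $flag == 1) {
        $flag_end = $end_up - $gap_length;
        print $out "$chr_up\t$flag_start\t$flag_end\t$chr_up\t$start_up\t$end_up\n";
        $flag = 0;
        $gap_length = 0;
    }

    if ($depth <= $thresh && $flag == 0) {
        # Open a new low-coverage region at this base (0-based coordinate)
        $flag = 1;
        $flag_start = $start + $pos - 1;
    }
    elsif ($depth <= $thresh && $flag == 1) {
        # Still low: forget any short high-coverage gap seen so far
        $gap_length = 0;
    }
    elsif ($depth > $thresh && $flag == 1) {
        if ($gap_length < $gap) {
            $gap_length += 1;   # tolerate up to $gap high-coverage bases
        }
        else {
            # Gap too long: end the region before the first high-coverage base
            $flag = 0;
            $gap_length = 0;
            $flag_end = $start + $pos - $gap - 1;
            print $out "$chr\t$flag_start\t$flag_end\t$chr\t$start\t$end\n";
        }
    }
    ($chr_up, $start_up, $end_up) = ($chr, $start, $end);
}

# Flush a region that is still open at the end of the file
if ($flag == 1) {
    $flag_end = $end_up - $gap_length;
    print $out "$chr_up\t$flag_start\t$flag_end\t$chr_up\t$start_up\t$end_up\n";
}

close($out);
close($in);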

I am getting overlaps in the resulting file, and I don't understand at which stage the problem occurred.

(written 3.4 years ago by renuka.pasupuleti1910; edited by Sukhdeep Singh)
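One place overlaps can enter the intersectBed chain above: by default, intersectBed writes one output line per overlapping A/B pair, trimmed to the shared bases. If one input contains features that overlap each other, a single feature from the other file can come out as several records that overlap one another. The following is a simplified Python sketch of that default behaviour (quadratic, for illustration only; `intersect` is a hypothetical name, not the bedtools implementation).

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """For each overlapping (A, B) pair, report the part of A shared with B."""
    out = []
    for a_chrom, a_start, a_end in a:
        for b_chrom, b_start, b_end in b:
            # Half-open intervals overlap if each starts before the other ends
            if a_chrom == b_chrom and a_start < b_end and b_start < a_end:
                out.append((a_chrom, max(a_start, b_start), min(a_end, b_end)))
    return out


if __name__ == "__main__":
    a = [("chr1", 100, 200)]
    b = [("chr1", 120, 180), ("chr1", 150, 190)]  # two overlapping B features
    print(intersect(a, b))  # [('chr1', 120, 180), ('chr1', 150, 190)]
```

The two output records overlap each other, so sorting and merging the intersected target file before running coverageBed would remove this source of overlaps.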

Sorry, but did you write the Perl script? Do you know what's happening there?

It's possible to get an overlap. From the manual:
http://bedtools.readthedocs.org/en/latest/content/tools/coverage.html

[ASCII diagram from the bedtools coverage documentation, alignment lost in extraction: a chromosome, a span of BAM reads, and BED features drawn on several rows, some overlapping one another.]

It's possible to have overlaps among regions that pass your coverage threshold. The Perl script you are using has an additional gap-length filter that you can adjust; otherwise, you can filter low-enriched regions with an awk one-liner.

(written 3.4 years ago by Sukhdeep Singh)
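The simpler alternative mentioned here (a plain threshold filter without the gap tolerance) can be sketched in Python instead of awk. This assumes the per-base rows have already been reduced to (chrom, 0-based position, depth) in coordinate order, for example position = start + pos - 1 from the `coverageBed -d` columns; `low_coverage_runs` is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def low_coverage_runs(per_base: Iterable[Tuple[str, int, int]],
                      thresh: int) -> List[Tuple[str, int, int]]:
    """Collapse consecutive positions with depth <= thresh into BED intervals."""
    runs: List[Tuple[str, int, int]] = []
    for chrom, pos, depth in per_base:
        if depth > thresh:
            continue  # well covered: not part of any low-coverage run
        if runs and runs[-1][0] == chrom and runs[-1][2] == pos:
            runs[-1] = (chrom, runs[-1][1], pos + 1)  # extend the current run
        else:
            runs.append((chrom, pos, pos + 1))  # start a new run
    return runs


if __name__ == "__main__":
    rows = [("chr1", 10, 0), ("chr1", 11, 2), ("chr1", 12, 30), ("chr1", 13, 1)]
    print(low_coverage_runs(rows, thresh=5))  # [('chr1', 10, 12), ('chr1', 13, 14)]
```

If the same base is listed under two overlapping target intervals, this sketch reports it twice, producing overlapping runs. Merging the targets first avoids that.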

But my friend used the same script, and it worked well for her. Anyhow, can you suggest what modifications I can make to the script so that my BED file doesn't contain any overlaps, or how to filter the low-enriched regions?

(written 3.4 years ago by renuka.pasupuleti1910)

Do you mean the problem is caused by the script? Can you help me out, please? My further analysis depends entirely on this.

(written 3.4 years ago by renuka.pasupuleti1910)

I'd suggest asking a new follow-up question that describes what you want to do now that you have your merged regions. When you change the question that you want us to answer, things can get difficult to follow. Tell us what you want to do AND why (the biological context). Sometimes, you think you need an answer to one question, but when you give the context, we can see that you actually need an answer to a different question.  

(written 3.4 years ago by Sean Davis)
Sean Davis (National Institutes of Health, Bethesda, MD), 3.4 years ago, wrote:

Try:

sort -k1,1 -k2,2n -k3,3n BEDFILE.bed | uniq
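Note that `uniq` only removes lines that are exactly identical after sorting, so overlapping but distinct intervals survive. Here is a small Python sketch of what this pipeline does; `sort_uniq` is a hypothetical name, and Python's tuple ordering stands in for `sort -k1,1 -k2,2n -k3,3n`.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end)


def sort_uniq(records: Iterable[Interval]) -> List[Interval]:
    """Sort by chrom, start, end, then drop adjacent identical records."""
    out: List[Interval] = []
    for rec in sorted(records):  # tuple order: chrom, then start, then end
        if not out or out[-1] != rec:
            out.append(rec)
    return out


if __name__ == "__main__":
    example = [("chr1", 1234, 1240), ("chr1", 1234, 1245), ("chr1", 1235, 1245)]
    print(sort_uniq(example))  # all three remain: none is an exact duplicate
```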

Hi, I have tried this, but it still shows the same regions:

chr1    205799465       205799518       chr1    205799394       205799554
chr1    205799528       205799537       chr1    205799394       205799554


As you can see here, the regions are overlapping.

(written 3.4 years ago by renuka.pasupuleti1910)
Alex Reynolds (Seattle, WA USA), 3.4 years ago, wrote:

Save time and hassle and use BEDOPS.

You can sort, remove duplicates, and merge overlapping elements in one statement:

$ sort-bed myData.bed | uniq | bedops --merge - > answer.bed

My system does not support BEDOPS. Can you tell me about alternatives, if there are any?

(written 3.4 years ago by renuka.pasupuleti1910)

What system do you use? As long as it is a modern UNIX, it should be compilable. If you want help, let me know more details.

(written 3.4 years ago by Alex Reynolds)