Question

Merge a bed file to generate non-overlapping entries!

3

Entering edit mode

8.7 years ago

renuka.pasupuleti1910 ▴ 30

Hi all,

I am having a bed file which is the output of 2 bed files but it has given me some overlaps

For example

chr1 1234 1240
chr1 1234 1245
chr1 1235 1245

I wanted to find out all the overlaps and also I want the regions appears only once in my bed file .can some1 help me please

sequence • 11k views

ADD COMMENT • link updated 18 months ago by Ram 43k • written 8.7 years ago by renuka.pasupuleti1910 ▴ 30

0

Entering edit mode

You want to remove the duplicate lines or overlapping regions?

ADD REPLY • link updated 18 months ago by Ram 43k • written 8.7 years ago by GouthamAtla 12k

0

Entering edit mode

There is no duplicate in that example.

ADD REPLY • link 8.7 years ago by Manu Prestat 4.1k

0

Entering edit mode

I think she meant overlapping reads/regions. I'll edit the Q

ADD REPLY • link updated 18 months ago by Ram 43k • written 8.7 years ago by Sukhi Singh 11k

Ram · Answer 1 · 2015-08-06

2

Entering edit mode

8.7 years ago

michael.ante ★ 3.8k

Hi,

Your example shows overlapping features, which are no duplicates. If you want to merge those features try:

mergeBed -i test.bed
chr1    1234    1245

Cheers,
Michael

ADD COMMENT • link updated 18 months ago by Ram 43k • written 8.7 years ago by michael.ante ★ 3.8k

0

Entering edit mode

Hi michael,

Since I am new to this field I was mistaken.and when I am using mergebed I have following error

ERROR: input file: (hipo_agi4w) is not sorted by chrom then start.
       The start coordinate at line 6 is less than the start at line 5

Can you suggest me something

ADD REPLY • link updated 18 months ago by Ram 43k • written 8.7 years ago by renuka.pasupuleti1910 ▴ 30

1

Entering edit mode

Sort the bed file and then mergeBed

sort -k1,1 -k2,2n input.bed | mergeBed > output.bed

ADD REPLY • link updated 18 months ago by Ram 43k • written 8.7 years ago by GouthamAtla 12k

0

Entering edit mode

Thanks a lot gautham. It worked so well. I just have few doubts in it. The actual procedure has started in this way:

I have 4 bed files and got the target bed file by intersecting all these. Then I have a bam file and have done coverage bed with my target file. But after this step when I have some calculations in low covered regions it showed some overlapping regions as my colleague is using same procedure but considering target file as individual 4 files she has got a very good and clear results but mine has showed some overlaps. Do you have any idea where in intersect or coverage bed this could happen?

Thanking you,
Renuka Pasupuleti
Masters in Bioinformatics
Saarland University

ADD REPLY • link updated 18 months ago by Ram 43k • written 8.7 years ago by renuka.pasupuleti1910 ▴ 30

0

Entering edit mode

Post your code, any parameter can change the output.

ADD REPLY • link 8.7 years ago by Sukhi Singh 11k

0

Entering edit mode

For intersection

interscetBed -a a.bed -b b.bed | intersectBed -a stdin -b c.bed | intersectBed -a stdin -b d.bed > output(e.bed)
coverageBed -d -abam a.bam -b e.bed > output

to get the regions which are under certain threshold I have used some perl script

#!/bin/perl

use strict;

### 1. input is the result of coverageBed -a BAM file and target bed file
### 2. output file
### 3. depth threshhold
### 4. gap length allowed

my ($input, $output, $thresh, $gap) = @ARGV;

open(INFILE, $input); open(OUT, ">$output");

my $flag = 0; my $flag_start = 0; my $flag_end = 0; my $gap_length = 0; my $chr_up = ''; my $start_up = 0; my $end_up = 0;

while(<infile>) {
  chomp;
  my @items = split('\t', $_);
  my $chr = $items[0];
  my $start = $items[1];
  my $end = $items[2];
  my $pos = $items[-2];
  my $depth = $items[-1];
  if(($chr != $chr_up || $start != $start_up) && $flag == 1) {
    $flag = 0;
    $gap_length = 0;
    $flag_end = $end_up-$gap_length;
    print OUT "$chr_up\t$flag_start\t$flag_end\t$chr_up\t$start_up\t$end_up\n";
  }

  if($depth<=$thresh && $flag==0) {
    $flag = 1;
    $flag_start = $start+$pos-1;
  }

  if($depth<=$thresh && $flag==1){
    $gap_length = 0;
  }

  if($depth>$thresh && $flag==1) {
    if($gap_length<$gap) {
      $gap_length += 1;
    }
    else {
      $flag = 0;
      $gap_length = 0;
      $flag_end = $start+$pos-$gap-1;
      print OUT "$chr\t$flag_start\t$flag_end\t$chr\t$start\t$end\n";
    }
  }

$chr_up = $chr; $start_up = $start; $end_up = $end;
}

close(OUT)

I am getting overlaps in the resultant file. Now I am not understanding in which stage the problem has been occurred.

Thanking you,
Renuka Pasupuleti
Masters in Bioinformatics
Saarland University

ADD REPLY • link updated 18 months ago by Ram 43k • written 8.7 years ago by renuka.pasupuleti1910 ▴ 30

0

Entering edit mode

Sorry but did you wrote the Perl script, do you know what's happening there?

It's possible to get an overlap. From the manual:

Chromosome  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Bam File  ***************
BED File   ^^^^ ^^^^
              ^^^^^^^^
                     ^^^^^ ^^^^^ ^^

It's possible to have an overlap with regions passing your coverage threshold. The Perl script you are using has an additional gap length filter that you can modulate, else you can filter low enriched regions using awk in a one liner.

ADD REPLY • link updated 18 months ago by Ram 43k • written 8.7 years ago by Sukhi Singh 11k

0

Entering edit mode

But the same script was used by my friend and it worked well with it. Anyhow can you suggest what modifications I can do in the script so that my bed file doesn't contain any overlaps or that filtering low enriched regions.

Thanking you,
Renuka Pasupuleti
Masters in Bioinformatics
Saarland University

ADD REPLY • link updated 18 months ago by Ram 43k • written 8.7 years ago by renuka.pasupuleti1910 ▴ 30

0

Entering edit mode

You mean the problem is occurring due to the script. Can you help me out please, since my further analysis is based entirely on this

Thanking you,
Renuka Pasupuleti
Masters in Bioinformatics
Saarland University

ADD REPLY • link updated 18 months ago by Ram 43k • written 8.7 years ago by renuka.pasupuleti1910 ▴ 30

0

Entering edit mode

I'd suggest asking a new follow-up question that describes what you want to do now that you have your merged regions. When you change the question that you want us to answer, things can get difficult to follow. Tell us what you want to do AND why (the biological context). Sometimes, you think you need an answer to one question, but when you give the context, we can see that you actually need an answer to a different question.

ADD REPLY • link updated 18 months ago by Ram 43k • written 8.7 years ago by Sean Davis 26k

Ram · Answer 2 · 2015-08-06

0

Entering edit mode

8.7 years ago

Sean Davis 26k

Try:

sort -k1,1 -k2,2n -k3,3n BEDFILE.bed | uniq

ADD COMMENT • link updated 18 months ago by Ram 43k • written 8.7 years ago by Sean Davis 26k

0

Entering edit mode

Hi I have tried this but still it has showed the same regions

chr1    205799465       205799518       chr1    205799394       205799554
chr1    205799528       205799537       chr1    205799394       205799554

Here as u see the regions are getting overlapped.

ADD REPLY • link updated 18 months ago by Ram 43k • written 8.7 years ago by renuka.pasupuleti1910 ▴ 30

Ram · Answer 3 · 2015-08-06

0

Entering edit mode

8.7 years ago

Alex Reynolds 35k

Save time and hassle and use BEDOPS.

You can sort, remove duplicates, and merge overlapping elements in one statement:

$ sort-bed myData.bed | uniq | bedops --merge - > answer.bed

ADD COMMENT • link updated 18 months ago by Ram 43k • written 8.7 years ago by Alex Reynolds 35k

0

Entering edit mode

My system is not supporting bedops. Can you tell me the alternative if there are any

Thanking you,
Renuka Pasupuleti
Masters in Bioinformatics
Saarland University

ADD REPLY • link updated 18 months ago by Ram 43k • written 8.7 years ago by renuka.pasupuleti1910 ▴ 30

0

Entering edit mode

What system do you use? As long as it is a modern UNIX, it should be compilable. If you want help, let me know more details.

ADD REPLY • link 8.7 years ago by Alex Reynolds 35k