Question: Finding only intergenic regions in a BED file [Answered]
Sam (United States) wrote, 4.0 years ago:

I have a BED file of 'enhancer' regions, based on histone marks, obtained using PARE. The program uses nucleosome-free regions to identify possible enhancer sites based on the histone mark H3K4me1. However, a small number of the regions in the BED file are located in introns or exons, and I'd like to keep only the rows that contain intergenic regions (defined as being 5 kb or more from the TSS / TSE).

I assumed that this could be done pretty easily using something like bedtools: use slopBed to pad the RefSeq TSSs (from UCSC) by 5 kb on each side, then use intersectBed with the -v option to keep only the regions in my 'enhancer' BED file that don't overlap the TSS-windowed BED file. But I am still getting a couple of regions that fall inside introns, so that method isn't quite working.

Does anyone know a relatively simple way to do this?

Alex Reynolds (Seattle, WA USA) wrote, 4.0 years ago:

Using the BEDOPS toolkit:

  1. Make sure TSSs and regions are sorted:

    $ sort-bed TSS.unsorted.bed > TSS.bed
    $ sort-bed regions.unsorted.bed > regions.bed
    
  2. Symmetrically pad TSSs by 5000 bases and merge them into disjoint regions with bedops --range and --merge. Pipe the result to a filter step using bedops --not-element-of:

    $ bedops --range 5000 --merge TSS.bed | bedops --not-element-of 1 regions.bed - > regions.outsidePaddedTSS.bed
    

If this doesn't do what you expect, please post sample BED files and we can try it on this end.
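
To see what the pad, merge and filter steps above do, here is a minimal sketch on toy data. It uses plain awk and sort rather than BEDOPS, so it only imitates the behaviour described above. The file names (toy.*.bed) and all coordinates are made up for illustration, and the filter scans every padded interval for every region, which is only reasonable for small files.

```shell
#!/bin/sh
# Toy inputs (hypothetical coordinates). BED columns: chrom, start, end, name.
printf 'chr1\t10000\t10001\ttssA\nchr1\t12000\t12001\ttssB\nchr1\t50000\t50001\ttssC\n' > toy.TSS.bed
printf 'chr1\t2000\t3000\tr1\nchr1\t16000\t17500\tr2\nchr1\t30000\t31000\tr3\nchr1\t54900\t56000\tr4\n' > toy.regions.bed

# Step 1: pad each TSS by 5000 bases on both sides (clamping the start at 0),
# sort, then merge overlapping or touching intervals into disjoint ones.
awk -v pad=5000 'BEGIN { FS = OFS = "\t" }
  { s = $2 - pad; if (s < 0) s = 0; print $1, s, $3 + pad }' toy.TSS.bed \
  | sort -k1,1 -k2,2n \
  | awk 'BEGIN { FS = OFS = "\t" }
      $1 == chrom && $2 + 0 <= cur_end { if ($3 + 0 > cur_end) cur_end = $3 + 0; next }
      { if (chrom != "") print chrom, cur_start, cur_end
        chrom = $1; cur_start = $2 + 0; cur_end = $3 + 0 }
      END { if (chrom != "") print chrom, cur_start, cur_end }' > toy.TSS.padded.bed

# Step 2: keep only regions that share no base with any padded interval.
# BED intervals are half-open, so two intervals overlap when
# start1 < end2 and end1 > start2.
awk 'BEGIN { FS = OFS = "\t" }
  NR == FNR { c[NR] = $1; s[NR] = $2 + 0; e[NR] = $3 + 0; n = NR; next }
  { hit = 0
    for (i = 1; i <= n; i++)
      if ($1 == c[i] && $2 + 0 < e[i] && $3 + 0 > s[i]) { hit = 1; break }
    if (!hit) print }' toy.TSS.padded.bed toy.regions.bed > toy.regions.outsidePaddedTSS.bed

cat toy.regions.outsidePaddedTSS.bed   # r1 and r3 remain; r2 and r4 are dropped
```

Here tssA and tssB pad out to overlapping windows and merge into chr1:5000-17001, which removes r2; tssC pads to chr1:45000-55001, which removes r4 because it overlaps the window by 101 bases.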

An alternative approach might be to build a list of exons from a source of annotations and filter for any regions that are entirely contained within those exons.

$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_21/gencode.v21.annotation.gff3.gz \
    | gunzip --stdout - \
    | awk '$3 == "exon"' \
    | convert2bed -i gff - \
    > exons.bed
$ bedops --element-of 100% regions.bed exons.bed > regions.entirelyWithinExons.bed

If you want the least stringent overlap, the following only requires one base of overlap between a region and an exon:

$ bedops --element-of 1 regions.bed exons.bed > regions.overlappingExons.bed

But a region could straddle an exon and intron in this least stringent case.
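
The difference between the two overlap thresholds can be illustrated with a small sketch on toy data. It is written in plain awk, not BEDOPS, and the file names and coordinates are hypothetical. It also assumes exons do not touch or overlap one another, so containment is checked against one exon at a time. Regions labelled "contained" correspond to the 100% case above, while "contained" plus "partial" together correspond to the one-base case.

```shell
#!/bin/sh
# Toy inputs (hypothetical coordinates). BED columns: chrom, start, end, name.
printf 'chr1\t1000\t2000\texon1\n' > toy.exons.bed
printf 'chr1\t1200\t1500\ta\nchr1\t1900\t2300\tb\nchr1\t3000\t3500\tc\n' > toy.regions2.bed

# Label each region by how it relates to the exons:
#   contained = every base of the region lies inside one exon
#   partial   = at least one base overlaps, but the region extends past the exon
#   none      = no shared base
awk 'BEGIN { FS = OFS = "\t" }
  NR == FNR { c[NR] = $1; s[NR] = $2 + 0; e[NR] = $3 + 0; n = NR; next }
  { label = "none"
    for (i = 1; i <= n; i++) {
      if ($1 != c[i]) continue
      if ($2 + 0 >= s[i] && $3 + 0 <= e[i]) { label = "contained"; break }
      if ($2 + 0 < e[i] && $3 + 0 > s[i]) label = "partial"
    }
    print $4, label }' toy.exons.bed toy.regions2.bed > toy.overlap.tsv

cat toy.overlap.tsv   # a is contained, b is partial, c is none
```

Region b is the straddling case: it starts inside exon1 and ends 300 bases past it, so a one-base threshold reports it even though most of it lies outside the exon.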


The first method seems to have worked well, thank you! I'll have to start working with bedops more often.

(Reply by Sam, 4.0 years ago)