Finding only intergenic regions in a BED file [Answered]
8.4 years ago
Sam ▴ 100

I have a BED file of 'enhancer' regions called from histone marks using PARE. The program uses nucleosome-free regions to identify possible enhancer sites based on the histone mark H3K4me1. However, a small number of the regions in the BED file fall in introns or exons, and I'd like to keep only the rows that are intergenic (defined as being 5 kb or more from the TSS / TSE).

I assumed this could be done fairly easily with something like bedtools: use slopBed to window the RefSeq TSSs (from UCSC) by 5 kb on each side, then use intersectBed with the -v parameter to keep only the regions in my 'enhancer' BED file that don't overlap the TSS-windowed BED file. But I am still getting a couple of regions that fall inside introns, so that method isn't quite working.

Does anyone know a relatively simple way to do this?

intergenic ChIP-Seq bedops bedtools • 2.8k views
8.4 years ago

Using the BEDOPS toolkit:

  1. Make sure TSSs and regions are sorted:

    $ sort-bed TSS.unsorted.bed > TSS.bed
    $ sort-bed regions.unsorted.bed > regions.bed
    
  2. Symmetrically pad TSSs by 5000 bases and merge them into disjoint regions with bedops --range and --merge. Pipe the result to a filter step using bedops --not-element-of:

    $ bedops --range 5000 --merge TSS.bed | bedops --not-element-of 1 regions.bed - > regions.outsidePaddedTSS.bed
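To see what the pad-and-filter step is doing, here is a minimal sketch of the same logic in plain awk on toy data. The file names and coordinates are made up for illustration, and clamping the padded start at 0 is an assumption in this sketch. It loops over every TSS for every region, so it is only meant for checking results on small files, not as a replacement for bedops:

```shell
# Toy inputs (hypothetical coordinates, tab-separated BED3).
printf 'chr1\t10000\t10001\nchr1\t30000\t30001\n' > toy.TSS.bed
printf 'chr1\t2000\t3000\nchr1\t14000\t16000\nchr1\t20000\t21000\nchr1\t34999\t36000\n' > toy.regions.bed

pad=5000

# First file: pad each TSS by $pad on both sides (start clamped at 0, an assumption).
# Second file: print a region only if it overlaps no padded TSS window.
awk -v pad="$pad" 'BEGIN { FS = OFS = "\t" }
NR == FNR {
    n++
    tss_chrom[n] = $1
    tss_start[n] = ($2 - pad < 0) ? 0 : $2 - pad
    tss_end[n]   = $3 + pad
    next
}
{
    keep = 1
    for (i = 1; i <= n; i++) {
        # Half-open intervals overlap when each one starts before the other ends.
        if ($1 == tss_chrom[i] && $2 < tss_end[i] && $3 > tss_start[i]) { keep = 0; break }
    }
    if (keep) print
}' toy.TSS.bed toy.regions.bed > toy.outsidePaddedTSS.bed

cat toy.outsidePaddedTSS.bed
```

With these toy inputs, the regions at 14000-16000 and 34999-36000 touch a padded window and are dropped, while the other two are kept.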
    

If this doesn't do what you expect, please post sample BED files and we can try it on this end.

An alternative approach might be to build a list of exons from a source of annotations and filter for any regions which are entirely contained within those exons:

$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_21/gencode.v21.annotation.gff3.gz \
    | gunzip --stdout - \
    | awk '$3 == "exon"' \
    | convert2bed -i gff - \
    > exons.bed
$ bedops --element-of 100% regions.bed exons.bed > regions.entirelyWithinExons.bed

If you want the least stringent overlap, the following only requires one base of overlap between a region and an exon:

$ bedops --element-of 1 regions.bed exons.bed > regions.overlappingExons.bed

In this least stringent case, though, a region could straddle an exon and an adjacent intron.
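As a rough illustration of the difference between the two overlap criteria, the following awk sketch labels toy regions as contained in an exon, partially overlapping one, or not overlapping at all. The file names, coordinates, and labels are hypothetical. Note that it checks each exon separately, so its answers may differ from bedops where exons abut or overlap one another:

```shell
# Toy inputs (hypothetical): one exon and three regions, tab-separated BED3.
printf 'chr1\t1000\t2000\n' > toy.exons.bed
printf 'chr1\t1200\t1500\nchr1\t1900\t2300\nchr1\t5000\t5500\n' > toy.regions2.bed

awk 'BEGIN { FS = OFS = "\t" }
NR == FNR {
    # Store each exon; "+ 0" forces numeric values for the comparisons below.
    n++; ex_chrom[n] = $1; ex_start[n] = $2 + 0; ex_end[n] = $3 + 0
    next
}
{
    label = "no_overlap"
    for (i = 1; i <= n; i++) {
        if ($1 != ex_chrom[i]) continue
        # Region lies fully inside this exon (the 100% case).
        if ($2 >= ex_start[i] && $3 <= ex_end[i]) { label = "contained"; break }
        # Region shares at least one base with this exon (the 1-base case).
        if ($2 < ex_end[i] && $3 > ex_start[i]) label = "partial"
    }
    print $1, $2, $3, label
}' toy.exons.bed toy.regions2.bed > toy.labelled.bed

cat toy.labelled.bed
```

The region at 1900-2300 is the straddling case: it would pass the one-base criterion but not the 100% one.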


The first method seems to have worked well, thank you! I'll have to start working with bedops more often.
