Depleting CTCF sites from interval file
4.0 years ago
rbronste ▴ 380

Hi,

What is the most efficient way to remove CTCF sites from a BED file? Thanks.

Rob.

ChIP-Seq • 854 views
4.0 years ago
mmmmcandrew ▴ 130

bedtools intersect can probably get the job done. Take your regions.bed file and a separate BED file containing CTCF sites, then use the -v option to output only the regions that do not overlap a CTCF site, like this:

bedtools intersect -v -a regions.bed -b CTCF.bed > regions_noCTCF.bed

The default is to remove any region that has even a single base pair of overlap with the B file, but you can change that so that a minimum amount of overlap is required for removal (for example with -f, which sets the minimum overlap as a fraction of each A interval).
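To make the removal rule concrete, here is a minimal pure-Python sketch of the filtering described above. It is an illustration and not bedtools itself: the function names, the toy intervals, and the `min_fraction` parameter (standing in for a minimum-overlap setting measured against each region) are all assumptions made for this example.

```python
def overlap_bp(a, b):
    """Shared bases between two half-open intervals given as (chrom, start, end)."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def remove_overlapping(regions, sites, min_fraction=None):
    """Keep only the regions that no site overlaps 'enough'.

    min_fraction=None: any overlap of at least 1 bp removes the region.
    Otherwise a region is removed only when a single site covers at least
    that fraction of the region's length (hypothetical parameter).
    """
    kept = []
    for region in regions:
        length = region[2] - region[1]
        removed = False
        for site in sites:
            shared = overlap_bp(region, site)
            if shared == 0:
                continue
            if min_fraction is None or shared >= min_fraction * length:
                removed = True
                break
        if not removed:
            kept.append(region)
    return kept


# Toy data, invented for illustration.
regions = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 100, 200)]
ctcf = [("chr1", 190, 210), ("chr2", 500, 520)]

# Default rule: the first region shares 10 bp with a site, so it is dropped.
print(remove_overlapping(regions, ctcf))
# With a 50% requirement, 10 bp out of 100 is not enough, so all regions stay.
print(remove_overlapping(regions, ctcf, min_fraction=0.5))
```

The nested loop is fine for a toy example; real tools work on sorted input and are far faster on genome-scale files.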

4.0 years ago
$ bedops --not-element-of -1 regions.bed CTCF.bed > regionsWithoutCTCFOverlaps.bed

Using --not-element-of preserves the original intervals in regions.bed and any additional columns they have (ID, score, strand, etc.).

If you actually wanted to carve out the space taken up by CTCF intervals, you could use --difference:

$ bedops --difference regions.bed CTCF.bed > answer.bed


This calculates new intervals, discarding additional columns in regions.bed.
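As a rough illustration of what "carving out" means, the sketch below subtracts CTCF intervals from a single region in plain Python. It is not BEDOPS code: `subtract_sites` and the toy intervals are hypothetical, and the sketch handles one region at a time, which is a simplification of how a set-difference over whole files behaves.

```python
def subtract_sites(region, sites):
    """Return the pieces of `region` left after cutting out every site.

    Intervals are half-open tuples whose first three fields are
    (chrom, start, end). Extra fields on the region are dropped, mirroring
    the point above that additional columns are discarded.
    """
    chrom, start, end = region[:3]
    pieces = []
    cursor = start  # left edge of the part of the region not yet processed
    for site in sorted(sites):
        s_chrom, s_start, s_end = site[:3]
        # Skip sites on other chromosomes or outside the remaining span.
        if s_chrom != chrom or s_end <= cursor or s_start >= end:
            continue
        if s_start > cursor:
            pieces.append((chrom, cursor, s_start))
        cursor = max(cursor, s_end)
        if cursor >= end:
            break
    if cursor < end:
        pieces.append((chrom, cursor, end))
    return pieces


# A site in the middle splits the region into two pieces.
print(subtract_sites(("chr1", 100, 200, "peak1"), [("chr1", 140, 160)]))
# A site hanging over the left edge only trims the region.
print(subtract_sites(("chr1", 100, 200, "peak1"), [("chr1", 90, 120)]))
```

A region that a site covers completely yields an empty list, so it disappears from the output altogether.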

If I understand correctly, the first option leaves me with a file where the CTCF sites are identified in the BED, and the second option totally drops them out of the BED record? Thanks again!

The first option removes any element that overlaps a CTCF site by one or more bases. The second option cuts out of each element only the genomic space occupied by CTCF sites. The cartoons in the BEDOPS docs explain this graphically.

Very helpful info, thank you. In the second case, once the genomic space is removed, does the interval get split into two if the CTCF site isn't at one end or the other? Just trying to see a signature of this in the number of intervals at the end.

Yes, you'd get two or more pieces. It's like painting a wall and pulling away pieces of masking tape from within the middle of the wall, if that analogy is useful.

However, an easier tool to use for that would be bedmap:

$ bedmap --echo --fraction-map 1 regions.bed CTCF.bed > regionsThatEntirelyContainCTCFSites.bed


Then run wc -l on regionsThatEntirelyContainCTCFSites.bed and regions.bed to get counts. Comparing the two counts gives the fraction of regions that fully contain at least one CTCF site.
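The count comparison can be sketched in plain Python as well. This is only an illustration of the intended bookkeeping, assuming that "entirely contains" means a CTCF site lies completely inside a region; the function names and toy intervals are made up for the example and are not part of bedmap.

```python
def fully_contains(region, site):
    """True when `site` lies completely inside `region` (half-open intervals)."""
    return (
        region[0] == site[0]
        and region[1] <= site[1]
        and site[2] <= region[2]
    )


def count_regions_containing_sites(regions, sites):
    """Count regions that entirely contain at least one site."""
    return sum(
        1 for region in regions
        if any(fully_contains(region, site) for site in sites)
    )


# Toy data, invented for illustration.
regions = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 100, 200)]
ctcf = [("chr1", 140, 160), ("chr1", 390, 410)]

n = count_regions_containing_sites(regions, ctcf)
# The second site only partially overlaps its region, so it does not count.
print(f"{n} of {len(regions)} regions entirely contain a CTCF site")
# prints "1 of 3 regions entirely contain a CTCF site"
```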