Pulling all DNase HS sites from ENCODE
8.1 years ago
bdevil

I am trying to download a BED file containing all sites from the ENCODE/UCSC table wgEncodeAwgDnaseMasterSites. However, for each DNase hypersensitivity site, I also want the flanking 1000 base pairs. Is it possible to do this via the Table Browser?


Your question is not entirely clear to me: do you want a BED file with coordinates extended by 1000 base pairs, or the sequences?

8.1 years ago

Here's how to get the coordinates for one chromosome, sorted with sort-bed because BEDOPS tools expect sorted input:

$ mysql -h genome-mysql.cse.ucsc.edu -u genome -D hg19 -N -A -e 'select chrom, chromStart, chromEnd from wgEncodeAwgDnaseMasterSites where chrom = "chrX"' | sort-bed - > wgEncodeAwgDnaseMasterSites.chrX.bed

Once you have coordinates, you can pad each interval with BEDOPS. For example, to extend every site by 1000 bp on both sides:

$ bedops --range 1000 --everything wgEncodeAwgDnaseMasterSites.chrX.bed > wgEncodeAwgDnaseMasterSites.chrX.1k_pad.bed
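If BEDOPS is not installed, the padding step amounts to subtracting the pad from each start and adding it to each end, which is roughly what --range 1000 does. The awk sketch below (the function name pad_bed is hypothetical) clamps starts that would go negative to 0, but it does not check chromosome ends, since that needs chromosome sizes; I have not verified that bedops handles chromosome edges the same way. bedtools slop -b 1000 with a -g genome file is an alternative that does clip to chromosome lengths.

```shell
#!/usr/bin/env bash
# pad_bed: widen each BED interval by PAD bases on both sides.
# Hypothetical helper, roughly equivalent to `bedops --range PAD --everything`.
# Assumption: negative starts are clamped to 0; chromosome ends are NOT checked.
pad_bed() {
    local pad=$1
    local input=${2:--}   # file name, or stdin when omitted
    awk -v pad="$pad" 'BEGIN { OFS = "\t" }
    {
        start = $2 - pad
        if (start < 0) start = 0   # BED starts cannot be negative
        $2 = start
        $3 = $3 + pad
        print                      # extra columns, if any, are kept
    }' "$input"
}
```

Usage: pad_bed 1000 wgEncodeAwgDnaseMasterSites.chrX.bed > wgEncodeAwgDnaseMasterSites.chrX.1k_pad.bed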

You could write a bash loop that writes out a BED file for each chromosome and applies the padding:

$ for chr in $(seq 1 22) X Y; do echo "chr$chr"; ... ; done

Replace ... with the fetch and padding commands above, using chr$chr in place of the hard-coded chrX.
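A minimal sketch of that loop with the placeholder filled in, assuming bash with mysql, sort-bed and bedops on PATH and network access to the UCSC server. The function names chrom_list, fetch_chrom and fetch_and_pad_all and the output file names are hypothetical choices for illustration. The script only defines functions, so nothing is downloaded until you call fetch_and_pad_all.

```shell
#!/usr/bin/env bash
# Sketch: fetch wgEncodeAwgDnaseMasterSites (hg19) one chromosome at a time
# from the public UCSC MySQL server and pad every site with BEDOPS.
# Assumes mysql, sort-bed and bedops are installed and on PATH.
# Function and file names are hypothetical.

# Print the chromosome names the loop visits: chr1..chr22, chrX, chrY.
chrom_list() {
    local c
    for c in $(seq 1 22) X Y; do
        printf 'chr%s\n' "$c"
    done
}

# Write one chromosome's sites to stdout as sort-bed-ordered BED3.
fetch_chrom() {
    local chrom=$1
    mysql -h genome-mysql.cse.ucsc.edu -u genome -D hg19 -N -A \
        -e "select chrom, chromStart, chromEnd from wgEncodeAwgDnaseMasterSites where chrom = '${chrom}'" \
        | sort-bed -
}

# For every chromosome, save the raw BED and a padded copy.
# First argument: padding in bp (defaults to 1000).
fetch_and_pad_all() {
    local pad=${1:-1000}
    local chrom raw
    for chrom in $(chrom_list); do
        echo "$chrom" >&2   # progress message, like the echo in the one-liner
        raw="wgEncodeAwgDnaseMasterSites.${chrom}.bed"
        fetch_chrom "$chrom" > "$raw"
        bedops --range "$pad" --everything "$raw" \
            > "wgEncodeAwgDnaseMasterSites.${chrom}.${pad}bp_pad.bed"
    done
}
```

Calling fetch_and_pad_all 1000 should leave two files per chromosome. If you then want one genome-wide file, bedops --everything wgEncodeAwgDnaseMasterSites.chr*.1000bp_pad.bed > all.1000bp_pad.bed should combine the sorted per-chromosome outputs.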
