How to download all the CpG island data for hg38 or hg19 from UCSC?
4.2 years ago
winjorchen ▴ 50

Hi friends: How can I download all the CpG island data for hg38 or hg19 from UCSC? Is there a CpG island database? Thanks.

genome alignment sequence next-gen • 7.8k views
4.2 years ago

For hg19, you can grab the cpgIslandExt table from UCSC's goldenpath service, and use BEDOPS sort-bed to build a sorted BED4+ file:

$ wget -qO- http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/cpgIslandExt.txt.gz \
   | gunzip -c \
   | awk 'BEGIN{ FS=OFS="\t"; }{ gsub(/ /, "", $5); print $2, $3, $4, $5, $6, $7, $8, $9, $10, $11; }' \
   | sort-bed - \
   > cpgIslandExt.hg19.bed

According to the table schema for this file, the first four columns are the island's genomic interval and name. The remaining columns are the island length, the number of CpGs in the island, the number of C and G in the island, the percentage of the island that is CpG, the percentage of the island that is C or G, and the ratio of observed (cpgNum) to expected (numC*numG/length) CpG in the island.
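Once the BED file is built, the extra columns can be addressed by position. Here is a minimal sketch, assuming the column order described above. The helper names `filter_islands` and `sample_rows`, the two sample rows, and the thresholds (500 bp, ratio 0.8) are all made up for illustration and are not real UCSC data or recommended cutoffs:

```shell
# Hypothetical helper: keep islands at least min_len bp long whose
# observed/expected CpG ratio (column 10) is at least min_ratio.
# Reads BED4+ rows on stdin, prints chrom, start, end, name, obsExp.
filter_islands() {
  awk -v min_len="$1" -v min_ratio="$2" \
    'BEGIN{ FS=OFS="\t"; } ($5 + 0) >= (min_len + 0) && ($10 + 0) >= (min_ratio + 0) { print $1, $2, $3, $4, $10; }'
}

# Two made-up rows in the layout described above:
# chrom, start, end, name, length, cpgNum, gcNum, perCpg, perGc, obsExp
sample_rows() {
  printf 'chr1\t1000\t1300\tCpG:25\t300\t25\t200\t16.7\t66.7\t0.75\n'
  printf 'chr1\t5000\t6200\tCpG:140\t1200\t140\t850\t23.3\t70.8\t0.93\n'
}

# Only the second row passes both thresholds
sample_rows | filter_islands 500 0.8
```

On the real file you would replace `sample_rows |` with `< cpgIslandExt.hg19.bed`.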

You can do the same thing for hg38, with a slight tweak to the URL:

$ wget -qO- http://hgdownload.cse.ucsc.edu/goldenpath/hg38/database/cpgIslandExt.txt.gz \
   | gunzip -c \
   | awk 'BEGIN{ FS=OFS="\t"; }{ gsub(/ /, "", $5); print $2, $3, $4, $5, $6, $7, $8, $9, $10, $11; }' \
   | sort-bed - \
   > cpgIslandExt.hg38.bed

The schema is the same between builds, but you can take a look at it here.
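Since only the build name changes between the two commands, they can be folded into one small helper. This is just a sketch: the function names are hypothetical, and it assumes the raw dump is tab-delimited with 11 columns (bin first, then the ten described above) and that the name field contains a space (e.g. `CpG: 116`). It splits on tabs explicitly and strips that space, rather than relying on whitespace field positions:

```shell
# Hypothetical helper names; the URL pattern is copied from the commands above.
cpg_url() {
  # $1 is the build name, e.g. hg19 or hg38
  printf 'http://hgdownload.cse.ucsc.edu/goldenpath/%s/database/cpgIslandExt.txt.gz\n' "$1"
}

# Convert raw tab-delimited table rows (stdin) to BED4+ rows (stdout):
# drop the leading bin column and remove the space inside the name field.
cpg_table_to_bed() {
  awk 'BEGIN{ FS=OFS="\t"; }{ gsub(/ /, "", $5); print $2, $3, $4, $5, $6, $7, $8, $9, $10, $11; }'
}

# Download, convert, and sort one build into cpgIslandExt.<build>.bed
fetch_cpg_islands() {
  build="$1"
  wget -qO- "$(cpg_url "$build")" \
    | gunzip -c \
    | cpg_table_to_bed \
    | sort-bed - \
    > "cpgIslandExt.${build}.bed"
}

# Usage (needs network access, wget, and BEDOPS sort-bed):
#   for build in hg19 hg38; do fetch_cpg_islands "$build"; done
```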


Thanks, it is helpful!

4.2 years ago
EagleEye 6.8k

You can use the UCSC Table Browser.


Thanks! It is an easy way to get it; I had never found this way before!
