Question: Downloading genomic intervals for hg38
ChIP490 (Netherlands) asked, 22 months ago:

Hi,

I want to download genomic annotation for hg38 with the following information:

region  gene  exon/TSS/intron/intergenic/CpG/non-CpG

The idea is to get this in a table format and then use intersectBed to get the overlap between the ChIP-seq data and this genomic annotation file.

How can I get this information? The CpG information is important because I have methylation data.

Thank you

Paul replied, 22 months ago:

Hi, please look at the UCSC Genome Browser and use its Table Browser; you can export all the required information from there.
shwethacm (Seattle, WA) answered, 21 months ago:

The UCSC Table Browser has what you need (and more!): https://genome.ucsc.edu/cgi-bin/hgTables

(PS: the CpG island track is under group: Regulation.)

Ming Tang (Houston/MD Anderson Cancer Center) answered, 21 months ago:

See http://crazyhottommy.blogspot.com/2016/11/define-intronic-exonic-and-intergenic.html. It is written for hg19, but you can simply swap in the hg38 files.
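The core of defining intergenic regions is a set difference: every stretch of a chromosome that no gene covers. Below is a minimal awk-only sketch of that idea, assuming a gene BED file already sorted by chromosome and start (for example with sort-bed) and a two-column chromosome-sizes file. The function `intergenic_from_genes` and the toy file names are hypothetical; in practice a dedicated tool such as bedtools complement would normally do this job.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print intergenic intervals (BED3) as the complement of
# gene intervals. Assumes $1 (genes BED) is sorted by chrom, then start, and
# $2 is a "chrom<TAB>size" file listing every chromosome of interest.
intergenic_from_genes() {
  local genes="$1" sizes="$2"
  awk 'BEGIN { FS = OFS = "\t" }
       NR == FNR { size[$1] = $2 + 0; order[++n] = $1; next }  # chrom sizes
       {
         s = $2 + 0; e = $3 + 0
         if (!($1 in cursor)) cursor[$1] = 0          # first gene on this chrom
         if (s > cursor[$1]) print $1, cursor[$1], s  # gap before this gene
         if (e > cursor[$1]) cursor[$1] = e           # also absorbs overlapping genes
       }
       END {
         for (i = 1; i <= n; i++) {                   # tail of each chromosome
           c = order[i]
           stop = (c in cursor) ? cursor[c] : 0
           if (stop < size[c]) print c, stop, size[c]
         }
       }' "$sizes" "$genes" \
    | sort -k1,1 -k2,2n
}

# Toy example (made-up coordinates, not real hg38 data):
printf 'chr1\t1000\nchr2\t500\n' > toy.sizes
printf 'chr1\t100\t200\tgeneA\nchr1\t150\t300\tgeneB\nchr1\t600\t700\tgeneC\n' > toy_genes.bed
intergenic_from_genes toy_genes.bed toy.sizes
```

Overlapping genes (geneA and geneB above) are handled by only ever moving the per-chromosome cursor forward, and a chromosome with no genes at all is reported whole.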

Alex Reynolds (Seattle, WA USA) answered, 21 months ago:

GENCODE release 26 offers some of these annotations for hg38. Pipe the GFF3 file to BEDOPS gff2bed to make a sorted BED file:

$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_26/gencode.v26.annotation.gff3.gz \
   | gunzip -c - \
   | gff2bed - \
   > gencode.v26.bed
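From the converted file, the exon and TSS categories in the question can be pulled out with awk. This is a sketch assuming gff2bed's output column order (chrom, start, end, ID, score, strand, source, type, phase, attributes), so strand is column 6 and the GFF feature type is column 8. The helper names `features_of_type` and `gene_tss` are hypothetical, and the TSS here is simply the first base of each "gene" record on its strand.

```shell
#!/usr/bin/env bash
# Assumed gff2bed column order: chrom, start, end, ID, score, strand,
# source, type, phase, attributes (type = column 8, strand = column 6).

# Keep only records of one GFF feature type, e.g. "exon" or "gene".
features_of_type() {
  awk -v type="$1" 'BEGIN { FS = OFS = "\t" } $8 == type' "$2"
}

# Hypothetical helper: 1-bp TSS for each gene record, as BED6.
# "+" strand: TSS is the first base (start); "-" strand: last base (end - 1).
gene_tss() {
  awk 'BEGIN { FS = OFS = "\t" }
       $8 == "gene" {
         tss = ($6 == "-") ? $3 - 1 : $2
         print $1, tss, tss + 1, $4, $5, $6
       }' "$1"
}

# Toy records in the assumed layout (made-up IDs and coordinates):
printf 'chr1\t99\t500\tENSG_A\t.\t+\tHAVANA\tgene\t.\tID=ENSG_A\n' > toy_gencode.bed
printf 'chr1\t999\t2000\tENSG_B\t.\t-\tHAVANA\tgene\t.\tID=ENSG_B\n' >> toy_gencode.bed
gene_tss toy_gencode.bed
```

On the real file, something like `features_of_type exon gencode.v26.bed > gencode.v26.exons.bed` and `gene_tss gencode.v26.bed | sort-bed - > gencode.v26.tss.bed` would follow the same pattern; the TSS output is re-sorted because minus-strand TSSs move relative to their gene starts.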

For hg38, you can grab the cpgIslandExt table from UCSC's goldenpath service, and use BEDOPS sort-bed to build a sorted BED4+ file:

$ wget -qO- http://hgdownload.cse.ucsc.edu/goldenpath/hg38/database/cpgIslandExt.txt.gz \
   | gunzip -c - \
   | awk 'BEGIN{ FS = OFS = "\t"; }{ gsub(/ /, "", $5); print $2, $3, $4, $5, $6, $7, $8, $9, $10, $11; }' - \
   | sort-bed - \
   > cpgIslandExt.hg38.bed

Based on the table schema for this file, the first four columns are the island's genomic interval (chromosome, start, end) and its name. The remaining columns are: island length; number of CpGs in the island; number of C and G bases in the island; percentage of the island that is CpG; percentage of the island that is C or G; and the ratio of observed (cpgNum) to expected (numC*numG/length) CpG in the island.
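The observed/expected ratio is plain arithmetic, so a worked example may help when reading that column. The sketch below recomputes it, plus the percentage of the island that is C or G, from made-up counts for one hypothetical island. Note the table itself stores only the combined C+G count, so the separate numC and numG inputs here are illustrative assumptions, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Worked example of the obs/exp CpG ratio from the table schema:
#   expected CpG = numC * numG / length
#   obsExp       = cpgNum / expected
cpg_obs_exp() {
  # args: length numC numG cpgNum
  awk -v len="$1" -v c="$2" -v g="$3" -v cpg="$4" 'BEGIN {
    expected = c * g / len
    printf "%.3f\n", cpg / expected
  }'
}

# Percentage of the island that is C or G: (numC + numG) / length * 100.
percent_gc() {
  # args: length numC numG
  awk -v len="$1" -v c="$2" -v g="$3" 'BEGIN { printf "%.1f\n", (c + g) / len * 100 }'
}

# Made-up island: 1000 bp, 300 C, 300 G, 60 CpG dinucleotides.
cpg_obs_exp 1000 300 300 60   # expected = 300*300/1000 = 90; 60/90 -> 0.667
percent_gc 1000 300 300       # (300+300)/1000*100 -> 60.0
```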

Once you have these files in sorted BED format, you can start doing set operations and mapping with the BEDOPS tools bedops, bedmap, etc.
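For the CpG/non-CpG split in the question, the set operation is "does a peak overlap any CpG island?". With BEDOPS this is typically `bedops --element-of 1 peaks.bed cpgIslandExt.hg38.bed` for overlapping peaks and `bedops --not-element-of 1 peaks.bed cpgIslandExt.hg38.bed` for the rest (`peaks.bed` being a placeholder for your sorted ChIP-seq peaks). The awk sketch below only makes the overlap rule explicit on toy data: it keeps all islands in memory and scans every island on a peak's chromosome, so it is for illustration, not genome-scale use. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "CpG" to each peak that overlaps at least 1 bp
# of any CpG island, else "nonCpG". BED is half-open, so two intervals
# overlap when start1 < end2 and start2 < end1 (touching ends do not count).
# $1 = CpG island BED, $2 = peaks BED. Naive O(peaks x islands) scan.
label_peaks_by_cpg() {
  awk 'BEGIN { FS = OFS = "\t" }
       NR == FNR { n[$1]++; s[$1, n[$1]] = $2 + 0; e[$1, n[$1]] = $3 + 0; next }
       {
         label = "nonCpG"
         for (i = 1; i <= n[$1]; i++)
           if ($2 + 0 < e[$1, i] && s[$1, i] < $3 + 0) { label = "CpG"; break }
         print $0, label
       }' "$1" "$2"
}

# Toy inputs (made-up coordinates):
printf 'chr1\t100\t200\tCpG:10\nchr1\t500\t600\tCpG:12\n' > toy_cpg.bed
printf 'chr1\t150\t250\tpeak1\nchr1\t200\t300\tpeak2\nchr2\t100\t200\tpeak3\n' > toy_peaks.bed
label_peaks_by_cpg toy_cpg.bed toy_peaks.bed
```

Here peak2 only touches the first island at position 200, so under half-open coordinates it is labeled nonCpG, which matches the 1 bp minimum overlap used in the bedops commands above.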
