Annotation Based On The Genomic Range
10.1 years ago

Hello all, I have some data like this from the mouse genome:

chr1 3000000 3000090
chr2 4339993 4389898
chr5 3000330 3003339
chr7 3323233 3390393


I know that the UCSC Genome Browser can show which genes and proteins are present in those regions. However, I am more interested in identifying all functional elements (promoters, enhancers, TFs, etc.) within each region. Is there any way to do that? Is there an option for this within UCSC?

genome annotation r ucsc • 4.9k views
10.1 years ago

You have to define promoters and enhancers yourself; there is no single accepted definition. Get a list of all genes (see Fetching Transcription Start And End For A Custom Gene List From Ucsc (Hg18/Ncbi36), changing the organism and build). If you know R or any other language, add and subtract a number of bases (e.g. +/-1 KB) around the TSS in a strand-specific way: in the UCSC table the TSS is txStart for plus-strand genes and txEnd for minus-strand genes. The window size depends on how you define promoters. Then use the intersectBed tool from Bedtools to overlap these windows with your regions. See How To Determine Overlaps From Coordinates or the Bedtools manual for usage.
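
As a rough illustration of the strand-specific window described above, here is a minimal Python sketch. The gene records (`geneA`, `geneB`) and the helper names `promoter_window` and `overlaps` are hypothetical, the +/-1 KB window is just the example value from this answer, and coordinates are assumed to be 0-based half-open as in BED files. For real data, intersectBed does the overlap step for you.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    strand: str    # "+" or "-"
    tx_start: int  # UCSC txStart (always the smaller coordinate)
    tx_end: int    # UCSC txEnd (always the larger coordinate)


def promoter_window(gene, upstream=1000, downstream=1000):
    """Return (chrom, start, end) around the TSS, taking strand into account."""
    if gene.strand == "+":
        tss = gene.tx_start
        start, end = tss - upstream, tss + downstream
    else:
        # On the minus strand the transcript starts at txEnd,
        # so "upstream" points towards larger coordinates.
        tss = gene.tx_end
        start, end = tss - downstream, tss + upstream
    return gene.chrom, max(0, start), end


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


# Hypothetical genes, for illustration only.
genes = [
    Gene("geneA", "chr1", "+", 3000500, 3010000),
    Gene("geneB", "chr1", "-", 2990000, 3000050),
]

# First query region from the question.
query = ("chr1", 3000000, 3000090)

promoter_hits = []
for gene in genes:
    chrom, start, end = promoter_window(gene)
    if chrom == query[0] and overlaps(query[1], query[2], start, end):
        promoter_hits.append(gene.name)

print(promoter_hits)
```

Changing `upstream` and `downstream` is where your own promoter definition goes.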

For enhancers, some people say they lie 5-10 KB away, but one way to do it would be to overlay ChIP-Seq peaks for p300 (a marker for enhancers) on the genome to get a list of enhancers and then intersect it with your own file. If you know Galaxy, this might be helpful: From BED Coordinates to Genes
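
If you would rather script the overlap than use Bedtools or Galaxy, the idea can be sketched in plain Python as below. The query regions are the ones from the question; the p300 peak coordinates are invented for illustration, and `parse_bed` and `regions_with_peaks` are hypothetical helper names. This is a simple all-against-all comparison, fine for a handful of regions but not tuned for genome-wide peak sets.

```python
from collections import defaultdict


def parse_bed(text):
    """Parse whitespace-separated BED-like text into {chrom: [(start, end), ...]}."""
    by_chrom = defaultdict(list)
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip blank lines, comments and UCSC track/browser header lines.
        if len(fields) < 3 or fields[0].startswith(("#", "track", "browser")):
            continue
        by_chrom[fields[0]].append((int(fields[1]), int(fields[2])))
    return by_chrom


def regions_with_peaks(regions, peaks):
    """Map each query region to the list of peaks overlapping it."""
    hits = {}
    for chrom, intervals in regions.items():
        for start, end in intervals:
            hits[(chrom, start, end)] = [
                (p_start, p_end)
                for p_start, p_end in peaks.get(chrom, [])
                if p_start < end and start < p_end
            ]
    return hits


QUERY_BED = """
chr1 3000000 3000090
chr2 4339993 4389898
chr5 3000330 3003339
chr7 3323233 3390393
"""

# Made-up p300 peaks, standing in for a peak file downloaded from GEO.
PEAK_BED = """
chr1 3000050 3000400
chr7 3400000 3400300
"""

enhancer_hits = regions_with_peaks(parse_bed(QUERY_BED), parse_bed(PEAK_BED))
for region, found in enhancer_hits.items():
    print(region, found)
```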

Cheers


Thank you, sukhdeep, for your reply. I have obtained the ChIP-seq data from the GEO database. An enhancer might be about 500 bp long on average, but the ChIP-seq data shows only the areas where p300 binding is present. So can I regard +/- 200 bp from the peak start of the ChIP-seq data as the enhancer?

10.1 years ago
Irsan ★ 7.6k

If you want information on annotating genomic intervals in general see some similar Biostars-posts:

9.8 years ago
Emily 23k

No idea about UCSC, but you can do that using the Ensembl Region Report tool: http://www.ensembl.org/tools.html

This allows you to input genomic coordinates, then see everything that's within them. There's a tick-box list where you can choose what to see. The options are:

Genes, Transcripts and Proteins

Genomic Sequence

Constrained Elements (Conserved Regions)

Variations (SNPs and InDels)

Structural Variations (CNVs etc)

Regulatory Features


Thank you for the reply. I am more interested in the Constrained Elements (Conserved Regions) feature. Does this tool support a graphical view? I know the ECR Browser does, but I cannot enter each coordinate manually.


This will just give you a list.

10.1 years ago

If you are comfortable with a little programming and Unix, or you can use the snpEff software and set up databases for different genomic elements (genes, transcription factor binding sites, enhancers, etc.), then it is a pretty simple thing to do. You can get most of the files you need from Ensembl:

http://useast.ensembl.org/info/data/ftp/index.html

The cis-regulatory element information can be derived from the Regulation GFF file and the Regulation data files.
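
For the "little programming" route, a minimal Python sketch of pulling features out of a GFF file for one region might look like this. The two GFF rows, their IDs and feature types are invented for illustration and are not taken from a real Ensembl file; `parse_gff_features`, `strip_chr` and `features_in_region` are hypothetical helper names. Two things it assumes: GFF coordinates are 1-based inclusive (so they are converted to 0-based half-open here), and Ensembl names chromosomes "5" rather than "chr5", so the prefix is stripped before comparing.

```python
def parse_gff_features(text):
    """Return (seqid, start0, end, feature_type) tuples from GFF text."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9:
            continue
        seqid, _source, feature_type, start, end = cols[:5]
        # GFF is 1-based inclusive; convert to 0-based half-open like BED.
        features.append((seqid, int(start) - 1, int(end), feature_type))
    return features


def strip_chr(name):
    """Drop a leading 'chr' so UCSC-style and Ensembl-style names match."""
    return name[3:] if name.startswith("chr") else name


def features_in_region(chrom, start, end, features):
    """Features overlapping the half-open region chrom:start-end."""
    chrom = strip_chr(chrom)
    return [
        f for f in features
        if strip_chr(f[0]) == chrom and f[1] < end and start < f[2]
    ]


# Invented rows in GFF column order, for illustration only.
EXAMPLE_GFF = "\n".join(
    "\t".join(row)
    for row in [
        ["5", "Regulatory_Build", "enhancer", "3000301", "3000900",
         ".", ".", ".", "ID=hypothetical_1"],
        ["5", "Regulatory_Build", "promoter", "3100001", "3101000",
         ".", ".", ".", "ID=hypothetical_2"],
    ]
)

gff_features = parse_gff_features(EXAMPLE_GFF)
# Third query region from the question.
gff_hits = features_in_region("chr5", 3000330, 3003339, gff_features)
print(gff_hits)
```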