Identification of SNPs in regulatory regions
5.3 years ago
valerie ▴ 100

Hi guys,

I have a list of coordinates for my SNPs (a converted and pre-processed VCF file). I used it to identify which of the SNPs are located inside genes, using a GTF file for the mouse genome. Now I want to identify the SNPs that are located in regulatory regions, e.g. transcription factor binding sites and promoter regions. Is there a simple way to get a GTF file for such regions?

I looked through genome.ucsc.edu and found that I can generate a GTF file based on ORegAnno. Is that the right approach?

Thanks!

SNP • 2.1k views
5.3 years ago
Chun-Jie Liu ▴ 280

You may want to check this paper: The Ensembl Regulatory Build.

Ensembl calls regulatory regions based on ChIP-seq and other genome-wide, protein-specific measurements of DNA binding and histone modifications from ENCODE and other consortia.

The Ensembl Regulatory Build contains human and mouse data, which you can download from here.

The RegulomeDB database may already do what you want to do, so it is worth checking.

This is what the VEP will check your variants against. Definitely easier to just use the VEP rather than download and cross-reference.

Thanks for the information. I had missed such a good tool before.

Thank you so much for your advice! May I ask one more question: is it possible to identify which genes these regulatory regions affect?

We don't know this, I'm afraid. If they're promoters you can usually make a good inference based on position, but enhancers and insulators, we have no idea.
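
The positional inference for promoters mentioned above could be sketched as a nearest-TSS lookup. This is only an illustration: the function name `nearest_tss` and the input layout (a dict mapping chromosome to a sorted list of `(tss_position, gene_id)` tuples) are assumptions made for this example, not part of any Ensembl tool.

```python
from bisect import bisect_left


def nearest_tss(tss_by_chrom, chrom, pos):
    """Return (gene_id, distance_bp) for the TSS closest to pos, or None.

    tss_by_chrom is a hypothetical structure: {chrom: [(tss_pos, gene_id), ...]}
    with each list sorted by tss_pos.
    """
    entries = tss_by_chrom.get(chrom)
    if not entries:
        return None
    positions = [tss for tss, _ in entries]
    i = bisect_left(positions, pos)
    # Only the TSS just before and just after pos can be the nearest one.
    candidates = entries[max(0, i - 1):i + 1]
    tss, gene_id = min(candidates, key=lambda entry: abs(entry[0] - pos))
    return gene_id, abs(tss - pos)
```

A lookup like this is only a reasonable guess for promoter-like features close to a TSS. For enhancers and insulators the nearest gene is frequently not the target, which is the limitation described above.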

Understood! Thank you very much!

As Emily_Ensemb mentioned, a promoter's target gene can be inferred from its position. It is very hard to identify the targets of other regulatory elements.

The FANTOM project has also identified regulatory regions. It provides PrESSto for viewing regulatory element targets and enhancer-promoter associations.

Hi-C data could also give you a hint about long-range chromatin interactions mediated by DNA-binding proteins.

5.3 years ago

Regulatory regions are often represented as BED files. Promoters are roughly within +/-500 bp of an annotated TSS.
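
As a rough sketch of the +/-500 bp rule, the following builds promoter windows from the `gene` records of a GTF file and reports which SNPs fall inside them. The function names, the `FLANK` constant and the `(chrom, pos)` SNP tuples are assumptions made for this example; coordinates are treated as 1-based and inclusive, as in GTF and VCF.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

FLANK = 500  # assumed promoter half-width in bp (the rough +/-500 bp rule)


def promoter_windows(gtf_lines, flank=FLANK):
    """Yield (chrom, start, end, gene_id) windows around each gene's TSS."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        # The TSS is the gene start on the + strand and the gene end on the - strand.
        tss = end if strand == "-" else start
        gene_id = "."
        for attr in fields[8].split(";"):
            parts = attr.strip().split(None, 1)
            if len(parts) == 2 and parts[0] == "gene_id":
                gene_id = parts[1].strip('"')
        yield chrom, max(1, tss - flank), tss + flank, gene_id


def snps_in_promoters(snps, windows, flank=FLANK):
    """Return (chrom, pos, gene_id) for each SNP inside a promoter window.

    flank must be the same value that was used to build the windows.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, gene_id in windows:
        by_chrom[chrom].append((start, end, gene_id))
    starts = {}
    for chrom, wins in by_chrom.items():
        wins.sort()
        starts[chrom] = [w[0] for w in wins]
    hits = []
    for chrom, pos in snps:
        if chrom not in by_chrom:
            continue
        # A window containing pos must start within 2 * flank bp upstream of it.
        lo = bisect_left(starts[chrom], pos - 2 * flank)
        hi = bisect_right(starts[chrom], pos)
        for start, end, gene_id in by_chrom[chrom][lo:hi]:
            if end >= pos:
                hits.append((chrom, pos, gene_id))
    return hits
```

Typical use would be `snps_in_promoters(snps, promoter_windows(open(path)))` with your own GTF path. Chromosome names must match between the two inputs (Ensembl GTF files use `1`, UCSC-style files use `chr1`).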

Distal regulatory elements are cell-type specific, so you may need to get data from the ENCODE portal for your cell type of interest or a closely related one.

Overlap between SNPs and regulatory regions is often assessed with an enrichment analysis, because a simple overlap might occur purely by chance.
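
One simple way to frame such an enrichment test is a one-sided binomial test: under the null, each SNP lands in a regulatory region with probability equal to the fraction of the genome those regions cover. This is a minimal sketch under that assumption, which is crude because real SNP density is not uniform. The function names are made up for this example.

```python
from math import exp, lgamma, log, log1p


def merged_length(intervals):
    """Total bp covered by inclusive (start, end) intervals on one chromosome."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping interval: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def binomial_enrichment(n_snps, n_overlapping, covered_bp, genome_bp):
    """Return (expected_overlaps, p_value) for seeing >= n_overlapping hits.

    Assumes SNPs are placed uniformly at random over genome_bp (a crude null).
    """
    p = covered_bp / genome_bp
    if not 0.0 < p < 1.0:
        raise ValueError("covered fraction must be strictly between 0 and 1")
    expected = n_snps * p
    tail = 0.0
    for k in range(n_overlapping, n_snps + 1):
        # Binomial probability computed in log space to avoid overflow.
        log_pmf = (lgamma(n_snps + 1) - lgamma(k + 1) - lgamma(n_snps - k + 1)
                   + k * log(p) + (n_snps - k) * log1p(-p))
        tail += exp(log_pmf)
    return expected, min(1.0, tail)
```

A permutation approach that shuffles SNP or region positions while preserving local density is usually a better null, so treat this only as a first sanity check.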

From the ENCODE portal you can get all the annotations you are looking for. You need to download the peak files and choose the appropriate cell type.

https://www.encodeproject.org/data/annotations/

5.3 years ago
Emily 23k

If you run them through the Ensembl VEP, it will identify the regulatory features the variants hit, and will also give you score changes for hits to TF motifs.