Dear all, I want to select the SNPs located in the promoter regions of specific genes. How can I do that?
Thanks in advance
Hello, you first need to get the coordinates of the promoter regions from a GFF/GTF file from Ensembl. Then you can extract all SNPs in those regions from dbSNP with the tool tabix. I hope this helps. Lastly, please ask your question with more details; as it stands it is too broad.
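A minimal sketch of the tabix step in R, assuming the promoter coordinates are already in a data frame with 1-based `chrom`, `start` and `end` columns, and that you have a bgzipped, tabix-indexed VCF. The helper names and the file name `dbsnp.vcf.gz` are hypothetical, tabix must be installed and on the PATH, and the chromosome names must match those used in the VCF (e.g. `chr1` versus `1`).

```r
# Build tabix region strings ("chrom:start-end", 1-based, inclusive).
# as.integer() avoids R printing large positions in scientific notation.
make_tabix_regions <- function(promoters) {
  sprintf("%s:%d-%d", promoters$chrom,
          as.integer(promoters$start), as.integer(promoters$end))
}

# Hypothetical wrapper: runs the tabix command-line tool on a bgzipped,
# indexed VCF and returns the matching VCF lines as a character vector.
fetch_snps_with_tabix <- function(vcf, promoters) {
  system2("tabix", args = c(vcf, make_tabix_regions(promoters)), stdout = TRUE)
}

# Example promoter windows (1-based)
promoters <- data.frame(chrom = c("chr8", "chr19"),
                        start = c(18388282, 58359752),
                        end   = c(18394281, 58365751))
make_tabix_regions(promoters)
# fetch_snps_with_tabix("dbsnp.vcf.gz", promoters)  # hypothetical file
```

Passing all regions in one tabix call avoids re-opening the index for every gene.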
Hello fatma.mokhtar. A common convention is to take 3000 bp upstream and downstream of a gene's TSS (transcription start site) as its putative promoter region; you can shrink this to 2000 bp if you prefer. So to extract promoter regions from a GTF/GFF, you first need to get each gene's TSS position from it. This can be fiddly to do by hand, so I think using R may be a better solution. Alternatively, the EPD website collects promoter information, but I don't know it well.
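A minimal base-R sketch of the TSS logic, assuming gene records with 1-based, inclusive `start`/`end` (as in GTF/GFF) and a `strand` column. The TSS is the gene start on the + strand and the gene end on the - strand, and "upstream" points in opposite directions on the two strands. The window convention here mirrors the one documented for `promoters()` in GenomicRanges; the function name and toy genes are hypothetical.

```r
# Compute a promoter window around each gene's TSS.
# Input coordinates are 1-based and inclusive; strand is "+" or "-".
promoter_window <- function(genes, upstream = 3000, downstream = 3000) {
  plus <- genes$strand == "+"
  tss <- ifelse(plus, genes$start, genes$end)
  # Upstream = lower coordinates on "+", higher coordinates on "-"
  start <- ifelse(plus, tss - upstream, tss - downstream + 1)
  end   <- ifelse(plus, tss + downstream - 1, tss + upstream)
  data.frame(gene_id = genes$gene_id,
             chrom   = genes$chrom,
             start   = pmax(start, 1),  # do not run past the chromosome start
             end     = end,
             strand  = genes$strand,
             tss     = tss)
}

# Toy gene records (hypothetical IDs and coordinates)
genes <- data.frame(gene_id = c("geneA", "geneB"),
                    chrom   = c("chr1", "chr1"),
                    start   = c(10000, 50000),
                    end     = c(20000, 60000),
                    strand  = c("+", "-"))
promoter_window(genes)
```

Each window is 6000 bp wide: 3000 bp before the TSS plus 3000 bp starting at the TSS.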
Here is R code that gets the promoter regions and saves them to a local file:
# assumes you need GRCh38 (hg38) coordinates
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(rtracklayer, quietly = TRUE)
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
# Get regions around each gene's TSS; adjust upstream/downstream as you like
tss <- promoters(genes(txdb), upstream = 3000, downstream = 3000)
# Clip any windows that run past a chromosome end
tss <- trim(tss)
# Then save to a file in BED format
export.bed(tss, con = "~/Other/tss.bed")
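These are the first few lines of the resulting BED file (columns: chromosome, start, end, name (the gene ID, i.e. the Entrez Gene ID for this TxDb), score, strand):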
chr19   58359751        58365751        1       0       -
chr8    18388281        18394281        10      0       +
chr20   44649233        44655233        100     0       -
chr18   28174130        28180130        1000    0       -
chr11   70072433        70078433        100009613       0       -
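A sketch of reading this file back for comparison with Ensembl positions: BED start coordinates are 0-based (half-open), while Ensembl and dbSNP report 1-based positions, so 1 must be added to the start column. This assumes the six-column layout shown above with no track or header line; `read_bed_1based` is a hypothetical helper name.

```r
# Read a 6-column BED file and convert it to 1-based, inclusive coordinates.
read_bed_1based <- function(file) {
  bed <- read.table(file, header = FALSE,
                    col.names  = c("chrom", "start", "end", "name", "score", "strand"),
                    colClasses = c("character", "numeric", "numeric",
                                   "character", "numeric", "character"))
  bed$start <- bed$start + 1  # 0-based start -> 1-based; end stays the same
  bed
}

# Two lines from the output above; in practice pass "~/Other/tss.bed"
bed_text <- "chr19\t58359751\t58365751\t1\t0\t-
chr8\t18388281\t18394281\t10\t0\t+"
prom <- read_bed_1based(textConnection(bed_text))
prom
```

Reading the name column as character keeps gene IDs such as `100009613` from being treated as numbers.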
As a general rule of thumb: if your question fits in one sentence, you did not explain it sufficiently. Please see Brief Reminder On How To Ask A Good Question. It is unclear which data you have, which file format you are using, and which organism you are studying. Please elaborate.
Thank you all for your replies.
Let me reformulate my question:
I have selected some genes and downloaded their SNPs from Ensembl as an Excel file. I want to select SNPs that are common in the European population, with a minor allele frequency (MAF) between 0.24 and 0.49, but I could not tell from that file which SNPs are in the promoter region.
Regards,
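A minimal base-R sketch of that filtering step, assuming the Ensembl export has been saved as tab-separated text (e.g. read with `read.delim()`) and that you already have promoter windows with 1-based coordinates. The column names `snp_id`, `chrom`, `pos` and `maf` are hypothetical; rename them to match your file's headers. Chromosome names must agree between the two tables (Ensembl uses `8`, UCSC uses `chr8`). The sketch filters on whatever MAF column you supply; it does not itself pick out European frequencies.

```r
# Keep SNPs whose MAF lies in [maf_min, maf_max] and whose position falls
# inside at least one promoter window (1-based, inclusive).
select_promoter_snps <- function(snps, promoters, maf_min = 0.24, maf_max = 0.49) {
  in_maf <- !is.na(snps$maf) & snps$maf >= maf_min & snps$maf <= maf_max
  in_promoter <- vapply(seq_len(nrow(snps)), function(i) {
    any(promoters$chrom == snps$chrom[i] &
        promoters$start <= snps$pos[i] &
        promoters$end   >= snps$pos[i], na.rm = TRUE)
  }, logical(1))
  snps[in_maf & in_promoter, , drop = FALSE]
}

# Toy data (hypothetical SNP IDs and values)
prom <- data.frame(chrom = "8", start = 18388282, end = 18394281)
snps <- data.frame(snp_id = c("snpA", "snpB", "snpC"),
                   chrom  = c("8", "8", "8"),
                   pos    = c(18390000, 18390500, 18500000),
                   maf    = c(0.30, 0.10, 0.35))
select_promoter_snps(snps, prom)
```

This loop compares every SNP with every window, which is fine for a handful of genes; for genome-wide data, `GenomicRanges::findOverlaps()` is the usual tool.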
Please use ADD COMMENT or ADD REPLY to respond to previous reactions, so that this thread remains logically structured and easy to follow. I have now moved your reaction, but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.
There are two things wrong here:
https://www.ensembl.org/biomart/martview/5750b8dcd08b12d040ed5727b7bab963
From BioMart (Ensembl) I have downloaded the data in an Excel sheet. What is the appropriate way to download it?