Question: How to find SNPs in promoter regions
5.1 years ago, camelbbs wrote:

Hi all,

I want to ask whether there is a database that stores human disease-related SNPs. I want to obtain the SNPs located in gene promoter regions. Can anyone help with this?

Thanks very much.


5.1 years ago, Alex Reynolds (Seattle, WA USA) wrote:

Say you're working with hg19.

Grab SNP entries from NCBI and convert them to sorted BED with convert2bed (the tool behind vcf2bed):

$ wget -qO- \
    | gunzip -c - \
    | convert2bed --input=vcf --output=bed --sort-tmpdir=${PWD} - \
    > hg19.snp151.bed

Or use whatever subset or other source of SNPs you prefer, and turn it into a sorted BED file on the command line.
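
As a rough sketch of that conversion for a custom SNP list: the function below assumes a hypothetical tab-delimited table with chromosome, 1-based position, and SNP ID, and writes zero-based, half-open BED intervals. The file names and toy records are made up for illustration, and plain `sort` stands in for sort-bed here.

```shell
# Convert a (hypothetical) tab-delimited SNP table: chrom, 1-based pos, ID
# into sorted 4-column BED (zero-based start, half-open end).
snps_to_bed() {
    awk -v OFS='\t' '{ print $1, $2 - 1, $2, $3 }' "$1" \
        | LC_ALL=C sort -k1,1 -k2,2n -k3,3n
}

# Toy input with invented IDs and positions, deliberately unsorted.
printf 'chr2\t500\trsB\nchr1\t2000\trsC\nchr1\t150\trsA\n' > toy_snps.tsv

snps_to_bed toy_snps.tsv > toy_snps.bed
cat toy_snps.bed
```

The resulting file can take the place of hg19.snp151.bed in the later steps, provided its chromosome names match those in the gene annotations.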

Grab gene annotations of interest (e.g., GENCODE), convert them to sorted BED with gff2bed, and keep only the gene records:

$ wget -qO- \
    | gunzip -c - \
    | gff2bed - \
    | awk '$8=="gene"' - \
    > genes.bed

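Say we define proximal promoters as the 1 kb region upstream of the gene. We can process the file genes.bed per-strand and generate promoter regions, clamping the start coordinate at zero for genes near the start of a chromosome and re-sorting the result, since bedmap expects sorted input:

$ awk '{ \
        if ($6=="+") { \
            start = ($2 > 1000) ? ($2 - 1000) : 0; \
            print $1"\t"start"\t"$2"\t"$4"\t"$5"\t"$6; \
        } \
        else { \
            print $1"\t"$3"\t"($3 + 1000)"\t"$4"\t"$5"\t"$6; \
        } \
    }' genes.bed \
    | sort-bed - \
    > promoters.bed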
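
To sanity-check the strand logic, here is a small sketch on toy data. The gene names and coordinates are invented, `make_promoters` is a hypothetical helper, and plain `sort` stands in for sort-bed. It uses the same strand rule as the command above, clamps the upstream edge at zero, and re-sorts the output, because a long reverse-strand gene can push its promoter past the next gene's.

```shell
# Build upstream promoter windows from a 6+ column gene BED file.
make_promoters() {
    awk -v OFS='\t' -v flank=1000 '{
        if ($6 == "+") {
            # forward strand: window ends at the gene start; clamp at 0
            start = ($2 > flank) ? $2 - flank : 0
            print $1, start, $2, $4, $5, $6
        } else {
            # reverse strand: window begins at the gene end
            print $1, $3, $3 + flank, $4, $5, $6
        }
    }' "$1" | LC_ALL=C sort -k1,1 -k2,2n -k3,3n
}

# Toy genes (invented): geneC sits near the chromosome start,
# geneB is a long reverse-strand gene that spans geneA.
printf 'chr1\t300\t800\tgeneC\t.\t+\n'    >  toy_genes.bed
printf 'chr1\t400\t9500\tgeneB\t.\t-\n'   >> toy_genes.bed
printf 'chr1\t5000\t9000\tgeneA\t.\t+\n'  >> toy_genes.bed

make_promoters toy_genes.bed
```

On this input the windows come out as 0-300 for geneC, 4000-5000 for geneA, and 9500-10500 for geneB. Without the final sort, geneB's window would be printed before geneA's.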

Finally, we map SNP IDs to promoters with bedmap:

$ bedmap --echo --echo-map-id-uniq --delim '\t' promoters.bed hg19.snp151.bed > snps_over_promoters.bed
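
Each output line is the promoter record followed by a tab and a semicolon-separated list of the SNP IDs that overlap it; the list is empty when nothing overlaps. If a flat gene-to-SNP table is easier to work with, something like the sketch below may help. It assumes the six-column promoters.bed from above, so that the IDs land in column 7. `expand_hits` is a hypothetical helper and the toy file uses invented records.

```shell
# Keep only promoters with at least one SNP and print one
# (promoter name, SNP ID) pair per line.
expand_hits() {
    awk -F'\t' -v OFS='\t' '$7 != "" {
        n = split($7, ids, ";")
        for (i = 1; i <= n; i++) print $4, ids[i]
    }' "$1"
}

# Toy stand-in for snps_over_promoters.bed (invented records):
# geneA has two overlapping SNPs, geneB has none.
printf 'chr1\t4000\t5000\tgeneA\t.\t+\trs1;rs2\n'  >  toy_hits.bed
printf 'chr1\t9500\t10500\tgeneB\t.\t-\t\n'        >> toy_hits.bed

expand_hits toy_hits.bed
```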

This would find all SNPs in promoter regions, though. To get disease-associated SNPs, you would have to draw your SNPs from the Catalog of Published GWAS (the GWAS Catalog) or ClinVar.
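
For ClinVar, one possible way to build such a SNP set is sketched below. It assumes a ClinVar-style VCF whose INFO column carries a `CLNSIG=` tag and whose chromosome names lack the `chr` prefix; check both against the file you actually download, and pick the release that matches your assembly. The helper name and the toy records are invented, and plain `sort` stands in for sort-bed.

```shell
# Keep VCF records whose CLNSIG value starts with Pathogenic or
# Likely_pathogenic and write them as sorted 4-column BED.
clinvar_pathogenic_bed() {
    awk -F'\t' -v OFS='\t' \
        -v pat='(^|;)CLNSIG=(Pathogenic|Likely_pathogenic)([;/,|]|$)' '
        /^#/ { next }                      # skip VCF header lines
        $8 ~ pat {
            start = $2 - 1                 # VCF is 1-based, BED is 0-based
            # assumption: add "chr" to match UCSC-style names; span the REF allele
            print "chr" $1, start, start + length($4), $3
        }' "$1" | LC_ALL=C sort -k1,1 -k2,2n -k3,3n
}

# Toy VCF with invented records.
{
    printf '##fileformat=VCFv4.1\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
        1 4500 1001 G A . . 'CLNSIG=Pathogenic;CLNDN=toy_disease' \
        1 9800 1002 C T . . 'CLNSIG=Benign' \
        2 120 1003 T C . . 'CLNDN=x;CLNSIG=Conflicting_interpretations_of_pathogenicity'
} > toy_clinvar.vcf

clinvar_pathogenic_bed toy_clinvar.vcf
```

The resulting BED file can then be passed to bedmap in place of hg19.snp151.bed. The pattern is anchored on purpose so that "Conflicting_interpretations_of_pathogenicity" is not picked up by a loose match on "pathogenic".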

written 5.1 years ago by Alexander Skates

Thanks Alexander, does ClinVar include the cancer-associated SNPs?

written 5.1 years ago by camelbbs

ClinVar includes SNPs for any disease or phenotypic response observed by the researchers who upload them. If you're looking specifically for cancer SNPs, COSMIC, which focuses on somatic mutations, might be a better choice.

written 5.1 years ago by Steven Lakin

Thanks Steven. I want to check the SNPs in promoter sequences, but the SNP databases don't include strand information. How do I know whether a SNP is on the forward or reverse strand?

written 5.0 years ago by camelbbs