Question: How to find SNPs in promoter regions
camelbbs (China) wrote, 3.8 years ago:

Hi all,

I want to ask whether there is a database that stores human disease-related SNPs. I want to obtain the SNPs located in gene promoter regions. Can anyone help with this?

Thanks very much.

Cam 


http://www.hsls.pitt.edu/obrc/index.php?page=URL1151420236         

http://genome.ufl.edu/mapper/

written 3.8 years ago by F
Alex Reynolds (Seattle, WA USA) wrote, 3.8 years ago:

Say you're working with hg19.

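Grab SNP entries from NCBI and convert them to sorted BED with BEDOPS convert2bed (vcf2bed is a wrapper around the same conversion). The NCBI VCF names chromosomes without the "chr" prefix (1, 2, ...), whereas GENCODE uses chr1, chr2, ..., so add the prefix to make the two files comparable (mitochondrial records, if present, would additionally need MT renamed to chrM):

$ wget -qO- ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/VCF/common_all_20180423.vcf.gz \
    | gunzip -c - \
    | convert2bed --input=vcf --output=bed --sort-tmpdir=${PWD} - \
    | sed 's/^/chr/' \
    > hg19.snp151.bed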

Or use whatever subset or other source of SNPs you prefer, and convert it to a sorted BED file on the command line.
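If the BEDOPS converters are not at hand, the conversion can be approximated with awk and sort. The following is only a rough sketch for single-base variants, not a replacement for vcf2bed: the records, the rsA/rsB/rsC IDs, and the file names are made up for illustration, and only the ID column is carried over.

```shell
tmpdir=$(mktemp -d)
toy_vcf="$tmpdir/toy.vcf"
toy_bed="$tmpdir/toy.snps.bed"

# Hypothetical VCF records, deliberately out of order.
{
    printf '##fileformat=VCFv4.1\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf 'chr2\t500\trsB\tA\tG\t.\t.\t.\n'
    printf 'chr1\t1200\trsC\tC\tT\t.\t.\t.\n'
    printf 'chr1\t101\trsA\tG\tA\t.\t.\t.\n'
} > "$toy_vcf"

# VCF positions are 1-based; BED starts are 0-based and ends are exclusive,
# so a variant at POS with reference allele REF becomes [POS-1, POS-1+len(REF)).
# Header lines (starting with "#") are skipped.
awk -v OFS='\t' '!/^#/ { print $1, $2 - 1, $2 - 1 + length($4), $3 }' "$toy_vcf" \
    | LC_ALL=C sort -k1,1 -k2,2n -k3,3n \
    > "$toy_bed"

cat "$toy_bed"
```

The sort keys (chromosome name as text, then start and end as numbers) are meant to mirror the ordering that BEDOPS tools expect; with BEDOPS installed, sort-bed is the safer choice.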

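Grab gene annotations of interest (e.g., GENCODE) and filter for genes, writing a sorted BED file with gff2bed. Use an annotation built on the same assembly as the SNPs: GENCODE release 19 is the last release on GRCh37/hg19, whereas release 21 is on GRCh38:

$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.annotation.gff3.gz \
    | gunzip -c - \
    | gff2bed - \
    | awk '$8=="gene"' - \
    > genes.bed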

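Say we define proximal promoters as the 1 kb region upstream of each gene. We can process the file genes.bed per strand and generate promoter regions. Minus-strand promoters start at the gene end, so the output is no longer guaranteed to be sorted; we pipe it through sort-bed because bedmap requires sorted input. Plus-strand starts are clamped at 0 so that genes within 1 kb of a chromosome start do not get negative coordinates:

$ awk -v OFS="\t" '{
        if ($6=="+") {
            # plus strand: upstream is before the gene start, clamped at 0
            start = ($2 > 1000) ? $2 - 1000 : 0;
            print $1, start, $2, $4, $5, $6;
        }
        else {
            # minus strand: upstream is past the gene end
            print $1, $3, $3 + 1000, $4, $5, $6;
        }
    }' genes.bed \
    | sort-bed - \
    > promoters.bed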
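To see what the 1 kb upstream rule does on each strand, here is a small worked example. It is a sketch on invented data: the four genes and the file names are hypothetical, plain sort stands in for sort-bed so that it runs without BEDOPS, and the clamp at 0 is an added safeguard for genes close to a chromosome start.

```shell
tmpdir=$(mktemp -d)
toy_genes="$tmpdir/toy.genes.bed"
toy_promoters="$tmpdir/toy.promoters.bed"

# Hypothetical genes, sorted by start: chrom, start, end, id, score, strand.
{
    printf 'chr1\t400\t2000\tgeneC\t.\t+\n'
    printf 'chr1\t5000\t9000\tgeneA\t.\t+\n'
    printf 'chr1\t6000\t20000\tgeneB\t.\t-\n'
    printf 'chr1\t8000\t8500\tgeneD\t.\t+\n'
} > "$toy_genes"

# Plus strand: promoter is [start - flank, start), clamped at 0.
# Minus strand: promoter is [end, end + flank).
awk -v OFS='\t' -v flank=1000 '
    $6 == "+" { s = ($2 > flank) ? $2 - flank : 0; print $1, s, $2, $4, $5, $6; next }
              { print $1, $3, $3 + flank, $4, $5, $6 }
' "$toy_genes" \
    | LC_ALL=C sort -k1,1 -k2,2n -k3,3n \
    > "$toy_promoters"

cat "$toy_promoters"
```

Note that geneB (minus strand) ends up after geneD in the output even though it came first in the input, which is why the promoter file has to be sorted again before it is used with tools that require sorted input.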

Finally, we map SNP IDs to promoters with bedmap:

$ bedmap --echo --echo-map-id-uniq --delim '\t' promoters.bed hg19.snp151.bed > snps_over_promoters.bed
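The result has one line per promoter, with the overlapping SNP IDs appended as an extra column, and promoters without any SNP keep an empty column. A possible follow-up step, sketched here on made-up lines that imitate that layout, drops the empty promoters and lists one gene/SNP pair per line. It assumes six promoter columns (so the IDs land in column 7) and bedmap's default ";" separator between IDs; the rs1, rs2, rs3 IDs and file names are hypothetical.

```shell
tmpdir=$(mktemp -d)
toy_mapped="$tmpdir/toy.snps_over_promoters.bed"
toy_pairs="$tmpdir/toy.gene_snp_pairs.tsv"

# Made-up lines imitating the bedmap output; geneD has no overlapping SNP.
{
    printf 'chr1\t4000\t5000\tgeneA\t.\t+\trs1;rs2\n'
    printf 'chr1\t7000\t8000\tgeneD\t.\t+\t\n'
    printf 'chr1\t20000\t21000\tgeneB\t.\t-\trs3\n'
} > "$toy_mapped"

# Keep promoters with a non-empty ID column and emit one gene/SNP pair per line.
awk -F'\t' -v OFS='\t' '$7 != "" {
    n = split($7, ids, ";")
    for (i = 1; i <= n; i++) print $4, ids[i]
}' "$toy_mapped" > "$toy_pairs"

cat "$toy_pairs"
```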

This would just find all SNPs in promoter regions, though. To get disease-associated SNPs, you would have to draw your SNPs from the GWAS Catalog (the Catalog of Published Genome-Wide Association Studies) or from ClinVar.

written 3.7 years ago by Alexander Skates

Thanks Alexander, does ClinVar include the cancer-associated SNPs?

written 3.7 years ago by camelbbs

ClinVar includes variants for any disease or phenotype reported by the researchers who submit them. If you're looking specifically for cancer variants, COSMIC might be a better choice, since it catalogs somatic mutations.

written 3.7 years ago by Steven Lakin

Thanks Steven. I want to check the SNPs in promoter sequences, but the SNP database doesn't include strand information. So how do I know whether a SNP is on the forward or the reverse strand?

written 3.7 years ago by camelbbs