Question: Looking for a database containing the least-conserved, most highly polymorphic SNPs (or regions) of the human genome (hg38)
19 months ago by
Michigan, USA
fwuffy100 wrote:

Hi, I need a reliable method to identify the SNPs in the human genome with the most variability. Does anyone know of a data file I can download (a VCF or something else easy to parse) that contains SNPs along with a conservation score such as PhyloP or PhastCons?

Failing that, if you can recommend a way to identify the least conserved regions, I could pick SNPs from those regions.


snp human conservation • 653 views
19 months ago by
Alex Reynolds30k
Seattle, WA USA
Alex Reynolds30k wrote:
  1. Download dbSNP and convert from VCF to BED with BEDOPS vcf2bed.

    $ gunzip -c dbSNP.vcf.gz | vcf2bed > dbSNP.bed
  2. Download phyloP44 or other conservation scores and convert from WIG to BED with BEDOPS wig2bed.

    $ gunzip -c phyloP.wig.gz | wig2bed > phyloP.bed

    File names will depend on what you download from NCBI and UCSC Goldenpath.

  3. Map signal (score) to SNPs with BEDOPS bedmap:

    $ bedmap --echo --echo-map-score --skip-unmapped --delim '\t' dbSNP.bed phyloP.bed > answer.bed
  4. Read answer.bed into R with read.table() and look at the distribution of scores across all mapped variants; the minimum-scoring variants can be picked from that distribution. Alternatively, use sort -n on the score column (the last column) with head to get the minimum score, then awk to filter answer.bed for variants with that score.
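
A rough sketch of the sort/head/awk route from step 4, in case it helps. It assumes the mapped score is the last tab-separated column and that each SNP mapped to a single score. The file toy_answer.bed and its rs_toy IDs and values are made up for illustration; substitute your real answer.bed.

```shell
#!/usr/bin/env bash
# Hypothetical toy input standing in for answer.bed. The real file has more
# columns from vcf2bed, but the mapped score is still the last column.
printf 'chr1\t100\t101\trs_toy1\t0.52\n'   > toy_answer.bed
printf 'chr1\t200\t201\trs_toy2\t-3.10\n' >> toy_answer.bed
printf 'chr2\t50\t51\trs_toy3\t1.75\n'    >> toy_answer.bed
printf 'chr2\t90\t91\trs_toy4\t-3.10\n'   >> toy_answer.bed

# Pull out the last column, sort numerically in ascending order,
# and take the first line: that is the minimum score.
min_score=$(awk -F'\t' '{ print $NF }' toy_answer.bed | sort -n | head -n 1)

# Keep every variant whose score equals that minimum (numeric comparison).
awk -F'\t' -v min="$min_score" '$NF == min' toy_answer.bed > toy_min_variants.bed

echo "minimum score: $min_score"
cat toy_min_variants.bed
```

Note that sort -n does not understand scientific notation, so if your scores contain values like 1e-05 you would need a different numeric sort.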


Ok, thanks. I'll try that...

written 19 months ago by fwuffy100

There are answers on Biostars that deal with steps 1 and 2. They should be easy to find with a little searching.

modified 19 months ago • written 19 months ago by Alex Reynolds30k
18 months ago by
Hannover Medical School
colindaven2.3k wrote:

In terms of published data, A. Quinlan's group performed a nice analysis and released the data properly for others to use. Kudos.

The data are in BED and bigWig format, so hopefully they are useful. You could perhaps take the inverse (complement) of the constrained regions and restrict it to exons to find the regions you want (using bedtools or similar).
