Question: Looking for a database containing the least-conserved, most highly polymorphic SNPs (or regions) of the human genome (hg38)
fwuffy50 wrote:

Hi, I need a reliable method to identify the SNPs in the human genome with the most variability. Does anyone know of a data file I can download (a VCF or something else easy to parse) that contains SNPs along with a conservation score such as phyloP or phastCons?

Failing that, if you can recommend a way to identify the least-conserved regions, I could pick SNPs from those regions.

Thanks.

snp human conservation • 281 views
ADD COMMENTlink modified 3 months ago by colindaven1.2k • written 3 months ago by fwuffy50
Alex Reynolds28k wrote:
  1. Download dbSNP and convert from VCF to BED with BEDOPS vcf2bed.

    $ vcf2bed < <(gunzip -c dbSNP.vcf.gz) > dbSNP.bed
    
  2. Download phyloP or other per-base conservation scores built for your assembly (for hg38, UCSC provides phyloP100way, for example) and convert from WIG to BED with BEDOPS wig2bed.

    $ wig2bed < <(gunzip -c phyloP.wig.gz) > phyloP.bed
    

    File names will depend on what you download from NCBI and UCSC Goldenpath.

  3. Map signal (score) to SNPs with BEDOPS bedmap:

    $ bedmap --echo --echo-map-score --skip-unmapped --delim '\t' dbSNP.bed phyloP.bed > answer.bed
    
  4. Read answer.bed into R with read.table() and examine the distribution of scores across all mapped variants. You should be able to pick out the minimum-scoring variants from that distribution. Alternatively, use sort -n on the score column and head to get the minimum score, then awk to filter answer.bed for variants with that score.
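
The command-line route in step 4 could look roughly like the sketch below. It is only an illustration: it builds a tiny stand-in file (answer_toy.bed, with made-up rsIDs and scores) instead of using real bedmap output, and it assumes the mapped score is the last tab-delimited field and that each variant carries a single numeric score. Variants spanning several bases may get several scores from --echo-map-score, which would need handling first.

```shell
# Toy stand-in for answer.bed (hypothetical rsIDs and scores, not real data).
# Assumption: the mapped score is the last tab-delimited field on each line.
printf 'chr1\t100\t101\trs_a\t0.52\n'   > answer_toy.bed
printf 'chr1\t200\t201\trs_b\t-3.10\n' >> answer_toy.bed
printf 'chr2\t50\t51\trs_c\t1.75\n'    >> answer_toy.bed
printf 'chr2\t90\t91\trs_d\t-3.10\n'   >> answer_toy.bed

# Pull out the score column, sort ascending numerically, keep the first value.
min_score=$(awk -F'\t' '{ print $NF }' answer_toy.bed | sort -n | head -n 1)

# Keep every variant whose score equals that minimum (ties are all retained).
awk -F'\t' -v min="$min_score" '$NF == min' answer_toy.bed > min_scoring_toy.bed

cat min_scoring_toy.bed
```

Note that the sort is ascending (sort -n), so head returns the lowest score; a descending sort (sort -nr) with head would return the highest score instead.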

ADD COMMENTlink modified 3 months ago • written 3 months ago by Alex Reynolds28k

Ok, thanks. I'll try that...

ADD REPLYlink written 3 months ago by fwuffy50

There are answers on Biostars that cover steps 1 and 2. They should be easy to find with a little searching.

ADD REPLYlink modified 3 months ago • written 3 months ago by Alex Reynolds28k
colindaven1.2k wrote:

In terms of published data, A. Quinlan's group performed a nice analysis and released the data properly for others to use. Kudos.

The data are in BED and bigWig format, so hopefully they are useful. You could perhaps take the complement of the constrained regions and restrict it to exons to find the regions you want (with bedtools or similar).

https://github.com/quinlan-lab/ccrhtml

ADD COMMENTlink written 3 months ago by colindaven1.2k