Question: Looking for a database containing the least-conserved, most highly polymorphic SNPs (or regions) of the human genome (hg38)
fwuffy (Michigan, USA) wrote, 19 months ago:

Hi, I need a reliable method to identify the SNPs in the human genome with the most variability. Does anyone know of a data file I can download (a VCF or something else easy to parse) that contains SNPs along with a conservation score such as PhyloP or PhastCons?

Failing that, if you can recommend a way to identify the least conserved regions, I could pick SNPs from those regions.

Thanks.

Tags: snp, human, conservation

Alex Reynolds (Seattle, WA, USA) wrote, 19 months ago:

  1. Download dbSNP and convert from VCF to BED with BEDOPS vcf2bed.

    $ vcf2bed < <(gunzip -c dbSNP.vcf.gz) > dbSNP.bed
    
  2. Download phyloP conservation scores (phyloP44 or another track that matches your assembly) and convert from WIG to BED with BEDOPS wig2bed.

    $ wig2bed < <(gunzip -c phyloP.wig.gz) > phyloP.bed
    

    File names will depend on what you download from NCBI and UCSC Goldenpath.

  3. Map signal (score) to SNPs with BEDOPS bedmap:

    $ bedmap --echo --echo-map-score --skip-unmapped --delim '\t' dbSNP.bed phyloP.bed > answer.bed
    
  4. Read answer.bed into R with read.table() and look at the distribution of scores across all mapped variants; the minimum-scoring variants sit at the low end of that distribution. Alternatively, use sort -n on the score column with head to get the minimum score, then awk to filter answer.bed for variants with that score.
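
The sort/awk route in step 4 could look roughly like the sketch below. It assumes the score is the last tab-separated column of answer.bed and that each row carries a single number there (true when every SNP overlaps exactly one phyloP element). The function name lowest_scoring is made up for illustration.

```shell
#!/usr/bin/env bash
# Print every row of a bedmap result file whose score equals the minimum.
# Assumption: the score is the last tab-separated column and holds one number.
lowest_scoring() {
  infile="$1"
  # Numeric ascending sort, so the first line is the minimum score.
  min=$(awk -F'\t' '{ print $NF }' "$infile" | LC_ALL=C sort -n | head -n 1)
  # Keep only the rows whose last column is numerically equal to that minimum.
  awk -F'\t' -v min="$min" '($NF + 0) == (min + 0)' "$infile"
}
```

Something like `lowest_scoring answer.bed > least_conserved.bed` would then write the minimum-scoring variants to a new file. If you want a wider set than the exact minimum, replacing the equality test with a threshold comparison is a small change.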


Ok, thanks. I'll try that...

(reply written 19 months ago by fwuffy)

There are answers on Biostars that deal with steps 1 and 2. They should be easy to find with a little searching.

(reply written 19 months ago by Alex Reynolds)

colindaven (Hannover Medical School) wrote, 18 months ago:

In terms of published data, A. Quinlan's group performed a nice analysis and released the data properly for others to use. Kudos.

The data are in BED and bigWig format, so hopefully they are useful. You could perhaps take the inverse of the constrained regions and restrict it to exons to find the regions you want (using bedtools or similar).

https://github.com/quinlan-lab/ccrhtml
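
One way to sketch the "inverse of the constrained regions, restricted to exons" idea is with bedtools subtract, which yields the same result as complementing and then intersecting with exons. This is only an outline under assumptions: exons.bed and constrained.bed are hypothetical file names, and you would first have to decide which regions from the released data count as constrained and put them in constrained.bed.

```shell
#!/usr/bin/env bash
# Keep the parts of each exon that no constrained region covers.
# exons.bed and constrained.bed are hypothetical file names.
unconstrained_exonic() {
  exons="$1"
  constrained="$2"
  # bedtools subtract removes from each -a interval any portion overlapped by -b.
  bedtools subtract -a "$exons" -b "$constrained"
}
```

Running `unconstrained_exonic exons.bed constrained.bed > unconstrained_exonic.bed` would give a BED file of exonic intervals outside the constrained set, from which SNPs could then be picked.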
