Question: Looking for a database containing the least-conserved, most highly polymorphic SNPs (or regions) of the human genome (hg38)
fwuffy (Michigan, USA) wrote, 5 days ago:

Hi, I need a reliable method to identify the most variable SNPs in the human genome. Does anyone know of a data file I can download (a VCF or something else easy to parse) that contains SNPs along with a conservation score such as PhyloP or PhastCons?

Failing that, if you can recommend a way to identify the least-conserved regions, I could pick SNPs from those regions.

Thanks.

snp human conservation • 112 views
Alex Reynolds (Seattle, WA, USA) wrote, 4 days ago:
  1. Download dbSNP and convert from VCF to BED with BEDOPS vcf2bed.

    $ vcf2bed < <(gunzip -c dbSNP.vcf.gz) > dbSNP.bed
    
  2. Download phyloP conservation scores (for hg38, e.g. phyloP100way) or another conservation track, and convert from WIG to BED with BEDOPS wig2bed.

    $ wig2bed < <(gunzip -c phyloP.wig.gz) > phyloP.bed
    

     File names will depend on what you download from NCBI and UCSC goldenPath.

  3. Map signal (score) to SNPs with BEDOPS bedmap:

    $ bedmap --echo --echo-map-score --skip-unmapped --delim '\t' dbSNP.bed phyloP.bed > answer.bed
    
  4. Read answer.bed into R with read.table() and look at the distribution of scores across all mapped variants; the lowest-scoring variants in that distribution are the least conserved. Alternatively, use sort -n on the score column with head to get the minimum score, then awk to filter answer.bed for variants with that score.
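
A minimal sketch of the command-line route in step 4. It assumes answer.bed is tab-delimited with a single score in the last column, as the bedmap call in step 3 would produce for single-base variants. The toy rows and the output name least_conserved.bed are invented for illustration.

```shell
# Toy stand-in for answer.bed: tab-delimited, score in the last column.
# The IDs and scores below are made up, not real phyloP values.
printf 'chr1\t100\t101\trs_a\t0.52\n'  >  answer.bed
printf 'chr1\t200\t201\trs_b\t-3.20\n' >> answer.bed
printf 'chr2\t300\t301\trs_c\t1.75\n'  >> answer.bed
printf 'chr2\t400\t401\trs_d\t-3.20\n' >> answer.bed

# Extract the last column, sort ascending numerically, keep the first value.
min=$(awk -F'\t' '{ print $NF }' answer.bed | LC_ALL=C sort -n | head -n 1)

# Keep only the variants whose score equals that minimum.
awk -F'\t' -v min="$min" '$NF == min' answer.bed > least_conserved.bed
```

Note that sort -n sorts ascending, so head returns the minimum; with sort -nr you would need tail instead. To keep a low-scoring tail rather than only the exact minimum, the awk condition could be changed to a threshold such as `$NF <= cutoff`.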


Ok, thanks. I'll try that...

written 4 days ago by fwuffy

There are answers on Biostars that deal with steps 1 and 2. They should be easy to find with a little searching.

modified 4 days ago • written 4 days ago by Alex Reynolds
colindaven (Hannover Medical School) wrote, 1 day ago:

In terms of published data, A. Quinlan's group performed a nice analysis and released the data properly for others to use. Kudos.

The data are in BED and bigWig format, so hopefully useful. You could perhaps take the complement of the constrained regions and restrict it to exons to find the regions you want (using bedtools or similar).

https://github.com/quinlan-lab/ccrhtml
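
A rough sketch of the "complement" idea using only sort and awk, in case bedtools is not to hand. It assumes the constrained regions have already been reduced to a three-column BED file and that a two-column chromosome-sizes file is available; the names constrained.bed, sizes.txt and unconstrained.bed and the coordinates are hypothetical. Restricting the result to exons would still need a separate intersection against an exon annotation.

```shell
# Toy inputs (invented coordinates): constrained regions and chromosome sizes.
printf 'chr1\t10\t20\nchr1\t15\t30\nchr1\t50\t60\n' > constrained.bed
printf 'chr1\t100\n' > sizes.txt

# Sort by chromosome and start, then emit the gaps between intervals.
# Chromosomes with no constrained interval at all are not reported here.
LC_ALL=C sort -k1,1 -k2,2n constrained.bed |
awk -F'\t' -v OFS='\t' '
    # First file (sizes.txt): remember each chromosome length.
    NR == FNR { size[$1] = $2; next }
    # New chromosome: close out the previous one and reset the cursor.
    $1 != chrom {
        if (chrom != "" && pos < size[chrom]) print chrom, pos, size[chrom]
        chrom = $1; pos = 0
    }
    # Anything between the cursor and this interval is unconstrained.
    $2 > pos { print $1, pos, $2 }
    # Advance the cursor, allowing for overlapping intervals.
    $3 > pos { pos = $3 }
    END { if (chrom != "" && pos < size[chrom]) print chrom, pos, size[chrom] }
' sizes.txt - > unconstrained.bed
```

With the toy input this leaves three gaps on chr1 (0-10, 30-50 and 60-100) in unconstrained.bed.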
