Question: Looking for a database containing the least-conserved, most highly polymorphic SNPs (or regions) of the human genome (hg38)
0
fwuffy (Michigan, USA) wrote 2.0 years ago:

Hi, I need a reliable method to identify the most variable SNPs in the human genome. Does anyone know of a data file I can download (a VCF or something else that is easy to parse) that contains SNPs together with a conservation score such as phyloP or phastCons?

Failing that, if you can recommend a way to identify the least-conserved regions, I could pick SNPs from those regions.

Thanks.

snp human conservation • 731 views
modified 2.0 years ago by colindaven • written 2.0 years ago by fwuffy
3
Alex Reynolds (Seattle, WA USA) wrote 2.0 years ago:
  1. Download dbSNP and convert from VCF to BED with BEDOPS vcf2bed.

    $ gunzip -c dbSNP.vcf.gz | vcf2bed > dbSNP.bed
    
  2. Download phyloP scores (e.g. phyloP100way for hg38) or another conservation track, and convert from WIG to BED with BEDOPS wig2bed.

    $ gunzip -c phyloP.wig.gz | wig2bed > phyloP.bed
    

    File names will depend on what you download from NCBI and UCSC Goldenpath.

  3. Map signal (score) to SNPs with BEDOPS bedmap:

    $ bedmap --echo --echo-map-score --skip-unmapped --delim '\t' dbSNP.bed phyloP.bed > answer.bed
    
  4. Read answer.bed into R with read.table() and look at the distribution of scores across all mapped variants. The lowest-scoring variants in that distribution are the least conserved (for phyloP, negative scores indicate faster-than-neutral evolution). Alternatively, use sort -n on the score column and head to get the minimum score, then awk to filter answer.bed for variants with that score.
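
A minimal sketch of the shell route in step 4. It assumes answer.bed is tab-delimited with a single mapped score in the last column, which is how the bedmap call in step 3 writes it when each SNP overlaps one score element. The file answer.toy.bed and the function name lowest_scoring are made up for illustration; they are not part of BEDOPS.

```shell
# Toy stand-in for answer.bed (hypothetical data): chrom, start, end, id, score.
printf 'chr1\t100\t101\trs1\t0.52\n'   >  answer.toy.bed
printf 'chr1\t200\t201\trs2\t-3.10\n'  >> answer.toy.bed
printf 'chr1\t300\t301\trs3\t1.75\n'   >> answer.toy.bed
printf 'chr2\t50\t51\trs4\t-3.10\n'    >> answer.toy.bed

# lowest_scoring FILE
# Prints every line of FILE whose last-column score equals the minimum score.
lowest_scoring() {
    # Pull out the score column, sort ascending numerically, keep the smallest.
    min_score=$(awk -F'\t' '{ print $NF }' "$1" | sort -n | head -n 1)
    # Keep only the lines whose score is numerically equal to that minimum.
    awk -F'\t' -v min="$min_score" '($NF + 0) == (min + 0)' "$1"
}

lowest_scoring answer.toy.bed
```

On the real file this would be `lowest_scoring answer.bed > least_conserved.bed`. To keep a tail of the distribution rather than only the single minimum, the equality test in the second awk call could be swapped for a threshold such as `($NF + 0) < -2`, with the cutoff chosen from the score distribution.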

modified 2.0 years ago • written 2.0 years ago by Alex Reynolds

Ok, thanks. I'll try that...

written 2.0 years ago by fwuffy
1

There are answers on Biostars that cover steps 1 and 2. They should be easy to find with a little searching.

modified 2.0 years ago • written 2.0 years ago by Alex Reynolds
1
colindaven (Hannover Medical School) wrote 2.0 years ago:

In terms of published data, A. Quinlan's group performed a nice analysis of constrained regions and released the data properly for others to use. Kudos.

The data are in BED and bigWig format, so they should be easy to work with. You could take the inverse (complement) of the constrained regions and restrict it to exons to find the regions you want, using bedtools or similar.

https://github.com/quinlan-lab/ccrhtml
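
A rough sketch of the "inverse, then restrict to exons" idea with bedtools. The function name and the three input files are hypothetical: a BED file of constrained regions from the repository above, a two-column genome file (chromosome name, length), and a BED file of exons. It assumes all three use the same assembly and chromosome naming, which is worth checking before relying on the output.

```shell
# unconstrained_exonic CONSTRAINED_BED GENOME_SIZES EXONS_BED
# Prints the exonic intervals that fall outside the constrained regions.
unconstrained_exonic() {
    # bedtools complement needs position-sorted input, and the genome file
    # must list chromosomes in the same order as the sorted BED.
    LC_ALL=C sort -k1,1 -k2,2n "$1" \
        | bedtools complement -i - -g "$2" \
        | bedtools intersect -a - -b "$3"
}

# Example call (file names are placeholders):
#   unconstrained_exonic ccr.bed genome.sizes exons.bed > candidate_regions.bed
```

SNPs could then be picked from candidate_regions.bed, for example by intersecting it with a dbSNP BED file.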

written 2.0 years ago by colindaven