Question: Retrieve All Population Frequency Data For A Snp In 1000Genomes Phase_1
0
6.3 years ago by
haansi50
Austria
haansi50 wrote:

Hi all!

I just found this entry: "Retrieving All Available Frequency Data For A Snp Using Ensembl Api Tools", which is very close to what I need. Like Krisr, I would like to retrieve all population frequency data available from 1000 Genomes phase 1 for a SNP, if possible via SQL.

Ensembl's BioMart provides minor allele information for the ALL superpopulation only. Pierre Lindenbaum's solution almost gets me to the desired result, but when I run the SQL statement (on homo_sapiens_variation_69_37), I only get results from 1000Genomes:pilot_1, not from phase 1.

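-- Frequencies per population for one variant (Ensembl variation schema, MySQL)
SELECT DISTINCT V.name, S.handle, A.frequency, M.name AS population, F.allele_string
  FROM variation AS V
  JOIN allele AS A            ON A.variation_id = V.variation_id
  JOIN subsnp_handle AS S     ON S.subsnp_id = A.subsnp_id
  JOIN variation_feature AS F ON F.variation_id = V.variation_id
  -- LEFT JOIN keeps allele rows that have no matching sample entry
  LEFT JOIN sample AS M       ON M.sample_id = A.sample_id
 WHERE V.name = 'rs3'
 ORDER BY S.handle;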

Any suggestions where I could find this data? Alternatively, is there a way to get the SQL statements out of BioPerl? Bert Overduin provided a nice Perl script, but I need SQL for my workflow.

1000genomes variation bioperl snp • 2.5k views
modified 6.3 years ago by Peixe580 • written 6.3 years ago by haansi50
2
6.3 years ago by
Peixe580
Spain
Peixe580 wrote:

Maybe dbSNP-Q could be useful...

It lets you query dbSNP, 1KG, HapMap and more in one place, through simple MySQL queries or customized ones.

written 6.3 years ago by Peixe580
1

Hi Peixe! Thank you very much for this very interesting web app! I just gave it a try. Unfortunately it only has data from 1000 Genomes Pilot 1, not from 1000 Genomes Phase 1. Otherwise, a very cool and fast application!

written 6.3 years ago by haansi50
1
6.3 years ago by
United States
Zev.Kronenberg11k wrote:

This isn't an SQL solution, but it will do the trick.

Use tabix and point it at this file:

http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/integrated_call_sets/ALL.wgs.integrated_phase1_v3.20101123.snps_indels_sv.sites.vcf.gz

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       10523   .       TCCG    T       152     PASS    VT=INDEL;RSQ=0.5246;ERATE=0.0023;AN=2184;AA=.;THETA=0.0172;AC=5;AVGPOST=0.9954;LDAF=0.0045;AF=0.00;AMR_AF=0.00;AFR_AF=0.01

AF = global allele frequency

AMR_AF = allele frequency in the AMR population

etc.
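As a minimal sketch (not part of the original answer), the per-population frequencies can be pulled out of the INFO column with a few lines of Python. The function name `population_frequencies` and the "ALL" label for the global AF are illustrative choices, and the sketch assumes a single ALT allele per record, so that each frequency is one number.

```python
def population_frequencies(info):
    """Return {population: frequency} for AF and every *_AF key in a VCF INFO string.

    Assumption: one ALT allele per record, so each value is a single float.
    """
    freqs = {}
    for field in info.split(";"):
        key, sep, value = field.partition("=")
        if not sep:
            continue  # flag-style entries carry no value
        if key == "AF":
            freqs["ALL"] = float(value)  # "ALL" is a hypothetical label for the global AF
        elif key.endswith("_AF"):
            freqs[key[:-3]] = float(value)  # e.g. AMR_AF -> AMR
    return freqs


# INFO column of the example record shown above
info = ("VT=INDEL;RSQ=0.5246;ERATE=0.0023;AN=2184;AA=.;THETA=0.0172;AC=5;"
        "AVGPOST=0.9954;LDAF=0.0045;AF=0.00;AMR_AF=0.00;AFR_AF=0.01")
print(population_frequencies(info))
```

Keys such as LDAF are skipped on purpose, because only `AF` and keys ending in `_AF` are treated as population frequencies here.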

modified 6.3 years ago • written 6.3 years ago by Zev.Kronenberg11k
1

Hi Zev! Thank you very much for your answer! I found this approach in a previous question too (Getting Allele Frequencies From 1000 Genomes), but wasn't aware of the FTP file you mentioned. I gave it a try; the query time was about 5 seconds (I could live with that), but for my pipeline I need all the frequency information shown in the 1000 Genomes table here (just an example with a random SNP): http://www.ensembl.org/Homo_sapiens/Variation/Population?db=core;r=12:51419664-51420664;v=rs6580779;vdb=variation;vf=4447658 Are there other locations I could query?

written 6.3 years ago by haansi50
1

The file is only a couple of gigabytes (~2?). You could just download it and use tabix locally. The tabix index makes querying trivial and very fast.
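A rough sketch of what the local query could look like from a pipeline, assuming the tabix executable is on the PATH and that the matching .tbi index sits next to the downloaded file. The helper names `tabix_command` and `query_region` and the local file name in the usage comment are made up for illustration.

```python
import subprocess


def tabix_command(vcf, chrom, start, end):
    """Build the argument list for a tabix region query (1-based, inclusive)."""
    return ["tabix", vcf, f"{chrom}:{start}-{end}"]


def query_region(vcf, chrom, start, end):
    """Run tabix and return the matching VCF records as lists of columns."""
    result = subprocess.run(
        tabix_command(vcf, chrom, start, end),
        check=True,            # raise if tabix exits with an error
        capture_output=True,
        text=True,
    )
    return [line.split("\t") for line in result.stdout.splitlines()]


# Usage (hypothetical local file name; chromosomes in this file are named "1", not "chr1"):
# records = query_region("phase1.sites.vcf.gz", "1", 10523, 10523)
# info_column = records[0][7]
```

The eighth column of each returned record is the INFO string that carries the AF and population-specific frequency fields.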

written 6.3 years ago by Zev.Kronenberg11k