Question: Retrieving All Available Frequency Data For A SNP Using Ensembl API Tools
Krisr (United States) wrote 8.2 years ago:

Hi,

I would like to retrieve all available population frequency data for ~5000 SNPs. I have used the Perl variation API to access Ensembl data for transcript features, but I can't figure out how to retrieve frequency information for a SNP; the variation API tutorial has little documentation on this. Does anyone know which routines can provide this information, or whether any exist at all?

For example, given a SNP rsXXXXXX, I would like to retrieve all available population frequency data (HapMap populations, CEPH, etc.).

Thanks!!

Tags: ensembl, api, variation, frequency, snp
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote 8.2 years ago:

I'm not an expert on the Ensembl MySQL schema, so use my answer with caution. The following SQL query seems to return the data you want:

mysql -u anonymous -h ensembldb.ensembl.org -P 5306 -D homo_sapiens_core_61_37f -A

-- NOTE: see the comments below. The allele tables live in the variation
-- database, not in the core database named on the command line above.
SELECT DISTINCT
    V.name,
    S.handle,
    A.allele,
    A.frequency,
    M.name,
    F.allele_string
FROM allele AS A
JOIN variation AS V ON V.variation_id = A.variation_id
JOIN subsnp_handle AS S ON S.subsnp_id = A.subsnp_id
JOIN variation_feature AS F ON F.variation_id = V.variation_id
LEFT JOIN sample AS M ON M.sample_id = A.sample_id
WHERE V.name = 'rs3'
ORDER BY 1, 2, 5;

Results:

    +------+--------------+--------+-----------+---------------------------------+---------------+
    | name | handle       | allele | frequency | name                            | allele_string |
    +------+--------------+--------+-----------+---------------------------------+---------------+
    | rs3  | 1000GENOMES  | C      |      NULL | NULL                            | C/T           |
    | rs3  | 1000GENOMES  | T      |      NULL | NULL                            | C/T           |
    | rs3  | 1000GENOMES  | C      |      NULL | 1000GENOMES:CEU                 | C/T           |
    | rs3  | 1000GENOMES  | T      |      NULL | 1000GENOMES:CEU                 | C/T           |
    | rs3  | 1000GENOMES  | C      |  0.958333 | 1000GENOMES:low_coverage:CEU    | C/T           |
    | rs3  | 1000GENOMES  | T      | 0.0416667 | 1000GENOMES:low_coverage:CEU    | C/T           |
    | rs3  | 1000GENOMES  | C      |  0.958333 | 1000GENOMES:low_coverage:CHBJPT | C/T           |
    | rs3  | 1000GENOMES  | T      | 0.0416667 | 1000GENOMES:low_coverage:CHBJPT | C/T           |
    | rs3  | 1000GENOMES  | C      |  0.838983 | 1000GENOMES:low_coverage:YRI    | C/T           |
    | rs3  | 1000GENOMES  | T      |  0.161017 | 1000GENOMES:low_coverage:YRI    | C/T           |
    | rs3  | 1000GENOMES  | C      |  0.958333 | 1000GENOMES:pilot.1.CEU         | C/T           |
    | rs3  | 1000GENOMES  | T      | 0.0416667 | 1000GENOMES:pilot.1.CEU         | C/T           |
    | rs3  | 1000GENOMES  | C      |      0.84 | 1000GENOMES:pilot.1.YRI         | C/T           |
    | rs3  | 1000GENOMES  | T      |      0.16 | 1000GENOMES:pilot.1.YRI         | C/T           |
    | rs3  | 1000GENOMES  | C      |      NULL | 1000GENOMES:YRI                 | C/T           |
    | rs3  | 1000GENOMES  | T      |      NULL | 1000GENOMES:YRI                 | C/T           |
    | rs3  | BCM_SSAHASNP | C      |      NULL | NULL                            | C/T           |
    | rs3  | BCM_SSAHASNP | T      |      NULL | NULL                            | C/T           |
    | rs3  | KWOK         | C      |      NULL | NULL                            | C/T           |
    | rs3  | KWOK         | T      |      NULL | NULL                            | C/T           |
    | rs3  | KWOK         | C      |     0.967 | CSHL-HAPMAP:HapMap-CEU          | C/T           |
    | rs3  | KWOK         | T      |     0.033 | CSHL-HAPMAP:HapMap-CEU          | C/T           |
    | rs3  | KWOK         | C      |     0.956 | CSHL-HAPMAP:HapMap-HCB          | C/T           |
    | rs3  | KWOK         | T      |     0.044 | CSHL-HAPMAP:HapMap-HCB          | C/T           |
    | rs3  | KWOK         | C      |     0.978 | CSHL-HAPMAP:HapMap-JPT          | C/T           |
    | rs3  | KWOK         | T      |     0.022 | CSHL-HAPMAP:HapMap-JPT          | C/T           |
    | rs3  | KWOK         | C      |      0.89 | CSHL-HAPMAP:HapMap-YRI          | C/T           |
    | rs3  | KWOK         | T      |      0.11 | CSHL-HAPMAP:HapMap-YRI          | C/T           |
    | rs3  | KWOK         | C      |      0.95 | KWOK:B                          | C/T           |
    | rs3  | KWOK         | T      |      0.05 | KWOK:B                          | C/T           |
    | rs3  | KWOK         | C      |      0.96 | KWOK:C                          | C/T           |
    | rs3  | KWOK         | T      |      0.04 | KWOK:C                          | C/T           |
    | rs3  | KWOK         | C      |      0.89 | KWOK:H                          | C/T           |
    | rs3  | KWOK         | T      |      0.11 | KWOK:H                          | C/T           |
    | rs3  | KWOK         | C      |      0.93 | KWOK:S                          | C/T           |
    | rs3  | KWOK         | T      |      0.07 | KWOK:S                          | C/T           |
    | rs3  | PERLEGEN     | C      |  0.673913 | PERLEGEN:AFD_AFR_PANEL          | C/T           |
    | rs3  | PERLEGEN     | T      |  0.326087 | PERLEGEN:AFD_AFR_PANEL          | C/T           |
    | rs3  | PERLEGEN     | C      |  0.958333 | PERLEGEN:AFD_CHN_PANEL          | C/T           |
    | rs3  | PERLEGEN     | T      | 0.0416667 | PERLEGEN:AFD_CHN_PANEL          | C/T           |
    | rs3  | PERLEGEN     | C      |  0.979167 | PERLEGEN:AFD_EUR_PANEL          | C/T           |
    | rs3  | PERLEGEN     | T      | 0.0208333 | PERLEGEN:AFD_EUR_PANEL          | C/T           |
    | rs3  | SC_SNP       | C      |      NULL | NULL                            | C/T           |
    | rs3  | SC_SNP       | T      |      NULL | NULL                            | C/T           |
    +------+--------------+--------+-----------+---------------------------------+---------------+
    44 rows in set (0.09 sec)

ERROR 1146 (42S02): Table 'homo_sapiens_core_61_37f.allele' doesn't exist

written 8.2 years ago by Giovanni M Dall'Olio

The correct database name is homo_sapiens_variation_61_37, not homo_sapiens_core_61_37f.

written 8.2 years ago by Giovanni M Dall'Olio

Thanks, Giovanni.

written 8.2 years ago by Pierre Lindenbaum

I think the correct database is homo_sapiens_variation_61_37f, but I am not sure why the query returns values that differ from the 1000 Genomes MAFs in dbSNP.

written 8.1 years ago by Khader Shameer
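
One thing that may help with this kind of comparison: the query returns one frequency per allele, population, and submitter, so a minor allele frequency has to be derived per population before it can be set against a dbSNP value. Below is a minimal sketch in plain Perl, using a few rows copied from the rs3 results above. The helper name minor_alleles is made up for illustration and is not part of any Ensembl API.

```perl
use strict;
use warnings;

# Rows copied from the rs3 results above: [population, allele, frequency].
my @rows = (
    ['1000GENOMES:pilot.1.CEU', 'C', 0.958333],
    ['1000GENOMES:pilot.1.CEU', 'T', 0.0416667],
    ['1000GENOMES:pilot.1.YRI', 'C', 0.84],
    ['1000GENOMES:pilot.1.YRI', 'T', 0.16],
    ['CSHL-HAPMAP:HapMap-CEU',  'C', 0.967],
    ['CSHL-HAPMAP:HapMap-CEU',  'T', 0.033],
);

# Hypothetical helper: for each population, keep the allele with the
# lowest frequency. Rows without a population or a frequency (the NULL
# rows in the table) are skipped.
sub minor_alleles {
    my %minor;
    for my $row (@_) {
        my ($pop, $allele, $freq) = @$row;
        next unless defined $pop && defined $freq;
        if (!exists $minor{$pop} || $freq < $minor{$pop}[1]) {
            $minor{$pop} = [$allele, $freq];
        }
    }
    return \%minor;
}

my $minor = minor_alleles(@rows);
for my $pop (sort keys %$minor) {
    printf "%s\t%s\t%.4f\n", $pop, @{ $minor->{$pop} };
}
```

Comparing population by population (for example pilot.1.CEU against the matching CEU figure) avoids mixing numbers that come from different panels.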
Bert Overduin (Edinburgh Genomics, The University of Edinburgh) wrote 8.2 years ago:

Hi,

This script should do the trick:

#!/usr/local/bin/perl
use strict;
use warnings;

use Bio::EnsEMBL::Registry;

my $reg = 'Bio::EnsEMBL::Registry';

$reg->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous'
);

# get a variation adaptor
my $va = $reg->get_adaptor("human", "variation", "variation");

# fetch the variation by name
my $v = $va->fetch_by_name('rs123');

# get all the alleles
my @alleles = @{$v->get_all_Alleles()};

foreach my $allele(@alleles) {
    print
        $v->name, "\t",
        $allele->allele, "\t",
        (defined($allele->frequency) ? $allele->frequency : "-"), "\t",
        (defined($allele->population) ? $allele->population->name : "-"), "\n";
}

Hope this helps.

Note that questions about Ensembl and the Ensembl API can also be posted to the Ensembl helpdesk or the Ensembl developers mailing list (see here for how to subscribe to this list).
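
The script above looks up a single SNP. For a list of ~5000, one option is to read the rs IDs from a file and reuse the same calls in a loop. This is a sketch under assumptions: the input file holds one rs ID per line, read_snp_ids and print_frequencies are hypothetical helper names, and fetch_by_name is assumed to return undef for names it cannot find. The Ensembl calls are the ones already used in the script above, and the lookup only runs when a file name is given on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read rs IDs from a filehandle, one per line,
# skipping blank lines, '#' comments, and duplicates.
sub read_snp_ids {
    my ($fh) = @_;
    my (%seen, @ids);
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;
        next if $line eq '' || $line =~ /^#/;
        push @ids, $line unless $seen{$line}++;
    }
    return @ids;
}

# Hypothetical helper: print one tab-separated line per allele, using the
# same accessors as the single-SNP script above.
sub print_frequencies {
    my ($va, @ids) = @_;
    for my $id (@ids) {
        my $v = $va->fetch_by_name($id);
        unless (defined $v) {
            warn "$id not found\n";    # assumes undef means "unknown name"
            next;
        }
        for my $allele (@{ $v->get_all_Alleles() }) {
            my $pop = $allele->population;
            print join("\t",
                $v->name,
                $allele->allele,
                defined $allele->frequency ? $allele->frequency : '-',
                defined $pop ? $pop->name : '-'), "\n";
        }
    }
}

# Only contact the Ensembl server when a file of rs IDs is supplied.
if (@ARGV) {
    require Bio::EnsEMBL::Registry;
    my $reg = 'Bio::EnsEMBL::Registry';
    $reg->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous'
    );
    my $va = $reg->get_adaptor('human', 'variation', 'variation');

    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print_frequencies($va, read_snp_ids($in));
    close $in;
}
```

Reusing one adaptor for all IDs keeps the registry lookup out of the loop; for thousands of SNPs the per-SNP database round trips will likely still dominate the run time.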

Krisr (United States) wrote 8.2 years ago:

Thanks, I will try this!

I realize I'm supposed to post responses in the comments, but this wouldn't fit :)

I was trying to piece together some code from the various API Tutorials.

I have the following, but I am not sure where or how to call the population adaptor's fetch_all_HapMap_Populations method. I want to call this for the rs# to retrieve the data, but how do I link the variation object to the population object?

#!/usr/bin/perl -w
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';

$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous'
);

my $va_adaptor = $registry->get_adaptor('human', 'variation', 'variation'); #get the different adaptors for the different objects needed
my $vf_adaptor = $registry->get_adaptor('human', 'variation', 'variationfeature');
my $pa = $registry->get_adaptor("human","variation","population");

my $pop = $pa->fetch_all_HapMap_Populations();

my $variation = $va_adaptor->fetch_by_name('rs123');

if (defined $variation)
{
    foreach my $allele (@{$variation->get_all_Alleles()}) 
    {
       print $pop->$allele->fetch_all_HapMap_Populations() . "\t" . $allele->allele . "\t" . $allele->frequency . "\n";
    }
}
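
One possible way to make that link, going by the rs3 results above where HapMap populations have names like CSHL-HAPMAP:HapMap-CEU: take the population from each allele and filter on its name. This is only a sketch. hapmap_alleles is a made-up helper, and matching on the string "hapmap" is an assumption about naming rather than something the API guarantees. It relies on the allele's population accessor used in Bert's script.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only alleles whose population name mentions
# HapMap. Works on any objects that provide ->population, with the
# population providing ->name, as in the script in the answer above.
sub hapmap_alleles {
    my @alleles = @_;
    return grep {
        my $pop = $_->population;
        defined $pop && $pop->name =~ /hapmap/i;
    } @alleles;
}

# Usage with a variation object fetched as in the code above:
#   for my $allele (hapmap_alleles(@{ $variation->get_all_Alleles() })) {
#       print join("\t",
#           $allele->population->name,
#           $allele->allele,
#           defined $allele->frequency ? $allele->frequency : '-'), "\n";
#   }
```

Filtering on the allele side means the population adaptor is not needed at all for this particular report.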

I ran the MySQL query above through Perl's DBI interface, looping over all the SNPs and reporting back the frequencies. Thanks again!

written 8.2 years ago by Krisr
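
A rough sketch of what such a DBI loop might look like. Assumptions: DBI and DBD::mysql are installed, the database name follows the correction suggested in the comments (homo_sapiens_variation_61_37f), and format_row and fetch_frequencies are illustrative helper names. The SNP name is passed as a bind parameter so that one prepared statement serves every SNP.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The query from the answer above, with the SNP name as a placeholder.
my $SQL = <<'SQL';
SELECT DISTINCT V.name, S.handle, A.allele, A.frequency, M.name, F.allele_string
FROM allele AS A
JOIN variation AS V ON V.variation_id = A.variation_id
JOIN subsnp_handle AS S ON S.subsnp_id = A.subsnp_id
JOIN variation_feature AS F ON F.variation_id = V.variation_id
LEFT JOIN sample AS M ON M.sample_id = A.sample_id
WHERE V.name = ?
ORDER BY 1, 2, 5
SQL

# Hypothetical helper: join one result row with tabs, showing NULL as '-'.
sub format_row {
    return join("\t", map { defined $_ ? $_ : '-' } @_);
}

# Hypothetical helper: run the query once per SNP on an open DBI handle
# and return the formatted lines.
sub fetch_frequencies {
    my ($dbh, @snps) = @_;
    my $sth = $dbh->prepare($SQL);
    my @lines;
    for my $snp (@snps) {
        $sth->execute($snp);
        while (my @row = $sth->fetchrow_array) {
            push @lines, format_row(@row);
        }
    }
    return @lines;
}

# Only connect when rs IDs are given on the command line, e.g.
#   perl snp_freqs.pl rs3 rs123
if (@ARGV) {
    require DBI;
    # Database name is an assumption taken from the comments in this thread.
    my $dsn = 'DBI:mysql:database=homo_sapiens_variation_61_37f;'
            . 'host=ensembldb.ensembl.org;port=5306';
    my $dbh = DBI->connect($dsn, 'anonymous', '', { RaiseError => 1 });
    print "$_\n" for fetch_frequencies($dbh, @ARGV);
    $dbh->disconnect;
}
```

With ~5000 SNPs it may be kinder to the public server to batch the lookups or add a short pause between queries.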