Question: Extract number of SNPs for any given gene
3
gravatar for Matt En
2.4 years ago by
Matt En30
Matt En30 wrote:

What is the easiest way to count the number of SNPs in humans (and distinguish them by the standard NCBI filters of "missense mutations" / "nonsense mutations" / "synonymous mutations" / "intron variants")? I wish to get the data from NCBI's dbSNP database. I can see the numbers through the NCBI variation viewer. Tried doing it through NCBI Entrez Direct but haven't gotten it to work thus far. Any ideas?

snp • 775 views
ADD COMMENTlink modified 21 months ago by Andrew Su4.8k • written 2.4 years ago by Matt En30
1
gravatar for finswimmer
2.4 years ago by
finswimmer13k
Germany
finswimmer13k wrote:

Hello,

one way is to download the dbSnp vcf file an annotate it with snpEff (with -canon option). snpEff produces a summary file containing the infos you like.

As the dbSnp file is huge it might take some time to annotate ...

If you are just interested in a given gene, you can first build a subset of variants using tabix and annotate than.

fin swimmer

ADD COMMENTlink modified 2.4 years ago • written 2.4 years ago by finswimmer13k
1
gravatar for genomax
2.4 years ago by
genomax89k
United States
genomax89k wrote:

Try the following.

$ esearch -db snp -query "brca1" | esummary | xtract -pattern DocumentSummary -element SNP_ID -element FXN_CLASS -element VALIDATED 
1491587915      intron-variant  no-info
1491573730      intron-variant  no-info
1491558507      intron-variant  no-info
1491544899      frameshift-variant,nc-transcript-variant,reference,upstream-variant-2KB,utr-variant-5-prime     no-info
1491538523      intron-variant  no-info
1491356043      intron-variant  by-cluster
1491320693      intron-variant  no-info
1491293961      frameshift-variant,nc-transcript-variant,reference,upstream-variant-2KB,utr-variant-5-prime     no-info
1491277616      intron-variant  no-info

Truncated due to space constraint.

ADD COMMENTlink modified 2.4 years ago • written 2.4 years ago by genomax89k
1
gravatar for genomax
2.4 years ago by
genomax89k
United States
genomax89k wrote:

One more variation. If you only need numbers.

$ esearch -db snp -query "brca1" | esummary | xtract -pattern DocumentSummary -element SNP_ID -element FXN_CLASS -element VALIDATED | awk -F '\t' '{print $2}' | sort | uniq -c
      2 cds-indel,frameshift-variant,intron-variant,nc-transcript-variant,reference
     49 cds-indel,intron-variant,nc-transcript-variant
     22 cds-indel,nc-transcript-variant
      1 cds-indel,nc-transcript-variant,reference
      1 cds-indel,nc-transcript-variant,upstream-variant-2KB
      1 cds-indel,nc-transcript-variant,upstream-variant-2KB,utr-variant-5-prime
      1 cds-indel,nc-transcript-variant,utr-variant-3-prime
      1 cds-indel,splice-acceptor-variant
     93 downstream-variant-500B
    125 downstream-variant-500B,nc-transcript-variant,utr-variant-3-prime
      1 frameshift-variant,intron-variant,nc-transcript-variant
   1144 frameshift-variant,intron-variant,nc-transcript-variant,reference
     10 frameshift-variant,intron-variant,reference
      2 frameshift-variant,nc-transcript-variant
    459 frameshift-variant,nc-transcript-variant,reference
      1 frameshift-variant,nc-transcript-variant,reference,splice-donor-variant
     39 frameshift-variant,nc-transcript-variant,reference,upstream-variant-2KB,utr-variant-5-prime
     36 frameshift-variant,nc-transcript-variant,reference,utr-variant-3-prime
      2 frameshift-variant,nc-transcript-variant,reference,utr-variant-5-prime
  16573 intron-variant
     86 nc-transcript-variant,reference,stop-gained
      5 nc-transcript-variant,reference,stop-gained,synonymous-codon
      2 nc-transcript-variant,reference,stop-gained,upstream-variant-2KB,utr-variant-5-prime
      5 nc-transcript-variant,reference,stop-gained,utr-variant-3-prime

Truncated due to space constraint.

ADD COMMENTlink modified 2.4 years ago • written 2.4 years ago by genomax89k
1
gravatar for JJ
2.4 years ago by
JJ520
JJ520 wrote:

Have a look at the VariantAnnotation package in R - it can do this.

ADD COMMENTlink written 2.4 years ago by JJ520
0
gravatar for Andrew Su
21 months ago by
Andrew Su4.8k
San Diego, CA
Andrew Su4.8k wrote:

You can get this info using the myvariant.info API. For example, suppose you are interested in SYNE1 (ENSG00000131018), you can get all known variants associated with this gene using this API call:

http://myvariant.info/v1/query?q=cadd.gene.gene_id:ENSG00000131018

which currently says that there are 88731 such variants.

To count the number of missense you can execute this call and see that there are currently 56500:

http://myvariant.info/v1/query?q=cadd.gene.gene_id:ENSG00000131018%20AND%20cadd.consdetail:missense

similarly, there are 3893 "stop_gained" or nonsense variants:

http://myvariant.info/v1/query?q=cadd.gene.gene_id:ENSG00000131018%20AND%20cadd.consdetail:stop_gained

You can also consult this page http://docs.myvariant.info/en/latest/doc/packages.html to learn more about the myvariant libraries for python, R, and node.

ADD COMMENTlink written 21 months ago by Andrew Su4.8k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 798 users visited in the last hour