Question: Extract number of SNPs for any given gene
3
gravatar for Matt En
19 months ago by
Matt En30
Matt En30 wrote:

What is the easiest way to count the number of SNPs in humans (and distinguish them by the standard NCBI filters of "missense mutations" / "nonsense mutations" / "synonymous mutations" / "intron variants")? I wish to get the data from NCBI's dbSNP database. I can see the numbers through the NCBI variation viewer. Tried doing it through NCBI Entrez Direct but haven't gotten it to work thus far. Any ideas?

snp • 615 views
ADD COMMENTlink modified 11 months ago by Andrew Su4.8k • written 19 months ago by Matt En30
1
gravatar for finswimmer
19 months ago by
finswimmer13k
Germany
finswimmer13k wrote:

Hello,

one way is to download the dbSnp vcf file an annotate it with snpEff (with -canon option). snpEff produces a summary file containing the infos you like.

As the dbSnp file is huge it might take some time to annotate ...

If you are just interested in a given gene, you can first build a subset of variants using tabix and annotate than.

fin swimmer

ADD COMMENTlink modified 19 months ago • written 19 months ago by finswimmer13k
1
gravatar for genomax
19 months ago by
genomax75k
United States
genomax75k wrote:

Try the following.

$ esearch -db snp -query "brca1" | esummary | xtract -pattern DocumentSummary -element SNP_ID -element FXN_CLASS -element VALIDATED 
1491587915      intron-variant  no-info
1491573730      intron-variant  no-info
1491558507      intron-variant  no-info
1491544899      frameshift-variant,nc-transcript-variant,reference,upstream-variant-2KB,utr-variant-5-prime     no-info
1491538523      intron-variant  no-info
1491356043      intron-variant  by-cluster
1491320693      intron-variant  no-info
1491293961      frameshift-variant,nc-transcript-variant,reference,upstream-variant-2KB,utr-variant-5-prime     no-info
1491277616      intron-variant  no-info

Truncated due to space constraint.

ADD COMMENTlink modified 19 months ago • written 19 months ago by genomax75k
1
gravatar for genomax
19 months ago by
genomax75k
United States
genomax75k wrote:

One more variation. If you only need numbers.

$ esearch -db snp -query "brca1" | esummary | xtract -pattern DocumentSummary -element SNP_ID -element FXN_CLASS -element VALIDATED | awk -F '\t' '{print $2}' | sort | uniq -c
      2 cds-indel,frameshift-variant,intron-variant,nc-transcript-variant,reference
     49 cds-indel,intron-variant,nc-transcript-variant
     22 cds-indel,nc-transcript-variant
      1 cds-indel,nc-transcript-variant,reference
      1 cds-indel,nc-transcript-variant,upstream-variant-2KB
      1 cds-indel,nc-transcript-variant,upstream-variant-2KB,utr-variant-5-prime
      1 cds-indel,nc-transcript-variant,utr-variant-3-prime
      1 cds-indel,splice-acceptor-variant
     93 downstream-variant-500B
    125 downstream-variant-500B,nc-transcript-variant,utr-variant-3-prime
      1 frameshift-variant,intron-variant,nc-transcript-variant
   1144 frameshift-variant,intron-variant,nc-transcript-variant,reference
     10 frameshift-variant,intron-variant,reference
      2 frameshift-variant,nc-transcript-variant
    459 frameshift-variant,nc-transcript-variant,reference
      1 frameshift-variant,nc-transcript-variant,reference,splice-donor-variant
     39 frameshift-variant,nc-transcript-variant,reference,upstream-variant-2KB,utr-variant-5-prime
     36 frameshift-variant,nc-transcript-variant,reference,utr-variant-3-prime
      2 frameshift-variant,nc-transcript-variant,reference,utr-variant-5-prime
  16573 intron-variant
     86 nc-transcript-variant,reference,stop-gained
      5 nc-transcript-variant,reference,stop-gained,synonymous-codon
      2 nc-transcript-variant,reference,stop-gained,upstream-variant-2KB,utr-variant-5-prime
      5 nc-transcript-variant,reference,stop-gained,utr-variant-3-prime

Truncated due to space constraint.

ADD COMMENTlink modified 19 months ago • written 19 months ago by genomax75k
1
gravatar for JJ
19 months ago by
JJ470
JJ470 wrote:

Have a look at the VariantAnnotation package in R - it can do this.

ADD COMMENTlink written 19 months ago by JJ470
0
gravatar for Andrew Su
11 months ago by
Andrew Su4.8k
San Diego, CA
Andrew Su4.8k wrote:

You can get this info using the myvariant.info API. For example, suppose you are interested in SYNE1 (ENSG00000131018), you can get all known variants associated with this gene using this API call:

http://myvariant.info/v1/query?q=cadd.gene.gene_id:ENSG00000131018

which currently says that there are 88731 such variants.

To count the number of missense you can execute this call and see that there are currently 56500:

http://myvariant.info/v1/query?q=cadd.gene.gene_id:ENSG00000131018%20AND%20cadd.consdetail:missense

similarly, there are 3893 "stop_gained" or nonsense variants:

http://myvariant.info/v1/query?q=cadd.gene.gene_id:ENSG00000131018%20AND%20cadd.consdetail:stop_gained

You can also consult this page http://docs.myvariant.info/en/latest/doc/packages.html to learn more about the myvariant libraries for python, R, and node.

ADD COMMENTlink written 11 months ago by Andrew Su4.8k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1131 users visited in the last hour