Question: How to automate the retrieval of variant-related information from PubMed, dbSNP, ClinVar, HGMD, etc.
2.4 years ago by
United States
gsr9999120 wrote:

Dear Biostar leaders,

I have started work on a new variant re-classification project in my lab. I am wondering how others implement this, and it would be great if you could share advice on the best approaches in practice. After Sanger confirmation of WES variants, we classify variants based on the ACMG guidelines, mainly in 2 steps:

A. Manually search for variant-specific information in PubMed, Google, and variant databases (such as ClinVar, dbSNP, COSMIC, LOVD, ExAC, 1000 Genomes, HGMD, etc.), and

B. Manually Review and Re-Classify Variants based on the newly assembled variant information.

I need to automate only step A, i.e. the process of collecting and integrating the latest variant-associated information from different sources. Manual collection of variant information is laborious, and we believe that streamlining this step would improve our efficiency. It would be great to learn what the best practices are and how others implement this.

I am wondering what the best approaches are, for example:

i) Download databases such as dbSNP, ClinVar, COSMIC, etc. to a local server and then write Python and SQL scripts to pull the information specific to a mutation? (or)

ii) Re-use or write web crawlers/data-mining tools to search new publications in PubMed and Google for any variant/mutation-specific information? (or)

iii) Use the Entrez APIs (E-utilities) to programmatically access online databases such as ClinVar? (or)

iv) Something else?
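
For option iii, here is a minimal sketch of what an E-utilities query could look like, using only the Python standard library. It builds an ESearch URL for the `clinvar` database and pulls the record IDs out of the JSON response. The function names and the example search term `BRCA1[gene]` are illustrative assumptions, not something anyone in this thread has tested against a real workflow, and NCBI's usage limits would still apply to anything run in bulk.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public base address of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, db="clinvar", retmax=20):
    """Build an ESearch URL that returns matching record IDs as JSON."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def parse_esearch_ids(payload):
    """Extract the list of record IDs from an ESearch JSON response body."""
    data = json.loads(payload)
    return data.get("esearchresult", {}).get("idlist", [])


def search_ids(term, db="clinvar"):
    """Run the search over the network (hypothetical helper, not called here)."""
    with urlopen(build_esearch_url(term, db)) as response:
        return parse_esearch_ids(response.read().decode("utf-8"))
```

The same pattern should work for PubMed by passing `db="pubmed"`; the returned IDs would then be fed to a second call (ESummary or EFetch) to get the actual records.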

I am aware of some commercial companies that offer this as a product, such as Mastermind (Genomenon), VarSeq (Golden Helix), Alamut Visual (Interactive Biosoftware), etc. I guess companies like these have downloaded, integrated and curated all the data from the literature and databases. These would be a good resource, but I am not sure we could afford commercial software.

We are a small laboratory and I am the lone bioinformatician, trying to automate and integrate the variant-associated information from dbSNP, PubMed, ClinVar, etc. without commercial software. I am aware that this will be a challenging project, but I am trying to achieve it in the most sensible way possible.

Thanks, gsr

snp next-gen gene genome • 1.1k views
2.4 years ago by
Kevin Blighe63k
Kevin Blighe63k wrote:

There won't be a single solution because each database is accessed differently. You have also already more or less given your own answer in your question.

I would download local copies of databases for ClinVar, dbSNP, and COSMIC and then access information from these via Python. However, version control then becomes important because these datasets are updated regularly.
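
As a rough illustration of the local-copy approach, the sketch below loads ClinVar VCF records into SQLite and tags every row with the release it came from, so that a lookup always reports which version of the data it is based on. The table layout, the function names, and the sample record are assumptions made up for this example; `CLNSIG` and `CLNDN` are the INFO keys ClinVar uses in its VCF for clinical significance and disease name.

```python
import sqlite3


def parse_info(info):
    """Turn a VCF INFO string such as 'A=1;B=x' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def load_clinvar(conn, vcf_lines, release):
    """Insert VCF records into a 'clinvar' table, tagged with the release label."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clinvar ("
        "release TEXT, chrom TEXT, pos INTEGER, ref TEXT, alt TEXT, "
        "clnsig TEXT, clndn TEXT)"
    )
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF headers
        chrom, pos, _id, ref, alt, _qual, _filt, info = line.rstrip("\n").split("\t")[:8]
        fields = parse_info(info)
        conn.execute(
            "INSERT INTO clinvar VALUES (?, ?, ?, ?, ?, ?, ?)",
            (release, chrom, int(pos), ref, alt,
             fields.get("CLNSIG"), fields.get("CLNDN")),
        )
    conn.commit()


def lookup(conn, chrom, pos, ref, alt):
    """Return (release, clnsig, clndn) rows for one variant."""
    cursor = conn.execute(
        "SELECT release, clnsig, clndn FROM clinvar "
        "WHERE chrom = ? AND pos = ? AND ref = ? AND alt = ?",
        (chrom, pos, ref, alt),
    )
    return cursor.fetchall()


# Made-up record for illustration only; not a real ClinVar entry.
SAMPLE_VCF = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t12345\t999\tA\tG\t.\t.\tCLNSIG=Pathogenic;CLNDN=Example_disease",
]

conn = sqlite3.connect(":memory:")
load_clinvar(conn, SAMPLE_VCF, release="2024-01")
print(lookup(conn, "1", 12345, "A", "G"))
```

Loading each new release under its own label, instead of overwriting the table, would also make it possible to see when a variant's classification changed between releases.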

HGMD requires a licence for the updated version; older versions of the database don't require a licence.

LOVD can also be downloaded, apparently on a gene-by-gene basis:

For ExAC and 1000 Genomes, you can also download these as VCF files and then look up variant frequencies:
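
A frequency lookup against such a VCF could be sketched as below. It scans the records for a matching chromosome, position, REF and ALT, and reads the matching entry of the `AF` INFO field, which holds one value per ALT allele. The function name and the sample record are made up for illustration, and the code assumes `AF` is present and numeric. A linear scan like this is only sensible for small files; for the full downloads, an indexed reader (for example tabix via pysam) would be more practical.

```python
def allele_frequency(vcf_lines, chrom, pos, ref, alt):
    """Return the AF value for one variant, or None if it is not found."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF headers
        cols = line.rstrip("\n").split("\t")
        if cols[0] != chrom or int(cols[1]) != pos or cols[3] != ref:
            continue
        alts = cols[4].split(",")  # multi-allelic sites list several ALTs
        if alt not in alts:
            continue
        # INFO is 'KEY=value;KEY=value;FLAG'; flags get an empty value.
        info = dict(item.partition("=")[::2] for item in cols[7].split(";"))
        if "AF" not in info:
            return None
        # AF has one entry per ALT allele, in the same order as ALT.
        return float(info["AF"].split(",")[alts.index(alt)])
    return None


# Made-up record for illustration only; not real population data.
SAMPLE_FREQ_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG,T\t.\tPASS\tAC=3,1;AF=0.01,0.002;DB",
]

print(allele_frequency(SAMPLE_FREQ_VCF, "1", 100, "A", "T"))
```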

It's a lot of setup work but it would work when done. You could have the skeleton of the entire script finished in under 1 week.



Thanks for your ideas, Kevin.

Reply written 22 months ago by gsr9999
17 months ago by
ND Woods0
US, Ann Arbor, MI
ND Woods0 wrote:

There is a basic edition of Mastermind that is free, and when you sign up you get a free 14-day trial of the pro version. We have 15x more variants than HGMD. You can even do a few searches anonymously, a feature we added in the past 6 months or so, in case you want to check it out before filling out the form and actually signing up. I'll attach a link below. Hope this helps, good luck!

Also, we have variant interpretation cards on Amazon you might like. There's also a PDF version, I will attach both.

Amazon: PDF Link:
