Dear Biostar leaders,
I have started working on a new variant re-classification project in my lab, and I would like to learn how others implement this; any advice on the best approaches in practice would be much appreciated. After Sanger confirmation of WES variants, we classify variants according to the ACMG guidelines, mainly in two steps:
A. Manually search for variant-specific information in PubMed, Google, and variant databases (such as ClinVar, dbSNP, COSMIC, LOVD, ExAC, 1000 Genomes, HGMD, etc.), and
B. Manually review and re-classify variants based on the newly assembled variant information.
I need to automate only step A, i.e. the process of collecting and integrating the latest variant-associated information from different sources. Manual collection of variant information is laborious, and we believe that streamlining this step would improve our efficiency. It would be great to learn what the best practices are and how others implement this.
I am wondering which of the following approaches would work best:
i) Download databases such as dbSNP, ClinVar, COSMIC, etc. onto a local server and then write Python and SQL scripts to pull the information specific to a mutation? (or)
ii) Re-use or write web crawlers/data-mining tools to search new publications in PubMed and Google for any variant/mutation-specific information? (or)
iii) Use the Entrez APIs (E-utilities) to programmatically access online databases such as ClinVar? (or)
iv) Some other approach?
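As a rough illustration of option iii, a minimal Python sketch might look like the following. It only uses the E-utilities ESearch endpoint with JSON output to get record IDs for a search term; the function names are hypothetical, and the search term in the usage note is just an example of the kind of query one might send.

```python
import json
import urllib.parse
import urllib.request

# ESearch endpoint of the NCBI E-utilities.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, retmax=20):
    """Build an ESearch URL that asks for a JSON list of record IDs."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return EUTILS_ESEARCH + "?" + urllib.parse.urlencode(params)


def parse_esearch_ids(json_text):
    """Extract the list of record IDs from an ESearch JSON response."""
    data = json.loads(json_text)
    return data.get("esearchresult", {}).get("idlist", [])


def search_ids(db, term, retmax=20):
    """Send the query over the network and return the matching record IDs."""
    url = build_esearch_url(db, term, retmax)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_esearch_ids(resp.read().decode("utf-8"))
```

Calling `search_ids("clinvar", "BRCA1 c.68_69delAG")` and `search_ids("pubmed", "BRCA1 c.68_69delAG")` would then return ID lists that could be passed on to ESummary or EFetch for the actual records. NCBI limits unauthenticated clients to about three requests per second, so a real script would need throttling and probably an API key.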
I am aware of some commercial companies that offer this as a product, such as Mastermind (Genomenon), VarSeq (Golden Helix), and Alamut Visual (Interactive Biosoftware). I guess such companies have downloaded, integrated and curated all the data from the literature and databases. These would be good resources, but I am not sure we could afford commercial software.
We are a small laboratory and I am the lone bioinformatician, trying to automate and integrate variant-associated information from dbSNP, PubMed, ClinVar, etc. without commercial software. I am aware that this will be a challenging project, but I would like to do it in the most sensible way possible.
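For the local-database route (option i), the integration step could be sketched roughly as below, using SQLite from the Python standard library. The table layout, column names and sample rows are all made up for illustration; real ClinVar or dbSNP dumps have many more fields and would first need to be normalised to a common variant representation.

```python
import csv
import io
import sqlite3

# Made-up placeholder rows in a simplified, hypothetical layout.
SAMPLE_TSV = "\n".join([
    "chrom\tpos\tref\talt\tsource\tvalue",
    "1\t1000\tA\tG\tclinvar\texample_significance",
    "1\t1000\tA\tG\tdbsnp\texample_rsid",
    "2\t2000\tC\tT\tclinvar\tother_significance",
])


def load_annotations(conn, tsv_text):
    """Load tab-separated annotation rows into one indexed table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS annotation ("
        "chrom TEXT, pos INTEGER, ref TEXT, alt TEXT, source TEXT, value TEXT)"
    )
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    conn.executemany(
        "INSERT INTO annotation VALUES (:chrom, :pos, :ref, :alt, :source, :value)",
        rows,
    )
    # Index on the variant key so per-variant lookups stay fast.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_variant ON annotation (chrom, pos, ref, alt)"
    )
    conn.commit()


def lookup(conn, chrom, pos, ref, alt):
    """Return {source: value} for one variant (last row wins per source)."""
    cur = conn.execute(
        "SELECT source, value FROM annotation "
        "WHERE chrom = ? AND pos = ? AND ref = ? AND alt = ? ORDER BY source",
        (chrom, pos, ref, alt),
    )
    return dict(cur.fetchall())


conn = sqlite3.connect(":memory:")
load_annotations(conn, SAMPLE_TSV)
print(lookup(conn, "1", 1000, "A", "G"))
```

Keeping every source in one long table keyed by chrom/pos/ref/alt means a new database can be added by loading more rows rather than changing the schema, at the cost of storing all values as text.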