Question: literature mining for genes and mutations
2.3 years ago, nkausthu (20) wrote:

I have a list of genes and I need to extract all the reported mutations in those genes. Could someone suggest the best way to do this with high accuracy using open-source tools?

mining mutations pubmed genes • 739 views

I would be more specific about the goal of your task. What kind of mutations do you want to extract? All variants, or only those related to diseases? Any disease in particular? Any kind of mutation (SNPs, indels, structural variants, etc.)? It would also help to know the scale of the task (how many genes you have) and what skills you have (web, R/Bioconductor, Python). As it stands, the question is too generic to answer in a helpful way.

Reply written 2.3 years ago by Pawel Szczesny (3.2k)

I want to extract all the SNVs and indels that are known to cause different groups of skeletal disorders. I have a list of approximately 200 genes for these conditions. I am comfortable with Perl and Python, but I don't know whether writing a script will give me accurate results.

Reply written 2.3 years ago by nkausthu (20)

I doubt any script will allow you to do accurate web scraping in one step. I would suggest you scrape the web for as much as you can find in the first instance, and then devise a second automated step to curate your database as best as possible. You’ll almost certainly have to check at least some of it by hand though.
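As a rough illustration of what a second, automated curation pass could look like, here is a minimal Python sketch that pulls HGVS-style mutation mentions out of already-scraped text. The function name `extract_mentions` and the two patterns are my own assumptions, not part of any tool mentioned in this thread. They only cover simple cDNA substitutions (e.g. `c.1138G>A`) and three-letter protein substitutions (e.g. `p.Gly380Arg`), so indels and free-text descriptions would still need extra patterns or manual checking.

```python
import re

# Assumed, deliberately narrow patterns: simple cDNA substitutions and
# three-letter protein substitutions written in HGVS-like notation.
MUTATION_PATTERN = re.compile(
    r"\bc\.\d+[ACGT]>[ACGT]\b"            # e.g. c.1138G>A
    r"|\bp\.[A-Z][a-z]{2}\d+[A-Z][a-z]{2}\b"  # e.g. p.Gly380Arg
)


def extract_mentions(text):
    """Return unique mutation mentions in order of first appearance."""
    matches = MUTATION_PATTERN.findall(text)
    # dict.fromkeys removes duplicates while preserving order
    return list(dict.fromkeys(matches))


if __name__ == "__main__":
    sentence = "The FGFR3 c.1138G>A (p.Gly380Arg) variant was reported twice: c.1138G>A."
    print(extract_mentions(sentence))
```

Anything the patterns miss, or match by accident, is what would be left for the hand-checking step.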

Reply written 2.3 years ago by Joe (16k)
2.3 years ago, Denise - Open Targets (5.1k; UK, Hinxton, EMBL-EBI) wrote:

You can try BioMart with all its different access options (web, APIs, R package on Bioconductor). The variation dataset in BioMart lets you enter a list of Ensembl Gene IDs as filters (you will need to convert your genes to those IDs if they are not already in that format); then, as attributes, you can choose variant IDs and phenotype descriptions. You will get the mutations from COSMIC and HGMD, plus the SNPs and short indels from dbSNP.
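For the API route, a minimal Python sketch of building a BioMart XML query might look like the one below. Treat the specifics as assumptions to verify in the BioMart web interface: the dataset name `hsapiens_snp`, the filter name `ensembl_gene`, the attribute names, and the `martservice` URL are what I would expect for the Ensembl variation mart, and the gene IDs are just example inputs. The helper names `build_biomart_query` and `fetch_variants` are made up for this example.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumed endpoint for the Ensembl BioMart service; check before relying on it.
BIOMART_URL = "https://www.ensembl.org/biomart/martservice"


def build_biomart_query(gene_ids, attributes,
                        dataset="hsapiens_snp", gene_filter="ensembl_gene"):
    """Build a BioMart XML query string (dataset/filter names are assumptions)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # BioMart accepts a comma-separated list of values for one filter
    ET.SubElement(ds, "Filter", {"name": gene_filter, "value": ",".join(gene_ids)})
    for attr in attributes:
        ET.SubElement(ds, "Attribute", {"name": attr})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


def fetch_variants(gene_ids):
    """POST the query and return the TSV response as text (needs network access)."""
    xml_query = build_biomart_query(
        gene_ids,
        ["refsnp_id", "chr_name", "chrom_start", "allele", "phenotype_description"],
    )
    data = urllib.parse.urlencode({"query": xml_query}).encode()
    with urllib.request.urlopen(BIOMART_URL, data=data) as resp:
        return resp.read().decode()


if __name__ == "__main__":
    # Example Ensembl Gene IDs; replace with your own list.
    print(build_biomart_query(["ENSG00000068078", "ENSG00000164690"], ["refsnp_id"]))
```

With around 200 genes it is probably safer to send the IDs in smaller batches, since large variation queries can time out.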

Alternatively, you can use the Open Targets batch search tool to get the diseases (pathways, drugs) associated with your genes. From further exploration of the results, you can find the mutations (or variants) linking those genes to their associated diseases (such as the skeletal disorders you are interested in).

Since you are familiar with Python, programmatic access to Open Targets may be the best choice for you: you can get the diseases and the association scores for your genes, based on the genetic variants (or mutations) they carry. We have a Python client for easier communication with our REST API.

If this is useful but you get stuck along the way, just shout.

2.3 years ago, sacha (1.9k) wrote:
  • Get your genes' coordinates from RefSeq as a BED file
  • Intersect dbSNP with it
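The two steps above can be sketched in plain Python as follows. This is only a toy illustration with made-up helper names (`parse_bed`, `variants_in_genes`) and a naive loop; for the full dbSNP you would more likely use a dedicated tool such as `bedtools intersect`. The one detail worth getting right is the coordinate convention: BED intervals are 0-based and half-open, while VCF positions are 1-based.

```python
def parse_bed(lines):
    """Read BED lines into {chrom: [(start, end, name), ...]}."""
    intervals = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        chrom, start, end, *rest = line.rstrip("\n").split("\t")
        name = rest[0] if rest else f"{chrom}:{start}-{end}"
        intervals.setdefault(chrom, []).append((int(start), int(end), name))
    return intervals


def variants_in_genes(variants, intervals):
    """Return (variant_id, gene) pairs for variants inside a gene interval.

    variants: iterable of (chrom, pos, variant_id) with 1-based pos (VCF style).
    intervals: output of parse_bed, with 0-based half-open coordinates.
    """
    hits = []
    for chrom, pos, var_id in variants:
        for start, end, gene in intervals.get(chrom, []):
            # 1-based pos lies in 0-based [start, end) when start < pos <= end
            if start < pos <= end:
                hits.append((var_id, gene))
    return hits


if __name__ == "__main__":
    bed = ["chr4\t100\t200\tGENE_A\n"]  # hypothetical gene interval
    snps = [("chr4", 100, "rs1"), ("chr4", 101, "rs2"), ("chr4", 200, "rs3")]
    print(variants_in_genes(snps, parse_bed(bed)))
```

Note that this returns every dbSNP variant in the gene region, whether or not it is linked to a disease, so a phenotype filter would still be needed afterwards.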
2.3 years ago, nkausthu (20) wrote:

I will explore all these options. Thank you so much.
