How to text-mine PubMed abstracts to find disease-disease or disease-gene co-occurrences?
6.1 years ago
bingshanli ▴ 10

I want to count the co-occurrences of a disease pair or a disease-gene pair across all PubMed abstracts to find evidence for the links. I have many such pairs and want to do an extensive search. Is there an easy way to do that? I am thinking of downloading all PubMed abstracts and doing the search myself. Can anybody let me know the best way to download all PubMed abstracts (up to a limit, say the last 20 years)? Since I only need abstracts, which are public, I think there should be an easy way, but I couldn't find one on Google. Thanks!

--Bingshan
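For readers who do end up with a local copy of the abstracts, the counting step described above could look roughly like the sketch below. It assumes the abstracts are already loaded as plain strings and that each disease or gene is matched by a single literal name; the function name `count_cooccurrences` and the example data are hypothetical, and synonyms, abbreviations, and MeSH normalisation are not handled.

```python
import re
from collections import Counter
from itertools import combinations


def count_cooccurrences(abstracts, terms):
    """Count, for each pair of terms, the number of abstracts mentioning both.

    Hypothetical helper: matching is a case-insensitive whole-word search
    for the literal term, with no synonym or ontology handling.
    """
    # One precompiled pattern per term.
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    }
    pair_counts = Counter()
    for text in abstracts:
        # Terms found in this abstract, sorted so each pair has one fixed key.
        present = sorted(t for t, p in patterns.items() if p.search(text))
        for a, b in combinations(present, 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Made-up example abstracts, for illustration only.
abstracts = [
    "BRCA1 mutations are associated with breast cancer and ovarian cancer.",
    "We studied breast cancer risk in carriers of BRCA1 variants.",
    "Obesity is a risk factor for type 2 diabetes.",
]
terms = ["BRCA1", "breast cancer", "ovarian cancer", "type 2 diabetes", "obesity"]

counts = count_cooccurrences(abstracts, terms)
for pair, n in counts.most_common():
    print(pair, n)
```

Each abstract contributes at most one count per pair, so the result is the number of abstracts supporting a link rather than the number of mentions. Keys are tuples of the two terms in sorted order.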

Isn't this information already in OMIM, where the disease entries have a reference section?

I mainly want an assessment based on published papers to quantify the confidence in a disease pair or a disease-gene pair. I am not sure whether the OMIM references have such info.

Not good enough to be an answer, but you can do text searching using https://textrous.irp.nia.nih.gov/, e.g. search for your gene and see whether the disease pops up.

6.1 years ago
Denise CS ★ 5.2k

It seems to me you are trying to do something that has already been done. Open Targets, for example, has the co-occurrence between disease and gene (or target) through text mining, among other pieces of evidence such as genetic associations. We provide relevant research articles based on the number of times an association between a gene and a disease is found in sentences across the article from Europe PMC. This can be viewed on the web app, accessed programmatically (API documentation) and retrieved from our Data Download page. If you choose the latter, you will be looking at downloading the Evidence objects. The co-occurrences of a disease pair will be available shortly. So watch this space.

Thanks Denise, this is very useful. For the associations between genes and diseases, what criteria did you use to report the associations? Is there a paper describing the algorithms? You have drug info as well, which was a surprising find: I was not looking for it but am interested in it.

For sure I will watch out for disease pairs - many thanks for the nice website!

Glad to know it's very useful. More details on the text mining can be found in our "Literature Evidence in Open Targets – a target validation platform" paper.