Question

Forum:Bioinformatics data integration

1

10.3 years ago

balkiprasanna1984 ▴ 10

I want to post an interesting question. Is anybody with a bioinformatics background working on integrating data from various sources and building predictive analytics tools for diagnosing patients' clinical conditions based on their history? I want to integrate biomedical data, unstructured clinical data, genomic and proteomic data, and structured lab reports using ontologies, and then build a knowledgebase on top of that. I want to apply NLP and ML algorithms to large-scale datasets, using state-of-the-art big data technologies, to build a robust system. I am looking for people with a similar background and interests.

Data-integration • 2.7k views

updated 3.1 years ago by Ram 45k • written 10.3 years ago by balkiprasanna1984 ▴ 10

2

What's the question? If you want to discuss this topic, I think this would be best posted in the forum section.

updated 3.1 years ago by Ram 45k • written 10.3 years ago by Jean-Karim Heriche 27k

0

Sorry, I should have posted in the forum. I didn't notice that.

updated 3.1 years ago by Ram 45k • written 10.3 years ago by balkiprasanna1984 ▴ 10

0

Moved to Forum.

3.1 years ago by Ram 45k

Ram · Answer 1 · 2014-12-31

2

10.3 years ago

Ram 45k

My job profile closely matches part of what you describe. I work on integrating lab reports and clinical data, along with patient genetics to form a disease-specific knowledge base.

I think this approach is a great idea for diseases where the underlying gene is known and the causative mutations are hopefully finite in number. The challenge is in mining clinical reports, where each report may present similar symptoms and diagnoses/conclusions in different ways owing to record creator and patient perception biases. Another factor is the need to normalize biochemical assays across locations that might use different protocols.
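To make the assay normalization point concrete, here is a minimal sketch of one possible approach: rescaling each value to a z-score within its own site, so that values from labs using different protocols land on a comparable scale. This is an illustration only, not the pipeline described in this answer. The record fields (`patient`, `site`, `value`) and the function name `normalize_by_site` are hypothetical, and per-site z-scoring assumes the patient populations at each site are broadly similar, which may not hold in practice.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_by_site(records):
    """Return copies of the records with a per-site z-score added under "z".

    Assumption (hypothetical schema): each record is a dict with the keys
    "site" (lab or location identifier) and "value" (raw assay result).
    """
    # Group raw values by the site that produced them
    by_site = defaultdict(list)
    for rec in records:
        by_site[rec["site"]].append(rec["value"])

    # Mean and population standard deviation for each site
    stats = {site: (mean(vals), pstdev(vals)) for site, vals in by_site.items()}

    normalized = []
    for rec in records:
        mu, sigma = stats[rec["site"]]
        # A site with no spread carries no scale information; report 0.0
        z = 0.0 if sigma == 0 else (rec["value"] - mu) / sigma
        normalized.append({**rec, "z": z})
    return normalized


# Made-up values: site A and site B report the same assay on different scales
records = [
    {"patient": "p1", "site": "A", "value": 90.0},
    {"patient": "p2", "site": "A", "value": 110.0},
    {"patient": "p3", "site": "B", "value": 4.5},
    {"patient": "p4", "site": "B", "value": 5.5},
]

for rec in normalize_by_site(records):
    print(rec["patient"], rec["site"], round(rec["z"], 2))
```

If reference standards or shared control samples are available across sites, calibrating against those would likely be preferable to a purely statistical rescaling like this one.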

There are many other factors that make the data a bit difficult to work with - especially when it comes to automated analyses. Even with future-proofed historical data, I end up spending a considerable amount of time cleaning it and getting it into shape.

And of course, the most important issue - that of patient confidentiality - even with anonymization, one might end up presenting data that breaks HIPAA in some obscure way, and that is a risk not many institutions are willing to take.

OP, how do you plan on addressing these concerns? IMO, deciding algorithms to use before the problem statement has been unambiguously defined and the data looked at in-depth is not ideal - I say this from experience. We are better off looking at the data, the live nature of the knowledgebase and working on an optimized process to get to the end product.

3.1 years ago by Ram 45k

0

Translational research on data integration, identifying patterns of disease and comorbidities, is a very complex and open-ended problem. One widely used approach is to apply an ensemble of machine learning and NLP algorithms to find the relations and build a knowledgebase using bio-ontologies.

For unstructured clinical narratives, the solution would be to apply NLP techniques such as named entity recognition (NER) and information extraction (IE) to find crucial entities such as diseases, signs and symptoms, and to understand the relations between them.
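As a rough illustration of the entity-finding step, here is a minimal dictionary-lookup sketch. It is a deliberate simplification of named entity recognition: it only matches exact terms from a small hand-written lexicon, case-insensitively, and does not handle negation ("no tremor"), plurals, abbreviations or relations between entities. The lexicon contents, the labels and the function name `tag_entities` are all hypothetical; a real system would draw its terms from an ontology or use a trained model.

```python
import re

# Hypothetical toy lexicon mapping lowercase terms to entity labels
LEXICON = {
    "seizure": "SIGN_OR_SYMPTOM",
    "tremor": "SIGN_OR_SYMPTOM",
    "epilepsy": "DISEASE",
    "parkinson's disease": "DISEASE",
}


def tag_entities(text):
    """Return a sorted list of (start, end, matched_text, label) tuples."""
    hits = []
    # Try longer terms first so multi-word terms win over shorter overlaps
    for term in sorted(LEXICON, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            # Skip any match that overlaps a span we already accepted
            overlaps = any(
                match.start() < end and start < match.end()
                for start, end, _, _ in hits
            )
            if not overlaps:
                hits.append(
                    (match.start(), match.end(), match.group(0), LEXICON[term])
                )
    return sorted(hits)


# Made-up example sentence, not real patient data
note = "Patient reports tremor at rest; history of Parkinson's disease."
for start, end, surface, label in tag_entities(note):
    print(f"{start}-{end}\t{surface}\t{label}")
```

The character offsets are kept so that each hit can be traced back to its position in the source note, which helps when reviewing the output against the original text.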

For genomic and other "omics" data, comparatively identifying the mutations of a gene and tracking the basis of the disease through literature mining can be greatly helpful. This is where the knowledgebase can play a major role. IBM Watson is now trained to do that, and to understand the clinical history of patients and derive useful insights from it.

For real-time biomedical data, I am still not sure how it can be integrated, but it could be immensely helpful, as it can actively track the patient's current condition in real time for conditions such as seizures, Parkinson's disease and heart problems.

I see some interesting activity in the area of translational research in the bioinformatics community. I think building knowledgebases from various data types and deriving new insights from them is going to be an exciting topic in the next few years. I am planning to apply for a PhD and pursue research on this topic.

updated 3.1 years ago by Ram 45k • written 10.3 years ago by balkiprasanna1984 ▴ 10

0

That sounds exciting, and you seem to have a good grasp of machine learning (I don't, unfortunately). But yes, the area is really promising in terms of how it can lead to personalized healthcare. Do keep me posted if you find a lab that works in/offers a PhD with (hopefully live or real-time) clinical data involved - I'd love to collaborate with them too!

3.1 years ago by Ram 45k