I have evaluated a number of open-access electronic medical record (EMR) / electronic health record (EHR) platforms for a project. OpenMRS has a pre-populated database of patients that could be an ideal fit for your purpose.
You can download:
A sample anonymized data set, including 5,000 patients and 500,000 observations, is available to download for current OpenMRS versions and import into your existing database.
I am not aware of a database containing electronic health records that is also publicly accessible. I am curious to see if anyone else knows of one.
Depending on what kind of information you hope to use and what country you are operating in, I suspect that it may only exist in simulated form. In places like the US, medical records are protected for privacy and liability reasons (all of the following applies to the US, but Canada, EU countries, and others have similar setups):
privacyrights.org has a useful fact sheet on medical records privacy. Health records generally contain protected health information (PHI), including things like names, any kind of date or age at all, various commonly used identifiers, etc. These types of information are highly protected, and their use and distribution are subject to many regulations. The electronic versions are even more protected than those on paper. Organizations that have access to electronic health records (hospitals, insurance companies, contractors, sub-contractors, etc.) are generally concerned with compliance with:
- The Health Insurance Portability and Accountability Act (HIPAA)
- The privacy-related sections of the Patient Safety and Quality Improvement Act (PSQIA)
- The Health Information Technology for Economic and Clinical Health Act (HITECH)
Some of the largest breaches reported to HHS have involved business associates. Penalties for noncompliance increase with the level of negligence, up to an annual maximum of $1.5 million for repeated violations of the same provision.
Furthermore, an organization found to have violated HIPAA can, in certain circumstances, be required to advertise its own breach (at its own expense) in the form of TV spots, radio announcements, etc.
You can imagine that, given all of that, any organization that does have electronic medical records might be nervous about making them publicly available. It should be possible, though, if you anonymize the data and carefully remove all PHI.
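To make "remove all PHI" a bit more concrete, here is a minimal Python sketch of the idea. The field names (`name`, `ssn`, `birth_date`, and so on) and the `deidentify` helper are hypothetical and not taken from any real EHR schema, and this is nowhere near a complete or legally sufficient de-identification procedure. It only shows the general shape: drop direct identifiers outright and coarsen dates down to the year.

```python
from datetime import date

# Hypothetical field names, for illustration only; a real schema will differ.
DIRECT_IDENTIFIERS = {"name", "address", "phone", "email", "ssn",
                      "medical_record_number"}
DATE_FIELDS = {"birth_date", "admission_date"}


def deidentify(record):
    """Return a copy of `record` without direct identifiers and with
    dates reduced to the year."""
    cleaned = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS:
            continue  # drop the field entirely
        if key in DATE_FIELDS:
            # Keep only the year; drop anything that is not a real date
            # rather than risk leaking a free-text date string.
            if isinstance(value, date):
                cleaned[key.replace("_date", "_year")] = value.year
            continue
        cleaned[key] = value
    return cleaned


patient = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "birth_date": date(1980, 5, 17),
    "diagnosis": "asthma",
}
print(deidentify(patient))
```

Free-text fields such as clinical notes can also contain names and dates, so a field-level filter like this would only be a first pass.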
Here is some sample data I have been using for testing with Hadoop: https://www.kaggle.com/c/pf2012/data?trainingSet.zip. It is interesting and comes with a fairly well-defined data definition.