When building healthcare applications, we often cannot use real patient data, but we still need to test our application's functionality robustly. To do this, we can use synthetic patient data.
Synthetic patient data is a collection of patient records that look like real patient records but are completely artificial and do not refer to any real person. A synthetic data set often contains a lifelong record for each patient, from birth to the present day, including medications, allergies, diagnoses, insurance information, treatments, and more. To make this data more realistic, synthetic data generation draws on information about the top reasons patients visit primary care physicians and the leading causes of reduced life expectancy.
One example of a publicly available, high-quality source of synthetic patient data is Synthea. Synthea data sets can be downloaded in multiple formats, including several HL7 FHIR versions, C-CDA, and CSV. These data sets are downloaded as zip files and, as of June 2024, are about 1.34 GB in size and contain 1,181 JSON files, each representing an artificial but realistic patient journey.
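To get a feel for what one of these patient files contains, a small script can count the resources inside it. The sketch below assumes each JSON file is a FHIR Bundle whose entries each carry a "resource" with a "resourceType" field, which is the standard FHIR Bundle layout. The function names and the tiny sample bundle are made up for illustration and are not real Synthea output.

```python
import json
from collections import Counter
from pathlib import Path


def count_resource_types(bundle: dict) -> Counter:
    """Count the resources in a FHIR Bundle by their resourceType."""
    return Counter(
        entry["resource"]["resourceType"]
        for entry in bundle.get("entry", [])
        if "resource" in entry
    )


def summarize_file(path: str) -> Counter:
    """Load one patient JSON file from disk and count its resources."""
    with Path(path).open(encoding="utf-8") as f:
        return count_resource_types(json.load(f))


# Hand-written stand-in for a patient file (hypothetical, heavily trimmed).
sample_bundle = {
    "resourceType": "Bundle",
    "type": "transaction",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "example-1"}},
        {"resource": {"resourceType": "Encounter", "id": "enc-1"}},
        {"resource": {"resourceType": "Condition", "id": "cond-1"}},
        {"resource": {"resourceType": "Condition", "id": "cond-2"}},
    ],
}

counts = count_resource_types(sample_bundle)
for resource_type, n in sorted(counts.items()):
    print(f"{resource_type}: {n}")
```

Pointing summarize_file at one of the unzipped JSON files should give a quick overview of how many encounters, conditions, medications, and other resources make up that patient's journey.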
If you are looking for a patient data set that is more specific than a generic list of patients, Synthea can help with that too. For example, let's say we are developing an HL7 FHIR R4 application to visualize air quality data in New York City and compare it to incidents of asthma in children under 18. That's a pretty specific data set. It would take a lot of time to take an existing HL7 FHIR R4 data set, anonymize it, and then cherry-pick the patient records to include only children under 18 who live in New York City.
Let’s see how we can do it easily with Synthea.
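First, you will need to download the Synthea jar (synthea-with-dependencies.jar) and save it locally.
Then run the following command from the folder where you saved it. It requests a population of 100 patients (-p 100) aged 0 to 18 (-a 0-18) in New York City, with the US Core implementation guide enabled for the FHIR export. Note that the state and the city are each wrapped in quotes because the names contain a space:
java -jar synthea-with-dependencies.jar -a 0-18 -p 100 "New York" "New York" --exporter.fhir.use_us_core_ig true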
You now have a fully synthetic, privacy-safe FHIR R4 data set tailored to the application at hand!
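Before wiring the generated files into an application, it can be worth checking that the patients really match the cohort we asked for. The following is a minimal sketch, not part of Synthea itself. It assumes each generated file is a FHIR Bundle containing one Patient resource with a full "birthDate" (YYYY-MM-DD) and an "address" list with a "city" field. The helper names and the two sample patients are hypothetical.

```python
import json
from datetime import date
from pathlib import Path


def find_patient(bundle: dict):
    """Return the first Patient resource in a FHIR Bundle, or None."""
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") == "Patient":
            return resource
    return None


def age_in_years(birth_date: str, today: date) -> int:
    """Whole years between an ISO birth date (YYYY-MM-DD) and today."""
    born = date.fromisoformat(birth_date)
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)


def check_cohort(bundles, today: date, max_age: int = 18, city: str = "New York"):
    """Return (patient id, reason) pairs for patients outside the cohort."""
    problems = []
    for bundle in bundles:
        patient = find_patient(bundle)
        if patient is None:
            continue  # skip bundles that hold no Patient resource
        age = age_in_years(patient["birthDate"], today)
        cities = {addr.get("city") for addr in patient.get("address", [])}
        if age > max_age:
            problems.append((patient.get("id"), f"age {age}"))
        if city not in cities:
            problems.append((patient.get("id"), f"city not {city}"))
    return problems


def load_bundles(directory: str):
    """Yield every JSON file in a directory as a parsed dictionary."""
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            yield json.load(f)


def make_bundle(patient_id: str, birth_date: str, city: str) -> dict:
    """Build a tiny hand-written bundle for the demonstration below."""
    patient = {
        "resourceType": "Patient",
        "id": patient_id,
        "birthDate": birth_date,
        "address": [{"city": city, "state": "NY"}],
    }
    return {"resourceType": "Bundle", "entry": [{"resource": patient}]}


# Hypothetical sample data: one child in the cohort, one adult outside it.
sample = [
    make_bundle("child-1", "2015-03-10", "New York"),
    make_bundle("adult-1", "1990-01-01", "Albany"),
]
issues = check_cohort(sample, today=date(2024, 6, 1))
print(issues)
```

To run the same check against real output, pass load_bundles a path to the folder Synthea wrote the FHIR files to (by default this should be an output/fhir folder next to the jar, but treat that path as an assumption and confirm it on your machine), for example check_cohort(load_bundles("output/fhir"), date.today()). An empty list means every patient found is within the requested age range and city.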
With Synthea, you have the tools to easily create specific, realistic patient data for your healthcare applications, enhancing your application development efforts while maintaining data privacy and security.


