The tension between data privacy and the utility of data for AI applications is growing. It is particularly apparent in the healthcare sector, where protecting patient data is paramount. Yet AI in the sector has the potential to find new cures, improve diagnosis, prevent illness, and even improve care for elderly and at-risk populations. Now, according to multiple researchers working on AI in healthcare who were interviewed by NewtonX, a new kind of machine learning could close the gap between protecting privacy and obtaining training data.
Deep learning algorithms used for drug discovery and diagnosis require an immense amount of data to learn. The algorithms become more and more accurate the more data they receive. Giving the algorithms enough data (with enough diversity) would require numerous health organizations to partner up and share data. However, in both the US and the UK, citizens have been intensely critical of the idea of hospitals and research institutions centralizing incredibly sensitive data — and then handing it off to tech companies that don’t have a particularly good track record with data privacy.
Because of this, tech giants have been looking for foolproof ways to anonymize training data. They aim to learn from sensitive data without the risk of breaches or de-anonymization.
Enter: Federated Learning
Federated machine learning is a type of distributed ML that consists of training separate on-site algorithms and then combining them. The approach first emerged in 2017, when Google used it to train its predictive text model on the text messages typed by Android users. They did so without ever reading the messages or exporting the data from the users’ phones.
For healthcare, federated learning would consist of training algorithms on the data stored on-site at hospitals and research institutions. After the local models are trained, they go to a central server (probably owned by a tech company), where they are combined into one master algorithm. The master then returns to each hospital or research center, where it is updated with new data acquired over time and sent back to the central server. Because the algorithm would be trained at the hospital, the local patient data would never touch a tech server. The premise is that the trained models cannot be reverse engineered to recover the records, so the raw data stays inaccessible to the tech company or a malicious third party.
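The round trip described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used by any company named in this article: the article does not say how the local models are combined, so the sketch assumes a sample-weighted average of parameters (commonly called federated averaging), and it uses a tiny linear model in place of a deep network. The names local_train, federated_average, and run_rounds, and the two toy "hospital" datasets, are hypothetical.

```python
from typing import List, Tuple

Weights = Tuple[float, float]        # (slope w, intercept b) of y = w*x + b
Dataset = List[Tuple[float, float]]  # (x, y) pairs standing in for on-site records


def local_train(weights: Weights, data: Dataset,
                lr: float = 0.05, epochs: int = 50) -> Weights:
    """Runs on-site: refine the shared model using only this site's data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(updates: List[Weights], sizes: List[int]) -> Weights:
    """Runs on the central server: sees model weights only, never records.

    Assumed combining rule: average weighted by each site's sample count.
    """
    total = sum(sizes)
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / total
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / total
    return (w, b)


def run_rounds(sites: List[Dataset], rounds: int = 10) -> Weights:
    """Repeat: send master to sites, train locally, combine on the server."""
    master: Weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_train(master, data) for data in sites]
        master = federated_average(updates, [len(d) for d in sites])
    return master


# Toy data: both sites observe the same relationship, y = 2x + 1.
hospital_a: Dataset = [(x, 2.0 * x + 1.0) for x in (-1.0, 0.0, 1.0, 2.0)]
hospital_b: Dataset = [(x, 2.0 * x + 1.0) for x in (-2.0, 0.5, 1.5)]

weights = run_rounds([hospital_a, hospital_b])
print(weights)  # approaches (2.0, 1.0) without pooling the two datasets
```

Note that federated_average receives only weights and sample counts. In this sketch, that is the sense in which the patient data never touches the central server.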
This new approach is promising. IBM Research is using federated learning to advance healthcare applications. Additionally, a Google-backed startup called OWKIN is using it to predict patient survival rates and reactions to drugs and treatments. In the Netherlands, the Personal Health Train (PHT) initiative works to connect distributed health data through federated learning by bringing algorithms to hospitals, instead of bringing data to tech companies.
However, the technology is not without its drawbacks.
Why Aren’t All Hospitals and Tech Companies Using Federated Learning All the Time?
Federated learning was first used only recently, and it didn’t catch on in the industry until companies started looking for ways to protect the privacy of healthcare data.
The reason tech companies were reluctant to implement it is that federated learning requires standardizing data collection across every separate entity (hospitals and research centers). It also requires each of these entities to have the infrastructure and personnel on-site for training the machine learning algorithm. Additionally, there’s the risk that combining separate models could result in a master algorithm worse than each of its parts.
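The last risk, a combined model that performs worse than the models it was built from, is something a participating site could check before accepting a new master. The sketch below is one hypothetical way to do that, assuming each site keeps a local validation set and that models can be compared by mean squared error. The function names, the site names, and the toy numbers are illustrative only.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[float], float]     # maps an input to a prediction
Dataset = List[Tuple[float, float]]  # (x, y) pairs held at one site


def mean_squared_error(model: Model, data: Dataset) -> float:
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


def sites_where_master_is_worse(master: Model,
                                local_models: Dict[str, Model],
                                validation_sets: Dict[str, Dataset],
                                tolerance: float = 0.0) -> List[str]:
    """Return the sites whose own local model beats the combined master."""
    worse = []
    for site, data in validation_sets.items():
        master_error = mean_squared_error(master, data)
        local_error = mean_squared_error(local_models[site], data)
        if master_error > local_error + tolerance:
            worse.append(site)
    return worse


# Toy case: the two sites' data follow different trends (y = x and y = 3x),
# so averaging the two slopes gives a master that fits neither site well.
local_models: Dict[str, Model] = {
    "site_a": lambda x: 1.0 * x,
    "site_b": lambda x: 3.0 * x,
}
master: Model = lambda x: 2.0 * x  # plain average of the two slopes
validation_sets: Dict[str, Dataset] = {
    "site_a": [(1.0, 1.0), (2.0, 2.0)],
    "site_b": [(1.0, 3.0), (2.0, 6.0)],
}

flagged = sites_where_master_is_worse(master, local_models, validation_sets)
print(flagged)  # both sites are flagged in this toy case
```

This toy case also hints at why standardized data collection matters: when sites record or measure things differently, their local models can pull the combined model in incompatible directions.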
Despite these challenges, the expected payoff from AI in healthcare (lower costs, more accurate diagnoses, and predictive medicine) is so great that tech companies, hospitals, and research institutions have strong incentives to overcome them. According to the NewtonX healthcare market research survey, some hospitals and institutions have already begun to implement the necessary infrastructure, and there will be a heavy push to adopt federated learning over the next two years. By 2025 we will see massive advances and applications for AI in healthcare.