The quality of any AI application is highly dependent on the data feeding its model. The U.S. healthcare system, a prime candidate for AI applications, is still starved for comprehensive and representative real-world health data that is standardized, proactively shared, and easily accessible.
There is no lack of enthusiasm for the potential beneficial impact of AI on the practice of medicine, the health of patients, and the productivity of the healthcare sector. For example, the medical publisher NEJM Group recently announced NEJM AI, a new journal that will “identify and evaluate state-of-the-art applications of artificial intelligence to clinical medicine.”
The New England Journal of Medicine (NEJM) itself started a series of articles on “AI in Medicine,” stating that “medicine stands out as one [field] in which there is tremendous potential” for AI along with “equally substantial challenges.” Among these challenges is “a mismatch between the data set with which an AI system was developed and the data on which it is being deployed.” In other words, an AI model trained on one patient population may fail when deployed on patients who differ from those whose data it was trained on.
Unfortunately, there has been very little progress in addressing this challenge in the U.S. Data silos still flourish and a national infrastructure for open health data is non-existent. The recent annual report from the Office of the National Coordinator for Health IT (ONC) lists a number of existing barriers to realizing “the full potential of certified health IT.” These include the “insufficient progress on electronic health information sharing,” the “fragmented state/regional health information exchanges (HIEs),” and the “few incentives for health IT and data exchange adoption for certain portions of the care continuum.”
I would argue, however, that the lack of incentives for data interoperability, for making sure that the digitized records of one healthcare system are securely shared with all other healthcare systems in the U.S., applies to the entire U.S. healthcare continuum.
“All healthcare is delivered locally,” says Paul Howard, Senior Director of Public Policy at Amicus Therapeutics. To solve the serious problem that the resulting healthcare data by and large stays locally (i.e., in the healthcare system in which it was created), “we need a forcing function for standardization and we need to establish the right incentives,” says Howard.
In “Data silos are undermining drug development and failing rare disease patients,” Howard and other researchers described the data challenge as follows: “…the tendency of various stakeholders to balkanize databases in proprietary formats, driven by current economic and academic incentives, will inevitably fragment the expanding knowledge base and undermine the current and future research efforts to develop much-needed treatments.”
While this statement is made in the context of rare diseases where the lack of comprehensive and representative data is particularly acute, it captures well the challenge and significance of dismantling data silos in general, as well as the cost to the entire range of healthcare research and practice: “This system also encourages the collection of redundant data in uncoordinated parallel studies and registries to ultimately delay or deny potential treatments for ostensibly tractable diseases; it also promotes the waste of precious time, energy, and resources.”
For Howard, the solution to creating the kind of data infrastructure that successful AI solutions require, boils down to “how do we get the U.S. healthcare system reoriented around building high-quality, interoperable machine-readable data sets that can be used to develop and validate AI algorithms?” He suggests focusing on reimbursements as the best way to re-align incentives. “What gets measured gets done, and what gets done gets paid for,” says Howard. “Let’s identify high priority projects that are critically important for the public and use them as test beds for driving these tools forward and encouraging organizations to build bigger, higher-quality data sets.”
For rare disease research and development, Howard and other researchers recently proposed several initiatives for developing non-proprietary patient registries, improved data standardization, global regulatory harmonization, and new business models that encourage data sharing and research collaboration “as the default mode.”
The lack of data sharing and collaboration is the result of “technology, policy, and psychiatry,” says Dr. John Halamka, President of Mayo Clinic Platform. There have been many technology barriers to sharing data, buttressed by policies regarding patient privacy and data security. In addition to these built-in hurdles, Halamka points to human foibles, the “ability of organizations to collaborate rather than compete.”
In “Moving towards vertically integrated artificial intelligence development,” Halamka and other researchers explained that the vast quantity of clinical AI research has not resulted in “widespread translation to deployed AI solutions” because of the focus of the research on “optimising architecture and performance of an AI model on best available datasets.” This “model-centric” approach fails when tested in a healthcare setting “due to unpredictability of real-world conditions, out-of-dataset scenarios, characteristics of deployment infrastructure, and lack of added value to clinical workflows relative to cost and potential clinical risks.”
While the paper describes an improved process for developing AI models that actually work in real-world, specific healthcare environments, it may well be that a data-centric approach to developing AI solutions, derived from a national open data infrastructure, could result in successful deployments in all healthcare settings.
“We’ve made progress, but there’s still so much to do,” Halamka sums up the current state of shareable, standardized health data. And he is encouraged by the emergence of a new collaborative attitude induced by the recent pandemic: “Covid changed the landscape. Organizations that would never work together found that unless they did real-world evidence gathering and cooperation, we couldn’t get through it.”
We see the growing realization of the necessity of breaking down data silos and upgrading the development of healthcare AI in the formation of industry-wide associations and in new federal government initiatives.
Halamka is a co-founder of The Coalition for Health AI (CHAI), which is developing guidelines and guardrails to drive high-quality healthcare by promoting the adoption of credible, fair, and transparent health AI systems. On April 4, CHAI released a blueprint for trustworthy AI which, among other things, calls for an “integrated data infrastructure to support discovery, evaluation, and assurance related to health AI.”
Howard is a member of the Executive Board at the Alliance for Artificial Intelligence in Healthcare (AAIH), which brings together technology developers, pharmaceutical companies, and research organizations to establish responsible, ethical, and reasonable standards for the development and implementation of AI in healthcare. A recent AAIH white paper stated: “There is an urgent need to facilitate access to healthcare data to fully utilize the potential of AI in healthcare. The collection, organization, protection, compliance, and dissemination of data is both an issue, and an opportunity, in all fields.”
The U.S. federal government is slowly moving—but nevertheless moving—towards fulfilling the promise of health data interoperability it made in the 2016 21st Century Cures Act. For example, it has established the Trusted Exchange Framework and Common Agreement (TEFCA), a new health information exchange framework which recently onboarded leading EHR vendor Epic, creating “a single on-ramp toward universal interoperability.”
Hopefully, all of these initiatives will contribute to the creation of widely shared open data standards developed by participants in the healthcare ecosystem, in addition to or in conjunction with government-mandated data exchange practices and the development of new incentives for data sharing driven by new reimbursement requirements.