Artificial intelligence (AI) is increasingly being applied in medical diagnostics to improve on the accuracy of human observation using machine learning (ML) algorithms. AI-driven software was recently found to perform better than 58 clinical dermatologists in accurately detecting skin cancer.1 Although these findings could save thousands of lives, specific concerns have been raised about the possibility that these algorithms may misdiagnose skin lesions in minority populations, because the data used to build the algorithms come from decades of clinical trials conducted almost exclusively in fair-skinned populations.
Neurology Advisor spoke with Adewole S. Adamson, MD, MPP, a dermatologist at the University of North Carolina at Chapel Hill, who coauthored an editorial in JAMA Dermatology on strategies to compensate for racial disparities that drive AI skin cancer detection protocols.2 We also spoke with Kun-Hsing Yu, MD, PhD, an instructor in biomedical informatics at Harvard Medical School in Boston, Massachusetts, for a better understanding of the technological hurdles being encountered in AI.
Neurology Advisor: What are the capabilities of ML technology to improve clinical practice in areas such as detection of skin cancer?
Dr Adamson: AI technology has the ability to potentially alert clinicians and patients to moles that are concerning for skin cancer. If designed appropriately and used in the correct setting, AI has the potential to help in efforts at early detection.
Neurology Advisor: Inadequate healthcare data sets are problematic in nearly all ML platforms being developed today. How can the AI programs be designed to identify gaps in data sources?
Dr Adamson: The problem with AI algorithms is not an issue with the technology itself, but an issue with what type of data it is being trained on. Therefore, AI algorithms should be trained on all skin types, not just light skin.
Dr Yu: It’s very hard to develop an effective ML model when we are working with biased data. In my laboratory, we are working on a few ML applications specifically for pathology diagnosis. To train supervised ML models, we have to collect the training data and have it labeled. If the collection of the data or the labeling is biased, the ML model would just relearn the bias we have recorded in the data.
Our solution is the same as that pointed out in Dr Adamson’s editorial: to obtain or corroborate with other sources of data and information on other ethnicities, so that we have a more representative collection of data. We could then expect the trained ML model to learn the diversity of skin types from different ethnicities.
Neurology Advisor: Inclusion criteria for melanoma clinical trials in the past focused specifically on light-skinned populations. Is there any way for the algorithm to be able to identify that there is an ethnic gap in the data?
Dr Yu: Yes, that is possible. First, we can evaluate the collected data and see whether there is any obvious bias there. For instance, we know here that skin color is an important determinant of the image patterns, and we also know that people of color are underrepresented in many data sets. We can weight minority cases more heavily in the model training process to minimize imbalances and potential bias, to try to compensate for differences between ethnic populations.
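Editor's illustration: one common way to give underrepresented cases more weight during training is inverse-frequency weighting. The sketch below is a minimal example under stated assumptions and is not necessarily the approach used in Dr Yu's laboratory. The function name, the group labels, and the 8-to-2 split are hypothetical, chosen only to show the arithmetic.

```python
from collections import Counter


def inverse_frequency_weights(groups):
    """Return one weight per sample, inversely proportional to group size.

    Uses the common "balanced" heuristic:
        weight = n_samples / (n_groups * samples_in_this_group)
    so that every group contributes the same total weight.
    """
    counts = Counter(groups)
    n_samples = len(groups)
    n_groups = len(counts)
    return [n_samples / (n_groups * counts[g]) for g in groups]


# Hypothetical labels for a small, imbalanced image set (illustrative only).
skin_types = ["light"] * 8 + ["dark"] * 2
weights = inverse_frequency_weights(skin_types)

# Total weight carried by each group after reweighting.
total_by_group = {}
for group, weight in zip(skin_types, weights):
    total_by_group[group] = total_by_group.get(group, 0.0) + weight
```

In this toy set each light-skin image receives a weight of 0.625 and each dark-skin image a weight of 2.5, so both groups carry equal total weight. Weights of this kind would typically be passed to a training routine as per-sample weights. Reweighting can reduce the effect of imbalance but cannot replace images that were never collected.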
Neurology Advisor: Once identified, are there any ways to improve the algorithms in the absence of necessary data? Because studies of skin cancers in darker skin types are largely lacking, can AI-driven programs be taught to correct for misreading of imaging data that are available?
Dr Adamson: I believe AI algorithms could be fine-tuned with more clinical data. Right now, AI algorithms only take into account lesion appearance without clinical context. Dermatologists use a lot more information to make assessments about whether or not to biopsy a lesion. For example, we consider factors such as age, history of sun exposure, previous skin cancer history, family history, and immune system status in our decision making. If these types of factors could be taken into account in building AI algorithms, the accuracy of AI technology could be improved.
Neurology Advisor: Is there some way to offer guidelines to improve the data from current clinical trials to get the output you need for ML training data?
Dr Yu: Yes; there are some things we can do to work around gaps in the current clinical trial data. We can look at real-world data from clinical records and health insurance data and identify the degree of discrepancy between different populations in terms of disease prevalence or treatment response rates. Once we identify the discrepancy, we can ask more focused clinical research questions targeting skin cancers in different ethnic groups, and carry out additional clinical studies on those focused questions.
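Editor's illustration: the degree of discrepancy between populations can be summarized as a rate per 100,000 people in each group and a ratio against a reference group. The sketch below assumes case counts and population sizes have already been extracted from clinical records or insurance data. The group names and all numbers are hypothetical placeholders, not real prevalence figures.

```python
def rates_per_100k(counts):
    """Convert raw case counts into rates per 100,000 people for each group."""
    return {
        group: 100_000 * c["cases"] / c["people"]
        for group, c in counts.items()
    }


def rate_ratios(rates, reference):
    """Express each group's rate relative to the chosen reference group."""
    reference_rate = rates[reference]
    return {group: rate / reference_rate for group, rate in rates.items()}


# Hypothetical counts (illustrative only, not real data).
counts = {
    "group_a": {"cases": 50, "people": 200_000},
    "group_b": {"cases": 5, "people": 100_000},
}

rates = rates_per_100k(counts)
ratios = rate_ratios(rates, reference="group_a")
```

A ratio far from 1 flags a gap worth a focused clinical question. It does not by itself say whether the gap reflects true prevalence, underdiagnosis, or differences in access to care.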
Neurology Advisor: How can ongoing studies be modified to better capture data that will be used to drive similar ML-based protocols? And how can clinicians contribute to improving both the data and the approach to the algorithms?
Dr Adamson: Large dermoscopic archives should actively solicit images from people of all skin types. The data should also be supplemented with clinical information so that algorithms can improve and results be more accurate. Curators of these databases should make it as easy as possible for clinicians to contribute dermoscopic images. Clinicians need to understand that AI technology is not magic and that it is only as good as the data it uses to learn on.
Neurology Advisor: AI has changed the way trials need to be designed to capture better data for ML protocols. Going forward, are there suggestions from an ML standpoint that can inform clinical trials to gather better data?
Dr Yu: Moving forward, to have better quality data designed for ML training purposes, we would have to put additional thought into how we design clinical studies. We would have to think about demographic variables to have the right balance, and identify the patient population that would benefit from the AI model. We would have to think about our end goal. If we want to apply the ML models to everyone, then we have to solicit participants from all ethnic groups. Alternatively, if we plan to design separate models for people with different skin colors, we would have to define the models to target these different populations.
Neurology Advisor: What is the AI capability in the future for clinical trials? Can the ML algorithms teach themselves in real time to ask the right questions?
Dr Yu: That’s very possible: we just have to design this kind of system. For example, we can ask the machine to read medical records and their diagnostic codes such that whenever there is a mistake in the diagnosis, or whenever the treatment doesn’t work, the machine would be able to capture these signals and use them to refine the ML model. An ML algorithm can continue to learn once it is given a new piece of information, and the machine may eventually be able to update itself periodically or even in real time to improve the model for greater accuracy. We can leverage the agility of ML to point out unmet clinical needs and improve clinical practice.
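Editor's illustration: the idea of a model that keeps learning from each new piece of information can be sketched with online logistic regression, in which every corrected diagnosis triggers one small gradient update. This is a minimal sketch, not a description of any deployed system. The class name, the two-feature input, and the learning rate are hypothetical.

```python
import math


class OnlineLogisticModel:
    """Logistic regression updated one example at a time (stochastic gradient descent)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        """Return the predicted probability of the positive class for features x."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        """Adjust the parameters using one labeled example (y is 0 or 1)."""
        error = self.predict_proba(x) - y  # gradient of the log loss w.r.t. z
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


# Hypothetical feedback stream: the same feature pattern is repeatedly
# confirmed as a positive case (for example, by a later biopsy result).
model = OnlineLogisticModel(n_features=2)
pattern = [1.0, 0.0]
before = model.predict_proba(pattern)
for _ in range(20):
    model.update(pattern, 1)
after = model.predict_proba(pattern)
```

The untrained model starts at a probability of 0.5 for the pattern, and the estimate rises as confirmed cases arrive. A system used in practice would also need safeguards against learning from mislabeled or biased feedback.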
1. Haenssle HA, Fink C, Schneiderbauer R, et al. Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Ann Oncol. 2018;29:1836-1842.
2. Adamson AS, Smith A. Machine learning and health care disparities in dermatology [published online August 1, 2018]. JAMA Dermatol. doi: 10.1001/jamadermatol.2018.2348