Machine Learning Applied to Medical Diagnosis

Vivek Rastogi | May 05, 2020


The world is currently facing a health and economic crisis caused by COVID-19. People’s lifestyles have changed completely: you can no longer visit friends or family, go to the office regularly, go shopping, or take part in social activities. Specialists around the world are researching ways to treat the disease so that things can go back to normal. Thanks to advances in technology, processing power, and mathematics, together with large quantities of available data, algorithms and models have been developed to predict the probability of contracting diseases such as COVID-19.

Machine Learning and Illness

Using machine learning to predict illness is nothing new. Before the appearance of COVID-19, it was already being used to study other diseases. For example, machine learning can be used to predict whether a person has breast cancer based on ultrasound or X-ray images (such as those shown in Figure 1). This type of problem is approached as supervised classification, since what you want to know is whether cancer is present or not (a discrete binary label). In supervised learning, the algorithm is given examples that already carry the correct label and looks for a set of rules that capture the general characteristics of the elements in each group, with the objective of applying the same label to similar elements. In this way, when the computer is given a completely new image, it can predict the label (for example, whether cancer is present or not) based on ‘previously acquired experience’. As an added benefit, some models can also indicate why they classified an image the way they did, generating valuable knowledge for health experts. Such models could also show doctors the specific regions of the image where breast cancer may be found, in cases where it exists.
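To make the idea of supervised classification concrete, here is a minimal sketch in Python. It is not the method used in the breast cancer work cited above: it assumes each image has already been reduced to two numeric features (the names `lesion_size` and `border_irregularity` are hypothetical), the training values are made up, and the nearest-centroid rule is only one of the simplest possible classifiers.

```python
from math import dist

def fit_centroids(samples, labels):
    """Average the feature vectors of each label into one centroid per label."""
    sums, counts = {}, {}
    for features, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest to the new feature vector."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Made-up labeled examples: (lesion_size, border_irregularity), both hypothetical.
train_x = [(0.2, 0.1), (0.3, 0.2), (0.8, 0.9), (0.9, 0.7)]
train_y = ["no_cancer", "no_cancer", "cancer", "cancer"]

centroids = fit_centroids(train_x, train_y)   # the 'previously acquired experience'
new_label = predict(centroids, (0.85, 0.8))   # label for an unseen example
print(new_label)
```

The labeled examples play the role of the training images, and `predict` applies what was learned from them to an example the model has never seen.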

Figure 1. Conner, A., Gordon, S. and Gordon, R., 2019. Using AI To Predict Breast Cancer And Personalize Care. MIT News. Taken from:

Another well-documented healthcare application of machine learning is the prediction of Alzheimer’s disease. In this case, using a set of audio recordings, the machine learning model looks for patterns in the speech of patients with the disease. The analysis is based on the pauses between words, pronunciation, and the frequency and amplitude of sounds. In this way, the model helps experts in geriatric medicine identify early symptoms of Alzheimer’s in the way a person talks or expresses themselves.
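As an illustration of one of these speech features, the sketch below computes simple statistics about the pauses between words. It assumes the recording has already been converted into word-level timestamps (start and end time in seconds), which is not described in the studies mentioned here; the function name `pause_features` and the 0.5-second threshold for a ‘long’ pause are hypothetical choices, not clinical values.

```python
from statistics import mean

def pause_features(word_times, long_pause=0.5):
    """Summarize the silences between consecutive words.

    word_times: list of (start, end) times in seconds, one pair per word, in order.
    long_pause: assumed threshold in seconds (illustrative, not a clinical value).
    """
    # A pause is the gap between the end of one word and the start of the next.
    pauses = [next_start - prev_end
              for (_, prev_end), (next_start, _) in zip(word_times, word_times[1:])]
    return {
        "mean_pause": mean(pauses) if pauses else 0.0,
        "max_pause": max(pauses, default=0.0),
        "long_pause_count": sum(p >= long_pause for p in pauses),
    }

# Made-up timestamps for four spoken words.
words = [(0.0, 0.4), (0.5, 0.9), (1.9, 2.3), (2.4, 2.8)]
features = pause_features(words)
print(features)
```

Numbers like these could then be fed to a classifier alongside other features such as pronunciation or the frequency and amplitude of sounds.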

Current Machine Learning Applications for COVID-19

At the moment, there is not enough reliable, good-quality data to make predictions related to COVID-19. Even if there were, it would be considered private, since it relates to personal information about an individual’s health. It is important to note that machine learning needs vast amounts of data to work with; if this requirement is not met, not even the best algorithm in the world will be able to come up with reliable predictions. When it comes to sensitive matters such as human health, this is a risk that cannot be taken.

Kaggle, a platform that hosts datasets for machine learning experimentation, has released labeled data for predicting the probability of contracting COVID-19. Table 1 shows part of one such dataset, with its predictive (descriptive) columns. It includes details such as age, gender, symptom onset date, symptom confirmation date, and travel history (places and dates). Experts consider these details potential predictors of whether a person has been infected with COVID-19.
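Before a model can use columns like these, they usually have to be turned into numbers. The sketch below shows one possible way to do that for a single row. The field names (`age`, `gender`, `symptom_onset`, `confirmation`, `travel_history`) and the record itself are made up for illustration and are not the actual column names of the Kaggle dataset.

```python
from datetime import date

def to_features(record):
    """Convert one raw record into numeric predictors a model could consume."""
    onset = date.fromisoformat(record["symptom_onset"])
    confirmed = date.fromisoformat(record["confirmation"])
    return {
        "age": record["age"],
        "is_female": 1 if record["gender"] == "female" else 0,
        # Days elapsed between first symptoms and confirmation.
        "days_to_confirmation": (confirmed - onset).days,
        # 1 if the person reported any trip, 0 otherwise.
        "has_travel_history": 1 if record["travel_history"] else 0,
    }

# Hypothetical record, not taken from any real dataset.
record = {
    "age": 44,
    "gender": "female",
    "symptom_onset": "2020-01-20",
    "confirmation": "2020-01-25",
    "travel_history": ["Wuhan"],
}
row = to_features(record)
print(row)
```

Deriving a value such as `days_to_confirmation` from two date columns is a common design choice, because most algorithms work with numeric inputs rather than raw dates.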

Table 1. Bhopalwala, B., 2020. COVID-19 & Machine Learning. [jpeg] Medium. Taken from:

In China, citizens are monitored by the government through the WeChat application. During the first weeks after the appearance of COVID-19 in Wuhan, the country started to gather information similar to that presented in the table above for each individual, and an application called Health Code was improved. Health Code uses a machine learning model to predict the probability of contracting COVID-19. If a person with a high probability of being infected with, or of being a carrier of, COVID-19 leaves their house, the app tells them they must remain in isolation. If the person does not heed the warning, the authorities are immediately notified and the person is fined for failing to comply with a health order.


There are at least three myths when it comes to applying machine learning to help solve health problems:

  • Machine learning can replace a doctor: although machine learning helps to predict the probability of contracting a disease, this does not replace all the work a specialist does. For example, machine learning can help identify if someone has cancer or not but the care and treatment a person receives are determined by the doctor.
  • The combination of ‘big data’ and an excellent data scientist will always succeed in predicting whether a person will contract a disease: these two factors are very important, but in many cases the data is of poor quality or contains no discoverable patterns. Similarly, when a model is trained on a set of data, it will find patterns within that set. However, those patterns may not apply to new data.
  • Doctors use recently discovered models: in many cases, updating machine learning models is not part of a doctor’s line of work. Finding time for a medical specialist to help verify data and results can be difficult. It is also risky to replace a verified, trained model with one that might not work. At the same time, continuing to use a trained model that no longer makes reliable predictions can be dangerous, especially when human health is involved.
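The second myth, that patterns found in the training data may not apply to new data, can be illustrated with a small sketch. It assumes a deliberately naive ‘model’ that simply memorizes its training examples; the records, labels, and function names are all made up. Comparing accuracy on the training data with accuracy on held-out data exposes the problem.

```python
def accuracy(model, rows):
    """Fraction of rows for which the model predicts the recorded label."""
    return sum(model(x) == y for x, y in rows) / len(rows)

def memorizer(train_rows, default=0):
    """A naive 'model': look up the exact training example, else guess `default`."""
    table = dict(train_rows)
    return lambda x: table.get(x, default)

# Made-up (age, main_symptom) -> infected label; purely illustrative.
train = [((34, "cough"), 1), ((51, "fever"), 1), ((27, "none"), 0), ((45, "none"), 0)]
held_out = [((38, "cough"), 1), ((60, "fever"), 1), ((29, "none"), 0), ((33, "fever"), 1)]

model = memorizer(train)
train_acc = accuracy(model, train)        # perfect on data it has already seen
held_out_acc = accuracy(model, held_out)  # much worse on unseen people
print(train_acc, held_out_acc)
```

A large gap between the two numbers is a sign that the model has latched onto patterns specific to its training set, which is why results should be checked on data the model has never seen before it is trusted in a clinical setting.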


Applying machine learning to predict the risk of contracting illnesses is not new, and its use to diagnose, prevent, and treat diseases does not mean health specialists will disappear. Machine learning is a powerful tool that helps doctors save lives. Its use to help predict COVID-19 infection is currently being developed, although there is still not enough data available, and the quality of what exists cannot be verified. In fact, BBC World (2020) reports that the pandemic is advancing so fast that collecting data is difficult. Even so, it may become possible to make accurate predictions of the probability of being infected, the probability of complications once infected, and the probability of recovering from COVID-19.
