Review 1: "COVID-19 Classification of X-ray Images Using Deep Neural Networks"

The study employs deep learning techniques to classify chest X-ray images with or without COVID-19. While the techniques were generally accurate, reviewers expressed concern over missing elements that would strengthen the conclusions suggested by the study.

Published on Oct 22, 2020
This Pub is a Review of
COVID-19 Classification of X-ray Images Using Deep Neural Networks

In the midst of the coronavirus disease 2019 (COVID-19) outbreak, chest X-ray (CXR) imaging is playing an important role in the diagnosis and monitoring of patients with COVID-19. Machine learning solutions have been shown to be useful for X-ray analysis and classification in a range of medical contexts. The purpose of this study is to create and evaluate a machine learning model for diagnosis of COVID-19, and to provide a tool for searching for similar patients according to their X-ray scans. In this retrospective study, a classifier was built using a pre-trained deep learning model (ResNet50) and enhanced by data augmentation and lung segmentation to detect COVID-19 in frontal CXR images collected between January 2018 and July 2020 in four hospitals in Israel. A nearest-neighbors algorithm was implemented based on the network results that identifies the images most similar to a given image. The model was evaluated using accuracy, sensitivity, area under the curve (AUC) of the receiver operating characteristic (ROC) curve and of the precision-recall (P-R) curve. The dataset sourced for this study includes 2362 CXRs, balanced for positive and negative COVID-19, from 1384 patients (63 +/- 18 years, 552 men). Our model achieved 89.7% (314/350) accuracy and 87.1% (156/179) sensitivity in classification of COVID-19 on a test dataset comprising 15% (350 of 2326) of the original data, with AUC of ROC 0.95 and AUC of the P-R curve 0.94. For each image we retrieve images with the most similar DNN-based image embeddings; these can be used to compare with previous cases.
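
The accuracy and sensitivity quoted in the abstract can be reproduced from the reported counts. The short Python sketch below does that arithmetic. The true-negative and false-positive counts and the specificity are derived here under the assumption that both fractions refer to the same binary test set; they are not figures reported in the abstract.

```python
# Counts quoted in the abstract.
correct, total = 314, 350         # accuracy: correctly classified / test-set size
true_pos, positives = 156, 179    # sensitivity: detected / COVID-19-positive test images

# Derived values (an assumption of this sketch, not reported in the abstract).
negatives = total - positives     # negative test images
true_neg = correct - true_pos     # correctly classified negatives
false_pos = negatives - true_neg  # negatives flagged as COVID-19
false_neg = positives - true_pos  # missed COVID-19 cases

accuracy = correct / total
sensitivity = true_pos / positives
specificity = true_neg / negatives

print(f"accuracy    = {accuracy:.4f}")
print(f"sensitivity = {sensitivity:.4f}")
print(f"specificity = {specificity:.4f}")
```

Rounded to one decimal place, 156/179 gives 87.2%, so the abstract's 87.1% appears to be truncated rather than rounded.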

RR:C19 Evidence Scale rating by reviewer:

  • Reliable. The main study claims are generally justified by its methods and data. The results and conclusions are likely to be similar to the hypothetical ideal study. There are some minor caveats or limitations, but they would/do not change the major claims of the study. The study provides sufficient strength of evidence on its own that its main claims should be considered actionable, with some room for future revision.


Review Summary: 

The study employs five deep learning models, individually and combined through a late fusion technique, to classify CXR images as COVID-19 or non-COVID-19. The images in the tested dataset were acquired with the same X-ray machines and the results are reliable, but they could be investigated and discussed in more depth.


The manuscript by Goldstein et al. presents an experimental study of five deep learning classifiers (ResNet34, ResNet50, ResNet152, CheXpert and VGG16) for identifying COVID-19 in X-ray images. The authors also test an ensemble (also known as late fusion) of these models that combines their predictions by majority voting. Before training and testing, they apply a preprocessing phase composed of data augmentation and lung segmentation. Furthermore, their approach adds a nearest-neighbor mechanism at the end of the classification process that returns four similar images to the user, giving physicians references to previous patients whose lung findings resemble those of the analyzed image.
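
The two mechanisms described above, majority voting and nearest-neighbor retrieval, can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the votes, the image ids, the two-dimensional embeddings and the choice of Euclidean distance are all hypothetical.

```python
import math
from collections import Counter


def majority_vote(votes):
    """Return the most common label among the models' votes for one image."""
    return Counter(votes).most_common(1)[0][0]


def nearest_images(query, gallery, k=4):
    """Return the ids of the k gallery embeddings closest to the query.

    Euclidean distance is an assumption of this sketch.
    """
    ranked = sorted(gallery, key=lambda image_id: math.dist(query, gallery[image_id]))
    return ranked[:k]


# Hypothetical votes of five models for three images (1 = COVID-19, 0 = non-COVID-19).
votes_per_image = [
    [1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 0, 1, 1, 1],
]
ensemble = [majority_vote(votes) for votes in votes_per_image]
print(ensemble)  # [1, 0, 1]

# Hypothetical 2-D embeddings; a real network would produce far longer vectors.
gallery = {
    "cxr_001": (0.0, 0.0),
    "cxr_002": (1.0, 0.0),
    "cxr_003": (5.0, 5.0),
    "cxr_004": (0.5, 0.0),
    "cxr_005": (2.0, 2.0),
    "cxr_006": (9.0, 9.0),
}
similar = nearest_images((0.1, 0.0), gallery)
print(similar)  # ['cxr_001', 'cxr_004', 'cxr_002', 'cxr_005']
```

With an odd number of models and binary labels, the vote can never tie, which is one reason a five-model ensemble is convenient.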

The proposed approach is valid and the results can be considered reliable. However, the paper offers limited novelty, since studies of this kind have already been published (Apostolopoulos and Mpesiana, 2020; Altan and Karasu, 2020; Brunese et al., 2020; Civit-Masot et al., 2020; Makris et al., 2020; among others). Moreover, the authors did not conduct a thorough literature review and do not discuss sufficient related work.

The dataset used in the experiments is the strong point of the paper. The 2427 CXR images were collected from 1384 patients at four hospitals in Israel and were taken with the same portable X-ray machines. Since the CXR images come from similar machines, the effect of dataset bias on the classification results (a main drawback in this type of study, as shown in Maguolo and Nanni (2020)) is minimized. Nevertheless, the authors could make the dataset freely available for download, since it would be very interesting and helpful to other machine learning researchers.

The best results were achieved with the ensemble schema, which is plausible, since the predictions of the individual models may complement each other. However, the authors could run more experiments on the possible combinations of deep learning models in the ensemble. As they have only tested the combination of all the classifiers at once, it is not clear whether combining only a few of them would improve the results (perhaps even more than using all of them).
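
The kind of experiment suggested here can be sketched as an exhaustive search over model subsets. The Python example below is an illustration under stated assumptions: the model names and predictions are hypothetical toy data, built so that two models share errors with the others, and only odd-sized subsets are scored so that a binary majority vote cannot tie.

```python
from collections import Counter
from itertools import combinations


def majority_vote(votes):
    """Return the most common label among the votes for one image."""
    return Counter(votes).most_common(1)[0][0]


def accuracy(predicted, truth):
    """Fraction of predictions that match the ground truth."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


def rank_ensembles(model_preds, truth):
    """Score every odd-sized subset of at least three models, best first."""
    names = sorted(model_preds)
    results = []
    for size in range(3, len(names) + 1, 2):
        for subset in combinations(names, size):
            per_image_votes = zip(*(model_preds[name] for name in subset))
            fused = [majority_vote(votes) for votes in per_image_votes]
            results.append((accuracy(fused, truth), subset))
    return sorted(results, reverse=True)


# Hypothetical ground truth and per-model predictions for eight test images.
truth = [1, 0, 1, 0, 1, 0, 1, 0]
model_preds = {
    "model_a": [1, 0, 1, 0, 1, 0, 0, 0],
    "model_b": [1, 0, 1, 0, 0, 0, 1, 0],
    "model_c": [1, 0, 0, 0, 1, 0, 1, 0],
    "model_d": [1, 0, 0, 0, 0, 0, 1, 0],
    "model_e": [1, 0, 0, 0, 1, 0, 0, 0],
}

ranked = rank_ensembles(model_preds, truth)
best = ranked[0]
full = next(score for score, subset in ranked if len(subset) == len(model_preds))
print(best)  # (1.0, ('model_a', 'model_b', 'model_c'))
print(full)  # 0.875
```

In this toy data the three-model subset outperforms the full ensemble because the two remaining models repeat an error already made by another member. Whether anything similar holds for the networks in the paper can only be established on the authors' data, ideally on a validation split separate from the test set.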

The results table is confusing, since the results in bold are not the best results, contrary to what the authors claim. The authors have also named the evaluation metrics section “statistical analysis,” which is conceptually misleading.

The data augmentation and lung segmentation techniques could be investigated in more depth. For instance, the authors could report the classification results for all the deep learning models without these preprocessing techniques, which would show how they affect each model’s learning process (the authors provide results without any preprocessing only for ResNet50). The use of Explainable AI (XAI) techniques in the lung segmentation process, such as in Teixeira et al. (2020), could help confirm that the segmentation technique in fact contributes to the identification of pneumonia spots in the lungs.

In general, the authors did a good job with the qualitative analysis of the model. In this analysis, they illustrate the model’s confidence by computing a histogram of the classification scores (probabilities) and a t-distributed Stochastic Neighbor Embedding (t-SNE) visualization. This analysis makes their experimental results more reliable.

Even though the discussion is fair and the insights are valid, the paper lacks a deeper discussion supported by non-parametric statistical analysis of the results. In the current version of the manuscript, the discussion section serves as the conclusion, since the paper has no conclusion section.
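
One possible form of such an analysis is an exact McNemar test, a non-parametric test for comparing two classifiers evaluated on the same test images. The review does not name a specific test, so this choice, the function names and the counts in the Python sketch below are assumptions made for illustration.

```python
from math import comb


def discordant_counts(pred_a, pred_b, truth):
    """Count images that only model A, or only model B, classifies correctly."""
    only_a = sum(a == t and b != t for a, b, t in zip(pred_a, pred_b, truth))
    only_b = sum(b == t and a != t for a, b, t in zip(pred_a, pred_b, truth))
    return only_a, only_b


def mcnemar_exact(only_a, only_b):
    """Two-sided exact McNemar p-value from the discordant-pair counts.

    Under the null hypothesis each discordant image is equally likely to
    favour either model, so the smaller count follows Binomial(n, 0.5).
    """
    n = only_a + only_b
    if n == 0:
        return 1.0
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical example: 15 images favour model A and 5 favour model B.
p_value = mcnemar_exact(15, 5)
print(f"p = {p_value:.4f}")  # p = 0.0414

# Hypothetical predictions showing how the discordant counts are obtained.
toy_counts = discordant_counts([1, 0, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0])
print(toy_counts)  # (2, 1)
```

Images that both models classify correctly, or both misclassify, do not enter the test; only the disagreements carry information about which model is better.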


Minor Revise


1. Apostolopoulos, I.D. and Mpesiana, T.A., 2020. Covid-19: automatic detection from x-ray images utilizing transfer learning with convolutional neural networks. Physical and Engineering Sciences in Medicine, p.1.

2. Altan, A. and Karasu, S., 2020. Recognition of COVID-19 disease from X-ray images by hybrid model consisting of 2D curvelet transform, chaotic salp swarm algorithm and deep learning technique. Chaos, Solitons & Fractals, 140, p.110071.

3. Brunese, L., Mercaldo, F., Reginelli, A. and Santone, A., 2020. Explainable deep learning for pulmonary disease and coronavirus COVID-19 detection from X-rays. Computer Methods and Programs in Biomedicine, 196, p.105608.

4. Civit-Masot, J., Luna-Perejón, F., Domínguez Morales, M. and Civit, A., 2020. Deep Learning system for COVID-19 diagnosis aid using X-ray pulmonary images. Applied Sciences, 10(13), p.4640.
