How Artificial Intelligence Can Now Detect Bone Fractures From Osteoporosis

Jerry Wei
Health Data Science
4 min read · Jun 26, 2019

I read a research paper in which the authors used deep neural networks to automatically detect osteoporotic vertebral fractures on CT scans (link). Let’s examine what they did.

Photo by Ken Treloar on Unsplash

Osteoporotic Vertebral Fractures. Osteoporotic vertebral fractures (OVFs) are fractures of the spine caused by osteoporosis. You keep saying “osteoporosis”, but what is that? Osteoporosis is a disease in which bones lose density and quality, becoming fragile and porous. The main problem is that it develops over a long period of time, often with no symptoms until the first fracture happens. Not only is osteoporosis difficult to detect, it is also extremely prevalent: there are more than 3 million cases in the United States each year, and women older than 50 are the most likely to suffer the resulting spine fractures. In fact, about 40% of women will have had a spine fracture caused by osteoporosis by age 80. So how do we currently tell if someone has an OVF? The current standard is imaging: CT scans and X-rays, which medical professionals review manually. Knowing that, let’s talk about the innovation…

CT scans identified as positive for OVF by the deep learning model. The heatmaps show regions that were strongly associated with the model’s prediction. Image credits to Tomita et al., the original authors of the paper.

Data Collection. The authors used more than one thousand CT scans from the Dartmouth-Hitchcock Medical Center, filtered so that each exam covered the chest, abdomen, and pelvis. If the CT scan showed “compression deformity or compression fracture”, then that scan was marked as positive for OVF. All other scans were marked as negative. The exams were then randomly split so that 80% were used for training, 10% for validation, and 10% for testing.
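
To make the 80/10/10 split concrete, here is a minimal sketch of how one might do it in Python. This is not the authors' code: the function name `split_exams`, the seed, and the integer exam IDs are all made up for illustration.

```python
import random


def split_exams(exam_ids, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle exam IDs and split them into train/validation/test lists."""
    ids = list(exam_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_train = round(len(ids) * train_frac)
    n_val = round(len(ids) * val_frac)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # whatever is left (about 10% here)
    return train, val, test


# Hypothetical IDs standing in for the real CT exams.
train, val, test = split_exams(range(1000))
print(len(train), len(val), len(test))  # 800 100 100
```

Splitting by exam, rather than by individual image slice, keeps all slices from one exam in the same set, so the test set really does contain exams the model has not seen.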

Deep Learning Model. The authors built a system with two major components: a Convolutional Neural Network (CNN) that extracts features from each CT slice, and a Recurrent Neural Network (RNN) that combines the extracted features into a final prediction. The CNN was a 34-layer Residual Network (ResNet), a well-established deep learning architecture for image processing. The RNN was made of Long Short-Term Memory (LSTM) units, which can identify and ‘remember’ important information through special gates that control what is stored, kept, and passed on.

A representation of the authors’ deep learning model, where the CNNs are feature extractors and the LSTMs are feature aggregators. Image credits to Tomita et al., the original authors of the paper.
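
If the “gates” idea feels abstract, the toy sketch below may help. It is a deliberately simplified, pure-Python LSTM that reads one number per CT slice and aggregates the sequence into a single probability. In the paper the per-slice features are vectors produced by the ResNet and the weights are learned. Here each feature is a single made-up number, and every name and weight (`lstm_step`, `aggregate`, `predict_ovf`, `params`) is a hypothetical stand-in rather than the authors' implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step on a scalar input.

    params maps a gate name to its (input weight, hidden weight, bias).
    """
    def pre(gate):
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    forget = sigmoid(pre("forget"))          # how much old memory to keep
    write = sigmoid(pre("input"))            # how much new information to store
    output = sigmoid(pre("output"))          # how much memory to expose
    candidate = math.tanh(pre("candidate"))  # the proposed new information

    c = forget * c_prev + write * candidate  # updated cell state (the "memory")
    h = output * math.tanh(c)                # updated hidden state
    return h, c


def aggregate(slice_features, params):
    """Run the LSTM over the slices in order and return the final hidden state."""
    h, c = 0.0, 0.0
    for x in slice_features:
        h, c = lstm_step(x, h, c, params)
    return h


def predict_ovf(slice_features, params, w_out=1.0, b_out=0.0):
    """Turn the aggregated state into a probability-like score for OVF."""
    return sigmoid(w_out * aggregate(slice_features, params) + b_out)


# Made-up weights and features, purely for illustration.
params = {
    "forget": (0.0, 0.0, 1.0),
    "input": (1.0, 0.0, 0.0),
    "output": (0.0, 0.0, 1.0),
    "candidate": (2.0, 0.0, 0.0),
}
print(predict_ovf([0.1, 0.9, 0.2], params))
```

The point to notice is that the cell state `c` is carried from slice to slice, and the forget and input gates decide how much of it survives and how much gets overwritten at each step.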

Results. The authors evaluated their model on a held-out test set made up of CT scans the model had never seen. It achieved the best overall accuracy (89.2%), ahead of the other baseline models (the best of which reached 88.4%) and even slightly ahead of practicing radiologists (88.4%). The authors also plotted Receiver Operating Characteristic (ROC) curves for the models, and their neural network’s curve was similar to that of the radiologists, which is further evidence that the model performs on par with practicing radiologists.

Performance of the paper’s model in comparison to that of radiologists — the model proposed by the paper achieved the highest overall accuracy. Credits to Tomita et al., the original authors of the paper.
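
For readers who have not computed these metrics before, here is a small sketch of how accuracy and an ROC curve can be derived from a model's scores. The labels and scores are invented for illustration and have nothing to do with the paper's data. The function names are my own, and the code assumes the labels contain at least one positive and one negative case.

```python
def accuracy(labels, predictions):
    """Fraction of cases where the prediction matches the label."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_points(labels, scores):
    """(false positive rate, true positive rate) at each distinct threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Invented example: 1 = OVF present, 0 = no OVF.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]

print(round(accuracy(labels, predictions), 3))
print(round(area_under_curve(roc_points(labels, scores)), 3))
```

Accuracy depends on the single threshold you pick (0.5 here), while the ROC curve shows the trade-off between catching fractures and raising false alarms across all thresholds.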

Conclusion. The authors showed that their CNN→RNN model could diagnose osteoporotic vertebral fractures about as well as practicing radiologists. They note that radiologists most commonly miss OVFs (false negatives) because of “staff shortages and/or excessive workload”. A machine learning model like theirs could reduce the time and manual burden placed on radiologists and could also improve overall diagnostic accuracy.

While the world may not be ready for artificial intelligence to completely conduct medical diagnoses, perhaps sometime in the future we will be examined by machines with the same (if not better) capabilities as humans.

I’ve put some additional resources below that may be interesting to look at.

Jerry Wei
Health Data Science

Large language models, AI for health. Research Engineer at Google DeepMind.