Cassava Leaf Disease Classification — Part I

Yuyang Li
Feb 22, 2021

Work conducted by Haoda Song, Siyuan Li, and Yuyang Li

Github: https://github.com/siyuanli1202/DATA2040_Midterm_Project

Background

The cassava plant is a woody shrub of the spurge family, Euphorbiaceae, native to South America. It is the third-largest source of food carbohydrates in the tropics, after rice and maize, and a staple food in the developing world, providing a basic diet for over half a billion people. However, it is vulnerable to a broad range of diseases, which is a major challenge for growers.

The goal of this project is to identify a problem with a cassava plant using a photo from a relatively inexpensive camera. We aim to distinguish between several diseases that cause material harm to the food supply of many African countries. Our task is to classify each cassava image into one of four disease categories or a fifth category that represents a healthy leaf, which could help local farmers.

Exploratory Data Analysis

Our dataset contains 21,367 labeled images that were collected during a regular survey in Uganda. Each image carries one of five labels. The numeric classes map to the following conditions:

0 — CBB — Cassava Bacterial Blight

1 — CBSD — Cassava Brown Streak Disease

2 — CGM — Cassava Green Mottle

3 — CMD — Cassava Mosaic Disease

4 — Healthy

Figure 1 distribution of predicting labels

The graph above shows the distribution of labels. The data is imbalanced: there are 1087 images of Cassava Bacterial Blight (CBB), 2189 of Cassava Brown Streak Disease (CBSD), 2380 of Cassava Green Mottle (CGM), 13158 of Cassava Mosaic Disease (CMD), and 2577 of healthy leaves.
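To make the imbalance concrete, here is a minimal sketch that turns the counts quoted above into class shares and finds the majority class. The dictionary and variable names are ours, chosen for illustration, and the counts are copied from the text rather than recomputed from the dataset.

```python
# Per-class image counts as quoted in the text above (not recomputed from the data).
counts = {
    "CBB": 1087,
    "CBSD": 2189,
    "CGM": 2380,
    "CMD": 13158,
    "Healthy": 2577,
}

total = sum(counts.values())

# Share of each class among all labeled images.
shares = {name: n / total for name, n in counts.items()}

# The majority class: a model that always predicts it gets this share right.
majority = max(counts, key=counts.get)

for name, share in shares.items():
    print(f"{name:8s} {counts[name]:6d}  {share:6.1%}")
print(f"majority class: {majority} ({shares[majority]:.1%})")
```

With these counts, CMD alone accounts for a little over 61% of the images, so a classifier that always predicts CMD would already be right about that often. That is a useful reference point when reading accuracy numbers on this dataset.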

Class 0: CBB — Cassava Bacterial Blight:

Figure 2 CBB

Cassava Bacterial Blight (CBB) is caused by the bacterium Xanthomonas axonopodis. The disease, originally discovered in Brazil in 1912, commonly shows as necrotic spotting of the leaves. The spots start as distinct moist, brown lesions, normally restricted to the bottom of the plant, and then enlarge and coalesce, often killing the entire leaf. Bacterial blight causes the largest losses in terms of yield (André Antoine Fanou, Valerien Amégnikin Zinsou and Kerstin Wydra 2017).

Class 1: CBSD — Cassava Brown Streak Disease:

Figure 3 CBSD

Cassava brown streak disease (CBSD) is a damaging disease of cassava plants and is especially troublesome in East Africa. It was first identified in 1936 in Tanzania, and two distinct viruses are responsible for it: cassava brown streak virus (CBSV) and Ugandan cassava brown streak virus (UCBSV). A common symptom is a yellowish, mottled appearance of the leaves (CABI).

Class 2: CGM — Cassava Green Mottle:

Figure 4 CGM

Cassava green mottle is caused by cassava green mottle virus (CGMV), a plant pathogenic virus of the family Secoviridae. Young leaves are puckered, with faint to distinct yellow spots, green mosaic patterns, and twisted margins. Occasionally, plants are severely stunted (Helen Tsatsia & Grahame Jackson).

Class 3: CMD — Cassava Mosaic Disease:

Figure 5 CMD

Cassava mosaic disease (CMD), first recorded in East Africa in 1894, is caused by viruses of the genus Begomovirus in the family Geminiviridae. Common symptoms include a chlorotic mosaic of the leaves, leaf distortion, and stunted growth. Leaf stalks can also have a characteristic S-shape (Patrick Chiza Chikoti).

Class 4: Healthy:

Figure 6 Healthy

This class contains leaves that show none of the four diseases above.

Baseline CNN Model

Figures 7 and 8 below show our baseline convolutional neural network (CNN) model. The model starts with a 3x3 convolutional layer with 32 filters and ReLU activation, followed by a 2x2 max pooling layer and dropout with rate 0.2. Two more convolutional blocks with the same structure but more filters (64, then 128) follow. The output is then flattened and passed to two dense layers: the first has 384 neurons with ReLU activation, and the output layer has 5 units, one per class, with softmax activation.

Figure 7 CNN Model Architecture
Figure 8 CNN Code Block
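For readers who cannot see the figures, the architecture described above can be sketched in Keras roughly as follows. This is a reconstruction from the prose, not a copy of the code in Figure 8: the function name, the 128x128 input size, the default "valid" padding, the Adam optimizer, and the sparse categorical cross-entropy loss are all assumptions made for illustration.

```python
from tensorflow.keras import layers, models


def build_baseline_cnn(input_shape=(128, 128, 3), num_classes=5):
    """Three conv blocks (32, 64, 128 filters), then two dense layers.

    input_shape is an assumption; the article does not state the image size.
    """
    model = models.Sequential([
        layers.Input(shape=input_shape),
        # Block 1: 3x3 conv, 32 filters, ReLU, 2x2 max pooling, dropout 0.2
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.2),
        # Block 2: same structure, 64 filters
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.2),
        # Block 3: same structure, 128 filters
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.2),
        layers.Flatten(),
        layers.Dense(384, activation="relu"),
        # One output unit per class, softmax over the five labels
        layers.Dense(num_classes, activation="softmax"),
    ])
    # Optimizer and loss are assumptions; the loss expects integer labels 0..4.
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_baseline_cnn()
model.summary()
```

The summary printout is a quick way to check that the layer sequence matches the description: three convolutional blocks, a flatten layer, a 384-unit dense layer, and a 5-unit softmax output.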

TensorBoard

Figure 9 shows the trend of accuracy as the number of epochs increases. The yellow line shows the training accuracy while the blue line shows the validation accuracy. Training accuracy is nearly flat with a slight upward trend, while validation accuracy fluctuates. Both accuracies eventually reach around 73%.

Figure 9 epoch-accuracy visualization

Figure 10 shows the trend of loss as the number of epochs increases. The yellow line shows the training loss while the blue line shows the validation loss. Training loss is nearly flat with a slight downward trend, while validation loss fluctuates.

Figure 10 epoch-loss visualization
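Curves like these come from logging metrics during training. Below is a minimal sketch of how that can be wired up with the Keras TensorBoard callback. It is not our training script: the helper name, the tiny stand-in model, the random data, and the temporary log directory are all hypothetical and exist only so the example runs on its own in a few seconds.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def fit_with_tensorboard(model, x, y, log_dir, epochs=2):
    """Train while writing per-epoch loss and accuracy to log_dir."""
    tb = tf.keras.callbacks.TensorBoard(log_dir=log_dir)
    return model.fit(
        x, y,
        validation_split=0.25,  # hold out part of the data for validation curves
        epochs=epochs,
        batch_size=8,
        callbacks=[tb],
        verbose=0,
    )


# Random stand-in data: 32 small "images" with integer labels 0..4.
rng = np.random.default_rng(0)
x = rng.random((32, 16, 16, 3)).astype("float32")
y = rng.integers(0, 5, size=32)

# A deliberately tiny model, used here only to exercise the callback.
toy = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(16, 16, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(5, activation="softmax"),
])
toy.compile(optimizer="adam",
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])

log_dir = tempfile.mkdtemp()
history = fit_with_tensorboard(toy, x, y, log_dir)
print(sorted(os.listdir(log_dir)))
```

Running `tensorboard --logdir` on the log directory then shows the training and validation curves side by side, one point per epoch.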

Next Steps

Overall, our baseline model performs quite well, with a validation accuracy of around 73%. Next, we plan to handle the class imbalance and improve model performance by tuning hyperparameters such as activation functions, optimizers, and dropout rate. We also plan to try different neural network architectures, such as GoogLeNet and EfficientNet.
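One common way to handle imbalanced labels is to weight each class inversely to its frequency during training. The sketch below shows that idea using the class counts from the exploratory analysis. It is only one possible approach, not necessarily the one we will end up using, and the function name is hypothetical.

```python
# Class counts from the exploratory analysis, keyed by numeric label.
counts = {0: 1087, 1: 2189, 2: 2380, 3: 13158, 4: 2577}


def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (num_classes * class_count).

    Rare classes get weights above 1, frequent classes below 1, and the
    weighted number of samples stays equal to the original total.
    """
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


weights = balanced_class_weights(counts)
for label, w in weights.items():
    print(f"class {label}: weight {w:.2f}")
```

A dictionary like this can be passed to Keras through the `class_weight` argument of `model.fit`, so that mistakes on rare classes such as CBB cost more than mistakes on CMD.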

Kaggle: https://www.kaggle.com/haodasong01/data2040-deep-divers

References

  1. Cassava Bacterial Blight: A Devastating Disease of Cassava: https://www.intechopen.com/books/cassava/cassava-bacterial-blight-a-devastating-disease-of-cassava
  2. Invasive Species Compendium: https://www.cabi.org/isc/datasheet/17107
  3. Pacific Pests and Pathogens — Fact Sheets: https://www.pestnet.org/fact_sheets/cassava_green_mottle_068.htm
  4. Cassava mosaic disease: a review of a threat to cassava production in Zambia: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6951474/
  5. GoogLeNet: https://www.kaggle.com/luckscylla/googlenet-implementation
  6. A predictive machine learning application in agriculture: Cassava disease detection and classification with imbalanced dataset using convolutional neural networks: https://www.sciencedirect.com/science/article/pii/S1110866520301110
