Artificial Intelligence has to be trained. AI expert Laura Fink explains in this second part of “Train as you fight” how an AI algorithm learns to distinguish between cultivated plants and weeds using deep learning and computer vision, especially when data quality is poor.
Artificial intelligence is increasingly conquering the fields we work in. In the agricultural sector, research into intelligent technologies such as deep learning and computer vision aims to increase yields and to determine exact harvesting times. Costs can be reduced and processes optimized by analysing whether, and to what degree, crops need to be watered, or which diseases are spreading in the field. Detecting and removing weeds at an early stage also helps young plants grow. In our computer vision showcase, we demonstrate that deep learning can help differentiate different types of plants.
Image data – the basis for deep learning & computer vision
For this showcase, we have used publicly available image data from the Computer Vision and Biosystems Signal Processing research group at Aarhus University. The data contains images of twelve different wild and cultivated plant species that are frequently found on Danish fields.
It is not just the quantity and quality of the data that play a major role in the successful application of AI. When gathering the data, we also have to make sure that no unknown error sources have crept in.
In the first part of “Train as you fight”, we showed that the research group’s image data exhibits what is known as target leakage: in the training data, relying on the size of the stones in an image’s background was sufficient to predict plant types successfully. Using tools such as LIME or SHAP to explain machine learning predictions, we showed how such errors can be detected and how data science workflows can be debugged.
Feature extraction – focusing on the essential
How can we prevent the AI algorithm from using the size of the stones to differentiate the various types of plants? It is up to us humans to ensure that the learning algorithm focuses on the essential. To do this, the stones must be removed from the background. Not manually, of course, but digitally. To achieve this, we exploit the fact that the three values stored in a pixel depend on the chosen colour space. Switching from the usual RGB space (red, green, blue) to the LAB space (lightness, red-green axis, blue-yellow axis), the plants stand out clearly from the stones in the red-green channel, and we can remove the stones from the image with a simple thresholding procedure!
Transfer learning – retraining the AI algorithm
With deep learning, artificial neural networks with many layers are used, which accounts for the “deep” part of the term. Due to their considerable complexity and flexibility, they require very large quantities of training data to learn successfully. Those who, like us, only have little image data at their disposal are better advised to use an algorithm that has already been trained on a similar task.
This popular procedure is called transfer learning: It is based on the observation that the first neural layers in an artificial neural network extract general properties from an image which can be of significance: corners and edges, textures and simple patterns. It is only in subsequent neural layers that task-specific objects or sub-objects are extracted which are required to solve a specific task – in our case, determining different types of plant.
Thus, by reusing the first layers of an already trained network and training only the remaining few layers, one can adapt that network to the task at hand. This way, far fewer parameters have to be determined, which requires less time and less data.
Loose silky-bent grass is not saltmarsh rush
With two exceptions, all plant types can be distinguished from each other with a very high score of approximately 99 %. Only loose silky-bent grass is sometimes wrongly classified as saltmarsh rush in a few growth phases. However, these are cases in which even the human eye can barely detect the differences. To solve the problem, improved image quality and further plant features would be required.
In part 1 of “Train as you fight”, we showed how tools for explaining machine learning results can keep errors in data gathering and processing from going undetected. In this second part, we have shown how feature extraction and transfer learning can achieve strong results even with small quantities of data. Hence, deep learning and computer vision are useful approaches even when the data conditions are less than ideal.
Find our complete analysis here: https://www.kaggle.com/allunia/computer-vision-with-seedlings
and the copyright information here: https://vision.eng.au.dk/plant-seedlings-dataset/
© 2014 Mads Dyrmann, Peter Christiansen, University of Southern Denmark, and Aarhus University
The images and annotations are distributed under the Creative Commons BY-SA license.
If you use this dataset in your research or elsewhere, please cite/reference the following paper:
PAPER: A Public Image Database for Benchmark of Plant Seedling Classification Algorithms