Rice Foreign Object Classification Based on Integrated Color and Textural Feature Using Machine Learning

Aji Setiawan*, Kusworo Adi, Catur Edi Widodo

Doctoral Program of Information Systems, School of Postgraduate Studies, Diponegoro University, Semarang 50241, Central Java, Indonesia

Department of Information Technology, Faculty of Engineering, Darma Persada University, East Jakarta 13450, Jakarta, Indonesia

School of Postgraduate Studies, Diponegoro University, Semarang 50241, Central Java, Indonesia

Department of Physics, Faculty of Science and Mathematics, Diponegoro University, Semarang 50275, Central Java, Indonesia

Corresponding Author Email: aji_setiawan@ft.unsada.ac.id

Pages: 572-580 | DOI: https://doi.org/10.18280/mmep.100226

Received: 23 December 2022 | Revised: 2 March 2023 | Accepted: 13 March 2023 | Available online: 28 April 2023

© 2023 IIETA. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).

Abstract: 

A blend of natural and non-natural foreign objects can be used to determine rice quality. Object detection based on image processing has demonstrated great success in the agricultural industry, particularly for rice. Most food quality studies rely on image shape, color, and size, and rice quality can be judged from the absence of foreign objects. HSV color and GLCM texture features are used to classify images of natural and non-natural foreign objects with the support vector machine (SVM) algorithm and two comparison methods, namely decision tree and naive Bayes. The foreign object dataset consists of 80 images, 20 in each of the following classes: stone, grain, yellow-broken, and red-black. The dataset is preprocessed to obtain the color and texture feature values. With cross-validation, the SVM method achieves the highest accuracy of 96.83%, while the decision tree reaches 87.31% and naive Bayes 82.54% in detecting natural and non-natural foreign objects. Using cross-validation with K=5 increases accuracy by an average of 10% compared with no cross-validation. These results show that natural foreign objects of different classes can be appropriately detected using a combination of color and texture features with the SVM classification method and cross-validation.

Keywords: 

foreign object detection, GLCM, HSV, image processing, support vector machine, quality of rice

1. Introduction

Rice (Oryza sativa) is the most widely consumed staple food of the world's human population [1], especially in Asia, Africa, and South America. The high demand for the availability of staple rice is a global issue in the Sustainable Development Goals (SDGs) program [2]; the purity, size, color, texture, and other characteristics of rice affect its quality. Rice purity is essential because unclean rice can cause illness, which increases the risk of cost and financial loss [3]. The traditional inspection method for determining the purity of grains and foreign items is frustrating and time-consuming. The outcome is greatly influenced by the inspector's physical condition, including eyesight, mental health, and work stress, and results depend heavily on particular quality control inspectors. This problem occurs during the sampling process that determines rice quality after the filtering process. Sampling is done using the traditional method of taking 100 grams at random from 1 kg of rice. Foreign objects are divided into natural and non-natural foreign objects. Natural foreign objects include grain, yellow-broken rice, red-black rice, and rice variants mixed with other variants, while non-natural foreign objects include stones.

Researchers are interested in examining novel techniques for judging rice quality and safety. Various methods have been used effectively for rice classification, among them spectroscopy [4], electronic nose systems [5, 6], and ultra-high-performance liquid chromatography [5]. These approaches are often time-consuming and expensive; therefore, they are not fully satisfactory.

The evaluation of food quality using image processing techniques has been widely adopted. Several distinguishing characteristics, including color, shape, and texture, can be used to classify food products qualitatively [7]. In principle, image processing extracts or enhances an image into a new image based on the value of the region of interest (ROI) [6]. The image enhancement process is carried out at the preprocessing stage, followed by the classification stage. For rice seed quality, machine learning has been widely used to classify agricultural products and in other applications. Several studies have addressed this using machine learning techniques. Mittal et al. [8] determined the quality of rice grains by examining contours to determine the shape of the oval grains and classify excellent or bad-quality rice samples. Pothen and Pai [7] classified rice leaf disease with SVM based on LBP and HOG features. Aznan et al. [9] identified two types of Malaysian rice varieties using color characteristics extracted from rice images. Abbaspour-Gilandeh et al. [10] used an ANN to classify 13 rice cultivars based on color, morphology, and texture with component optimization using principal component analysis (PCA), resulting in a reasonably good level of accuracy.

Chen et al. [11] utilized a decision tree approach to classify six parameters A1–A6 (area, perimeter, maximum Feret diameter, elongation factor, compactness, and Heywood circularity factor) of grain impurities in rice. On the training data set obtained from the recorded photos, the decision tree reached a classification accuracy of almost 76%. Yang et al. [12] used a dataset of microscope images to predict rice smut and rice blast disease. The watershed approach, distance transformation, and Gaussian filtering were used to separate adhering rice blast spores, increasing accuracy by about 10%. Three texture features (entropy, homogeneity, and contrast) and four shape features (area, perimeter, ellipticity, and complexity) were chosen for decision-tree classification. The classification accuracy, calculated using the confusion-matrix method, was 82%.

Effective employment of image processing approaches has been reported in the literature for quality measurement of corn leaf disease [13], mango [14], chestnut [15], tomato leaf disease [16], potato [17], and apple [18]. In addition to agriculture, image processing is applied to medical images. Adi et al. [19] conducted research on detecting lung cancer using GLCM and backpropagation on 38 microscopic lung images labeled as cancerous or non-cancerous; the developed model achieved a classification accuracy of 95% during training and 81.25% during testing. The SVM technique was applied to cerebral infarction in brain stroke cases by Rustam et al. [20], with over- and under-sampling accuracy of 94%.

As previously explained, many studies have been conducted on agricultural production yields, where it is critical to track and determine product quality. However, foreign objects in collections of production results have not yet been studied in depth, especially for rice, even though foreign objects affect the quality level of the rice produced. Therefore, in this study, foreign objects were grouped into two types to classify non-natural and natural foreign objects.

Compared to other studies, this study examines a variety of foreign rice objects, including stone, grain, yellow-broken rice, and red-black rice. HSV color features combined with GLCM texture features are extracted during preprocessing to obtain the characteristics of each foreign object. RGB could be used as an alternative color feature, but HSV was chosen for rice because HSV represents human color perception better than RGB [21]. In some instances, the RGB color description is less accurate due to non-linear colors and non-uniform illumination [22]. As a color feature, HSV generates color vector values based on hue, saturation, and value.

For separating objects from the background, HSV provides greater color depth on foreign objects than RGB. Because they share the same color channels, the natural grain and natural yellow-broken classes are difficult to distinguish from the RGB color alone, making the threshold value hard to estimate. Foreground objects and backgrounds can be distinguished using the HSV hue and saturation values; the HSV values obtained are used as color-specific features for each class.

In addition, the color vector values are merged with the GLCM texture values at angles of 0°, 45°, 90°, and 135° to calculate the contrast, dissimilarity, homogeneity, energy, correlation, and ASM values. The vectors of these six values are classified using the support vector machine (SVM), decision tree, and naive Bayes methods. Cross-validation was performed to ensure variety in the mix of training and testing data.

2. Material and Method

This study focuses on classifying natural and non-natural foreign objects in a rice collection based on the prototype of the developed rice mining tool. A total of 80 RGB images (red, green, blue) were collected, with 20 images in each class. The methodological approach developed is shown in Figure 1.

Figure 1. Proposed methodology

According to Figure 1, the iteration ends when the epoch and k-fold values have been executed. In this instance, 5-fold cross-validation is used over 100 training epochs. Higher epoch values can result in greater accuracy, but too many epochs will overfit the data. The methodology's primary purpose is to investigate machine learning for determining the type of foreign object in a rice collection based on color depth and texture, utilizing HSV and GLCM feature extraction along with a cross-validation-based training method. This study contributes to the investigation of food safety factors. Preliminary studies on rice mostly address plant diseases, and certain rice seed classification studies note that RGB is insufficient for color extraction.

2.1 Collection of foreign object sample

The sampling approach still uses the traditional method of determining quality in 1 kg of rice by taking 100 grams as a random sample. Foreign objects are divided into natural and non-natural foreign objects. Natural foreign objects include grain, yellow-broken rice, red-black rice, and rice variants mixed with other variants, while non-natural foreign objects include stones.

2.2 Image acquisition

The image dataset was obtained from PT Hassana Boga Sejahtera agricultural products. The foreign object rice was illuminated over a red background with LED light (18 LEDs at 12 watts, lux value 3035-3040). An image-based prototype platform was built as a 20 cm x 20 cm square, with a Logitech C270 HD webcam camera mounted 10 cm above the rice object. LEDs are used on the platform to create a uniform lighting environment day and night; Figure 2 shows the prototype.

Figure 2. Image acquisition prototype

Changes in the angle of image capture and lighting introduce variations that help capture image depth for detecting objects. At the image acquisition stage, the camera faces parallel to the object. There are 80 foreign object images of 2993 x 2993 pixels at a resolution of 72 dpi, with 20 images in each class.

Classes of natural and non-natural foreign objects were obtained from samples of rice collections consisting of whole rice and foreign objects, which complicate the assessment of rice quality. For this reason, they are categorized into several classes in this study. From Table 1, the non-natural foreign object class F0 consists of stone, while F1, F2, and F3 are natural foreign objects comprising grain, yellow-broken, and red-black. The model detects foreign objects in these four classes to decide whether they are foreign.

Table 1. Type of foreign object rice

Foreign object type    Name                     Images
F0                     Non-Natural Stone        (sample image)
F1                     Natural Grain            (sample image)
F2                     Natural Yellow-Broken    (sample image)
F3                     Natural Red-Black        (sample image)

There are some differences in color and size among the foreign objects. Differences in color depth and dimension can be identified from the characteristics of each foreign object using the HSV (hue, saturation, value) approach. Table 2 shows the features of the foreign objects based on color characteristics. The threshold values for the natural grain and natural yellow-broken classes are identical because both share the same fundamental hue, yellow. In color feature segmentation, the threshold value is used to separate foreign objects from the background. The best results are obtained using an RGB value of (255, 170, 0), while the HSV hue values have lower and upper limits to determine the intensity of yellow.

Table 2. Feature of a foreign object with RGB & HSV

Type    R      G      B      H min    H max    S min    V min
F0      0      0      0      0        5        0        0
F1      255    170    0      40       60       128      145
F2      255    170    0      40       60       128      145
F3      255    42     0      1        10       168      110

In this case, HSV values are taken for the raw images of the grain, yellow-broken, red-black, and black non-natural stone foreign objects; the range of values is derived from the RGB values converted to the HSV image. The maximum filter limit is only used for the hue parameter because the maximum value for the saturation and value parameters is 255 (the maximum pixel value). The filter parameters for each color are tested in a calibration test, increasing the maximum value in steps of up to 10.
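As an illustration of how these thresholds can be applied, the following sketch segments a yellow foreign object (natural grain or yellow-broken) from the red background using OpenCV in Python. It is a minimal example, not the authors' code: the file name is hypothetical, and the Table 2 hue limits are assumed to be in degrees, so they are halved for OpenCV's 0-179 hue scale.

```python
import cv2
import numpy as np

# Hypothetical file name; the dataset images are 2993 x 2993 px RGB captures.
img_bgr = cv2.imread("foreign_object_sample.jpg")
img_hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)

# Thresholds adapted from Table 2 for the grain / yellow-broken classes
# (H 40-60, S >= 128, V >= 145); saturation and value have no upper limit
# below the maximum pixel value of 255.
lower = np.array([40 // 2, 128, 145])
upper = np.array([60 // 2, 255, 255])
mask = cv2.inRange(img_hsv, lower, upper)

# Keep only the foreground pixels; the red background is suppressed.
segmented = cv2.bitwise_and(img_bgr, img_bgr, mask=mask)
cv2.imwrite("segmented_foreground.png", segmented)
```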

2.3 Feature extraction

Color is perhaps the most frequently used feature in image segmentation, particularly for agricultural products, where ripe fruit or diseased leaves and fruit must be separated from complicated natural backgrounds. However, given its susceptibility to varying lighting conditions and occlusion, color segmentation might not be the most efficient technique on its own. In addition to the conventional RGB (red, green, blue) space, a distinct color space such as HSV (hue, saturation, value) is used to overcome this problem [23].

The original image can be described by vector values, and conversion from RGB to HSV is performed to obtain the color depth. According to the histogram, the average value of the rice object ranges from 232 to 256. Figure 3 displays the HSV image.

Figure 3. Images of a rice foreign object: original (a), HSV (b), hue (c), saturation (d), value (e)

The gray level co-occurrence matrix (GLCM) is then utilized to extract texture features; it has been applied in a variety of domains, including image capture, biometrics, pattern recognition, and remote sensing [24]. GLCM examines the local pattern of an image, while the grayscale pixel value is determined based on the total of the contrasts. The main variables of this work, given in Eqns. (1)-(6), are contrast, dissimilarity, homogeneity, energy, correlation, and ASM. Angle orientations and pixel distances that are too close or too far apart make the information on each pixel relatively homogeneous or uneven and render the information between pixels meaningless. GLCM extracts the pixel information after the HSV image is converted to grayscale; in this case, the RGB color image is converted to HSV and then into a gray image. The co-occurrence matrix must be normalized before generating the GLCM variables: the matrix is first divided by the total co-occurrence count, and the normalized entries are then summed. The contrast value refers to the variation in gray levels visible in an image; if all pixels have the same value, there is no contrast.

Contrast $=\sum_{i=1}^L \sum_{j=1}^L Y(i, j)(i-\mu i)^2$           (1)

Eq. (1) calculates the average intensity of pixels in class i and the intensity difference between pairs of pixels (i, j). Y(i, j) represents the frequency with which the pixel pair (i, j) appears in the image, and μi represents the mean pixel intensity of class i. Homogeneity expresses the degree of gray-level similarity of the image, given by Eq. (2).

Homogeneity $=\sum_{i=1}^L \sum_{j=1}^L \frac{Y(i, j)^2}{1+(i-j)^2}$                         (2)

Energy indicates a measure of the concentration of pixel pairs in the intensity matrix that co-occur at certain coordinates; Eq. (3) is the energy equation.

Energy $=\sum_{i=1}^L \sum_{j=1}^L Y(i, j)^2$                           (3)

Dissimilarity is a measure of the distance between pairs of pixels in the region of interest, as given by Eq. (4).

Dissimilarity $=\sum_{i=1}^L \sum_{j=1}^L|i-j| p(i, j)$                                    (4)

Correlation $=\frac{\sum_{i=1}^{N g} \sum_{j=1}^{N g}[(i j) p(i, j)]-\mu_x \mu_y}{\sigma_x \sigma_y}$                                       (5)

Angular Second Moment $(A S M)=\sum_{i, j=0}^{N-1} P_{i, j}^2$                                      (6)

Homogeneity measures how uniformly the pixels are distributed in the GLCM. The homogeneity value is negatively related to the contrast value; as homogeneity rises, contrast falls. The angular second moment (ASM) measures texture uniformity similarly to homogeneity. Correlation demonstrates a linear dependency between the gray-level values in the GLCM, reflecting the dependence of the local gray level on the texture image; regions with similar gray levels yield higher correlation scores. All of these equations ultimately yield six texture attributes for each image patch, which serve as training data for the naive Bayes, decision tree, and support vector machine (SVM) classifiers. In Figure 4, the angles 0°, 45°, 90°, and 135° represent the positions of the neighboring pixels relative to the focus pixel.

Figure 4. Angle of GLCM
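To make the texture extraction step concrete, the sketch below (an illustrative example using scikit-image, not the authors' exact implementation) computes the six GLCM statistics at the four angles for one grayscale patch, yielding the 24 texture values per image used later as features.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(gray_patch):
    """Six GLCM statistics at 0, 45, 90 and 135 degrees (24 values in total)."""
    # gray_patch: 2-D uint8 array, e.g. the HSV image converted to gray levels.
    angles = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
    glcm = graycomatrix(gray_patch, distances=[1], angles=angles,
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "dissimilarity", "homogeneity",
             "energy", "correlation", "ASM"]
    # graycoprops returns an (n_distances, n_angles) array for each property.
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])
```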

2.4 Classical machine learning

The support vector machine (SVM) is a machine learning technique that makes predictions using classification and regression. SVM aims to lessen structural risk and address generalization issues by selecting the proper hyperplane, or classification function, to categorize data into the target class [25]. The ideal hyperplane, and its importance and accuracy, is determined by the separation between the first- and second-class hyperplanes [20].

The data points closest to the hyperplane, termed support vectors, define the hyperplane class. Consider the variables N, xi, and yi, where i = 1, 2, …, N and yi ∈ {−1, 1}, with yi as the class label of the foreign object dataset, i.e., the natural and non-natural classes. Eq. (7) gives the formulation of the hyperplane.

$y(x)=w^t x+b$                             (7)

where, b is a bias with a scalar value and w is a vector of weighted parameter values. The created hyperplane will divide the data into natural and non-natural classes or SVM method classes with positive and negative values on the foreign object dataset. This dataset is partitioned according to the rules of Eqns. (8) and (9).

$w^T x_i+b \geq +1, \quad y_i=+1$                                         (8)

$w^T x_i+b \leq -1, \quad y_i=-1$                                    (9)

In general, from these equations, it can be concluded in Eq. (10).

$y_i\left(w^T x_i+b\right) \geq 1, \quad i=1,2, \ldots, n$                                   (10)

The equation of two hyperplanes between the two distances can be defined based on Eq. (11).

$\frac{\left|w^T x_i+b\right|}{\|w\|}=\frac{1}{\|w\|}$             (11)

The total distance between the two hyperplanes is $\frac{2}{\|w\|}$; to maximize the margin, $\|w\|$ is minimized, as in Eq. (12).

$\min \frac{1}{2}\|w\|^2$                              (12)

If the training data are not linearly separable, a slack variable εi may be added to account for misclassification. Adding the slack variable changes the formulation to Eq. (13).

$\min \frac{1}{2}\|w\|^2+C \sum \varepsilon_i$                                 (13)

with the provision of:

$y_i\left(w^T x_i+b\right) \geq 1-\varepsilon_i$                          (14)

and

$\varepsilon_i \geq 0, \quad \forall i=1,2, \ldots, n$                            (15)

One of the most popular data mining techniques is the decision tree, which has a low installation cost, simple interpretation alongside a database system, and high reliability [26]; a decision tree is a structure used to divide a huge dataset into more manageable data chains. The naive Bayes (NB) classifier in machine learning is a statistical classifier that uses probability to predict the class of an unknown sample [27]. NB presupposes that there is no relationship between any of the sample characteristics [28]. The conditional independence assumption of NB, however, can reduce classification precision. The Bayesian approach is a strategy for classifying phenomena according to the likelihood of their occurrence or non-occurrence [29]. The use of multiclass classification by Ansari et al. [30] was motivated by the success of checking rice variety purity using SVM on color, morphology, and texture features, with a seed accuracy of 93.9%. The model utilized in that study included 19 features; in comparison, the model used in our investigation uses a total of 27 features, consisting of 3 HSV color features and 24 GLCM texture features at 0°, 45°, 90°, and 135°.

SVM was chosen to overcome the data imbalance and non-linearity of the dataset, as the values of the extracted color and texture features vary. The total number of extracted features is 27, including the hue, saturation, and value color information. The texture features are dissimilarity, correlation, homogeneity, contrast, energy, and ASM, each computed at 0°, 45°, 90°, and 135°. These 27 characteristics are applied to the four classes of non-natural stone, natural grain, natural yellow-broken, and natural red-black. This research also compares similar methods, namely decision tree and naive Bayes, primarily because of the feature complexity.
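A minimal sketch of how the 27-dimensional feature vector could be assembled is given below. It reuses the glcm_features helper from the earlier texture sketch and assumes, for illustration, that the mean hue, saturation, and value over the segmented object serve as the three color features; the paper does not spell out the exact aggregation.

```python
import cv2
import numpy as np

def hsv_colour_features(img_bgr, mask):
    """Mean hue, saturation and value over the segmented foreign object."""
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    return np.array([hsv[..., c][mask > 0].mean() for c in range(3)])

def feature_vector(img_bgr, mask):
    """3 HSV colour features + 24 GLCM texture features = 27 per image."""
    # Grayscale conversion for GLCM (the paper converts via the HSV image).
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    return np.hstack([hsv_colour_features(img_bgr, mask),
                      glcm_features(gray)])  # glcm_features: earlier sketch
```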

2.5 Metrics evaluation

An evaluation matrix is needed to ensure that the model can learn the problem of classifying natural and non-natural foreign object types. The model is evaluated using the F1-score, recall, accuracy, and precision values, as shown in Figure 5.

Figure 5. Foreign objects classification using a confusion matrix

Details:

TP: True Positive, the target class is predicted and is present.

FP: False Positive, the target class is predicted but is not present.

FN: False Negative, the target class is predicted to be absent, yet it is present.

TN: True Negative, the target class is predicted to be absent and is indeed absent.

3. Result and Discussion

The main aim of classifying foreign objects in rice is to enable real-time counting of natural and non-natural objects in a rice collection, including stone, grain, yellow-broken rice, and red-black rice. Support vector machines (SVM) were utilized in this study and compared with other techniques, namely decision tree and naive Bayes. HSV color and GLCM texture feature extraction was performed first, and classification model testing was done by splitting the dataset into 70% training and 30% testing data.

Depending on the number of observations used for training and testing and the amount of bias built into the model, accuracy values produced from the testing set may deviate slightly from the actual prediction accuracy. The chosen models for each dataset underwent cross-validation after the model complexity and hyperparameters were adjusted. Moreover, cross-validation is essential in order to produce an unbiased model and to enable the calculation of prediction errors and accuracies that more accurately reflect the actual values [30, 31]. K-fold is a popular cross-validation technique that uses new test data for each iteration, gives a robust estimate of the classification model's success rate, and repeats the experiment as many times as the chosen K value [31]. Five-fold cross-validation is used in this study.
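The evaluation protocol can be sketched as follows with scikit-learn (default hyperparameters assumed, not the authors' exact settings); X is the 80 x 27 feature matrix and y the four class labels obtained from the feature-extraction sketches above.

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

# X: (80, 27) feature matrix, y: labels in {0, 1, 2, 3};
# assumed to come from the feature-extraction sketches above.
models = {
    "SVM with HSV + GLCM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
    "DT with HSV + GLCM": DecisionTreeClassifier(random_state=0),
    "NB with HSV + GLCM": GaussianNB(),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    print(f"{name}: mean accuracy {scores.mean():.2%} (+/- {scores.std():.2%})")
```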

Confusion matrices were produced for the support vector machine (SVM), decision tree (DT), and naive Bayes (NB) methods with HSV and GLCM features. The testing data for each of the four classes, drawn from 20 images per class, are shown in the confusion matrices for testing with and without cross-validation. The results of extracting GLCM texture features from the 20 aggregated foreign object datasets are shown in Figure 6. These features are divided into six categories: dissimilarity, contrast, energy, correlation, homogeneity, and ASM.

In Figure 6, high homogeneity indicates a small difference in gray level between the paired elements, and the stone foreign object has the highest value. The highest contrast value is seen for the grain foreign object, indicating a large local variance, the opposite of homogeneity. Dissimilarity is similar to contrast and inversely proportional to homogeneity; the grain foreign object has the highest dissimilarity value, consistent with the contrast variable. Energy reflects the uniformity of the image texture, and the red-black rice foreign object shows the highest value for this variable in the dataset. The correlation indicates that grain has the highest linear relationship between the gray levels of pixel pairs compared to the other foreign objects.

In this study, each of the four classes is represented by 20 images. For the model, the 80 images are allocated as 70% training and 30% validation, and additional unbalanced testing data were added. Tables 3 through 8 evaluate the model performance with confusion matrices with and without cross-validation.

Figure 6. GLCM graphs of the rice foreign objects at the four angles: (a) dissimilarity, (b) contrast, (c) energy, (d) correlation, (e) homogeneity, (f) ASM

Table 3. Confusion matrix for SVM with cross-validation

 

Actual Class \ Predicted Class    Stone    Grain    Yellow-Broken    Red-Black
Stone                             14       0        0                0
Grain                             0        16       0                0
Yellow-Broken                     1        0        16               1
Red-Black                         0        0        0                15

Table 4. Confusion matrix for DT with cross-validation

 

Actual Class \ Predicted Class    Stone    Grain    Yellow-Broken    Red-Black
Stone                             13       2        1                2
Grain                             1        14       0                0
Yellow-Broken                     0        0        15               1
Red-Black                         1        0        0                13

Table 5. Confusion matrix for NB with cross-validation

 

Actual Class \ Predicted Class    Stone    Grain    Yellow-Broken    Red-Black
Stone                             13       5        0                2
Grain                             0        10       0                0
Yellow-Broken                     2        0        15               1
Red-Black                         0        1        0                13

Table 6. Confusion matrix for SVM with no cross-validation

 

Actual Class \ Predicted Class    Stone    Grain    Yellow-Broken    Red-Black
Stone                             12       1        0                1
Grain                             0        13       4                1
Yellow-Broken                     1        0        10               0
Red-Black                         6        0        0                12

Table 7. Confusion matrix for NB with no cross-validation

 

Actual Class \ Predicted Class    Stone    Grain    Yellow-Broken    Red-Black
Stone                             7        3        0                2
Grain                             0        11       0                0
Yellow-Broken                     0        0        13               1
Red-Black                         6        0        1                11

Table 8. Confusion matrix for DT with no cross-validation

 

Actual Class \ Predicted Class    Stone    Grain    Yellow-Broken    Red-Black
Stone                             5        0        0                2
Grain                             3        14       1                2
Yellow-Broken                     0        0        13               0
Red-Black                         5        0        0                9

Table 9. Methods classification report

 

Method                        Accuracy    Precision    Recall    F1-score

5 K-fold cross-validation
SVM with HSV + GLCM           96.83%      97.92%       96.67%    97.29%
DT with HSV + GLCM            87.31%      90.58%       88.33%    89.44%
NB with HSV + GLCM            82.54%      86.11%       81.94%    83.97%

Non-cross-validation
SVM with HSV + GLCM           85.45%      87.21%       85.58%    86.38%
DT with HSV + GLCM            76.36%      78.08%       75.96%    77%
NB with HSV + GLCM            74.55%      75.60%       73.90%    74.74%

1) Accuracy of classification: the average proportion of samples the classifier correctly predicted or categorized. A higher classification accuracy indicates better method performance.

Accuracy of classification $=\frac{(T N+T P)}{(F N+T P+F P+T N)}$

2) Recall: the coverage of the model in terms of class prediction. The performance of the approach improves with increasing recall values.

Recall $=\frac{T P}{(F N+T P)}$

3) Precision: the proportion of samples predicted as positive that truly belong to the positive class. A higher precision value indicates a more reliable method.

Precision $=\frac{T P}{(F P+T P)}$ 

4) F1-score: the harmonic mean of precision and recall. The worst classifier has a value close to 0, whereas the best classifier has a value near 1.

$F1 = 2 \times \frac{(\text{precision} \times \text{recall})}{(\text{precision}+\text{recall})}$
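All four measures can be computed directly from the predictions of any of the classifiers. The short sketch below (using scikit-learn, with macro averaging assumed since the paper does not state its averaging scheme) produces the same kinds of quantities reported in Tables 3-9 from actual and predicted labels y_true and y_pred.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score, f1_score)

# y_true / y_pred: actual and predicted labels for the four foreign object
# classes; macro averaging weights each class equally.
print(confusion_matrix(y_true, y_pred))
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="macro"))
print("recall   :", recall_score(y_true, y_pred, average="macro"))
print("f1-score :", f1_score(y_true, y_pred, average="macro"))
```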

The accuracy results are shown in Table 9 with a comparison of the methods used. Based on Table 9, the best accuracy, 96.83%, is obtained from the SVM classification method with HSV and GLCM feature extraction and cross-validation. It is pertinent to note that the implemented model achieves a level of categorization similar to previous studies [18, 32-34]. Without cross-validation, SVM is still fairly accurate, attaining 85.45%. The naive Bayes technique with and without cross-validation attains the lowest accuracy values of 82.54% and 74.55%, respectively.

This study aims to strengthen the classification performance of machine learning algorithms in forecasting four groups of foreign objects: stone, natural grain, yellow-broken, and red-black. We compared the support vector machine (SVM) classifier with several other classification methods, including decision tree and naive Bayes, using color segmentation, texture features, and cross-validation of the training data for all foreign objects. The original image was first segmented using HSV color and GLCM texture to provide more specific image features, and cross-validation was applied; this approach boosted SVM performance. The hybrid method attained the greatest accuracy, 96.83%.

4. Conclusion

In this study, we showed how image processing methods can be used with a camera to categorize rice containing foreign objects using color and texture attributes. Multiclass SVM, decision tree, and naïve Bayes models with cross-validation were used to construct the foreign object rice class models. The best machine learning method for classifying foreign objects among stone, grain, yellow-broken, and red-black was SVM, which performed best with an accuracy of 96.83%. Among the other machine learning models, the decision tree reached an accuracy of 87.31% and naïve Bayes 82.54%.

The research has shown that performance can be improved by using HSV color feature extraction and GLCM texture features. The use of cross-validation increases accuracy by an average of 10%. In further research, we will attempt to use a convolutional neural network (CNN) with a larger set of foreign object rice images.

Nomenclature

SVM    Support vector machine
NB     Naïve Bayes
DT     Decision Tree
HSV    Hue, Saturation, Value
GLCM   Gray level co-occurrence matrix
TP     True Positive
TN     True Negative
FP     False Positive
FN     False Negative

Greek symbols

yi     class label on foreign object
b      bias
w      weighted parameter vector
$\varepsilon_i$    misclassification (slack variable)
i, j   pixel coordinates in the GLCM matrix
pi,j   pixel value at coordinates (i, j)
µ      mean
σ      standard deviation

  References

[1] Verma, D.K., Srivastav, P.P. (2020). Bioactive compounds of rice (Oryza sativa L.): Review on paradigm and its potential benefit in human health. Trends in Food Science & Technology, 97: 355-365. https://doi.org/10.1016/j.tifs.2020.01.007

[2] Nagoda, N., Ranathunga, L. (2018). Rice sample segmentation and classification using image processing and support vector machine. In 2018 IEEE 13th International Conference on Industrial and Information Systems (ICIIS), Rupnagar, India, pp. 179-184. https://doi.org/10.1109/ICIINFS.2018.8721312

[3] Soon, J.M., Brazier, A.K., Wallace, C.A. (2020). Determining common contributory factors in food safety incidents–A review of global outbreaks and recalls 2008–2018. Trends in Food Science & Technology, 97: 76-87. https://doi.org/10.1016/j.tifs.2019.12.030

[4] Khushbu, S., Yashini, M., Ashish, R., Sunil, C.K. (2021). Recent advances in terahertz time-domain spectroscopy and imaging techniques for automation in agriculture and food sector. Food Analytical Methods, 15: 498-526. https://doi.org/10.1007/s12161-021-02132-y

[5] Li, L., Yin, Y., Zheng, G., Liu, S., Zhao, C., Xie, W., Ma, L., Shan, Q., Dai, X., Wei, L. (2021). Determination of multiclass herbicides in sediments and aquatic products using QuECHERS combined with ultra-high performance liquid chromatography-tandem mass spectrometry (UHPLC-MS/MS) and its application to risk assessment of rice-fish co-culture system in China. Microchemical Journal, 170: 106628. https://doi.org/10.1016/j.microc.2021.106628

[6] Zhu, L., Spachos, P., Pensini, E., Plataniotis, K.N. (2021). Deep learning and machine vision for food processing: A survey. Current Research in Food Science, 4: 233-249. https://doi.org/10.1016/j.crfs.2021.03.009

[7] Pothen, M.E., Pai, M.L. (2020). Detection of rice leaf diseases using image processing. In 2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, pp. 424-430. https://doi.org/10.1109/ICCMC48092.2020.ICCMC-00080

[8] Mittal, S., Dutta, M.K., Issac, A. (2019). Non-destructive image processing based system for assessment of rice quality and defects for classification according to inferred commercial value. Measurement, 148: 106969. https://doi.org/10.1016/j.measurement.2019.106969

[9] Aznan, A.A., Ruslan, R., Rukunudin, I.H., Azizan, F.A., Hashim, A.Y. (2017). Rice seed varieties identification based on extracted colour features using image processing and Artificial Neural Network (ANN). International Journal on Advanced Science, Engineering and Information Technology, 7(6): 2220-2225. https://doi.org/10.18517/ijaseit.7.6.2990

[10] Abbaspour-Gilandeh, Y., Molaee, A., Sabzi, S., Nabipur, N., Band, S.S., Mosavi, A. (2020). A combined method of image processing and artificial neural network for the identification of 13 Iranian rice cultivars. Journal of Agronomy, 10(1): 117. http://dx.doi.org/10.3390/agronomy10010117

[11] Chen, J., Lian, Y., Li, Y. (2020). Real-time grain impurity sensing for rice combine harvesters using image processing and decision-tree algorithm. Computers and Electronics in Agriculture, 175: 105591. https://doi.org/10.1016/j.compag.2020.105591

[12] Yang, N., Qian, Y., EL‐Mesery, H.S., Zhang, R., Wang, A., Tang, J. (2019). Rapid detection of rice disease using microscopy image identification based on the synergistic judgment of texture and shape features and decision tree–confusion matrix method. Journal of the Science of Food and Agriculture, 99(14): 6589-6600. https://doi.org/10.1002/jsfa.9943

[13] Noola, D.A., Basavaraju, D.R. (2021). Corn leaf disease detection with pertinent feature selection model using machine learning technique with efficient spot tagging model. Revue d'Intelligence Artificielle, 35(6): 477-482. https://doi.org/10.18280/ria.350605

[14] Prasetyo, E., Dimas, R., Suciati, N., Fatichah, C. (2020). Partial centroid contour distance (PCCD) in mango leaf classification. International Journal on Advanced Science, Engineering and Information Technology, 10(5): 1920-1926. https://doi.org/10.18517/ijaseit.10.5.8047

[15] Massantini, R., Moscetti, R., Frangipane, M.T. (2021). Evaluating progress of chestnut quality: A review of recent developments. Trends in Food Science & Technology, 113: 245-254. https://doi.org/10.1016/j.tifs.2021.04.036

[16] Wiharto, Nashrullah, F.H., Suryani, E., Salamah, U., Prakisya, N.P.T., Setyawan, S. (2021). Texture-based feature extraction using Gabor filters to detect diseases of tomato leaves. Revue d'Intelligence Artificielle, 35(4): 331-339. https://doi.org/10.18280/ria.350408

[17] Arshaghi, A., Ashourin, M., Ghabeli, L. (2021). Detection and classification of potato diseases potato using a new convolution neural network architecture. Traitement du Signal, 38(6): 1783-1791. https://doi.org/10.18280/ts.380622

[18] Li, J., Luo, W., Wang, Z., Fan, S. (2019). Early detection of decay on apples using hyperspectral reflectance imaging combining both principal component analysis and improved watershed segmentation method. Postharvest Biology and Technology, 149: 235-246. https://doi.org/10.1016/j.postharvbio.2018.12.007

[19] Adi, K., Widodo, C.E., Widodo, A.P., Gernowo, R., Pamungkas, A., Syifa, R.A. (2018). Detection lung cancer using gray level co-occurrence matrix (GLCM) and back propagation neural network classification. Journal of Engineering Science & Technology Review, 11(2): 8-12. 

[20] Rustam, Z., Utami, D.A., Hidayat, R., Pandelaki, J., Nugroho, W.A. (2019). Hybrid preprocessing method for support vector machine for classification of imbalanced cerebral infarction datasets. International Journal on Advanced Science Engineering Information Technology, 9(2): 685-691. https://doi.org/10.18517/ijaseit.9.2.8615

[21] Cheng, H.D., Jiang, X.H., Sun, Y., Wang, J. (2001). Color image segmentation: Advances and prospects. Pattern Recognition, 34(12): 2259-2281. https://doi.org/10.1016/S0031-3203(00)00149-7

[22] Mann, C.J.H. (2008). Color image processing – Methods and applications. Kybernetes, 37(1). https://doi.org/10.1108/k.2008.06737aae.002

[23] Moreira, G., Magalhães, S.A., Pinho, T., dos Santos, F.N., Cunha, M. (2022). Benchmark of deep learning and a proposed hsv colour space models for the detection and classification of greenhouse tomato. Agronomy, 12(2): 356. https://doi.org/10.3390/agronomy12020356

[24] Alazawi, S.A., Shati, N.M., Abbas, A.H. (2019). Texture features extraction based on GLCM for face retrieval system. Periodicals of Engineering and Natural Sciences, 7(3): 1459-1467. https://doi.org/10.21533/pen.v7i3.787

[25] Liu, J., Zio, E. (2019). Integration of feature vector selection and support vector machine for classification of imbalanced data. Applied Soft Computing, 75: 702-711. https://doi.org/10.1016/j.asoc.2018.11.045

[26] Kotsiantis, S.B., Zaharakis, I., Pintelas, P. (2007). Supervised machine learning: A review of classification techniques. Emerging artificial Intelligence Applications in Computer Engineering, 160(1): 3-24. https://dl.acm.org/doi/10.5555/1566770.1566773

[27] Caglayan, A., Guclu, O., Can, A.B. (2013). A plant recognition approach using shape and color features in leaf images. In: Petrosino, A. (eds) Image Analysis and Processing – ICIAP 2013. ICIAP 2013. Lecture Notes in Computer Science, vol 8157. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41184-7_17

[28] Adi, K., Widodo, C.E., Widodo, A.P., Gernowo, R., Pamungkas, A., Syifa, R.A. (2017). Naïve Bayes algorithm for lung cancer diagnosis using image processing techniques. Advanced Science Letters, 23(3): 2296-2298. https://doi.org/10.1166/asl.2017.8654

[29] Lewis, D.D. (1998). Naive (Bayes) at forty: The independence assumption in information retrieval. In: Nédellec, C., Rouveirol, C. (eds) Machine Learning: ECML-98. ECML 1998. Lecture Notes in Computer Science, vol 1398. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0026666 

[30] Ansari, N., Ratri, S.S., Jahan, A., Ashik-E-Rabbani, M., Rahman, A. (2021). Inspection of paddy seed varietal purity using machine vision and multivariate analysis. Journal of Agriculture and Food Research, 3: 100109. https://doi.org/10.1016/j.jafr.2021.100109

[31] Bengio, Y., Grandvalet, Y. (2003). No unbiased estimator of the variance of k-fold cross-validation. Advances in Neural Information Processing Systems, 16.

[32] Marcot, B.G., Hanea, A.M. (2021). What is an optimal value of k in k-fold cross-validation in discrete Bayesian network analysis?. Computational Statistics, 36(3): 2009-2031. https://doi.org/10.1007/s00180-020-00999-9

[33] Koklu, M., Ozkan, I.A. (2020). Multiclass classification of dry beans using computer vision and machine learning techniques. Computers and Electronics in Agriculture, 174: 105507. https://doi.org/10.1016/j.compag.2020.105507

[34] Chen, H.R., He, C.C., Jiang, M.L., Liu, X.X. (2020). Egg crack detection based on support vector machine. In 2020 International Conference on Intelligent Computing and Human-Computer Interaction (ICHCI), Sanya, China, pp. 80-83. https://doi.org/10.1109/ICHCI51889.2020.00025