Kernel extended dictionary learning (KED) is a recent variant of sparse representation for classification (SRC), which represents an input face image as a linear combination of a dictionary set and an extended dictionary set to determine its class label. The extended dictionary is built from the differences between occluded images and non-occluded training images. Four criticisms can be made of KED: (1) KED assigns similar weights to the principal components of the occlusion variations, whereas these components should carry different weights, proportional to their eigenvalues. (2) An occluded image cannot be reconstructed by combining only non-occluded images and the principal components (directions) of the occlusion variations; the mean of the occlusion variations is also required. (3) The importance and capability of the main dictionary and the extended dictionary in reconstructing the input face image are not necessarily the same. (4) The runtime of KED is high. To address these problems, a novel mathematical model is proposed in this paper. In the proposed model, different weights are assigned to the principal components of the occlusion variations; different weights are assigned to the main dictionary and the extended dictionary; an occluded image is reconstructed from non-occluded images, the principal components of the occlusion variations, and the mean of the occlusion variations; and collaborative representation is used instead of sparse representation to reduce the runtime. Experimental results on CAS-PEAL subsets showed that the runtime and accuracy of the proposed model are about 1% better than those of KED.
Keywords: classification, sparse representation, kernel extended dictionary learning, occlusion
Automatic face recognition (AFR) is one of the most important research topics in machine vision, image processing, and pattern recognition due to its wide range of applications. AFR is a complicated and difficult process because of considerable intraclass variation, such as viewing angle, expression, lighting conditions, aging, and occlusion. Occlusion can be caused by sunglasses, facial hair, hats, etc.
One of the classification models that has achieved great success in AFR is the sparse representation for classification (SRC) model [1]. The main idea of this minimization model is that an input face image can be expressed as a sparse linear combination of the training images, and that the larger coefficients of this linear combination belong to images of the same class as the input face image; the class label of the input face image can therefore be determined. Compared with SRC, which assumes that the noise also has a sparse representation, sparse representation based on maximum correntropy is much less sensitive to outliers [2, 3]. The kernel sparse representation based classification (KSRC) algorithm [4, 5] first maps the data into a high-dimensional feature space, and SRC is then performed in this new feature space using the kernel trick. If the number of training data is low, KSRC may suffer from a lack of training samples when mapping data from the input space into the feature space. To address this problem, new training samples, termed a virtual dictionary, are generated from the original training set to be used in feature mapping [6]. The performance of SRC depends highly on the data distribution, and SRC cannot obtain satisfactory results on uncontrolled or imbalanced datasets. The sparse supervised representation classifier (SSRC) was proposed to solve these issues [7].
In extended SRC (ESRC), the intraclass variation of each face is first determined from the standard and non-standard training sets, and the input face image is then expressed as a sparse linear combination of the standard training images and the intraclass variations [8]. The accuracy of ESRC is better than that of basic SRC, but its runtime is longer. In the collaborative representation for classification (CRC) model, the L2-norm of the training data coefficients is used in the minimization model instead of the L1-norm [9]. Although this model cannot obtain a sparse representation of the input face image, its accuracy is not less than that of basic SRC and its runtime is better. Akhtar et al. [10] showed that fusing SRC and CRC can increase accuracy. In locally linear K-nearest neighbors (LLK) [11], higher priority is assigned to the nearest neighbors of the input face image for contributing to its linear representation; the accuracy of LLK is better than that of basic SRC. The kernel extended dictionary learning model (KED) [12] is a new type of ESRC. In this model, training data features are first extracted using local binary patterns (LBP) [13]. The preprocessed training data are then mapped into a high-dimensional feature space. Next, using kernel discriminant analysis (KDA) [14], the standard training images are mapped into a space with the least intraclass variation and interclass similarity. Then, the main directions of the occlusion variations are determined from the differences between the standard training set and the occluded training set in the high-dimensional feature space using kernel principal component analysis (KPCA) [15], and are transformed into the KDA space.
Finally, the input face image, whose features are extracted by LBP, is mapped into the KDA space and represented as a sparse linear combination of the standard training set in the KDA space (the basic dictionary) and the main directions of the occlusion variations determined by KPCA and transformed into the KDA space (the extended dictionary).
Four criticisms can be made of KED: (1) KED assigns similar weights to the principal components of the occlusion variations, whereas these components should carry different weights, proportional to their eigenvalues. (2) An occluded image cannot be reconstructed by combining only non-occluded images and the principal components (directions) of the occlusion variations; the mean of the occlusion variations is also required. (3) The importance and capability of the main dictionary and the extended dictionary in reconstructing the input face image are not necessarily the same. (4) The runtime of KED is high. To address these problems, a novel mathematical model is proposed in this paper. In the proposed model, different weights are assigned to the principal components of the occlusion variations; different weights are assigned to the main dictionary and the extended dictionary; an occluded image is reconstructed from non-occluded images, the principal components of the occlusion variations, and the mean of the occlusion variations; and collaborative representation is used instead of sparse representation to reduce the runtime. Experimental results on CAS-PEAL subsets showed that the runtime and accuracy of the proposed model are about 1% better than those of KED.
In the remainder of this paper, SRC, CRC, ESRC, and KED are described in Section 2; the proposed model is presented in Section 3; experimental results are reported in Section 4; and conclusions are drawn in Section 5.
2.1 SRC
Consider the training set $X=\left[x_{1}, x_{2}, \ldots, x_{n}\right]$, in which $x_{i} \in R^{d}$. The class label of $x_i$ is $c_{i} \in \{1, \ldots, C\}$. To determine the label of an input face image $y \in R^{d}$ based on the training data X, the input face image y is first represented as a sparse linear combination of the training set by the following mathematical model [16]:
$\min _{\beta}\left\|y-X \beta\right\|^{2}+\gamma\left\|\beta\right\|_{1}$ (1)
where γ≥0 is a user-determined parameter that controls the importance of the second term of model (1) relative to its first term. The first term of model (1) represents the input face image y as a linear combination of the training set, where β_{i} is the coefficient of the ith training datum in this combination. Since there are several representations of the input face image y, the second term of model (1) selects a sparse linear combination among them; in fact, $\gamma$ controls the sparseness of this combination. It is expected that, for a proper $\gamma$, the input face image y is represented as a linear combination of a small number of training data from the same class as y. Model (1) can be solved by different algorithms, such as FISTA [17]. Let δ_{l}(β) be a vector whose ith entry is equal to β_{i} if c_{i} is equal to l, and zero otherwise, where c_{i} is the class label of the ith face image. The class label of the input datum y is the l^{*} which minimizes the following residual:
$r_{l^{*}}(y)=\left\|y-X \delta_{l^{*}}(\beta)\right\|^{2}$ (2)
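As an illustrative sketch (not the authors' implementation), the decision rule of Eqs. (1)-(2) can be written in a few lines of Python. Here scikit-learn's Lasso solver stands in for FISTA, and the toy two-class data are invented for the example:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Toy data: d=20 features, two classes with 5 training images each (columns of X).
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(c, 0.1, size=(20, 5)) for c in (0.0, 1.0)])  # d x n
labels = np.array([0] * 5 + [1] * 5)
y = rng.normal(1.0, 0.1, size=20)  # test image drawn near class 1

# Model (1): sparse coefficients beta (Lasso here is a stand-in for FISTA).
beta = Lasso(alpha=0.01, fit_intercept=False, max_iter=10000).fit(X, y).coef_

# Eq. (2): class-wise residuals; predict the class with the smallest residual.
residuals = []
for l in (0, 1):
    delta = np.where(labels == l, beta, 0.0)   # keep only class-l coefficients
    residuals.append(np.linalg.norm(y - X @ delta) ** 2)
print(int(np.argmin(residuals)))  # prints 1
```

Only the class-1 columns can represent y, so the class-1 residual is the smallest.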
2.2 CRC
The SRC model is not differentiable; therefore, fast gradient-based algorithms cannot be used to solve it. To address this problem, the CRC model was proposed:
$\min _{\beta} F=\left\|y-X \beta\right\|^{2}+\gamma\left\|\beta\right\|^{2}$ (3)
The objective function of CRC model, unlike the SRC model, is differentiable. Therefore, to solve the model (3), it is enough to solve the following equation:
$\frac{\partial F}{\partial \beta}=0$ (4)
We have
$0=\frac{\partial F}{\partial \beta}=-2 X^{T}(y-X \beta)+2 \gamma \beta \;\Rightarrow\; \beta=Z y$ (5)
where,
$Z=\left(X^{T} X+\gamma I\right)^{-1} X^{T}$ (6)
Eq. (6) is calculated only once, in the training phase, and the coefficient vector β is then obtained simply by multiplying the matrix Z by the input datum y, whereas in the SRC model, β must be computed for each input face image y by solving the SRC model with the FISTA algorithm. Therefore, CRC is faster than SRC.
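A minimal sketch of the CRC closed form of Eqs. (5)-(6), on the same kind of invented toy data as before, shows how Z is precomputed once and reused per test image:

```python
import numpy as np

rng = np.random.default_rng(0)
d, gamma = 20, 0.1
X = np.column_stack([rng.normal(c, 0.1, size=(d, 5)) for c in (0.0, 1.0)])  # d x n
labels = np.array([0] * 5 + [1] * 5)

# Eq. (6): Z is computed once in the training phase.
Z = np.linalg.inv(X.T @ X + gamma * np.eye(X.shape[1])) @ X.T

# Eq. (5): at test time, beta is a single matrix-vector product.
y = rng.normal(1.0, 0.1, size=d)
beta = Z @ y

# Same class-wise residual rule as SRC.
residuals = [np.linalg.norm(y - X @ np.where(labels == l, beta, 0.0)) ** 2
             for l in (0, 1)]
print(int(np.argmin(residuals)))  # prints 1
```

The per-test cost is one matrix-vector product, which is why CRC is faster than iterative l1 solvers.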
2.3 ESRC
Neither SRC nor CRC is robust to non-standard face images. To solve this problem, ESRC was proposed. Suppose the training face images include two types: standard and non-standard. Standard images are obtained under standard conditions, for example, standard lighting, standard viewing angle, neutral expression, and no glasses. Let $\widetilde{X}=\left[\tilde{x}_{1}, \tilde{x}_{2}, \ldots, \tilde{x}_{\tilde{n}}\right]$ be the non-standard face images and $X=\left[x_{1}, x_{2}, \ldots, x_{n}\right]$ the standard ones. Also, let $\tilde{x}_{i}$ be the non-standard face image of the ith individual and $x_{idx(i)}$ be his or her standard face image. The intraclass variation bases are then:
$\tilde{E}=\left[\tilde{x}_{1}-x_{i d x(1)}, \tilde{x}_{2}-x_{i d x(2)}, \ldots, \tilde{x}_{\tilde{n}}-x_{i d x(\tilde{n})}\right]$
where $\tilde{n}$ is the number of non-standard images. In ESRC, the input face image y is represented as a sparse linear combination of the standard images and the intraclass variation bases using the following model:
$\min _{\beta, \tilde{\beta}}\left\|y-[X, \tilde{E}]\left[\begin{array}{c}\beta \\ \tilde{\beta}\end{array}\right]\right\|^{2}+\gamma\left\|\left[\begin{array}{c}\beta \\ \tilde{\beta}\end{array}\right]\right\|_{1}$ (7)
Finally, the label of the input face image y is equal to the class label $l^{*}$ which minimizes the following residual:
$r_{l^{*}}(y)=\left\|y-[X, \tilde{E}]\left[\begin{array}{c}\delta_{l^{*}}(\beta) \\ \tilde{\beta}\end{array}\right]\right\|^{2}$ (8)
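The ESRC pipeline of Eqs. (7)-(8) can be sketched as follows; the data (a shared additive "occlusion" pattern added to two classes) are invented for illustration, and Lasso again stands in for an l1 solver:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Toy setup: one standard image per class plus non-standard images for two classes.
rng = np.random.default_rng(1)
d = 30
X = rng.normal(size=(d, 4))                    # standard images, one per class
occl = rng.normal(2.0, 0.1, size=d)            # shared "occlusion" pattern
X_ns = X[:, :2] + occl[:, None]                # non-standard images of classes 0, 1
E = X_ns - X[:, :2]                            # intraclass variation bases (E tilde)

# Test image: class-3 standard image plus the same occlusion pattern.
y = X[:, 3] + occl

# Model (7): sparse representation over the concatenated dictionary [X, E].
A = np.hstack([X, E])
coef = Lasso(alpha=0.01, fit_intercept=False, max_iter=10000).fit(A, y).coef_
beta, beta_tilde = coef[:4], coef[4:]

# Eq. (8): residuals use class-selected beta but keep all of beta_tilde.
residuals = [np.linalg.norm(y - A @ np.concatenate(
    [np.where(np.arange(4) == l, beta, 0.0), beta_tilde])) ** 2 for l in range(4)]
print(int(np.argmin(residuals)))  # prints 3
```

The variation bases absorb the occlusion, so the class-3 identity coefficient dominates even though class 3 has no non-standard training image.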
2.4 Kernel extended dictionary (KED)
ESRC has better generalization ability than SRC, but its runtime is heavily influenced by the number of intraclass variation bases in $\widetilde{E}$. One of the goals of KED is to address this problem.
2.4.1 Learning basic dictionary
In KED, all training data except the occluded face images are first transformed into a (C-1)-dimensional space with the least intraclass variation and interclass similarity using KDA. Let $\widetilde{V}=\left[\tilde{v}_{1}, \tilde{v}_{2}, \ldots, \tilde{v}_{C-1}\right]$ be the C-1 axes of the (C-1)-dimensional KDA space, where
$\tilde{v}_{i}=\sum_{j=1}^{n} \tilde{a}_{i j} \varphi\left(x_{j}\right)$ (9)
where φ is a mapping function which maps data from the input space into a high-dimensional feature space, and $\tilde{a}_{i}=\left[\tilde{a}_{i 1}, \tilde{a}_{i 2}, \ldots, \tilde{a}_{i n}\right]$ is a vector obtained using KDA. Then, the input face image y and the standard training images in the KDA space (the basic dictionary), denoted by $y_{KDA}$ and $\widetilde{D}$, respectively, are calculated as follows:
$y_{K D A}=\tilde{V}^{T} \varphi(y)=\left[\tilde{a}_{1}^{T} K(y), \tilde{a}_{2}^{T} K(y), \ldots, \tilde{a}_{C-1}^{T} K(y)\right]^{T}$ (10)
$\widetilde{D}=\tilde{V}^{T} \varphi(X)=\left[\tilde{V}^{T} \varphi\left(x_{1}\right), \tilde{V}^{T} \varphi\left(x_{2}\right), \ldots, \tilde{V}^{T} \varphi\left(x_{n}\right)\right]$ (11)
where,
$K(y)=\left[k\left(x_{1}, y\right), k\left(x_{2}, y\right), \ldots, k\left(x_{n}, y\right)\right]^{T}$ (12)
and $k(\cdot, \cdot)$ is the Gaussian kernel function.
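The kernel projection of Eqs. (10) and (12) amounts to evaluating a kernel vector and multiplying by the KDA coefficient vectors. A small sketch follows; the coefficient matrix A is a random placeholder here (in KED it would come from solving the KDA eigenproblem), and all dimensions are invented:

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma ** 2))

# Hypothetical data: n=6 standard training images, C-1=3 KDA axes.
rng = np.random.default_rng(0)
n, d, m = 6, 10, 3
X = rng.normal(size=(d, n))     # standard training images (columns)
A = rng.normal(size=(m, n))     # rows stand in for the KDA vectors a_i (placeholder)
y = rng.normal(size=d)          # test image (after LBP preprocessing)

# Eq. (12): kernel vector between the training images and y.
K_y = np.array([gaussian_kernel(X[:, j], y) for j in range(n)])

# Eq. (10): projection of phi(y) onto the KDA axes, y_KDA = [a_i^T K(y)]_i.
y_KDA = A @ K_y
print(y_KDA.shape)  # prints (3,)
```

No explicit feature map is ever formed; only kernel evaluations against the n training images are needed.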
2.4.2 Learning the model of occluded face images or extended dictionary
The occluded face images were not used to learn the basic dictionary with KDA because an occlusion causes a broad change in a face image, so an occluded face image acts as an outlier in KDA. When there are outliers in the training set, KDA cannot find a proper space with the least intraclass variation and interclass similarity.
Let $\dddot{X}=\left[\dddot{x}_{1}, \dddot{x}_{2}, \ldots, \dddot{x}_{\dddot{n}}\right]$ be the occluded face images. The occlusion variation in the high-dimensional feature space is as follows:
$\dddot{E}=\left[\varphi\left(\dddot{x}_{1}\right)-\varphi\left(x_{i d x(1)}\right), \varphi\left(\dddot{x}_{2}\right)-\varphi\left(x_{i d x(2)}\right), \ldots, \varphi\left(\dddot{x}_{\dddot{n}}\right)-\varphi\left(x_{i d x(\dddot{n})}\right)\right]$ (13)
where $x_{idx(i)}$ is the standard image of the ith individual. The p principal components of the occlusion variation, denoted by $\dddot{V}=\left[\dddot{v}_{1}, \dddot{v}_{2}, \ldots, \dddot{v}_{p}\right]$, can be obtained using the KPCA model:
$\max _{\dddot{V}} \dddot{V}^{T} S \dddot{V} \quad \text{subject to} \quad \dddot{V}^{T} \dddot{V}=I$ (14)
where, I is identity matrix, and
$S=\sum_{i=1}^{\dddot{n}} \varphi_{i} \varphi_{i}^{T}$ (15)
where $\varphi_{i}=\varphi\left(\dddot{x}_{i}\right)-\varphi\left(x_{idx(i)}\right)$. The constraint of model (14) enforces the principal components of the occlusion variation to have unit length. To solve model (14), the following eigenvector problem must be solved:
$S \dddot{V}=\Lambda \dddot{V}$ (16)
where Λ is a diagonal matrix whose main diagonal elements are the eigenvalues corresponding to the eigenvector matrix $\dddot{V}$, and $\dddot{V}$ contains the p most important eigenvectors of the matrix S.
We have:
$\dddot{v}_{i}=\sum_{j=1}^{\dddot{n}} \dddot{a}_{i j} \varphi_{j}$ (17)
where, $\dddot{a}_{i j} \in R$. Therefore, the problem (16) can be written as follows:
$\dddot{K} \dddot{a}=\dddot{\Lambda} \dddot{a}$ (18)
in which,
$\dddot{K}_{i j}=\varphi_{i}^{T} \varphi_{j}=\left(\varphi\left(\dddot{x}_{i}\right)-\varphi\left(x_{i d x(i)}\right)\right)^{T}\left(\varphi\left(\dddot{x}_{j}\right)-\varphi\left(x_{i d x(j)}\right)\right)=k\left(\dddot{x}_{i}, \dddot{x}_{j}\right)+k\left(x_{i d x(i)}, x_{i d x(j)}\right)-k\left(\dddot{x}_{i}, x_{i d x(j)}\right)-k\left(x_{i d x(i)}, \dddot{x}_{j}\right)$ (19)
and $\dddot{\Lambda}$ is the diagonal matrix whose main diagonal elements are the eigenvalues corresponding to the eigenvector matrix $\dddot{a}$. After solving problem (18) and obtaining the optimal $\dddot{a}$, the p principal components of the occlusion variation $\dddot{E}$ are determined using Eq. (17). Then, the principal components of the occlusion variation in the KDA space (the extended dictionary), denoted by $\dddot{D}$, are obtained by projecting $\dddot{V}$ onto the KDA space $\widetilde{V}$ as follows:
$\dddot{D}=\tilde{V}^{T} \dddot{V}$ (20)
where,
$\dddot{D}_{i l}=\tilde{v}_{i}^{T} \dddot{v}_{l}=\left(\sum_{j=1}^{n} \tilde{a}_{i j} \varphi\left(x_{j}\right)\right)^{T}\left(\sum_{t=1}^{\dddot{n}} \dddot{a}_{l t} \varphi_{t}\right)=\left(\sum_{j=1}^{n} \tilde{a}_{i j} \varphi\left(x_{j}\right)\right)^{T}\left(\sum_{t=1}^{\dddot{n}} \dddot{a}_{l t}\left(\varphi\left(\dddot{x}_{t}\right)-\varphi\left(x_{i d x(t)}\right)\right)\right)=\tilde{a}_{i}^{T} \widetilde{K} \dddot{a}_{l}$ (21)
$\tilde{K}_{i j}=k\left(x_{i}, \dddot{x}_{j}\right)-k\left(x_{i}, x_{i d x(j)}\right)$ (22)
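A sketch of the kernelized occlusion-variation step of Eqs. (18)-(19): build the Gram matrix of the differences $\varphi_{i}$ purely from kernel evaluations, then eigendecompose and keep the p leading pairs. All data and dimensions are invented placeholders:

```python
import numpy as np

def k(a, b, sigma=1.0):
    """Gaussian kernel k(a, b)."""
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma ** 2))

# Hypothetical data: n_occ occluded images and their standard counterparts.
rng = np.random.default_rng(0)
d, n_occ, p = 10, 5, 2
X_occ = rng.normal(size=(d, n_occ))   # occluded images
X_std = rng.normal(size=(d, n_occ))   # matching standard images x_idx(i)

# Eq. (19): K_ij = phi_i^T phi_j expanded with the kernel trick.
K = np.empty((n_occ, n_occ))
for i in range(n_occ):
    for j in range(n_occ):
        K[i, j] = (k(X_occ[:, i], X_occ[:, j]) + k(X_std[:, i], X_std[:, j])
                   - k(X_occ[:, i], X_std[:, j]) - k(X_std[:, i], X_occ[:, j]))

# Eq. (18): eigen-decomposition of K; keep the p leading eigenpairs.
eigvals, eigvecs = np.linalg.eigh(K)
order = np.argsort(eigvals)[::-1][:p]
Lambda, A_occ = eigvals[order], eigvecs[:, order]   # a_l coefficient vectors
print(Lambda.shape, A_occ.shape)  # prints (2,) (5, 2)
```

K is a Gram matrix of feature-space differences, hence symmetric positive semidefinite, so the retained eigenvalues are non-negative.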
Finally, the input face image y in the KDA space denoted by $\mathrm{y}_{\mathrm{KDA}}$ is represented as a linear combination of the basic dictionary $\widetilde{D}$ and the extended dictionary $\dddot{D}$ using the following model:
$\min _{\tilde{\beta}, \dddot{\beta}}\left\|y_{K D A}-[\widetilde{D}, \dddot{D}]\left[\begin{array}{c}\tilde{\beta} \\ \dddot{\beta}\end{array}\right]\right\|^{2}+\gamma\left\|\left[\begin{array}{c}\tilde{\beta} \\ \dddot{\beta}\end{array}\right]\right\|_{1}$ (23)
Eventually, the class label of the input face image y is the $l^{*}$ which minimizes the following residual:
$r_{l^{*}}(y)=\left\|y_{K D A}-[\widetilde{D}, \dddot{D}]\left[\begin{array}{c}\delta_{l^{*}}(\tilde{\beta}) \\ \dddot{\beta}\end{array}\right]\right\|^{2}$ (24)
There are four criticisms to make about KED: (1) KED assigns similar weights to the principal components of the occlusion variations, whereas these components should carry different weights, proportional to their eigenvalues. (2) An occluded image cannot be reconstructed by combining only non-occluded images and the principal components (directions) of the occlusion variations; the mean of the occlusion variations is also required (Figure 1). (3) The importance and capability of the main dictionary and the extended dictionary in reconstructing the input face image are not necessarily the same. (4) The runtime of KED is high.
Figure 1. An occlusion cannot be reconstructed using only the most important principal component of the occlusion variation. Each point represents an occlusion, and each line is a principal component of the occlusion variation
To address these four problems or challenges, we propose the following model:
$\min _{\overline{\bar{\beta}}, \dddot{\beta}} F=\left\|y_{K D A}-[\overline{\bar{D}}, \theta \dddot{D} \dddot{\Lambda}]\left[\begin{array}{c}\overline{\bar{\beta}} \\ \dddot{\beta}\end{array}\right]\right\|_{2}^{2}+\gamma\left\|\left[\begin{array}{c}\overline{\bar{\beta}} \\ \dddot{\beta}\end{array}\right]\right\|_{2}^{2}$ (25)
where $\overline{\bar{D}}$ is the basic dictionary $\widetilde{D}$ augmented with the mean of the occlusion variations in the KDA space, denoted by $\mu_{KDA}$, i.e.
$\overline{\bar{D}}=\left[\widetilde{D}, \mu_{K D A}\right]$ (26)
$\mu_{K D A}=\widetilde{V}^{T}\left(\frac{1}{\dddot{n}} \sum_{j=1}^{\dddot{n}} \varphi_{j}\right)=\frac{1}{\dddot{n}} \widetilde{V}^{T} \sum_{j=1}^{\dddot{n}}\left(\varphi\left(\dddot{x}_{j}\right)-\varphi\left(x_{i d x(j)}\right)\right)=\frac{1}{\dddot{n}}\left[\tilde{a}_{1}^{T} \sum_{j=1}^{\dddot{n}}\left(K\left(\dddot{x}_{j}\right)-K\left(x_{i d x(j)}\right)\right), \ldots, \tilde{a}_{C-1}^{T} \sum_{j=1}^{\dddot{n}}\left(K\left(\dddot{x}_{j}\right)-K\left(x_{i d x(j)}\right)\right)\right]^{T}$ (27)
In our proposed model, the eigenvalue matrix of the occlusion variation $\dddot{\Lambda}$ is multiplied by the principal components of the occlusion variation $\dddot{D}$ to give greater weight to the principal components with larger eigenvalues, which addresses the first criticism; principal components with larger eigenvalues represent the most important directions of the occlusion variation. We include the mean of the occlusion variations in the KDA space, $\mu_{KDA}$, to address the second criticism. The hyperparameter θ controls the importance of the extended dictionary relative to the basic dictionary, addressing the third criticism. Finally, we use the l2-norm of the coefficients instead of the l1-norm of the KED model to address the fourth criticism; our proposed model is therefore differentiable and admits a fast closed-form solution. To solve our proposed model, i.e. model (25), we simply solve the following equation:
$\frac{\partial F}{\partial\left[\begin{array}{c}\overline{\bar{\beta}} \\ \dddot{\beta}\end{array}\right]}=0$ (28)
We have:
$0=\frac{\partial F}{\partial\left[\begin{array}{c}\overline{\bar{\beta}} \\ \dddot{\beta}\end{array}\right]}=-2[\overline{\bar{D}}, \theta \dddot{D} \dddot{\Lambda}]^{T}\left(y_{K D A}-[\overline{\bar{D}}, \theta \dddot{D} \dddot{\Lambda}]\left[\begin{array}{c}\overline{\bar{\beta}} \\ \dddot{\beta}\end{array}\right]\right)+2 \gamma\left[\begin{array}{c}\overline{\bar{\beta}} \\ \dddot{\beta}\end{array}\right] \;\Rightarrow\;\left[\begin{array}{c}\overline{\bar{\beta}} \\ \dddot{\beta}\end{array}\right]=\tilde{Z}\, y_{K D A}$ (29)
where,
$\tilde{Z}=\left([\overline{\bar{D}}, \theta \dddot{D} \dddot{\Lambda}]^{T}[\overline{\bar{D}}, \theta \dddot{D} \dddot{\Lambda}]+\gamma I\right)^{-1}[\overline{\bar{D}}, \theta \dddot{D} \dddot{\Lambda}]^{T}$ (30)
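The closed form of Eqs. (29)-(30) can be sketched directly; every matrix below is a random placeholder with invented dimensions (in the actual model they come from Eqs. (11), (20), (18), and (27)):

```python
import numpy as np

# Hypothetical dimensions: m = C-1 KDA dims, n standard images, p occlusion PCs.
rng = np.random.default_rng(0)
m, n, p = 5, 8, 2
gamma, theta = 0.1, 1.0

D_tilde = rng.normal(size=(m, n))      # basic dictionary in the KDA space
mu_KDA = rng.normal(size=(m, 1))       # mean of occlusion variations, Eq. (27)
D_occ = rng.normal(size=(m, p))        # extended dictionary, Eq. (20)
Lam = np.diag([2.0, 0.5])              # occlusion eigenvalues, Eq. (18)

# Model (25) dictionary: [D_bar, theta * D_occ * Lambda], with D_bar = [D_tilde, mu].
D_bar = np.hstack([D_tilde, mu_KDA])
D_all = np.hstack([D_bar, theta * D_occ @ Lam])

# Eq. (30): Z_tilde is computed once at training time.
Z = np.linalg.inv(D_all.T @ D_all + gamma * np.eye(D_all.shape[1])) @ D_all.T

# Eq. (29): at test time the coefficients are one matrix-vector product.
y_KDA = rng.normal(size=m)
coef = Z @ y_KDA
print(coef.shape)  # prints (11,)
```

As in CRC, the eigenvalue weighting, the mean column, and θ are folded into the dictionary once, so the per-test cost stays a single matrix-vector product.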
The training phase of our proposed model is summarized in Algorithm 1. The class label of a test face image is determined using Algorithm 2.
Algorithm 1. Training phase of our proposed model.
Input:
Occluded face images,
Standard face images.
Begin
1. Extract the LBP features of all training face images.
2. Learn the KDA axes $\widetilde{V}$ from the non-occluded training images (Eq. (9)).
3. Compute the basic dictionary $\widetilde{D}$ (Eq. (11)).
4. Solve the KPCA eigenproblem (Eqs. (18)-(19)) and compute the extended dictionary $\dddot{D}$ (Eqs. (20)-(22)).
5. Compute $\mu_{KDA}$ (Eq. (27)) and form $\overline{\bar{D}}=[\widetilde{D}, \mu_{KDA}]$ (Eq. (26)).
6. Compute $\tilde{Z}$ (Eq. (30)).
End.
Algorithm 2. Test phase of our proposed model.
Input:
Test face image.
Begin
1. Extract the LBP features of the test face image y and map it into the KDA space to obtain $y_{KDA}$ (Eq. (10)).
2. Compute the coefficient vector $\left[\begin{array}{c}\overline{\bar{\beta}} \\ \dddot{\beta}\end{array}\right]=\tilde{Z}\, y_{KDA}$ (Eq. (29)).
3. Assign y the class label $l^{*}$ which minimizes the following residual:
$r_{l^{*}}(y)=\left\|y_{KDA}-[\overline{\bar{D}}, \theta \dddot{D} \dddot{\Lambda}]\left[\begin{array}{c}\overline{\bar{\beta}}(1) \\ \delta_{l^{*}}(\overline{\bar{\beta}}(2: \mathrm{end})) \\ \dddot{\beta}\end{array}\right]\right\|^{2}$ (31)
where $\delta_{l}(\beta)$ is a vector whose ith entry equals $\beta_{i}$ if $c_{i}$ equals l, and zero otherwise; $c_{i}$ is the class label of the ith face image.
End.
In this section, experiments are performed on a Chinese benchmark dataset, CAS-PEAL, to study the performance of our proposed model. CAS-PEAL contains 99,594 face images of size 120×100. The 9,031 available frontal face images of this dataset are used in our experiments. These face images belong to 1,040 individuals and include 1,040 standard face images. Example images of this dataset are shown in Figure 2. The experiments were run on a computer with a Core i7 2636 CPU and 6 GB of main memory. The best values of the hyperparameters $\gamma$ and $\theta$ were selected from the sets $\{0.001, 0.01, 0.1, 1, 2, 3, 4, 5, 10, 100\}$ and $\{0.001, 0.01, 0.1, 1, 10, 10^{2}, 10^{3}, 10^{4}, 10^{5}\}$, respectively.
Figure 2. Example face images from the CAS-PEAL dataset
CAS-PEAL was divided into a training set and a test set. The training set was randomly selected from 800 face images of 200 individuals under non-standard lighting conditions (lighting subset), 400 face images of 100 individuals with non-standard facial expressions (expression subset), and 80 occluded face images of 20 individuals with accessories such as glasses (accessory subset). The standard face images of the randomly selected non-standard images were then also added to the training set. The test face images were selected from the expression subset (Expression), the lighting subset (Lighting), the accessory subset (Accessory), face images with non-standard backgrounds (Background), images of individuals taken at different ages (Time), images of individuals at non-standard distances from the camera (Distance), and images of individuals with hats (Hat).
Figure 3 shows the sensitivity of the overall accuracy of our proposed method to the parameter γ. As can be seen, the accuracy of our proposed model is best for γ = 100. According to Table 1, the overall accuracy of KED for the best value of its hyperparameter, γ = 0.01, is 93.47%, while the overall accuracy of our proposed method for the best values of its hyperparameters, γ = 100 and θ = 10^{5}, is 94.41%, which is better than that of KED. In addition, our proposed model is faster than KED.
Figure 3. Sensitivity of the overall accuracy of our proposed model to the parameter Gamma (γ)
Table 1. Overall accuracy and speed of our proposed model and KED for the whole test set
Method | Running time (sec.) | Overall accuracy (%)
KED | 967 | 93.47
Proposed | 888 | 94.41
Figure 4. Sensitivity of the accuracy of our proposed model to the parameter Gamma (γ) for “Accessory” subset
Figure 5. Sensitivity of the accuracy of our proposed model to the parameter Gamma (γ) for “Lighting” subset
Figure 6. Sensitivity of the accuracy of our proposed model to the parameter Gamma (γ) for “Expression” subset
Figure 7. Sensitivity of the accuracy of our proposed model to the parameter Gamma (γ) for “Sunglass” subset
Figure 8. Sensitivity of the accuracy of our proposed model to the parameter Gamma (γ) for “Hat” subset
Table 2. Accuracy of our proposed model, SRC, ESRC, KDA+SRC, KDA+ESRC, LED, and KED for different test data subsets
Method | Expression | Lighting | Hat | Accessory
SRC | 98.21 | 17.31 | 51.17 | 72.87
ESRC | 99.70 | 82.08 | 75.66 | 87.05
KDA+SRC | 99.69 | 82.73 | 65.85 | 80.81
KDA+ESRC | 99.69 | 83.03 | 65.93 | 80.85
LED | 99.12 | 66.83 | 72.26 | 84.78
KED | 99.80 | 84.55 | 86.62 | 92.94
Proposed | 99.80 | 86.06 | 89.66 | 94.44
Figures 4-8 show the sensitivity of the accuracy of our proposed method to the parameter γ for different subsets of the dataset. As can be seen, the accuracy of our proposed model is not very sensitive to γ, especially for some test subsets such as "Expression". Table 2 compares the accuracy of our proposed model with SRC, ESRC, KDA+SRC, KDA+ESRC, LED, and KED for different test data subsets. Table 3 compares the best obtained accuracy of the proposed method with the 1NN, SSEC, RRC, SLFRKR, MOST, and KED methods. As can be seen, the accuracy of the proposed method for every type of test image, except Distance, is better than or equal to that of the other methods.
Table 3. Accuracy of our proposed model, 1NN, SSEC, RRC, SLFRKR, MOST, KED for different test data subsets
Method | Distance | Background | Time | Expression | Lighting | Accessory
1NN | 76.91 | 39.69 | 38.79 | 80.31 | 2.874 | 38.75
SSEC | 84.23 | 66.83 | 51.94 | 74.51 | 17.39 | 66.64
RRC | 97.90 | 95.64 | 96.67 | 93.98 | 29.32 | 84.19
SLFRKR | 99.69 | 99.88 | 98.48 | 99.64 | 28.81 | 90.88
MOST | 99.75 | 99.01 | 97.88 | 98.15 | 82.39 | 80.35
KED | 100 | 100 | 98.48 | 99.80 | 84.55 | 92.94
Proposed | 99.89 | 100 | 100 | 99.80 | 86.06 | 94.44
The kernel extended dictionary learning model (KED) is a recent variant of SRC which represents a test datum as a linear combination of a dictionary set and an extended dictionary set. Four criticisms were made of KED. To address these four problems, a novel model was proposed in this paper. Experimental results on a real dataset showed that the accuracy of the proposed model is better than that of KED, SRC, ESRC, KDA+SRC, KDA+ESRC, LED, 1NN, SSEC, RRC, SLFRKR, and MOST, and the runtime of the proposed model is better than that of KED.
X | Standard face images processed using LBP
$\dddot{X}$ | Occluded face images processed using LBP
y | Test face image processed using LBP
$y_{KDA}$ | Processed test face image in the KDA space
$\mu_{KDA}$ | Mean of the occlusion variations in the KDA space
$\dddot{D}$ | Principal components of the occlusion variation (extended dictionary)
$\dddot{\Lambda}$ | Eigenvalues of the occlusion variation
$\widetilde{D}$ | Basic dictionary
$\gamma, \theta$ | Hyperparameters of our proposed model
$k(\cdot, \cdot)$ | Gaussian kernel function
[1] Wright, J., Yang, A.Y., Ganesh, A., Sastry, S., Ma, Y. (2009). Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2): 210-227. https://doi.org/10.1109/TPAMI.2008.79
[2] He, R., Zheng, W.S., Hu, B.G. (2011). Maximum correntropy criterion for robust face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8): 1561-1576. https://doi.org/10.1109/TPAMI.2010.220
[3] He, R., Zheng, W.S., Tan, T.N., Sun, Z.N. (2014). Half-quadratic-based iterative minimization for robust sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(2): 261-275. https://doi.org/10.1109/TPAMI.2013.102
[4] Yin, J., Liu, Z.H., Jin, Z., Yang, W.K. (2012). Kernel sparse representation based classification. Neurocomputing, 77(1): 120-128. https://doi.org/10.1016/j.neucom.2011.08.018
[5] Fan, Z., Wei, C. (2020). Fast kernel sparse representation based classification for undersampling problem in face recognition. Multimedia Tools and Applications, 79(11): 7319-7337. https://doi.org/10.1007/s11042-019-08211-x
[6] Fan, Z.Z., Zhang, D., Wang, X., Zhu, Q., Wang, Y.F. (2018). Virtual dictionary based kernel sparse representation for face recognition. Pattern Recognition, 76: 1-13. https://doi.org/10.1016/j.patcog.2017.10.001
[7] Shu, T., Zhang, B., Tang, Y.Y. (2020). Sparse supervised representation-based classifier for uncontrolled and imbalanced classification. IEEE Transactions on Neural Networks and Learning Systems, 31(8): 2847-2856. https://doi.org/10.1109/TNNLS.2018.2884444
[8] Deng, W., Hu, J., Guo, J. (2012). Extended SRC: Undersampled face recognition via intraclass variant dictionary. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(9): 1864-1870. https://doi.org/10.1109/TPAMI.2012.30
[9] Dong, X., Zhang, H.X., Zhu, L., Wan, W.B., Wang, Z.H., Wang, Q., Guo, P.L., Ji, H., Sun, J. (2019). Collaborative representation for face recognition based on bilateral filtering. International Journal of Computer Science, 58: 187-194.
[10] Akhtar, N., Shafait, F., Mian, A. (2017). Efficient classification with sparsity augmented collaborative representation. Pattern Recognition, 65: 136-145. https://doi.org/10.1016/j.patcog.2016.12.017
[11] Liu, Q., Liu, C. (2017). A novel locally linear KNN method with applications to visual recognition. IEEE Transactions on Neural Networks and Learning Systems, 28(9): 2010-2021. https://doi.org/10.1109/TNNLS.2016.2572204
[12] Huang, K.K., Dai, D.Q., Ren, C.X., Zhao, R.L. (2017). Learning kernel extended dictionary for face recognition. IEEE Transactions on Neural Networks and Learning Systems, 28(5): 1082-1094. https://doi.org/10.1109/TNNLS.2016.2522431
[13] Liu, W., Wang, Y., Li, S. (2011). LBP feature extraction for facial expression recognition. Journal of Information & Computational Science, 8(3): 10.
[14] Mika, S. (1999). Fisher discriminant analysis with kernels. IEEE Conference on Neural Networks for Signal Processing IX, Madison, WI, USA, pp. 41-48. https://doi.org/10.1109/NNSP.1999.788121
[15] Wold, S., Esbensen, K., Geladi, P. (1987). Principal component analysis. Chemometrics and Intelligent Laboratory Systems, 2(1-3): 37-52. https://doi.org/10.1016/0169-7439(87)80084-9
[16] Zhang, L., Yang, M., Feng, X. (2011). Sparse representation or collaborative representation: Which helps face recognition? IEEE International Conference on Computer Vision, Barcelona, Spain, pp. 471-478. https://doi.org/10.1109/ICCV.2011.6126277
[17] Yang, A., Ganesh, A., Sastry, S., Ma, Y. (2010). Fast ℓ1-minimization algorithms and an application in robust face recognition. EECS Department, University of California, Berkeley.