Content-Based Image Retrieval System Based on Fusion of Wavelet Transform, Texture and Shape Features

Dannina Kishore* Chanamallu Srinivasa Rao

Dept. of ECE, Aditya College of Engineering and Technology, Surampalem 533437, A.P, India

Dept. of ECE, University College of Engineering Vizianagaram, Jawaharlal Nehru Technological University Kakinada, Vizianagaram 535003, India

Corresponding Author Email: 
kishore_dannina@acet.ac.in
Pages: 110-116 | DOI: https://doi.org/10.18280/mmep.080114

Received: 4 February 2020 | Revised: 11 December 2020 | Accepted: 23 December 2020 | Available online: 28 February 2021

© 2021 IIETA. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).

Abstract: 

In the last few years, Content-Based Image Retrieval (CBIR) has received wide attention. Compared to text-based image retrieval, CBIR exploits the richer information carried by the image content itself, enabling more efficient retrieval. A single feature cannot represent all images well and yields lower performance. In this paper, we propose an image retrieval method based on multi-feature fusion. The concept of multi-resolution is exploited with the help of the wavelet transform. The method combines the Local Binary Pattern (LBP) with Fast and Accurate Exponent Fourier Moments (FAEFMs) on the wavelet decomposition of an image at multiple resolutions. To extract the texture feature, the LBP codes of the Discrete Wavelet Transform (DWT) coefficients of the image are estimated; FAEFMs are then computed from these LBP codes to extract shape features and construct the required feature vector. These feature vectors allow visually similar images to be located and retrieved accurately from existing databases. The benchmark databases Corel-1k and Olivia-2688 are used to test the proposed method. The proposed method achieves 99.99% precision and 93.15% recall on the Corel-1k database and 99.99% precision and 93.63% recall on the Olivia-2688 database, which are higher than existing methods.

Keywords: 

CBIR, fast and accurate exponent Fourier moments, local binary pattern (LBP), discrete wavelet transform (DWT)

1. Introduction

With advances in multimedia and the internet, large quantities of multimedia information in the form of images, video, and audio are used in fields such as satellite data, medical treatment, video still images, and surveillance and security systems. This has created a massive demand for systems that can store and retrieve such data effectively, and many multimedia storage and retrieval systems have been developed to satisfy it. The most common retrieval systems are Text-Based Image Retrieval (TBIR) systems and Content-Based Image Retrieval (CBIR) systems [1]. A conventional TBIR system scans the database for text data in and around the image that matches the query string. However, text-based retrieval suffers from disadvantages: textual annotations cannot always express the full visual content of an image, and annotation is extremely time-consuming. These drawbacks can be overcome by CBIR systems. A CBIR system uses the image's visual content, presented as low-level features such as color, texture, shape, and spatial locations, to represent the images in the database. Given a query image, the system retrieves visually similar images. This approach avoids the need for textual descriptions of images and is closer to human perception of visual data. In CBIR, low-level image features such as color, texture, shape, and spatial locations are presented as an n-dimensional feature vector; the vectors of all database images form the feature database. Image features in CBIR fall into two categories: local and global. Color and texture are local features. Color similarity is obtained by computing a color histogram (CH) [2] for every image, which describes the proportion of pixels within the image holding specific values. Texture measures [3] look for visual patterns in images and how they are spatially defined. 'Shape', a global feature, has also been used for retrieval in different ways.

An image is a complex structure containing varying levels of detail, and it is not possible to extract all of its features at a single resolution. Multi-resolution analysis overcomes this problem: features that go unnoticed at one resolution can be detected at another. This research work explores the fusion of shape and texture features [4] at more than one resolution of the image, in order to obtain an effective feature vector that combines the advantages of multiple features. This fusion process is applied at each scale of the image, and wavelet multi-resolution analysis is also explored in this work. The Local Binary Pattern (LBP) and Fast and Accurate Exponent Fourier Moments (FAEFMs) extract the texture and shape information respectively. The LBP codes of the wavelet coefficients of the grayscale image are computed, and the FAEFMs of these LBP codes then serve as the feature vector for retrieving visually similar images. This technique yields a feature vector capturing features that could not be obtained at a single resolution and, at multiple resolutions, extracts shape features from the texture feature of the image.

2. Related Work

Early Content-Based Image Retrieval systems used a single feature such as color, texture, or shape for the retrieval task. Color histograms are mainly used in color-based retrieval techniques. Along with color, texture is another widely used feature [5]. Texture in the form of local patterns has also been used [6]. The Local Binary Pattern (LBP) is one of the most popular local patterns [7], and many local features have since been introduced based on it. The Local Tetra Pattern (LTrP) [8] and the Directional Local Extrema Pattern (DLEP) [9] are two local features proposed by Murala et al. relying on the LBP concept. Liu et al. [10] proposed the Multi-Texton Histogram (MTH), an improvement of the Texton Co-occurrence Matrix (TCM) [11]. Based on similar edge orientations, the microstructure descriptor (MSD) [12] identifies colors to compute local features. Vipparthi et al. [13], improving on the existing Motif pattern, proposed the Directional Local Motif XoR pattern. Recently, Zhang et al. [14] put forward a hybrid information descriptor that explores human visual perception mechanisms by mixing low-level features such as color, shape, and texture with higher-level understanding. Liu et al. [15] proposed a CBIR model that simulates human visual attention mechanisms, the Computational Visual Attention model, and also proposed the Color Difference Histogram (CDH) descriptor [16], which counts the color differences between points under different color and edge orientation conditions. All of the above methods exploit a single feature for retrieval. To overcome the limitations of a single feature, we opt for a combination. Wu and Wu [17] were the first to propose combining a Fourier descriptor with K-moments for retrieval. Verma et al. [18] suggested a pattern that utilizes texture and color features through co-occurrence patterns. The methods discussed above all operate at a single resolution of the image, which is not adequate for an image with varying levels of detail (high and low). Multi-resolution analysis is the breakthrough that allows details of varying levels to be extracted from an image. Wavelets are one such tool put to effective use and, importantly, can be used with a single feature as well as a mixture of features. The wavelet correlogram is one such technique, suggested by Tarzan et al. [19], which utilizes a genetic algorithm. A combination of the à trous wavelet and the microstructure descriptor (MSD) to form the feature vector, called the à trous gradient structure descriptor, was put forward by Agarwal et al. [20] for image retrieval.

3. Wavelet and Moments

Wavelets are small waves of short duration and varying frequency. They are used in multi-resolution analysis, in which images are analyzed and represented at multiple resolutions. Small, low-contrast objects are examined at high resolution, while large, high-contrast objects are seen in the coarse view. An image consisting of small, large, low-contrast, and high-contrast objects can thus be analyzed with the help of multi-resolution analysis. Features that are missed at one resolution get noticed at another.

3.1 Discrete wavelet transform (DWT)

The wavelet series expansion of function $f(s) \in L^{2}(\mathrm{R})$ in relation to the wavelet w(s) and scaling function θ(s) is mathematically represented as:

$f(s)=\sum\limits_{k}{{{a}_{{{i}_{0}}}}\left( k \right)\,{{\theta }_{{{i}_{0}},k}}\left( s \right)}+\sum\limits_{i={{i}_{0}}}^{\infty }{\sum\limits_{k}{{{d}_{i}}\left( k \right)\,{{w}_{i,k}}\left( s \right)}}$           (1)

where, $i_0$ is an arbitrary starting scale, the $a_{i_0}(k)$ are approximation coefficients, and the $d_i(k)$ are detail coefficients. The wavelet series of Eq. (1) maps a function of a continuous variable into a sequence of coefficients. If the function being expanded is itself a sequence of numbers, the resulting coefficients are called the DWT coefficients of f(s). Expanding Eq. (1) leads to the DWT transform pair:

${{W}_{\theta }}\left( {{i}_{0}},k \right)=\frac{1}{\sqrt{N}}\sum\limits_{s}{f\left( s \right)\,{{\theta }_{{{i}_{0}},k}}\left( s \right)}$            (2)

${{W}_{w}}\left( i,k \right)=\frac{1}{\sqrt{N}}\sum\limits_{s}{f\left( s \right)\,{{w}_{i,k}}\left( s \right)}$                (3)

for $i \geq i_{0}$, and

$f\left( s \right)=\frac{1}{\sqrt{N}}\sum\limits_{k}{{{W}_{\theta }}\left( {{i}_{0}},k \right)\,{{\theta }_{{{i}_{0}},k}}\left( s \right)}+\frac{1}{\sqrt{N}}\sum\limits_{i={{i}_{0}}}^{\infty }{\sum\limits_{k}{{{W}_{w}}\left( i,k \right)\,{{w}_{i,k}}\left( s \right)}}$              (4)

This one-dimensional transform extends naturally to two-dimensional images. In two dimensions, along with the two-dimensional scaling function $\theta(s, t)$, there are three two-dimensional wavelets $w^{H}(s, t)$, $w^{V}(s, t)$, $w^{D}(s, t)$. These wavelets measure gray-level variations of images and are directionally sensitive along different directions.

The DWT of function f(s,t) of size M×N is expressed as:

${{W}_{\theta }}\left( {{i}_{0}},m,n \right)=\frac{1}{\sqrt{MN}}\sum\limits_{s=0}^{M-1}{\sum\limits_{t=0}^{N-1}{f(s,t)\,{{\theta }_{{{i}_{0}},m,n}}(s,t)}}$              (5)

$W_{w}^{j}\left( i,m,n \right)=\frac{1}{\sqrt{MN}}\sum\limits_{s=0}^{M-1}{\sum\limits_{t=0}^{N-1}{f(s,t)\,W_{i,m,n}^{j}(s,t)}}$         (6)

where, the $W_{\theta}\left(i_{0}, m, n\right)$ coefficients define an approximation of $f(s, t)$ at scale $i_{0}$, and the $W_{w}^{j}(i, m, n)$ coefficients define horizontal, vertical, and diagonal details for scales $i \geq i_{0}$.

3.2 Property of DWT

The DWT provides simple and sufficient information for analysis and synthesis. It decomposes a signal into approximation coefficients and detail coefficients. There are three types of detail coefficients, computed horizontally, vertically, and diagonally. These provide directional information, and each separately forms a feature vector for retrieving visually similar images. At the next resolution, the approximation coefficients are analyzed again, providing another three sets of detail coefficients that help extract the features missed at the previous resolution. The resulting sets are finally combined to obtain the final set of retrieved images.

3.3 Local binary pattern

The Local Binary Pattern (LBP) operator describes texture using the sign of the differences between neighboring pixels and the center pixel [21]. It is usually applied to gray-scale images and intensity derivatives. The LBP operator considers a 3×3 neighborhood of a pixel: a neighbor is assigned 1 if its value equals or exceeds the center (threshold) value, and 0 otherwise.

The LBP code of image pixel (x,y) with center pixel ‘gc’ and neighboring pixel ‘gp’ is computed as: $\text{LB}{{\text{P}}_{\text{P},\text{R}}}=\underset{\text{p}=0}{\overset{\text{P}-1}{\mathop \sum }}\,\text{s}\left( {{\text{g}}_{\text{p}}}-{{\text{g}}_{\text{c}}} \right){{2}^{\text{p}}}$. where, $s(x)=\left\{ \begin{matrix}   1,\,\,x\ge 0  \\   0,\,\,x<0  \\\end{matrix} \right.$.

The number of neighborhood pixels is denoted by P, radius of neighborhood is denoted by R and index of neighboring pixel is denoted by p.
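For illustration, the LBP code defined above can be computed with a short numpy sketch (the counter-clockwise neighbor ordering and the helper names `lbp_code` and `lbp_image` are our own choices for this example, not part of the paper):

```python
import numpy as np

def lbp_code(patch):
    """LBP code of the center pixel of a 3x3 patch (P=8, R=1).

    Neighbors are visited in a fixed order starting from the top-left;
    the ordering convention is an assumption -- any fixed ordering
    yields a valid LBP variant.
    """
    gc = patch[1, 1]
    coords = [(0, 0), (0, 1), (0, 2), (1, 2),
              (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for p, (r, c) in enumerate(coords):
        if patch[r, c] >= gc:          # s(gp - gc) = 1 when gp >= gc
            code += 2 ** p
    return code

def lbp_image(img):
    """LBP codes for all interior pixels of a grayscale image."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.int32)
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = lbp_code(img[i:i + 3, j:j + 3])
    return out
```

A flat patch yields code 255 (every neighbor satisfies $g_p \geq g_c$), while a patch whose center strictly exceeds all neighbors yields 0.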

3.4 Properties of local binary pattern

The following are important properties of LBP

(1) LBP is a less complex and effective local descriptor used to describe textures and once it is mixed with the global features, it plays the role of a powerful feature vector [22] to perfection.

(2) LBP does the job of an encoder by encoding the relation pertaining to the center pixel gray value and its neighboring pixels into 0 or 1. LBP is also useful to find the local information of an image.

The computation of LBP with the aid of an example is shown in Figure 1.

Figure 1. Computation of local binary pattern

3.5 Fast and accurate exponent fourier moments

Fast and Accurate Exponent Fourier moments are calculated by partitioning the angular and radial parts [23] into equally spaced sectors. By mapping the image function into polar coordinates, the transform becomes separable, and the circular part can be evaluated using the Fast Fourier Transform. The grid density should be high enough to retain information about the finest pattern.

Let $N_{r}$ be the radial and $N_{\theta}$ the angular size of the image; according to the sampling theorem, $N_{\theta} \geq 2 m_{\max}$. The angular and radial components are partitioned into M equal parts to maintain the grid density as follows:

${{N}_{r}}=\frac{1}{M}\sum\limits_{K=0}^{M-1}{K}\,\,\,and\,\,\,{{N}_{\theta }}=\frac{1}{M}\sum\limits_{L=0}^{M-1}{L}$        (7)

In the same way, the image is partitioned into $M^{2}$ subregions. The polar coordinates $\left(N_{r}, N_{\theta}\right)$ translate into rectangular coordinates as $x=\left(N_{r} \times N / 2\right) \cos N_{\theta}$ and $y=\left(N_{r} \times N / 2\right) \sin N_{\theta}$. Conversion of the Cartesian coordinates to the discrete domain gives K = -floor(y) + N/2 + 1, L = floor(x) + N/2. Finally, using the partitioned data, the image in polar form can be represented as:

${{f}_{p}}({{N}_{r}},{{N}_{\theta }})=f(K,L)$         (8)

The final expression to calculate the Fast and Accurate Exponent Fourier moments is:

${{F}_{nm}}=\frac{1}{{{M}^{2}}}\sum\limits_{K=0}^{M-1}{\sum\limits_{L=0}^{M-1}{f({{N}_{r}},{{N}_{\theta }})\,\sqrt{{{N}_{r}}}\,\exp \left( -jn\frac{2\pi }{N}K \right)\exp \left( -jm\frac{2\pi }{N}L \right)}}$            (9)

where, n is the moment order, m is the repetition, and the term $\sqrt{N_{r}}\,\exp \left(-j n \frac{2 \pi}{N} K\right)=T_{n}\left(N_{r}\right)$ is the radial kernel.
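Because the kernels of Eq. (9) are separable complex exponentials, all moments $F_{nm}$ can be obtained at once from a 2-D FFT of the $\sqrt{N_r}$-weighted polar samples, which is the "fast" aspect of the method. The sketch below assumes an M×M polar grid with kernel period M (standing in for the paper's N) and hypothetical radial sample positions $r_k = (k+0.5)/M$; it is illustrative, not the authors' exact discretization:

```python
import numpy as np

def faefm_all(fp):
    """All FAEFM moments F[n, m] of a polar-resampled image at once.

    fp[k, l] holds f(r_k, theta_l) on an M x M polar grid.  The radial
    positions r_k = (k + 0.5) / M and the kernel period (M in place of
    the paper's N) are assumptions of this sketch.
    """
    M = fp.shape[0]
    r = (np.arange(M) + 0.5) / M
    weighted = fp * np.sqrt(r)[:, None]       # radial kernel weight sqrt(N_r)
    # fft2 evaluates sum_{K,L} weighted[K,L] * exp(-2j*pi*(n*K + m*L)/M),
    # i.e. the double sum of Eq. (9), for every order (n, m) simultaneously
    return np.fft.fft2(weighted) / M**2
```

For a small grid, each entry of the result agrees with the direct double summation of Eq. (9).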

3.6 Integration of the wavelet transform, LBP and fast and accurate exponent fourier moments for CBIR

Nowadays, image acquisition is easy due to the proliferation of image-capturing devices, and natural images with complex, irregular texture are widely available [12]. Such complex features cannot be extracted by an inadequate single-resolution technique. The texture feature carries structural information about the pixels, and in complex images the pixels are placed with varied orientations. Multi-resolution processing provides the best way of gathering all the intensity values, and multiple resolutions help in processing objects of varying resolutions and sizes. To extract the texture feature, LBP is used, which gathers texture information efficiently. The LBP operator is a gray-scale invariant measure derived from the conventional definition of texture in a local neighborhood. LBP gathers information from the center pixel and its neighbors, but it fails for images of complex sizes and different orientations; such information cannot be gathered at a single resolution. This problem can be resolved by multi-resolution techniques such as wavelets, which operate in three orientations. The combination of LBP with wavelets proves efficient and gathers texture information at multiple orientations. The intensity levels given by the texture operation are then useful for extracting shape features. The Fourier moment best suited as a shape descriptor for rotation-invariant pattern recognition, image analysis, and shape feature extraction is the Accurate Exponent Fourier moment [24]. Hence the Fast and Accurate Exponent Fourier moments present a constructive shape descriptor for bringing out shape information from the texture descriptor. Figure 2 shows the sequential diagram of the proposed system framework.

Using the wavelet transform, an image is decomposed into a number of resolutions, and texture information at multiple orientations is gathered by the LBP of the wavelet coefficients. The structural intensity values are passed to the Fast and Accurate Exponent Fourier moments computation, which extracts the shape features [25] from the texture feature.

3.7 Advantages of combining DWT, LBP and FAEFM's

The advantages of combining Fast and Accurate Exponent Fourier moments with LBP in a multi-resolution analysis framework are the following:

(1) The combination of LBP and moments provides efficient details of an image: LBP extracts the local relationships embedded in the intensity values, and the moments gather the shape information.

(2) A single resolution is inadequate for extracting accurate information from images of different sizes and resolutions. With multi-resolution, features left undetected at one resolution can be detected at another.

(3) In the framework of multi-resolution analysis with the help of DWT, the combination of LBP and FAEFM makes a genuine attempt to extract the feature of shape [26] from texture at many scales of an image. 

A drawback of LBP is that it cannot extract directional information. This is overcome by combining LBP with the DWT, where the spatial distribution of intensity values in various directions is utilized to determine the shape feature from texture.

4. The Proposed Method

The sequential and methodical steps are:

(1) Computation of the DWT coefficients of the grayscale image.

(2) Computation of the LBP of the wavelet coefficients, which extracts directional information.

(3) Extraction of the shape feature from the texture feature through the computation of the Fast and Accurate Exponent Fourier moments of the LBP codes of the DWT coefficients.

(4) Magnitude-wise similarity measurement.

The proposed method is shown schematically in Figure 2 below.

Figure 2. Architecture for the proposed method

4.1 DWT coefficients' computation

First, the DWT coefficients of the 2-D grayscale image are estimated. For 2-D images there is a single 2-D scaling function and three 2-D wavelet functions, $w^{H}(s, t)$, $w^{V}(s, t)$, $w^{D}(s, t)$. These wavelet functions sensitively measure gray-level variations of the image along different directions: $w^{H}$, $w^{V}$, and $w^{D}$ measure variations horizontally, vertically, and diagonally respectively. The DWT of a grayscale image produces four coefficient matrices: approximation coefficients and horizontal, vertical, and diagonal detail coefficients. Applying the DWT to a gray-scale image of size 256×256 decomposes it into sub-bands as tabulated in Table 1.

Table 1. Decomposition of the grayscale image of size 256×256 using the Discrete Wavelet Transform

Image Size / Approximation Coefficients | Resolution Level (L) | Wavelet Subband Size
256 x 256 | L1 | 128 x 128
128 x 128 | L2 | 64 x 64
64 x 64 | L3 | 32 x 32
32 x 32 | L4 | 16 x 16
16 x 16 | L5 | 8 x 8
8 x 8 | L6 | 4 x 4
4 x 4 | L7 | 2 x 2

4.2 Computation of LBP

LBP is fast, easy to implement, and invariant to monotonic intensity and illumination changes, and it extracts local information with high precision compared to other local texture descriptors. However, it lacks directional information, because its computation depends only on the intensity values of the neighborhood. This is overcome by combining LBP with the wavelet transform: the LBP codes of each detail coefficient matrix are estimated and saved in three matrices, and FAEFM is applied to them to extract the shape feature.

4.3 Computation of moments

Fast and Accurate Exponent Fourier moments of the LBP codes are utilized to extract shape features. The computation of LBP codes of the wavelet coefficients is carried out for each matrix. The DWT decomposition of the grayscale image produces four sub-bands (one approximation coefficient matrix and three detail coefficient matrices), and each detail coefficient matrix is used to construct a feature vector: the LBP codes of the detail coefficients are computed, followed by the computation of the Fast and Accurate Exponent Fourier moments. By further decomposing the approximation coefficients, the procedure enters the second level, and the same steps are repeated for subsequent resolution levels. Every level outputs one approximation coefficient matrix and three detail coefficient matrices. The features of the preceding levels are integrated with those of the current level: for example, the level-2 feature vector contains the level-1 features, and the level-3 vector contains those of levels 1 and 2. Ultimately we have three feature vectors per resolution, which are utilized for retrieval.
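The level-wise feature accumulation can be sketched as follows. Here `subband_features` is a hypothetical stand-in (a simple magnitude histogram) for the paper's LBP + FAEFM computation, and the Haar step assumes the orthonormal db1 filters:

```python
import numpy as np

def haar_dwt2(x):
    """One level of the 2-D Haar (db1) DWT: cA, cH, cV, cD."""
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    return ((a + b + c + d) / 2, (a + b - c - d) / 2,
            (a - b + c - d) / 2, (a - b - c + d) / 2)

def subband_features(band, bins=8):
    # hypothetical stand-in for the LBP + FAEFM feature computation
    hist, _ = np.histogram(np.abs(band), bins=bins)
    return hist / max(hist.sum(), 1)

def multilevel_features(img, levels=3):
    """One growing feature vector per detail orientation (H, V, D).

    Each level's features are appended to those of the preceding
    levels, so the level-L vector contains levels 1..L.
    """
    feats = {"H": [], "V": [], "D": []}
    cA = img
    for _ in range(levels):
        cA, cH, cV, cD = haar_dwt2(cA)
        for key, band in zip("HVD", (cH, cV, cD)):
            feats[key].append(subband_features(band))
    return {k: np.concatenate(v) for k, v in feats.items()}
```

With three levels and 8 histogram bins, each of the three orientation vectors grows to 24 values.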

4.4 Similarity measurement

Similarity measurement retrieves, from a vast dataset, the images most similar to the query image. The Euclidean distance between the feature vector of a database image and that of the query image is measured; the quality of this measurement also determines the performance of the retrieval system. Let $Q_{I}=\left(Q_{I1}, Q_{I2}, \ldots, Q_{In}\right)$ be the normalized coefficient moments of the query image and $D_{I}=\left(D_{I1}, D_{I2}, \ldots, D_{In}\right)$ those of the database image. Then the Euclidean distance between the query image and a database image is:

$D\left( {{Q}_{I}},{{D}_{I}} \right)=\sqrt{\sum\limits_{i=1}^{n}{{{\left( {{Q}_{Ii}}-{{D}_{Ii}} \right)}^{2}}}}$           (10)
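Eq. (10) and the resulting ranking of database images can be sketched as follows (the function names are ours, for illustration only):

```python
import numpy as np

def euclidean(q, d):
    """Eq. (10): distance between query and database feature vectors."""
    q = np.asarray(q, dtype=float)
    d = np.asarray(d, dtype=float)
    return np.sqrt(np.sum((q - d) ** 2))

def rank_database(query, database):
    """Indices of database images, nearest feature vector first."""
    dists = [euclidean(query, feat) for feat in database]
    return np.argsort(dists)
```

The database image with the smallest distance to the query is ranked first.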

5. Results and Discussions

The image retrieval techniques are evaluated on the following universally accepted reference databases:

(1) Dataset 1 (Corel-1k). Corel-1k, the first dataset [27], comprises 10 categories: Mountains, Buildings, Flowers, African People, Food, Dinosaurs, Elephants, Horses, Beaches, and Buses. Each image is of size 256×384 or 384×256, and every category contains 100 images.

(2) Dataset 2 (Olivia-2688). Olivia-2688, the second dataset [28], comprises eight categories: Tall building, Coast, Street, Forest, Open country, Highway, Inside city, and Mountain. Each image is of size 256×256.

5.1 Evaluation of performance

The performance of the proposed method is evaluated in terms of precision and recall. Precision (P) is the ratio of the number of relevant images retrieved to the total number of images retrieved. In mathematical form:

$P=\frac{{{R}_{I}}}{{{T}_{I}}}$                 (11)

where, TI is the total number of retrieved images and RI is the number of relevant images retrieved. Recall (R) is the ratio of the number of relevant images retrieved to the total number of relevant images in the database. In mathematical form:

$R=\frac{{{R}_{I}}}{{{T}_{R}}}$           (12)

where, TR is the total number of relevant images in the database and RI is the number of relevant images retrieved. In our proposed method, TI = 10, while TR varies across datasets: for dataset 1, TR = 100, and for dataset 2 the value of TR depends on the image category.
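The two measures can be computed directly from the sets of retrieved and relevant image IDs. For Corel-1k, with TI = 10 retrieved images and TR = 100 relevant images per category, a perfect top-10 retrieval gives P = 1.0 and R = 0.1:

```python
def precision_recall(retrieved, relevant):
    """Eqs. (11) and (12): P = hits / |retrieved|, R = hits / |relevant|."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    return hits / len(retrieved), hits / len(relevant)
```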

5.2 Retrieval results

For the experiments, all images from dataset 1 and dataset 2 are rescaled to 256×256. In the first step, the DWT coefficients are computed from the rescaled gray-scale images, and the LBP codes are computed from these coefficients. Finally, the FAEFMs of these LBP codes are composed to build the feature vector. Applying the DWT to the grayscale image produces three wavelet detail matrices, namely diagonal detail (DD), horizontal detail (HD), and vertical detail (VD), followed by the computation of LBP and FAEFMs. The similarity measurement is conducted separately for each of the three matrices to produce three sets of similar images, and their union is taken as the final set. Recall is computed on this final set by counting the total number of relevant images. For precision, the top 'm' matches of each set are counted and the union of the three results gives the final set. Mathematically, let $F_{h}, F_{v}, F_{d}$ be the sets of similar images obtained from the HD, VD, and DD feature vectors respectively. The final set of images $F_{s}$ is given by:

${{\text{F}}_{\text{s}}}\text{=}{{\text{F}}_{\text{h}}}\cup {{\text{F}}_{\text{v}}}\cup {{\text{F}}_{\text{d}}}$              (13)

Similarly, let $F_{h}^{m}, F_{v}^{m}, F_{d}^{m}$ be the set of top ‘m’ images obtained from HD, VD and DD feature vector respectively. So, we get the final set of top ‘m’ images represented as $F_{s}^{m}$ and given by:

$F_{s}^{m}=F_{h}^{m} \cup F_{v}^{m} \cup F_{d}^{m}$            (14)
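Eqs. (13) and (14) amount to set unions over the per-orientation result lists; a minimal sketch (the function name is ours):

```python
def fuse_top_m(ranked_h, ranked_v, ranked_d, m):
    """Eq. (14): union of the top-m matches from the HD, VD and DD
    feature vectors; with m equal to the full list length this
    reduces to Eq. (13)."""
    return set(ranked_h[:m]) | set(ranked_v[:m]) | set(ranked_d[:m])
```

For example, fusing the top-2 matches of three ranked lists [1, 2, 3], [2, 3, 4], [3, 4, 5] yields the set {1, 2, 3, 4}.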

This procedure is repeated for all resolution levels; each time, the set of relevant images of the preceding level is integrated with that of the current level to obtain the relevant set of images for the level under consideration. In the proposed method, a seven-level DWT with the db1 wavelet is applied to the image, and retrieval is measured through the values of P and R. The speed of retrieval depends on the size of the feature vector. In this method, the LBP codes of the DWT coefficients of the gray-scale images are computed, followed by the Fast and Accurate Exponent Fourier moments of the resulting LBP codes. When the DWT is applied to a K×K image, the image is downsampled by a factor of 2 at each level.

Table 2. Mean of P (%) and R (%) for different levels of resolution for dataset 1

Levels (L) | P (%) | R (%)
L1 | 57.66 | 38.78
L2 | 79.64 | 55.43
L3 | 92.46 | 69.76
L4 | 96.89 | 76.88
L5 | 99.28 | 83.97
L6 | 99.97 | 88.18
L7 | 99.99 | 93.15

Table 3. Mean of P (%) and R (%) for different levels of resolution for dataset 2

Levels (L) | P (%) | R (%)
L1 | 54.38 | 36.24
L2 | 79.45 | 55.62
L3 | 95.18 | 71.36
L4 | 99.23 | 78.49
L5 | 99.75 | 85.12
L6 | 99.87 | 90.23
L7 | 99.99 | 93.63

This results in a reduction of subband size at every scale. In this method, the DWT of the K×K gray-scale image at every level produces four subbands of equal size $\left(\frac{K}{2} \times \frac{K}{2}\right)$. The LBP codes of all three detail coefficient matrices are computed, followed by the computation of FAEFMs for each matrix. The average precision (P) and recall (R) values over all seven resolution levels, for all images of dataset 1 and dataset 2, are tabulated in Table 2 and Table 3.

5.3 Comparison of performance with LBP+Wavelet+Legendre moments

The proposed method is compared with the combination of LBP, wavelet, and Legendre moments [29]. A single resolution is not suitable for images of complex structure containing small, large, high-contrast, and low-contrast objects, so a single-resolution technique fails to gather enough information. The method of [29] integrates LBP, wavelets, and Legendre moments: local features are gathered from the DWT coefficients with the help of LBP, shape features are then computed through Legendre moments, and finally the directional LBP features are estimated to construct the feature vector used for retrieval.

Table 4. Comparison of Performance with LBP + Wavelet + Legendre moments

(a) Corel-1k

Levels | Method [30] P (%) | Method [30] R (%) | Proposed P (%) | Proposed R (%)
L1 | 56.08 | 37.28 | 57.66 | 38.78
L2 | 78.57 | 55.00 | 79.64 | 55.43
L3 | 90.53 | 67.55 | 92.46 | 69.76
L4 | 96.53 | 76.62 | 96.89 | 76.88
L5 | 98.96 | 83.17 | 99.28 | 83.97
L6 | 99.71 | 87.95 | 99.97 | 88.18
L7 | 99.95 | 92.04 | 99.99 | 93.15

(b) Olivia-2688

Levels | Method [30] P (%) | Method [30] R (%) | Proposed P (%) | Proposed R (%)
L1 | 53.08 | 34.05 | 54.38 | 36.24
L2 | 79.39 | 53.75 | 79.45 | 55.62
L3 | 93.66 | 68.29 | 95.18 | 71.36
L4 | 98.51 | 77.52 | 99.23 | 78.49
L5 | 99.68 | 84.48 | 99.75 | 85.12
L6 | 99.91 | 89.90 | 99.87 | 90.23
L7 | 99.99 | 93.26 | 99.99 | 93.63

Table 4 compares the level-wise performance of the LBP + wavelet + Legendre moments method and the proposed method on the Corel-1k and Olivia-2688 datasets. The overall results show that the proposed method yields better image retrieval accuracy than LBP + wavelet + Legendre moments; hence the proposed method outperforms the methods of [29-31].

6. Conclusion

In this research work, we have proposed a technique that utilizes shape and texture features in a multi-resolution analysis framework for image retrieval. Texture features are extracted using LBP, and shape features are then extracted using the Fast and Accurate Exponent Fourier moments of the LBP codes. The concept of multi-resolution is exploited with the help of wavelets. The LBP codes of the DWT coefficients are computed, followed by their FAEFMs, to obtain feature vectors that are used to retrieve visually similar images. This process is repeated at different levels of the image so as to retrieve details from different directions. The performance of the proposed method is evaluated through recall (R) and precision (P).

  References

[1] Rao, C.S. (2012). Content based image retrieval fundamentals and AI. LAP LAMBERT Academic Publishing.

[2] Suhasini, P.S., Krishna, K., Krishna, I.M. (2009). CBIR using color histogram processing. Journal of Theoretical & Applied Information Technology, 6(1): 116-122. 

[3] Manjunath, B.S., Ma, W.Y. (1996). Texture features for browsing and retrieval of image data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8): 837-842. https://doi.org/10.1109/34.531803

[4] Wang, X.Y., Yu, Y.J., Yang, H.Y. (2011). An effective image retrieval scheme using color, texture and shape features. Computer Standards & Interfaces, 33(1): 59-68. https://doi.org/10.1016/j.csi.2010.03.004

[5] Khare, M., Srivastava, R.K., Khare, A. (2015). Moving object segmentation in Daubechies complex wavelet domain. Signal, Image and Video Processing, 9(3): 635-650. https://doi.org/10.1007/s11760-013-0496-4

[6] Ojala, T., Pietikainen, M., Maenpaa, T. (2002). Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7): 971-987. https://doi.org/10.1109/TPAMI.2002.1017623

[7] Tan, X., Triggs, B. (2010). Enhanced local texture feature sets for face recognition under difficult lighting conditions. IEEE Transactions on Image Processing, 19(6): 1635-1650. https://doi.org/10.1109/TIP.2010.2042645

[8] Murala, S., Maheshwari, R.P., Balasubramanian, R. (2012). Local tetra patterns: a new feature descriptor for content-based image retrieval. IEEE Transactions on Image Processing, 21(5): 2874-2886. https://doi.org/10.1109/TIP.2012.2188809

[9] Murala, S., Maheshwari, R.P., Balasubramanian, R. (2012). Directional local extrema patterns: A new descriptor for content based image retrieval. International Journal of Multimedia Information Retrieval, 1(3): 191-203. https://doi.org/10.1007/s13735-012-0008-2

[10] Liu, G.H., Zhang, L., Hou, Y.K., Li, Z.Y., Yang, J.Y. (2010). Image retrieval based on multi-texton histogram. Pattern Recognition, 43(7): 2380-2389. https://doi.org/10.1016/j.patcog.2010.02.012

[11] Liu, G.H., Yang, J.Y. (2008). Image retrieval based on the texton co-occurrence matrix. Pattern Recognition, 41(12): 3521-3527. https://doi.org/10.1016/j.patcog.2008.06.010

[12] Liu, G.H., Li, Z.Y., Zhang, L., Xu, Y. (2011). Image retrieval based on micro-structure descriptor. Pattern Recognition, 44(9): 2123-2133. https://doi.org/10.1016/j.patcog.2011.02.003

[13] Vipparthi, S.K., Nagar, S.K. (2014). Expert image retrieval system using directional local motif XoR patterns. Expert Systems with Applications, 41(17): 8016-8026. https://doi.org/10.1016/j.eswa.2014.07.001

[14] Zhang, M., Zhang, K., Feng, Q., Wang, J., Kong, J., Lu, Y. (2014). A novel image retrieval method based on hybrid information descriptors. Journal of Visual Communication and Image Representation, 25(7): 1574-1587. https://doi.org/10.1016/j.jvcir.2014.06.016

[15] Liu, G.H., Yang, J.Y., Li, Z. (2015). Content-based image retrieval using computational visual attention model. Pattern Recognition, 48(8): 2554-2566. https://doi.org/10.1016/j.patcog.2015.02.005

[16] Liu, G.H., Yang, J.Y. (2013). Content-based image retrieval using color difference histogram. Pattern Recognition, 46(1): 188-198. https://doi.org/10.1016/j.patcog.2012.06.001

[17] Wu, Y., Wu, Y. (2009). Shape-based image retrieval using combining global and local shape features. In 2009 2nd International Congress on Image and Signal Processing, pp. 1-5. https://doi.org/10.1109/CISP.2009.5304693 

[18] Verma, M., Raman, B., Murala, S. (2015). Local extrema co-occurrence pattern for color and texture image retrieval. Neurocomputing, 165: 255-269. https://doi.org/10.1016/j.neucom.2015.03.015

[19] Saadatmand-Tarzjan, M., Moghaddam, H.A. (2007). A novel evolutionary approach for optimizing content-based image indexing algorithms. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 37(1): 139-153. https://doi.org/10.1109/TSMCB.2006.880137

[20] Agarwal, M., Maheshwari, R.P. (2012). À trous gradient structure descriptor for content based image retrieval. International Journal of Multimedia Information Retrieval, 1(2): 129-138. https://doi.org/10.1007/s13735-012-0005-5

[21] Kishore, D., Srinivas Kumar, S., Srinivasa Rao, C. (2016). Content based image retrieval using gray level co-occurrence matrix with SVD and local binary pattern. International Journal on Cybernetics & Informatics (IJCI), 5(4): 213-222. https://doi.org/10.5121/ijci.2016.5424

[22] Nigam, S., Khare, A. (2015). Multiresolution approach for multiple human detection using moments and local binary patterns. Multimedia Tools and Applications, 74(17): 7037-7062. https://doi.org/10.1007/s11042-014-1951-0

[23] An, S.Y., Lee, L.K., Oh, S.Y. (2012). Modified angular radial partitioning for edge image description in image retrieval. Electronics Letters, 48(10): 563-565. https://doi.org/10.1049/el.2012.0705

[24] Singh, S.P., Urooj, S. (2015). Combined rotation- and scale-invariant texture analysis using Radon-based polar complex exponential transform. Arabian Journal for Science and Engineering, 40(8): 2309-2322. https://doi.org/10.1007/s13369-015-1645-6

[25] Belkasim, S.O., Shridhar, M., Ahmadi, M. (1991). Pattern recognition with moment invariants: A comparative study and new results. Pattern Recognition, 24(12): 1117-1138. https://doi.org/10.1016/0031-3203(91)90140-Z

[26] Srivastava, P., Binh, N.T., Khare, A. (2013). Content-based image retrieval using moments. In International Conference on Context-Aware Systems and Applications, pp. 228-237. https://doi.org/10.1007/978-3-319-05939-6_37

[27] http://wang.ist.psu.edu/docs/related/, accessed April 2014.

[28] Oliva, A., Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3): 145-175. https://doi.org/10.1023/A:1011139631724

[29] Srivastava, P., Khare, A. (2017). Integration of wavelet transform, local binary patterns and moments for content-based image retrieval. Journal of Visual Communication and Image Representation, 42: 78-103. https://doi.org/10.1016/j.jvcir.2016.11.008

[30] Alsmadi, M.K. (2018). Query-sensitive similarity measure for content-based image retrieval using meta-heuristic algorithm. Journal of King Saud University-Computer and Information Sciences, 30(3): 373-381. https://doi.org/10.1016/j.jksuci.2017.05.002

[31] Madhavi, K.V., Tamilkodi, R., Sudha, K.J. (2016). An innovative method for retrieving relevant images by getting the top-ranked images first using interactive genetic algorithm. Procedia Computer Science, 79: 254-261. https://doi.org/10.1016/j.procs.2016.03.033