© 2024 The authors. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).
OPEN ACCESS
The restoration of ancient Arabic manuscripts is a challenging task because of the noise and degradation present in these historical documents. This paper presents an effective pipeline for manuscript restoration based on data augmentation and the Genetic Algorithm (GA) for improving deep learning models. The Genetic Algorithm was chosen because it optimizes deep learning frameworks, tuning each model for the task of restoring the manuscripts in question. In addition, techniques such as CLSR and the Wiener filter reduce noise and enhance the images during restoration. The findings show significant improvements in accuracy, noise elimination, image clarity and resolution, and the overall readability of the restored manuscripts, with accuracy rates of up to 97.70% for NASNet-A, 98.40% for EfficientNet-B7, and 99.13% for AmoebaNet-A. Besides outperforming current procedures, these results support the protection and academic study of ancient Arabic manuscripts. While we continue to emphasize the cultural importance of these manuscripts and the ongoing efforts to preserve them, it is also worth highlighting the other areas to which our methods can apply: the techniques have the potential to solve restoration problems in other types of manuscripts and can be adapted to other image restoration problems. This research contributes essential information and tools for scholars and others involved in the preservation of these important cultural artifacts, and its results lay the groundwork for extending these methods to a wider range of restoration issues.
image deblurring, deep learning in image restoration, CLSR, image restoration with the Wiener filter, genetic CNN, EfficientNet-B7, AmoebaNet-A, NASNet-A
Handwritten word recognition converts handwritten words into their alphanumeric form and is a major application of pattern recognition. The technology has been adopted by research groups working in various languages because of its practical nature. Character recognition systems can help identify authors, authenticate documents, and support the digitization of heritage libraries and archives along with their paper manuscripts [1]. Such systems are generally divided into two types: online and offline recognition. Online recognition is the real-time recognition of text as it is input using specialized devices such as tablets and digitizers [2]. Because it has access to the geometric coordinates captured by devices such as stylus pens, the recognition task is easier than in offline methods. Arabic handwritten text presents additional challenges. Diacritics, for instance, can make it even harder to discriminate between letters and are often left unwritten. Beyond this, many Arabic letters share similar base shapes, so classification must rely on the placement and number of dots, which makes deciding between characters difficult. Some writers even avoid writing the dots, altering the letters in the process, by rendering those marks as small strokes instead. Fortunately, most handwritten Arabic manuscripts do not include diacritics, which simplifies the classification task.
The main objectives of this research are to create advanced methods for restoring damaged and noisy ancient Arabic manuscripts by improving the performance of Convolutional Neural Networks (CNNs) by carefully selecting optimal parameters. Furthermore, this study endeavors to provide innovative strategies that safeguard the cultural value included inside these texts. The restoration of manuscripts is achieved by combining advanced neural network designs with Genetic Algorithms, resulting in outstanding accuracy and visual quality.
Deep learning methods, namely Convolutional Neural Networks (CNNs), have become popular in several fields such as image classification, object identification, medical image analysis, face and facial expression recognition [3, 4], and handwriting recognition. CNNs have contributed significantly to the advancement of Arabic handwriting recognition methods. The choice of algorithm and the use of the right optimization techniques are critical for producing good results in a particular domain. Other interesting contributions include machine learning solutions that involve knowledge transfer between previously 'hidden' layers of networks, as suggested by Zoph and Le [5], and the use of GAs to evolve specific CNN architectures, as in Genetic-CNN [6, 7].
It has been shown that CNN networks coupled with GAs can achieve the same classification performance as current classification networks. A drawback of Genetic-CNN, however, is that it does not consider the hyperparameters that affect the convergence rate and overall performance of the network, such as the activation functions and the optimization method.
Therefore, selecting the appropriate parameters is essential for faster convergence to the global minimum on the error surface, leading to improved task performance and avoiding issues like overfitting.
The main contribution of this paper is to enhance CNN performance by carefully selecting parameters, leading to faster convergence and better recognition results. The proposed methodology aims to restore degraded and noisy ancient Arabic manuscripts using a novel approach that combines state-of-the-art neural network architectures (EfficientNet-B7, NASNet-A, and AmoebaNet-A) with the Genetic Algorithm for optimization, in order to excel not only at image restoration but also at preservation. By integrating these advanced architectures with techniques ranging from Constrained Least Square Recovery (CLSR) to the Wiener filter, we aim to restore ancient Arabic manuscripts with unparalleled accuracy and visual quality. The restored manuscripts not only preserve the cultural heritage these documents embody but also establish a valuable resource from which researchers and scholars can benefit. This work thus sits at the crossroads of image recognition and text classification, and it investigates in depth the unique difficulties and novel approaches that characterize the Arabic handwriting restoration domain.
Alongside the technical issues raised above, preserving and promoting crucial cultural heritage resources also requires recognizing their significant cultural aspects. The conservation of such artifacts goes beyond the technical processes of restoration; it preserves the history and cultural identity housed within these ancient manuscripts. In this way we not only preserve precious objects of our past but also transfer knowledge and tradition to later generations.
Consequently, it is essential to highlight the cultural dimension of the discussion about manuscript restoration, because it reveals the extraordinary worth of the process for the preservation of heritage in general.
The structure of the paper is organized as follows: Section 2 reviews related work on handwriting recognition and document restoration; Section 3 describes the peculiarities and difficulties of restoring Arabic handwriting and links them to this study; Section 4 presents the proposed methodology; Section 5 reports the experimental results and compares them with existing methods; and Section 6 concludes, highlighting the significance of the present work and future directions in the restoration of degraded and noisy ancient Arabic manuscripts.
Over the past several decades, considerable growth has been observed in the field of heritage studies, which has mainly focused on the historical analysis of documents and on their preservation, restoration, and reconstruction. These priceless historical materials, often handwritten, cover a wide range of material including, but not limited to, books, personal and professional letters, notes, journals, and reports. Thanks to progress in multimedia technologies, researchers are exploring methods across all kinds of media to achieve the conservation and preservation of these manuscripts.
Throughout this research area, scholars have focused on different methodologies to overcome the challenges of handwriting recognition and the preservation of these precious documents. Institutions have quite successfully taken up the challenge of securing and preserving valuable writings in digital form with the help of innovations in media technology and artificial intelligence. The adoption of forward-thinking techniques has enabled further insight into the historical significance of these documents and has helped preserve their physical and visual appearance for future generations. Deborah and Arymurthy [8] proposed a genetic algorithm approach for enhancing and restoring images of ancient documents. The outcomes showed that image enhancement worked relatively well: 92.9% of the data achieved success rates above 90%. Conversely, preprocessing with a median filter had a detrimental effect, producing a visible blurring of the images, with only 59.5% of the data achieving a success rate greater than 90%. The approach achieved a high recognition success rate, which demonstrates its efficiency for improving the image quality of ancient documents, but the median filter preprocessing introduced blurring that may require further fine-tuning to eliminate. In this context, a hybrid optimization technique combining the Genetic Algorithm (GA) with Harmony Search (HS) has also been proposed; this hybridization is expected to make solutions more robust and reliable by integrating the two optimization methods.
Potrus et al. [9] proposed a combined system of Genetic Algorithm (GA) and Harmony Search (HS) algorithms for Arabic handwritten text recognition. They collected two datasets for evaluation: HAAFAZ, with 4,500 Arabic words, and ADAB, with 7,851 Arabic words. The authors preprocessed the written text by removing noise and non-textual parts from the images, employed a dominant point detection method for dataset segmentation, and used the fused GA-HS model for online Arabic character recognition.
The model achieved a 93.6% recognition rate on the manually collected dataset and recognition rates ranging from 94.68% to 96.33% on the ADAB dataset. However, the stochastic nature of the harmony search algorithm led to longer search times [10]. The approach impressively combines GA and HS, potentially leading to more robust and reliable outcomes through the synergy of the two optimization techniques, but a drawback of the study is its reliance on a relatively small dataset of 12,351 words, which may limit the model's capacity for diversity and generalization, and it contains no experimental comparison with the best currently available methods. Radenović et al. [11] introduced a method that deploys a VGG deep CNN with a generalized mean pooling layer to produce feature vectors. The vectors are used in a Siamese model to match and retrieve images similar to a query image. The authors tested their approach on different types of input images from custom datasets and employed the Euclidean distance to compare the trained images with the queried image. The evaluation yielded a mAP of 91.9% on the "Oxford5k" and "Paris6k" datasets, demonstrating the accuracy of the image matching and retrieval.
However, the evaluation covered only datasets containing 3D items, which limits the approach's generalization to other image types and objects. Moreover, there is no comparison of the proposed approach with alternative methods, which makes it difficult to assess how it performs relative to existing solutions. To establish a clearer picture of its capabilities and constraints, the work should be extended with additional experiments, application scenarios, and datasets. In related work, Zhou and Jia [12] applied similarity verification to sketched images and 3D gallery images, with Siamese multi-layer perceptron networks as their main focus. The approach is innovative compared to others, although the reported accuracy rates of 60% on the "NTU" dataset and 39% on the "3D sketched" images dataset may indicate a limitation in the method's accuracy and effectiveness. This study also lacks a comparison with other methods; a fuller analysis would examine its relative merits and drawbacks against other applications using comparable cases and datasets.
The approach by Seddati et al. [13] achieves successful image retrieval through the use of ResNet101 and the MS-RMAC descriptor. The method demonstrates strong performance in image retrieval tasks, with high mean Average Precision (mAP) values of 72.3% on the "Oxford5k" dataset, 87.1% on the "Paris6k" dataset, and 94% on the "INRIA Holidays" dataset. Like the previous research, however, this work lacks a direct comparison with other methods, making it difficult to evaluate its performance relative to existing solutions; a detailed analysis of the method's strengths and limitations would improve its overall context and practicality.
Peng et al. [14] introduced a new RNN-based deep learning technique for mapping images into binary codes, which is notable for its applications in image recognition and classification. The system achieves competitive mean average precision (mAP) values of 0.842 on the "CIFAR10" and "NUS-WIDE" datasets and 0.808 on the "MIRFLICKR" dataset, highlighting its practical applicability in image retrieval tasks. The lack of a direct comparison with other methods, however, makes it difficult to evaluate the technique's performance relative to other solutions and to judge its competitiveness, its utility in different contexts, and the limits of its use.
In terms of classification and similarity measures, Qian et al. [15] conducted an experiment using three deep learning models: gated recurrent units (GRU), LSTM, and the Siamese deep learning model. They evaluated their models on the "Reuters_50_50" and "Gutenberg" datasets. After classifying the articles using the proposed models, they measured similarities using the Cosine distance metric. The Siamese deep learning model outperformed the other two models, achieving an accuracy of 99.8% on both datasets.
You et al. [16] recommended incorporating a bidirectional LSTM layer in their AttentionXML deep learning model for text classification. They evaluated their model on six datasets: EUR-Lex, Wiki10-31K, AmazonCat-13K, Amazon-670K, Wiki-500K, and Amazon-3M. Precision was computed as the evaluation metric, and the highest precision of 95.92% was achieved on the "AmazonCat-13K" dataset [10].
For Arabic text classification, Elnagar et al. [17] focused on CNN and RNN models. Testing their models on the SANAD and NADiA datasets showed that adding an attention layer after the RNN achieved the highest classification accuracy of 96.94%. Liu and Guo [18] used attention layers in a bidirectional LSTM for classification; their proposed model was evaluated on seven datasets, reaching a very high accuracy of 97.2%.
Du et al. [19] introduced a Convolutional Recurrent Attention Network (CRAN) classifier. Their model was evaluated on five datasets: MR, SST-1, SST-2, Subj, and IMDB, achieving its highest accuracy of 94.1% on the "Subj" dataset. Liu et al. [20] developed a bidirectional RNN model for sentence classification, achieving 94% accuracy across eight benchmark datasets. In hierarchical text classification, Yang et al. [21] proposed a hierarchical attention network (HAN) with separate encoders for words and sentences and two levels of attention. They applied their model to six datasets and obtained their highest accuracy of 75.8% on the "Yahoo Answers" dataset.
Gao et al. [22] improved the text classification model of Yang et al. [21] by adding a convolutional layer, resulting in a Hierarchical Convolutional Attention Network (HCAN). The change facilitates the generation of embeddings used to update the attention weights. The authors evaluated their model on four datasets, obtaining an accuracy of 89.9% on the "Amazon Reviews" dataset.
Khayyat and Elrefaei [23] conducted a study on Arabic manuscripts, classifying images to predict their authors. They collected Arabic manuscripts from 52 authors and developed four deep learning models: MobileNetV1, DenseNet201, ResNet50, and VGG19. By tuning the learning hyperparameters, they optimized the models' recognition process, finding that lowering the learning rate, combining "Sigmoid" with "Softmax," and increasing the number of neurons in the final classification dense layer significantly improved recognition performance. All the deep learning models achieved validation accuracy above 95%.
Most of the previous studies lack a detailed comparison with alternative methods and a discussion of potential limitations; the literature would benefit from stronger comparisons and more in-depth analysis of the factors that could affect the results.
Cursive form: Arabic calligraphy's cursive nature makes it challenging due to the continuous flow of letters. For example, the connection between letters in words like "ثم" (then) requires specialized segmentation techniques to identify individual characters in handwritten text.
Variability of letter shapes: Arabic letters can take on various forms depending on their position in a word. For instance, the letter "ك" has distinct initial, medial, final, and isolated forms. This positional variability expands the character set from 28 base letters to on the order of 84 distinct shapes, which the recognition system must account for.
Similar letters: Some Arabic letters, such as "س" and "ش", share the same base form and differ only in the placement and number of dots. Distinguishing such characters therefore cannot rely on the base shape alone; it depends on these small marks and often on the wider context of the text.
Diacritical marks: These are the short-vowel symbols in Arabic writing, placed above or below a letter, such as "ـَـ" and "ـُـ". They pose recognition problems because they establish phonetic distinctions without changing the base letterform, so handling them requires knowing the marks and interpreting them correctly.
Overlapping letters: In cursive Arabic handwriting, more than one letter is often produced per stroke, and such strokes must be segmented to separate the individual letters. Take, for instance, the word "معرفة" (knowledge), whose letters share the same border cells [24].
3.1 Link to the study
In this paper, we take on the hard task of applying visualization and deep learning methods to restore and decipher eroded Arabic manuscripts, adapting them in ways that give our approach a competitive edge. We start by employing an ensemble of three neural networks (EfficientNet-B7, NASNet-A, and AmoebaNet-A), all convolutional neural networks (CNNs). Working together, they can capture the distinctive characteristics of Arabic script, such as variation in letterforms, cursive flow, visually similar letters, and diacritical marks.
What is distinctive, however, is the set of preprocessing techniques we incorporate, which are aimed at helping preserve ancient texts. For instance, our pipeline includes an image restoration algorithm to enhance legibility. Our technique also aims to make good use of genetic algorithms, which allow our solutions to address problems with numerous dimensions comprehensively.
To address these challenges, our strategy is structured so that the quality of degraded and noisy documents is increased by the efficient use of modern neural networks: EfficientNet-B7, NASNet-A, and AmoebaNet-A. We fine-tuned these network structures with the genetic algorithm to correspond closely to the needs of document restoration. Through these networks, our goal is to revitalize manuscripts by learning and categorizing features directly from the manuscript images, eliminating the need for traditional character modeling or segmentation techniques.
During the process, we adapted these network designs for manuscript restoration. Specifically, we trained the models on a set of noisy ancient Arabic manuscripts. The training phase involved refining the network parameters and weights to reduce the error between predicted and actual images. Additional information about the dataset, the hyperparameters, and the training and validation procedures can be found in the Experimental Steps section.
For reproducibility, we used well-established deep learning libraries such as TensorFlow and PyTorch to implement and optimize the EfficientNet-B7, NASNet-A, and AmoebaNet-A architectures, which come in different sizes and training times. The dataset for training and validation consists of thousands of ancient Arabic manuscript images collected specifically for this study. These images cover a wide range of deterioration phenomena, such as ink loss, paper ageing, and environmental degradation, creating a dataset representative of restoration work. Details of the software, libraries, and hyperparameters are provided to facilitate the reproducibility of our method. Figure 1 below shows the proposed pipeline for the restoration of ancient Arabic manuscripts.
Figure 1. The proposed diagram for the restoration of ancient Arabic manuscripts
The proposed approach focuses on learning and classifying features directly from the manuscript images rather than relying on traditional character models or segmentation methods. The main goal is to restore the manuscripts using the power of this ensemble of neural networks at the front end, together with other methods such as Constrained Least Square Recovery (CLSR) and the Wiener filter.
EfficientNet-B7 is an efficient and scalable architecture that uses compound scaling to achieve better performance in image classification tasks. NASNet-A, discovered through neural architecture search (NAS), automates the design process, allowing the network structure to be optimized and adapted for better performance. AmoebaNet-A, another NAS-derived architecture, achieves remarkable performance in image classification thanks to its deep stacks of convolutional layers and skip connections. Combining these state-of-the-art, Genetic Algorithm (GA)-optimized architectures with the restoration techniques described here aims to restore ancient Arabic manuscripts with unparalleled accuracy and visual quality, addressing the challenges posed by degraded and noisy manuscript images.
This new approach combines the capabilities of CLSR, the Wiener filter, EfficientNet-B7, NASNet-A, and AmoebaNet-A to preserve the cultural heritage of these manuscripts and to provide researchers and scholars with a valuable resource for study.
4.1 Image restoration using the Wiener filter
The Wiener filter, widely accepted for its effectiveness in image reconstruction, acts as an optimal linear filter. Specially tuned for the restoration of ancient Arabic manuscripts, the filter addresses two main challenges: blurring and noise. Using the degradation function (H) and its complex conjugate (H*), the Wiener filter is defined by the following equation [25]:
$\mathrm{W}_f=\frac{H^*}{|H|^2+\frac{1}{S N R}}$ (1)
In use, the Wiener filter removes, or at least reduces, blurring and noise in the images, improving their quality. The intended method for digitally recreating ancient handwritten Arabic manuscripts is based in the first place on the Wiener filter described by Yoo and Ahn [25], which processes noisy and corroded images using the corresponding equations to efficiently minimize the impact of degradation. The operator H represents the degradation function, while H* denotes its complex conjugate. These properties make the Wiener filter the one that yields the strongest restoration results, characterized by minimum mean square error and better visual quality.
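As a concrete illustration, the following minimal sketch applies Eq. (1) in the frequency domain with NumPy; the box-blur PSF and the SNR value are illustrative assumptions, not the settings used in our experiments.

```python
import numpy as np

def wiener_restore(degraded, psf, snr):
    """Apply the Wiener filter W_f = H* / (|H|^2 + 1/SNR) of Eq. (1)."""
    H = np.fft.fft2(psf, s=degraded.shape)          # transfer function of the blur
    W = np.conj(H) / (np.abs(H) ** 2 + 1.0 / snr)   # Eq. (1)
    return np.real(np.fft.ifft2(W * np.fft.fft2(degraded)))

# Illustrative use: a 5x5 box blur as the degradation and an assumed SNR of 100.
psf = np.ones((5, 5)) / 25.0
# restored = wiener_restore(degraded_image, psf, snr=100.0)
```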
4.2 Constrained least square recovery (CLSR)
This novel non-destructive restoration approach for preserving Arabic heritage blends mathematical optimization with superimposed constraints. The method formulates an optimization problem that minimizes the mean square error, subject to constraints that preserve the manuscript's particular features [26]. Let us define the following variables for clarity:
- X: the degraded, noisy image of the ancient Arabic manuscript in its current condition.
- Y: the restored image produced by CLSR.
- H: the degradation function describing how the appearance of the text content was reduced.
- E: the residual error.
The CLSR formulation can be expressed as follows:
$\min _Y\|H * Y-X\|^2+\lambda\|E\|^2$ (2)
where, ‖·‖ denotes the Euclidean norm and λ is a regularization parameter. The first term in the objective is a data-fidelity term: it penalizes the discrepancy between the degraded image X and the restored image Y convolved with the degradation function H. The second term is a regularization term that constrains the error E in order to preserve the unique features of the manuscripts.
To solve this optimization problem, iterative algorithms such as gradient descent or the conjugate gradient method are utilized. Starting from the degraded image X and the assumed degradation function H, each iteration updates the estimate Y to reduce the objective; enhancement operations such as contrast improvement, sharpening, brightness adjustment, or colour correction can enrich the estimate along the way. The iterative process continues until convergence, yielding a restored image Y that retains the unique traits of the ancient Arabic manuscripts.
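A minimal sketch of this iterative scheme is given below, assuming the Laplacian as the constraint operator behind E (a common choice in constrained least squares restoration); the step size, iteration count, and λ are illustrative, not the paper's settings.

```python
import numpy as np

def clsr_restore(x, psf, lam=0.01, step=0.5, iters=300):
    """Gradient descent on ||H*Y - X||^2 + lam*||L*Y||^2 (Eq. (2)) in the
    frequency domain, where convolutions become pointwise products.
    Constant factors in the gradient are absorbed into the step size."""
    H = np.fft.fft2(psf, s=x.shape)                        # degradation H
    lap = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], float)
    L = np.fft.fft2(lap, s=x.shape)                        # assumed constraint operator
    X = np.fft.fft2(x)
    Y = X.copy()                                           # start from the degraded image
    for _ in range(iters):
        grad = np.conj(H) * (H * Y - X) + lam * np.conj(L) * (L * Y)
        Y = Y - step * grad                                # descend toward the minimizer
    return np.real(np.fft.ifft2(Y))
```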
Combining the Wiener filter, which alone is insufficient to eliminate blur and noise, with the preservation of text characteristics and spatial regularity provided by CLSR leads to better image restoration. The integration thereby improves the preservation of library holdings and the living legacy of early Arabic manuscripts.
4.3 Genetic CNN
Genetic CNN is an evolutionary strategy that applies the concepts of the Genetic Algorithm to the configuration of CNN architectures [7].
The aim of this design is to create a range of networks that surpass the performance of existing state-of-the-art classification networks. Initially, the algorithm genetically encodes the convolutional layers of an initial set of candidate networks and then proceeds to the next phase.
4.3.1 The algorithm's structure
The search space offered by Genetic-CNN is insufficient because the performance of a CNN model is heavily influenced by both the network architecture and the hyperparameters [27].
As a result, in this research the most effective network architecture for a given input is determined by using a GA to search over both network configurations and hyperparameters. The system's procedure is depicted in Figure 2.
Figure 2. Block diagram explaining the process of the genetic algorithm (GA)
Figure 3. Binary string encoding of architectural stages (a): a collection of convolutional layers; (b): a version of the layers in encoded form (a). Connections to nodes A2 and A3 are represented by the integers before and after the hyphen in (b)
For any genetic-type algorithm to optimize a model, the solution must first be mapped into the algorithm's search space. The key to successful mapping is an appropriate representation scheme that characterizes solutions as chromosomes (gene sequences). Genes can be represented by binary digits, alphabets, or floating-point numbers; in this approach, a binary representation encodes the connections between convolutional layers. An example of encoding such a relationship is shown in Figure 3.
A stage is a collection of convolutional layers that sits between the input and a pooling layer, between two pooling layers, or between a pooling layer and the fully connected layers [27]. Each layer within a stage is referred to as a node. Figure 3 depicts a structure with one stage containing three nodes. The link between nodes is then converted into binary digits. Only forward connections are permitted, and each node's preceding connections are encoded. Finally, bars separate stages and hyphens separate nodes. The number of bits needed to encode a network of S stages, each having $K_S$ nodes, is computed using (3).
$N_{b i t}=\sum_S \frac{1}{2} K_S\left(K_S-1\right)$ (3)
Activation functions and optimization techniques are also encoded in bits and separated by bars. The number of bits necessary to encode a choice among n hyperparameter options is computed using (4).
number of bits $=\left\lfloor\log _2(n)\right\rfloor+1$ (4)
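For example, a hypothetical network with two stages of 4 and 5 nodes needs 6 + 10 = 16 connection bits by Eq. (3), and choosing among 3 activation functions needs 2 bits by Eq. (4):

```python
import math

def stage_bits(nodes_per_stage):
    # Eq. (3): sum over stages of K_s * (K_s - 1) / 2 forward-connection bits
    return sum(k * (k - 1) // 2 for k in nodes_per_stage)

def hyperparam_bits(n):
    # Eq. (4): floor(log2(n)) + 1 bits for a choice among n options
    return math.floor(math.log2(n)) + 1

print(stage_bits([4, 5]))    # 16
print(hyperparam_bits(3))    # 2
```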
The encoding procedure must be defined first, since it is employed from the very beginning of the algorithm's initialization. This stage builds a binary-encoded population of CNN models. It usually generates models from the previous generation, but for the first generation it kicks off the process by randomly selecting a set of candidates. Each individual in the population is referred to as a chromosome, and each bit of a chromosome is sampled independently from a Bernoulli distribution.
The algorithm decodes each of the networks in the population and estimates the classification accuracy. These accuracies are used to determine the algorithm's fitness. The fitness score is also considered an objective function of the algorithm because it determines which chromosomes survive. An optimal fitness score is set in the algorithm, and the procedure is continued until it is achieved by a CNN model.
If no individual in the population achieves the optimal fitness score, the method uses a rank-based system to choose which chromosomes survive: chromosomes with higher classification accuracy are retained, ensuring that the algorithm converges quickly.
After the selection step, surviving chromosomes create the next generation of individuals through a uniform crossover process. This method picks a random pair of parent chromosomes and uses a random variable to generate children: if the variable exceeds a predetermined threshold, the gene of parent 1 is used; otherwise, parent 2's gene is selected.
Finally, mutation is applied to the surviving chromosomes. This procedure flips each gene on a chromosome with a certain probability, expanding the search space of the problem. Mutation can also occur on chromosomes created by the crossover process, and it is applied over a set number of generations.
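The following sketch pulls these steps together: Bernoulli initialization, rank-based selection, uniform crossover, and per-gene mutation. The population size, rates, and the stand-in fitness function are illustrative; in the actual method, fitness is the validation accuracy of the decoded and trained CNN.

```python
import random

POP, GENS, CHROM_LEN = 20, 30, 18          # illustrative sizes
CROSS_RATE, MUT_RATE = 0.5, 0.05

def fitness(bits):
    # Stand-in: the real algorithm decodes `bits` into a CNN
    # (connections + hyperparameters), trains it, and returns
    # its validation accuracy.
    return sum(bits) / len(bits)

def evolve():
    # Initialization: each gene is an independent Bernoulli sample.
    pop = [[random.randint(0, 1) for _ in range(CHROM_LEN)] for _ in range(POP)]
    for _ in range(GENS):
        ranked = sorted(pop, key=fitness, reverse=True)
        survivors = ranked[: POP // 2]                 # rank-based selection
        children = []
        while len(survivors) + len(children) < POP:
            p1, p2 = random.sample(survivors, 2)
            # Uniform crossover: each gene drawn from one parent at random.
            child = [a if random.random() < CROSS_RATE else b
                     for a, b in zip(p1, p2)]
            # Mutation: flip each gene with small probability.
            child = [1 - g if random.random() < MUT_RATE else g for g in child]
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)
```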
Because the approach measures fitness by classification accuracy, the resulting network topology and hyperparameter selection are heavily influenced by the input dataset.
4.4 Deep convolutional neural network classification
Deep convolutional neural network classification is a powerful approach in artificial intelligence and computer vision. Convolutional Neural Networks (CNNs) are a special type of neural network architecture designed for image recognition, object detection, and classification, and they are very powerful at these tasks. Consequently, they have become popular in most domains characterized by information extraction, which follows naturally from their automatic learning of important features from images [28-30].
In this work, the focus is placed on three state-of-the-art CNN architectures: NASNet-A, AmoebaNet-A, and EfficientNet-B7. These architectures have shown excellent results in image classification and have been widely used to solve a range of difficult problems.
4.4.1 NASNet-A
The Neural Architecture Search Network Type A (NASNet-A) is a deep convolutional neural network whose design was discovered by a neural architecture search algorithm [31]. It represents a forward-looking approach in which the CNN's network architecture is designed automatically to attain the highest level of performance on tasks such as image classification.
The central objective of NASNet-A was to develop an approach that leverages reinforcement learning to find the optimal network architecture. Instead of employing network structures designed by people, the NASNet-A design approach allows the algorithm itself to discover the most efficient structure among a wide variety of candidates without human intervention.
The architecture search in NASNet-A involves training and testing numerous networks, exploring beyond the reach of traditionally hand-designed ones. During this phase, the algorithm improves its predictions by assigning higher probability to structures that perform better on the given task. Through these iterations, it gradually arrives at an adapted network structure by screening and modifying the most promising candidates.
The resulting NASNet-A design combines various operations such as convolutional layers, pooling layers, and skip connections. These operations are organized in a hierarchical structure and operate at different scales to capture both the local and global features present in the input data.
The final architecture demonstrates better classification behaviour and computational efficiency for image recognition tasks.
Incorporating NASNet-A into our system for restoring and recognizing ancient Arabic manuscripts is expected to boost the system's reliability and help restore and recognize the manuscripts correctly. The automated design process guarantees that the structure is tailored to the input and the demands of the task, so the NASNet-A design meets these specific requirements and outperforms manually designed architectures. Another important element is the merging of deep convolutional layers into a hierarchical structure with skip connections, which allows NASNet-A to capture the fine details and complex patterns within the ancient manuscripts, leading to more precise restoration and recognition results.
Figure 4 depicts the normal and reduction cells of NASNet-A.
Figure 4. Normal and reduction cells of NASNet-A
4.4.2 AmoebaNet-A
AmoebaNet-A is a novel convolutional network architecture discovered through Neural Architecture Search (NAS), an automated neural architecture design strategy. It is part of the AmoebaNet family of neural networks, which excel at recognizing objects in images [32].
AmoebaNet-A was developed by Google's research team using an evolutionary NAS procedure. The algorithm examined a search space of diverse architectures and picked the top one according to validation set metrics. AmoebaNet-A subsequently showed excellent results on several image classification datasets, especially on ImageNet, which is considered the standard benchmark.
The distinguishing attributes of AmoebaNet-A lie in its depth and complexity. Its main structure is a deep stack of convolutional layers used to locate hierarchical features in the input images.
After the convolutional layers, global average pooling is applied to aggregate spatial information, and classification is performed by a fully connected layer. The architecture also makes use of skip connections, which allow information to bypass certain layers and promote gradient flow during training.
AmoebaNet-A can extract information from images of old Arabic manuscripts that is useful for restoration and identification. The deep convolutional layers can capture subtle details and patterns, while the skip connections help information flow and enhance the learning process. With the addition of AmoebaNet-A to the restoration pipeline, better results and a higher overall accuracy rate can be achieved.
Figure 5 illustrates the architecture of AmoebaNet-A.
Figure 5. Normal and reduction cells of AmoebaNet-A
4.4.3 EfficientNet-B7
EfficientNet-B7 is a modern neural network architecture that has achieved great success in image classification tasks. It belongs to the EfficientNet family of models, known for making optimal use of computational resources while delivering high accuracy.
EfficientNet-B7 is built on a compound scaling paradigm that increases the network's depth, width, and resolution together by a fixed set of factors. This method keeps the model near-optimal as its capacity is adjusted to the available resources [33].
The EfficientNet-B7 architecture is made of blocks with multiple convolutional layers, max pooling, and fully connected layers. The model uses depthwise separable convolutions, which split the spatial and channel-wise operations, reducing computational complexity while keeping expressive power.
In this way the model improves its effectiveness through proper resource usage, allowing a tradeoff between model capacity and computational efficiency that takes both accuracy and computational overhead into account.
EfficientNet-B7's compound scaling applies uniform scaling across all dimensions of the model. In addition, it implements advanced mechanisms such as squeeze-and-excitation blocks, which let the model focus on meaningful features while paying less attention to less relevant ones, making it more discriminative and able to reveal complex patterns with high precision.
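The scaling rule itself is simple to state: with the base coefficients α=1.2, β=1.1, γ=1.15 reported in the EfficientNet paper, depth, width, and resolution grow as α^φ, β^φ, γ^φ for a compound coefficient φ. The sketch below illustrates this; the base depth and resolution are nominal values, and the published B7 coefficients are hand-tuned around this rule rather than produced exactly by it.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15   # depth, width, resolution bases (EfficientNet paper)

def compound_scale(phi, base_layers=18, base_res=224):
    depth = ALPHA ** phi    # multiply the number of layers
    width = BETA ** phi     # multiply the channels per layer
    res = GAMMA ** phi      # multiply the input resolution
    return round(base_layers * depth), round(width, 2), round(base_res * res)

print(compound_scale(6))    # roughly the B7 end of the family
```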
The effectiveness of EfficientNet-B7 makes it useful in many aspects of ancient Arabic manuscript restoration. Through high-level feature extraction, it can, to a large extent, restore the degraded visual content of the manuscripts: it can improve the readability of the text, suppress noise and artefacts, and recover lost or blurred regions.
Figure 6 illustrates the architecture of EfficientNet-B7. The EfficientNet-B7, NASNet-A, and AmoebaNet-A architectures were selected for the proposed method because their notable features align with the specific requirements of the task.
Figure 6. Architecture of EfficientNet-B7
EfficientNet-B7 is well known for its efficacy and scalability, making it well suited to large-scale image processing. NASNet-A, whose architecture is discovered automatically, offers the greatest adaptability. AmoebaNet-A exhibits a high degree of architectural diversity, a precondition for sophisticated tasks such as pattern recognition. These distinctive attributes made the three models suitable choices for Arabic manuscript restoration, in terms of efficiency, adaptability, and pattern recognition, and explain why these particular models were used in the analysis.
The experiments were implemented in Python using the open-source Spyder IDE, on a 64-bit personal computer with an Intel Core i7 processor running at 3.0 GHz and 16 GB of RAM.
5.1 The datasets
Obtaining primary research data on historic Arabic manuscripts is difficult because few databases exist and most are not freely accessible. To overcome this obstacle, a dataset was built manually for this research. The dataset comprises 23 folders containing a total of 3,745 ancient Arabic manuscript images. Each image measures 6584×4653 pixels, giving a total data size of 0.025 TB. The manuscripts in the corpus span different periods, writing styles, and authors, making the dataset rich and varied and therefore appropriate for the proposed research. The collection is organized so that each manuscript is tagged to its proper folder for easy identification. The digital photographs reproduce the fine detail of the codices and will serve as a valuable asset for further research and more in-depth inquiry into ancient Arabic manuscripts.
5.2 Data augmentation
The restoration of old Arabic manuscripts gives rise to many challenges stemming directly from the natural degradation and noise that accompany such ancient documents. To address these problems and make the restoration process more efficient, data augmentation techniques were implemented. This paper discusses four data augmentation techniques engineered specifically for the reconstruction of ancient Arabic manuscripts.
In addition, data augmentation enlarged the dataset from the base level of 3,745 images to 55,041 images. This significant expansion yields broader coverage of manuscript types, writers, and degradation patterns. With a broad and comprehensive dataset, the restoration models are better able to grasp and adjust to the complex and particular features of old Arabic manuscripts.
The use of these data augmentation techniques, together with the expanded dataset, enables the restoration algorithms to attain outstanding accuracy in restoring and conserving ancient Arabic manuscripts. This study thereby contributes to the field of document conservation, adding to the knowledge base that assists scholars, researchers, and conservationists in their efforts to preserve these priceless cultural artifacts.
5.2.1 Gaussian noise augmentation
Gaussian noise is added to the images to emulate the imperfections and errors characteristic of old manuscripts. By training on images corrupted with random Gaussian noise, the restoration model becomes more robust and handles degraded, noisy manuscripts much better. Because this technique closely mimics the actual condition of aged manuscripts, it helps the model learn to see through the noise and reproduce the original text.
5.2.2 Blurring augmentation
Slightly blurring an image has an effect similar to what happens to ancient manuscripts as they age and deteriorate. For blurred reconstruction, the model applies random blurring filters of varying strengths through convolutional operations that reduce the image's sharpness. This makes the model similarly effective in dealing with faded or damaged areas, producing enhanced visibility and easier comprehension.
5.2.3 Contrast and brightness adjustments
Adjusting the contrast and brightness parameters of the photographs simulates the changes over time and the fading that are common in ancient manuscripts. The restoration model applies random perturbations to the contrast and brightness parameters during training, covering as many lighting conditions as possible while maintaining accuracy in restoring the text. This method preserves the individual character of each document while improving it.
5.2.4 Cutout augmentation
Cutout augmentation is an effective and pivotal technique that deserves inclusion among the data augmentation methods for digitally restoring ancient Arabic manuscripts. It consists of randomly placing a mask over a rectangular region of the image, mimicking cases where parts of the manuscript are missing or damaged.
Cutting this information out of the input forces the model to reconstruct the gaps in the text. This teaches the models to recover the overall structure and main flow of the document even when many elements are missing or damaged.
The use of cutout augmentation enhances the model's ability to effectively handle significant damages, such as pages that have been torn or are missing. This is achieved by motivating the restoration model to swiftly retrieve the absent content and maintain the coherence of the text.
Cutout augmentation, alongside Gaussian noise, blurring, and contrast changes, further improves the flexibility and capacity of the restoration model to deal with the various degradation patterns found in ancient Arabic manuscripts. Together, these varied augmentations embed in the model an appreciation of the structure and historical value of the original manuscripts, supporting the faithful reconstruction and preservation of the text.
These data augmentation strategies represent recent approaches for the restoration of ancient Arabic writings; the parameters for each strategy are presented in Table 1.
Table 1. Data augmentation parameters

| | Gaussian Noise | Blurring | Contrast and Brightness | Cutout |
|---|---|---|---|---|
| Technique 1 | 0.0015 | 0.0035 | 1.0 | 20×20 |
| Technique 2 | 0.0020 | 0.0040 | 1.2 | 25×25 |
| Technique 3 | 0.0025 | 0.0045 | 1.4 | 30×30 |
| Technique 4 | 0.0030 | 0.0050 | 1.6 | 35×35 |
| Technique 5 | 0.0035 | 0.0055 | 1.8 | 40×40 |
The table details the parameters used for each data augmentation technique. These parameters were chosen according to their effectiveness in dealing with the typical deterioration patterns observed in ancient Arabic manuscripts, with each technique assigned specific values that set the scope of its impact. The values govern the image degradation factors: the amount of Gaussian noise, the strength of the blur, the range of the contrast and brightness transformations, and the dimensions of the Cutout mask. They were selected with the aim of realizing the greatest benefit in the restoration process.
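A minimal sketch of one augmentation pass, using Technique 1's values from Table 1, is shown below. Reading the noise value as a variance and the blur value as a sigma proportional to image width is our assumption, since the table does not fix the units.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)

def augment(img, noise_var=0.0015, blur=0.0035, contrast=1.0, cutout=(20, 20)):
    out = img.astype(np.float32) / 255.0
    out = out + rng.normal(0.0, np.sqrt(noise_var), out.shape)  # Gaussian noise
    out = gaussian_filter(out, sigma=blur * out.shape[1])       # blurring
    out = np.clip(out * contrast, 0.0, 1.0)                     # contrast/brightness
    h, w = cutout                                               # Cutout mask
    y = int(rng.integers(0, out.shape[0] - h))
    x = int(rng.integers(0, out.shape[1] - w))
    out[y:y + h, x:x + w] = 0.0
    return (out * 255).astype(np.uint8)
```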
Furthermore, Figure 7 below provides a visual reference for the four data augmentation methods used in this study to assist in restoring ancient Arabic manuscripts.
Figure 7. (a) Original image, (b) Noisy image, (c) Cutout augmented image, (d) Contrast and Brightness adjusted image, (e) Blurred image
5.3 Experimental steps
Three neural network architectures, namely EfficientNet-B7, NASNet-A, and AmoebaNet-A, were used in the experimental configuration for Arabic handwriting recognition. The images were preprocessed, the models were trained, and their performance was evaluated through a series of procedures.
Initially, the images obtained from the collected database were preprocessed by adjusting their dimensions to match the specifications of each architecture: EfficientNet-B7 images were resized to 600×600 pixels, NASNet-A images to 331×331 pixels, and AmoebaNet-A images to 224×224 pixels.
Furthermore, to maintain a consistent representation across the models, the pixel values of the input images were standardized to the range [0, 255]. The dataset was divided into training, validation (evaluation), and test sets using a 70/15/15 split. This division provided a separate validation set, which was used to tune the models' parameters and improve their overall performance [34].
The dataset comprises a grand total of 55,041 images, partitioned into three distinct subsets: 38,529 images for training, 8,256 for validation, and another 8,256 for testing. This ensures that the models are trained on abundant data, validated to guard against overfitting, and tested to confirm that they generalize. The images were drawn from different sources with the deliberate aim of assembling a comprehensive and robust collection.
During training, the neural network architecture configurations and hyperparameters were specified precisely, with the training parameters chosen one by one: learning rate, batch size, number of epochs, and optimizer.
Each model was initialized with weights pretrained on the ImageNet dataset, a transfer learning strategy that is known to improve accuracy when data is limited. Early stopping with a patience of 10 epochs, based on validation performance, was used to address overfitting. Data augmentation techniques such as rotation, translation, and scaling made the models more robust to variations in handwritten text.
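A minimal Keras sketch of this transfer learning setup for EfficientNet-B7 is given below. The number of classes and the dataset pipelines are placeholders, and the loss shown is categorical cross-entropy as a stand-in for the recognition task; the restoration experiments use the perceptual loss listed in Table 2.

```python
import tensorflow as tf

NUM_CLASSES = 23   # placeholder: e.g., one class per manuscript folder

# ImageNet-pretrained backbone (transfer learning), 600x600 inputs for B7.
base = tf.keras.applications.EfficientNetB7(
    include_top=False, weights="imagenet", input_shape=(600, 600, 3))
x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
out = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = tf.keras.Model(base.input, out)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="categorical_crossentropy", metrics=["accuracy"])

# Early stopping on validation performance with a patience of 10 epochs.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True)

# train_ds and val_ds are placeholder tf.data pipelines over the 70/15 splits.
# model.fit(train_ds, validation_data=val_ds, epochs=60, callbacks=[early_stop])
```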
These settings were thoroughly examined and arranged to improve the training process. The models were trained on the training set and monitored on the validation set so their performance could be evaluated and adjustments made as needed; lastly, they were tested on the separate test set to assess their true performance on unseen data.
The table below depicts the experimental setup and training variables.
Table 2 shows the experimental configuration and training settings used in the current investigation. The model architectures studied are EfficientNet-B7, NASNet-A, and AmoebaNet-A, and the collection contains a total of 55,041 images. The models were trained for 60 epochs with a batch size of 64.
Table 2. Experimental setup and training parameters
| Parameters | Value |
|---|---|
| Training Epochs | 60 |
| Batch Size | 64 |
| Learning Rate | 0.001 |
| Optimizer | Adam |
| Loss Function | Perceptual Loss |
| Regularization | 0.0010 |
The learning rate was set to 0.001, and the Adam optimizer was utilized. The Perceptual Loss function and L2 regularization with lambda=0.001 were applied. These settings were carefully chosen to optimize the performance and accuracy of the models.
5.4 Performance analysis and evaluation of CLSR and the Wiener filter for the restoration of ancient Arabic manuscripts
In order to accurately evaluate the performance of the restoration techniques, several objective measures are employed:
PSNR (Peak Signal-to-Noise Ratio): It is a metric used to measure the quality of a reconstructed image by comparing it to the original image. It measures the ratio between the maximum possible signal power (peak signal) and the power of the noise present in the image [35].
The higher the PSNR value, the closer the reconstructed image is to the original image. It is calculated using the mean squared error (MSE) between the original and reconstructed images:
$P S N R=10 \log _{10}\left(\frac{M a x^2}{M S E}\right)$ (5)
where, Max is the maximum possible pixel value in the image (e.g., 255 for an 8-bit image); MSE is the mean squared error between the original and reconstructed images.
SSIM (Structural Similarity Index Measure): It is a metric that assesses the structural similarity between two images. It considers the luminance, contrast, and structural information in the images. SSIM ranges between -1 and 1, where 1 indicates perfect similarity. It is calculated based on three components: luminance (l), contrast (c), and structure (s):
SSIM $=(l \times c \times s)^{\left(\frac{1}{3}\right)}$ (6)
Sharpness: Sharpness represents the degree of clarity in an image. An image that is in focus and well exposed appears well defined and detailed. Sharpness can be quantified using methods such as the variance of the Laplacian operator or the gradient magnitude; these measures evaluate the high-frequency components of the image, which are indicative of sharpness.
An example of a sharpness measure is the variance of the Laplacian. It is computed by applying the Laplacian operator to the image and calculating the variance of the resulting values:
$L=\operatorname{Var}\left(\nabla^2 I\right)$ (7)
where, L is the variance of the Laplacian, $\nabla^2$ is the Laplacian operator, and I is the image.
The variance of the Laplacian can be calculated over the entire image or in a specific region, and higher values indicate sharper changes and finer details in the image, suggesting higher sharpness.
AMBE (Absolute Mean Brightness Error): It quantifies the difference in brightness between the original and reconstructed images [36]. It is calculated as the absolute difference between the mean brightness values of the two images:
$A M B E=|\operatorname{mean}(\text {original})-\operatorname{mean}(\text {reconstructed})|$ (8)
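The metrics of Eqs. (5)-(8) can be computed directly; a minimal sketch follows, with SSIM taken from scikit-image rather than implemented from Eq. (6).

```python
import numpy as np
from scipy.ndimage import laplace
from skimage.metrics import structural_similarity as ssim

def psnr(original, restored, max_val=255.0):
    mse = np.mean((original.astype(float) - restored.astype(float)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)          # Eq. (5)

def sharpness(img):
    return laplace(img.astype(float)).var()             # Eq. (7): variance of Laplacian

def ambe(original, restored):
    return abs(float(original.mean()) - float(restored.mean()))  # Eq. (8)

# SSIM (Eq. (6)) on 8-bit grayscale images:
# score = ssim(original, restored, data_range=255)
```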
Table 3 presents the results obtained from the evaluation of different models using various metrics.
Table 3. The results obtained from the evaluation of CLSR and the Wiener filter
Model | PSNR | SSIM | Sharpness | AMBE
CLSR | 29.75 | 0.93 | 0.87 | 0.24
Wiener Filter | 27.92 | 0.91 | 0.83 | 0.29
CLSR+Wiener Filter | 33.21 | 0.96 | 0.94 | 0.15
The results obtained from the application of CLSR and the Wiener filter for the restoration of ancient Arabic manuscripts show significant improvements in various evaluation metrics.
In terms of PSNR (Peak Signal-to-Noise Ratio), the combined CLSR+Wiener Filter approach performs best, reaching 33.21 compared with 29.75 for CLSR alone and 27.92 for the Wiener filter alone; the combination therefore preserves more of the image signal while suppressing noise in the restored documents. SSIM (Structural Similarity Index) shows the same trend: the combined approach scores 0.96, above CLSR (0.93) and the Wiener filter (0.91), indicating that more of the details and textures of the original are preserved in the restored image. Sharpness, which reflects how well edges and fine details are rendered, behaves similarly: the combined approach reaches 0.94, against 0.87 for CLSR and 0.83 for the Wiener filter, confirming that the combination sharpens the edges and fine details of the restored manuscripts. Finally, the AMBE (Absolute Mean Brightness Error) of 0.15 for the CLSR+Wiener Filter approach is the lowest of the three, indicating the most faithful brightness recovery.
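For readers who wish to experiment, the following is a minimal frequency-domain sketch of the two filters discussed above. It assumes a known blur kernel (PSF) and uses illustrative regularization constants; it is a sketch under those assumptions, not the exact pipeline used in this study:

```python
import numpy as np

def _otf(psf: np.ndarray, shape: tuple) -> np.ndarray:
    """Zero-pad the PSF to the image shape and shift its center to (0, 0)."""
    padded = np.zeros(shape)
    padded[:psf.shape[0], :psf.shape[1]] = psf
    shifts = (-(psf.shape[0] // 2), -(psf.shape[1] // 2))
    return np.fft.fft2(np.roll(padded, shifts, axis=(0, 1)))

def wiener_restore(blurred: np.ndarray, psf: np.ndarray, nsr: float = 0.01) -> np.ndarray:
    """Frequency-domain Wiener deconvolution; nsr is an assumed noise-to-signal ratio."""
    H = _otf(psf, blurred.shape)
    F = np.conj(H) * np.fft.fft2(blurred) / (np.abs(H) ** 2 + nsr)
    return np.real(np.fft.ifft2(F))

def cls_restore(blurred: np.ndarray, psf: np.ndarray, gamma: float = 0.01) -> np.ndarray:
    """Constrained least squares restoration with a Laplacian smoothness constraint."""
    H = _otf(psf, blurred.shape)
    lap = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]], dtype=float)
    P = _otf(lap, blurred.shape)
    F = np.conj(H) * np.fft.fft2(blurred) / (np.abs(H) ** 2 + gamma * np.abs(P) ** 2)
    return np.real(np.fft.ifft2(F))

# Usage sketch (degraded: 2-D float image; psf: estimated blur kernel):
# restored = cls_restore(degraded, psf, gamma=0.01)
# restored = wiener_restore(degraded, psf, nsr=0.01)
```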
5.5 Results for EfficientNet-B7 based model
The outcomes reported in Table 4 and Table 5 for the EfficientNet-B7 architecture demonstrate the effect of GA optimization on model performance. Table 4 shows precision rising, loss falling, and validation accuracy, recall, and F1 score increasing as the epochs progress; performance nevertheless remains somewhat below that obtained when the genetic algorithm was used for optimization. With GA optimization (Table 5), the model improves across all metrics, with precision, loss, validation accuracy, recall, and F1 score all showing significant gains as training progresses.
Table 4. The results for EfficientNet-B7 architecture without GA
Epoch | Precision | Loss | Validation Accuracy | Recall | F1 Score
15 | 80.15% | 19.85% | 91.37% | 78.32% | 0.818
30 | 88.26% | 11.74% | 93.92% | 85.25% | 0.889
45 | 92.39% | 7.61% | 95.84% | 90.47% | 0.928
60 | 95.57% | 4.43% | 97.08% | 93.61% | 0.949
Table 5. The results for EfficientNet-B7 architecture with GA
Epoch | Precision | Loss | Validation Accuracy | Recall | F1 Score
15 | 83.50% | 16.50% | 92.70% | 80.60% | 0.86
30 | 92.20% | 7.80% | 95.40% | 88.80% | 0.92
45 | 96.10% | 3.90% | 97.10% | 94.50% | 0.96
60 | 98.10% | 1.90% | 98.40% | 96.80% | 0.98
5.6 Results for NASNet-A architecture
Table 6 presents the results of the NASNet-A design without Genetic Algorithm optimization.
Table 6. The results for NASNet-A architecture without GA
Epoch | Precision | Loss | Validation Accuracy | Recall | F1 Score
15 | 80.25% | 18.65% | 91.93% | 78.47% | 0.812
30 | 88.42% | 11.22% | 94.26% | 85.03% | 0.885
45 | 92.55% | 7.15% | 95.69% | 90.70% | 0.926
60 | 94.73% | 5.27% | 96.75% | 92.83% | 0.946
When comparing the results of NASNet-A architecture before and after Genetic Algorithm (GA) optimization, an improvement in model performance is observed. Table 6 presents the results before optimization (without GA), indicating a gradual increase in precision and validation accuracy during the training period. After 60 epochs, the precision reaches 94.73% and the validation accuracy reaches 96.75%.
After applying the Genetic Algorithm (GA) to optimize the model, a significant improvement is observed in the results. In Table 7, the results after optimization (with GA) show a notable improvement in precision and validation accuracy, reaching 96.90% in precision and 97.70% in validation accuracy after 60 epochs.
Table 7. The results for NASNet-A architecture with GA
Epoch | Precision | Loss | Validation Accuracy | Recall | F1 Score
15 | 82.40% | 17.60% | 92.30% | 79.60% | 0.84
30 | 90.80% | 9.20% | 95.80% | 87.10% | 0.91
45 | 94.50% | 5.50% | 96.50% | 92.80% | 0.95
60 | 96.90% | 3.10% | 97.70% | 95.20% | 0.97
5.7 Results for AmoebaNet-A architecture
The results for the AmoebaNet-A architecture presented in Table 8 and Table 9 demonstrate the impact of Genetic Algorithm (GA) optimization on the model's performance.
In Table 8, the results without GA optimization show that the model achieves promising precision and validation accuracy, gradually improving with each epoch. After 60 epochs, the precision reaches 98.29%, and the validation accuracy reaches 98.01%.
Table 8. The results for AmoebaNet-A architecture without GA
Epoch | Precision | Loss | Validation Accuracy | Recall | F1 Score
15 | 86.89% | 13.11% | 92.41% | 83.23% | 0.869
30 | 93.95% | 6.05% | 95.70% | 90.08% | 0.939
45 | 97.11% | 2.89% | 97.08% | 93.39% | 0.971
60 | 98.29% | 1.71% | 98.01% | 95.42% | 0.983
Table 9. The results for AmoebaNet-A architecture with GA
Epoch | Precision | Loss | Validation Accuracy | Recall | F1 Score
15 | 88.33% | 11.67% | 93.73% | 85.53% | 0.883
30 | 95.15% | 4.85% | 96.84% | 92.58% | 0.953
45 | 98.07% | 1.93% | 98.34% | 95.73% | 0.979
60 | 99.23% | 0.77% | 99.13% | 97.92% | 0.993
After applying GA optimization to the AmoebaNet-A architecture, the results in Table 9 show a significant improvement in performance.
The precision and validation accuracy both show substantial increases, reaching 99.23% and 99.13% respectively after 60 epochs. This indicates that the GA optimization has effectively enhanced the model's ability to recognize Arabic handwriting.
5.8 The relevance and influence of Genetic Algorithm optimization
The Genetic Algorithm (GA) was used in our research because of its ability to efficiently improve deep neural network models for the particular objective of restoring ancient Arabic manuscripts. GAs are exceptionally proficient at exploring complicated, high-dimensional parameter spaces, making them well suited to augmenting the performance of models with complex structures and hyperparameters. We used the GA to conduct an automated, methodical search for the most effective neural network structures, hyperparameters, and weights. This procedure yielded significant enhancements in the precision, loss, validation accuracy, recall, and F1 score of the models, augmenting their capacity to effectively identify and recover ancient Arabic handwritten text.
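A schematic of this kind of GA search loop is sketched below. The hyperparameter space, population size, and selection scheme are illustrative, and the fitness stub stands in for training a candidate model and reading off its validation accuracy:

```python
import random

# Hypothetical discrete search space; the actual space in the study may differ.
SPACE = {"lr": [1e-2, 1e-3, 1e-4], "dropout": [0.2, 0.3, 0.5], "units": [128, 256, 512]}

def random_genome():
    return {k: random.choice(v) for k, v in SPACE.items()}

def crossover(a, b):
    # Uniform crossover: each gene comes from one of the two parents.
    return {k: random.choice([a[k], b[k]]) for k in SPACE}

def mutate(g, rate=0.2):
    # Each gene is re-sampled from its allowed values with probability `rate`.
    return {k: (random.choice(SPACE[k]) if random.random() < rate else v)
            for k, v in g.items()}

def fitness(genome):
    # Stub: a real pipeline would train the candidate network and return its
    # validation accuracy (and would cache the score per genome).
    return random.random()

population = [random_genome() for _ in range(10)]
for _ in range(5):  # generations
    scored = sorted(population, key=fitness, reverse=True)
    parents = scored[:4]  # elitist selection: keep the fittest
    offspring = [mutate(crossover(*random.sample(parents, 2)))
                 for _ in range(len(population) - len(parents))]
    population = parents + offspring
best = max(population, key=fitness)
```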
5.9 The impact of CLSR and Wiener filter restoration techniques
Table 10 compares the evaluation parameters obtained on non-restored text with those obtained on text restored using the CLSR+Wiener Filter technique, for all three architectures.
Table 10. Comparison of evaluation parameters for non-restored text and restored text (CLSR+Wiener filter)
Evaluation Parameter | EfficientNet-B7 | NASNet-A | AmoebaNet-A
Non-Restored Text
Precision | 88.99% | 88.75% | 91.10%
Validation Accuracy | 89.28% | 89.54% | 91.01%
Recall | 88.67% | 87.06% | 89.77%
F1 Score | 89.86% | 88.85% | 91.14%
Restored Text (CLSR+Wiener Filter)
Precision | 98.10% | 96.90% | 99.23%
Validation Accuracy | 98.40% | 97.70% | 99.13%
Recall | 96.80% | 95.20% | 97.92%
F1 Score | 98.00% | 97.00% | 99.30%
For non-restored text, all three architectures (EfficientNet-B7, NASNet-A, and AmoebaNet-A) achieve high precision, validation accuracy, recall, and F1 score; AmoebaNet-A is nonetheless the best-performing architecture on every measure.
AmoebaNet-A's consistent advantage over the other architectures most likely stems from its architectural design and implementation, which appear well suited to processing the complex features and patterns found in ancient Arabic manuscripts. Its capacity to preserve and reproduce fine features and attributes probably contributes to its higher effectiveness.
For restored text, the CLSR+Wiener Filter pipeline improves nearly every evaluation criterion, reflecting both the quality of the restoration and the accuracy with which the restored text is recognized; AmoebaNet-A again records the best precision, validation accuracy, and F1 score.
These results demonstrate the power and effectiveness of the CLSR+Wiener Filter method in enhancing the restoration and recognition architectures for Arabic handwritten text. The restoration yields higher precision, validation accuracy, recall, and F1 score, which in turn improves the reliability and accuracy of recognition on the restored text.
Our study has several limitations that merit discussion. First, data availability remains a constraint: although we sought the most diverse selection of ancient Arabic manuscripts possible, assembling a truly comprehensive collection with a broad variety of content is infeasible, which may limit the completeness of our models. Second, training deep models such as EfficientNet-B7, NASNet-A, and AmoebaNet-A is resource-intensive, demanding substantial time, processing power, and memory; researchers without access to comparable resources may find our study difficult to reproduce. Finally, deep models are sensitive to their architecture and parameter choices, which can hinder generalization: changes in design or configuration can introduce new errors or, conversely, open room for further optimization.
These limitations underline the need for continued research in this field. Researchers should bear them in mind and work to overcome them in order to further improve the restoration and preservation of old Arabic manuscripts.
Table 11 provides a detailed comparison of the proposed approach with other techniques employed in studies related to the current topic. The results achieved by these existing methods are set against the performance of the proposed method, which incorporates deep learning models.
Table 11. Comparative evaluation of proposed approach and state-of-the-art methods
Methods | Year | Deep Learning Features | Dataset | Results
[37] | 2015 | Multi-headed recurrent neural network (RNN) | PAN 2014 | Higher than 80% AUC
[38] | 2017 | Combination of deep learning and a TF-IDF model | Four tweet collections from Twitter | 64% Arabic author identification accuracy
[39] | 2018 | Four deep learning models: sentence-level GRU, article-level GRU, article-level LSTM, and article-level Siamese network | (Reuters, 5000); (Gutenberg, 1286) | Article-level GRU recorded 69.1% and 89.2% accuracy on the Reuters and Gutenberg datasets, respectively
[40] | 2018 | Three methods: 1) baseline, 2) linear adaptive, and 3) deep adaptive learning | (CVL, 99513); (IAM, 49625) | Deep adaptive learning achieved 78.6% and 69.5% top-1 and 93.7% and 86.1% top-5 recognition rates on the CVL and IAM datasets, respectively
[23] | 2020 | Transfer learning from MobileNetV1-100-244 | Collected ancient Arabic manuscripts (8638) | 95.59% accuracy
[23] | 2020 | Transfer learning from ResNetV2-50 | Collected ancient Arabic manuscripts (8638) | 96.23% accuracy
[23] | 2020 | Transfer learning from DenseNet201 | Collected ancient Arabic manuscripts (8638) | 95.83% accuracy
[23] | 2020 | Transfer learning from VGG19 | Collected ancient Arabic manuscripts (8638) | 95.91% accuracy
Proposed method | 2023 | CLSR+Wiener Filter and GA for EfficientNet-B7 | Collected ancient Arabic manuscripts (3745) | 98.40% accuracy
Proposed method | 2023 | CLSR+Wiener Filter and GA for NASNet-A | Collected ancient Arabic manuscripts (3745) | 97.70% accuracy
Proposed method | 2023 | CLSR+Wiener Filter and GA for AmoebaNet-A | Collected ancient Arabic manuscripts (3745) | 99.13% accuracy
It is important to note that the primary reason for the limited direct comparison with domestic and foreign similar research is the scarcity of papers specifically addressing image restoration. Most existing research in this domain focuses on text recognition and retrieval rather than the restoration of degraded and noisy images. This makes it challenging to find directly comparable studies. However, our approach demonstrates significant advancements in the restoration of ancient Arabic manuscripts, which is a relatively unexplored area, providing a unique contribution to the field.
Additionally, the differences in datasets used, the number of images, evaluation parameters, and the approach of each paper (recognition, retrieval, or restoration) should be considered. Our research is particularly focused on restoration, a field that has seen less attention compared to recognition and retrieval. This distinction highlights the novelty and importance of our work in the preservation of cultural heritage.
Table 11 illustrates the comparative assessment of the proposed framework against existing techniques in the domain of Arabic manuscript conservation. These findings clearly indicate that the proposed approach is superior in both performance and effectiveness.
The CLSR and Wiener Filter approaches, in addition to the genetic algorithm (GA), were used in the methodology for improving the performance of EfficientNet-B7, NASNet-A, and AmoebaNet-A. The proposed method surpasses the performance of previous approaches, as evidenced by the validation accuracy rates ranging from 97.70% to 99.13%. This substantial improvement showcases the efficacy of the approach in accurately reconstructing and preserving the cultural heritage embedded in these manuscripts.
By harnessing the power of deep learning and integrating it with advanced image restoration techniques, the challenges posed by degraded and noisy manuscript images have been effectively addressed. The approach employed not only improves the visual quality and legibility of the manuscripts but also offers valuable resources for researchers and scholars in their academic pursuits.
Additionally, it's important to note that the variations in datasets, features, and methods used across these studies could affect the results. However, the consistent high performance of our proposed method indicates its effectiveness in addressing the unique challenges of restoring and preserving ancient Arabic manuscripts.
Finally, this study puts forward a rigorous conservation method for the revival of old Arabic documents. Through data augmentation involving Gaussian noise, blurring, contrast and brightness adjustments, and cutout augmentation, together with Genetic Algorithm optimization of the deep learning architectures, the level of accomplishment in manuscript restoration has increased markedly.
Implemented together, these techniques produced better detail, reduced noise, improved image quality, and easier-to-read restored manuscripts (a sketch of such an augmentation pass is given below). The large database, an amalgamation of highly diverse and representative samples, forms the basis for deep analysis and thorough investigation of old Arabic documents.
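As an illustration, a minimal augmentation pass combining these techniques might look as follows, assuming 8-bit grayscale pages larger than the 32×32 cutout patch; all parameter ranges here are illustrative rather than the study's exact settings:

```python
import numpy as np
import cv2

def augment(img: np.ndarray, rng=None) -> np.ndarray:
    """One illustrative augmentation pass over an 8-bit grayscale page."""
    if rng is None:
        rng = np.random.default_rng()
    out = img.astype(np.float32)
    out += rng.normal(0.0, 8.0, out.shape).astype(np.float32)  # Gaussian noise
    out = cv2.GaussianBlur(out, (3, 3), 0)                     # blurring
    alpha = rng.uniform(0.8, 1.2)                              # contrast factor
    beta = rng.uniform(-20.0, 20.0)                            # brightness shift
    out = alpha * out + beta
    y = int(rng.integers(0, out.shape[0] - 32))                # cutout: zero a patch
    x = int(rng.integers(0, out.shape[1] - 32))
    out[y:y + 32, x:x + 32] = 0
    return np.clip(out, 0, 255).astype(np.uint8)
```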
The results underscore the rich cultural heritage and historical significance these manuscripts carry, and the urgency of their protection and restoration. By combining the abilities of deep learning models and image restoration techniques, the approach not only improves the quality of the manuscripts but also offers an invaluable resource for scholars, researchers, and preservationists in their studies and efforts to preserve these irreplaceable cultural treasures.
The achieved accuracies of 97.70% for NASNet-A, 98.40% for EfficientNet-B7, and 99.13% for AmoebaNet-A confirm the effectiveness of the suggested approach. These results surpass the performance of prevailing methods and make a meaningful contribution to Arabic manuscript restoration.
Beyond these figures, the broader significance of this investigation deserves a clear statement: it has the potential to facilitate a far-reaching change in the field of Arabic manuscript restoration. The improved practices are more accurate, efficient, and usable, enabling experts and academics to develop better restoration techniques and to draw more meaning from historical pieces and cultural heritage. The implications of this research also extend beyond Arabic manuscripts to other languages and cultural areas, making it relevant both for academia and for society as a whole.
Alongside the study's strengths, its weaknesses must also be acknowledged. The contribution of this research to the restoration of Arabic manuscripts is undeniable, but it faces constraints related to the size of the database and various sources of bias, and it may not apply to highly corrupted manuscripts. Recognizing these shortcomings strengthens the scientific credibility of the research and keeps future exploration in the field properly focused; it corresponds to an authentic and responsible standard of academic work.
The results of this research suggest several directions for future study that would advance the restoration of ancient Arabic manuscripts. The most relevant first step is to examine more sophisticated restoration methods capable of handling heavily deformed or deteriorated manuscripts, enabling appropriate treatment of the most challenging cases. Combining deep learning models with other image restoration algorithms, or hybrid variants of both, could make restoration more precise and faster. Comparative performance studies of different deep learning architectures and optimization settings, such as the Genetic Algorithm, would also support the development of more effective and robust models for manuscript restoration. Increasing the size and variety of the data, including rare and remarkable manuscripts, would make the models, and hence the restoration results, more representative and general. Pursuing these directions will help preserve and understand the civilization of the ancient Arabic book.
Future research should also account for manuscripts that are highly deteriorated or destroyed, whose investigation may be considerably more complicated. Extending the analysis to appraise the efficiency or failure of current restoration approaches on manuscripts of remarkable age, whose texts have long been subject to wear and tear, offers a valuable opportunity to devise proper recovery methods for the most damaged manuscripts, so that this cultural heritage is protected and kept for future generations.
Similarly, while the future-research discussion lays out general areas of study, posing specific research questions or hypotheses would give subsequent work a more precise direction. Examples include investigating the feasibility of highly sophisticated restoration methods or of coupling deep learning with classical image processing, how such approaches should be optimized given the peculiar characteristics of different manuscripts, and which groups of historical documents would benefit most from the proposed methodology.
Last but not least, this work should be viewed in its broader societal context. The use of state-of-the-art technology and deep learning models for Arabic handwriting restoration can set a precedent for similar techniques applied to historical artefacts regardless of language or cultural background. Beyond manuscript conservation, the research has potential in interdisciplinary settings such as historical research, museum conservation, and digital archiving, where such applications could reshape how knowledge of the past is preserved and transmitted internationally.
With regard to future prospects for ancient Arabic manuscript restoration, balancing practical outcomes against potential obstacles is an important goal. One key challenge is how readily the proposed restoration models can be adapted to diverse manuscript conditions and varieties: although the present research demonstrates high efficiency and accuracy, practical deployment may encounter unexpected difficulties, since degradation levels, writing styles, and historical periods vary widely. Deploying the models in real environments, such as cultural heritage centers or archives, also raises practical, technical, and ethical concerns. Making the restoration tools broadly accessible, with attention to usability and sustainability, is the first step toward achieving restoration goals sustainably. Collaboration among researchers, technologists, cultural heritage experts, and community participants is indispensable for resolving these concerns and realizing the social value of restoring ancient manuscripts; proactive, interdisciplinary engagement is the key to overcoming barriers and extending the use of advanced restoration technologies to preserve and promote cultural heritage for future generations.
[1] ElAdel, A., Ejbali, R., Zaied, M., Amar, C.B. (2015). Dyadic multi-resolution analysis-based deep learning for Arabic handwritten character classification. In 2015 IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI), Vietri sul Mare, Italy, pp. 807-812. https://doi.org/10.1109/ictai.2015.119
[2] Najadat, H.M., Alshboul, A.A., Alabed, A.F. (2019). Arabic handwritten characters recognition using convolutional neural network. In 2019 10th International Conference on Information and Communication Systems (ICICS), Irbid, Jordan, pp. 147-151. https://doi.org/10.1109/iacs.2019.8809122
[3] Ilyas, B.R., Abderrazak, T.A., Sofiane, B.M., Bahidja, B., Imane, H., Miloud, K. (2023). A robust-facial expressions recognition system using deep learning architectures. In 2023 International Conference on Decision Aid Sciences and Applications (DASA), Annaba, Algeria, pp. 541-546. https://doi.org/10.1109/dasa59624.2023.10286798
[4] Bendjillali, R.I., Beladgham, M., Merit, K. (2019). Face recognition based on DWT feature for CNN. In Proceedings of the 9th International Conference on Information Systems and Technologies, Cairo, Egypt, p. 10. https://doi.org/10.1145/3361570.3361584
[5] Zoph, B., Le, Q.V. (2016). Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578. https://doi.org/10.48550/arXiv.1611.01578
[6] Zhi, H., Liu, S. (2019). Face recognition based on genetic algorithm. Journal of Visual Communication and Image Representation, 58: 495-502. https://doi.org/10.1016/j.jvcir.2018.12.012
[7] Xie, L., Yuille, A. (2017). Genetic CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, pp. 1379-1388. https://doi.org/10.1109/iccv.2017.154
[8] Deborah, H., Arymurthy, A.M. (2010). Image enhancement and image restoration for old document image using genetic algorithm. In 2010 Second International Conference on Advances in Computing, Control, and Telecommunication Technologies, Jakarta, Indonesia, pp. 108-112. https://doi.org/10.1109/act.2010.24
[9] Potrus, M.Y., Ngah, U.K., Ahmed, B.S. (2014). An evolutionary harmony search algorithm with dominant point detection for recognition-based segmentation of online Arabic text recognition. Ain Shams Engineering Journal, 5(4): 1129-1139. https://doi.org/10.1016/j.asej.2014.05.003
[10] Khayyat, M.M., Elrefaei, L.A. (2020). Manuscripts image retrieval using deep learning incorporating a variety of fusion levels. IEEE Access, 8: 136460-136486. https://doi.org/10.1109/ACCESS.2020.3010882
[11] Radenović, F., Tolias, G., Chum, O. (2018). Fine-tuning CNN image retrieval with no human annotation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(7): 1655-1668. https://doi.org/10.1109/tpami.2018.2846566
[12] Zhou, W., Jia, J. (2019). A learning framework for shape retrieval based on multilayer perceptrons. Pattern Recognition Letters, 117: 119-130. https://doi.org/10.1016/j.patrec.2018.09.005
[13] Seddati, O., Dupont, S., Mahmoudi, S., Parian, M. (2017). Towards good practices for image retrieval based on CNN features. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy, pp. 1246-1255. https://doi.org/10.1109/iccvw.2017.150
[14] Peng, Y., Zhang, J., Ye, Z. (2019). Deep reinforcement learning for image hashing. IEEE Transactions on Multimedia, 22(8): 2061-2073. https://doi.org/10.1109/tmm.2019.2951462
[15] Qian, C., He, T., Zhang, R. (2017). Deep learning based authorship identification. Report, Stanford University, 1-9.
[16] You, R., Zhang, Z., Wang, Z., Dai, S., Mamitsuka, H., Zhu, S. (2019). Attentionxml: Label tree-based attention-aware deep model for high-performance extreme multi-label text classification. Advances in Neural Information Processing Systems, 32: 1-17.
[17] Elnagar, A., Al-Debsi, R., Einea, O. (2020). Arabic text classification using deep learning models. Information Processing & Management, 57(1): 102121. https://doi.org/10.1016/j.ipm.2019.102121
[18] Liu, G., Guo, J. (2019). Bidirectional LSTM with attention mechanism and Convolutional layer for text classification. Neurocomputing, 337: 325-338. https://doi.org/10.1016/j.neucom.2019.01.078
[19] Du, J., Gui, L., Xu, R., He, Y. (2018). A Convolutional attention model for text classification. In Natural Language Processing and Chinese Computing: 6th CCF International Conference, NLPCC 2017, Dalian, China, pp. 183-195. https://doi.org/10.1007/978-3-319-73618-1_16
[20] Liu, T., Yu, S., Xu, B., Yin, H. (2018). Recurrent networks with Attention and Convolutional networks for sentence representation and classification. Applied Intelligence, 48: 3797-3806. https://doi.org/10.1007/s10489-018-1176-4
[21] Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., Hovy, E. (2016). Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1480-1489. https://doi.org/10.18653/v1/n16-1174
[22] Gao, S., Ramanathan, A., Tourassi, G. (2018). Hierarchical Convolutional Attention Networks for text classification. In Proceedings of The Third Workshop on Representation Learning for NLP, Melbourne, Australia, pp. 11-23. https://doi.org/10.18653/v1/w18-3002
[23] Khayyat, M.M., Elrefaei, L.A. (2020). Towards author recognition of ancient Arabic manuscripts using deep learning: A transfer learning approach. International Journal of Computing and Digital Systems, 9(5): 1-18. https://doi.org/10.12785/ijcds/090502
[24] Miloud, K., Lakhdar, A.M., Ilyas, B.R. (2021). Arabic handwriting recognition system based on genetic algorithm and deep CNN architectures. In 2021 International Conference on Decision Aid Sciences and Application (DASA), Sakheer, Bahrain, pp. 583-586. https://doi.org/10.1109/dasa53625.2021.9682380
[25] Yoo, J.C., Ahn, C.W. (2014). Image restoration by blind-Wiener filter. IET Image Processing, 8(12): 815-823. https://doi.org/10.1049/iet-ipr.2013.0693
[26] Zhu, J., Wang, Z., Tang, Q. (2023). Image restoration based on wiener filter and constrained least square filter. In 2023 IEEE International Conference on Control, Electronics and Computer Technology (ICCECT), Jilin, China, pp. 565-569. https://doi.org/10.1109/iccect57938.2023.10141265
[27] Lee, S., Kim, J., Kang, H., Kang, D.Y., Park, J. (2021). Genetic algorithm based deep learning neural network structure and hyperparameter optimization. Applied Sciences, 11(2): 744. https://doi.org/10.3390/app11020744
[28] Bendjillali, R.I., Beladgham, M., Merit, K., Taleb-Ahmed, A. (2020). Illumination-robust face recognition based on deep convolutional neural networks architectures. Indonesian Journal of Electrical Engineering and Computer Science, 18(2): 1015-1027. https://doi.org/10.11591/ijeecs.v18.i2.pp1015-1027
[29] Ilyas, B.R., Mohammed, B., Khaled, M., Miloud, K. (2019). Enhanced face recognition system based on deep CNN. In 2019 6th International Conference on Image and Signal Processing and their Applications (ISPA), Mostaganem, Algeria, pp. 1-6. https://doi.org/10.1109/ispa48434.2019.8966797
[30] Ilyas, B.R., Mohammed, B., Khaled, M., Ahmed, A.T., Ihsen, A. (2019). Facial expression recognition based on DWT feature for deep CNN. In 2019 6th International Conference on Control, Decision and Information Technologies (CoDIT), Paris, France, pp. 344-348. https://doi.org/10.1109/codit.2019.8820410
[31] Zoph, B., Vasudevan, V., Shlens, J., Le, Q.V. (2018). Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, pp. 8697-8710. https://doi.org/10.1109/cvpr.2018.00907
[32] Real, E., Aggarwal, A., Huang, Y., Le, Q.V. (2019). Regularized evolution for image classifier architecture search. In Proceedings of the AAAI Conference on Artificial Intelligence, 33(1): 4780-4789. https://doi.org/10.1609/aaai.v33i01.33014780
[33] Huang, Y., Cheng, Y., Bapna, A., Firat, O., Chen, D., Chen, M., Lee, H., Ngiam, J., Le, Q.V., Wu, Y., Chen, Z. (2019). Gpipe: Efficient training of giant neural networks using pipeline parallelism. Advances in Neural Information Processing Systems, 32.
[34] Bendjillali, R.I., Bendelhoum, M.S., Tadjeddine, A.A., Kamline, M. (2023). Deep learning-powered beamforming for 5G massive MIMO systems. Journal of Telecommunications and Information Technology, 4: 38-45. https://doi.org/10.26636/jtit.2023.4.1332
[35] Ilyas, B.R., Mohammed, B., Khaled, M., Ahmed, A.T. (2018). Wavelet-Based facial recognition. In 2018 6th International Conference on Control Engineering & Information Technology (CEIT), Istanbul, Turkey, pp. 1-6. https://doi.org/10.1109/ceit.2018.8751751
[36] Bendjillali, R.I., Beladgham, M., Merit, K., Taleb-Ahmed, A. (2019). Improved facial expression recognition based on DWT feature for deep CNN. Electronics, 8(3): 324. https://doi.org/10.3390/electronics8030324
[37] Bagnall, D. (2015). Author identification using multi-headed recurrent neural networks. arXiv preprint arXiv:1506.04891. https://doi.org/10.48550/arXiv.1506.04891
[38] Schaetti, N. (2017). UniNE at CLEF 2017: TF-IDF and Deep-Learning for author profiling: Notebook for PAN at CLEF 2017. In CEUR Workshop Proceedings, pp. 1-11.
[39] Qian, C., He, T., Zhang, R. (2017). Deep learning based authorship identification. Report, Stanford University, 1-9.
[40] He, S., Schomaker, L. (2019). Deep adaptive learning for writer identification based on single handwritten word images. Pattern Recognition, 88: 64-74. https://doi.org/10.1016/j.patcog.2018.11.003