Assessing Semantic Similarity Measures and Proposing a WuP-Resnik Hybrid Metric for Enhanced Arabic Language Processing

Tahar Dilekh* Mohamed Abderrahmen Boulahia Saber Benharzallah

Computer Science Department, University of Batna 2, Fesdis, Batna 05078, Algeria

LAMIE Laboratory, University of Batna 2, Fesdis, Batna 05078, Algeria

Corresponding Author Email: tahar.dilekh@univ-batna2.dz

Pages: 1311-1322 | DOI: https://doi.org/10.18280/ria.370524

Received: 7 May 2023 | Revised: 20 September 2023 | Accepted: 26 September 2023 | Available online: 31 October 2023

© 2023 IIETA. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).


Abstract: 

The accurate quantification of semantic similarity among Arabic words presents a significant challenge in Natural Language Processing (NLP). It is critical for a wide array of text-centric applications, including recommendation systems, plagiarism detection, and information retrieval. Simplifying concepts for machine processing and unifying words with close meanings enhances performance in search and classification. This research investigates the complexities of measuring semantic similarity in Arabic, a language with distinct features such as the omission of short vowels in written text, which makes distinguishing undiacritized words challenging for computing systems. The effectiveness of various semantic similarity metrics is meticulously evaluated in this study, with a specific focus on their applicability to Arabic WordNet and English WordNet. The challenges associated with using Arabic WordNet for measuring word similarity are illuminated, and an innovative metric, integrating the Wu-Palmer and Resnik measures, is proposed to enhance result accuracy. The primary accomplishment of this research is the identification of an optimal semantic similarity metric with a reduced error rate, thereby boosting the precision of results in NLP. This advancement paves the way for more accurate semantic assessments and improved performance across a broad spectrum of applications.

Keywords: 

semantic similarity measures, ontologies, WordNet (WN), Arabic WordNet (AWN), hybrid measures, WuP measures, Resnik measures

1. Introduction

Measuring semantic similarity between words lies at the heart of natural language understanding, holding profound implications for various applications in the field of Natural Language Processing (NLP). The pursuit of a precise semantic similarity measure with minimal error rates constitutes a fundamental quest in this domain. This endeavor is not merely an academic exercise; it carries substantial significance for practical applications and the advancement of research. In this introduction, we delve into the motivations behind this research and the unique challenges posed by the Arabic language in this context.

The importance of determining the most accurate semantic similarity measure cannot be overstated, for several compelling reasons. Firstly, a measure that exhibits a low error rate provides results that align closely with human assessments, thereby enhancing the precision of NLP tasks. This heightened precision is particularly pivotal in applications where accurate semantic similarity measurements are imperative, such as a text classification application predicated on the semantic expansion approach. This knowledge-based approach involves a thorough investigation of various linguistic attributes, including the morphological, semantic, and syntactic relationships of query terms, and replaces query terms with words that possess similar contextual meanings, which helps organize textual data into categories based on their similarities [1, 2].

Furthermore, lower error rates signify a reduction in the ambiguity surrounding semantic similarity measurements. A precise measure can effectively distinguish between words or concepts that are similar and those that are dissimilar. This reduction in ambiguity translates to fewer false positives and negatives, significantly impacting the reliability of NLP systems.

The significance extends to the broader realm of research. A semantic similarity measure with a low error rate can serve as a benchmark, facilitating consistent and comparable results across different NLP studies. Researchers can confidently build upon and refine their work, fostering advancements in the field.

In real-world applications such as plagiarism detection and question-answering systems, precision is paramount. Deploying a measure with a low error rate instills confidence in these applications, where accuracy directly translates into tangible benefits.

However, despite the universal need for accurate semantic similarity measures, developing reliable metrics remains a formidable challenge, particularly for languages like Arabic. The Arabic language introduces unique complexities into the realm of NLP. One such challenge is the absence of short vowels, known as "chakla," in written texts. These short vowels, which are not part of the alphabet, play a vital role in distinguishing between words with different meanings that are otherwise spelled identically. For example, the Arabic words 'عَلَم' (flag), 'عِلْم' (knowledge), and 'عُلِمَ' (it was known) all share the same written form "علم" yet convey distinct meanings.

In light of these challenges and the overarching need for precise semantic similarity measures, this research embarks on a comprehensive investigation. Our primary objective is to identify the most accurate semantic similarity measure while minimizing error rates. This investigation encompasses a thorough evaluation of various semantic similarity metrics applied to both Arabic WordNet (AWN) and English WordNet (WN). Importantly, we acknowledge and address the inherent disparities between these linguistic resources.

Additionally, this research introduces an innovative hybrid measure, drawing inspiration from the WuP and Resnik metrics. The aim is to potentially surpass the performance of existing metrics, especially within the specific dataset employed in this study. Our research endeavors to provide a deep analysis of diverse semantic similarity measures, elucidating their strengths and limitations. We underscore the critical importance of using the correct synset, emphasizing its role in generating relevant and realistic results.

The remainder of this paper is organized as follows: We commence by delving into the intricacies of calculating semantic similarity for Arabic words and contrasting the structures of WN and AWN. Subsequently, we provide an overview of prior research in the realm of semantic similarity measures based on WN/AWN. Our experimental case study follows, wherein we conduct a rigorous comparative evaluation of various semantic similarity measures, assessing their suitability for use in both WN and AWN. Finally, we conclude by discussing the implications of our research and charting avenues for future exploration.

In summation, this research paper contributes valuable insights into the domain of semantic similarity measures and their applicability in the realm of NLP. Our innovative hybrid measure holds the potential to significantly enhance the accuracy of various applications, including information retrieval and text classification, and presents exciting opportunities for further investigation in diverse languages and domains.

2. Related Works of Semantic Similarity Measures Based on WN/AWN

An ontology is a formal representation of knowledge within a domain and of the relationships between its concepts. Ontologies are used in various fields such as the Semantic Web, Artificial Intelligence, and Biomedical Informatics, and are a useful tool for measuring semantic similarity between words. An ontology provides a shared language for modeling a domain and offers important information that cannot be obtained from simple dictionaries; it refers to the collection of concepts used to describe and represent a specific domain [3]. Gruber defines an ontology as an explicit specification of a conceptualization [4], while computer science defines it as a formal representation of knowledge in a hierarchical way [5].

There are several ontologies available, and WN is one of them: a widely used lexical database for knowledge-based semantic similarity methods in computational linguistics and natural language processing. WN is primarily based on synonyms, with different synsets attributed to words with different meanings, and it organizes nouns, verbs, adverbs, and adjectives into semantic relations, represented as a hierarchical structure. WN provides four commonly used semantic relations for nouns: hyponym/hypernym, part meronym/part holonym, member meronym/member holonym, and substance meronym/substance holonym. The most common relation is the hyponym/hypernym (is-a) relation, which accounts for close to 80% of the relations [6-9].

WN organizes concepts into a hierarchy that shows the relations between the different concepts and the types of those relations, as illustrated in Figure 1.
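The is-a hierarchy underlying most of the measures discussed below can be illustrated with a toy fragment. This is a minimal sketch: the node names and the parent map are invented for illustration, not taken from WN, but the `depth` and `lso` (lowest common subsumer) notions are exactly those the path-based measures rely on.

```python
# A toy is-a (hypernym) hierarchy: child -> parent. Node names are
# illustrative only; a real WordNet taxonomy is far larger.
HYPERNYM = {
    "object": None,          # root
    "artifact": "object",
    "instrumentality": "artifact",
    "conveyance": "instrumentality",
    "vehicle": "conveyance",
    "living_thing": "object",
    "animal": "living_thing",
}

def path_to_root(concept):
    """Return the chain of concepts from `concept` up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = HYPERNYM[concept]
    return chain

def lso(c1, c2):
    """Lowest common subsumer: the deepest shared ancestor of c1 and c2."""
    ancestors1 = path_to_root(c1)
    for ancestor in path_to_root(c2):   # walk upward from c2
        if ancestor in ancestors1:
            return ancestor
    return None

def depth(concept):
    """Depth of a concept (the root has depth 0)."""
    return len(path_to_root(concept)) - 1

print(lso("vehicle", "animal"))   # object
print(depth("vehicle"))           # 4
```

Every path-based measure in Table 1 is some function of these two quantities (plus the path length between the concepts).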

AWN is a resource for Modern Standard Arabic that is designed based on Princeton WordNet (PWN) and EuroWordNet (EWN) [10]. AWN is mapped onto the Suggested Upper Merged Ontology (SUMO) [10], a formal ontology that contains about 1000 ontology concepts, 4000 ontology axioms, and 750 ontology rules [11]. AWN contains four entity types: item, word, form, and link. The number of words increased significantly between versions 2.0 and 2.0.1 of AWN, which may affect the results of similarity measurement [10].

Figure 1. Sample hierarchical structure of fragment of WordNet

The challenge associated with the utilization of AWN primarily revolves around its limitations. AWN is relatively constrained, which is particularly notable given the vast lexical diversity within the Arabic language, surpassing that of many other languages. Additionally, the intricacies inherent in accurately pinpointing the appropriate synset for certain words compound the challenge. While methodologies exist for the identification of suitable synsets, the inherent complexity of distinguishing between closely related synsets remains a formidable obstacle for automated processes, often necessitating human intervention for resolution.

Figure 2. The different paths between the synsets of the noun "Means"

Table 1. Summary of related works on semantic similarity measures based on WN/AWN

| Category | Measures | Principle | Advantages | Disadvantages |
|---|---|---|---|---|
| Path based | The Shortest Path [12, 13], PATH measure [14] | Based only on the distance between two concepts | Simple to implement | Two pairs with equal distance between concepts receive the same similarity. |
| | Wu and Palmer [15], Almarsoomi et al. [16], Leacock and Chodorow [17] | Based on the depths of the concepts and the path length | Simple to implement | Two pairs with equal depths and path length receive the same similarity. |
| | Li et al. [18] | Based on the assumption that information sources are infinite to some extent | Simple to implement | |
| IC based [9, 19-25] | Resnik [21] | Based on the information content of the lso | Simple to implement | Two pairs with the same lso receive the same similarity. |
| | Jiang and Conrath [26], Lin [27] | Based on the information content of the concepts and their lso | Considers the IC of the compared concepts | Two pairs with the same sum IC(c1)+IC(c2) receive the same similarity. |
| | Lord et al. [28] | Based on the IC value given by Resnik | Simple to implement | Two pairs with the same lso receive the same similarity. |
| | Seco et al. [29] | Applies a linear transformation to the Jiang & Conrath formula | Considers the IC of the compared concepts | Two pairs with the same hyponymy-based information content receive the same similarity. |
| | Saruladha et al. [22] | Considers hyponymy and meronymy of concepts | Provides a corpus-independent solution to the sparse-data problem | Two pairs with the same hyponymy- and meronymy-based information content receive the same similarity. |
| | Seddiqui and Aono [24] | Considers the relation of properties, property functions, and restrictions | Considers the relation of properties | Two pairs with the same sum IC(c1)+IC(c2) and the same lso receive the same similarity. |
| | Meng and Gu [30] | Based on Lin's method | Considers the IC of the compared concepts | Two pairs with the same sum IC(c1)+IC(c2) receive the same similarity. |
| Feature-based [9, 31] | Tversky [32], Ezzikouri et al. [33] | Features shared by a subclass and its superclass contribute more to similarity than features shared in the opposite direction | Considers the features of the concept | Missing glosses in most ontologies; computational complexity. |
| | Lesk [34] | The overlap between the definitions of two concepts measures their relatedness | Can be used in conjunction with any dictionary | Missing glosses in most ontologies; computational complexity. |
| | Patwardhan and Pedersen [35], Patwardhan [36] | Uses context vectors to combine the gloss content of taxonomic concepts | Combines gloss content with statistical data extracted from a corpus | |
| | Merhbene et al. [37] | Modifies the Lesk algorithm by using different semantic similarity measures | Resolves the issue of missing glosses | |
| | Jiang et al. [38] | Common features increase the similarity value; non-common features decrease it | Determines semantic similarity based on the glosses of Wikipedia concepts | |
| | Ezzikouri et al. [33] | Common features increase the similarity value | Considers the features of the concept | |
| Hybrid method | Zhou et al. [39] | Based on path lengths between concepts and their IC | Draws on information from different categories | Depends on the categories that are combined. |
| | Aldiery [19] | Combines multiple information sources | Draws on information from different categories | |

The problem with using Arabic WordNet is that we cannot tell the difference between the synsets 'wasiylap_n2AR' and 'wasiylap_n1AR': they lie on the same path, with 'wasiylap_n2AR' directly below 'wasiylap_n1AR'. Choosing 'wasiylap_n2AR' therefore increases the depth by 1 and the distance to other synsets by 1, leading to a different similarity score, as shown in Figure 2. This makes it necessary to use WN instead of AWN in many cases.
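The sensitivity to this choice can be made concrete with the common depth-based form of the Wu-Palmer score, sim = 2·depth(lso) / (depth(c1) + depth(c2)). A minimal sketch, using the depth values later reported for the pair (وسيلة, حافلة) in Table 4 (depth(lso) = 5, depths 6 and 8); the "one level deeper" variant is an illustration of the effect described above, not a value from the paper:

```python
def wup(depth_lso, depth_c1, depth_c2):
    """Wu-Palmer similarity in its common depth-based form."""
    return 2 * depth_lso / (depth_c1 + depth_c2)

# Depths for (wasiylap_n1AR, HaAfilap_n1AR) as reported in Table 4.
print(round(wup(5, 6, 8), 2))  # 0.71

# If 'wasiylap_n2AR' (one level deeper) were chosen instead, depth(c1)
# grows by 1 while the lso stays the same, shifting the score even
# though the word pair is unchanged.
print(round(wup(5, 7, 8), 2))  # 0.67
```

The same +1 shift propagates into every path- and depth-based measure, which is why synset selection dominates the results in the experiments below.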

WN is regarded as a valuable resource for identifying semantic similarity between two words due to its organization of words according to lexical relationships. Numerous similarity measures utilizing WN have been suggested.

Generally, the conventional similarity measures fall into four categories, which include path-based measures, information content-based measures, feature-based measures, and hybrid measures.

In Table 1, we aimed to summarize the most relevant works on semantic similarity measures based on WN/AWN, highlighting the advantages and limitations of each method.

3. Experimental Case Study

The purpose of this section is to conduct a comparative evaluation of different measures of semantic similarity and assess their suitability for use in both WN and AWN. In this comparison, we will consider the differences between WN and AWN. The chosen measures will then be applied to an Arabic dataset and their results will be analyzed. The aim is to identify the measure that exhibits the best performance for automated tasks.

3.1 Arabic dataset benchmark

The dataset used in this study is the Arabic benchmark created by Faza et al. [40]. It was built following the same methods as the English benchmark datasets for semantic similarity, in two stages. In the first stage, a set of English word pairs was selected and translated into Arabic using an English-Arabic dictionary, producing sets of Arabic noun pairs ranging from high similarity of meaning (HSM) through medium similarity of meaning (MSM) to low similarity. In the second stage, the human similarity rating for each word pair was obtained: 60 participants from various Arabic countries were asked to rate the 70 Arabic word pairs gathered in the first stage, scoring each pair on a five-point scale from 0.0 (unrelated) to 4.0 (the same). This dataset is important for evaluating semantic similarity between Arabic words.

Table 2 presents a sample of the data benchmark used in our study.

Table 2. A snippet of the benchmark dataset

| N | English Word Pairs | Human Ratings | Arabic Word Pairs |
|---|---|---|---|
| 01 | Coast - Endorsement | 0.03 | تصديق - ساحل |
| 02 | Noon - String | 0.03 | خيط - ظهر |
| 03 | Cushion - Diamond | 0.06 | الماس - مسند |
| 04 | Gem - Pillow | 0.07 | مخدة - جوهرة |
| 05 | Stove - Walk | 0.07 | مشي - موقد |
| 06 | Cord - Midday | 0.08 | ظهيرة - حبل |
| 07 | Signature - String | 0.08 | خيط - توقيع |
| 08 | Boy - Endorsement | 0.12 | تصديق - صبي |
| 09 | Boy - Midday | 0.16 | ظهيرة - صبي |
| 10 | Slave - Vegetable | 0.16 | خضار - عبد |
| 11 | Smile - Village | 0.18 | قرية - ابتسامة |
| 12 | Smile - Pigeon | 0.20 | حمامة - ابتسامة |
| 13 | Wizard - Infirmary | 0.22 | مشفى - ساحر |
| 14 | Noon - Fasting | 0.29 | صيام - ظهر |

In the column labels, "N" denotes the word-pair number.

3.2 Applicable measures on AWN

As mentioned earlier, most semantic similarity measures, including feature-based measures and corpus-dependent information content measures, cannot be used on AWN due to its limitations. Table 3 illustrates the reasons why certain common semantic similarity measures are not applicable to AWN.

Table 3. Assessing the suitability of traditional semantic measures on AWN

| Category | Measures | Applicable on AWN | Justification |
|---|---|---|---|
| Path based | All path-based measures | Yes | AWN provides a path information source. |
| | Li | Yes | Non-linear function of the shortest path and the depth of the lso, both available in AWN. |
| IC based | All current IC-based measures except Meng | No | Finding corpus-dependent frequencies for diacritized Arabic words is difficult, since little data is available. |
| | M | Yes | Uses the depth of the LCS and the maximum depth, both available in AWN. |
| Feature-based | MF | Limited | Missing glosses in most ontologies; computational complexity. |
| | Zo | Yes | Replaces Lesk's original measure with different semantic similarity measures to find the gloss corresponding to the correct sense of the ambiguous word. |
| Hybrid method | Zh, GA | Yes | Consists of a combination of measures that are applicable on AWN. |

Abbreviations for "Measures": SP: The Shortest Path, PM: PATH measure, WP: Wu & Palmer, F: Faza, LC: Leacock & Chodorow, Li: Li, R: Resnik, J: Jiang, L: Lin, M: Meng, MF: most feature-based measures, Zo: Zouaghi, Zh: Zhou, GA: Ghandi Aldiery.

3.3 Choosing synsets in AWN

We discussed the difficulty of selecting synsets in AWN, which led us to use two methods for selecting synsets: automatic selection by the program and manual selection. To ensure accurate results, we primarily chose synsets automatically, but some were manually corrected. In all comparisons, the same synset was chosen for each term to eliminate any variations that may arise from using different synsets. We specifically selected the synset of a concept instead of its synonyms. It is possible that our results may differ from those of other studies because we prioritized the selection of specific synsets over the general ones.

In the AWN file, each synset has a unique name, but a word may map to multiple synsets, making it difficult to locate the correct synset for a given concept. For example, as shown in Figure 3, searching for the word "سفر" may return multiple synsets; one may be correct, while another may yield a better similarity score. In our study, we therefore present two results: one represents the best possible outcome, while the other represents the most realistic outcome for automated work, given the lack of a reliable method for determining the best synset.

Table 4. The selected synsets for the first experiment (using the best possible synsets)

| N | Word 1 | Word 2 | Synset (w1) | Synset (w2) | Human Rating | Depth (lso) | Depth (w1) | Depth (w2) | Len (w1, w2) |
|---|---|---|---|---|---|---|---|---|---|
| 01 | تَصْدِيق | سَاحِل | taSodiyq_n2AR | $aATi}_AlbaHor_n1AR | 0.01 | 0 | 4 | 1 | 0 |
| 02 | خَيْط | ظُهْر | xaATa_v1AR | <irotafaEa_v3AR | 0.01 | 0 | 4 | 3 | 0 |
| 03 | مَشْي | موقد | ma$oy_n1AR | Non | 0.01 | | | | |
| 04 | خُضَار | عَبْد | xuDar_n1AR | xaAdim_n1AR | 0.04 | 1 | 7 | 5 | 10 |
| 05 | قَرْيَة | بَسْمَة | qaroyap_n1AR | basomap_n1AR | 0.05 | 0 | 5 | 8 | 0 |
| 06 | مشفى | ساحر | Non | Non | 0.06 | | | | |
| 07 | حَمَامَة | تَلّ | HamaAm_n1AR | rukaAm_n1AR | 0.08 | 2 | 11 | 6 | 13 |
| 08 | الْماس | كَأْس | AlomAs_n1AR | kuwb_n1AR | 0.09 | 1 | 7 | 7 | 12 |
| 09 | جَبَل | حَبْل | jabal_n1AR | laq~aHa_v1AR | 0.13 | 0 | 1 | 3 | 0 |
| 10 | شَاطِئ | غابَة | $aATi}_AlbaHor_n1AR | gaAbap_n1AR | 0.21 | 0 | 1 | 4 | 0 |
| 11 | شَيْخ | ضَرِيح | qabor_n1AR | ra}iyos_n1AR | 0.22 | 1 | 5 | 5 | 8 |
| 12 | مِخَدَّة | أَدَاة | wisaAdap_n1AR | >aadaAp_n1AR | 0.25 | 4 | 6 | 7 | 5 |
| 13 | جَبَل | سَاحِل | jabal_n1AR | $aATi}_AlbaHor_n1AR | 0.27 | 0 | 1 | 1 | 0 |
| 14 | قَدَح | أَدَاة | kuwb_n1AR | >daAp_n1AR | 0.33 | 5 | 7 | 6 | 3 |
| 15 | شَاطِئ | رِحْلَة | $aATi}_AlbaHor_n1AR | Harakap_n1AR | 0.37 | 0 | 1 | 4 | 0 |
| 16 | سَفَر | حَافِلَة | taHar~uk_n1AR | HaAfilap_n1AR | 0.4 | 0 | 5 | 8 | 0 |
| 17 | فرن | طعام | Non | TaEaAm_n3AR | 0.44 | | | | |
| 18 | صِيَام | عِيد | Sawom_n1AR | <iHotifaAl_n1AR | 0.49 | 2 | 6 | 4 | 6 |
| 19 | وَسِيلَة | حَافِلَة | wasiylap_n1AR | HaAfilap_n1AR | 0.52 | 5 | 6 | 8 | 4 |
| 20 | أخْت | فَتَاة | >ax_n1AR | fataAp_n1AR | 0.6 | 3 | 5 | 6 | 5 |
| 21 | جبل | تَلّ | jabal_n1AR | rukaAm_n1AR | 0.65 | 0 | 1 | 6 | 0 |
| 22 | شَيْخ | سَيِّد | ra}iyos_n1AR | say~id_n1AR | 0.67 | 3 | 5 | 6 | 5 |
| 23 | خُضَار | طَعام | xuDaAr_n1AR | TaEaAm_n3AR | 0.69 | 4 | 6 | 4 | 2 |
| 24 | جَاْرِيَة | عَبْد | xaAdim_n1AR | Eabod_n1AR | 0.71 | 3 | 5 | 4 | 3 |
| 25 | مَشْي | جَرْي | ma$oy_n1AR | jaroy_n1AR | 0.75 | 5 | 6 | 6 | 2 |
| 26 | خَيْط | حَبْل | gazol_n1AR | Habol_n1AR | 0.77 | 6 | 7 | 6 | 1 |
| 27 | أَحْرَاش | غابَة | dagol_n1AR | dagol_n1AR | 0.79 | 5 | 5 | 5 | 0 |
| 28 | مِخَدَّة | مِسْنَد | wisaAdap_n1AR | wisaAdap_n1AR | 0.85 | 6 | 6 | 6 | 0 |
| 29 | قَرْيَة | رِيف | riyf_n1AR | riyf_n1AR | 0.85 | 5 | 5 | 5 | 0 |
| 30 | شَاطِئ | سَاحِل | $aATi}_AlbaHor_n1AR | $aATi}_AlbaHor_n1AR | 0.89 | 0 | 1 | 1 | 0 |
| 31 | وَسِيلَة | أَدَاة | wasiyolap_n1AR | >adaAp_n1AR | 0.92 | 6 | 6 | 7 | 1 |
| 32 | فَتَى | صَبِيّ | muraAhiq_n1AR | muraAhiq_n1AR | 0.93 | 5 | 5 | 5 | 0 |
| 33 | قَبْر | ضَرِيح | qabor_n1AR | qabor_n1AR | 0.94 | 5 | 5 | 5 | 0 |
| 34 | مشعوذ | ساحر | Non | Non | 0.94 | | | | |
| 35 | قَدَح | كَأْس | kuwb_n1AR | kuwb_n1AR | 0.95 | 7 | 7 | 7 | 0 |

Figure 3. A snippet of AWN's xml file

3.4 Using conventional measures with AWN and WN

The aim of these experiments is to compare several widely used measures that can be applied to both AWN and WN. The purpose is to identify the WN and measure combination that produces superior outcomes.

3.4.1 Experiment 1: Comparison of measures on AWN with synset selection for optimal results

Selecting synset. In this comparison, we only considered synsets that yield superior outcomes, regardless of whether they are the correct synsets or not.

Table 4 reveals that numerous synsets are missing in AWN, such as "مشعوذ", "مشفى", and "موقد". Moreover, several pairs yield a similarity score of 1, indicating that both terms belong to the same synset, and some terms have no synset at all. However, as mentioned earlier, selecting synsets in this manner may lead to many errors. For instance, in pair number 4, the synset "xaAdim_n1AR" was selected for the concept "عبد", but in pair 24, the synset "Eabod_n1AR" was selected for the same concept. Although this approach may improve results, it lacks a synset selection methodology. In this experiment, we chose the synset that results in a smaller error based on the provided data.
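The "best possible synset" strategy used in this experiment amounts to an exhaustive search over candidate synset pairs. A minimal sketch of that strategy; the `similarity` callable and the candidate lists are placeholders for whichever measure and AWN lookup are in use, and the scores in the toy table are illustrative, not values from this paper:

```python
from itertools import product

def best_synset_pair(cands1, cands2, similarity):
    """Among all candidate synset pairs, keep the one that maximizes the
    given similarity measure (the Experiment 1 strategy; Experiment 2
    instead fixes the one correct synset per concept)."""
    if not cands1 or not cands2:
        return None, None, None          # a synset is missing in AWN
    best = max(product(cands1, cands2),
               key=lambda pair: similarity(*pair))
    return best[0], best[1], similarity(*best)

# Toy stand-in for a measure: similarity read from a precomputed table
# (the scores here are invented for illustration).
table = {("Eabod_n1AR", "xuDar_n1AR"): 0.07,
         ("xaAdim_n1AR", "xuDar_n1AR"): 0.17}
sim = lambda a, b: table.get((a, b), 0.0)
print(best_synset_pair(["Eabod_n1AR", "xaAdim_n1AR"], ["xuDar_n1AR"], sim))
# ('xaAdim_n1AR', 'xuDar_n1AR', 0.17)
```

This mirrors pair 4 above, where "xaAdim_n1AR" rather than "Eabod_n1AR" was kept for "عبد" because it scored higher, even though it is not the correct sense.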

Result. We evaluated six measures, as shown in Table 5.

Table 5. Result of the first experiment (applying similarity measures on AWN using the best possible synsets)

| N | HR | SP | W | Li | F | A | LC |
|---|---|---|---|---|---|---|---|
| 01 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 02 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 03 | 0.01 | -- | -- | -- | -- | -- | -- |
| 04 | 0.04 | 0.67 | 0.17 | 0.07 | 0.05 | 0.20 | 0.78 |
| 05 | 0.05 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 06 | 0.06 | -- | -- | -- | -- | -- | -- |
| 07 | 0.08 | 0.57 | 0.24 | 0.06 | 0.05 | 0.34 | 0.66 |
| 08 | 0.09 | 0.60 | 0.14 | 0.05 | 0.03 | 0.17 | 0.70 |
| 09 | 0.13 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 10 | 0.21 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 11 | 0.22 | 0.73 | 0.20 | 0.11 | 0.06 | 0.24 | 0.88 |
| 12 | 0.25 | 0.83 | 0.62 | 0.36 | 0.33 | 0.71 | 1.08 |
| 13 | 0.27 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 14 | 0.33 | 0.90 | 0.77 | 0.55 | 0.51 | 0.81 | 1.30 |
| 15 | 0.37 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 16 | 0.4 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 17 | 0.44 | -- | -- | -- | -- | -- | -- |
| 18 | 0.49 | 0.80 | 0.40 | 0.25 | 0.17 | 0.50 | 1.00 |
| 19 | 0.52 | 0.87 | 0.71 | 0.45 | 0.43 | 0.78 | 1.18 |
| 20 | 0.6 | 0.83 | 0.55 | 0.35 | 0.27 | 0.65 | 1.08 |
| 21 | 0.65 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 22 | 0.67 | 0.83 | 0.55 | 0.35 | 0.27 | 0.65 | 1.08 |
| 23 | 0.69 | 0.93 | 0.80 | 0.66 | 0.53 | 0.83 | 1.48 |
| 24 | 0.71 | 0.90 | 0.67 | 0.52 | 0.37 | 0.73 | 1.30 |
| 25 | 0.75 | 0.93 | 0.83 | 0.67 | 0.60 | 0.85 | 1.48 |
| 26 | 0.77 | 0.97 | 0.92 | 0.82 | 0.75 | 0.92 | 1.78 |
| 27 | 0.79 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.49 |
| 28 | 0.85 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.49 |
| 29 | 0.85 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.49 |
| 30 | 0.89 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.49 |
| 31 | 0.92 | 0.97 | 0.92 | 0.82 | 0.75 | 0.92 | 1.78 |
| 32 | 0.93 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.49 |
| 33 | 0.94 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.49 |
| 34 | 0.94 | -- | -- | -- | -- | -- | -- |
| 35 | 0.95 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.49 |

Abbreviations: N: word-pair number, HR: human rating, SP: shortest path, W: WuP measure, Li: Li measure, F: Faza, A: Aldiery, LC: Leacock & Chodorow.

Table 6 shows the correlation and mean squared error for each measure in this comparison.

Figure 4. The correlation between human ratings and similarity measures scores in AWN experiment using the best possible synsets

Among the six measures evaluated, the Li measure had the highest correlation value, as shown in Figure 4, and the lowest mean squared error, indicating that it generated better similarity scores. It is unsurprising that the shortest path had the lowest correlation, because it takes only the distance between concepts into account.
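For reference, the correlation and MSE statistics reported throughout these tables can be computed as follows. This is a generic sketch: the vectors shown are illustrative, not the paper's data, and pairs with missing synsets ("--" rows) are assumed to be excluded before the statistics are computed.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def mse(xs, ys):
    """Mean squared error between human ratings and measure scores."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Illustrative values only, not rows from Table 5.
human  = [0.01, 0.25, 0.52, 0.77, 0.95]
scores = [0.00, 0.36, 0.45, 0.82, 1.00]
print(round(pearson(human, scores), 2), round(mse(human, scores), 3))
```

A high correlation with a low MSE, as for the Li measure here, means the measure both ranks pairs like humans do and stays numerically close to the human ratings.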

Table 6. The correlation and MSE of the measures in the first experiment

| Measures | Correlation | MSE |
|---|---|---|
| Shortest path | 0.69 | 0.104 |
| WuP measure | 0.84 | 0.046 |
| Li measure | 0.87 | 0.042 |
| Faza measure | 0.86 | 0.051 |
| Aldiery | 0.83 | 0.052 |
| Leacock & Chodorow | 0.76 | 0.086 |

3.4.2 Experiment 2: Comparison of measures on AWN with synset selection of correct synsets

Selecting synset. For this experiment, we used only the correct synsets for each concept, as shown in Table 7, even when they do not produce the best results; accordingly, the correlation values decreased in this experiment.

Table 7. The selected synsets for the second experiment (correct synsets)

| N | HR | Synset (w1) | Synset (w2) |
|---|---|---|---|
| 01 | 0.01 | taSodiyq_n2AR | $aATi}_AlbaHor_n1AR |
| 02 | 0.01 | xaATa_v1AR | <irotafaEa_v3AR |
| 03 | 0.01 | ma$oy_n1AR | Non |
| 04 | 0.04 | xuDar_n1AR | Eabod_n1AR |
| 05 | 0.05 | qaroyap_n1AR | basomap_n1AR |
| 06 | 0.06 | Non | Non |
| 07 | 0.08 | HamaAm_n1AR | rukaAm_n1AR |
| 08 | 0.09 | AlomAs_n1AR | kuwb_n1AR |
| 09 | 0.13 | jabal_n1AR | laq~aHa_v1AR |
| 10 | 0.21 | $aATi}_AlbaHor_n1AR | gaAbap_n1AR |
| 11 | 0.22 | qabor_n1AR | ra}iyos_n1AR |
| 12 | 0.25 | wisaAdap_n1AR | >aadaAp_n1AR |
| 13 | 0.27 | jabal_n1AR | $aATi}_AlbaHor_n1AR |
| 14 | 0.33 | kuwb_n1AR | >daAp_n1AR |
| 15 | 0.37 | $aATi}_AlbaHor_n1AR | Harakap_n1AR |
| 16 | 0.4 | safar_n1AR | HaAfilap_n1AR |
| 17 | 0.44 | Non | TaEaAm_n3AR |
| 18 | 0.49 | Sawom_n1AR | <iHotifaAl_n1AR |
| 19 | 0.52 | wasiylap_n1AR | HaAfilap_n1AR |
| 20 | 0.6 | >ax_n1AR | fataAp_n1AR |
| 21 | 0.65 | jabal_n1AR | rukaAm_n1AR |
| 22 | 0.67 | ra}iyos_n1AR | say~id_n1AR |
| 23 | 0.69 | xuDaAr_n1AR | TaEaAm_n3AR |
| 24 | 0.71 | xaAdim_n1AR | Eabod_n1AR |
| 25 | 0.75 | ma$oy_n1AR | jaroy_n1AR |
| 26 | 0.77 | xayoT_n1AR | Habol_n1AR |
| 27 | 0.79 | dagol_n1AR | gaAbap_n1AR |
| 28 | 0.85 | wisaAdap_n1AR | wisaAdap_n1AR |
| 29 | 0.85 | qaroyap_n1AR | riyf_n1AR |
| 30 | 0.89 | $aATi}_AlbaHor_n1AR | $aATi}_AlbaHor_n1AR |
| 31 | 0.92 | wasiyolap_n1AR | >adaAp_n1AR |
| 32 | 0.93 | muraAhiq_n1AR | Sabiy~_n1AR |
| 33 | 0.94 | qabor_n1AR | qabor_n1AR |
| 34 | 0.94 | Non | Non |
| 35 | 0.95 | kuwb_n1AR | kuwb_n1AR |

In the column labels, "N" denotes the word-pair number and "HR" the human rating.

Result. Table 8 indicates that numerous dedicated synsets are still missing in AWN; the term "قدح", for example, shares the same synset as "كأس". Additionally, a noteworthy observation is the significant contrast in the similarity of "قرية" and "ريف": in the previous experiment their score was 1, but in this experiment it dropped to zero, since their correct synsets do not belong to the same hierarchy in AWN.

Table 8. Result of the second experiment (applying similarity measures on AWN using the correct synsets)

| N | HR | SP | W | Li | F | A | LC |
|---|---|---|---|---|---|---|---|
| 01 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 02 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 03 | 0.01 | -- | -- | -- | -- | -- | -- |
| 04 | 0.04 | 0.70 | 0.18 | 0.09 | 0.05 | 0.22 | 0.82 |
| 05 | 0.05 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 06 | 0.06 | -- | -- | -- | -- | -- | -- |
| 07 | 0.08 | 0.57 | 0.24 | 0.06 | 0.05 | 0.34 | 0.66 |
| 08 | 0.09 | 0.60 | 0.14 | 0.05 | 0.03 | 0.17 | 0.70 |
| 09 | 0.13 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 10 | 0.21 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 11 | 0.22 | 0.73 | 0.20 | 0.11 | 0.06 | 0.24 | 0.88 |
| 12 | 0.25 | 0.83 | 0.62 | 0.36 | 0.33 | 0.71 | 1.08 |
| 13 | 0.27 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 14 | 0.33 | 0.90 | 0.77 | 0.55 | 0.51 | 0.81 | 1.30 |
| 15 | 0.37 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 16 | 0.4 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 17 | 0.44 | -- | -- | -- | -- | -- | -- |
| 18 | 0.49 | 0.80 | 0.40 | 0.25 | 0.17 | 0.50 | 1.00 |
| 19 | 0.52 | 0.87 | 0.71 | 0.45 | 0.43 | 0.78 | 1.18 |
| 20 | 0.6 | 0.83 | 0.55 | 0.35 | 0.27 | 0.65 | 1.08 |
| 21 | 0.65 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 22 | 0.67 | 0.83 | 0.55 | 0.35 | 0.27 | 0.65 | 1.08 |
| 23 | 0.69 | 0.93 | 0.80 | 0.66 | 0.53 | 0.83 | 1.48 |
| 24 | 0.71 | 0.90 | 0.67 | 0.52 | 0.37 | 0.73 | 1.30 |
| 25 | 0.75 | 0.93 | 0.83 | 0.67 | 0.60 | 0.85 | 1.48 |
| 26 | 0.77 | 0.87 | 0.67 | 0.44 | 0.38 | 0.74 | 1.18 |
| 27 | 0.79 | 0.77 | 0.22 | 0.13 | 0.07 | 0.27 | 0.93 |
| 28 | 0.85 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.49 |
| 29 | 0.85 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 30 | 0.89 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.49 |
| 31 | 0.92 | 0.97 | 0.92 | 0.82 | 0.75 | 0.92 | 1.78 |
| 32 | 0.93 | 0.97 | 0.89 | 0.81 | 0.62 | 0.89 | 1.78 |
| 33 | 0.94 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.49 |
| 34 | 0.94 | -- | -- | -- | -- | -- | -- |
| 35 | 0.95 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.49 |

Abbreviations: N: word-pair number, HR: human rating, SP: shortest path, W: WuP measure, Li: Li measure, F: Faza, A: Aldiery, LC: Leacock & Chodorow.

We observe a more substantial discrepancy between the human ratings and the measure outputs in this experiment than in the previous one, i.e., a higher MSE and a lower correlation. Table 9 demonstrates that the ranking of similarity measures is not significantly different from the previous experiment: the Li measure continues to yield the most favorable results in terms of both correlation and MSE.

All similarity measures exhibit a decrease in correlation and an increase in MSE, highlighting the impact of synset selection on achieving accurate results. Table 8 presents a more realistic outcome for automated similarity measurement, providing a clearer picture of the variation between the measures (Figure 5).

Table 9. The correlation and MSE of the measures in the second experiment

| Measures | Correlation | MSE |
|---|---|---|
| Shortest path | 0.58 | 0.126 |
| WuP measure | 0.72 | 0.077 |
| Li measure | 0.75 | 0.081 |
| Faza measure | 0.73 | 0.097 |
| Aldiery | 0.70 | 0.082 |
| Leacock & Chodorow | 0.66 | 0.105 |

Figure 5. The correlation between human ratings and similarity measures scores in AWN experiment using the correct synsets

3.4.3 Experiment 3: Comparison of traditional measures on WN

Selecting synset. We used an NLP library that incorporates the English WN to select the synsets. Notably, finding the synset of a concept was more straightforward in the English WN: it contains a greater number of synsets than AWN, and it defines synsets more precisely because each term is accessible in both Arabic and English, which mitigates the issue of homonyms (Table 10).

Result. We compared five similarity measures from various categories on the English WN: Path, Wu & Palmer, Leacock & Chodorow, Resnik, and Lin. Table 11 presents a comparison of the most widely used similarity measures in WN, with WuP, Path, and Leacock & Chodorow belonging to the path category, and Resnik and Lin to the information-content (corpus-dependent) category. The latter category was not applicable in AWN because the required Arabic corpus is unavailable. Upon initial observation, we noticed that the English WN includes all the concepts needed for this experiment, and each word has its synset, unlike AWN, where the issue of synset deficiency arose.
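The five measures compared here follow standard formulations from the literature. As a rough sketch (our own simplified renderings with hypothetical function names, not the exact implementations used in the experiments), each can be expressed over a few taxonomy quantities: the shortest path length between two synsets in edges, the depth of their least common subsumer (LCS), the maximum taxonomy depth, and corpus probabilities p(c) for information content IC(c) = -log p(c):

```python
import math

def path_sim(path_len):
    """Path measure: similarity decays with the shortest path length (in edges)."""
    return 1.0 / (path_len + 1)

def wup_sim(depth_lcs, path_len):
    """Wu & Palmer: rewards a deep least common subsumer (LCS)."""
    return (2 * depth_lcs) / (path_len + 2 * depth_lcs)

def lch_sim(path_len, max_depth):
    """Leacock & Chodorow: negative log of the path length scaled by taxonomy depth."""
    return -math.log((path_len + 1) / (2.0 * max_depth))

def resnik_sim(p_lcs):
    """Resnik: information content of the LCS, IC(c) = -log p(c)."""
    return -math.log(p_lcs)

def lin_sim(p_lcs, p_c1, p_c2):
    """Lin: IC of the LCS normalized by the two concepts' own IC."""
    return 2 * math.log(p_lcs) / (math.log(p_c1) + math.log(p_c2))
```

These definitions make the scale differences in Table 11 unsurprising: Path, WuP, and Lin are bounded by 1, while Leacock & Chodorow and Resnik are unbounded log quantities.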

Table 10. The selected synset for the third comparison (synsets in English WN)

| N | Arabic Word 1 | Arabic Word 2 | English Word 1 | English Word 2 | Synset Word 1 | Synset Word 2 | Human Rating |
|---|---|---|---|---|---|---|---|
| 01 | تَصْدِيق | سَاحِل | Coast | Endorsement | seashore.n.01 | endorsement.n.05 | 0.01 |
| 02 | خَيْط | ظُهْر | Noon | String | noon.n.01 | string.n.01 | 0.01 |
| 03 | مَشْي | موقد | Stove | Walk | stove.n.01 | walk.n.01 | 0.01 |
| 04 | خُضَار | عَبْد | Slave | Vegetable | slave.n.01 | vegetable.n.01 | 0.04 |
| 05 | قَرْيَة | بَسْمَة | Smile | Village | smile.n.01 | village.n.02 | 0.05 |
| 06 | مشفى | ساحر | Wizard | Infirmary | sorcerer.n.01 | hospital.n.01 | 0.06 |
| 07 | حَمَامَة | تَلّ | Hill | Pigeon | hill.n.01 | pigeon.n.01 | 0.08 |
| 08 | الْماس | كَأْس | Glass | Diamond | glass.n.02 | diamond.n.01 | 0.09 |
| 09 | جَبَل | حَبْل | Cord | Mountain | cord.n.03 | mountain.n.01 | 0.13 |
| 10 | شَاطِئ | غابَة | Forest | Shore | forest.n.01 | shore.n.01 | 0.21 |
| 11 | شَيْخ | ضَرِيح | Sepulcher | Sheikh | burial_chamber.n.01 | sheik.n.01 | 0.22 |
| 12 | مِخَدَّة | أَدَاة | Tool | Pillow | tool.n.01 | pillow.n.01 | 0.25 |
| 13 | جَبَل | سَاحِل | Coast | Mountain | seashore.n.01 | mountain.n.01 | 0.27 |
| 14 | قَدَح | أَدَاة | Tool | Tumbler | tool.n.01 | tumbler.n.02 | 0.33 |
| 15 | شَاطِئ | رِحْلَة | Journey | Shore | journey.n.01 | shore.n.01 | 0.37 |
| 16 | سَفَر | حَافِلَة | Coach | Travel | coach.n.01 | travel.n.01 | 0.4 |
| 17 | فرن | طعام | Food | Oven | food.n.02 | oven.n.01 | 0.44 |
| 18 | صِيَام | عِيد | Feast | Fasting | feast.n.02 | fast.n.01 | 0.49 |
| 19 | وَسِيلَة | حَافِلَة | Coach | Means | coach.n.01 | means.n.02 | 0.52 |
| 20 | أخْت | فَتَاة | Girl | Sister | female_child.n.01 | sister.n.01 | 0.6 |
| 21 | جبل | تَلّ | Hill | Mountain | hill.n.01 | mountain.n.01 | 0.65 |
| 22 | شَيْخ | سَيِّد | Master | Sheikh | master.n.08 | sheik.n.01 | 0.67 |
| 23 | خُضَار | طَعام | Food | Vegetable | food.n.02 | vegetable.n.01 | 0.69 |
| 24 | جَاْرِيَة | عَبْد | Slave | Odalisque | slave.n.01 | odalisque.n.01 | 0.71 |
| 25 | مَشْي | جَرْي | Run | Walk | run.n.01 | walk.n.01 | 0.75 |
| 26 | خَيْط | حَبْل | Cord | String | cord.n.03 | string.n.01 | 0.77 |
| 27 | أَحْرَاش | غابَة | Forest | Woodland | forest.n.02 | forest.n.02 | 0.79 |
| 28 | مِخَدَّة | مِسْنَد | Cushion | Pillow | cushion.n.03 | pillow.n.01 | 0.85 |
| 29 | قَرْيَة | رِيف | Countryside | Village | countryside.n.01 | village.n.02 | 0.85 |
| 30 | شَاطِئ | سَاحِل | Coast | Shore | seashore.n.01 | shore.n.01 | 0.89 |
| 31 | وَسِيلَة | أَدَاة | Tool | Means | instrument.n.02 | means.n.01 | 0.92 |
| 32 | فَتَى | صَبِيّ | Boy | Lad | male_child.n.01 | cub.n.02 | 0.93 |
| 33 | قَبْر | ضَرِيح | Sepulcher | Grave | burial_chamber.n.01 | grave.n.02 | 0.94 |
| 34 | مشعوذ | ساحر | Wizard | Magician | sorcerer.n.01 | sorcerer.n.01 | 0.94 |
| 35 | قَدَح | كَأْس | Glass | Tumbler | glass.n.02 | tumbler.n.02 | 0.95 |

Table 11. Result of the third comparison (applying similarity measures on English WN)

| N | HR | W | P | LC | R | L |
|---|----|---|---|----|---|---|
| 01 | 0.01 | 0.13 | 0.07 | 1.00 | 0.00 | 0.00 |
| 02 | 0.01 | 0.11 | 0.06 | 0.80 | 0.00 | 0.00 |
| 03 | 0.01 | 0.09 | 0.05 | 0.59 | 0.00 | 0.00 |
| 04 | 0.04 | 0.33 | 0.11 | 1.44 | 0.80 | 0.09 |
| 05 | 0.05 | 0.13 | 0.07 | 1.00 | 0.00 | 0.00 |
| 06 | 0.06 | 0.44 | 0.09 | 1.24 | 1.53 | 0.15 |
| 07 | 0.08 | 0.32 | 0.07 | 1.00 | 1.29 | 0.13 |
| 08 | 0.09 | 0.56 | 0.11 | 1.44 | 2.31 | 0.23 |
| 09 | 0.13 | 0.40 | 0.10 | 1.34 | 1.29 | 0.12 |
| 10 | 0.21 | 0.18 | 0.10 | 1.34 | 0.00 | 0.00 |
| 11 | 0.22 | 0.42 | 0.09 | 1.24 | 1.53 | 0.13 |
| 12 | 0.25 | 0.63 | 0.14 | 1.69 | 2.31 | 0.25 |
| 13 | 0.27 | 0.67 | 0.20 | 2.03 | 5.88 | 0.60 |
| 14 | 0.33 | 0.71 | 0.17 | 1.85 | 3.26 | 0.32 |
| 15 | 0.37 | 0.13 | 0.07 | 1.00 | 0.00 | 0.00 |
| 16 | 0.4 | 0.13 | 0.07 | 0.93 | 0.00 | 0.00 |
| 17 | 0.44 | 0.24 | 0.07 | 1.00 | 0.80 | 0.09 |
| 18 | 0.49 | 0.47 | 0.10 | 1.34 | 2.04 | 0.18 |
| 19 | 0.52 | 0.47 | 0.10 | 1.34 | 1.53 | 0.17 |
| 20 | 0.6 | 0.60 | 0.14 | 1.69 | 2.33 | 0.25 |
| 21 | 0.65 | 0.83 | 0.33 | 2.54 | 6.95 | 0.73 |
| 22 | 0.67 | 0.63 | 0.16 | 1.84 | 2.33 | 0.20 |
| 23 | 0.69 | 0.83 | 0.33 | 2.54 | 6.11 | 0.84 |
| 24 | 0.71 | 0.60 | 0.14 | 1.69 | 2.33 | 0.00 |
| 25 | 0.75 | 0.57 | 0.10 | 1.34 | 3.73 | 0.42 |
| 26 | 0.77 | 0.59 | 0.13 | 1.56 | 2.31 | 0.20 |
| 27 | 0.79 | 1.00 | 1.00 | 3.64 | 9.61 | 1.00 |
| 28 | 0.85 | 0.93 | 0.50 | 2.94 | 11.29 | 0.98 |
| 29 | 0.85 | 0.75 | 0.20 | 2.03 | 4.76 | 0.42 |
| 30 | 0.89 | 0.91 | 0.50 | 2.94 | 9.42 | 0.96 |
| 31 | 0.92 | 0.93 | 0.50 | 2.94 | 6.79 | 0.83 |
| 32 | 0.93 | 0.95 | 0.50 | 2.94 | 8.40 | 0.83 |
| 33 | 0.94 | 0.93 | 0.50 | 2.94 | 9.85 | 0.96 |
| 34 | 0.94 | 1.00 | 1.00 | 3.64 | 11.98 | 1.00 |
| 35 | 0.95 | 0.94 | 0.50 | 2.94 | 9.44 | 0.81 |

The column abbreviations are as follows: N: the numbering of the word pair, HR: human rating, W: WuP measure, P: Path measure, LC: Leacock & Chodorow, R: Resnik measure, and L: Lin measure.

Figure 6. The correlation between human ratings and similarity measure scores in the English WN experiment

According to Table 12, the WuP measure in the English WN yielded the best similarity scores among all measures across experiments 2 and 3, as indicated by its highest correlation value (Figure 6) and one of the lowest mean squared errors.

Table 12. The correlation and MSE of the measures in the third comparison

| Measures | Correlation | MSE |
|---|---|---|
| WuP measure | 0.82 | 0.042 |
| Path measure | 0.69 | 0.117 |
| LCH measure | 0.80 | 0.041 |
| Resnik measure | 0.79 | 0.070 |
| Lin measure | 0.77 | 0.069 |

3.5 Experimental observations and the introduction of a novel hybrid similarity measure combining WuP and Resnik measures

3.5.1 A comparison of AWN and WN

This section explores the contrasts between AWN and WN identified in previous experiments, along with the benefits and drawbacks of utilizing each of them.

Advantages of leveraging AWN in the domain of natural language processing for Arabic include:

  • Every word in AWN is marked with Arabic vowels, making it better equipped for avoiding homonyms when seeking a synset for an Arabic word.
  • It performs exceptionally well when processing Arabic books that contain Arabic vowels (diacritics).
  • It outperforms other ontologies in tasks related to natural language processing, such as determining word roots.

Regarding the constraints associated with the utilization of AWN in the context of natural language processing for Arabic, these encompass:

  • AWN's limited synset coverage requires falling back on WN to locate synsets that are missing from AWN.
  • Decreased efficiency in tasks involving words without Arabic vowels.
  • AWN has not received an update since 2010.
  • When used with modern books and online publications lacking Arabic vowels, it has poor efficiency.
  • The lack of resources for the Arabic language limits the use of many similarity measures.

The merits of employing WN in the realm of natural language processing for Arabic encompass:

  • WN contains a larger number of synsets than other language specific WNs.
  • WN has been updated more frequently than AWN.
  • Provides better accuracy in calculating the degree of similarity between concepts.
  • Allows for the use of all similarity measures.
  • Performs better with modern documents because they lack Arabic vowels, only requiring word translation.
  • English WN has a wealth of resources in terms of corpus and tools, making it easier to implement than other WNs.

Drawbacks associated with the utilization of WN in the context of natural language processing for Arabic comprise:

  • Requires correct translation of words to obtain the desired result.
  • Cannot be used in many Arabic language processing tasks.

3.5.2 Comparing results of similarity measurements and introduction of a hybrid measure

We analyzed the results of similarity calculation using various measures and compared the measures that yielded the best outcomes in experiments where we used the correct synset.

As previously mentioned, the WuP measure exhibited a high correlation coefficient (0.82) with human ratings, indicating a good linear relationship. Figure 7 illustrates the squared error between human ratings and the scores obtained from three different measures.

This experiment shows that the WuP measure has a good ability to gauge similarity between words: its results are excellent for word pairs with a high degree of similarity, but relatively irregular for pairs with little similarity. The Resnik measure, by contrast, came very close to the expected values for these weakly related pairs. From the observations made across many experiments, we conclude that the WuP measure can serve as a good separator between words that are close in meaning and words that are far apart in meaning.

From the Figure 7, it is evident that WuP outperformed the other measures on pairs with similarity scores above 0.55. On the other hand, for pairs with similarity scores below 0.55, the squared error was high, whereas the Resnik measure exhibited better performance on pairs with similarity scores below 0.45. Finally, the LCH measure performed well on pairs with moderate similarity scores (between 0.45 and 0.55).

According to the results of the similarity measures, the WuP measure is more effective at determining the similarity of word pairs whose similarity score exceeds a certain threshold, while it returns a slightly inflated score for pairs falling below it. Conversely, the Resnik measure is more suitable for calculating the similarity score between words whose similarity falls below this threshold, as formalized in Eq. (1).

Figure 7. Comparison of similarity measures squared error values in experiment 3 (the result of applying measure on WN)

Our earlier experiments indicate that a single measure may not adequately capture the true degree of similarity between two words. We therefore advocate a more nuanced method for gauging the similarity between word pairs: rather than depending on a single metric, we propose categorizing word pairs into 'semantically close' and 'not semantically close', with a distinct measure for each category, since our results demonstrate that no universal measure is effective across both. This approach is expected to produce more accurate similarity estimates.

Based on these observations, a new similarity measure is proposed that incorporates the WuP and Resnik measures. The proposed measure takes into account several factors, including the depth of concept, the length of the shortest path between the synsets of two words, and the information content of their Least Common Subsumer (LCS).

$\operatorname{sim}_{\text{Resnik\&WuP}}\left(c_1, c_2\right)=\begin{cases}\operatorname{sim}_{\text{Resnik}}\left(c_1, c_2\right), & \operatorname{sim}_{\text{WuP}}\left(c_1, c_2\right)<\text{threshold} \\ \operatorname{sim}_{\text{WuP}}\left(c_1, c_2\right), & \operatorname{sim}_{\text{WuP}}\left(c_1, c_2\right) \geq \text{threshold}\end{cases}$

$=\begin{cases}\operatorname{IC}\left(\operatorname{lso}\left(c_1, c_2\right)\right), & \dfrac{2 \times \operatorname{depth}\left(\operatorname{lso}\left(c_1, c_2\right)\right)}{\operatorname{len}\left(c_1, c_2\right)+2 \times \operatorname{depth}\left(\operatorname{lso}\left(c_1, c_2\right)\right)}<\text{threshold} \\ \dfrac{2 \times \operatorname{depth}\left(\operatorname{lso}\left(c_1, c_2\right)\right)}{\operatorname{len}\left(c_1, c_2\right)+2 \times \operatorname{depth}\left(\operatorname{lso}\left(c_1, c_2\right)\right)}, & \text{otherwise}\end{cases} \quad \text{with threshold}=0.45$                (1)

To compute the similarity value between two concepts, we first apply the WuP measure. If the resulting value exceeds a specific threshold, we take it as the final result; otherwise, we use the Resnik measure. It is important to note that the threshold value was determined using the provided dataset, and we suggest a range of 0.4 to 0.46, with WuP producing excellent similarity values above this threshold. Nonetheless, it would be preferable to identify the optimal value based on a larger dataset.
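A minimal sketch of this decision rule in Python follows. One assumption is made explicit: Resnik scores are unbounded information-content values (Table 11 shows scores up to 11.98), and the hybrid outputs in Table 13 are consistent with scaling them into [0, 1] by the maximum observed Resnik score; the paper does not spell out this normalization, so it is labeled as our assumption here.

```python
def hybrid_similarity(wup_score, resnik_score, resnik_max=11.98, threshold=0.45):
    """WuP-Resnik hybrid of Eq. (1): keep the WuP score at or above the
    threshold; otherwise fall back to the Resnik score scaled into [0, 1].
    Normalizing by the maximum observed Resnik score is our assumption."""
    if wup_score >= threshold:
        return wup_score
    return resnik_score / resnik_max if resnik_max else 0.0

# Pair 08 (Glass/Diamond): WuP 0.56 >= 0.45, so the WuP score is kept.
print(hybrid_similarity(0.56, 2.31))
# Pair 07 (Hill/Pigeon): WuP 0.32 < 0.45, so the scaled Resnik score is used.
print(round(hybrid_similarity(0.32, 1.29), 2))
```

With these inputs from Table 11, the outputs (0.56 and 0.11) match the hybrid values reported for pairs 08 and 07 in Table 13.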

3.5.3 Result of the hybrid measure

The results of applying the hybrid measure that combines WuP and Resnik measures are presented in Table 13.

Table 13. The results of the hybrid measure combining WuP and Resnik measures

| N | Human Rating | Hybrid Measure | Error | Squared Error |
|---|---|---|---|---|
| 01 | 0.01 | 0.00 | 0.01 | 0.0001 |
| 02 | 0.01 | 0.00 | 0.01 | 0.0001 |
| 03 | 0.01 | 0.00 | 0.01 | 0.0001 |
| 04 | 0.04 | 0.07 | 0.03 | 0.0007 |
| 05 | 0.05 | 0.00 | 0.05 | 0.0025 |
| 06 | 0.06 | 0.13 | 0.07 | 0.0046 |
| 07 | 0.08 | 0.11 | 0.03 | 0.0008 |
| 08 | 0.09 | 0.56 | 0.47 | 0.2167 |
| 09 | 0.13 | 0.11 | 0.02 | 0.0005 |
| 10 | 0.21 | 0.00 | 0.21 | 0.0441 |
| 11 | 0.22 | 0.13 | 0.09 | 0.0085 |
| 12 | 0.25 | 0.63 | 0.38 | 0.1406 |
| 13 | 0.27 | 0.67 | 0.40 | 0.1573 |
| 14 | 0.33 | 0.71 | 0.38 | 0.1413 |
| 15 | 0.37 | 0.00 | 0.37 | 0.1369 |
| 16 | 0.4 | 0.00 | 0.40 | 0.1600 |
| 17 | 0.44 | 0.07 | 0.37 | 0.1392 |
| 18 | 0.49 | 0.47 | 0.02 | 0.0004 |
| 19 | 0.52 | 0.47 | 0.05 | 0.0024 |
| 20 | 0.6 | 0.60 | 0.00 | 0.0000 |
| 21 | 0.65 | 0.83 | 0.18 | 0.0336 |
| 22 | 0.67 | 0.63 | 0.04 | 0.0016 |
| 23 | 0.69 | 0.83 | 0.14 | 0.0205 |
| 24 | 0.71 | 0.60 | 0.11 | 0.0121 |
| 25 | 0.75 | 0.57 | 0.18 | 0.0319 |
| 26 | 0.77 | 0.59 | 0.18 | 0.0330 |
| 27 | 0.79 | 1.00 | 0.21 | 0.0441 |
| 28 | 0.85 | 0.93 | 0.08 | 0.0069 |
| 29 | 0.85 | 0.75 | 0.10 | 0.0100 |
| 30 | 0.89 | 0.91 | 0.02 | 0.0004 |
| 31 | 0.92 | 0.93 | 0.01 | 0.0002 |
| 32 | 0.93 | 0.95 | 0.02 | 0.0003 |
| 33 | 0.94 | 0.93 | 0.01 | 0.0000 |
| 34 | 0.94 | 1.00 | 0.06 | 0.0036 |
| 35 | 0.95 | 0.94 | 0.01 | 0.0001 |

Correlation = 0.85; MSE = 0.038

The effectiveness of the Hybrid measure can be observed in Figure 8, particularly for word pairs with a high level of similarity. However, as with many other similarity measures, the efficiency of the measure decreases for word pairs with a moderate degree of similarity. Despite this, there is an improvement in the correlation between human ratings and the measure's output, with a correlation coefficient of 0.85 and a decrease in the Mean Squared Error to 0.038.

Figure 8. Squared error results of the hybrid similarity measure

4. Conclusions and Future Works

In conclusion, this research paper comprehensively explored various facets of semantic similarity measures in the context of Arabic natural language processing (NLP).

Our study conducted an in-depth comparison between Arabic WordNet (AWN) and WordNet (WN) in the domain of Arabic NLP. Our investigation revealed distinct advantages and limitations associated with both resources. AWN's notable strengths lie in its Arabic vowel markings, exceptional performance with diacritics-laden Arabic books, and proficiency in tasks involving word root determination. However, AWN's drawbacks include limited synset coverage, reduced efficiency with non-voweled words, lack of updates, and inefficacy with modern publications lacking Arabic vowels. The scarcity of resources further constrains the application of various similarity measures.

Our research delved into the comparison of similarity measurement results using different measures. Notably, the WuP measure exhibited a strong correlation with human ratings (0.82), particularly for word pairs with high similarity. Conversely, the Resnik measure showed better performance for word pairs with low similarity. These findings were instrumental in guiding the development of a novel hybrid similarity measure.

The significance of our study lies in its contribution to advancing the understanding of semantic similarity measures. We introduce a novel hybrid measure, combining the strengths of WuP and Resnik measures, and optimize its performance using empirical thresholding. This hybrid measure exhibits substantial improvements in the correlation between human ratings and model output, boasting a remarkable correlation coefficient of 0.85 and a significantly reduced Mean Squared Error of 0.038.

In summary, our research not only comprehensively compared AWN and WN for Arabic NLP but also proposed an innovative hybrid similarity measure that significantly enhances semantic similarity calculations. This study thus makes a notable contribution to the field of NLP, underscoring the importance of resource selection and hybrid measures in achieving more accurate and context-aware semantic similarity assessments in the Arabic language. Nevertheless, it is essential to recognize that word similarity is not static or constrained solely by dictionary definitions; it evolves with cultures and the emergence of new terminologies. This underscores the variances in similarity perceptions across distinct human groups, owing to prevailing cultural nuances. Consequently, we advocate the exploration of dynamic ontologies, subject to automatic evolution through artificial intelligence techniques. Such an approach holds the promise of enhancing the precision of word similarity assessments and, consequently, improving the efficacy of NLP applications.

  References

[1] Muzakir, A., Adi, K., Kusumaningrum, R. (2023). Advancements in semantic expansion techniques for short text classification and hate speech detection. Ingénierie des Systèmes d’Information, 28(3): 545-556. https://doi.org/10.18280/isi.280302

[2] Bhyrapuneni, S., Rajendran, A. (2022). Word recognition method using convolution deep learning approach used in smart cities for vehicle identification. Revue d'Intelligence Artificielle, 36(3): 489-495. https://doi.org/10.18280/ria.360318

[3] Lazarre, W., Guidedi, K., Amaria, S., Kolyang. (2022). Modular ontology design: A state-of-art of diseases ontology modeling and possible issue. Revue d'Intelligence Artificielle, 36(3): 497-501. https://doi.org/10.18280/ria.360319

[4] Gruber, T.R. (1995). Toward principles for the design of ontologies used for knowledge sharing? International Journal of Human-Computer Studies, 43(5-6): 907-928. https://doi.org/10.1006/IJHC.1995.1081.

[5] Kulmanov, M., Smaili, F.Z., Gao, X., Hoehndorf, R. (2021). Semantic similarity and machine learning with ontologies. Briefings in Bioinformatics, 1-18. https://doi.org/10.1093/BIB/BBAA199

[6] Sánchez, D., Batet, M., Isern, D., Valls, A. (2012). Ontology-based semantic similarity: A new feature-based approach. Expert Systems with Applications, 39(9): 7718-7728. https://doi.org/10.1016/J.ESWA.2012.01.082

[7] Pawar, A., Mago, V. (2019). Challenging the boundaries of unsupervised learning for semantic similarity. IEEE Access, 7: 16291-16308. https://doi.org/10.1109/ACCESS.2019.2891692

[8] Zhu, G., Iglesias, C.A. (2017). Computing semantic similarity of concepts in knowledge graphs. IEEE Transactions on Knowledge and Data Engineering, 29(1): 72-85. https://doi.org/10.1109/TKDE.2016.2610428

[9] Meng, L., Huang, R., Gu, J. (2013). A review of semantic similarity measures in WordNet. International Journal of Hybrid Information Technology, 6(1): 1-12.

[10] Elkateb, S., Black, W. J., Vossen, P., Farwell, D., Rodríguez, H., Pease, H., Alkhalifa, M., Fellbaum, C. (2006). Arabic WordNet and the challenges of Arabic. In Proceedings of the International Conference on the Challenge of Arabic for NLP/MT, 15-24. https://aclanthology.org/2006.bcs-1.2.

[11] Suggested Upper Merged Ontology (SUMO) - GM-RKB. (n.d.). http://www.gabormelli.com/RKB/Suggested_Upper_Merged_Ontology_(SUMO).

[12] Rada, R., Mili, H., Bicknell, E., Blettner, M. (1989). Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man and Cybernetics, 19(1): 17-30. https://doi.org/10.1109/21.24528

[13] Bulskov, H., Knappe, R., Andreasen, T. (2002). On measuring similarity for conceptual querying. In Flexible Query Answering Systems: 5th International Conference, Copenhagen, Denmark, pp. 100-111. https://doi.org/10.1007/3-540-36109-X_8

[14] Michelizzi, J. (2005). Semantic relatedness applied to all words sense disambiguation (Doctoral dissertation, University of Minnesota, Duluth).

[15] Wu, Z., Palmer, M. (1994). Verb semantics and lexical selection. arXiv preprint cmp-lg/9406033, 133-138. https://doi.org/10.48550/arxiv.cmp-lg/9406033

[16] Almarsoomi, F.A., OShea, J.D., Bandar, Z., Crockett, K. (2013). AWSS: An algorithm for measuring Arabic word semantic similarity. In 2013 IEEE International Conference on Systems, Man, and Cybernetics, Manchester, UK, pp. 504-509. https://doi.org/10.1109/SMC.2013.92

[17] Leacock, C., Chodorow, M. (1998). Combining local context and WordNet similarity for word sense identification. WordNet: An electronic lexical database, 49(2): 265-283.

[18] Li, Y., Bandar, Z.A., Mclean, D. (2003). An approach for measuring semantic similarity between words using multiple information sources. IEEE Transactions on Knowledge and Data Engineering, 15(4): 871-882. https://doi.org/10.1109/TKDE.2003.1209005.

[19] Aldiery, M.G. (2017). The semantic similarity measures using Arabic ontology (مقاييس التشابه الدلالي). Doctoral dissertation, Middle East University.

[20] Banu, A., Fatima, S.S., Khan, K.U.R. (2015). Information content based semantic similarity measure for concepts subsumed by multiple concepts. International Journal of Web Applications, 7(3): 85-94.

[21] Resnik, P. (1995). Using information content to evaluate semantic similarity in a taxonomy. arXiv Preprint CMP-lg/9511007.

[22] Saruladha, K., Aghila, G., Raj, S. (2010). A new semantic similarity metric for solving sparse data problem in ontology based information retrieval system. International Journal of Computer Science Issues, 7(3): 40-48.

[23] Sánchez, D., Batet, M., Isern, D. (2011). Ontology-based information content computation. Knowledge-Based Systems, 24(2): 297-303. https://doi.org/10.1016/j.knosys.2010.10.001

[24] Seddiqui, M.H., Aono, M. (2010). Metric of intrinsic information content for measuring semantic similarity in an ontology. In Proceedings of the Seventh Asia-Pacific Conference on Conceptual Modelling, 110: 89-96.

[25] Meng, L., Gu, J., Zhou, Z. (2012). A new model of information content based on concept’s topology for measuring semantic similarity in WordNet. International Journal of Grid and Distributed Computing, 5(3): 81-94.

[26] Jiang, J.J., Conrath, D.W. (1997). Semantic similarity based on corpus statistics and lexical taxonomy. arXiv Preprint CMP-lg/9709008.

[27] Lin, D. (1998). An information-theoretic definition of similarity. In ICML, 98: 296-304.

[28] Lord, P.W., Stevens, R.D., Brass, A., Goble, C.A. (2003). Investigating semantic similarity measures across the Gene Ontology: The relationship between sequence and annotation. Bioinformatics, 19(10): 1275-1283. https://doi.org/10.1093/bioinformatics/btg153

[29] Seco, N., Veale, T., Hayes, J. (2004). An intrinsic information content metric for semantic similarity in WordNet. In ECAI, 16: 1089-1090.

[30] Meng, L., Gu, J. (2012). A new method for calculating word sense similarity in WordNet1. International Journal of Signal Processing, Image Processing and Pattern Recognition, 5(3): 197-206. https://www.earticle.net/Article/A208828

[31] Chandrasekaran, D., Mago, V. (2021). Evolution of semantic similarity—A survey. ACM Computing Surveys (CSUR), 54(2): 1-37. https://doi.org/10.1145/3440755

[32] Tversky, A. (1977). Features of similarity. Psychological Review, 84(4): 327-352. https://doi.org/10.1037/0033-295X.84.4.327

[33] Ezzikouri, H., Madani, Y., Erritali, M., Oukessou, M. (2019). A new approach for calculating semantic similarity between words using WordNet and set theory. Procedia Computer Science, 151: 1261-1265. https://doi.org/10.1016/j.procs.2019.04.182

[34] Lesk, M. (1986). Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. In Proceedings of the 5th Annual International Conference on Systems Documentation, pp. 24-26. https://dl.acm.org/doi/pdf/10.1145/318723.318728

[35] Patwardhan, S., Pedersen, T. (2006). Using WordNet-based context vectors to estimate the semantic relatedness of concepts. In Proceedings of the Workshop on Making Sense of Sense: Bringing Psycholinguistics and Computational Linguistics Together.

[36] Patwardhan, S. (2003). Incorporating dictionary and corpus information into a context vector measure of semantic relatedness. Doctoral dissertation, University of Minnesota, Duluth.

[37] Merhbene, L., Zouaghi, A., Zrigui, M. (2013). A semi-supervised method for Arabic word sense disambiguation using a weighted directed graph. In Proceedings of the Sixth International Joint Conference on Natural Language Processing, Nagoya, Japan, pp. 1027-1031.

[38] Jiang, Y., Zhang, X., Tang, Y., Nie, R. (2015). Feature-based approaches to semantic similarity assessment of concepts using Wikipedia. Information Processing & Management, 51(3): 215-234. https://doi.org/10.1016/j.ipm.2015.01.001

[39] Zhou, Z., Wang, Y., Gu, J. (2008). New model of semantic similarity measuring in wordnet. In 2008 3rd International Conference on Intelligent System and Knowledge Engineering, Xiamen, China, 1: 256-261. https://doi.org/10.1109/ISKE.2008.4730937

[40] Faaza, A., James, D., Zuhair, A., Keeley, A. (2012). Arabic word semantic similarity. International Journal of Cognitive and Language Sciences, 6(10): 2497-2505. https://doi.org/10.5281/zenodo.1080052