The Impact of Oversampling and Undersampling on Aspect-Based Sentiment Analysis of Indramayu Tourism Using Logistic Regression

ABSTRACT


INTRODUCTION
Tourism represents a potential source of income for a region. One such region that has made strides in developing its tourism sector is Indramayu Regency. Indramayu Regency is located on the north coast of Java Island and has 11 subdistricts with 36 villages directly adjacent to the Java Sea, with a coastline of 147 km [1]. This geographical location has facilitated the development of numerous beach destinations, which have become a significant tourist attraction in the local area and make up the majority of attractions in Indramayu. This study focuses on the natural attractions of nine beaches in Indramayu, namely Tiris Indramayu Beach, Balongan Indah Beach, Legenda Tirtamaya Beach, Plentong Ujunggebang Beach, Tambak Indah Beach Tourism, Karang Song Beach, Junti Beach, Glayem Beach, and Panjiwa Sumber Mas Beach.
These tourist destinations can be easily found through applications such as Google Maps. As the number of tourist attractions continues to grow, competition is expected to intensify. Consequently, various strategies will be needed to attract visitors and ensure the long-term viability of this industry. One crucial factor influencing the decision to visit a tourist attraction is the reviews provided by visitors who have already been there. These reviews, such as those available on Google Reviews, serve as a valuable reference when choosing the most suitable attraction. In addition to serving as a source of information for visitors, the reviews can be evaluated by the managers of the attractions to identify areas for improvement that could enhance visitor satisfaction.
Reading a large number of visitor reviews is a time-consuming task. Therefore, a technique is needed to collect and analyze these reviews on a large scale. Sentiment analysis is one such technique. It involves the computational study of opinions, sentiments, and emotions expressed in text to determine whether the sentiment is positive or negative [2].
The purpose of sentiment analysis is to help users understand the sentiment expressed about an object in order to facilitate informed decision-making. This study aims to determine the sentiments expressed in visitor reviews of tourist destinations in Indramayu on Google Reviews. The classification results are used to assess the accuracy of a multi-label classification method based on aspects and their related sentiments.
Machine learning techniques such as Naïve Bayes [3] and Support Vector Machine (SVM) [4] can be used to classify sentiment. Logistic regression has also been used to classify customer reviews [5][6][7]. Previous studies have also explored the application of deep learning algorithms to sentiment analysis of reviews, including application usage reviews and reviews of specific locations [8][9][10]. However, such methods require large amounts of data. With a limited amount of data, conventional machine learning remains an attractive option.
Data imbalance is a significant challenge in machine learning. An imbalanced dataset may leave a classifier unable to accurately classify classes with a limited number of instances. To address imbalance in sentiment analysis, Ogul et al. employed various balancing techniques, including oversampling and undersampling, on English and Turkish texts [11]. Furthermore, balancing methods have been shown to yield positive outcomes in the sentiment classification of application reviews using machine learning algorithms, with an improvement in accuracy of up to 13% observed on Indonesian text [12].
This study differs from previous research [11,12], which focused on single-label sentiment analysis that predicts positive, negative, and neutral polarity. Instead, this study performs multi-label, aspect-based sentiment classification of Indonesian text using logistic regression as the classifier. This approach has been demonstrated to yield effective results in sentiment analysis, as evidenced by previous studies [13,14]. Furthermore, this study evaluates the impact of oversampling and undersampling methods on an imbalanced aspect-based sentiment analysis dataset in the context of Indramayu tourism.
The remainder of this paper is organized as follows: Section 2 presents related work on balancing methods, Section 3 describes the methodology used for aspect-based sentiment classification, Section 4 presents the results and discussion, and Section 5 concludes the study.

RELATED WORK
The practice of sharing experiences or reviews about products or services through social media has become increasingly prevalent. Other individuals subsequently use these reviews as a factor in their purchasing decisions. This phenomenon also occurs when tourists select a destination for their vacation or a place to stay. The tourism sector is significantly influenced by decision-making based on visitor reviews from social media. Young people today tend to consult online reviews, such as those found on travel blogs, forums, and review sites, before deciding on a tourist destination [15]. In addition to being useful for consumers, reviews can also be the key to developing products or services. From the reviews, aspects can be identified that could be developed or need to be improved to better satisfy consumers in the future [16].
The information derived from review texts is expected to provide an understanding of the customer experience that ratings alone cannot, since ratings can be inconsistent and are less able to show which aspects of goods and services are strengths and which need to be improved. Given the vast quantity of information available online, the time required to read all the reviews is considerable. Consequently, a sentiment analysis system capable of analyzing the reviews is necessary. Sentiment analysis is currently developed mainly using dictionary-based and machine learning methods [17]. As dictionary-based methods require knowledge-based resources, languages with few dictionary resources rely on machine learning-based methods. Logistic regression has been employed to classify sentiment in Arabic [5][6][7] and Indonesian [13], with encouraging results. SVM and Naive Bayes have also been employed as classifiers to predict sentiment polarity [3,4,12]. However, one of the challenges in machine learning arises when dealing with class imbalance. Imbalanced data often produces unsatisfactory classification results [18], particularly when one class occupies the majority of the samples. Models trained on such data are more likely to learn from the class with greater representation in the dataset, particularly when the degree of imbalance is high. The result is often high accuracy in assigning data to the majority class. However, closer analysis shows that the samples of the minority class are mostly assigned to the majority class as well.
One solution to this issue is data balancing. Oversampling and undersampling are frequently employed to address imbalanced data in machine learning. In undersampling, a random selection of samples from the majority class is kept so that its size matches the number of samples in the minority class [19]. In contrast, oversampling replicates samples from the minority class and adds them to the dataset, thereby ensuring a balanced distribution of samples across the classes [20].
Chawla et al. developed an oversampling technique that goes beyond duplicating samples by introducing SMOTE [21]. The methodology combines undersampling of the majority class with oversampling of the minority class using synthetic minority examples. The paper presents evidence that this combination produces better classifier performance than methods that only undersample the majority class or that vary the loss ratios in Ripper or the class priors in Naive Bayes. Performance is evaluated using the area under the Receiver Operating Characteristic (ROC) curve (AUC).
SMOTE is often used in imbalanced data cases. In the health field, Ijaz et al. [22] propose a Hybrid Prediction Model (HPM) designed to forecast the probability of type 2 diabetes and hypertension at an early stage using risk factors from individuals as input. The model incorporates DBSCAN-based outlier detection, SMOTE, and Random Forest algorithms to achieve more precise predictions, and it surpasses other models in predicting diabetes and hypertension. In their study, Feng et al. [23] [25].
Another balancing technique, random undersampling, has been used to detect web attacks in the CSE-CIC-IDS2018 dataset [26]. The investigation covers eight random undersampling ratios and seven distinct classifiers. Before any random undersampling was applied, the classifiers had difficulty achieving satisfactory classification performance, which highlights the issue of class imbalance in web attack data. As the degree of random undersampling gradually increased, classification performance generally improved. In another study, an expert system for the automated identification of obstructive sleep apnea that uses a single-lead ECG and random undersampling boosting yields superior sensitivity, specificity, and accuracy compared with existing detection algorithms [27].
Karthikeyan and Kathirvalavakumar [28] propose a technique for addressing imbalanced datasets in classification through an undersampling approach. Specifically, the method employs an incremental clustering technique in which the representative of each cluster is the average of the instances in that cluster. This undersampling technique outperforms existing approaches that employ the K-Means and Leader algorithms. The proposed algorithm yields higher accuracy and a lower misclassification rate in both the majority and minority classes, thereby establishing its efficacy.
Random oversampling is also widely used, for example in identifying mobile malware, where it effectively mitigates the issue of imbalanced network traffic data [29]. The proposed method creates new instances of malicious activity by synthesizing signatures obtained from clustered malicious network traffic datasets. These newly generated instances are then combined with those obtained through random oversampling, resulting in an enriched minority class dataset. The technique proposed by Moreo et al. [30] serves as a viable means to improve the accuracy of classification algorithms in imbalanced text classification problems. It can be employed in various domains of binary text classification where the negative instances vastly outnumber the positive instances of the class of interest. The approach generates synthetic training examples for the minority class, thereby improving the classifier's performance.
This research not only examines the sentiment expressed in reviews but also attempts to predict the aspects that contain sentiment. Furthermore, this research compares data balancing techniques employed in aspect-based sentiment analysis, including SMOTE, random oversampling, and random undersampling, against a baseline with no balancing.

METHODOLOGY
Figure 1 shows the research methodology used in this study. There are seven main steps: data acquisition, annotation (labeling), preprocessing, data splitting, vectorization, classification, and evaluation.

Data acquisition
The dataset was taken from Google Maps reviews of nine beaches in Indramayu, namely Tiris Indramayu Beach, Balongan Indah Beach, Legenda Tirtamaya Beach, Plentong Ujunggebang Beach, Tambak Indah Beach Tourism, Karang Song Beach, Junti Beach, Glayem Beach, and Panjiwa Sumber Mas Beach, yielding 1,677 rows of data. Only reviews that contain text were collected (ratings without review text were excluded). Table 1 shows examples of reviews taken from Google Maps.
Table 1. Examples of reviews taken from Google Maps

Sample 1
English translation: Access to the location can only be motorbikes, lots of trees are cool, there are some good photo spots too. Cheap admission.

Sample 2
Review: jalannya perlu diperbaiki dan kebersihan pantai perlu dijaga
English translation: the road needs to be repaired and the cleanliness of the beach needs to be maintained

Sample 3
Review: pantai tiris merupakan salah satu tempat wisata baru di indramayu, pantainya lumayan bagus, tapi sayang akhir-akhir ini banyak sampah berserakan dan pengelola sepertinya tidak peduli dan akses menuju kesana juga cukup sulit, semoga pengelola akan memperbaikinya dengan cepat
English translation: Tiris beach is one of the new tourist attractions in Indramayu, the beach is pretty good, but it's a shame that lately there's been a lot of trash scattered around and the manager doesn't seem to care and access to it is also quite difficult, hopefully the manager will fix it quickly

Labeling
The collected data was annotated by three individuals in accordance with specific guidelines, with the objective of ensuring uniform annotation across annotators. The final label for each review was determined through majority voting. The guidelines involve two distinct stages of annotation: one for identifying aspects and another for determining sentiments. The aspect labels are cleanliness, facilities, accessibility, attractiveness, image, price, and human resources. Sentiment labels are either positive or negative. Each review can be labeled with more than one aspect, as in sample 1 of Table 1. That review mentions accessibility, attractiveness, and price, so these aspects are labeled 1 and the other aspects are labeled 0. Samples of aspect labeling from Table 1 can be seen in Table 2. Each aspect annotated as 1 is then analyzed further to determine whether its sentiment is positive or negative. If a specific aspect is not discussed, its sentiment remains null. Examples of sentiment labeling can be seen in Table 3. The first sample has a negative sentiment for accessibility because the beach can only be reached by motorcycle. The second sample has a negative sentiment for cleanliness because the user mentioned that the cleanliness of the beach needs to be maintained.

After the annotation process, the amount of labeled data for each aspect is quite limited and is small relative to the overall dataset. Furthermore, the sentiment labels are imbalanced across the aspects. This disparity can be observed in Table 4, where the distribution of positive and negative sentiment labels is uneven for most aspects. For instance, in the aspects of accessibility, attractiveness, image, price, and human resources, one class represents only one-third or less of the samples for that aspect.
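The majority-voting step described above can be illustrated with a minimal Python sketch. The annotation values and the `majority_vote` helper are hypothetical and serve only as an illustration; the study does not describe its tooling for combining annotations.

```python
from collections import Counter

def majority_vote(labels):
    """Return the label chosen by the largest number of annotators."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical aspect annotations from three annotators for one review:
# 1 means the aspect is mentioned, 0 means it is not.
annotations = [
    {"accessibility": 1, "attractiveness": 1, "price": 1, "cleanliness": 0},
    {"accessibility": 1, "attractiveness": 0, "price": 1, "cleanliness": 0},
    {"accessibility": 1, "attractiveness": 1, "price": 0, "cleanliness": 0},
]

# Vote independently on every aspect label.
final_labels = {
    aspect: majority_vote([a[aspect] for a in annotations])
    for aspect in annotations[0]
}
print(final_labels)
```

With three annotators and binary labels a tie cannot occur, so no tie-breaking rule is needed in this setting.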

Preprocessing
Data preprocessing prepares raw data before it is processed by the system, by removing inappropriate data or changing the data into a form that is easier for the system to process [31]. In sentiment analysis, this stage is crucial, especially for social media text, which mostly contains informal and unstructured words or sentences and has a lot of noise. Reducing this noise is expected to produce a better classification model.
The data used at this stage is the previously labeled raw review data, most of which contains non-standard and unstructured words or sentences and considerable noise. The preprocessing carried out in this study consists of case folding, punctuation removal, word normalization, stemming, and stop word removal. The result of this stage is data that is ready for the vectorization and classification process.

Case folding
Case folding is the initial stage in data preprocessing. At this stage, all letters in each review are converted to lowercase, because uppercase and lowercase forms of a word are otherwise stored as separate entities by the computer [32]. This process guarantees that "term" is recognized as being equivalent to "Term".

Punctuation removal
This stage removes characters that do not provide useful features for sentiment analysis. Each review is cleaned by removing punctuation. In this study, terms are extracted by splitting the text on spaces, so punctuation needs to be removed first.

Normalization
In the normalization process, non-standard words in the review data, such as misspellings and abbreviations, are converted into standard words in accordance with the Big Indonesian Dictionary (KBBI). This stage is carried out by matching each word in the review data against a dictionary. The dictionary is built by manually evaluating each review, collecting non-standard words, and pairing them with their standard forms. The dictionary is then used to replace non-standard words, such as abbreviations and typos, with standard words. Normalization reduces the vocabulary, which contributes to reducing noise and standardizing the texts [33,34].

Stemming
Stemming uses heuristic algorithms to eliminate morphological affixes from words, retaining only the word stem [35], which reduces the vocabulary size. The technique obtains the base form of a word by removing its prefixes, suffixes, or both. Stemming is done to eliminate word variations caused by affixes, such as those that form passive and active forms, so that only the base form remains.

Stop words removal
Stop words are common words that are considered to provide little or no information or value in document analysis [36,37]. Stop word removal eliminates words that appear frequently but lack semantic significance. In this study, the stop word list from the Sastrawi library is used to filter out stop words.
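The preprocessing steps described in this section can be sketched as a single Python function. This is a minimal illustration under stated assumptions: the `NORMALIZATION` dictionary and `STOP_WORDS` set below are tiny hypothetical stand-ins for the manually built dictionary and the Sastrawi stop word list, and stemming is passed in as an optional callable so that the sketch does not depend on any particular stemmer.

```python
import string

# Hypothetical stand-ins for the study's normalization dictionary
# and the Sastrawi stop word list.
NORMALIZATION = {"yg": "yang", "tdk": "tidak", "bgs": "bagus"}
STOP_WORDS = {"yang", "dan", "di", "ke", "juga"}

# Translation table that maps every punctuation character to a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def preprocess(text, stemmer=None):
    """Apply the five preprocessing steps and return a list of tokens."""
    text = text.lower()                                   # 1. case folding
    text = text.translate(PUNCT_TABLE)                    # 2. punctuation removal
    tokens = text.split()                                 #    split terms by space
    tokens = [NORMALIZATION.get(t, t) for t in tokens]    # 3. normalization
    if stemmer is not None:                               # 4. stemming (optional)
        tokens = [stemmer(t) for t in tokens]
    return [t for t in tokens if t not in STOP_WORDS]     # 5. stop word removal

print(preprocess("Jalannya perlu diperbaiki, dan kebersihan pantai yg perlu dijaga!"))
```

Any function that maps a word to its stem can be supplied as `stemmer`; leaving it as `None` simply skips the stemming step.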

Data splitting
The data in this study was divided into two distinct subsets: training data (60%) and testing data (40%). The division ensures that the data used for training and testing is distinct, which results in more objective testing outcomes. Table 5 presents the results of data splitting based on aspect labels. The total number of training samples is 1,006, while the number of testing samples is 671.
For each aspect, two classes indicate whether the aspect is present or not present in each review. For cleanliness, for example, 194 of the 1,006 training samples are labeled as present. The aspect models are trained on the training set with these present and not-present labels, while the sentiment models are trained on the samples in which the aspect is present, which carry positive or negative sentiment labels.
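As a quick check of the split sizes, the following sketch divides 1,677 sample indices in a 60:40 ratio. The shuffling, the seed, and the `split_indices` helper are assumptions made for illustration; the study does not state how the samples were assigned to each subset.

```python
import numpy as np

def split_indices(n_samples, train_fraction=0.6, seed=0):
    """Shuffle sample indices and split them into train and test parts."""
    rng = np.random.default_rng(seed)   # seed is an arbitrary assumption
    order = rng.permutation(n_samples)
    n_train = int(train_fraction * n_samples)
    return order[:n_train], order[n_train:]

train_idx, test_idx = split_indices(1677)
print(len(train_idx), len(test_idx))  # 1006 671
```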

Text vectorization
Text representation in this study is done by converting text to vectors using Term Frequency-Inverse Document Frequency (TF.IDF). In the field of information retrieval, term weights are primarily used to denote the usefulness of a term in the retrieval process. TF.IDF is a quantitative measure that combines two variables, TF and IDF. TF offers a direct assessment of the likelihood of occurrence of a term, normalized by the overall frequency in the document or the collection of documents, depending on the scope of the calculation. IDF, in turn, can be interpreted as the amount of information carried by a term in the sense of traditional information theory [38]. The TF.IDF values for each term are calculated using the training data. The testing data vectors are obtained by transforming the test text with the TF.IDF term weights learned from the training data.
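The fit-on-training, transform-on-testing procedure can be sketched as follows. The weighting used here (term count divided by document length, multiplied by log(N/df)) is one common TF.IDF variant and is an assumption, since the exact formula used in the study is not given; the function names are likewise illustrative.

```python
import math
from collections import Counter

def fit_tfidf(train_docs):
    """Learn the vocabulary and IDF weights from tokenized training documents."""
    n_docs = len(train_docs)
    # Document frequency: number of training documents containing each term.
    df = Counter(term for doc in train_docs for term in set(doc))
    vocab = {term: i for i, term in enumerate(sorted(df))}
    idf = {term: math.log(n_docs / df[term]) for term in vocab}
    return vocab, idf

def transform_tfidf(docs, vocab, idf):
    """Turn tokenized documents into TF.IDF vectors using the learned weights."""
    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        for term, count in Counter(doc).items():
            if term in vocab:  # terms unseen during training are ignored
                vec[vocab[term]] = (count / len(doc)) * idf[term]
        vectors.append(vec)
    return vectors

train_docs = [["pantai", "bersih"], ["pantai", "kotor"]]
test_docs = [["pantai", "bersih", "indah"]]

vocab, idf = fit_tfidf(train_docs)
test_vectors = transform_tfidf(test_docs, vocab, idf)
print(vocab)
print(test_vectors)
```

Because the IDF weights come only from the training documents, a term that appears in every training document (here "pantai") receives a weight of zero, and a test-only term (here "indah") is dropped.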

Data balancing
In the training phase, data balancing is performed so that the model learns equitably from each class. Data balancing is not applied to the testing data. We use Random Over Sampling (ROS), SMOTE, and Random Under Sampling (RUS) to achieve data balance, and this experimental study compares these techniques.
Undersampling involves selecting a subset of points from the majority class while discarding the rest. Oversampling, on the other hand, replicates some of the points from the minority class to increase its cardinality. Alternatively, synthetic data generation creates new data points for the minority class to increase its cardinality, an approach exemplified by the SMOTE method [21].
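The three balancing strategies can be sketched with NumPy for a binary label vector. This is a simplified illustration rather than the implementation used in the study, which is not specified: the function names are hypothetical, and the SMOTE-style function only reproduces the core idea of [21], namely interpolating between a minority sample and one of its k nearest minority neighbors.

```python
import numpy as np

def random_oversample(X, y, rng):
    """Duplicate randomly chosen samples until every class matches the largest."""
    classes, counts = np.unique(y, return_counts=True)
    keep = []
    for c, n in zip(classes, counts):
        members = np.flatnonzero(y == c)
        extra = rng.choice(members, size=counts.max() - n, replace=True)
        keep.append(np.concatenate([members, extra]))
    idx = np.concatenate(keep)
    return X[idx], y[idx]

def random_undersample(X, y, rng):
    """Keep a random subset of every class, sized to the smallest class."""
    classes, counts = np.unique(y, return_counts=True)
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=counts.min(), replace=False)
        for c in classes
    ])
    return X[idx], y[idx]

def smote_oversample(X, y, rng, k=5):
    """Add synthetic minority points (binary labels, at least 2 minority samples)."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    X_min = X[y == minority]
    n_new = counts.max() - counts.min()
    k = min(k, len(X_min) - 1)
    # Pairwise distances among minority samples; a point is not its own neighbor.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbors = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    synthetic = X_min[base] + gap * (X_min[chosen] - X_min[base])
    return np.vstack([X, synthetic]), np.concatenate([y, np.full(n_new, minority)])

rng = np.random.default_rng(0)
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)

X_ros, y_ros = random_oversample(X, y, rng)
X_rus, y_rus = random_undersample(X, y, rng)
X_smote, y_smote = smote_oversample(X, y, rng)
print(np.bincount(y_ros), np.bincount(y_rus), np.bincount(y_smote))
```

In line with the description above, such functions would be applied to the training vectors only, leaving the testing data untouched.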

Logistic regression using stochastic gradient descent
The TF.IDF vector serves as the feature vector (x), whose elements are combined linearly using weights (coefficient values) and a bias to predict an output value (y). The output of logistic regression is a binary value, 0 or 1, obtained by applying a threshold of 0.5 to the output of the sigmoid function shown in Eq. (1).

$$\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = w^{T}x + b \tag{1}$$
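The error during the training process is measured using the log loss function, as seen in Eq. (2), where $N$ is the number of training samples.

$$L = -\frac{1}{N}\sum_{n=1}^{N}\left[\, y_n \log \sigma\!\left(w^{T}x_n + b\right) + (1 - y_n)\log\!\left(1 - \sigma\!\left(w^{T}x_n + b\right)\right) \right] \tag{2}$$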
The weights are updated through stochastic gradient descent, which aims to minimize the loss function. Each update either decreases or increases a weight, depending on the prediction error. When the difference between the prediction and the class is positive, the weight is updated to be smaller. Conversely, if the difference is negative, the weight is increased. Eq. (3) and Eq. (4) are used to compute the gradients for the update, where $dw$ is used to update the weights and $db$ to update the bias. In this weight update, we use $\lambda = 0.0001$.

$$dw^{(t)} = x_n\left(y_n - \sigma\!\left((w^{(t)})^{T}x_n + b^{(t)}\right)\right) - \frac{\lambda}{N}\,w^{(t)} \tag{3}$$

$$db^{(t)} = y_n - \sigma\!\left((w^{(t)})^{T}x_n + b^{(t)}\right) \tag{4}$$

The new weights are therefore the sum of the old weights and the product of the learning rate $\alpha$ and the computed update. In this study, we use a learning rate of 0.1. The equations for the new weights and bias can be seen in Eq. (5) and Eq. (6).

$$w^{(t+1)} = w^{(t)} + \alpha\, dw^{(t)} \tag{5}$$

$$b^{(t+1)} = b^{(t)} + \alpha\, db^{(t)} \tag{6}$$
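The training procedure described in this section can be sketched in NumPy as follows. It follows the per-sample update in which the error (class minus sigmoid output) scales the feature vector, a small penalty proportional to the current weights is subtracted, and the result is added to the weights after multiplication by the learning rate. The zero initialization, the reshuffling of samples in every epoch, and all function names are assumptions made for illustration; the defaults of 0.1, 0.0001, and 500 mirror the values reported in this paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y, p, eps=1e-15):
    """Mean log loss; probabilities are clipped to avoid log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def train_sgd(X, y, lr=0.1, lam=0.0001, epochs=500, seed=0):
    """Fit logistic regression with one weight update per training sample."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0              # zero initialization is an assumption
    for _ in range(epochs):
        for i in rng.permutation(n):     # visiting order is an assumption
            error = y[i] - sigmoid(X[i] @ w + b)
            dw = X[i] * error - (lam / n) * w
            db = error
            w = w + lr * dw
            b = b + lr * db
    return w, b

def predict(X, w, b, threshold=0.5):
    return (sigmoid(X @ w + b) >= threshold).astype(int)

# Toy, linearly separable data standing in for TF.IDF vectors.
X_toy = np.array([[0.0, 1.0], [0.2, 0.9], [1.0, 0.0], [0.9, 0.1]])
y_toy = np.array([0, 0, 1, 1])

w, b = train_sgd(X_toy, y_toy)
loss_before = log_loss(y_toy, sigmoid(X_toy @ np.zeros(2)))
loss_after = log_loss(y_toy, sigmoid(X_toy @ w + b))
print(predict(X_toy, w, b), loss_before, loss_after)
```

When the prediction exceeds the class label, `error` is negative and the weights of the active features shrink, which matches the behavior described in the text.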

Aspect-based sentiment analysis model
A binary relevance model is used to perform multi-label classification for aspect and sentiment. In binary relevance, a group of single-label binary classifiers is trained, with each classifier assigned to a specific class. Each classifier predicts membership or non-membership of its class, and the final multi-label output is the union of all predicted classes. This approach assumes that there is no correlation between aspect labels. Figure 2 shows the correlation between aspect labels. The strongest correlation between different aspect labels is between image and attractiveness, at -0.35, which can be interpreted as a low negative correlation, and the other correlations are considered very low. Thus, in this study, we assume there is no correlation between labels and use binary relevance.

Figure 2. Correlation between aspect labels
Figure 3 shows the classification process in aspect-based sentiment analysis. The training data, in the form of TF.IDF vectors, is used to build Logistic Regression models with Stochastic Gradient Descent (SGD) to update the weights. The aspect-based sentiment analysis model is built in two steps. The first step builds the aspect models, and the second step builds the sentiment classification models, which is done separately with the same data. Each category and sentiment has its own logistic regression model, so there are eight aspect models and eight sentiment models. For the sentiment models, the training data consists of the training samples with non-null sentiment labels.
The aspect category and sentiment models built from the training data are then tested with the testing data. Testing is done hierarchically: each test sample is first classified by the aspect model, and if the aspect is detected (label 1), the sample is then classified by the corresponding aspect's sentiment model as positive (1) or negative (0). If the aspect category is 0, the sentiment for that aspect is given a dummy value of -1. The resulting sentiment labels are then used for evaluation.
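The hierarchical testing procedure can be sketched as a short Python function. The stand-in classifiers below are hypothetical keyword rules that operate on tokens purely for readability; in the study each of them would be a trained logistic regression model applied to a TF.IDF vector.

```python
def predict_aspect_sentiment(x, aspect_models, sentiment_models):
    """Return, per aspect, 1 (positive), 0 (negative), or -1 (aspect not detected)."""
    output = {}
    for aspect, detect in aspect_models.items():
        if detect(x) == 1:
            # The sentiment model is consulted only when the aspect is detected.
            output[aspect] = sentiment_models[aspect](x)
        else:
            output[aspect] = -1  # dummy value for an undetected aspect
    return output

# Hypothetical stand-in classifiers (keyword rules instead of trained models).
aspect_models = {
    "cleanliness": lambda tokens: int("sampah" in tokens or "bersih" in tokens),
    "price": lambda tokens: int("murah" in tokens or "mahal" in tokens),
}
sentiment_models = {
    "cleanliness": lambda tokens: int("bersih" in tokens),
    "price": lambda tokens: int("murah" in tokens),
}

review = ["pantai", "banyak", "sampah"]
labels = predict_aspect_sentiment(review, aspect_models, sentiment_models)
print(labels)
```

The sketch makes the dependency explicit: a sample whose aspect is missed by the aspect model never reaches the sentiment model, so errors in the first stage carry over to the final output.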

Model evaluation
To compare and assess the models, we employed precision, recall, and F1-score as the evaluation metrics. Precision indicates how exact the model's positive predictions are, while recall indicates how completely the model finds the positive samples. The calculation of precision and recall involves TP, TN, FP, and FN, which correspond to true positive, true negative, false positive, and false negative, respectively. TP denotes positive sentiments that are correctly classified as positive, TN denotes negative sentiments that are correctly classified as negative, FP denotes samples classified as positive whose real class is negative, and FN denotes samples classified as negative whose real class is positive. The F1-score, the harmonic mean of precision and recall, is regarded as the more informative evaluation parameter, particularly for imbalanced datasets. These evaluation parameters fall within the range of 0 to 1, where 1 denotes the highest value and 0 the lowest. Precision, recall, and F1-score are defined in Eq. (7), Eq. (8), and Eq. (9).

$$\text{Precision} = \frac{TP}{TP + FP} \tag{7}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{8}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{9}$$
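The three metrics can be computed directly from the counts of true positives, false positives, and false negatives, as in the minimal sketch below. The function name and the toy labels are illustrative only, and returning 0.0 when a denominator is zero is a convention chosen here, not something stated in the study.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1-score for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Toy labels: 1 = positive sentiment, 0 = negative sentiment.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

In this toy case there are two true positives, one false positive, and one false negative, so precision, recall, and F1-score all equal 2/3.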

RESULT AND DISCUSSION
In this section, we discuss the results of our four experiment scenarios. Our objective is to assess and contrast the performance of aspect-based sentiment analysis on the imbalanced dataset and on datasets balanced with the Synthetic Minority Oversampling Technique (SMOTE), Random Over Sampling (ROS), and Random Under Sampling (RUS). The machine learning model employed in this empirical study is Logistic Regression, with Stochastic Gradient Descent as the weight optimizer, a learning rate of 0.1, and 500 epochs.
To mitigate potential overfitting issues, the data have been partitioned into training and testing sets in a 60:40 ratio. Table 6 presents a comparison of the balancing methods employed in aspect-based sentiment analysis. The highest F1-score for cleanliness, 83, is achieved using SMOTE and ROS, exceeding both the unbalanced baseline and RUS. For the facility and image aspects, the best F1-scores were achieved using SMOTE, with scores of 40 and 59, respectively. In contrast, undersampling has a negative impact on model performance. For the accessibility and price aspects, applying the balancing techniques resulted in a decline in performance. In the attractiveness aspect, SMOTE did not improve the F1-score, whereas the other methods tended to decrease performance. The human resources aspect performed best with random oversampling, which differed from SMOTE by only 1%, whereas undersampling led to a significant increase in recall, from 29 to 43, accompanied by a decline in precision. This phenomenon is suspected to be the result of a limited number of positive sentiments.

Since two models, the aspect model and the sentiment model, are combined sequentially in the aspect-based sentiment analysis scenario, we also evaluate each model separately, under the assumption that each stands alone, to determine the performance of multi-label classification when the two models are not combined. The performance of the aspect model on the same testing data, without combining aspects and sentiment, can be seen in Table 7.
In aspect detection, oversampling tends to improve the overall F1-score, except for the price aspect, where performance declines. Conversely, undersampling decreased the F1-score except for the attractiveness aspect, where it surpassed both the unbalanced dataset and the oversampling methods (SMOTE and ROS). For precision and recall, balancing techniques generally increase recall and decrease precision. This shows that oversampling increases aspect detection (true positives), as reflected in higher recall, but also has the side effect of increasing false positives, as indicated by lower precision.
The performance of the standalone sentiment model can be observed in Table 8. The overall performance with the balancing methods is nearly identical to that on the original dataset without balancing. However, for certain aspects, balancing has a detrimental effect on performance. This is particularly evident in the attractiveness and image aspects, where balancing has a significant negative impact on recall without a commensurate increase in precision, resulting in a decline in the F1-score. In the price aspect, undersampling has a pronounced negative effect on performance, with a notable reduction in recall. In contrast, undersampling has a beneficial effect on recall and a detrimental effect on precision in the human resources, cleanliness, facilities, and accessibility aspects.
The standalone aspect model and sentiment model demonstrate superior performance compared to the combination of the two models. This indicates that the decline in performance arises from samples that are correctly identified by the aspect model but not correctly classified by the sentiment model. Additionally, there are samples that would be correctly classified by the sentiment model but are not recognized as containing the aspect by the aspect model. Given that the aspect model precedes the sentiment model, if an aspect is not detected, the sample never reaches the sentiment model. As a result, the sequential combination of the two models exhibits lower performance than the aspect model alone.
In this investigation, logistic regression is employed as the classifier, given the limited number of samples. More sophisticated models, such as deep learning and transformer-based models, require a large number of samples. However, an imbalanced dataset may cause logistic regression to overfit, so that the model excels in predicting the majority class but performs inadequately in predicting the minority class. This can lead to high performance during training but diminished effectiveness during testing.
The results of investigating balancing techniques in sentiment analysis indicate that oversampling tends to enhance the detection accuracy of minority classes while slightly diminishing the accuracy of majority classes. Undersampling, on the other hand, tends to lower predictive performance. In this investigation, the reduced sample size resulting from undersampling leads to a deterioration in predictive performance.
In aspect-based sentiment analysis, particularly in the context of binary relevance, predictions are made at two distinct levels, first at the aspect level and then at the sentiment level. The performance of sentiment prediction therefore depends not only on the sentiment model but also on the accuracy of aspect prediction, because aspect prediction serves as a precursor to sentiment analysis.
By using aspect-based sentiment analysis, reviews can be analyzed at a lower cost and in less time. The tourism industry can benefit from aspect-based sentiment analysis by gaining insights into user satisfaction and dissatisfaction with specific aspects. This information can be used to enhance services and gain a competitive advantage. The results of this study can support the development of aspect-based sentiment analysis models on limited and imbalanced datasets.

CONCLUSIONS
The comparison of balancing methods in this study indicates that oversampling can enhance the overall performance of aspect-based sentiment analysis in the tourism sector with limited and imbalanced data.
Balancing in aspect-based sentiment analysis on Indramayu tourism has a positive impact on the F1-score, increasing it by 1-6% on the aspects of cleanliness, facilities, image, and human resources when SMOTE oversampling is employed, compared to the scenario without balancing. Random oversampling has a positive impact on the cleanliness and human resources aspects, with an improvement of 3-7% in the F1-score. Conversely, random undersampling has been observed to result in a decline in the F1-score for aspect-based sentiment analysis.
Logistic regression with SGD is an appropriate approach for this aspect-based sentiment analysis study in tourism, where the amount of data for each aspect is limited and the classes are imbalanced. The sensitivity of logistic regression models to class imbalance can be mitigated by employing oversampling techniques, which can enhance sentiment detection performance on each aspect.
Evaluation results show that the performance of Logistic Regression with SGD is affected by the sequential use of the aspect model and the sentiment model: the standalone models (aspect model only or sentiment model only) perform better than the combination of the two. Oversampling can improve aspect detection in the aspect models, but in the sentiment models the balancing methods tend not to affect performance.
This research has limitations that can be addressed in future studies. As more reviews become available, the number of samples in the dataset can be expanded. Methods can also be developed further, both in the automatic extraction of aspect entities and in the use of deep learning methods as classification algorithms and of language models such as BERT for text vectorization.

Table 2. Multi-aspect labeling

Table 4. Positive and negative sentiment distribution

Table 5. Data splitting

Table 6. Comparison of balancing methods on aspect-based sentiment analysis

Table 7. Comparison of balancing methods on aspect classification

Table 8. Comparison of balancing methods on sentiment classification