Explainable Machine Learning for Credit Card Customer Segmentation Using Clustering Algorithms

Sanket Kedar, Kapish Rathod, Darshana Bodhak, Priyanka V. Deshmukh*, Anjali S. More

Symbiosis Institute of Technology, Pune Campus, Symbiosis International (Deemed University), Pune 412115, India

Corresponding Author Email: priyanka.deshmukh@sitpune.edu.in
Pages: 261-268 | DOI: https://doi.org/10.18280/isi.310125
Received: 15 April 2025 | Revised: 18 July 2025 | Accepted: 17 January 2026 | Available online: 31 January 2026

© 2026 The authors. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).

OPEN ACCESS

Abstract: 

Customer segmentation plays a crucial role in financial analytics, enabling institutions to develop targeted marketing strategies, improve risk management, and enhance customer engagement. However, traditional segmentation methods often struggle to capture complex behavioral patterns in large-scale credit card transaction datasets. This study proposes an explainable machine learning framework for credit card customer segmentation based on unsupervised clustering techniques. The framework integrates multiple clustering algorithms, including K-means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), hierarchical clustering, and spectral clustering, to identify distinct customer groups according to spending behavior and credit usage patterns. To enhance model interpretability and transparency, explainable artificial intelligence techniques—specifically SHapley Additive Explanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME)—are employed to analyze feature contributions and cluster characteristics. Experimental results show that K-means achieved the most effective clustering performance, supported by evaluation metrics such as the Davies–Bouldin index, silhouette score, and Calinski–Harabasz index. The analysis reveals four meaningful customer segments representing distinct credit usage behaviors, including full payers, installment users, low-activity customers, and cash-advance users. The proposed framework demonstrates that combining unsupervised learning with explainable AI can provide interpretable and actionable insights for financial institutions, supporting more informed credit risk assessment, personalized marketing strategies, and data-driven decision-making in credit card analytics.

Keywords: 

customer segmentation, credit card analytics, unsupervised learning, clustering algorithms, explainable artificial intelligence, Shapley additive explanations, local interpretable model-agnostic explanations, cluster validation metrics

1. Introduction

Customer segmentation is a critical capability, particularly in the financial services sector, where understanding customer types enables effective marketing, risk management, and improved retention [1-4]. Banks and other financial firms have historically relied on relatively basic segmentation techniques that use only a small number of transaction or demographic attributes. These approaches are fundamentally inadequate because they fail to capture the complex nature of customer preferences, behaviors, and market dynamics [5-13]. The growing volume of electronic transactions has produced a flood of data that ordinary segmentation techniques struggle to turn into insight. Manual segmentation is time-consuming and prone to bias, while rule-based approaches fail to detect subtle trends in large volumes of data [14]. As a result, financial institutions miss opportunities to improve their services and the customer experience [15]. Moreover, most existing segmentation systems lack real-time integration of explainable AI (XAI), which is essential for stakeholder trust and regulatory transparency in financial decision-making, and algorithms such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and hierarchical clustering still pose scalability challenges when applied to real-time or high-frequency credit card transaction data. Addressing these gaps is vital for building deployable, trustworthy segmentation frameworks.

AI and unsupervised machine learning can overcome these challenges. Using more sophisticated algorithms, such as clustering techniques, AI mines customer data to uncover previously hidden patterns and thereby produce precise, usable customer segments [11, 12, 15-21]. This ability allows banks to tailor services, better target marketing, and raise the overall level of service. In addition, the segmentation process should incorporate explainable AI. Especially in an industry where trust and transparency are paramount, stakeholders must be able to understand the reasoning behind segmentation decisions. XAI provides the rationale for the model's conclusions, enabling institutions to justify their strategies and comply with regulatory requirements [2, 17, 20, 22]. SmartSeg applies these AI techniques to segment credit card customers, providing a well-defined framework for identifying and understanding different client categories without sacrificing the interpretability and transparency of decision-making.

1.1 Motivation

As the financial sector continues to evolve, credit card providers strive to learn more about their customers in order to improve products and services, retain customers, and boost profitability. Traditional customer segmentation, usually based on basic demographic information, is rapidly losing its effectiveness at classifying the complicated spending habits of modern credit card users. SmartSeg aims to change the way credit card companies divide up their customer base by using unsupervised machine learning techniques. Its main objective is to combine explainable AI with established clustering algorithms to fully understand customers' purchasing patterns, credit behavior, and risk. This data-driven segmentation enables lower credit risk, stronger marketing strategies, and increased customer satisfaction through targeted financial products and services.

2. Literature Review

Qiu and Wang [1] showed how machine learning can enhance the segmentation of credit card customers. Several newer procedures, including Recency, Frequency, and Monetary (RFM) analysis and neural networks, can increase the speed and accuracy of segmentation. Using algorithms such as Gaussian mixtures, DBSCAN, and K-means, a bank or similar financial institution can adapt these methods to the spending patterns of its customers. Precise segmentation supports economic stability and risk management through data-driven decision-making in the credit card business.

Talaat et al. [2] have provided ongoing literature research for describing the integration of Artificial Intelligence and Deep Learning for customer segmentation. Conventional approaches (such as RFM analysis) are incapable of capturing patterns that may not be easily recognized. Therefore, the DeepLimeSeg model integrates the neural network approach with the interpretability through the Lime-based technique. This will use the customers' behavioral data and demographics to segment the customer. DeepLimeSeg was tested on real-world datasets and demonstrated its efficiency in data-driven marketing decisions by outperforming traditional models on metrics such as Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-square.

The use of k-means clustering for customer segmentation in RFM analysis was investigated by Gustriansyah et al. [3]. It is shown that k-Means is a useful tool for determining customer segmentation and optimizing marketing strategies. However, k-Means has drawbacks, such as sensitivity to the initial center of mass. Future avenues of research include investigating alternative clustering methods and improving k-Means to improve customer segmentation. This research contributes to the development of data-driven marketing strategies.

A comparison was made by Akshaya et al. [4] of the K-means and K-medoids clustering algorithms and reviews data mining techniques used to predict frequent item sets in credit card transactions. The study evaluates the performance of both algorithms using a dataset of 100,000 credit card transactions. The results show that K-medoids outperform K-means in terms of handling outliers and clustering accuracy. Future modifications to the clustering algorithms are proposed in this paper to enhance the prediction of frequent item sets in digital transactions.

Karo et al. [5] employed the K-means algorithm to identify different groups of credit card customers according to their usage habits. Six clusters were identified: occasional customers, cash users, luxury-goods customers, timely payers, installment payers, and recreational shoppers. Six was established as the optimal number of clusters by the elbow method, and the quality of the resulting clusters was assessed with the silhouette index. The work notes that the approach can be refined in future work to incorporate more variables.

Dawood et al. [6] examined the application of machine learning techniques (including K-means, improved K-means, fuzzy C-means, and neural networks) to the analysis of bank customer behavior. The dataset contains 30,000 customer records from a Taiwanese bank. After preprocessing, new labels are generated using an unsupervised clustering algorithm. Using these labels, the neural network achieved a maximum accuracy of 98.08%. With this approach, banks can reduce risk and increase profitability through improved customer analytics. Future research will explore deep learning techniques to further improve analytical accuracy.

The study by Umuhoza et al. [7] focuses on Egypt and performs behavior-based credit card segmentation of African users. It examines how credit card transaction analysis using unsupervised machine learning, more specifically k-means clustering, can classify users into four categories: ordinary people, fashion enthusiasts, executives, and limited consumers. The results could help financial institutions in Africa to better target their marketing efforts, satisfy more customers, and encourage the use of credit cards. The study also highlights the effectiveness of such segmentation in reducing marketing costs and increasing consumer loyalty.

The study by Jha [8] uses unsupervised machine learning to study credit card consumer behavior analysis. It uses clustering techniques such as K-means to group customers according to their spending habits based on a dataset of 9000 credit card holders. The analysis yields three groups: high-balance users, large consumers, and low-balance consumers. The study addresses data quality and ethical issues in the use of the model while highlighting how businesses can use data science to improve marketing strategies, detect fraud, and improve customer segmentation.

Zhou et al. [9] analyzed different machine learning-based approaches for customer segmentation and personal credit rating estimation on a dataset of 10,000 credit records. The study calls attention to the rapid shift toward digital payments and omnichannel economies, supported by data-driven approaches. It discusses local outlier factor (LOF) testing methods, gradient boosting, and BP neural networks as means of improving the accuracy of credit scoring. By incorporating machine learning into conventional bank rating systems, the paper improved credit ratings, enhanced customer segmentation, and addressed issues in personal credit rating for e-commerce.

Rachman et al. [10] investigated Mini Batch K-means for credit card customers' clustering using a data set of 46,079 credit card transactions over three months. This study combined Recency, Frequency, and Monetary analysis with demographic, geographic, and behavioral data in an attempt to develop an improved customer segmentation. It proved that Mini Batch K-means outperformed other algorithms like K-means and BIRCH in runtime. Using the Elbow method, the optimal number of clusters proved to be four. This approach facilitates the study of target marketing by identifying customer loyalty and potential.

Abdulhafedh [11] applied K-means, hierarchical clustering, and Principal Component Analysis (PCA) to segment customers using a dataset of 8,950 credit card holders. Using evaluation metrics such as the silhouette and Dunn indexes, the study shows that K-means outperforms hierarchical clustering. PCA reduces 17 features to 5 principal components, minimizing collinearity and improving clustering efficiency. Notably, PCA revealed an additional customer group, increasing segmentation accuracy. The study concludes that combining PCA with K-means improves the identification of distinct customer clusters and strengthens targeted marketing strategies.

Hung et al. [12] applied hierarchical agglomerative clustering (HAC) to records of 9,000 credit card holders covering credit limits, purchases, and payments. The study preprocesses the data by handling outliers and missing values, then determines three optimal clusters using techniques such as the elbow method and the gap statistic. Results show three clear customer segments: low-credit, high-spending, and mixed patterns. It concludes that HAC suits only small datasets and advocates cloud-based environments for scalability. This strengthens cluster analysis for forming marketing strategies in customer segmentation.

Yanık and Elmorsy [13] examined customer segmentation using self-organizing maps (SOMs) and k-means clustering on credit card transaction data, considering both consumption and demographic variables. PCA reduced the dimensionality of a dataset of 38,887 customers to ease clustering. The SOMs outperformed k-means, especially on the demographic feature representations, achieving lower Davies-Bouldin indices. Among the notable clusters were single, high-income customers who spend heavily on entertainment and insurance. The authors recommend that future research apply these clusters to improve forecasting and business strategies.

3. Methodology

The proposed model shown in Figure 1 starts with the credit card dataset, which was obtained from Kaggle [23]. The data goes through a preprocessing step to clean and standardize it. Next, feature reduction is performed with PCA to decrease dimensionality while preserving crucial information. The processed dataset is then subjected to several clustering methods, including K-Means, Spectral Clustering, DBSCAN, and Hierarchical Clustering, to categorize users into relevant groups based on their behavior. To improve interpretability, the outcomes are examined using explainable AI techniques such as SHAP and LIME, offering insights into the clustering process and the attributes of each group. The final result is a visual representation of customer segments, which supports better decision-making and a deeper understanding of customers.

Figure 1. Architecture of the proposed system

3.1 Data preprocessing

The dataset undergoes comprehensive data pre-processing to handle missing values, categorical values, outliers, scaling, and feature selection using PCA. This step ensures that the data quality is optimal for clustering, which is essential for identifying meaningful groupings.

The following steps were done for preprocessing to ensure that the dataset was ready for clustering:

Handling missing values: Columns such as "Minimum Payments" had missing entries; these were imputed with the median to preserve the data distribution without introducing bias.

Outlier detection and removal: Numerical features such as extremely large balances or credit limits were identified using IQR for outlier detection, then treated by capping extreme values at the 95th percentile.

Feature scaling: StandardScaler is then used to normalize numerical attributes such as Balance, Purchases, and Credit Limit, so that every feature has a mean of 0 and a standard deviation of 1, which is necessary for distance-based algorithms such as K-Means.

Dimensionality reduction: PCA reduces the dimensionality such that over 95% of the variance is retained with the intent to cluster data points without strong noise and at a higher speed.
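The four preprocessing steps above can be sketched as follows (a minimal scikit-learn illustration; the exact column names and the combination of IQR detection with 95th-percentile capping are assumptions based on the description, not the authors' released code):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

def preprocess(df: pd.DataFrame) -> np.ndarray:
    df = df.copy()
    # 1. Median imputation for missing values (e.g., "Minimum Payments").
    df = df.fillna(df.median(numeric_only=True))
    # 2. Detect outliers with the IQR rule, then cap them at the 95th percentile.
    for col in df.select_dtypes(include=np.number).columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        upper_cap = df[col].quantile(0.95)
        df.loc[df[col] > q3 + 1.5 * (q3 - q1), col] = upper_cap
    # 3. Standardize every feature to mean 0, standard deviation 1.
    X = StandardScaler().fit_transform(df.select_dtypes(include=np.number))
    # 4. PCA keeping at least 95% of the variance.
    return PCA(n_components=0.95).fit_transform(X)
```

Passing a float to `PCA(n_components=...)` makes scikit-learn keep just enough components to retain that fraction of the variance, matching the step described above.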

3.2 Model selection

The model selection phase involves unsupervised clustering techniques, which are useful in scenarios with large data. A few clustering algorithms, including K-Means, DBSCAN, Agglomerative Clustering, and Spectral Clustering, are applied to the dataset.

K-Means Clustering: The simplest and most widely used clustering algorithm for segmenting large data distributions, it partitions the data into a predefined number of clusters by minimizing the variance within each cluster. The algorithm iteratively assigns each point to the cluster whose center is nearest and then replaces each center with the mean of the points in its cluster.

Algorithm 1: K-means clustering

Input: Dataset X = {x1, x2, …, xN} of N data points in a D-dimensional feature space; predefined number of clusters K; similarity measure: Euclidean distance
Output: Partition of X into K distinct clusters C1, C2, …, CK

1: Select the optimal value of K using the elbow and Calinski-Harabasz methods
2: Partition the objects into K subsets
3: Identify the centroids of the current partition
4: repeat
5:   for each data point xi in X do
6:     Measure the distance between xi and all current centroids
7:     Allocate xi to the cluster associated with the nearest centroid
8:   end for
9:   for each cluster Cj, j = 1 to K do
10:    Update the cluster's centroid as the average of its current members
11:  end for
12: until cluster assignments remain unaltered (convergence)
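Algorithm 1 can be realized in a few lines of NumPy (a minimal sketch for illustration only; the random initialization and centroid-based convergence test are simplifying assumptions, and a production system would use scikit-learn's KMeans):

```python
import numpy as np

def kmeans(X: np.ndarray, k: int, max_iter: int = 100, seed: int = 0):
    rng = np.random.default_rng(seed)
    # Steps 1-3: initialize centroids from k randomly chosen data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        # Steps 5-8: assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Steps 9-11: move each centroid to the mean of its current members.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 12: stop once the centroids (and hence assignments) no longer change.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```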

DBSCAN: A popular algorithm that partitions data into clusters based on the density of points relative to one another. It is effective at finding clusters of varying shapes but struggled with the high-dimensional data in this case.

Agglomerative clustering: A bottom-up hierarchical clustering approach in which each point initially forms its own cluster and, at each step, the two nearest clusters merge into a new one. Small datasets are handled easily, but large datasets are highly problematic due to the computational cost involved.

Spectral clustering: Spectral clustering embeds complex datasets in a lower-dimensional space and then groups similar data points there. It is valued for its performance on non-linear relationships but is less interpretable for business purposes.
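The four methods above can be applied with scikit-learn along these lines (a sketch on synthetic data standing in for the PCA-reduced features; the `eps`, `min_samples`, and `affinity` settings are illustrative assumptions, not the paper's tuned values):

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering, SpectralClustering

# Synthetic stand-in for the PCA-reduced credit card features.
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

models = {
    "K-Means": KMeans(n_clusters=4, n_init=10, random_state=42),
    "DBSCAN": DBSCAN(eps=1.0, min_samples=5),  # density-based; eps is illustrative
    "Agglomerative": AgglomerativeClustering(n_clusters=4),
    "Spectral": SpectralClustering(n_clusters=4, affinity="nearest_neighbors",
                                   random_state=42),
}
labels = {name: m.fit_predict(X) for name, m in models.items()}
for name, y in labels.items():
    # DBSCAN marks noise points with the label -1, so exclude it from the count.
    print(name, "clusters found:", len(set(y) - {-1}))
```

Note that DBSCAN and spectral clustering do not use `k` in the same way: DBSCAN infers the number of clusters from density, which is why only the other three take `n_clusters=4` directly.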

3.3 Model training and evaluation

The Model Training phase includes training of selected models on preprocessed data to create meaningful clusters.

Three measures are used to assess clustering effectiveness in this study: the Davies-Bouldin index, silhouette score, and Calinski-Harabasz index.

Davies-Bouldin Index: A widely used measure of clustering quality, defined as the average similarity between each cluster and its most similar cluster, where similarity relates within-cluster scatter to between-cluster separation. Lower values indicate better-separated, more compact clusters.

Silhouette score: This measures how similar an object is to its assigned cluster (cohesion) compared with other clusters (separation). The range is from -1 to +1; the score is higher when the object fits its cluster well and is far from neighboring clusters.

Calinski-Harabasz Index: Also called the Variance Ratio Criterion, it is the ratio of the variance between clusters to the variance within each cluster. Higher values of the Calinski-Harabasz index reflect better clustering performance.
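All three metrics are available in scikit-learn, so the evaluation step can be sketched as follows (synthetic data stands in for the real features):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (davies_bouldin_score, silhouette_score,
                             calinski_harabasz_score)

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

db = davies_bouldin_score(X, labels)     # lower is better
sil = silhouette_score(X, labels)        # in [-1, 1], higher is better
ch = calinski_harabasz_score(X, labels)  # higher is better
print(f"DB={db:.3f}  Silhouette={sil:.3f}  CH={ch:.1f}")
```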

3.4 Customer segmentation

The trained model identified four distinct customer segments:

Cluster 1 (full payers): Customers with high credit limits and balances, regular and full repayments, and high engagement and loyalty. These are premium customers and major revenue contributors.

Cluster 2 (starter/student users): Customers characterized by low balances, low transaction frequencies, and minimal utilization of available credit limits. They are likely newer or less active customers who need encouragement toward increased usage.

Cluster 3 (installment users): Customers showing high installment-based purchases, moderate balances with consistent payments, and a preference for structured repayment plans.

Cluster 4 (cash advance/withdrawal users): Customers with high dependency on cash advances and high balances but irregular payment behavior. They are likely to experience financial stress, presenting both risks and opportunities.

3.5 Explainability

Explainability techniques, such as Shapley values and feature importance analysis, were used to validate cluster assignments and ensure the segmentation logic aligned with business goals.
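Because SHAP and LIME explain supervised models, a common pattern (assumed here as one way to realize the step above) is to fit a surrogate classifier on the cluster labels and explain that classifier. The sketch below uses only scikit-learn; the actual SHAP call it would feed into is indicated in a comment:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier

# Cluster first, then train a surrogate classifier to predict the cluster labels.
X, _ = make_blobs(n_samples=400, centers=4, n_features=5, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
surrogate = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)

# Global importances hint at which features drive the segmentation.
# Per-cluster SHAP attributions, as in Table 2, would come from, e.g.:
#   explainer = shap.TreeExplainer(surrogate)
#   shap_values = explainer.shap_values(X)
print(dict(enumerate(surrogate.feature_importances_.round(3))))
```

The surrogate must reproduce the cluster assignments faithfully (near-perfect training accuracy) for its explanations to say anything about the clustering itself.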

4. Results

Figure 2 shows a full comparison of the clustering performance of the different algorithms on the same 2D PCA projection, providing a consistent view. Figure 2(a) shows the distortion score (elbow method); the "elbow" at k = 4 suggests that four clusters give good compactness without over-partitioning the data. Figure 2(b) shows the Calinski-Harabasz score, which also peaks at k = 4, reinforcing the choice indicated by the distortion score. Based on this, the following commonly used clustering algorithms were applied to the dataset with k = 4 wherever applicable.
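The k-selection sweep behind Figures 2(a) and 2(b) can be outlined as follows (synthetic data again; the range of candidate k values is an assumption):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)
scores = {}
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # (inertia = distortion for the elbow plot, CH score for the peak plot)
    scores[k] = (km.inertia_, calinski_harabasz_score(X, km.labels_))

best_k = max(scores, key=lambda k: scores[k][1])  # k at the CH-score peak
print("best k by Calinski-Harabasz:", best_k)
```

Inertia always decreases as k grows, which is why the elbow (the point of diminishing returns) rather than the minimum is read off Figure 2(a), while the Calinski-Harabasz score in Figure 2(b) has a genuine peak.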

Figure 2(c) illustrates the K-Means clustering outcomes where the clusters are notably separate in PCA space. It displays a successful clustering of four clear-cut clusters.

Figure 2(d) shows the outputs of the DBSCAN clustering algorithm, which are similar to those in Figure 2(c). Including DBSCAN was valuable because it recovered multiple dense clusters and flagged several points as noise (shown as outliers), particularly in sparse regions of the PCA space.

Figure 2(e) reflects the outputs from Hierarchical clustering with Ward linkage. It yielded similar clustering distributions to the K-Means clustering outputs, but the boundary definitions of the clusters as well as the internal compactness of the clusters differed slightly.

Figure 2(f) displays the outputs from Spectral Clustering, which captured non-linear relationships in the data better than K-Means. The clusters are largely distinct, though slight overlaps suggest that Spectral Clustering may be more parameter-sensitive.

Figure 2. Comparison of clustering algorithms and evaluation metrics, visualized in 2D using Principal Component Analysis (PCA): (a) distortion score (Elbow method) for determining optimal K, (b) Calinski-Harabasz Score (Elbow method) for assessing cluster separation, (c) K-Means clustering results, (d) DBSCAN clustering results, (e) Hierarchical clustering results, (f) Spectral clustering results
Note: DBSCAN = Density-Based Spatial Clustering of Applications with Noise.

Table 1. Comparison of algorithms based on evaluation metrics

Algorithm      Davies-Bouldin Index    Silhouette Score    Calinski-Harabasz Index
K-Means        0.800                   0.408               5823.343
DBSCAN         1.287                   0.803               685.302
Hierarchical   0.855                   0.388               4936.960
Spectral       11.334                  0.004               190.856

Note: DBSCAN = Density-Based Spatial Clustering of Applications with Noise.

Table 1 shows that K-Means records the lowest Davies-Bouldin index of the four algorithms, indicating the best-separated, most compact clusters. In terms of silhouette score, however, K-Means ranks second, suggesting some overlap among the clusters it generates. Hierarchical clustering performs comparably to K-Means, with a slightly higher Davies-Bouldin index and a marginally lower silhouette score. DBSCAN has a higher Davies-Bouldin index than both, yet achieves the most favorable silhouette score of the algorithms assessed, while spectral clustering performs worst on all three metrics. The Calinski-Harabasz index further shows that K-Means attains the highest value, signifying superior between-cluster separation relative to within-cluster dispersion. In summary, K-Means exhibits the highest overall clustering quality, as evidenced by its lowest Davies-Bouldin index and highest Calinski-Harabasz index, with slightly less cluster overlap than hierarchical clustering.

Figure 3. Clustering results for K-means based on mean matrix

Figure 4. SHAP summary plot-feature impact
Note: SHAP = SHapley Additive exPlanations.

Clustering results for K-Means based on the mean matrix are shown in Figure 3, and the SHAP summary plot in Figure 4 indicates which features are most impactful in the model's predictions, specifically "BALANCE_FREQUENCY" and "BALANCE." For "BALANCE_FREQUENCY," higher values (colored red) mostly increase the model's prediction, yielding positive SHAP values, while lower values (colored blue) have a smaller impact. For "BALANCE," the interaction is more complex: high and low values may contribute either positively or negatively depending on the context of the prediction. The color bar maps each feature's value to its color, from red for higher values through gray to blue for lower values.

Table 2. SHAP-based top feature contributions

Cluster     Top Features                     SHAP Values
Cluster 0   [BALANCE, BALANCE_FREQUENCY]     [0.779, 0.182]
Cluster 1   [BALANCE_FREQUENCY, BALANCE]     [0.395, 0.387]
Cluster 2   [BALANCE, BALANCE_FREQUENCY]     [0.825, 0.228]
Cluster 3   [BALANCE, BALANCE_FREQUENCY]     [1.967, 0.073]

Note: SHAP = SHapley Additive exPlanations.

Table 2 shows the top two features that influence each customer segment based on SHAP values. SHAP helps explain which features most impact a model's prediction. The customer clusters are primarily influenced by BALANCE and BALANCE_FREQUENCY, with BALANCE emerging as the dominant feature across all groups. Figure 5 shows the predicted probabilities and feature contributions for an instance explained with LIME. Cluster 2 is the most likely assignment, with a probability of 0.59, followed by Cluster 1 at 0.22. Feature thresholds drive the assignment to Cluster 2: "BALANCE ≤ -1.49" contributes most to the prediction, with a weight of 0.21, whereas "-1.26 < BALANCE_FREQUENCY" adds smaller contributions of 0.07 to 0.08. The underlying feature values, "BALANCE = -1.68" and "BALANCE_FREQUENCY = -1.08," are constant across the clusters. Both features are therefore relevant to cluster assignment, but "BALANCE" drives the model more.

Figure 5. Cluster prediction explanation using LIME
Note: LIME = Local Interpretable Model-Agnostic Explanations.

5. Conclusion

SmartSeg demonstrates the strengths of unsupervised machine learning and explainable AI in achieving effective customer segmentation of credit card users. Using K-Means, hierarchical clustering, and DBSCAN, the system efficiently segments customers by their behavioral patterns to support well-articulated marketing and business strategies. Integrating explainable AI improves the interpretability of the results, ensuring transparency and actionable insights for stakeholders. The system is not without limitations, however, including sensitivity to data quality, potential demographic bias in the Kaggle dataset, and scalability challenges for DBSCAN and hierarchical clustering in real-time contexts. Despite these constraints, SmartSeg offers a robust, adaptable foundation for advanced customer analytics. Future enhancements, including SHAP-driven dashboards for analysts, federated learning for secure cross-institutional collaboration, and scalable clustering for real-time updates, promise to extend its utility. The framework thus not only addresses the complex problem of customer behavior analysis but also provides a path for continual improvement and real-time application across industries.

References

[1] Qiu, Y., Wang, J. (2023). A machine learning approach to credit card customer segmentation for economic stability. In Proceedings of the 4th International Conference on Economic Management and Big Data Applications, Tianjin, China, pp. 27-29. https://doi.org/10.4108/eai.27-10-2023.2342007

[2] Talaat, F.M., Aljadani, A., Alharthi, B., Farsi, M.A., Badawy, M., Elhosseini, M. (2023). A mathematical model for customer segmentation leveraging deep learning, explainable AI, and RFM analysis in targeted marketing. Mathematics, 11(18): 3930. https://doi.org/10.3390/math11183930

[3] Gustriansyah, R., Suhandi, N., Antony, F. (2020). Clustering optimization in RFM analysis based on k-means. Indonesian Journal of Electrical Engineering and Computer Science, 18(1): 470-477. https://doi.org/10.11591/ijeecs.v18.i1.pp470-477

[4] Akshaya, N., Santhoshkumar, S., Ramaraj, E. (2019). Credit card user frequent buying prediction analysis using cluster methods. International Journal of Innovative Technology and Exploring Engineering, 8(9): 2223-2225. https://doi.org/10.35940/ijitee.I8145.078919 

[5] Karo, I.M.K., Yusmanto, A., Setiawan, R. (2021). Segmentation of credit card customers based on their credit card usage behavior using the K-means algorithm. Journal of Software Engineering, Information and Communication Technology, 2(2): 55-64. https://doi.org/10.17509/seict.v2i2.40220

[6] Dawood, E.A.E., Elfakhrany, E., Maghraby, F.A. (2019). Improve profiling bank customer’s behavior using machine learning. IEEE Access, 7: 109320-109327. https://doi.org/10.1109/ACCESS.2019.2934644

[7] Umuhoza, E., Ntirushwamaboko, D., Awuah, J., Birir, B. (2020). Using unsupervised machine learning techniques for behavioral-based credit card users segmentation in Africa. SAIEE Africa Research Journal, 111(3): 95-101. https://doi.org/10.23919/SAIEE.2020.9142602

[8] Jha, R. (2024). Analyzing credit card consumer behavior using unsupervised machine learning techniques. International Journal of Science and Research, 13(1): 460-463. https://doi.org/10.21275/SR24106025150

[9] Zhou, Y., Jílková, P., Chen, G., Weisl, D. (2020). New methods of customer segmentation and individual credit evaluation based on machine learning. In “New Silk Road: Business Cooperation and Prospective of Economic Development” (NSRBCPED 2019), pp. 925-931. https://doi.org/10.2991/aebmr.k.200324.170

[10] Rachman, F.P., Santoso, H., Djajadi, A. (2021). Machine learning mini batch K-means and business intelligence utilization for credit card customer segmentation. International Journal of Advanced Computer Science and Applications, 12(10): 218-227. https://doi.org/10.14569/IJACSA.2021.0121024

[11] Abdulhafedh, A. (2021). Incorporating k-means, hierarchical clustering and PCA in customer segmentation. Journal of City and Development, 3(1): 12-30. https://doi.org/10.12691/jcd-3-1-3

[12] Hung, P.D., Lien, N.T.T., Ngoc, N.D. (2019). Customer segmentation using hierarchical agglomerative clustering. In Proceedings of the 2nd International Conference on Information Science and Systems, Tokyo, Japan, pp. 33-37. https://doi.org/10.1145/3322645.3322677

[13] Yanık, S., Elmorsy, A. (2019). SOM approach for clustering customers using credit card transactions. International Journal of Intelligent Computing and Cybernetics, 12(3): 372-388. https://doi.org/10.1108/IJICC-11-2018-0157

[14] Oliveira, G.P., Gertrudes, J.C., Oliveira, R.B. (2023). Spending pattern visualization using unsupervised machine learning. In Proceedings of the 38th Brazilian Symposium on Databases, pp. 167-178. https://doi.org/10.5753/sbbd.2023.231577

[15] Yum, K., Yoo, B., Lee, J. (2022). Application of AI-based customer segmentation in the insurance industry. Asia Pacific Journal of Information Systems, 32(3): 496-513. https://doi.org/10.14329/apjis.2022.32.3.496

[16] Afzal, A., Khan, L., Hussain, M.Z., Hasan, M.Z., et al. (2024). Customer segmentation using hierarchical clustering. In 2024 IEEE 9th International Conference for Convergence in Technology (I2CT), Pune, India, pp. 1-6. https://doi.org/10.1109/I2CT61223.2024.10543349

[17] Doshi-Velez, F., Kim, B. (2017). Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608. https://doi.org/10.48550/arXiv.1702.08608

[18] Khan, R.H., Dofadar, D.F., Alam, M.G.R. (2021). Explainable customer segmentation using K-means clustering. In 2021 IEEE 12th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York, NY, USA, pp. 639-643. https://doi.org/10.1109/UEMCON53757.2021.9666609

[19] Wandre, S., Desai, S., Patel, A., Lopes, H. (2022). Credit card fraud detection using KNN and naive Bayes algorithm. Journal of Emerging Technologies and Innovative Research, 9(4): 327-332.

[20] Potluri, C.S., Rao, G.S., Kumar, L.M., Allo, K.G., Awoke, Y., Seman, A.A. (2024). Machine learning-based customer segmentation and personalised marketing in financial services. In 2024 International Conference on Communication, Computer Sciences and Engineering (IC3SE), Gautam Buddha Nagar, India, pp. 1570-1574. https://doi.org/10.1109/IC3SE62002.2024.10593143

[21] Kansal, T., Bahuguna, S., Singh, V., Choudhury, T. (2018). Customer segmentation using K-means clustering. In 2018 International Conference on Computational Techniques, Electronics and Mechanical Systems (CTEMS), Belgaum, India, pp. 135-139. https://doi.org/10.1109/CTEMS.2018.8769171

[22] John, J.M., Shobayo, O., Ogunleye, B. (2023). An exploration of clustering algorithms for customer segmentation in the UK retail market. Analytics, 2(4): 809-823. https://doi.org/10.3390/analytics2040042

[23] Bhasin, A. (2018). Credit card dataset for clustering. https://www.kaggle.com/datasets/arjunbhasin2013/ccdata.