Optimal Clustering Techniques for The Segmentation of Tourist Spending. Analysis of Tourist Surveys in The Valencian Community (Spain): A Case Study

A. Rabasa, A. Pérez-Martín, D. Giner

University Miguel Hernández of Elche, Spain

Instituto Valenciano Tecnologías Turísticas, Agència Valenciana de Turismo, Spain

1 February 2018



The Valencian Community (South-East Spain) is one of the most important tourist destinations in Europe. The Valencian Government has been carrying out surveys that cover the type of travel, the type of transport, the type of accommodation, the duration of the trip and the number of travellers, among other issues. The aim is to discover the different spending typologies of foreign visitors.

In drawing up more attractive tourism strategies, the Valencian Public Services may find the following questions particularly relevant: which type of traveller spends more on transportation in their own country and which pays for it in the Valencian Community; how visitors' nationalities relate to a higher or lower propensity to spend money on leisure; and how many overnight stays are made in low-end destinations.

However, the surveys that gather all this information consist of multiple, nested responses distributed across overlapping thematic blocks, and translating them into flat files (which can be analysed within acceptable computation times) is a complex problem.
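As an illustration of the flattening problem, the following minimal sketch turns nested survey responses into a flat table. It is not the treatment process used in the paper: the record layout, the field names (`trip`, `spending`, `paid_items`) and the helper functions are hypothetical, chosen only to show one common approach, in which nested blocks become dotted column names and multiple-choice answers become 0/1 indicator columns.

```python
from typing import Any


def flatten_response(response: dict, sep: str = "."):
    """Flatten one nested survey response.

    Returns (scalars, indicators): scalars maps dotted column names to
    single-valued answers; indicators is the set of dotted column names
    for options ticked in multiple-choice questions.
    """
    scalars: dict[str, Any] = {}
    indicators: set[str] = set()

    def walk(prefix: str, value: Any) -> None:
        if isinstance(value, dict):
            # A thematic block: recurse, extending the column prefix.
            for key, child in value.items():
                walk(f"{prefix}{sep}{key}" if prefix else str(key), child)
        elif isinstance(value, (list, tuple, set)):
            # A multiple-choice answer: one indicator column per option.
            for option in value:
                indicators.add(f"{prefix}{sep}{option}")
        else:
            scalars[prefix] = value

    walk("", response)
    return scalars, indicators


def to_flat_table(responses: list):
    """Build (columns, rows) so that every response has the same columns."""
    parsed = [flatten_response(r) for r in responses]
    scalar_cols = sorted({c for scalars, _ in parsed for c in scalars})
    indicator_cols = sorted({c for _, inds in parsed for c in inds})
    rows = []
    for scalars, inds in parsed:
        # Missing scalar answers become None; unticked options become 0.
        row = [scalars.get(c) for c in scalar_cols]
        row += [1 if c in inds else 0 for c in indicator_cols]
        rows.append(row)
    return scalar_cols + indicator_cols, rows


# Hypothetical responses, invented for illustration only.
surveys = [
    {"trip": {"nights": 7, "transport": "plane"},
     "spending": {"paid_items": ["accommodation", "leisure"]}},
    {"trip": {"nights": 3, "transport": "car"},
     "spending": {"paid_items": ["leisure"]}},
]
columns, rows = to_flat_table(surveys)
```

Each row then has a fixed set of columns, which is the form that clustering algorithms expect. Overlapping thematic blocks would need additional rules (for example, deciding which block owns a shared question) that this sketch does not attempt.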

This paper presents a process for treating the surveys, aimed at producing a dataset suitable for generating optimal segmentation models of the different types of expenditure. Some results of this segmentation are also shown; they are proving to be of great value to public managers in their challenge of offering suitable tourist alternatives to each type of traveller.

The paper includes an example of how open data sources can be incorporated into the original dataset in order to obtain a better segmentation. A variation on the classical segmentation methods (algorithms of the k-means family) is also provided, which establishes the optimal number of groups for each computational experiment.
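The abstract does not describe the variation itself, so the sketch below should not be read as the authors' method. It shows one generic way to choose the number of groups: run k-means for several candidate values of k and keep the one with the highest mean silhouette score. The farthest-first initialisation, the silhouette criterion, the function names and the toy spending data are all assumptions made for illustration.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-first initialisation
    (an assumption of this sketch, used to keep the example reproducible)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(
                tuple(sum(c) / len(members) for c in zip(*members))
                if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels


def silhouette(points, labels):
    """Mean silhouette score; singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, k_values):
    """Return the k with the highest silhouette score, plus all scores."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Toy data, invented for illustration: (transport spend, leisure spend).
centres = [(10, 10), (50, 50), (90, 10)]
offsets = [(0, 0), (1, 0), (0, 1), (1, 1), (-1, 0)]
spending = [(cx + dx, cy + dy) for cx, cy in centres for dx, dy in offsets]
best_k, scores = choose_k(spending, range(2, 6))
```

On this toy data the three well-separated groups give the highest silhouette score at k = 3. Real survey data would normally need feature scaling and encoding of categorical answers before distances are meaningful.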


Big data, clustering, optimization, surveys analysis, tourism

