A Computational Experience for Automatic Feature Selection on Big Data Frameworks

Y. Orenes | A. Rabasa | A. Pérez-Martín | J.J. Rodríguez-Sala | J. Sánchez-Soriano

Miguel Hernández University of Elche, Spain

OPEN ACCESS

https://www.witpress.com/elibrary/dne-volumes/11/3/1188

Abstract:

Classification rule systems are among the predictive analytics techniques used in Big Data problems, where datasets with millions of rows and dozens of variables (attributes) are common. A classification rule system consists of a set of rules, each with an antecedent (a variable or set of variables, which may be numeric or nominal) and a consequent (the target variable, which must be nominal). When the antecedent variables are numeric, many classification rule generation algorithms rely on traditional methods of automatic feature selection based on well-established techniques such as discriminant analysis or cluster analysis. In this paper, the authors compare their own feature selection and classification method, RBS (originally designed to handle only nominal variables), with classical feature selection methods. After formally defining the method, the paper presents the design of a computational experiment that allows a qualitative and quantitative comparison of the adapted RBS with other feature selection methods. Finally, the optimal conditions for applying each method are discussed and future research areas in automatic feature selection are identified.
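To make the antecedent/consequent structure concrete, the following is a minimal Python sketch of a single classification rule over nominal variables. It is an illustration only and is not the authors' RBS method: the names `Rule` and `coverage_and_confidence`, the two quality measures, and the toy weather records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A record maps attribute names to nominal values (hypothetical toy format).
Record = Dict[str, str]


@dataclass
class Rule:
    antecedent: Dict[str, str]  # attribute -> required nominal value
    consequent: str             # predicted value of the nominal target variable

    def matches(self, record: Record) -> bool:
        """True when the record satisfies every condition of the antecedent."""
        return all(record.get(a) == v for a, v in self.antecedent.items())


def coverage_and_confidence(
    rule: Rule, records: List[Record], target: str
) -> Tuple[float, float]:
    """Coverage: fraction of records matching the antecedent.
    Confidence: fraction of those matching records whose target
    value equals the rule's consequent."""
    matched = [r for r in records if rule.matches(r)]
    correct = [r for r in matched if r[target] == rule.consequent]
    coverage = len(matched) / len(records) if records else 0.0
    confidence = len(correct) / len(matched) if matched else 0.0
    return coverage, confidence


# Toy data, invented for illustration only.
records = [
    {"outlook": "sunny", "windy": "false", "play": "no"},
    {"outlook": "sunny", "windy": "true", "play": "no"},
    {"outlook": "rainy", "windy": "false", "play": "yes"},
    {"outlook": "sunny", "windy": "false", "play": "yes"},
]

# Rule: IF outlook = sunny THEN play = no
rule = Rule(antecedent={"outlook": "sunny"}, consequent="no")
coverage, confidence = coverage_and_confidence(rule, records, target="play")
```

Here the rule matches three of the four records and predicts the target correctly for two of them. A feature selection step would decide which attributes are allowed to appear in antecedents at all.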

Keywords:

big data, classification rule systems, feature selection

