Journal: AMA_B

A Novel Algorithm for Clustering and Feature Selection of High Dimensional Datasets

Thulasi Bikku | Alapati Padma Priya

Member, IAENG, Department of CSE

Vignan's Nirula Institute of Technology and Science for Women, Palakaluru Road, Guntur, Andhra Pradesh 522005, India

Corresponding Author Email:

thulasi.jntua@gmail.com, padmapriyaalapati01@gmail.com

Received:

26 December 2017

Accepted:

15 January 2018

OPEN ACCESS

Abstract:

The exponential growth of complex, heterogeneous, dynamic, and unbounded data, produced by a range of fields including health, genomics, material science, climatology, and social networks, poses significant challenges for data processing and for achieving the desired speed and performance. Clustering is the task of grouping objects so that items in the same group (cluster) are more similar to each other than to those in other groups. It is an exploratory data analysis technique that partitions a dataset into several groups. Many clustering methods are available, and different algorithms suit different types of data; K-means is the most widely used algorithm for cluster analysis. Big data analytics involves many important data mining tasks, including clustering, which arranges data into meaningful clusters based on the similarity or dissimilarity among objects. Experiments are performed on a benchmark dataset to assess the feasibility and effectiveness of our algorithm. Large volumes of data (gigabytes, terabytes) are processed and analysed in a big data environment. For cluster analysis, the K-means algorithm is typically executed with Hadoop and MapReduce to analyse high dimensional datasets. In big data analytics, clustering is applied to unlabelled data in order to group it into clusters. However, the conventional k-means algorithm does not work well with the Hadoop framework and MapReduce programming, so the algorithm must be modified to improve the performance of data analysis. A new clustering algorithm that improves on the conventional k-means algorithm is therefore proposed and implemented.
This approach first improves the quality of the data by removing the outlier points from the dataset, and then uses the bi-part technique to perform the clustering. The proposed clustering algorithm is implemented with the Hadoop framework and MapReduce programming, and its performance is evaluated and compared with that of the conventional k-means clustering technique. The results obtained show improved accuracy of cluster construction once the inefficiency is removed. The proposed work can therefore be applied in a big data environment to improve clustering performance.
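The two stages described in the abstract, cleaning out outliers and then clustering in a MapReduce style, can be illustrated with a minimal single-machine sketch. It is not the authors' KHDD algorithm: the abstract does not specify the outlier rule or the bi-part technique, so the distance-based filter `remove_outliers` and its `z_max` threshold are assumptions made for illustration, and the clustering stage is plain k-means written as a map step (assign each point to its nearest centroid) and a reduce step (average the points per centroid). No Hadoop API is used; all function names are hypothetical.

```python
import math
from collections import defaultdict


def distance(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def remove_outliers(points, z_max=2.0):
    """Assumed outlier rule (not from the paper): drop points whose
    distance to the global mean exceeds mean + z_max * std of all
    such distances."""
    center = mean_point(points)
    dists = [distance(p, center) for p in points]
    mu = sum(dists) / len(dists)
    sigma = math.sqrt(sum((d - mu) ** 2 for d in dists) / len(dists))
    return [p for p, d in zip(points, dists) if d <= mu + z_max * sigma]


def map_assign(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(range(len(centroids)),
                  key=lambda i: distance(point, centroids[i]))
    return nearest, point


def reduce_update(assigned):
    """Reduce step: average all points that share a centroid index."""
    groups = defaultdict(list)
    for key, point in assigned:
        groups[key].append(point)
    return {key: mean_point(pts) for key, pts in groups.items()}


def kmeans(points, centroids, max_iter=20):
    """Repeat map and reduce until the centroids stop changing."""
    centroids = list(centroids)
    for _ in range(max_iter):
        assigned = [map_assign(p, centroids) for p in points]
        updated = reduce_update(assigned)
        # A centroid that received no points keeps its old position.
        new_centroids = [updated.get(i, c) for i, c in enumerate(centroids)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11),
            (100, 100)]  # the last point is a deliberate outlier
    cleaned = remove_outliers(data)
    print(kmeans(cleaned, [(0, 0), (10, 10)]))
```

In a real Hadoop job the map and reduce functions would run on separate partitions of the data, and the driver would rerun the job once per iteration. The sketch only shows how the work divides between the two steps.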

Keywords:

K-means, Classification, Clustering, MapReduce, Big data, Accuracy, Hadoop

1. Introduction

2. Algorithm for Traditional K-Means

3. Related Work

4. Proposed Algorithm (KHDD)

5. Experimental Results

6. Conclusion and Future Work

