RCES

An Improved Parallel Bayesian Text Classification Algorithm

School of Computer Science and Information Technology, Zhejiang Wanli University, Ningbo, China

Corresponding Author Email:

1172340155@qq.com


OPEN ACCESS

Abstract:

Following the cloud computing paradigm, this paper uses the MapReduce model to address the poor scalability of the traditional Bayesian classification algorithm on large-scale data, greatly improving classification speed. The algorithm is further adapted to the characteristics of parallel execution. Synonym merging combined with word-frequency filtering reduces the dimensionality of the feature vectors and lowers the number of false positives. Particular keywords are then weighted to improve classification accuracy. Finally, experiments on the Hadoop cloud computing platform show that the traditional text classification algorithm, once parallelized on Hadoop, achieves good speedup, and that the improved algorithm increases classification accuracy.

Keywords:

Cloud computing, Text classification, Parallel, Hadoop
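The pipeline outlined in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the map and reduce steps run in plain Python rather than on Hadoop, and the synonym table `SYNONYMS`, the threshold `MIN_FREQ`, the optional `weights` argument, and the use of Laplace smoothing are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
import math

# Hypothetical synonym table: each variant maps to one canonical term.
SYNONYMS = {"soccer": "football", "pc": "computer"}
# Hypothetical threshold: terms seen fewer times than this are dropped.
MIN_FREQ = 2


def normalize(text):
    """Tokenize on whitespace and merge synonyms into canonical terms."""
    return [SYNONYMS.get(tok, tok) for tok in text.lower().split()]


def map_phase(doc):
    """Map step: emit ((label, term), 1) for every term of one document."""
    label, text = doc
    return [((label, term), 1) for term in normalize(text)]


def reduce_phase(pairs):
    """Reduce step: sum the emitted counts for each (label, term) key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def train(docs):
    """Build a naive Bayes model from (label, text) pairs."""
    pairs = [pair for doc in docs for pair in map_phase(doc)]
    counts = reduce_phase(pairs)
    # Word-frequency filtering: keep only terms seen at least MIN_FREQ times.
    totals = Counter()
    for (_, term), c in counts.items():
        totals[term] += c
    vocab = {term for term, c in totals.items() if c >= MIN_FREQ}
    term_counts = defaultdict(Counter)
    for (label, term), c in counts.items():
        if term in vocab:
            term_counts[label][term] = c
    doc_counts = Counter(label for label, _ in docs)
    return vocab, term_counts, doc_counts


def classify(text, model, weights=None):
    """Return the label with the highest weighted log-posterior score."""
    vocab, term_counts, doc_counts = model
    weights = weights or {}  # optional per-keyword weights, default 1.0
    n_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        total = sum(term_counts[label].values())
        score = math.log(doc_counts[label] / n_docs)
        for term in normalize(text):
            if term not in vocab:
                continue
            # Laplace-smoothed likelihood of the term given the label.
            p = (term_counts[label][term] + 1) / (total + len(vocab))
            score += weights.get(term, 1.0) * math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

In a real Hadoop job the map and reduce functions would run on separate nodes over partitions of the corpus. Here they are ordinary function calls, so the sketch shows only the data flow.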

1. Introduction

2. Naive Bayes Classification Algorithm and Its Parallelization

3. Classification Algorithms in Cloud Computing Environment

4. Experimental Results and Analysis on Cloud Platform

5. Conclusions

Acknowledgement

