Social media systems have proven to be valuable platforms for information and communication, particularly during major events such as natural disasters, for example the earthquake, tsunami and nuclear emergency in Japan in 2011. This usage produces an enormous amount of information. However, finding relevant posts can be challenging, since the relevance of a post depends on its content, its author and the characteristics of the tweet itself. Identifying tweets that describe a specific type of event is also challenging because event descriptions are highly complex and varied. These challenges present a significant opportunity for Natural Language Processing (NLP) and Information Extraction (IE) technology to enable new large-scale data-analysis applications. Taking these difficulties into account, this paper proposes a new metric to improve search results in microblogs. The metric combines content relevance, tweet relevance and author relevance. We also develop a Natural Language Processing method for extracting temporal information about events from posts, more specifically tweets. Our approach is based on classes of temporal markers and on a contextual exploration method. To evaluate our model, we built a knowledge management system and used a collection of 10,000 tweets about current events in 2014 and 2015.
microblogs, relevant information, NLP, tweets search, information retrieval
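The abstract states that the proposed metric combines content relevance, tweet relevance and author relevance, but does not give the combination rule. The following is only a minimal sketch, assuming a weighted linear combination of three scores already normalised to [0, 1]; the names TweetScores, combined_relevance and rank, and the weights, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class TweetScores:
    """Per-tweet component scores, each assumed to be normalised to [0, 1]."""
    content: float  # how well the tweet text matches the query (assumed)
    tweet: float    # tweet-level characteristics (assumed)
    author: float   # author-level characteristics (assumed)


def combined_relevance(s, w_content=0.5, w_tweet=0.25, w_author=0.25):
    """Weighted linear combination of the three scores.

    The linear form and the default weights are illustrative assumptions,
    not the metric defined in the paper.
    """
    total = w_content + w_tweet + w_author
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = w_content * s.content + w_tweet * s.tweet + w_author * s.author
    return weighted / total


def rank(tweets):
    """Sort (tweet_id, TweetScores) pairs by descending combined relevance."""
    return sorted(tweets, key=lambda item: combined_relevance(item[1]), reverse=True)
```

Under these assumptions, calling rank on a list of (tweet_id, TweetScores) pairs returns the pairs ordered from most to least relevant. Any other monotone combination, or weights learned from labelled data, would fit the same interface.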