Please use this identifier to cite or link to this item:
https://hdl.handle.net/20.500.11851/9103
Title: | Distributed Sentiment Analysis for Geo-Tagged Twitter Data |
Other Titles: | Coğrafi Etiketli Twitter Verileri için Dağıtık Duygu Analizi |
Authors: | Zengin M.S.; Arslan R.; Akgun M.B. |
Keywords: | BERT; Big data; Distributed data processing; Sentiment analysis; Data Analytics; Data handling; Forecasting; Social networking (online); Analysis models; Computational social science; Data set; Prediction time; Primary sources; Social media; Social media datum |
Publisher: | Institute of Electrical and Electronics Engineers Inc. |
Abstract: | The ever-increasing frequency of sharing on social media makes these platforms one of the primary sources of data for computational social science studies. Similarly, examining and analyzing large-scale social media data-sets is crucial for governments as well as companies. However, as the amount of data increases, deriving insights from the data with artificial-intelligence-based models becomes more and more demanding in terms of processing power. In fact, hardware requirements might increase dramatically if the insights are needed under real-time or near-real-time constraints. In this study, we developed a distributed sentiment analysis model that utilizes a large social media data-set. 16 million tweets were collected and grouped by originating city. The sentiment analysis model was produced by fine-tuning the pre-trained BERT model. Apache Spark, a distributed big data analytics engine, is used to execute the trained model in a distributed fashion. For evaluation purposes, the prediction time on a single compute unit is compared with the distributed prediction time. The sentiment analysis model was executed separately for each of the data-groups corresponding to the 81 provinces. The data-set containing 16 million tweets used in this study, the Turkish sentiment analysis model produced, the distributed prediction code developed for Apache Spark, and all the results of the study can be accessed at https://distributed-sentiment-analysis.github.io/. © 2022 IEEE. |
Description: | 30th Signal Processing and Communications Applications Conference, SIU 2022 -- 15 May 2022 through 18 May 2022 -- 182415 |
URI: | https://doi.org/10.1109/SIU55565.2022.9864702 ; https://hdl.handle.net/20.500.11851/9103 |
ISBN: | 9.78167E+12 |
Appears in Collections: | Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection; WoS İndeksli Yayınlar Koleksiyonu / WoS Indexed Publications Collection; Öğrenci Yayınları / Students' Publications |
Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.