Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.11851/9103
Title: Distributed Sentiment Analysis for Geo-Tagged Twitter Data
Other Titles: Coğrafi Etiketli Twitter Verileri için Dağıtık Duygu Analizi
Authors: Zengin M.S.
Arslan R.
Akgun M.B.
Keywords: BERT
Big data
distributed data processing
sentiment analysis
Data Analytics
Data handling
Forecasting
Social networking (online)
Analysis models
Computational social science
Data set
Distributed data processing
Prediction time
Primary sources
Sentiment analysis
Social media
Social media datum
Issue Date: 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Abstract: The ever-increasing frequency of sharing on social media makes these platforms one of the primary sources of data for computational social science studies. Similarly, examining and analyzing large-scale social media datasets is crucial for governments as well as companies. However, as the amount of data increases, deriving insights from it with artificial intelligence based models becomes more and more demanding in terms of processing power. In fact, hardware requirements might increase dramatically if the insights are needed under real-time or near-real-time constraints. In this study, we developed a distributed sentiment analysis model that utilizes a large social media dataset. 16 million tweets were collected and grouped by originating city. The sentiment analysis model was produced by fine-tuning the pre-trained BERT model. Apache Spark, a distributed big data analytics engine, is used to execute the trained model in a distributed fashion. For evaluation purposes, the prediction time on a single compute unit is compared with the distributed prediction time. The sentiment analysis model was executed separately for each of the data groups corresponding to 81 provinces. The dataset containing 16 million tweets used in this study, the Turkish sentiment analysis model produced, the distributed prediction code developed for Apache Spark, and all the results of the study can be accessed at https://distributed-sentiment-analysis.github.io/. © 2022 IEEE.
Description: 30th Signal Processing and Communications Applications Conference, SIU 2022 -- 15 May 2022 through 18 May 2022 -- -- 182415
URI: https://doi.org/10.1109/SIU55565.2022.9864702
https://hdl.handle.net/20.500.11851/9103
ISBN: 9.78167E+12
Appears in Collections: Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection

Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.