Please use this identifier to cite or link to this item:
https://hdl.handle.net/20.500.11851/3846
Title: | Analysis and prediction in sparse and high dimensional text data: The case of Dow Jones stock market |
Authors: | Sert, Onur Can; Şahin, Salih Doruk; Özyer, Tansel; Alhajj, Reda |
Keywords: | Named entity recognition; topic modelling; sentiment analysis; social network analysis; stock market movement prediction; msaenet |
Publisher: | Elsevier B.V. |
Source: | Sert, O. C., Şahin, S. D., Özyer, T. and Alhajj, R. (2020). Analysis and prediction in sparse and high dimensional text data: The case of Dow Jones stock market. Physica A: Statistical Mechanics and its Applications, 545, 123752. |
Abstract: | In this research, we propose a text analysis system that predicts stock market movements from news and social media data. It is a scalable prediction system for sparse and high-dimensional feature sets. Using the developed system, we collected 12,560 articles from The New York Times covering a one-year period and 2,854,333 tweets from Twitter covering a four-month period. We analysed the collected data using entity extraction, sentiment analysis and topic modelling techniques, then applied our feature set creation method and an elastic net regression based training method. The results of these analyses were used to train different prediction models. Using the trained models, we predicted stock market movements for the Dow Jones Index and showed that the proposed method can make promising predictions. Across different sets of experiments, the proposed approach reached up to 70.90% accuracy, and the predicted values correlated with real Dow Jones Index values with a correlation coefficient of up to 0.2315. Further, we report performance comparisons for various prediction models trained with different sets of features in order to analyse the importance of time interval and feature space size. Our test results show that reasonable stock movement predictions are possible by integrating news and related social media data, analysing them with named entity extraction, sentiment analysis and topic modelling techniques, and training prediction models on features created from the results of these analyses. © 2019 Elsevier B.V. All rights reserved. |
URI: | https://www.sciencedirect.com/science/article/pii/S0378437119320904?via%3Dihub https://hdl.handle.net/20.500.11851/3846 |
ISSN: | 0378-4371 |
Appears in Collections: | Bilgisayar Mühendisliği Bölümü / Department of Computer Engineering; Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection; WoS İndeksli Yayınlar Koleksiyonu / WoS Indexed Publications Collection; Yapay Zeka Mühendisliği Bölümü / Department of Artificial Intelligence Engineering |
Scopus citations: | 4 (checked on Nov 2, 2024) |
Web of Science citations: | 11 (checked on Nov 2, 2024) |
Page views: | 250 (checked on Nov 4, 2024) |
Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.