Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.11851/3846
Title: Analysis and prediction in sparse and high dimensional text data: The case of Dow Jones stock market
Authors: Sert, Onur Can
Şahin, Salih Doruk
Özyer, Tansel
Alhajj, Reda
Keywords: Named entity recognition
topic modelling
sentiment analysis
social network analysis
stock market movement prediction
msaenet
Publisher: Elsevier B.V.
Source: Sert, O. C., Şahin, S. D., Özyer, T. and Alhajj, R. (2020). Analysis and prediction in sparse and high dimensional text data: The case of Dow Jones stock market. Physica A: Statistical Mechanics and its Applications, 545, 123752.
Abstract: In this research, we propose a text analysis system that predicts stock market movements from news and social media data. It is a scalable prediction system for sparse and high-dimensional feature sets. Using the developed system, we collected 12,560 articles from the New York Times covering a one-year period and 2,854,333 tweets from Twitter covering a four-month period. We analysed the collected data using entity extraction, sentiment analysis and topic modelling techniques, then applied our feature set creation and elastic net regression-based training method. The analysis results were used to train different prediction models. Using these trained models, we predicted stock market movements for the Dow Jones Index and showed that the proposed method can make promising predictions. Across different sets of experiments, the proposed approach reached up to 70.90% accuracy, and the predicted values correlated with real Dow Jones Index values (correlation coefficient up to 0.2315). Further, we report performance comparison results for various prediction models trained with different sets of features to analyse the importance of time interval and feature space size. Our test results show that it is possible to make reasonable stock movement predictions by integrating news and related social media data, analysing them with named entity extraction, sentiment analysis and topic modelling techniques, and training prediction models on features created from these analysis results. (C) 2019 Elsevier B.V. All rights reserved.
URI: https://www.sciencedirect.com/science/article/pii/S0378437119320904?via%3Dihub
https://hdl.handle.net/20.500.11851/3846
ISSN: 0378-4371
Appears in Collections: Bilgisayar Mühendisliği Bölümü / Department of Computer Engineering
Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection
WoS İndeksli Yayınlar Koleksiyonu / WoS Indexed Publications Collection
Yapay Zeka Mühendisliği Bölümü / Department of Artificial Intelligence Engineering

Scopus citations: 4 (checked on Nov 2, 2024)
Web of Science citations: 11 (checked on Nov 2, 2024)
Page views: 250 (checked on Nov 4, 2024)

Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.