Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.11851/2669
Title: Classification of Turkish Documents Using Paragraph Vector
Authors: Sarı, Mustafa
Özbayoğlu, Ahmet Murat
Keywords: PV-DBOW
PV-DM
DL4J
paragraph vectors
Word2Vec
Doc2Vec
text mining
author profile identification
Publisher: Institute of Electrical and Electronics Engineers Inc.
Source: Sarı, M., and Özbayoğlu, A. M. (2018, September). Classification of Turkish Documents Using Paragraph Vector. In 2018 International Conference on Artificial Intelligence and Data Processing (IDAP) (pp. 1-5). IEEE.
Abstract: Text processing and mining have gained considerable traction recently, driven by rising interest in integrating Natural Language Processing with data analytics algorithms, in particular deep learning models. In this study, newspaper columnists are classified using vector models built from their columns. This makes it possible not only to determine the author of an unclassified column, but also to form author profiles by grouping similar styles together. The DeepLearning4J Java library and the Doc2Vec class were the main deep learning tools chosen for text mining. Vector models for 5, 10, 15, and 20 authors were created from 20k newspaper columns. Two implementations, the Distributed Memory (PV-DM) and Distributed Bag of Words (PV-DBOW) models, were adapted and their performance was compared. The results show that some authors are clearly distinguishable from the others. Such a model can be used for author profile extraction, plagiarism detection, and identifying which author styles are similar. © 2018 IEEE.
Description: 2018 International Conference on Artificial Intelligence and Data Processing (2018: Malatya, Turkey)
URI: https://ieeexplore.ieee.org/document/8620813
https://hdl.handle.net/20.500.11851/2669
ISBN: 9.78154E+12
Appears in Collections: Bilgisayar Mühendisliği Bölümü / Department of Computer Engineering
Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection
WoS İndeksli Yayınlar Koleksiyonu / WoS Indexed Publications Collection

Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.