Please use this identifier to cite or link to this item:
https://hdl.handle.net/20.500.11851/12585
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Yazar, Togay | - |
dc.contributor.author | Kutlu, Mucahid | - |
dc.contributor.author | Bayirli, Isa Kerem | - |
dc.date.accessioned | 2025-08-10T17:34:59Z | - |
dc.date.available | 2025-08-10T17:34:59Z | - |
dc.date.issued | 2025 | - |
dc.identifier.issn | 1574-020X | - |
dc.identifier.issn | 1574-0218 | - |
dc.identifier.uri | https://doi.org/10.1007/s10579-025-09857-w | - |
dc.identifier.uri | https://hdl.handle.net/20.500.11851/12585 | - |
dc.description.abstract | Over the past century, the Turkish language has undergone substantial changes, mainly driven by governmental interventions. The relatively rapid linguistic evolution of the Turkish language complicates the processing of historical Turkish documents. In this work, we introduce Turkronicles, which is a diachronic corpus for Turkish derived from the Official Gazette of Türkiye and the records of the Grand National Assembly of Türkiye, spanning the period from 1920 to 2024. Turkronicles contains 46,328 documents and 1.1B tokens, making it an important resource for analyzing the linguistic evolution of Turkish and developing models to process historical Turkish documents. In addition, we develop a library to conduct linguistic analysis on diachronic corpora easily. Furthermore, we train a model to fix OCR errors within the documents. Moreover, we explore how the Turkish vocabulary and the writing conventions have changed since 1920 using our corpus. Our analysis reveals that the vocabulary has changed significantly and multiple spellings exist for several words. Specifically, we show that vocabulary divergence increases over time, as expected. Due to such significant vocabulary change in Turkish over time, similarity between the periods 1920-1929 and 2010-2019 is 57%. Despite the substantial vocabulary changes, we demonstrate that it is possible to identify old Turkish words that have the same meanings with newly coined ones using word embeddings. Regarding writing conventions, we found a noticeable decrease in the use of circumflex. In addition, words ending with the letters '-b' and '-d' have been largely replaced by their counterparts ending with '-p' and '-t', respectively, although the former are still in use. Lastly, we observe an increase in the usage of words that comply with vowel harmony rules as a result of the "purification" process of Turkish Language Reform. Overall, our study quantitatively highlights the dramatic changes in Turkish from various linguistic aspects. | en_US |
dc.description.sponsorship | Qatar National Library | en_US |
dc.description.sponsorship | Open Access funding provided by the Qatar National Library | en_US |
dc.language.iso | en | en_US |
dc.publisher | Springer | en_US |
dc.rights | info:eu-repo/semantics/closedAccess | en_US |
dc.subject | Diachronic Corpora | en_US |
dc.subject | Diachronic Analysis | en_US |
dc.subject | Turkish Corpus | en_US |
dc.subject | Frequency Analysis | en_US |
dc.title | Turkronicles: Diachronic Resources for the Fast Evolving Turkish Language | en_US |
dc.type | Article | en_US |
dc.department | TOBB University of Economics and Technology | en_US |
dc.identifier.wos | WOS:001538468500001 | - |
dc.identifier.doi | 10.1007/s10579-025-09857-w | - |
dc.relation.publicationcategory | Article - International Peer-Reviewed Journal - Institutional Faculty Member | en_US |
dc.identifier.scopusquality | Q1 | - |
dc.identifier.wosquality | Q3 | - |
dc.description.woscitationindex | Science Citation Index Expanded | - |
item.fulltext | No Fulltext | - |
item.languageiso639-1 | en | - |
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | - |
item.cerifentitytype | Publications | - |
item.grantfulltext | none | - |
item.openairetype | Article | - |
crisitem.author.dept | 02.3. Department of Computer Engineering | - |
Appears in Collections: | WoS İndeksli Yayınlar Koleksiyonu / WoS Indexed Publications Collection |