Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.11851/12585
Full metadata record
DC Field | Value | Language
dc.contributor.author | Yazar, Togay | -
dc.contributor.author | Kutlu, Mucahid | -
dc.contributor.author | Bayirli, Isa Kerem | -
dc.date.accessioned | 2025-08-10T17:34:59Z | -
dc.date.available | 2025-08-10T17:34:59Z | -
dc.date.issued | 2025 | -
dc.identifier.issn | 1574-020X | -
dc.identifier.issn | 1574-0218 | -
dc.identifier.uri | https://doi.org/10.1007/s10579-025-09857-w | -
dc.identifier.uri | https://hdl.handle.net/20.500.11851/12585 | -
dc.description.abstract | Over the past century, the Turkish language has undergone substantial changes, driven mainly by governmental interventions. This relatively rapid linguistic evolution complicates the processing of historical Turkish documents. In this work, we introduce Turkronicles, a diachronic corpus for Turkish derived from the Official Gazette of Türkiye and the records of the Grand National Assembly of Türkiye, spanning the period from 1920 to 2024. Turkronicles contains 46,328 documents and 1.1B tokens, making it an important resource for analyzing the linguistic evolution of Turkish and for developing models that process historical Turkish documents. We also develop a library for easily conducting linguistic analysis on diachronic corpora, and we train a model to fix OCR errors within the documents. Using our corpus, we then explore how Turkish vocabulary and writing conventions have changed since 1920. Our analysis reveals that the vocabulary has changed significantly and that multiple spellings exist for several words. Specifically, we show that, as expected, vocabulary divergence increases over time: the vocabulary similarity between the periods 1920-1929 and 2010-2019 is 57%. Despite these substantial vocabulary changes, we demonstrate that word embeddings can identify old Turkish words that have the same meanings as newly coined ones. Regarding writing conventions, we find a noticeable decrease in the use of the circumflex. In addition, words ending with the letters '-b' and '-d' have largely been replaced by their counterparts ending with '-p' and '-t', respectively, although the former are still in use. Lastly, we observe an increase in the usage of words that comply with vowel harmony rules, as a result of the "purification" process of the Turkish Language Reform. Overall, our study quantitatively highlights the dramatic changes in Turkish across various linguistic aspects. | en_US
dc.description.sponsorship | Qatar National Library | en_US
dc.description.sponsorship | Open Access funding provided by the Qatar National Library | en_US
dc.language.iso | en | en_US
dc.publisher | Springer | en_US
dc.rights | info:eu-repo/semantics/closedAccess | en_US
dc.subject | Diachronic Corpora | en_US
dc.subject | Diachronic Analysis | en_US
dc.subject | Turkish Corpus | en_US
dc.subject | Frequency Analysis | en_US
dc.title | Turkronicles: Diachronic Resources for the Fast Evolving Turkish Language | en_US
dc.type | Article | en_US
dc.department | TOBB University of Economics and Technology | en_US
dc.identifier.wos | WOS:001538468500001 | -
dc.identifier.doi | 10.1007/s10579-025-09857-w | -
dc.relation.publicationcategory | Article - International Peer-Reviewed Journal - Institutional Faculty Member (Makale - Uluslararası Hakemli Dergi - Kurum Öğretim Elemanı) | en_US
dc.identifier.scopusquality | Q1 | -
dc.identifier.wosquality | Q3 | -
dc.description.woscitationindex | Science Citation Index Expanded | -
item.fulltext | No Fulltext | -
item.languageiso639-1 | en | -
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | -
item.cerifentitytype | Publications | -
item.grantfulltext | none | -
item.openairetype | Article | -
crisitem.author.dept | 02.3. Department of Computer Engineering | -
Appears in Collections:WoS İndeksli Yayınlar Koleksiyonu / WoS Indexed Publications Collection

Items in GCRIS Repository are protected by copyright, with all rights reserved, unless otherwise indicated.