Publication: Effects of Various Preprocessing Techniques to Turkish Text Categorization Using N-Gram Features
cris.virtualsource.department | 22498ec0-7f46-4ad0-a84d-cd8ed95293ae | |
cris.virtualsource.orcid | 22498ec0-7f46-4ad0-a84d-cd8ed95293ae | |
dc.contributor.affiliation | Middle East Technical University; Turkish Aeronautical Association; Turk Hava Kurumu University | |
dc.contributor.author | Deniz, Ayca; Kiziloz, Hakan Ezgi | |
dc.date.accessioned | 2024-06-25T11:44:51Z | |
dc.date.available | 2024-06-25T11:44:51Z | |
dc.date.issued | 2017 | |
dc.description.abstract | Natural Language Processing (NLP) is a prominent field that includes subcategories such as text classification, error correction, and machine translation. Compared with other languages, there is a limited number of Turkish NLP studies in the literature. In this study, we apply text classification to Turkish documents using n-gram features. Our algorithm applies different preprocessing techniques, namely n-gram choice (character level or word level, bigram or trigram models), stemming, and use of punctuation, and then determines the Turkish document's author and genre, and the gender of the author. For this purpose, Naive Bayes, Support Vector Machines, and Random Forest are used as classification techniques. Finally, we discuss the effects of the above-mentioned preprocessing techniques on the performance of Turkish text classification. | |
dc.description.endpage | 660 | |
dc.description.pages | 6 | |
dc.description.researchareas | Computer Science | |
dc.description.startpage | 655 | |
dc.description.woscategory | Computer Science, Software Engineering; Computer Science, Theory & Methods | |
dc.identifier.uri | https://acikarsiv.thk.edu.tr/handle/123456789/1170 | |
dc.language.iso | English | |
dc.publisher | IEEE | |
dc.relation.journal | 2017 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ENGINEERING (UBMK) | |
dc.subject | Turkish text classification; n-gram features; supervised machine learning | |
dc.title | Effects of Various Preprocessing Techniques to Turkish Text Categorization Using N-Gram Features | |
dc.type | Proceedings Paper | |
dspace.entity.type | Publication |
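
The abstract names three preprocessing choices: n-gram level (character or word), n-gram size (bigram or trigram), and whether punctuation is kept. The following is a minimal Python sketch of how such feature extraction might look. It is not the authors' implementation: the function names (`strip_punctuation`, `char_ngrams`, `word_ngrams`, `ngram_counts`) and the whitespace tokenization are assumptions made for illustration, and stemming and the classifiers themselves are left out.

```python
import string
from collections import Counter


def strip_punctuation(text):
    """Remove ASCII punctuation characters (hypothetical helper)."""
    return text.translate(str.maketrans("", "", string.punctuation))


def char_ngrams(text, n):
    """Return all overlapping character n-grams of the text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return all overlapping word n-grams, splitting on whitespace."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n=2, level="char", keep_punctuation=True):
    """Count n-grams under one combination of preprocessing choices.

    level: "char" or "word"; n: 2 for bigrams, 3 for trigrams.
    The text is not lowercased here, because Python's default lowercasing
    does not follow Turkish rules for dotted and dotless i.
    """
    if not keep_punctuation:
        text = strip_punctuation(text)
    if level == "char":
        grams = char_ngrams(text, n)
    elif level == "word":
        grams = word_ngrams(text, n)
    else:
        raise ValueError("level must be 'char' or 'word'")
    return Counter(grams)


if __name__ == "__main__":
    sample = "Bu bir deneme metni, sadece bir deneme."
    print(ngram_counts(sample, n=2, level="word", keep_punctuation=False))
    print(ngram_counts(sample, n=3, level="char").most_common(5))
```

The resulting counts could serve as a feature vector for any of the classifiers named in the abstract. Varying `n`, `level`, and `keep_punctuation` gives the kind of comparison across preprocessing settings that the abstract describes.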