Araştırma Çıktıları / Research Outcomes

Permanent URI for this community: https://acikarsiv.thk.edu.tr/handle/123456789/2548

Search Results

Now showing 1 - 10 of 12
  • Publication
    The Use of Machine Learning Approaches for the Diagnosis of Acute Appendicitis
    (Hindawi Limited, 2020-04-25) Omer F. Akmese; Gul Dogan; Hakan Kor; Hasan Erbay; Emre Demir
    Acute appendicitis is one of the most common emergencies in general surgery clinics. It occurs most often between the ages of 10 and 30 years. Approximately 7% of the population is diagnosed with acute appendicitis at some point in their lives and requires surgery. The study aims to develop an easy, fast, and accurate estimation method for early acute appendicitis diagnosis using machine learning algorithms. Retrospective clinical records were analyzed with predictive data mining models, and the predictive success of the models obtained by various machine learning algorithms was compared. A total of 595 clinical records were used in the study, including 348 males (58.49%) and 247 females (41.51%). The gradient boosted trees algorithm achieved the best result, with a prediction accuracy of 95.31%. In this study, an estimation method based on machine learning was developed to identify individuals with acute appendicitis. This method is expected to benefit patients with signs of appendicitis, especially in hospital emergency departments.
  • Publication
    Sample Reduction Strategies for Protein Secondary Structure Prediction
    (MDPI AG, 2019-10-18) Sema Atasever; Zafer Aydın; Hasan Erbay; Mostafa Sabzekar
    Predicting the secondary structure from protein sequence plays a crucial role in estimating the 3D structure, which has applications in drug design and in understanding the function of proteins. As new genes and proteins are discovered, the protein databases and datasets that can be used for training prediction models grow considerably. A two-stage hybrid classifier, which employs dynamic Bayesian networks and a support vector machine (SVM), has been shown to provide state-of-the-art prediction accuracy for protein secondary structure prediction. However, SVM is not efficient for large datasets due to the quadratic optimization involved in model training. In this paper, two techniques are implemented on the CB513 benchmark for reducing the number of samples in the training set of the SVM. The first method randomly selects a fraction of data samples from the training set using a stratified selection strategy. This approach can remove approximately 50% of the data samples from the training set and reduce the model training time by 73.38% on average without decreasing the prediction accuracy significantly. The second method clusters the data samples with a hierarchical clustering algorithm and replaces the training set samples with the nearest neighbors of the cluster centers in order to improve the training time. The number of clusters and the number of nearest neighbors are optimized as hyper-parameters by computing the prediction accuracy on validation sets. It is found that clustering can reduce the size of the training set by 26% without reducing the prediction accuracy. Among the clustering techniques, Ward’s method provided the best accuracy on test data.
  • Publication
    Context-dependent model for spam detection on social networks
    (Springer Science and Business Media LLC, 2020-08-29) Razan Ghanem; Hasan Erbay
  • Publication
    Design and Analysis of a Novel Authorship Verification Framework for Hijacked Social Media Accounts Compromised by a Human
    (Hindawi Limited, 2021-01-23) Suleyman Alterkavı; Hasan Erbay; Mamoun Alazab
    Compromising the online social network account of a genuine user, and imitating the user’s writing traits for malicious purposes, is a standard method of attack. When it happens, fast and accurate detection of intruders is an essential step to control the damage. In other words, an efficient authorship verification model is a binary classifier that determines whether a text was written by the genuine user or not. Herein, a novel authorship verification framework for hijacked social media accounts, compromised by a human, is proposed. Significant textual features are derived from a Twitter-based dataset composed of 16,124 tweets with 280 characters, crawled and manually annotated with the authorship information. The XGBoost algorithm is then used to highlight the significance of each textual feature in the dataset. Furthermore, the ELECTRE approach is utilized for feature selection, and the rank exponent weight method is applied for feature weighting. The reduced dataset is evaluated with many classifiers, and the achieved F-score is 94.4%.
  • Publication
    Classification of Diabetic Rat Histopathology Images Using Convolutional Neural Networks
    (Springer Science and Business Media LLC, 2020) Ahmet Haşim Yurttakal; Hasan Erbay; Gökalp Çinarer; Hatice Baş
  • Publication
    3-State Protein Secondary Structure Prediction based on SCOPe Classes
    (FapUNIFESP (SciELO), 2021) Sema Atasever; Nuh Azgınoglu; Hasan Erbay; Zafer Aydın
  • Publication
    End-To-End Computerized Diagnosis of Spondylolisthesis Using Only Lumbar X-rays
    (Springer Science and Business Media LLC, 2021-01-11) Fatih Varçın; Hasan Erbay; Eyüp Çetin; İhsan Çetin; Turgut Kültür
  • Publication
    Correction to: Novel authorship verification model for social media accounts compromised by a human
    (Springer Science and Business Media LLC, 2021-02-16) Suleyman Alterkavı; Hasan Erbay
  • Publication
    Solar irradiation forecast by deep learning architectures
    (National Library of Serbia, 2022) Omer Dagistanli; Hasan Erbay; Hasim Yurttakal; Hakan Kor
    Global solar irradiation data is a crucial component to measure solar energy potential when we plan, size, and design solar photovoltaic fields. Often, due to the absence of measuring equipment at meteorological stations, data for the place of interest are not available. However, solar irradiation can be estimated from ordinary meteorological data such as humidity and air temperature. Herein we propose two different deep learning methods, one based on a deep neural network regression and the other based on multivariate long short-term memory unit networks, to estimate solar irradiation at given locations. Validation criteria include mean absolute error, mean squared error, and coefficient of determination (R² value). According to the simulation results, multivariate long short-term memory unit networks perform slightly better than the deep neural network. Even though both have very close R² values, the multivariate long short-term memory’s R² values are more consistent. The same is true for mean squared error and mean absolute error.
  • Publication
    Multi-Label Classification of E-Commerce Customer Reviews via Machine Learning
    (MDPI AG, 2022-08-26) Emre Deniz; Hasan Erbay; Mustafa Coşar
    The multi-label customer reviews classification task aims to identify the different thoughts of customers about the product they are purchasing. Due to the impact of the COVID-19 pandemic, customers have become more prone to shopping online. As a consequence, the amount of text data on e-commerce is continuously increasing, which enables new studies to be carried out and important findings to be obtained with more detailed analysis. Nowadays, e-commerce customer reviews are analyzed by both researchers and sector experts, and are subject to many sentiment analysis studies. Herein, an analysis of customer reviews is carried out in order to obtain more in-depth thoughts about the product, rather than engaging in emotion-based analysis. Initially, we formed a new customer reviews dataset made up of reviews by Turkish consumers in order to perform the proposed analysis. The created dataset contains more than 50,000 reviews in three different categories, and each review has multiple labels according to the comments made by the customers. Later, we applied machine learning methods employed for multi-label classification to the dataset. Finally, we compared and analyzed the results we obtained using a diverse set of statistical metrics. In our experimental studies, the best results obtained were Micro Precision 0.9157, Micro Recall 0.8837, Micro F1 Score 0.8925, and Hamming Loss 0.0278.
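
The metrics named in the last abstract (micro-averaged precision, recall, F1, and Hamming loss) can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `micro_metrics` and the toy label matrices are hypothetical, and it assumes labels are given as 0/1 indicator rows, one row per review and one column per label.

```python
from typing import List, Tuple


def micro_metrics(
    y_true: List[List[int]], y_pred: List[List[int]]
) -> Tuple[float, float, float, float]:
    """Return (micro precision, micro recall, micro F1, Hamming loss).

    Assumption: both inputs are 0/1 indicator matrices of the same shape.
    Micro averaging pools true/false positives over all labels and samples
    before computing the ratios.
    """
    tp = fp = fn = mismatches = total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            total += 1
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
            if t != p:
                mismatches += 1

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    # Hamming loss: fraction of individual label slots predicted incorrectly.
    hamming = mismatches / total if total else 0.0
    return precision, recall, f1, hamming


if __name__ == "__main__":
    # Toy data: 3 reviews, 3 possible labels each (hypothetical values).
    truth = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
    preds = [[1, 0, 0], [0, 1, 1], [1, 1, 0]]
    print(micro_metrics(truth, preds))
```

Lower Hamming loss is better, while the other three metrics are better when higher, so the four numbers in the abstract should not be read on a single scale.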