Sample Reduction Strategies for Protein Secondary Structure Prediction

Sema Atasever; Zafer Aydın; Hasan Erbay; Mostafa Sabzekar

dc.contributor.author	Sema Atasever
dc.contributor.author	Zafer Aydın
dc.contributor.author	Hasan Erbay
dc.contributor.author	Mostafa Sabzekar
dc.date.accessioned	2024-05-23T14:08:30Z
dc.date.available	2024-05-23T14:08:30Z
dc.date.issued	2019-10-18
dc.description.abstract	Predicting secondary structure from protein sequence plays a crucial role in estimating the 3D structure, which has applications in drug design and in understanding the function of proteins. As new genes and proteins are discovered, the size of the protein databases and datasets that can be used for training prediction models grows considerably. A two-stage hybrid classifier, which employs dynamic Bayesian networks and a support vector machine (SVM), has been shown to provide state-of-the-art accuracy for protein secondary structure prediction. However, SVM is not efficient for large datasets due to the quadratic optimization involved in model training. In this paper, two techniques are implemented on the CB513 benchmark for reducing the number of samples in the train set of the SVM. The first method randomly selects a fraction of data samples from the train set using a stratified selection strategy. This approach can remove approximately 50% of the data samples from the train set and reduce the model training time by 73.38% on average without decreasing the prediction accuracy significantly. The second method clusters the feature vectors with a hierarchical clustering algorithm and replaces the train set samples with the nearest neighbors of the cluster centers in order to improve the training time. The number of clusters and the number of nearest neighbors are optimized as hyper-parameters by computing the prediction accuracy on validation sets. It is found that clustering can reduce the size of the train set by 26% without reducing the prediction accuracy. Among the clustering techniques, Ward’s method provided the best accuracy on test data.
dc.identifier.doi	10.3390/app9204429
dc.identifier.uri	https://acikarsiv.thk.edu.tr/handle/123456789/197
dc.publisher	MDPI AG
dc.relation.ispartof	Applied Sciences
dc.relation.issn	2076-3417
dc.title	Sample Reduction Strategies for Protein Secondary Structure Prediction
dc.type	journal-article
dspace.entity.type	Publication
oaire.citation.issue	20
oaire.citation.volume	9
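
The stratified selection strategy mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name stratified_subsample, the three-state labels H/E/L, the 40/20/40 class sizes, and the fixed seed are assumptions made for illustration only. The idea shown is that each class keeps roughly the same fraction of its samples, so class proportions in the reduced train set match those of the full set.

```python
import random
from collections import Counter, defaultdict


def stratified_subsample(labels, fraction, seed=0):
    """Return sorted indices that keep about `fraction` of each class.

    Hypothetical helper for illustration; `labels` is any sequence of
    hashable class labels, one per training sample.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    kept = []
    for indices in by_class.values():
        # Keep at least one sample per class so no class disappears.
        n_keep = max(1, round(fraction * len(indices)))
        kept.extend(rng.sample(indices, n_keep))
    return sorted(kept)


# Toy labels (assumed): helix (H), strand (E), loop (L).
labels = ["H"] * 40 + ["E"] * 20 + ["L"] * 40
kept = stratified_subsample(labels, fraction=0.5)
class_counts = Counter(labels[i] for i in kept)
print(len(kept), dict(class_counts))
```

With fraction=0.5 the sketch keeps half of each class, which mirrors the roughly 50% reduction reported in the abstract; the training-time and accuracy figures there come from the paper, not from this example.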

Files

Original bundle

Name: applsci-09-04429-v4.pdf
Size: 688.17 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission

Collections

Scopus
WOS - Web of Science

TÜRK HAVA KURUMU ÜNİVERSİTESİ
