Publication: Spam detection on social networks using deep contextualized word representation
Date
2023
Authors
Ghanem, Razan; Erbay, Hasan
Publisher
Springer
Abstract
Spam detection on social networks can be framed as a short-text classification problem. It is a challenging natural language processing task because such texts are sparse and ambiguous. A key requirement for addressing this problem is a powerful text representation. Traditional word embedding models mitigate data sparsity by representing words as dense vectors, but they have limitations that prevent them from handling some problems effectively. The most common is the out-of-vocabulary (OOV) problem: the models cannot provide any vector representation for words that are absent from their vocabulary. Another is context independence: the models output exactly one vector per word, regardless of the word's position or context in the sentence. To overcome these problems, we propose a new model based on deep contextualized word representations. Specifically, we develop CBLSTM (Contextualized Bi-directional Long Short-Term Memory neural network), a novel deep learning architecture that combines a bidirectional long short-term memory neural network with Embeddings from Language Models (ELMo) to detect spam texts on social networks. Experimental results on three benchmark datasets show that the proposed method achieves high accuracy and outperforms existing state-of-the-art methods for detecting spam on social networks.
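The abstract describes CBLSTM only as a bidirectional LSTM on top of ELMo embeddings. The following is a minimal, untrained forward pass of that general kind of architecture in plain Python, intended only to make the two limitations concrete. It is a sketch under stated assumptions, not the authors' implementation. ELMo is replaced by a hypothetical `char_embedding` function that sums pseudo-random vectors per character trigram. This is a crude stand-in that, like ELMo's character-level input, never fails on OOV words. As a result, context sensitivity in this toy comes from the BiLSTM layer rather than from a pretrained language model. The names `char_embedding`, `LSTM`, `BiLSTMClassifier`, the sizes `EMB_DIM` and `HIDDEN`, the random weights, the pooling of final states, and the sigmoid output are all illustrative assumptions.

```python
import math
import random

EMB_DIM = 8  # hypothetical embedding size (not from the paper)
HIDDEN = 6   # hypothetical LSTM hidden size (not from the paper)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def char_embedding(word, dim=EMB_DIM):
    """Hypothetical stand-in for ELMo's character-level input: sums a
    deterministic pseudo-random vector per character trigram, so every
    word, including out-of-vocabulary ones, gets a vector."""
    vec = [0.0] * dim
    padded = f"#{word.lower()}#"
    for i in range(len(padded) - 2):
        rng = random.Random(padded[i:i + 3])  # str seeds are deterministic
        for d in range(dim):
            vec[d] += rng.uniform(-1.0, 1.0)
    return vec


class LSTM:
    """Single-direction LSTM with random, untrained weights."""

    def __init__(self, in_dim, hid, seed):
        rng = random.Random(seed)
        self.hid = hid
        # One weight matrix per gate over the concatenation [x; h]:
        # input (i), forget (f), output (o) and candidate (g).
        self.W = {k: [[rng.uniform(-0.5, 0.5) for _ in range(in_dim + hid)]
                      for _ in range(hid)] for k in "ifog"}
        self.b = {k: [0.0] * hid for k in "ifog"}

    def run(self, xs):
        h, c, outs = [0.0] * self.hid, [0.0] * self.hid, []
        for x in xs:
            z = x + h  # list concatenation [x; h]
            pre = {k: [sum(w * v for w, v in zip(row, z)) + bias
                       for row, bias in zip(self.W[k], self.b[k])]
                   for k in "ifog"}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            cand = [math.tanh(v) for v in pre["g"]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outs.append(h)
        return outs


class BiLSTMClassifier:
    """Untrained BiLSTM spam scorer: forward and backward LSTMs over the
    token vectors, then a logistic output layer."""

    def __init__(self, seed=0):
        self.fwd = LSTM(EMB_DIM, HIDDEN, seed)
        self.bwd = LSTM(EMB_DIM, HIDDEN, seed + 1)
        rng = random.Random(seed + 2)
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(2 * HIDDEN)]

    def token_states(self, tokens):
        xs = [char_embedding(t) for t in tokens]
        fwd = self.fwd.run(xs)
        bwd = self.bwd.run(xs[::-1])[::-1]  # re-align with token order
        # Each token's state now depends on its left and right context.
        return [hf + hb for hf, hb in zip(fwd, bwd)]

    def spam_probability(self, text):
        tokens = text.split()
        if not tokens:
            raise ValueError("empty text")
        states = self.token_states(tokens)
        # Last forward state summarises the left-to-right reading;
        # first backward state summarises the right-to-left reading.
        features = states[-1][:HIDDEN] + states[0][HIDDEN:]
        return sigmoid(sum(w * v for w, v in zip(self.w_out, features)))


model = BiLSTMClassifier(seed=42)
print(model.spam_probability("win a free iphone now click here"))
# The input vector for "free" is static, but its BiLSTM state is not:
a = model.token_states("get free money now".split())[1]
b = model.token_states("feel free to reply".split())[1]
print(char_embedding("free") == char_embedding("free"), a != b)
```

Because the weights are random, the printed probability carries no meaning. The sketch only shows the data flow. The backward LSTM reads the reversed sequence, and its outputs are reversed again so that both halves line up per token. The abstract does not specify the pooling or output layers that CBLSTM actually uses.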
Keywords
Spam detection; Deep learning; Word embedding; Recurrent neural network; Embedding from language model; ACCOUNTS