doi: 10.4304/jsw.6.1.116-123
Application of Linear Classifier on Chinese Spam Filtering
Abstract—Spam is a key problem in electronic communication. Especially in large-scale email systems. Content-based filtering is one mainstream method of combating this threat in its forms, an e-mail filtering system can learn directly from a user’s mail set, but the previous Content-based filtering methods are hard to find a balance between efficiency and effectiveness. Such algorithms of text categorization as Naïve Bayes, kNN, Decision Tree and Boosting can be applied in spam filtering. However, the effectiveness of Naïve Bayes is limited and it is not fit for instant feedback learning. Others algorithm such as SVM are more effective but complicated to compute. Because in a real email system a large volume of emails often need to be handled in a short time, efficiency will often be as important as effectiveness when implementing an anti-spam filtering method. So we intend to find a linear classifier to solve this problem, two online linear classifiers: the Perception and Winnow were explored for this task, which are two fast linear classifiers. The training of these two methods is online and mistake driven. Furthermore, they are suitable for feedback. We employ the two methods in three benchmark corpora, including PU1, Ling spam and 2005-Jun, the experiments in public e-mail corpus show an effective result. We conclude that the two online linear classifiers have a state-of-the-art performance for filtering spam, especially for Chinese spam emails.
Index Terms—anti-spam, information filtering, Winnow, Perception, linear classifier.
Cite: Yongqin Qiu, Yan Xu, Dan Li, "Application of Linear Classifier on Chinese Spam Filtering," Journal of Software vol. 6, no. 1, pp. 116-123, 2011.
General Information
ISSN: 1796-217X (Online)
Abbreviated Title: J. Softw.
Frequency: Quarterly
APC: 500USD
DOI: 10.17706/JSW
Editor-in-Chief: Prof. Antanas Verikas
Executive Editor: Ms. Cecilia Xie
Abstracting/ Indexing: DBLP, EBSCO,
CNKI, Google Scholar, ProQuest,
INSPEC(IET), ULRICH's Periodicals
Directory, WorldCat, etcE-mail: jsweditorialoffice@gmail.com
-
Oct 22, 2024 News!
Vol 19, No 3 has been published with online version [Click]
-
Jan 04, 2024 News!
JSW will adopt Article-by-Article Work Flow
-
Apr 01, 2024 News!
Vol 14, No 4- Vol 14, No 12 has been indexed by IET-(Inspec) [Click]
-
Apr 01, 2024 News!
Papers published in JSW Vol 18, No 1- Vol 18, No 6 have been indexed by DBLP [Click]
-
Jun 12, 2024 News!
Vol 19, No 2 has been published with online version [Click]