TR-12-2.pdf

"Spam behavior analysis and detection in user generated content on 
social networks",  

Enhua Tan, Lei Guo, Songqing Chen, Xiaodong Zhang, and Yihong (Eric) Zhao 

Proceedings of the 32nd International Conference on Distributed Computing Systems 
(ICDCS 2012), Macao, China, June 18-21, 2012.  


Abstract

Spam content is surging along with the explosive growth
of user generated content (UGC) on the Internet. Spammers
often insert popular keywords or simply copy and paste recent
articles from the Web with spam links inserted, attempting to
evade content-based detection. To detect spam in user generated
content effectively, we first conduct a comprehensive analysis
of spamming activities on a large commercial UGC site
over 325 days, covering more than 6 million posts and nearly
400 thousand users. Our analysis shows that UGC spammers exhibit unique
non-textual patterns, such as posting activities, advertised spam
link metrics, and spam hosting behaviors. Based on these non-textual
features, we show via several classification methods that
a high detection rate can be achieved offline. These results
further motivate us to develop a runtime scheme, BARS, to detect
spam posts based on these spamming patterns. The experimental
results demonstrate the effectiveness and robustness of BARS.