Spam - E-mail Fraud - Phishing White Papers

Improving Web Spam Classification Using Rank-Time Features

Overview: This paper studies the classification of web spam, meaning pages that use techniques to mislead search engines into assigning them a higher rank and thereby increase their site traffic. The contributions are twofold. First, the paper finds that the method of dataset construction is crucial for accurate spam classification, and it notes that this problem occurs generally in learning problems and can be hard to detect. In particular, the paper finds that ensuring no overlapping domains between the test and training sets is necessary to accurately test a web spam classifier. In this case, measured precision can differ by as much as 40% when non-domain-separated data is used. Second, the paper shows that rank-time features can improve the performance of a web spam classifier.
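
The domain separation described in the overview can be illustrated with a minimal sketch. This is not the paper's procedure. The function names (`domain_of`, `domain_separated_split`), the example URLs, and the use of the URL host as a stand-in for "domain" are all assumptions made for illustration.

```python
import random
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercased host of a URL (an assumed proxy for its domain)."""
    return urlparse(url).netloc.lower()


def domain_separated_split(urls, test_fraction=0.25, seed=0):
    """Split URLs so that no host appears in both the training and test sets.

    Whole hosts are assigned to one side or the other, so pages from the
    same site can never leak from training into testing.
    """
    domains = sorted({domain_of(u) for u in urls})
    rng = random.Random(seed)
    rng.shuffle(domains)
    # Hold out at least one host for testing.
    n_test = max(1, round(len(domains) * test_fraction))
    test_domains = set(domains[:n_test])
    train = [u for u in urls if domain_of(u) not in test_domains]
    test = [u for u in urls if domain_of(u) in test_domains]
    return train, test


# Hypothetical example data: four hosts, two pages each.
urls = [
    "http://a.example/page1", "http://a.example/page2",
    "http://b.example/page1", "http://b.example/page2",
    "http://c.example/page1", "http://c.example/page2",
    "http://d.example/page1", "http://d.example/page2",
]
train_urls, test_urls = domain_separated_split(urls)
```

A plain random split over pages would likely place pages from the same site on both sides, which is the kind of overlap the paper reports as inflating measured performance.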

Further White Paper Details
Publisher: Association for Computing Machinery
File Format: PDF
Date Published: May 2007
Format: White Papers