Spam - E-mail Fraud - Phishing White Papers
Improving Web Spam Classification Using Rank-Time Features
Overview: This paper studies the classification of web spam. Web spam refers to pages that use techniques to mislead search engines into assigning them a higher rank, thus increasing their site traffic. The contributions are twofold. First, the paper finds that the method of dataset construction is crucial for accurate spam classification, and it notes that this problem occurs generally in learning problems and can be hard to detect. In particular, the paper finds that ensuring no domain appears in both the training and test sets is necessary to evaluate a web spam classifier accurately. When the data is not domain-separated, measured precision can differ by as much as 40%. Second, the paper shows that rank-time features can improve the performance of a web spam classifier.
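The domain-separation requirement can be illustrated with a minimal Python sketch. It is not the paper's procedure: the function names `domain_of` and `domain_separated_split` are hypothetical, and treating the URL host name as the "domain" is an assumption made here for illustration (the paper may define domains differently). The idea is to assign whole domains, rather than individual pages, to either the training or the test set.

```python
import random
from collections import defaultdict
from urllib.parse import urlsplit


def domain_of(url):
    """Return the host name of a URL (assumed here to stand in for its domain)."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"URL has no host name: {url!r}")
    return host


def domain_separated_split(urls, test_fraction=0.2, seed=0):
    """Split URLs so that no domain contributes pages to both sets."""
    # Group pages by domain so each domain is assigned as a unit.
    groups = defaultdict(list)
    for url in urls:
        groups[domain_of(url)].append(url)

    # Shuffle the domains (not the pages) reproducibly.
    domains = sorted(groups)
    random.Random(seed).shuffle(domains)

    # Reserve a fraction of the domains, at least one, for testing.
    n_test = max(1, round(len(domains) * test_fraction))
    test_domains = set(domains[:n_test])

    train = [u for d in domains if d not in test_domains for u in groups[d]]
    test = [u for d in domains if d in test_domains for u in groups[d]]
    return train, test
```

A plain random split over pages would likely place pages from the same site on both sides, which is the kind of overlap the overview describes as inflating measured performance.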
| Publisher | Association for Computing Machinery |
|---|---|
| Date Published | May 2007 |
| Format | White Papers |



