Data Mining / Analysis White Papers
Trawling the Web for Emerging Cyber Communities
Overview: The web harbors a large number of communities -- groups of content-creators sharing a common interest -- each of which manifests itself as a set of interlinked web pages. Newsgroups and commercial web directories together contain on the order of 20,000 such communities; our particular interest here is in emerging communities -- those that have little or no representation in such fora. The subject of this paper is the systematic enumeration of over 100,000 such emerging communities from a web crawl: we call our process trawling. We motivate a graph-theoretic approach to locating such communities, and describe the algorithms and the algorithmic engineering necessary to find structures that subscribe to this notion, the challenges in handling such a huge data set, and the results of our experiment.
| Publisher | World Wide Web Conference | File Format | HTML |
|---|---|---|---|
| Date Published | August 2003 | Downloads | 1 |
| Format | White Papers | | |
| Topics | | | |
An Extensive Examination of Data Structures Using C# 2.0 - Part 2: The Queue, Stack, and Hashtable
This paper examines three of the most commonly studied data structures: the Queue, the Stack, and the Hashtable. The Queue and Stack are specialized Lists, providing storage for a variable...
An Extensive Examination of Data Structures Using C# 2.0 - Part 1: An Introduction to Data Structures
Probably the most common and well-known data structure is the array, which contains a contiguous collection of data items that can be accessed by an ordinal index. This paper focuses...
Best Practices in Data Classification for Information Lifecycle Management
Information lifecycle management (ILM) is a sustainable storage strategy that balances the cost of storing and managing information with its business value. A well-executed ILM strategy will result in a...
Software Company Builds .NET-based Web Reporting Solution
This Microsoft case study profiles Ottawa-based Databeacon, a company that delivers Web reporting and Online Analytical Processing (OLAP) data analysis software to more than 1,100 customers worldwide. To more...
SQL Server 2000 Enterprise Edition (64-bit): Advantages of a 64-Bit Environment
HP has partnered with Microsoft to provide information about the advantages of a 64-Bit Environment. Microsoft SQL Server 2000 Enterprise Edition (64-bit) offers dramatic improvements in memory availability and...



