Data Mining / Analysis White Papers

Trawling the Web for Emerging Cyber Communities

Overview: The web harbors a large number of communities -- groups of content creators sharing a common interest -- each of which manifests itself as a set of interlinked web pages. Newsgroups and commercial web directories together contain on the order of 20,000 such communities; our particular interest here is in emerging communities -- those that have little or no representation in such fora. The subject of this paper is the systematic enumeration of over 100,000 such emerging communities from a web crawl; we call this process trawling. We motivate a graph-theoretic approach to locating such communities, and describe the algorithms and the algorithmic engineering necessary to find structures that subscribe to this notion, the challenges in handling such a huge data set, and the results of our experiment.

Further White Paper Details
Publisher: World Wide Web Conference
File Format: HTML
Date Published: August 2003
Downloads: 1
Format: White Papers

An Extensive Examination of Data Structures Using C# 2.0 - Part 2: The Queue, Stack, and Hashtable

This paper examines three of the most commonly studied data structures: the Queue, the Stack, and the Hashtable. The Queue and Stack are specialized Lists, providing storage for a variable...

An Extensive Examination of Data Structures Using C# 2.0 - Part 1: An Introduction to Data Structures

Probably the most common and well-known data structure is the array, which contains a contiguous collection of data items that can be accessed by an ordinal index. This paper focuses...

Best Practices in Data Classification for Information Lifecycle Management

Information lifecycle management (ILM) is a sustainable storage strategy that balances the cost of storing and managing information with its business value. A well-executed ILM strategy will result in a...

Software Company Builds .NET-based Web Reporting Solution

This Microsoft case study profiles Ottawa-based Databeacon, a company that delivers Web reporting and Online Analytical Processing (OLAP) data analysis software to more than 1,100 customers worldwide. To more...

SQL Server 2000 Enterprise Edition (64-bit): Advantages of a 64-Bit Environment

HP has partnered with Microsoft to provide information about the advantages of a 64-Bit Environment. Microsoft SQL Server 2000 Enterprise Edition (64-bit) offers dramatic improvements in memory availability and...

