Data Mining - Analysis White Papers

Two-Stage Variable Clustering for Large Data Sets

Overview
In data mining, principal component analysis is a popular dimension reduction technique. It also remedies the multicollinearity problem, but the resulting components are hard to interpret in terms of the original input variables. To overcome this interpretation problem, principal components (cluster components) can instead be obtained through variable clustering, which is implemented in PROC VARCLUS. The procedure uses oblique principal component analysis with iterative binary splits to cluster variables, and it produces non-orthogonal principal components. Although the procedure sacrifices orthogonality among the principal components, it yields interpretable components and well-explained cluster structures of variables. However, the PROC VARCLUS implementation is inefficient for high-dimensional data. This paper introduces a two-stage variable clustering technique for large data sets.
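The binary split and the non-orthogonal cluster components described in the overview can be illustrated with a small sketch. It is not the PROC VARCLUS implementation and not the paper's two-stage method. It assumes a single split based on the first two principal components of a correlation matrix, and it uses a plain quartimax rotation of the loadings, found by grid search, as a simplified stand-in for the oblique rotation. The function names (top_eigenpairs, split_cluster, component_correlation) and the correlation matrix CORR are hypothetical and exist only for this illustration.

```python
import math


def top_eigenpairs(matrix, k=2, iters=500):
    """Leading k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for j in range(k):
        vec = [1.0 + 0.1 * ((i + j) % 3) for i in range(n)]  # deterministic start
        for _ in range(iters):
            nxt = [sum(work[r][c] * vec[c] for c in range(n)) for r in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0.0:  # nothing left to extract
                break
            vec = [x / norm for x in nxt]
        val = sum(vec[r] * work[r][c] * vec[c] for r in range(n) for c in range(n))
        pairs.append((val, vec))
        for r in range(n):  # deflate: remove the component just found
            for c in range(n):
                work[r][c] -= val * vec[r] * vec[c]
    return pairs


def split_cluster(corr, members, steps=900):
    """Split one cluster of variables in two (simplified, assumed procedure)."""
    sub = [[corr[i][j] for j in members] for i in members]
    (val1, vec1), (val2, vec2) = top_eigenpairs(sub, k=2)
    # Loadings of each variable on the first two principal components.
    load = [(a * math.sqrt(max(val1, 0.0)), b * math.sqrt(max(val2, 0.0)))
            for a, b in zip(vec1, vec2)]

    def rotate(theta):
        c, s = math.cos(theta), math.sin(theta)
        return [(a * c + b * s, -a * s + b * c) for a, b in load]

    def quartimax(theta):
        return sum(a ** 4 + b ** 4 for a, b in rotate(theta))

    # Grid search over one period of the quartimax criterion.
    best = max((t * (math.pi / 2) / steps for t in range(steps)), key=quartimax)
    left, right = [], []
    for var, (a, b) in zip(members, rotate(best)):
        # Assign each variable to the rotated component it loads on most.
        (left if a * a >= b * b else right).append(var)
    return left, right


def component_correlation(corr, group_a, group_b):
    """Correlation between the first principal components of two variable groups."""
    def first_pc(group):
        sub = [[corr[i][j] for j in group] for i in group]
        return top_eigenpairs(sub, k=1)[0]

    val_a, w_a = first_pc(group_a)
    val_b, w_b = first_pc(group_b)
    cross = sum(w_a[p] * corr[i][j] * w_b[q]
                for p, i in enumerate(group_a) for q, j in enumerate(group_b))
    return cross / math.sqrt(val_a * val_b)


# Hypothetical correlation matrix: two tight pairs, weakly related to each other.
CORR = [
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.7, 1.0],
]
groups = split_cluster(CORR, [0, 1, 2, 3])
r = component_correlation(CORR, *groups)
print(groups, round(r, 4))
```

In this toy example the split separates the two correlated pairs, and the correlation between the two cluster components is nonzero. That nonzero correlation is what "non-orthogonal" means here: each component summarizes only its own cluster, so it stays easy to interpret, but the components are no longer uncorrelated with one another.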

Further White Paper Details
Publisher: SAS Institute
File Format: PDF
Date Published: February 2008
Format: White Papers