Two-Stage Variable Clustering for Large Data Sets
Overview
In data mining, principal component analysis (PCA) is a popular dimension reduction technique. It is also a good remedy for multicollinearity, but the resulting components are hard to interpret in terms of the original input variables. To overcome this interpretation problem, principal components (cluster components) can instead be obtained through variable clustering, as implemented in PROC VARCLUS. The procedure uses oblique principal component analysis and iterative binary splits to cluster variables, and it produces non-orthogonal principal components. Although the procedure sacrifices orthogonality among the components, it yields interpretable components and well-explained cluster structures of variables. However, the PROC VARCLUS implementation is inefficient for high-dimensional data. This paper introduces a two-stage variable clustering technique for large data sets.
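
The binary split mentioned in the overview can be illustrated with a small sketch. This is a simplified illustration in Python, not the PROC VARCLUS implementation and not the paper's two-stage method: the function name `split_in_two`, the synthetic data, and the grid-search rotation are all assumptions made for the example. An orthogonal quartimax-style rotation of the first two principal components stands in for the oblique rotation used by the procedure, and each variable is assigned to the rotated component on which it loads more strongly.

```python
import numpy as np

def split_in_two(X, n_angles=181):
    """Split the columns (variables) of X into two clusters.

    Hypothetical helper for illustration only: takes the first two
    principal components of the correlation matrix, rotates them toward
    simple structure, and assigns each variable to the rotated component
    with the larger squared loading.
    """
    corr = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)          # eigenvalues in ascending order
    top = np.clip(eigvals[-2:], 0.0, None)           # guard against tiny negative values
    loadings = eigvecs[:, -2:] * np.sqrt(top)        # variable-by-component loadings

    # Grid search over rotation angles; the criterion (sum of fourth
    # powers of the loadings) is the raw quartimax criterion.
    best_score, best_rotated = -np.inf, loadings
    for theta in np.linspace(0.0, np.pi / 2, n_angles):
        c, s = np.cos(theta), np.sin(theta)
        rotated = loadings @ np.array([[c, -s], [s, c]])
        score = np.sum(rotated ** 4)
        if score > best_score:
            best_score, best_rotated = score, rotated

    return np.argmax(best_rotated ** 2, axis=1)      # cluster label (0 or 1) per variable

# Synthetic data (an assumption for the demo): six variables driven by
# two independent latent factors, three variables per factor.
rng = np.random.default_rng(0)
n = 500
f1, f2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([f1, f1, f1, f2, f2, f2]) + 0.3 * rng.normal(size=(n, 6))

labels = split_in_two(X)
print(labels)  # variables 0-2 share one label, variables 3-5 share the other
```

Repeating such a split on whichever cluster is least well explained by its own component gives a divisive clustering of variables. The cost of working with the full correlation matrix grows quickly with the number of variables, which is the inefficiency the overview refers to.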
| Field | Value |
|---|---|
| Publisher | SAS Institute |
| Date Published | February 2008 |
| Format | White Papers |


