Scoring observations using PROC FASTCLUS

- June 19, 2013

PROC FASTCLUS performs k-means clustering of observations. Each observation in the training dataset is assigned to a cluster based on the options passed to the procedure and on the observation's variable values. Scoring a validation dataset with PROC FASTCLUS is less straightforward: simply running the procedure on the new data would derive new cluster seeds from those observations, so the cluster assignment rules would no longer match the ones built on the training data.

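To score new observations without changing the cluster assignment rules, save the cluster seeds from the original run with the OUTSEED= option, then pass them to a second PROC FASTCLUS run through the SEED= option with MAXITER=0. With no iterations, the seeds are not recomputed, and each new observation is assigned to its nearest seed.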

/* original clustering */
%let indsn   = input;     /* your input (training) dataset */
%let nclus   = maxclus;   /* number of clusters to request */
%let indvars = varlist;   /* independent variables to run proc fastclus on */
%let valid   = val_data;  /* validation dataset to score */

proc fastclus data=&indsn maxclusters=&nclus outseed=clusterSeeds;
   var &indvars;
run;

/* scoring new observations using the seed dataset */
proc fastclus data=&valid out=&valid._scored seed=clusterSeeds maxclusters=&nclus maxiter=0;
   var &indvars;
run;
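
One way to sanity-check the scoring step is to run the training data through the same seed dataset and look at the resulting cluster sizes next to those of the scored validation data. The sketch below is only an illustration, not part of the referenced method: the dataset name train_rescored is hypothetical, and it assumes the macro variables and the clusterSeeds dataset from the code above already exist. PROC FASTCLUS writes the assigned cluster to the CLUSTER variable of the OUT= dataset.

```sas
/* Assumption: &indsn, &nclus, &indvars, &valid and clusterSeeds are defined as above. */
/* train_rescored is a hypothetical dataset name used only for this check.            */

/* Assign the training observations to the saved seeds, without iterating */
proc fastclus data=&indsn out=train_rescored seed=clusterSeeds
              maxclusters=&nclus maxiter=0 noprint;
   var &indvars;
run;

/* Cluster sizes in the rescored training data */
proc freq data=train_rescored;
   tables cluster;
run;

/* Cluster sizes in the scored validation data */
proc freq data=&valid._scored;
   tables cluster;
run;
```

If the validation data resembles the training data, the relative cluster sizes should look broadly similar. A cluster that is nearly empty in one dataset but large in the other may be worth a closer look.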

Reference:

“Data Preparation for Analytics Using SAS” by Gerhard Svolba, Ph.D.
