Scoring observations using PROC FASTCLUS



PROC FASTCLUS performs k-means clustering of observations. Each observation in the training dataset is assigned to a cluster according to the options specified for the procedure and the values of its variables. Scoring a validation dataset is less straightforward: simply running PROC FASTCLUS again on the validation data would derive new cluster seeds from the new observations, so the assignment rules would no longer match those of the training run.

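To score new observations without changing the cluster assignment rules, save the cluster seeds from the training run with the OUTSEED= option, then pass that dataset to a second PROC FASTCLUS call through the SEED= option. Setting MAXITER=0 in the second call keeps the seeds from being updated, so each new observation is simply assigned to its nearest seed.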

/* original clustering */
%let indsn   = input;     /* your input (training) dataset */
%let nclus   = maxclus;   /* replace with the number of clusters to request */
%let indvars = varlist;   /* replace with the variables to run proc fastclus on */
%let valid   = val_data;  /* validation dataset to score */

/* cluster the training data and save the final cluster seeds */
proc fastclus data=&indsn maxclusters=&nclus outseed=clusterSeeds;
  var &indvars;
run;

/* scoring new observations using the seed dataset */
/* maxiter=0: the seeds are not recomputed, observations are only assigned */
proc fastclus data=&valid out=&valid._scored seed=clusterSeeds maxclusters=&nclus maxiter=0;
  var &indvars;
run;
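Once the validation data has been scored, a quick check of the result can be useful. The following is a minimal sketch, assuming the macro variables above are set and both PROC FASTCLUS steps have run. It relies on the CLUSTER and DISTANCE variables that PROC FASTCLUS writes to the OUT= dataset; the choice of PROC PRINT, PROC FREQ and PROC MEANS here is only one way to inspect them.

```sas
/* look at the seeds that define the assignment rules */
proc print data=clusterSeeds;
run;

/* number of validation observations assigned to each cluster */
proc freq data=&valid._scored;
  tables cluster;
run;

/* distance of the scored observations to their assigned cluster seed */
proc means data=&valid._scored n mean max;
  class cluster;
  var distance;
run;
```

Clusters that receive very few validation observations, or distances much larger than those seen in the training data, may indicate that the validation data differs from the data the seeds were built on.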





Reference:

“Data Preparation for Analytics Using SAS” by Gerhard Svolba, Ph.D.


