
Showing posts from June, 2013

Scoring observations using PROC FASTCLUS

PROC FASTCLUS can be used to perform k-means clustering of observations. Every observation in the training dataset is assigned to a cluster based on the procedure's parameterization and on the observation's variable values. Scoring the observations in a validation dataset with PROC FASTCLUS is less straightforward: if the procedure is simply run again on the validation data, the cluster assignment rules now depend on the new observations. New observations can be scored without changing the cluster assignment rules by saving the cluster seeds from the original run (OUTSEED=) and supplying them to PROC FASTCLUS as a SEED dataset.

/* original clustering */
%let indsn = input;      *your input dataset;
%let nclus = maxclus;    *number of clusters to request;
%let indvars = varlist;  *independent variables to run PROC FASTCLUS on;
%let valid = val_data;   *validation dataset to score;

proc fastclus data=&indsn maxclusters=&nclus outseed=clusterSeeds;
   var &indvars;
run;

/* scoring new observations using the seed dataset */
proc fastclus data=&valid out=&valid.
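The idea behind scoring with a seed dataset can be sketched outside SAS. The Python example below is a minimal sketch under stated assumptions, not PROC FASTCLUS's exact algorithm: it uses plain Lloyd's k-means with squared Euclidean distance, and the helper names (`fit_centroids`, `score`) and the toy data are hypothetical. The training run produces final centroids (the analogue of the OUTSEED= dataset); scoring then assigns each new observation to its nearest fixed centroid without recomputing the centroids, so the validation data cannot change the assignment rules.

```python
from typing import List, Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two observations."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Point, centroids: List[List[float]]) -> int:
    """Index of the closest centroid (ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda k: squared_distance(point, centroids[k]))


def fit_centroids(data: List[Point], seeds: List[Point], max_iter: int = 100) -> List[List[float]]:
    """Hypothetical training step: Lloyd's k-means starting from `seeds`.

    Returns the final centroids, playing the role of an OUTSEED= dataset.
    """
    centroids = [list(s) for s in seeds]
    for _ in range(max_iter):
        # Assignment step: each training observation joins its nearest cluster.
        clusters: List[List[Point]] = [[] for _ in centroids]
        for p in data:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            [sum(col) / len(members) for col in zip(*members)] if members else c
            for members, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def score(new_data: List[Point], centroids: List[List[float]]) -> List[int]:
    """Hypothetical scoring step: assign to the nearest FIXED centroid, no updates."""
    return [nearest(p, centroids) for p in new_data]


# Toy usage (hypothetical data): train on four points, then score three new ones.
train = [[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 8.5]]
centroids = fit_centroids(train, seeds=[train[0], train[2]])
valid = [[1.2, 1.1], [8.5, 9.0], [5.0, 5.0]]
print(score(valid, centroids))  # → [0, 1, 1]
```

Keeping training and scoring as separate functions mirrors the two PROC FASTCLUS steps: the centroids are computed once from the training data and only read, never modified, when new observations are scored.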

Big Data

IBM popularized the term Big Data, characterized by the volume and velocity of the data being created in the Universe. It has since become a buzzword. Many reports claim that Big Data is the next big thing and will generate business worth billions of dollars. Put simply, large volumes of data are generated every day by human activities. I would also emphasize that it is not only human activity that generates terabytes of data: so do animals and non-living things such as planets, asteroids and stars. In fact, non-living things have been producing vast amounts of data since ancient times; only recently, with the advent of the Internet (one of the big things of the current era), have we humans been able to generate and store it. With the arrival of Big Data, the world needs skill sets to manage this data and convert it into usable formats, and these skills are in great demand these days. Big Data has given us opportunities to learn many new things. It has ch…

Moore’s Law

Cross-posted from my own blog at http://www.gatecounsellor.com/blog/. This law is named after Gordon E. Moore (http://en.wikipedia.org/wiki/Gordon_Moore), who co-founded Intel Corporation, which revolutionized the processor industry and ultimately brought radical change to the life of mankind. Moore’s Law states that “over the history of computing hardware, the number of transistors on integrated circuits doubles approximately every two years.” When I was an undergraduate student, I came across something called Moore’s Law in the famous and highly recommended textbook by Millman and Halkias, but I never tried to understand its significance. I can think of two reasons for this. First, the topic was never emphasized in classroom teaching and was not very important for most exams; this is in fact why many students of Electronics and VLSI do not know it. Second, understanding transistor/diode characteristics was given priority. The man, who held a PhD in Chemistry, co-founded Int…
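The doubling rule quoted above can be turned into a simple projection: if the transistor count doubles every two years, then after t years it is N(t) = N0 · 2^(t/2). The Python sketch below is an idealized illustration only; the function name `projected_transistors` and the starting count of 1,000 transistors are hypothetical, and the law itself only claims an approximate doubling.

```python
def projected_transistors(n0: float, years: float, doubling_period: float = 2.0) -> float:
    """Idealized Moore's Law projection: the count doubles every `doubling_period` years."""
    return n0 * 2 ** (years / doubling_period)


# Hypothetical example: a chip with 1,000 transistors, projected 10 years ahead.
# 10 years / 2 years per doubling = 5 doublings, i.e. a factor of 2**5 = 32.
print(projected_transistors(1_000, 10))  # → 32000.0
```

Because the growth is exponential, a decade of doubling every two years multiplies the count by 32, and two decades by 1,024.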