
Showing posts from 2013

Fraud Claim Detection Framework

Fraudulent claims are common in the insurance industry, and the manufacturing industry suffers from similar fraudulent claims. Manufacturers sell their products with a warranty or guarantee of good quality, and they promise to replace or repair the product according to the agreed policy. The price of the product is decided based on the extent of risk and the product life. Manufacturers receive many warranty/guarantee claims every year, and many of them are fraudulent. Verifying the authenticity of claims is a rigorous and laborious process, but in the era of machine learning and data analysis the cost and time of this process can be reduced. Here I discuss a process that can be used to assess the veracity of a claim. This process framework is based on machine learning algorithms, and I believe it can reduce the time and cost required to investigate claims manually. The process flow of th...

Layout Design of a Warehouse for an E-retailer

In an e-retail setting, when a consumer orders products, the order information is passed to a brick-and-mortar warehouse for delivery. The warehouse may be owned by the e-retailer itself or by a third party, and it may serve online demand as well as in-store demand. As most in-store retailers have also started selling products online, the warehouse layout should be redesigned to reduce operational cost, so that both the replenishment time for stores and the delivery time for online orders are minimized. Traditional warehouse design has not studied a separate layout for delivering online orders; most studies focus on warehouses designed for in-store retail sales. In this research we study the impact of online sales on warehouse design. When an online order is received at the warehouse through the e-retailer's website, the delivery process is initiated. T...

Network Analysis - A Use Case (Telecom Business)

Social Network Analysis is a well-researched area of computer science and mathematics. Any structure that can be represented as nodes and edges is a network. Nodes are connected to each other by edges (sometimes called links). When we study a network, our objective is to evaluate the importance of each node and link. Network analysis can be used in areas such as genetic engineering, social engineering, marketing, fraud detection, crime detection and economic research. It has been widely used in genetic engineering, and recently its application has gained momentum in business marketing and the analytical sciences. In this blog, I will briefly discuss the business verticals in which network analysis can be used to gain insights into customers. I will exclude applications of network analysis in biological fields, as I do not have concrete knowledge of this area.
1. Tel...
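To make "the importance of each node" concrete, here is a minimal sketch of degree centrality, one of the simplest importance measures, on a small call graph. The call records and the function name `degree_centrality` are hypothetical illustrations, not taken from the original post; repeated calls between the same pair are counted once and direction is ignored.

```python
from collections import defaultdict

# Hypothetical call-detail records: (caller, callee).
calls = [
    ("A", "B"), ("A", "C"), ("A", "D"),
    ("B", "C"), ("D", "E"), ("A", "B"),  # repeated A-B call counts once
]

def degree_centrality(edges):
    """Normalized degree centrality: distinct neighbours / (number of nodes - 1)."""
    neighbours = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbours[u].add(v)
            neighbours[v].add(u)
    n = len(neighbours)
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbours.items()}

centrality = degree_centrality(calls)
for node, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(node, score)
```

Subscriber A talks to the most distinct people and so scores highest; in a telecom setting such well-connected subscribers are natural candidates for closer attention. Richer measures (betweenness, PageRank) follow the same idea of scoring nodes by their position in the graph.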

Multicollinearity

Multicollinearity is a linear relationship between two or more independent variables in a regression of a dependent variable on a set of independent variables. It presents a severe problem during regression modelling. Including independent variables that are linearly related to each other inflates the standard errors of the parameter estimates, which makes the estimates imprecise. Because of these imprecise parameters, the regression model becomes unstable. Unstable models perform badly on validation and test samples: over time, the performance of an unstable model deteriorates much faster than that of a stable model, even when it is scored on data sampled from the same population. An analyst must therefore investigate multicollinearity before finalizing the model. The next questions are how to investigate it and which variable should be kept if some variables are found to ...
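One common way to investigate multicollinearity (not necessarily the one the full post goes on to describe) is the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. The sketch below computes it with NumPy least squares on simulated data; the data and the `vif` helper are illustrative assumptions.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n_samples x n_features).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing
    column j on all other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # add an intercept column
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Simulated predictors: x2 is almost a copy of x1, x3 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + 0.1 * rng.normal(size=500)
x3 = rng.normal(size=500)
X = np.column_stack([x1, x2, x3])
print(np.round(vif(X), 1))
```

x1 and x2 get very large VIFs while x3 stays near 1. A widely used rule of thumb treats VIFs above about 10 as a warning sign; dropping one of the pair (or combining them) is the usual next step.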

Predictive Modelling Lessons

In statistics, we use data to derive information, which helps in business, research and governance. We also develop models to predict the value or behaviour of a dependent variable based on historical data. Models fall into the following categories, based on the types of the variables.
When the dependent variable is continuous:
1. OLS Linear Regression Model: the independent variables are continuous.
2. ANOVA: the independent variables are categorical.
3. ANCOVA: the independent variables include both continuous and categorical variables.
When the dependent variable is categorical:
1. Maximum Likelihood Logistic Regression Model: the dependent variable is categorical and the independent ...
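The categories above amount to a simple decision rule on variable types. The sketch below encodes that rule; the function name `suggest_model` and its string labels are hypothetical, and because the excerpt is cut off in the categorical case, mapping every categorical dependent variable to logistic regression is an assumption.

```python
def suggest_model(dependent, independents):
    """Suggest a model family from variable types, following the categories above.

    dependent: "continuous" or "categorical"
    independents: one "continuous"/"categorical" label per predictor
    """
    kinds = set(independents)
    if not kinds or not kinds <= {"continuous", "categorical"}:
        raise ValueError("independents must be non-empty 'continuous'/'categorical' labels")
    if dependent == "continuous":
        if kinds == {"continuous"}:
            return "OLS linear regression"
        if kinds == {"categorical"}:
            return "ANOVA"
        return "ANCOVA"  # mix of continuous and categorical predictors
    if dependent == "categorical":
        # Assumption: the excerpt is truncated before it lists predictor types here.
        return "maximum likelihood logistic regression"
    raise ValueError("dependent must be 'continuous' or 'categorical'")

print(suggest_model("continuous", ["continuous", "continuous"]))
print(suggest_model("continuous", ["categorical"]))
print(suggest_model("continuous", ["continuous", "categorical"]))
print(suggest_model("categorical", ["continuous"]))
```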

Big data and predictive analytics

Today ‘big data’ has become a buzzword; everyone is talking about it. Big data is characterized by three attributes: high volume, high velocity and high variety. Recently, a few researchers have also proposed "high volatility" as a fourth attribute. We can define big data as a collection of data so large, complex and ever-growing that it cannot be stored and processed using traditional methods. Big data sets can exceed petabytes in size, which makes their storage very difficult. Examples of big data are web data, telecom data, sensor data from jet engines, RNA/DNA data, etc. The challenges of processing big data led to the development of new technologies such as Hadoop and MapReduce. People claim that big data can be processed in reasonable time using these technologies, but I have yet to use them, so I cannot judge their efficiency. When we deal with any data, we come across ...

Scoring observations using PROC FASTCLUS

PROC FASTCLUS can be used to perform k-means clustering of observations. All the observations in the training dataset are assigned to clusters on the basis of the procedure's parameters and their variable values. Scoring the observations in a validation dataset with PROC FASTCLUS is a little challenging, because simply rerunning the procedure on the new data would recompute the clusters from the new observations. Scoring new observations without changing the cluster assignment rules can be achieved by supplying the seed dataset from the original run through the SEED= option of PROC FASTCLUS.

/* original clustering */
%let indsn = input;     *your input dataset;
%let nclus = maxclus;   *number of clusters to request;
%let indvars = varlist; *independent variables to run proc fastclus on;
%let valid = val_data;  *validation dataset to score;

proc fastclus data=&indsn maxclusters=&nclus outseed=clusterSeeds;
   var &indvars;
run;

/* scoring new observations using the seed dataset */
proc fastclus data=&valid out=&valid....
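The idea behind scoring with a seed dataset is that the cluster centres are learned once on the training data and then frozen; new observations are simply assigned to the nearest frozen centre. The sketch below shows that idea with a plain Lloyd's k-means in NumPy. It is not SAS's implementation (PROC FASTCLUS has its own seed selection and update rules), and the helper names and simulated data are assumptions for illustration.

```python
import numpy as np

def assign(X, centres):
    """Label each row of X with the index of its nearest centre (squared Euclidean distance)."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns the final cluster centres (k x n_features)."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centres)
        # Recompute each centre; keep the old one if a cluster becomes empty.
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centres[c]
                        for c in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return centres

# Simulated training data: two well-separated groups around (0, 0) and (5, 5).
rng = np.random.default_rng(1)
train = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
valid = np.array([[0.2, -0.1], [4.8, 5.3]])

centres = kmeans(train, k=2)           # learned once, like OUTSEED= in the training run
valid_labels = assign(valid, centres)  # new rows scored against the frozen centres
print(valid_labels)
```

Because `assign` never updates `centres`, scoring a validation set cannot shift the clusters, which is the property the seed dataset provides in the SAS workflow.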

Big Data

IBM coined the term Big Data, characterized by the volume and velocity of the data being created in the universe. It has been a buzzword ever since. Many reports have claimed that big data is the next big thing and will generate business worth billions of dollars. In simple words, large volumes of data are generated every day by different human activities. I would also emphasize that it is not only human activities that generate terabytes of data, but also animals and non-living things such as planets, asteroids and stars. In fact, non-living things have been producing large volumes of data since ancient times; only recently, with the advent of the internet (one of the big things of the current era), have we humans been able to generate and store it. With the arrival of big data, the skill sets to manage it and convert it into usable formats are required, and these skills are in great demand these days. Big data has given us opportunities to learn many new things. ...

Moore’s Law

Copied from my own blog at http://www.gatecounsellor.com/blog/. This law is named after Gordon E. Moore (http://en.wikipedia.org/wiki/Gordon_Moore), who co-founded Intel Corporation, which revolutionized the processor industry and ultimately brought radical change to the life of mankind. Moore’s Law states that “over the history of computing hardware, the number of transistors on integrated circuits doubles approximately every two years”. When I was an undergraduate student, I came across Moore’s Law in the famous and highly recommended book by Millman and Halkias, but I never tried to understand its significance. I can think of two reasons for this. First, the topic was never emphasized in classroom teaching and was not very important for most exams; this is in fact why many students of Electronics and VLSI do not know it. Second, the emphasis was on understanding transistor and diode characteristics. The man ...
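Taking the quoted two-year doubling period at face value, the law can be written as an exponential. Here N_0 is the transistor count in a reference year t_0 and t is measured in years; these symbols are introduced for illustration only.

```latex
N(t) = N_0 \cdot 2^{(t - t_0)/2}

% Over one decade (t - t_0 = 10):
\frac{N(t_0 + 10)}{N_0} = 2^{10/2} = 2^{5} = 32

% Equivalent annual growth factor:
2^{1/2} \approx 1.414 \quad (\text{about } 41\% \text{ per year})
```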

Online Open Course Ware (OOCW)

The internet boom has transformed our lives. We can get everything from books to furniture from e-stores, and the education sector is no exception. Creating and distributing knowledge has always been regarded as a noble profession, and we all respect those who are in it. Teachers are not only imparting their knowledge to their students but have also uploaded their lectures to the internet, where anyone can access them for learning. Formally, video lectures of this kind available online are called Online Open Course Ware (OOCW). A number of renowned universities such as Stanford, Princeton and the IITs have provided free access to their video lectures. Private players such as Coursera and Open learning have also joined hands with universities to spread knowledge all over the world. Courses designed for a larger audience are delivered in innovative ways, as in Coursera. In Coursera, an upcoming course is announced about 7-8 weeks before the comme...