Big data and predictive analytics

Today ‘big data’ has become a buzzword, and everyone is talking about it. Big data are characterized by three attributes: high volume, high velocity and high variety. A few researchers have also recently proposed "high volatility" as a fourth attribute. We can define big data as collections of data that are so large, complex and fast-growing that they cannot be stored and processed using traditional methods. Such data sets can run to petabytes or more, which makes storing them very difficult. Examples of big data include web data, telecom data, sensor data from jet engines, and RNA and DNA data.

The challenges in processing big data led to the development of new technologies such as Hadoop and MapReduce. These technologies are claimed to process big data in a reasonable time. I have yet to use them myself, however, so I cannot judge how efficient they are.
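
To give a feel for the idea, here is a minimal sketch of the MapReduce pattern in plain Python, counting words across a few documents. It does not use Hadoop. The function names (map_phase, shuffle, reduce_phase, word_count) and the sample documents are made up for illustration, and everything runs on one machine, whereas a real cluster would spread the map and reduce steps across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group the emitted values by key, between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on a list of documents."""
    pairs = []
    for doc in documents:
        pairs.extend(map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    docs = ["big data is big", "data is everywhere"]
    print(word_count(docs))
```

The point of splitting the work this way is that each document can be mapped independently and each word can be reduced independently, which is what would let a framework run the steps in parallel.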

When we deal with data, we come across two kinds of professionals. The first kind is the database architect or infrastructure developer, who designs and models the database that stores the data. These professionals also write programs to extract the raw data and convert it into a usable format. To acquire expertise in this field, a person can enroll in courses such as database engineering and design or computer networking. To learn about a specific database such as Oracle, Db2 or Redshift, a person can opt for one of the certificate programs offered by many international organizations.

The second kind of professional is the data scientist or analyst. Data scientists are experts in processing and studying data to derive useful information for research, business or governance. Earlier, business leaders made decisions based on their convictions, but now they also try to verify their hypotheses using data. Data scientists use several statistical algorithms, which are also called machine learning algorithms, to uncover the underlying information. They also develop new data mining methods and learning algorithms.

Data-driven decisions are made mostly in the banking, telecom, retail, pharmaceutical and internet industries. In business, data are used to derive information for marketing, customer retention and growth, credit history assessment and clinical trials. Data analysis helps business leaders make decisions that achieve their business goals at minimum cost.

