2 What is Big Data
WHAT IS BIG DATA?
- Data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges. Oxford English Dictionary.
- A new attitude by businesses, non-profits, government agencies, and individuals that recognises that combining data from multiple sources could lead to better decisions. Gill Press in Forbes, 2014
- High-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision-making. Gartner, 2014
Before we start, let’s set the scene by reading about the Model T Ford.
MODEL T FORD
The Ford Model T is an automobile produced by Ford Motor Company from October 1, 1908, to May 26, 1927. It is generally regarded as the first affordable automobile, the car that opened travel to the common middle-class American.
The Ford Piquette Avenue Plant could not keep up with demand for the Model T, and only 11 cars were built there during the first full month of production. In 1910, after assembling nearly 12,000 Model Ts, Henry Ford moved the company to the new Highland Park complex. During this time, the Model T production system transitioned into an iconic example of assembly line production; in subsequent decades it would also come to be viewed as the classic example of the rigid, first-generation version of assembly line production.
In 1914, Ford produced more cars than all other automakers combined. The Model T was a great commercial success, and by the time Henry made his 10 millionth car, half of all cars in the world were Fords. It was so successful Ford did not purchase any advertising between 1917 and 1923.
Henry Ford’s ideological approach to Model T design was one of getting it right and then keeping it the same; he believed the Model T was all the car a person would, or could, ever need. As other companies offered comfort and styling advantages, at competitive prices, the Model T lost market share. Design changes were not as few as the public perceived, but the idea of an unchanging model was kept intact. Eventually, on May 26, 1927, Ford Motor Company ceased US production.
Interpretation:
In today’s world, the idea of only ONE type of product seems very out of date. Ford lost market share when consumers wanted different types of cars with different features to suit their needs and tastes.
Today’s consumers want personalized services. Profitability depends on it. We cannot generalize any more.
Compare the “one size fits all” Model T to today’s online retail, where knowledge of a customer’s needs and interests is essential to winning their business.
And how can we find out about their needs and interests and tailor our service for them? That is the power of data.
Limitations of the one size fits all model in the era of digital data.
2.1 The 5 Vs of Big Data
Volume
Everything in the Data Management world is scaling massively, exponentially, and relentlessly.
As long as daily business is carried on online, the volume of data will continue to grow.
Velocity
Big data technology allows us to analyze the data while it is generated, without ever putting it into databases.
For many businesses, the speed of data creation is even more important than the volume.
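As a rough illustration of analyzing data while it is generated, the sketch below keeps running statistics over a stream of values and never stores the individual events. The `event_stream` generator and its purchase amounts are hypothetical stand-ins for a real feed such as a message queue; the update rule is Welford's online algorithm, chosen here only as one simple way to do this.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Running count, mean and variance, updated one value at a time."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, value: float) -> None:
        # Welford's online update: no past values need to be kept.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        # Population variance of everything seen so far.
        return self.m2 / self.count if self.count else 0.0


def event_stream():
    # Hypothetical purchase amounts arriving one by one (illustrative data).
    yield from [12.0, 15.0, 9.0, 14.0]


stats = RunningStats()
for amount in event_stream():
    stats.update(amount)  # the event is analyzed and then discarded

print(stats.count, stats.mean, stats.variance)
```

Because only three numbers are held in memory, the same loop would work whether the stream carries four events or four billion.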
Real time insights
MIT Media Lab used location data from phones to infer how many people were in Macy’s parking lots on Black Friday.
They could estimate the retailer’s sales on that critical day even before Macy’s itself had recorded those sales.
Rapid insights provide an obvious competitive advantage to analysts and managers.
Variety
Previously, data was predominantly structured: it was numerical and highly organized. Today, by commonly cited estimates, around 80% of the world’s data is unstructured, including photos, social media updates, and sensor readings.
Today’s big data technology allows structured and unstructured data to be harvested, stored, and used simultaneously.
Veracity
Big data can be a crucial part of business strategy and growth, but high volumes of data are of no use if the data is not accurate.
The most common problems are data incompleteness and inconsistencies. When these are known and accounted for, data can be cleaned or issues can be taken into account.
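The two problems named above, incompleteness and inconsistency, can be sketched with a small cleaning routine. Everything in this example is an assumption made for illustration: the customer records, the `COUNTRY_ALIASES` table, and the plausible age range of 0 to 120.

```python
# Illustrative records with typical quality problems (made-up data).
records = [
    {"customer_id": 1, "country": "UK", "age": 34},
    {"customer_id": 2, "country": "uk", "age": None},             # incomplete
    {"customer_id": 3, "country": "United Kingdom", "age": 29},   # inconsistent label
    {"customer_id": 4, "country": "FR", "age": -5},               # implausible value
]

# Hypothetical lookup that maps spelling variants to one canonical code.
COUNTRY_ALIASES = {"uk": "UK", "united kingdom": "UK", "fr": "FR"}


def clean(record):
    """Return a cleaned copy of the record and a list of issues found."""
    issues = []
    cleaned = dict(record)

    # Inconsistency: normalise different spellings of the same country.
    key = str(record["country"]).strip().lower()
    cleaned["country"] = COUNTRY_ALIASES.get(key, record["country"])

    # Incompleteness and implausible values: flag them rather than guess.
    age = record["age"]
    if age is None:
        issues.append("missing age")
    elif not 0 <= age <= 120:
        issues.append("implausible age")
        cleaned["age"] = None

    return cleaned, issues


results = [clean(r) for r in records]
for cleaned, issues in results:
    print(cleaned, issues)
```

Recording the issues alongside the cleaned record keeps the problems known and accounted for, so later analysis can decide whether to exclude or correct the affected rows.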
Value
With so much data around, it is easy to fall into the buzz trap and embark on big data initiatives without a clear understanding of the business value they will bring.
Adapting data to suit your business needs will enable you to unlock the hidden potential within the information you’ve collected, which means you will get the most value out of your data.
2.2 Understanding Data
MACHINE GENERATED DATA
Includes financial systems transactions, cloud applications, call detail records, medical devices, GPS data and sensor data. It is valuable because it contains a definitive, real time record of the behaviour of users and their transactions.
SOCIAL DATA
Information that social media users publicly share, including metadata such as the user’s location, language spoken, biographical data and/or shared links. It is valuable to marketers looking for customer insights that may increase sales.
HUMAN GENERATED DATA
Exists as non-numeric, unstructured data sets from online surveys, social media posts, even phone calls. It is valuable because it describes a person’s interests and the social aspects of human interaction, but it can be very difficult to analyse.
METADATA
Data that provides information about other data. For example, information about the title, subject, author and size of a document constitutes metadata about that document.
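As a small sketch of the idea, the example below writes a document to a temporary folder and then reads back metadata about it from the file system. The file name `report.txt` and its content are invented for illustration; title, subject and author are not stored by the file system, so only the name, size and modification time are shown.

```python
import os
import tempfile
from datetime import datetime, timezone

content = "Quarterly sales rose."  # the data itself (illustrative)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "report.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(content)

    info = os.stat(path)  # file-system metadata, not the file content

    # Data about the data: none of these fields appear in the text itself.
    metadata = {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(
            info.st_mtime, tz=timezone.utc
        ).isoformat(),
    }

print(metadata)
```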
STRUCTURED
High degree of organization, such as a relational database.
Examples: Dates, phone numbers, customer names, transaction information…
UNSTRUCTURED
Information that is difficult to organize using traditional mechanisms.
Examples: Images, social media…
SEMI-STRUCTURED
Information not in a database but that does have some organizational properties that make it easier to analyze.
Examples: Websites, XML, e-mails…
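To make the distinction concrete, the sketch below takes a semi-structured XML snippet and flattens it into structured rows of the kind a relational table would hold. The order data and field names are made up for illustration; the parsing uses Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Semi-structured input: tags give it organization, but records may differ.
xml_text = """
<orders>
  <order id="1001">
    <customer>Ada</customer>
    <total>25.50</total>
  </order>
  <order id="1002">
    <customer>Grace</customer>
    <total>12.00</total>
    <note>Gift wrap</note>
  </order>
</orders>
""".strip()

root = ET.fromstring(xml_text)

# Structured output: every row has the same columns with fixed types.
rows = []
for order in root.findall("order"):
    rows.append({
        "order_id": int(order.get("id")),
        "customer": order.findtext("customer"),
        "total": float(order.findtext("total")),
        "note": order.findtext("note"),  # None when the tag is absent
    })

for row in rows:
    print(row)
```

The optional `note` tag is what makes the input semi-structured: the first order simply omits it, whereas the structured rows must carry an explicit empty value in that column.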