Wednesday, August 12, 2015

Big Data & Analytics!

What is "Big Data"?

Google says big data is "extremely large data sets that may be analysed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions".

Wikipedia says big data is "a term for data sets that are so large or complex that traditional data processing applications are inadequate".

In simple words, it's just lots of data. The data may be structured, semi-structured, or unstructured.

What's the "Analytics" in Big Data Analytics?

SAS says, "Big data analytics examines large amounts of data to uncover hidden patterns, correlations and other insights".

In simple words, it means deriving insights (useful information) from large data sets. Big data analytics helps organizations harness their data and use it to identify new opportunities. That, in turn, leads to smarter business moves, more efficient operations, higher profits and happier customers.

Big Data Word Cloud

How BIG is "Big Data"?

Today, big data may mean a few petabytes (1 PB is about 2^50 bytes), but in a few years it may be measured in zettabytes (about 2^70 bytes) and later in yottabytes (about 2^80 bytes).
So there is no fixed size threshold that defines "Big Data".
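To get a feel for these scales, here is a minimal Python sketch of the arithmetic. It assumes the binary sizes (2^50, 2^70 and 2^80 bytes) for petabyte, zettabyte and yottabyte; the names UNITS and to_bytes are made up for this illustration.

```python
# Approximate unit sizes as powers of two (an assumption for illustration;
# the decimal SI definitions are 10**15, 10**21 and 10**24 bytes).
UNITS = {
    "petabyte": 2 ** 50,
    "zettabyte": 2 ** 70,
    "yottabyte": 2 ** 80,
}


def to_bytes(amount, unit):
    """Convert an amount expressed in `unit` to a number of bytes."""
    return amount * UNITS[unit]


if __name__ == "__main__":
    for name, size in UNITS.items():
        print(f"1 {name} = {size:,} bytes")
    # Number of petabytes in one zettabyte: 2**70 / 2**50 = 2**20
    print(UNITS["zettabyte"] // UNITS["petabyte"])  # 1048576
```

So a jump from petabytes to zettabytes is a factor of about a million, which is why any fixed number for "big" goes out of date quickly.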

What is the Big Data problem?

In 2001, analyst Doug Laney (then at META Group, which Gartner later acquired) described three dimensions of data management challenges: volume, velocity, and variety. This characterization is frequently cited in the scientific literature. Commonly known as the 3 V's, these dimensions can be understood as follows:

  1. Volume refers to the size of the data. The massive volumes of data we now deal with have required scientists to rethink storage and processing paradigms in order to develop the tools needed to analyze them properly.
  2. Velocity refers to the speed at which data can be received and analyzed. Here we consider two types of processing: batch processing, which handles historical data, and real-time processing, which handles streams of data as they arrive (in practice, near real time).
  3. Variety refers to the issue of disparate and incompatible data formats. Data can come from many different sources and take many different forms, and just preparing it for analysis takes a significant amount of time and effort.
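The difference between batch and real-time processing mentioned under Velocity can be sketched in a few lines of Python. This is only a toy illustration, not how any particular framework does it: the function names batch_mean and streaming_mean and the sample readings are made up for this example.

```python
from typing import Iterable, Iterator, List


def batch_mean(readings: List[float]) -> float:
    """Batch style: the whole historical data set is available up front."""
    return sum(readings) / len(readings)


def streaming_mean(readings: Iterable[float]) -> Iterator[float]:
    """Streaming style: update the result as each record arrives."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count  # running mean after `count` records


history = [10.0, 20.0, 30.0, 40.0]  # hypothetical sensor readings
print(batch_mean(history))            # 25.0
print(list(streaming_mean(history)))  # [10.0, 15.0, 20.0, 25.0]
```

The batch version needs all the data before it can answer, while the streaming version always has an up-to-date answer and only keeps a count and a running total in memory.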
Additionally, a fourth V is often cited as part of the Big Data problem: Veracity, which refers to the uncertainty of data. An infographic representing this is shown below:

Why do I hear "Big Data" so often these days?

Big data is not something that started a few days ago. It's just that we have started realizing its importance and how it can be used to generate useful insights.
Big Data & Analytics can help companies in many ways, such as:
  • Reducing expenses
  • Better decision making
  • Launching new features and products

What are the tools or frameworks associated with Big Data?

Various tools and frameworks related to Big Data are:
  • Hadoop
  • Spark
  • Storm
  • NoSQL Databases
  • and many more

A brief history:

<< Posts related to various topics coming up soon >>
