Sunday, March 29, 2015

Data Here .. Data there...everywhere ..

I was thinking of an old song:

Old MacDonald had a farm, E I E I O,
--
With a quack quack here and a quack quack there,
Here a quack, there a quack, ev'rywhere a quack quack.
--
It perfectly suits the data situation we have! Here data, there data, everywhere data data!!

Source : http://shmsoft.blogspot.mx/2014/03/big-data-cartoon-data-is-new-oil.html




By now we have seen that the tons of connected devices around us are generating exponentially growing quantities of data! We now use data, and trust it, more than human expertise for decision making, because it is seen as a more reliable and less biased input. So let's look at the basics of data from the IoT standpoint.

Now some basic questions arise:
a. How exactly is this data stored? (Remember the old days of storing content on CDs, and guess what happens when you lose one?)
b. How is one going to process this enormous amount of data?
c. What's the use of this data anyway?

and so on...

--

Starting from the very basics: what is so BIG about the data?

Four Vs make data BIG: volume, variety, veracity and velocity. The first and the easiest is volume, the quantity of the data. It's HUGE. For the sake of understanding, let's take the example of Facebook.
I remember reading somewhere last year that Facebook users share 2,460,000 pieces of content every minute.
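
To get a feel for that number, here is a small back-of-the-envelope sketch in Python. It only rescales the per-minute figure quoted above; the function name `scale_rate` is made up for this illustration, and the sketch assumes the rate stays constant all day, which real traffic would not.

```python
# Figure quoted in the post: pieces of content shared on Facebook per minute.
PIECES_PER_MINUTE = 2_460_000


def scale_rate(per_minute: int) -> dict:
    """Convert a per-minute rate into per-second, per-hour and per-day rates.

    Assumes a constant rate, which is a simplification.
    """
    return {
        "per_second": per_minute / 60,
        "per_hour": per_minute * 60,
        "per_day": per_minute * 60 * 24,
    }


rates = scale_rate(PIECES_PER_MINUTE)
for unit, value in rates.items():
    # Print each rate with thousands separators and no decimals.
    print(f"{unit}: {value:,.0f}")
```

That works out to about 41,000 pieces of content every second, and roughly 3.5 billion per day.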

The second factor is variety. Gone are the nice Microsoft Access days when you knew what to expect. The mantra now is: expect the unexpected!

Veracity: The third V is not very commonly used but is definitely required. Going by dictionary.com, the meaning of veracity is "conformity to truth or fact; accuracy". This plays a critical role in decision making, as it's important to rely on accurate data.

The last one is velocity: I think this one is self-explanatory, and the same Facebook example is valid here. The data is generated at lightning speed.


--

Now going to the first question: how on earth is this enormous data stored?

OK, so first of all, the data is huge and comes in all shapes and sizes, so the servers always have to be "READY"! Now there are two things that we want from the physical storage.
1. It must be resilient (as in, able to recoil once bent!): The system should be resilient to failures. That means it should be able to adapt and keep working when something breaks, which is possible only when you have enough redundant resources.

2. There should be redundancy: Redundancy eliminates a single point of failure. Ask any girl why she carries two pairs of shoes while travelling; the answer is as simple as that.
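
The two points above can be sketched as a toy example in Python. This is only an illustration under simple assumptions, not how any real storage system is built: `Node` and `ReplicatedStore` are hypothetical names, the "servers" are plain in-memory dictionaries, and every write is simply copied to every node that is up.

```python
class Node:
    """A hypothetical storage node that can be switched off to simulate a failure."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.data = {}


class ReplicatedStore:
    """Toy store that keeps a copy of every item on every live node."""

    def __init__(self, node_count=3):
        self.nodes = [Node(f"node-{i}") for i in range(node_count)]

    def put(self, key, value):
        # Redundancy: write the same value to every node that is up.
        written = 0
        for node in self.nodes:
            if node.alive:
                node.data[key] = value
                written += 1
        if written == 0:
            raise RuntimeError("no live nodes to write to")
        return written

    def get(self, key):
        # Resilience: skip failed nodes and read from the first live copy.
        for node in self.nodes:
            if node.alive and key in node.data:
                return node.data[key]
        raise KeyError(key)


store = ReplicatedStore(node_count=3)
store.put("photo-1", "beach.jpg")

# Simulate one server failing; the data is still readable from another copy.
store.nodes[0].alive = False
print(store.get("photo-1"))
```

With a single node, switching it off would lose the photo, just like the lost CD. With three copies, one failure goes unnoticed by the reader.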
--