Sunday, April 26, 2015

And more data

Now that we have realized that BIG data is really BIG, here is one more blog post about it! :P
Data is the new oil. It's the treasure, and somehow I can relate it to Uncle Scrooge. The way he used to swim in his money, one has to swim through this huge pile to find the data that matters. It's very easy to get lost!



There is so much data, one will feel like Uncle Scrooge!
Just a caution: DO NOT GET LOST!

There is just so much data that, unless we know the right way out, it's very easy to get lost!

There are two unanswered questions from the previous blog. The first is how one is going to process the data, and the second is why it is important anyway.

So first, about managing Big Data: HDFS and MapReduce.

Hadoop, which is a software framework, uses HDFS to store the data and MapReduce to analyze it.
This is done on clusters of commodity hardware.

Let's just quickly understand HDFS (Hadoop Distributed File System):
- A big problem is broken into chunks and then solved :)
- It's easier to solve a big problem when we look at one piece at a time. Similarly, it's quick and cost-effective to get the analysis done on small pieces.
- Is HDFS a hard drive? No, it's the storage service you use when you have a crazy amount of data: it splits files into blocks and spreads them across the machines in the cluster.
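To make the "chunks" idea concrete, here is a tiny Python sketch. It is only a toy and not the real HDFS API: the function names, the node names, the 256-byte block size and the round-robin placement are all made up for illustration. (Real HDFS uses much larger blocks and its own placement rules.)

```python
# Toy illustration only: this is NOT the HDFS API, just the idea behind it.
# All names and numbers below are hypothetical.

def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the data into fixed-size blocks (the last one may be smaller)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign every block to `replication` different nodes, round-robin.

    Assumes replication <= len(nodes), so the copies land on distinct nodes.
    """
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


data = b"x" * 1000                       # pretend this is a huge file
blocks = split_into_blocks(data, block_size=256)
nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_blocks(len(blocks), nodes, replication=3)

for block_id, holders in placement.items():
    print(f"block {block_id} ({len(blocks[block_id])} bytes) -> {holders}")
```

Each block is kept on more than one machine, so losing a single machine does not lose the data, and every machine only has to work on its own small pieces.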

MapReduce: this is where people write programs, so that they can process massive amounts of data in parallel and, most importantly, across multiple processors. (One need not wait for a year to get the output!)
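The classic first MapReduce program is a word count. Below is a minimal sketch of the idea in plain Python. It runs in a single process and does not use Hadoop at all; the function names (`map_phase`, `shuffle`, `reduce_phase`) and the sample lines are my own, chosen just to show the three steps.

```python
from collections import defaultdict

# Plain-Python sketch of the MapReduce idea (no Hadoop involved).
# Function names and sample input are hypothetical, for illustration only.


def map_phase(line: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all the emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse all the values for one key into a single count."""
    return key, sum(values)


lines = ["big data is big", "data is the new oil"]

# Each line is mapped independently of the others, which is the part
# a real cluster would run in parallel on different machines.
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

print(counts)
```

Because the map step looks at one line at a time and the reduce step looks at one word at a time, the work can be handed out to many processors without them having to talk to each other in the middle.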