What is Big Data? It is the first question that arises when someone wants to learn Hadoop and other distributed processing tools.
There is no end to learning about any technology, because technology keeps evolving every day.
Over the last decade, there has been an exponential increase in data in every sector. Estimates put it at around 2.5 exabytes of data generated per day.
Companies like Facebook, Twitter, and Google are already generating petabytes of data every day.
What is Big data?
“Big data is a collection of large and complex data sets that are difficult to process using traditional database management tools or traditional data processing applications.”
As the complexity of the data increases, the efficiency of traditional tools decreases.
Various sources of Big Data
- Stock exchanges – around 1 TB of data per day
- Smartphones
- YouTube – about 48 hours of video uploaded every minute
- Social networks like Twitter and Facebook – more than 10 TB of data daily
- Around 30 million network sensors across the globe, generating data throughout the day
- Instagram, etc.
This rapidly growing data can be categorized into a few types.
Types of Data
- Structured – Data that follows a proper, predefined schema, such as tables in an RDBMS.
- Semi-structured – Data that carries its own structure but lacks a rigid, predefined schema. Examples include XML and JSON.
- Unstructured – Data with no predefined structure at all, such as weblogs, anti-virus logs, etc.
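The difference between semi-structured and unstructured data can be illustrated with a short Python sketch (the sample JSON record and log line below are made up for illustration): a JSON record describes its own fields, while a raw weblog line has no schema until we impose one with a pattern.

```python
import json
import re

# Semi-structured: the JSON record carries its own field names,
# even though no schema was declared in advance.
json_record = '{"user": "alice", "likes": 42, "tags": ["photo", "travel"]}'
parsed = json.loads(json_record)
print(parsed["user"])  # fields are self-describing

# Unstructured: a raw weblog line is just text; we must impose
# a structure ourselves, here with a regular expression.
log_line = '203.0.113.7 - - [10/Oct/2024:13:55:36] "GET /index.html HTTP/1.1" 200'
match = re.match(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\w+) (\S+) [^"]+" (\d+)', log_line)
if match:
    ip, timestamp, method, path, status = match.groups()
    print(ip, method, status)
```

Structured data skips this extra work entirely: the schema (column names and types) is fixed before any row is stored.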
In short, big data is a broad term for data sets so large or complex that simple data-management tools and applications struggle to process them.
Attributes that describe Big Data
- Variety – The different types and sources of data being generated, such as sensor readings, stock exchange updates, social network data, etc.
- Velocity – The pace at which data is generated, stored, and retrieved for processing.
- Volume – The amount of data accumulated and stored over a period of time.
So, handling this much data is a definite challenge, isn’t it? “Big data” is the problem statement, and one of the solutions for handling it is a framework called “Hadoop”.
Hadoop solves this problem through its core components. When you think about handling big data, two main concerns come to mind:
- Storage
- Processing
- HDFS (Hadoop Distributed File System) takes care of storage.
- MapReduce (MR) takes care of processing.
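To give a feel for the processing side, here is the classic word-count example sketched in plain Python. This is only an illustration of the map and reduce steps (with the shuffle phase simulated by a sort), not code that runs on an actual Hadoop cluster; the sample input lines are invented.

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Map step: emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1

def reducer(word, counts):
    """Reduce step: sum all the counts emitted for one word."""
    return word, sum(counts)

# Sample input; on a real cluster these lines would be blocks of a file in HDFS.
lines = ["big data is big", "hadoop handles big data"]

# Map phase: run the mapper over every input line.
mapped = [pair for line in lines for pair in mapper(line)]

# Simulated shuffle/sort phase: group identical keys together.
mapped.sort(key=itemgetter(0))

# Reduce phase: one reducer call per distinct word.
result = dict(
    reducer(word, (count for _, count in pairs))
    for word, pairs in groupby(mapped, key=itemgetter(0))
)
print(result)  # {'big': 3, 'data': 2, 'hadoop': 1, 'handles': 1, 'is': 1}
```

The design point this demonstrates is that the mapper and reducer never see the whole data set at once, which is what lets Hadoop spread the same logic across many machines.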
Also, check “What is Hadoop?”
Reference – BigData wiki