Big Data
Why Does the Big Data Problem Arise?
Everyday use of social media keeps increasing. People share huge numbers of photos, videos, and posts, so the amount of data on social media grows every second. Have you ever thought about how all this data is stored?
Facebook Data Storage:
Facebook revealed some big, big stats on big data to a few reporters at its HQ today, including that its system processes 2.5 billion pieces of content and 500+ terabytes of data each day. It’s pulling in 2.7 billion Like actions and 300 million photos per day, and it scans roughly 105 terabytes of data each half hour.
Facebook generates 4 petabytes of data per day, which is 4 million gigabytes. All that data is stored in what is known as the Hive, which contains about 300 petabytes of data.
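The unit conversions behind these figures are easy to get wrong, so here is a small sanity check. It assumes decimal (SI) units, where 1 PB = 1,000 TB = 1,000,000 GB. Binary units would give slightly different numbers (1 PiB = 1,048,576 GiB). The function name is purely illustrative. The script also scales the "105 terabytes each half hour" figure up to a full day.

```python
# Decimal (SI) storage units: 1 PB = 1,000 TB = 1,000,000 GB (an assumption;
# binary units such as PiB/GiB would give different numbers).
GB_PER_TB = 1_000
TB_PER_PB = 1_000
GB_PER_PB = GB_PER_TB * TB_PER_PB


def petabytes_to_gigabytes(pb: float) -> float:
    """Convert petabytes to gigabytes using decimal units."""
    return pb * GB_PER_PB


daily_pb = 4  # reported daily data generation at Facebook
print(f"{daily_pb} PB/day = {petabytes_to_gigabytes(daily_pb):,.0f} GB/day")

# 105 TB scanned every half hour, scaled to a full day (48 half-hours).
scanned_tb_per_day = 105 * 48
print(f"105 TB per half hour = {scanned_tb_per_day:,} TB/day")
```

Running it prints `4 PB/day = 4,000,000 GB/day` and `105 TB per half hour = 5,040 TB/day`.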
Google searches daily:
Google now processes over 40,000 search queries every second on average. That translates to over 3.5 billion searches per day and 1.2 trillion searches per year worldwide. About 77% of all searches worldwide are done on Google, and more than 5 billion searches are performed worldwide every day.
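The per-day and per-year figures follow from the per-second rate. The short calculation below assumes a flat rate of exactly 40,000 queries per second and a 365-day year. At exactly 40,000 queries per second, the result is about 3.46 billion searches per day. That lines up with the rounded "over 3.5 billion" once you account for the rate being "over" 40,000.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
DAYS_PER_YEAR = 365             # ignores leap years

queries_per_second = 40_000     # assumed flat average rate
queries_per_day = queries_per_second * SECONDS_PER_DAY
queries_per_year = queries_per_day * DAYS_PER_YEAR

print(f"Per day:  {queries_per_day / 1e9:.2f} billion")
print(f"Per year: {queries_per_year / 1e12:.2f} trillion")
```

This prints `Per day:  3.46 billion` and `Per year: 1.26 trillion`.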
What is Big Data?
Big data is data that contains greater variety arriving in increasing volumes and with ever-higher velocity. This is known as the three Vs.
Put simply, big data is larger, more complex data sets, especially from new data sources. These data sets are so voluminous that traditional data processing software just can’t manage them. But these massive volumes of data can be used to address business problems you wouldn’t have been able to tackle before.
The Three Vs of Big Data
Volume:
The amount of data matters. With big data, you’ll have to process high volumes of low-density, unstructured data. This can be data of unknown value, such as Twitter data feeds, clickstreams on a webpage or a mobile app, or sensor-enabled equipment. For some organizations, this might be tens of terabytes of data. For others, it may be hundreds of petabytes.
Velocity:
Velocity is the fast rate at which data is received and (perhaps) acted on. Normally, the highest-velocity data streams directly into memory rather than being written to disk. Some internet-enabled smart products operate in real time or near real time and will require real-time evaluation and action.
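To make "streams directly into memory" concrete, here is a minimal sketch of in-memory stream processing. It counts how many events arrived in the last N seconds and keeps only the recent timestamps in memory, instead of writing every event to disk first. `SlidingWindowCounter` is a hypothetical name, not part of any streaming framework. The sketch also assumes events arrive in timestamp order.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events seen in the last `window_seconds`, holding only recent timestamps in memory."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._timestamps = deque()  # timestamps of events still inside the window

    def record(self, timestamp: float) -> None:
        # Assumes events arrive in non-decreasing timestamp order.
        self._timestamps.append(timestamp)
        self._evict(timestamp)

    def count(self, now: float) -> int:
        """Number of events in the window ending at `now`."""
        self._evict(now)
        return len(self._timestamps)

    def _evict(self, now: float) -> None:
        # Drop events at or before the window's start (now - window_seconds).
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()


counter = SlidingWindowCounter(window_seconds=1.0)
for t in [0.1, 0.4, 0.9, 1.2, 1.3]:
    counter.record(t)
print(counter.count(now=1.3))  # events in (0.3, 1.3]
print(counter.count(now=2.0))  # events in (1.0, 2.0]
```

With the sample events above, this prints `4` and then `2`. Memory use is bounded by the number of events inside one window, not by the total length of the stream.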
Variety:
Variety refers to the many types of data that are available. Traditional data types were structured and fit neatly in a relational database. With the rise of big data, data comes in new unstructured data types. Unstructured and semistructured data types, such as text, audio, and video, require additional preprocessing to derive meaning and support metadata.
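Semi-structured data, such as JSON records where each record can carry different fields, has to be normalized before it fits a relational table. The sketch below shows one simple way to do that. It maps each record onto a fixed set of columns and keeps unknown fields as extra metadata. The column names, the `normalize` function, and the sample records are all hypothetical. Text, audio, and video need far more elaborate preprocessing than this.

```python
import json

# Hypothetical fixed schema for a relational table.
COLUMNS = ("user_id", "event", "text")


def normalize(raw_line: str) -> dict:
    """Parse one JSON record and map it onto the fixed COLUMNS schema."""
    record = json.loads(raw_line)
    # Missing fields become None (NULL in a relational table).
    row = {col: record.get(col) for col in COLUMNS}
    # Keep fields the schema does not know about as extra metadata.
    row["extra"] = {k: v for k, v in record.items() if k not in COLUMNS}
    return row


raw = [
    '{"user_id": 1, "event": "post", "text": "Hello!"}',
    '{"user_id": 2, "event": "photo", "width": 1080, "height": 720}',
    '{"user_id": 3, "event": "like"}',
]
rows = [normalize(line) for line in raw]
for row in rows:
    print(row)
```

Each record ends up with the same columns. The photo record's `width` and `height` go into `extra`, and its missing `text` becomes `None`.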
SOLUTION:
Distributed Storage:
A distributed storage system is infrastructure that can split data across multiple physical servers, and often across more than one data center. It typically takes the form of a cluster of storage units, with a mechanism for data synchronization and coordination between cluster nodes.
Distributed storage is the basis for massively scalable cloud storage systems like Amazon S3 and Microsoft Azure Blob Storage, as well as on-premises distributed storage systems like Cloudian HyperStore.
Distributed storage stores data in parallel by dividing large datasets (many gigabytes or more) into smaller pieces. Because the pieces are written to many machines at the same time, the data can be stored much faster than on a single server. The master node (called the NameNode in Hadoop) decides how the data is divided and which Data Nodes (slave nodes) receive each piece. The pieces are then transferred to those Data Nodes in parallel.
A distributed storage cluster has N slave nodes, all connected to the master node.
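The split-and-distribute idea can be simulated on a single machine. The sketch below cuts data into fixed-size blocks. It then builds the kind of metadata a master node keeps: for each block, the list of data nodes holding a copy. The function names, node names, and the simple round-robin placement are assumptions made for illustration. There is no networking here. Real HDFS uses a rack-aware placement policy, with a default block size of 128 MB and a default replication factor of 3.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Split data into fixed-size blocks; the last block may be smaller."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, data_nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Master-side metadata: map each block ID to the data nodes that store its replicas.

    Uses simple round-robin placement (an assumption, not HDFS's rack-aware policy),
    so every replica of a block lands on a different node.
    """
    if replication > len(data_nodes):
        raise ValueError("replication factor cannot exceed the number of data nodes")
    placement = {}
    for block_id in range(num_blocks):
        start = block_id % len(data_nodes)
        placement[block_id] = [data_nodes[(start + r) % len(data_nodes)] for r in range(replication)]
    return placement


data = b"x" * 1000  # stand-in for a large file
blocks = split_into_blocks(data, block_size=256)
nodes = ["datanode-1", "datanode-2", "datanode-3", "datanode-4"]
placement = place_blocks(len(blocks), nodes, replication=3)
for block_id, replicas in placement.items():
    print(block_id, len(blocks[block_id]), replicas)
```

Here the 1,000-byte input becomes four blocks of 256, 256, 256, and 232 bytes. Each block is stored on three different nodes, so losing any single node loses no data.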
Distributed storage is a core concept in many technologies, such as Hadoop, robust hardware, and grid computing.
Hadoop:
Hadoop provides Facebook with a common, efficient, and reliable infrastructure. From search, log processing, recommendation systems, and data warehousing to video and image analysis, Hadoop supports nearly every part of this social networking platform. Facebook Messenger was Facebook's first user-facing application built on the Hadoop database, Apache HBase.
Facebook's Hadoop stack includes the Hadoop Distributed File System (HDFS) and several related components, such as Apache Hive and Apache HBase.
Thank you guys for reading!