1. Big data is a term that describes the large volume of structured, semi-structured, and unstructured data generated in our day-to-day life.
2. Big data technologies deal not only with storing huge amounts of data but also with how that data is processed and how businesses use it to make decisions.
3. Roughly 70% of the data has been generated in the last 4-5 years alone.
E.g.
   1. The New York Stock Exchange generates about one terabyte of new trade data per day.
   2. Facebook hosts approximately 10 billion photos, taking up one petabyte of storage.
4. Different units of computer memory size:
   1 Bit = one binary digit (0 or 1)
   8 Bits = 1 Byte
   1024 Bytes = 1 KB (Kilobyte)
   1024 KB = 1 MB (Megabyte)
   1024 MB = 1 GB (Gigabyte)
   1024 GB = 1 TB (Terabyte)
   1024 TB = 1 PB (Petabyte)
   1024 PB = 1 EB (Exabyte)
   1024 EB = 1 ZB (Zettabyte); the world's total data is now measured in zettabytes
   1024 ZB = 1 YB (Yottabyte)
   1024 YB = 1 BB (Brontobyte)
   1024 BB = 1 Geopbyte
   Geopbyte is the largest unit in this commonly quoted list, although Brontobyte and Geopbyte are informal names rather than official units.
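A minimal Python sketch of the 1024-step ladder above: it repeatedly divides a byte count by 1024 until the value fits the current unit. The function names `human_readable` and `bits_to_bytes` are illustrative, and the informal Brontobyte/Geopbyte names are included only to mirror the list.

```python
# Unit names in the 1024-step ladder from the notes.
# "BB" (Brontobyte) and "Geopbyte" are informal names, kept only to mirror the list.
UNITS = ["Bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB", "BB", "Geopbyte"]


def human_readable(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    if num_bytes < 0:
        raise ValueError("byte count must be non-negative")
    value = float(num_bytes)
    unit_index = 0
    # Divide by 1024 until the value is below 1024 or we run out of units.
    while value >= 1024 and unit_index < len(UNITS) - 1:
        value /= 1024
        unit_index += 1
    return f"{value:.2f} {UNITS[unit_index]}"


def bits_to_bytes(num_bits: int) -> float:
    """8 bits make 1 byte."""
    return num_bits / 8


if __name__ == "__main__":
    print(human_readable(1024))       # 1.00 KB
    print(human_readable(1024 ** 4))  # 1.00 TB
    print(human_readable(1536))       # 1.50 KB
    print(bits_to_bytes(64))          # 8.0
```

Each step up the ladder is one more division by 1024, so a terabyte is 1024 ** 4 bytes.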
5.
Big data can be defined by using 5 V’s:
- Volume
- Velocity
- Variety
- Veracity
- Value
Volume: the large amount of data (MB -> GB -> TB -> PB -> EB -> ZB …)
E.g. WhatsApp: previously, the chat history coming from users was measured in GB or TB; nowadays the data is coming in EB/ZB.
Velocity: the rate at which data arrives and is processed (Velocity = Volume / Time)
E.g. Facebook: users upload more than 500 million photos/videos a day. Velocity measures how fast the data is coming in and how fast it is being processed. Facebook has to handle a tsunami of photos/videos every day: it has to ingest them all, process them, file them, and somehow, later, be able to retrieve them.
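The formula Velocity = Volume / Time can be worked through for the Facebook figure. This sketch assumes the 500 million uploads are spread evenly over a 24-hour day (real traffic has peaks), and `velocity` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def velocity(volume: float, seconds: float) -> float:
    """Velocity = Volume / Time, here in items per second."""
    if seconds <= 0:
        raise ValueError("time must be positive")
    return volume / seconds


# Figure from the notes: more than 500 million photos/videos uploaded per day.
uploads_per_day = 500_000_000
per_second = velocity(uploads_per_day, SECONDS_PER_DAY)
print(f"~{per_second:,.0f} uploads per second")  # ~5,787 uploads per second
```

Even under this even-spread assumption, the system must ingest thousands of items every second, around the clock.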
Variety: structured, semi-structured, and unstructured data
Structured: data that can be stored in a SQL database, in tables with rows and columns
Semi-structured: data that doesn’t reside in a relational database but does have some organizational properties that make it easier to analyze
E.g. XML, JSON
Unstructured: around 80% of today’s data is unstructured. It often includes text and multimedia content. Examples include e-mail messages, word-processing documents, videos, photos, audio files, and presentations.
E.g. Twitter tweets, Facebook scraps, videos, images, radar data, Plan
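To make the three kinds of variety concrete, here is a small Python sketch using only the standard `csv` and `json` modules. The sample records are made up for illustration, and CSV stands in for a row of a SQL table.

```python
import csv
import io
import json

# Structured: fixed columns, like rows in a SQL table (shown here as CSV).
structured_text = "id,name,amount\n1,Asha,250\n2,Ravi,120\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))

# Semi-structured: self-describing keys, nesting, optional fields (JSON).
semi_structured_text = '{"user": "asha", "tags": ["bigdata", "hadoop"], "location": {"city": "Pune"}}'
record = json.loads(semi_structured_text)

# Unstructured: free text with no schema; any structure must be inferred.
unstructured_text = "Loving the #bigdata session today!! gr8 stuff"
words = unstructured_text.split()

print(rows[0]["name"], rows[1]["amount"])          # Asha 120
print(record["location"]["city"], record["tags"])  # Pune ['bigdata', 'hadoop']
print(len(words))                                  # 7
```

The structured rows can be queried by column name immediately, the JSON record can be navigated by its keys, while the free text only yields a list of tokens until further analysis is applied.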
Veracity: the messiness of the data. With many forms of big data, quality and accuracy are less controllable (e.g. Twitter posts with hashtags, abbreviations, and typos), but big data and analytics technologies now allow us to work with these types of data.
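As a rough illustration of working with messy data, the sketch below normalizes a social-media post: it extracts hashtags, lowercases the text, collapses repeated punctuation, and expands a few abbreviations. The `ABBREVIATIONS` dictionary and `clean_post` function are hypothetical; real pipelines need far larger dictionaries and proper spelling correction.

```python
import re

# Hypothetical abbreviation dictionary; a real system would need a much larger one.
ABBREVIATIONS = {"gr8": "great", "u": "you", "pls": "please"}

HASHTAG_RE = re.compile(r"#(\w+)")


def clean_post(text: str) -> dict:
    """Normalize a messy post and pull out its hashtags."""
    hashtags = [tag.lower() for tag in HASHTAG_RE.findall(text)]
    # Remove hashtags from the body, lowercase it, and collapse runs like "!!!" to "!".
    body = HASHTAG_RE.sub("", text).lower()
    body = re.sub(r"([!?.])\1+", r"\1", body)
    # Expand known abbreviations word by word (split() also drops extra spaces).
    words = [ABBREVIATIONS.get(word, word) for word in body.split()]
    return {"text": " ".join(words), "hashtags": hashtags}


print(clean_post("Gr8 session on #BigData today!!! pls share slides #Hadoop"))
```

The hashtags are kept as separate, normalized fields because they are often the most useful signal in a post.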
Value: turn raw data into value
Why Big Data
Big data is not about how much data you have, but about what you do with it. We can take any kind of data from any source and analyze it to find answers at lower cost and in less time. With an existing data warehouse system, we take only a portion of the data (such as the most recent 2-3 years of sales data) and analyze its trends to make business decisions. With big data technologies, we can analyze the whole dataset and make better business decisions.
When we combine big data with analytics technologies (machine learning, R), we can accomplish business-related tasks such as:
- Fraud detection in banking systems
- Finding the customers most likely to churn in telecommunications
- etc.
Who uses Big Data
Almost all industries use big data, including:
- Banking
- Education
- Government
- Health Care
- Manufacturing
- Retail
How Big Data Works
With big data technologies (such as Hadoop), we can store, process, and analyze large amounts of data with:
- Lower hardware cost
- Fast processing
Storage + Processing + Analysis = Big Data
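The storage, processing, and analysis steps can be sketched on a single machine. The toy Python word count below splits a dataset into blocks (storage), processes each block independently (processing), and merges the partial results (analysis). It illustrates the general split-process-combine pattern that MapReduce-style systems use; it is not Hadoop's API, and the function names and `chunk_size` parameter are illustrative.

```python
from collections import Counter
from functools import reduce

# Toy, single-machine illustration of split -> process -> combine.
# In a real cluster each block would be stored on, and processed by,
# a different machine; here the "machines" are just loop iterations.


def split_into_chunks(lines, chunk_size):
    """Storage step: break the dataset into fixed-size blocks."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def process_chunk(chunk):
    """Processing step: count words in one block, independently of the others."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def combine(partial_counts):
    """Analysis step: merge the per-block results into one answer."""
    return reduce(lambda a, b: a + b, partial_counts, Counter())


dataset = [
    "big data needs storage",
    "big data needs processing",
    "analysis turns data into value",
]
chunks = split_into_chunks(dataset, chunk_size=2)
totals = combine(process_chunk(c) for c in chunks)
print(totals["data"])  # 3
```

Because `process_chunk` never looks at other blocks, adding more (cheap) machines lets more blocks be processed at the same time, which is where the "less hardware cost, fast processing" benefit comes from.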
Hadoop is specially designed to handle big data. We will discuss Hadoop in the next session.

Session 2