Big Data Challenges


Big Data Characteristics

Big data characteristics are often described in terms of the five Vs: Volume, Velocity, Variety, Veracity, and Value. They are as follows.

Volume – How big the data is
o   The volume of big data is growing at an exponential rate and is expected to reach around 44 ZB (1 ZB = 10^21 bytes) by 2020.
Velocity – How fast the data is processed
o   The speed at which new data is generated and the speed at which data moves around.
o   The latency of processing big data and making decisions is very important, and this is where big data systems differ most from a conventional RDBMS.
Variety – The various types of data
o   A conventional RDBMS supports only structured data, but big data covers three kinds of data.
§  Structured – Highly structured and usually stored in an RDBMS. Approximately 20% of the world’s data is structured. Examples: numbers, dates, and groups or tables of words and numbers (for example, a customer table with columns such as name, age, and address).
§  Semi-structured – Does not necessarily conform to a fixed schema (structure) but may be self-describing and may consist of simple key/value pairs. It cannot be stored directly in the rows and tables of a typical database. Examples: JSON, XML, logs, tweets.
§  Unstructured – Lacks structure entirely or in part. Approximately 80% of the world’s data is unstructured. Example formats: free-form text, emails, images, videos, voice recordings, social media conversations, sensor data, etc.
Veracity – How accurate, meaningful, and trustworthy the results are for the given problem space.
Value – The useful business value extracted from big data.
Big data analytics companies take all five Vs into consideration before they decide to build programs for data analysis.
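
To make the three kinds of data under Variety concrete, here is a minimal Python sketch. The customer rows, JSON records, and review text are made-up sample values, not data from any real system; the point is only how much structure a program can rely on in each case.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields
# (hypothetical customer table).
structured_csv = "name,age,city\nAsha,31,Pune\nRavi,45,Delhi\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: each record describes itself with keys,
# but the records do not have to share one schema.
semi_structured = [
    json.loads('{"user": "asha", "tags": ["hadoop", "etl"]}'),
    json.loads('{"user": "ravi", "location": {"city": "Delhi"}}'),
]

# Unstructured: free-form text with no fields at all.
unstructured = "Loved the new phone, but the battery drains too fast."

# Structured data can be queried by column name straight away.
ages = [int(row["age"]) for row in rows]

# Semi-structured data has to be inspected to learn which keys exist.
all_keys = sorted({key for record in semi_structured for key in record})

# Unstructured data needs parsing before anything can be measured.
word_count = len(unstructured.split())

print(ages, all_keys, word_count)
```

The structured rows answer a column query directly, the JSON records have to be scanned to discover their keys, and the free text yields nothing until it is tokenized.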

Big Data Management
Many large enterprises (Google, LinkedIn, Facebook, IBM, Oracle) have already entered the big data management life cycle, which spans the phases from data collection to decision making listed below.
§  Data Collection
§  Data Storage & Organization
§  Summarizing
§  Analysis
§  Synthesizing
§  Decision Making

In the high-level architecture of a typical big data analysis model, data is collected from various sources such as web servers and social media, stored in a Hadoop cluster, passed through an analytics platform and a big data warehouse, and made available to business intelligence users.

Big data analytics companies such as IBM, Facebook, LinkedIn, Google, and Twitter are already moving toward technology that analyzes data while it is being generated (sometimes referred to as real-time in-memory analytics), without ever putting it into a database. An enterprise that does not recognize the importance of big data analytics risks falling behind future market trends.
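
The idea of analyzing data while it is being generated can be sketched in a few lines of Python. This is only an illustration under simple assumptions: `sensor_feed` is a hypothetical stand-in for a real event source, and the only statistic computed is a running mean. It is not how any particular vendor's platform works.

```python
from typing import Iterable, Iterator


def running_mean(events: Iterable[float]) -> Iterator[float]:
    """Yield the mean of all values seen so far.

    Only a count and a sum are kept in memory; the events
    themselves are never stored.
    """
    count = 0
    total = 0.0
    for value in events:
        count += 1
        total += value
        yield total / count


def sensor_feed() -> Iterator[float]:
    """Hypothetical event source; a real feed might be a socket or a message queue."""
    yield from [10.0, 20.0, 30.0, 40.0]


means = list(running_mean(sensor_feed()))
print(means)
```

Each result is available as soon as its event arrives, and memory use stays constant no matter how long the stream runs.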

Big Data Challenges

Below are the current challenges of big data management and decision making faced by big data analytics companies.
§  High volume of data and scalability.
§  High velocity of data generation.
§  Complex and varied data types, especially semi-structured and unstructured data.
§  Disk storage and transmission capacities. As of 2013, a single disk can store up to 4 TB of data, and its maximum data transfer speed is only about 128 MB/s. With these storage and transfer limits, reading an entire disk takes roughly 9 hours (4,000,000 MB / 128 MB/s is about 31,250 seconds).
§  Data management issues of access, utilization, updating, governance, and reference.
§  Privacy and security are another major challenge in big data. For example, information about people is collected and used to add value to an organization's business, by deriving insights into their lives of which they are unaware.
§  Data sharing between big data companies about their clients and operations threatens the culture of secrecy and competitiveness.
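
The disk figures in the list above (4 TB capacity, 128 MB/s transfer speed) can be checked with a short calculation. The sketch below assumes decimal units (1 TB = 1,000,000 MB), sustained sequential throughput, and, for the multi-disk case, data spread evenly over disks that are read perfectly in parallel. The function name and the 100-disk figure are illustrative choices, not values from the text.

```python
def read_time_hours(capacity_tb: float, throughput_mb_s: float, disks: int = 1) -> float:
    """Hours needed to read capacity_tb of data spread evenly over `disks` disks.

    Assumes decimal units and that all disks are read in parallel
    at the same sustained throughput.
    """
    capacity_mb = capacity_tb * 1_000_000  # 1 TB = 10^6 MB
    seconds = capacity_mb / (throughput_mb_s * disks)
    return seconds / 3600


single = read_time_hours(4, 128)             # one 4 TB disk
spread = read_time_hours(4, 128, disks=100)  # same data over 100 disks

print(f"1 disk: {single:.1f} h; 100 disks: {spread * 60:.1f} min")
```

Under these idealized assumptions the read time falls in proportion to the number of disks, which is the motivation for distributing storage.
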
Big Data Analytic Challenges
§  Ability to determine what data to collect and how to analyse it to find patterns and correlations, given how large the data is.
§  Ability to understand big data business intelligence objectives and information needs and to come up with appropriate algorithms.
§  Strong knowledge of mathematics and statistics is needed to build relationships between data.
§  Ability to present data (both verbally and in writing) so that the insights are understood and acted upon.

Big Data Solutions

Below are the solutions to the big data challenges discussed above.
§  Distribute storage across multiple disks.
§  Implement parallel processing.
§  Bring the code to the data for processing instead of bringing the data to the code.

One technology that meets all the above expectations is Hadoop, an open source framework for storing distributed data and processing it in parallel across multiple nodes.
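
The three ideas above (distributed storage, parallel processing, and moving code to the data) can be illustrated with a small word-count sketch in plain Python. This is not Hadoop's API: the `blocks` list is a hypothetical stand-in for data blocks held on separate nodes, and threads stand in for those nodes, purely to show the shape of the approach.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical "nodes": each entry is a block of the data set
# that would live on a different machine.
blocks = [
    "big data needs distributed storage",
    "parallel processing of big data",
    "move code to data not data to code",
]


def count_words(block: str) -> Counter:
    """Map step: runs against one block and returns only a small summary."""
    return Counter(block.split())


def merge(partials) -> Counter:
    """Reduce step: combine the per-block summaries into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Each block is processed independently and in parallel; only the
# small per-block counts are brought together at the end.
with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
    totals = merge(pool.map(count_words, blocks))

print(totals.most_common(3))
```

The function travels to each block and only compact counts travel back, so the bulk of the data never has to cross the network.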
