7/26/17

Big Data – Hadoop is the buzzword today

Business organizations in almost every sector, and technical professionals in every type of industry, are becoming interested in Big Data. Many professionals are therefore looking for answers to some key questions: What exactly is Big Data? Why is it required? Which technologies does it involve? How is this field different from traditional data technologies?

Today the market is wide open for Big Data, and employers are looking for professionals with Big Data skills. Many students and aspiring professionals ask: What are the job opportunities in Big Data? What is the market trend? What are the prerequisites for learning Big Data technologies? Which key skills are required? What is the future of Big Data technologies?
Let's work through the answers to these questions.
Big Data means collections of data sets so large that they are measured in terabytes (TB), petabytes (PB), or even zettabytes (ZB). According to published figures, Facebook stores, accesses, and analyzes more than 30 petabytes of user-generated data, and about 100 terabytes of data are uploaded to it daily. Wal-Mart handles more than 1 million customer transactions every hour. In 2008, Google was processing 20,000 terabytes (20 petabytes) of data a day. More than 5 billion people are calling, texting, tweeting, and browsing on mobile phones worldwide. YouTube users upload 48 hours of new video every minute of the day. According to Twitter's own research in early 2012, it sees roughly 175 million tweets every day and has more than 465 million accounts.
These statistics show that data is generated in huge amounts (volume) and at a very high rate (velocity). It also comes in many forms (variety): structured data (relational data), semi-structured data (XML), and, for the most part, real-time and unstructured data such as documents, text, PDFs, media logs, and web logs. Traditional data management systems such as an RDBMS were not designed to store, manage, and process such a variety of data at this speed.
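
To get a feel for the velocity behind these numbers, the quoted figures can be converted into per-second and per-day rates. The sketch below uses only the statistics cited in this post and assumes decimal units (1 PB = 1000 TB); the variable names are illustrative.

```python
# Back-of-the-envelope rates derived from the figures quoted in this post.
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR
MINUTES_PER_DAY = 24 * 60

# Wal-Mart: more than 1 million customer transactions every hour.
walmart_tx_per_second = 1_000_000 / SECONDS_PER_HOUR

# Twitter: roughly 175 million tweets every day.
tweets_per_second = 175_000_000 / SECONDS_PER_DAY

# YouTube: 48 hours of new video uploaded every minute.
youtube_hours_per_day = 48 * MINUTES_PER_DAY

# Google (2008): 20,000 TB a day, assuming 1 PB = 1000 TB.
google_pb_per_day = 20_000 / 1000

print(f"Wal-Mart: about {walmart_tx_per_second:.0f} transactions per second")
print(f"Twitter:  about {tweets_per_second:.0f} tweets per second")
print(f"YouTube:  {youtube_hours_per_day} hours of video per day")
print(f"Google:   {google_pb_per_day:.0f} PB per day")
```

Even the smallest of these rates is sustained around the clock, which is what makes volume and velocity a combined problem.
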
This raises the question: which technologies are available in the market today to manage such large data sets? The key answer is the Big Data technology Hadoop.
Hadoop is an open source Big Data project of the Apache Software Foundation. It stores and processes big data in a distributed environment, across clusters of computers, using a simple programming model called MapReduce. It can scale from a single server to thousands of machines, each offering local computation and storage. Hadoop applications run as MapReduce jobs, in which the data is processed in parallel on different nodes.
The Hadoop framework is written in Java. Hadoop offers a solution not only for processing data but also for capturing, transferring, storing, and distributing it across the thousands of commodity servers that make up a Hadoop cluster.
HDFS (Hadoop Distributed File System), a core part of Hadoop, stores large data sets of any size (PB, ZB) as data blocks spread across the cluster. It acts as a data service that keeps the data available: even if a machine in the cluster fails, you can remove it and add a new machine while the cluster keeps running. Replication is the key mechanism of HDFS: each block is copied to several machines, so losing one machine does not lose the data.
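
The idea of blocks plus replication can be illustrated with a small simulation. This is only a sketch, not HDFS code: the 128 MB block size and replication factor of 3 are assumed here as commonly used defaults, the node names are made up, and the round-robin placement is a toy policy rather than the rack-aware policy a real cluster uses.

```python
BLOCK_SIZE_MB = 128   # assumption: a commonly used HDFS block size
REPLICATION = 3       # assumption: a commonly used replication factor


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes of the blocks a file of the given size is cut into."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Toy round-robin placement: each block lands on `replication` distinct nodes."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def surviving_copies(placement, failed_node):
    """Return, per block, the nodes that still hold a copy after one node fails."""
    return {
        block: [node for node in holders if node != failed_node]
        for block, holders in placement.items()
    }


blocks = split_into_blocks(300)                    # a hypothetical 300 MB file
nodes = ["node1", "node2", "node3", "node4"]       # hypothetical machines
placement = place_replicas(len(blocks), nodes)
after_failure = surviving_copies(placement, "node2")

print("block sizes (MB):", blocks)
print("placement:", placement)
print("copies left after node2 fails:", after_failure)
```

In this run every block still has at least two copies after one machine is lost, which is why a failed machine can be swapped out without the data becoming unavailable.
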
The MapReduce framework, the other core part of Apache Hadoop, processes the data locally on each machine in the cluster. A MapReduce job has two kinds of tasks: map tasks and reduce tasks. Map tasks run in parallel on all the machines where the data is stored, and reduce tasks aggregate the output of all the map tasks. Because they consume map output, reduce tasks always run after the map tasks.
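
The map-then-reduce flow can be sketched in plain Python with the classic word count example. This is a single-process illustration of the programming model, not the Hadoop API: the function names and the sample chunks are made up, and each string stands in for the data stored on one machine.

```python
from collections import defaultdict


def map_task(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_outputs):
    """Group all values emitted by the map tasks by their key."""
    grouped = defaultdict(list)
    for output in mapped_outputs:
        for key, value in output:
            grouped[key].append(value)
    return grouped


def reduce_task(key, values):
    """Reduce: aggregate all values collected for one key."""
    return key, sum(values)


# Each string stands in for the data block held by one machine.
chunks = [
    "big data hadoop",
    "hadoop stores big data",
    "hadoop processes data",
]

mapped = [map_task(chunk) for chunk in chunks]   # would run in parallel on a cluster
grouped = shuffle(mapped)                        # map output is grouped by key
counts = dict(reduce_task(key, values) for key, values in grouped.items())

print(counts)
# {'big': 2, 'data': 3, 'hadoop': 3, 'stores': 1, 'processes': 1}
```

The reduce step cannot start producing totals until the map output for a key has been collected, which mirrors the ordering described above.
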
The top five industries looking for Big Data-related skills are Professional, Scientific and Technical Services; IT; Manufacturing; Finance and Insurance; and Retail. Many major IT companies such as Google, Apple, Facebook, Amazon, Oracle, IBM, Adobe, Cisco, and Accenture are using Hadoop and looking for Big Data – Hadoop professionals.
Job titles available today include:
  1. Hadoop Developer
  2. Data Scientist
  3. Big Data Engineer
  4. Data Visualization Developer
  5. Business Intelligence (BI) Engineer
  6. BI Solutions Architect
  7. Analytics Manager

According to market research by IDC, the Big Data market was expected to grow from $3.2 billion in 2010 to $16.9 billion in 2015. According to the IDC forecast, the Big Data market is predicted to be worth $46.34 billion by 2018, with sturdy growth across Big Data related infrastructure, software, and services over the next five years. According to IDG Enterprise Big Data Research, in the next 1 to 1.5 years organizations plan to invest in the skill sets necessary for Big Data deployments, including Data Scientists, Data Architects, Data Analysts, Data Visualizers, Research Analysts, and Business Analysts.
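
To put the first IDC figure in perspective, the implied yearly growth can be worked out as a compound annual growth rate. The sketch below uses only the two values quoted above; the helper name `cagr` is illustrative.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


# $3.2 billion in 2010 growing to $16.9 billion in 2015.
growth = cagr(3.2, 16.9, 2015 - 2010)
print(f"Implied annual growth 2010-2015: {growth:.1%}")
```

This works out to a little under 40 percent growth per year over the five-year period.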

10 comments:

  1. This post by Grace is very well written and also well formatted.

  2. A very well-developed post with step by step guidance.

  3. This is a very well written post, my compliments.

  4. Thanks a bunch for sharing.

  5. Really nice post. Thanks for sharing with us..

  6. I highly appreciate your hard work for creating this post that is very useful.

  7. I like this article because this is very helpful for me.

  8. Its indeed a wonderful blog

  9. Thank you! Awesome explanations

  10. A good Explanation on big data concepts. good work
