Hadoop PIG Notes


Basic

  • Pig is a scripting platform for processing and analysing large data sets.
  • very usefulfor people who did not have java knowledge
  • used for high level data flow and processing the data available on HDFS.
  • PIG is named pig because like the animal, it can consume and process any type of data, and has lots of usage in data cleansing.
  • Internally, whatever you write in Pig, it internally converts to Map reduce(MR) jobs.
  • Pig is client side installation, it need not sit on hadoop cluster.
  • Pig script will execute a set of commands, which will be converted to Map Reduce(MR) jobs and submitted to hadoop running locally or remotely.
  • A hadoop cluster will not care whether the job was submitted from pig or from some other environment.
  • map reduce programs get executed only when the DUMP or STORE command is called(more on this later).

Hadoop Notes


Technologies in Hadoop Ecosystem

  • Hadoop => framework for distributed processing of large datasets across clusters of computers, it can scale up to thousands of machines, where each machine offers computation and data storage.
  • HBase => a scalable, distributed database that supports structured data storage for large tables.
  • Hive => a data warehouse infrastructure that provides data summarization and ad hoc querying.
  • Mahout => A scalable machine learning and data mining library.
  • Pig => data flow language and execution framework for parallel computation.
  • Spark => A fast engine for computing hadoop data.
  • Storm => a scalable and reliable engine for processing data as it flows into the system
  • Zookeeper => high performance coordination service for distributed applications.

Top Java Frequently Asked Questions- 2


51. Are true and false keywords? 
The values true and false are not keywords. 

52. What is a void return type? 
A void return type indicates that a method does not return a value after its execution. 

53. What is the difference between the File and RandomAccessFile classes? 
The File class encapsulates the files and directories of the local file system. The RandomAccessFile class provides the methods needed to directly access data contained in any part of a file. 

Top Java Frequently Asked Questions- 1


1. Can you write a Java class that could be used both as an applet as well as an application? 
Yes. Just, add a main() method to the applet. 

2. Explain the usage of Java packages. 
This is a way to organize files when a project consists of multiple modules. It also helps resolve naming conflicts when different packages have classes with the same names. Packages access level also allows you to protect data from being used by the non-authorized classes. 

Followers