What is Pig?
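- Originally developed at Yahoo.
- Pig is Hadoop ecosystem software from the Apache Software Foundation, used for analysing data.
- Pig scripts are written in the Pig Latin language.
- Pig Latin is a data flow language.
- It can handle structured, semi-structured and unstructured data.
- It replaces hand-written MapReduce code for many tasks (though not 100% of them).
- Internally, Pig runs its scripts as MapReduce jobs.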
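As a rough illustration of what "data flow language" means, the sketch below chains a few Pig Latin statements, each one consuming the relation produced by the step before it. It assumes the customers.txt file and schema used later in these notes; the salary threshold and the alias names high_paid, by_address and avg_salary are made up for the example.

```pig
-- Step 1: load the comma-separated file into a relation with a schema.
customers = LOAD '/home/cloudera/training/pigdata/customers.txt'
            USING PigStorage(',')
            AS (id:int, name:chararray, age:int, address:chararray, salary:int);

-- Step 2: keep only some rows (2000 is an arbitrary example threshold).
high_paid = FILTER customers BY salary > 2000;

-- Step 3: group the remaining rows by address.
by_address = GROUP high_paid BY address;

-- Step 4: produce one output row per group.
avg_salary = FOREACH by_address
             GENERATE group AS address, AVG(high_paid.salary) AS avg_sal;

-- Pig only builds a plan up to this point; the job actually runs
-- when an output statement such as DUMP or STORE is reached.
DUMP avg_salary;
```

Each alias names an intermediate result, so the script reads as a pipeline rather than as a single nested query.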
Components/architecture of Pig:
Pig Data Model
Atom
Tuple
Map
Bag
Relation
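The sketch below shows where each of these types might turn up in an ordinary script. It assumes the customers.txt schema used elsewhere in these notes; the aliases with_info and by_address are hypothetical names chosen for the example.

```pig
-- Relation: an outer bag of tuples. 'customers' is a relation.
customers = LOAD '/home/cloudera/training/pigdata/customers.txt'
            USING PigStorage(',')
            AS (id:int, name:chararray, age:int, address:chararray, salary:int);

-- Tuple: one row of 'customers', an ordered set of fields.
-- Atom: a single scalar field value such as id or name.

-- Map: a set of key#value pairs. TOMAP builds one from
-- alternating key and value expressions.
with_info = FOREACH customers
            GENERATE id, TOMAP('name', name, 'address', address) AS info;

-- Bag: a collection of tuples. GROUP yields one tuple per address whose
-- second field is a bag of all customer tuples sharing that address.
by_address = GROUP customers BY address;

-- DESCRIBE prints each schema, which makes the nesting visible.
DESCRIBE with_info;
DESCRIBE by_address;
```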
Pig Execution Modes (how to connect to Pig)
1) Local mode ---> Data is read from the local file system and commands run locally.
[cloudera@quickstart ~]$ pig -x local
2) MapReduce mode (HDFS mode) ---> Data must already be in HDFS and commands run as MapReduce jobs on Hadoop.
[cloudera@quickstart ~]$ pig -x mapreduce
(or, because mapreduce is the default mode)
[cloudera@quickstart ~]$ pig
Execution Mechanisms (ways to run Pig scripts)
1) Interactive mode (in the Grunt shell)
2) Batch mode (at the Unix/Linux prompt)
Interactive mode (in the Grunt shell)
grunt> customers = LOAD '/home/cloudera/training/pigdata/customers.txt' USING PigStorage(',');
grunt> dump customers;
Batch mode (at the Unix/Linux prompt)
1) Local mode
[cloudera@quickstart ~]$ cat pig_local.pig
customers= LOAD '/home/cloudera/training/pigdata/customers.txt' USING PigStorage(',') as (id:int,name:chararray,age:int,address:chararray,salary:int);
dump customers;
[cloudera@quickstart ~]$ pig -x local pig_local.pig
2) MapReduce mode (HDFS mode)
[cloudera@quickstart ~]$ cat pig_global.pig
customers= LOAD '/user/cloudera/customers.txt' USING PigStorage(',') as (id:int,name:chararray,age:int,address:chararray,salary:int);
dump customers;
[cloudera@quickstart ~]$ pig -x mapreduce pig_global.pig
[cloudera@quickstart ~]$ hdfs dfsadmin -safemode leave ---> Optional; needed only if the job fails because HDFS is in safe mode
Note: Pig script files conventionally use the .pig extension.
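MapReduce mode reads its input from HDFS, so customers.txt has to be there before pig_global.pig runs. A minimal sketch of one way to copy it, assuming the local file from the local-mode example is the one being uploaded and that the commands are typed in the Grunt shell:

```pig
-- 'fs' hands the rest of the line to the Hadoop file system shell.
-- Create the target directory if it does not exist yet.
fs -mkdir -p /user/cloudera

-- Copy the local file into HDFS (this fails if the target already exists).
fs -put /home/cloudera/training/pigdata/customers.txt /user/cloudera/customers.txt

-- Confirm the file is now visible in HDFS.
fs -ls /user/cloudera
```

The same copy can also be done from the Unix/Linux prompt with hdfs dfs -put before starting Pig.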