Apache Sqoop Jobs

A Sqoop job is a saved Sqoop command definition. Once created, the job can be executed at any time.

--create    create a new job
--list      list saved jobs
--show      show detailed information about a job
--exec      execute a job
--delete    delete a job
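The general form of the tool, using placeholder names (`<job-name>` and `<tool>` are stand-ins, not real values):

```
# Save a job: everything after the bare "--" is the Sqoop command to store.
sqoop job --create <job-name> -- <tool> [tool-arguments]

# Run the saved job later:
sqoop job --exec <job-name>
```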

Create a new job to import data from a MySQL table to HDFS.

[cloudera@quickstart ~]$ sqoop job --create myjob -- import --connect jdbc:mysql:// --username cloudera --password cloudera --table customers --m 1;
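Note that by default a saved job does not store the password, so `--exec` prompts for it on every run. If your environment permits storing passwords in the Sqoop metastore, the following sqoop-site.xml property enables that (verify the behavior against your Sqoop version):

```xml
<property>
  <name>sqoop.metastore.client.record.password</name>
  <value>true</value>
</property>
```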

Listing available jobs

[cloudera@quickstart ~]$ sqoop job --list

Display detailed information about the job

[cloudera@quickstart ~]$ sqoop job --show myjob

Delete existing job

[cloudera@quickstart ~]$ sqoop job --delete myjob

[cloudera@quickstart ~]$ hdfs dfsadmin -safemode leave   ---> turns HDFS safe mode off (run this if jobs fail because the NameNode is in safe mode)

Execute sqoop job

[cloudera@quickstart ~]$ sqoop job --exec myjob

codegen command : generates code files (a Java class and its compiled jar) for a table

[cloudera@quickstart ~]$ sqoop codegen --connect jdbc:mysql:// --username cloudera --password cloudera --table customers

Verify files
[cloudera@quickstart ~]$ ls /tmp/sqoop-cloudera/compile/752b65100f5001f20bd57eb85a460b51/
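The compile directory name is a generated hash, so it will differ on every run. The files inside are typically named after the table (exact contents may vary by Sqoop version):

```
# Typical contents of the compile directory for the customers table:
#   customers.java    (generated source)
#   customers.class   (compiled class)
#   customers.jar     (jar used by the MapReduce job)
```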

Apache Sqoop Commands

Sqoop commands
  1. import-all-tables
  2. list-databases
  3. list-tables
  4. create-hive-table
  5. hive-import
  6. eval
  7. export

import-all-tables : import data from all MySQL tables into HDFS

[cloudera@quickstart ~]$ hdfs dfsadmin -safemode leave   // run this first if the import fails because HDFS is in safe mode

[cloudera@quickstart ~]$ sqoop import-all-tables --connect jdbc:mysql:// --username cloudera --password cloudera

list-databases: list available databases in MySQL

[cloudera@quickstart ~]$ sqoop list-databases --connect jdbc:mysql:// --username cloudera --password cloudera

list-tables: list available tables in the database

[cloudera@quickstart ~]$ sqoop list-tables --connect jdbc:mysql:// --username cloudera --password cloudera

create-hive-table : import a table definition into Hive

step 1) import MySQL table data into HDFS.
[cloudera@quickstart ~]$ sqoop import --connect jdbc:mysql:// --username cloudera --password cloudera --table customers --m 1;

step 2) import the MySQL table definition into Hive (create-hive-table).

[cloudera@quickstart ~]$ sqoop create-hive-table --connect jdbc:mysql:// --username cloudera --password cloudera --table customers --fields-terminated-by ',' --lines-terminated-by '\n';

step 3) load the data from HDFS into the Hive table.

hive> load data inpath '/user/cloudera/customers' into table customers;

'hive-import' option for the import command (combines the above steps into one)

[cloudera@quickstart ~]$ sqoop import --connect jdbc:mysql:// --username cloudera --password cloudera --table customers --m 1 --hive-import;

eval: evaluate a SQL statement and display the results

[cloudera@quickstart ~]$ sqoop eval --connect jdbc:mysql:// --username cloudera --password cloudera --query "select * from customers limit 10";

export: export the data from HDFS to MySQL

  • insert mode
  • update mode

mysql> create database hr;   // creating a new database in MySQL
mysql> use hr;
mysql> create table employees(name varchar(30),email varchar(40));  // creating the table
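Before exporting, the data must already exist in HDFS. A hypothetical sample file matching the employees(name, email) schema could be prepared like this (Sqoop export expects one record per line, with fields comma-separated by default):

```shell
# Create sample data matching the employees(name, email) columns.
cat > Employees.csv <<'EOF'
alice,alice@example.com
bob,bob@example.com
EOF

# Copy it into HDFS at the path used by the export commands below:
# hdfs dfs -put Employees.csv /user/hive/warehouse/Employees.csv
```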

insert mode

[cloudera@quickstart hivedata]$ sqoop export --connect jdbc:mysql:// --username cloudera --password cloudera --table employees --export-dir /user/hive/warehouse/Employees.csv;

update mode

[cloudera@quickstart hivedata]$ sqoop export --connect jdbc:mysql:// --username cloudera --password cloudera --table employees --export-dir /user/hive/warehouse/Employees.csv --update-key name;
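Plain update mode only updates rows whose `--update-key` value already exists in MySQL; records with no matching key are skipped. Sqoop also supports an upsert-style mode via `--update-mode allowinsert` (connector support varies, so verify it works with your MySQL connector and Sqoop version):

```
[cloudera@quickstart hivedata]$ sqoop export --connect jdbc:mysql:// --username cloudera --password cloudera --table employees --export-dir /user/hive/warehouse/Employees.csv --update-key name --update-mode allowinsert;
```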