Sqoop is open source software from Apache used for transferring data between RDBMS (Oracle, SQL Server, MySQL, ...) and HDFS.
MySQL Database
Connecting to MySQL Database:
root user: root/cloudera
other user: cloudera/cloudera
[cloudera@quickstart ~]$ mysql -u root -p
Create new user
---------------
mysql> create user cloudera1 identified by 'cloudera1';
mysql> grant all privileges on *.* to 'cloudera1';
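If the new account cannot log in right away, flushing the privilege tables and testing the login is a quick check (a minimal sketch; cloudera1 is the user created above):
mysql> flush privileges;
[cloudera@quickstart ~]$ mysql -u cloudera1 -p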
listing databases in MySQL
--------------
mysql> show databases;
switching to db
--------------------
mysql> USE retail_db;
listing tables in retail_db database
---------
mysql> show tables;
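Before importing, it helps to inspect a table's schema so you know which columns and keys are available (standard MySQL; customers is the table used throughout the import examples below):
mysql> describe customers;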
Connecting to Sqoop
--------------------
version of sqoop
-------------------
[cloudera@quickstart hivedata]$ sqoop version
[cloudera@quickstart hivedata]$ sqoop help
Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information
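Help is also available per tool; for example, to list every option of the import command:
[cloudera@quickstart hivedata]$ sqoop help import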
import command
----------------
import --> importing data from RDBMS to HDFS
1) import data from a table to HDFS (all the rows and columns)
2) import data from a table to HDFS (all the rows but specific columns)
3) import data from a table to HDFS (all the columns, filter the rows by using where)
4) import data from a table to HDFS (specific columns, filter the rows by using where)
Table: retail_db.customers
import data from a table to HDFS (all the rows and columns)
---------------------------------------------
[cloudera@quickstart hivedata]$ sqoop import --connect jdbc:mysql://192.168.13.142/retail_db --username cloudera --password cloudera --table customers -m 1
imported data will be stored under /user/cloudera/customers (the default location is /user/<username>/<table>)
view the data
-------
[cloudera@quickstart ~]$ hdfs dfs -cat /user/cloudera/customers/part-m-00000
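Note that --password puts the password in plain text on the command line (and in the shell history); -P prompts for it interactively instead, and is a drop-in replacement in any of the commands in this section:
[cloudera@quickstart hivedata]$ sqoop import --connect jdbc:mysql://192.168.13.142/retail_db --username cloudera -P --table customers -m 1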
import data from a table to HDFS (all the rows but specific columns)
---------------------------------------------------------------------
[cloudera@quickstart hivedata]$ sqoop import --connect jdbc:mysql://192.168.13.142/retail_db --username cloudera --password cloudera --table customers --columns "customer_id,customer_fname,customer_email" -m 1
imported data will be stored under /user/cloudera/customers (the default location is /user/<username>/<table>)
view the data
-------
[cloudera@quickstart ~]$ hdfs dfs -cat /user/cloudera/customers/part-m-00000
import data from a table to HDFS (all the columns, filter the rows by using where)
-------------------------------------------------------------------------------------
[cloudera@quickstart hivedata]$ sqoop import --connect jdbc:mysql://192.168.13.142/retail_db --username cloudera --password cloudera --table customers --where "customer_state='IN'" -m 1 --target-dir /user/hive/warehouse/customers
view the data
-------
[cloudera@quickstart ~]$ hdfs dfs -cat /user/hive/warehouse/customers/part-m-00000
4) import data from a table to HDFS (specific columns, filter the rows by using where)
---------------------
[cloudera@quickstart ~]$ sqoop import --connect jdbc:mysql://192.168.13.142/retail_db --username cloudera --password cloudera --table customers --columns "customer_id,customer_fname" --where "customer_state='IN'" -m 1 --target-dir /user/hive/warehouse/customers
view the data
-------
[cloudera@quickstart ~]$ hdfs dfs -cat /user/hive/warehouse/customers/part-m-00000
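Re-running an import against an existing target directory fails because the directory already exists (as happens if case 3 and case 4 above are run back to back); one option, assuming Sqoop 1.4.x where the flag is available, is to let Sqoop remove the directory first with --delete-target-dir:
[cloudera@quickstart ~]$ sqoop import --connect jdbc:mysql://192.168.13.142/retail_db --username cloudera --password cloudera --table customers --columns "customer_id,customer_fname" --where "customer_state='IN'" -m 1 --delete-target-dir --target-dir /user/hive/warehouse/customers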
-------------------------------------------------------
Sqoop commands
------------------------------------------------------
import-all-tables
list-databases
list-tables
create-hive-table
hive-import
eval
export
---------------------
import-all-tables : import data from all MySQL tables in a database to HDFS
--------------------
[cloudera@quickstart ~]$ hdfs dfsadmin -safemode leave # run this only if the import fails because HDFS is in safe mode
[cloudera@quickstart ~]$ sqoop import-all-tables --connect jdbc:mysql://192.168.13.142/retail_db --username cloudera --password cloudera
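To skip some tables, import-all-tables accepts a comma-separated exclusion list (the table names below are just illustrative picks from retail_db):
[cloudera@quickstart ~]$ sqoop import-all-tables --connect jdbc:mysql://192.168.13.142/retail_db --username cloudera --password cloudera --exclude-tables departments,categories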
list-databases: list available databases in MySQL
-----------------
[cloudera@quickstart ~]$ sqoop list-databases --connect jdbc:mysql://192.168.13.142/ --username cloudera --password cloudera
list-tables: list available tables in a database
---------------------------
[cloudera@quickstart ~]$ sqoop list-tables --connect jdbc:mysql://192.168.13.142/retail_db --username cloudera --password cloudera
create-hive-table : import a table definition into Hive
-----------------------------------------------------------
step 1) import the MySQL table data into HDFS.
[cloudera@quickstart ~]$ sqoop import --connect jdbc:mysql://192.168.13.142/retail_db --username cloudera --password cloudera --table customers -m 1
step 2) import the MySQL table definition into Hive (create-hive-table).
-----------------------------------------------------
[cloudera@quickstart ~]$ sqoop create-hive-table --connect jdbc:mysql://192.168.13.142/retail_db --username cloudera --password cloudera --table customers --fields-terminated-by ',' --lines-terminated-by '\n'
step 3) load the data from HDFS to hive table.
--------------------------------------
hive> load data inpath '/user/cloudera/customers' into table customers;
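A quick way to confirm the three steps worked is to query the new table from the Hive shell (assuming the session points at the default database):
hive> select * from customers limit 5;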
'hive-import' option for the import command (used to reduce the above steps to one)
------------------------------------------------------------------------------
[cloudera@quickstart ~]$ sqoop import --connect jdbc:mysql://192.168.13.142/retail_db --username cloudera --password cloudera --table customers -m 1 --hive-import
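The same import can be pointed at a differently named Hive table, or made to overwrite one that already exists, with --hive-table and --hive-overwrite (standard import options; customers_copy below is a hypothetical name):
[cloudera@quickstart ~]$ sqoop import --connect jdbc:mysql://192.168.13.142/retail_db --username cloudera --password cloudera --table customers -m 1 --hive-import --hive-table customers_copy --hive-overwrite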
eval: evaluate a SQL statement and display the results
------------------------------------------------
[cloudera@quickstart ~]$ sqoop eval --connect jdbc:mysql://192.168.13.142/retail_db --username cloudera --password cloudera --query "select * from customers limit 10"
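eval also accepts -e as a shorthand for --query, and any valid SQL can be evaluated, e.g. a row count before committing to a full import:
[cloudera@quickstart ~]$ sqoop eval --connect jdbc:mysql://192.168.13.142/retail_db --username cloudera --password cloudera -e "select count(*) from customers"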
export: export the data from HDFS to MySQL
---------------------------------------------
insert mode
update mode
mysql> create database hr; -- creating a new database in MySQL
mysql> use hr;
mysql> create table employees(name varchar(30),email varchar(40)); -- creating the target table
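The export commands below assume a comma-delimited file already sits in HDFS with columns matching employees(name,email); a minimal way to stage one (the row contents are made up):
[cloudera@quickstart hivedata]$ echo "john,john@example.com" > Employees.csv
[cloudera@quickstart hivedata]$ hdfs dfs -put Employees.csv /user/hive/warehouse/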
insert mode
-------------
[cloudera@quickstart hivedata]$ sqoop export --connect jdbc:mysql://192.168.13.142/hr --username cloudera --password cloudera --table employees --export-dir /user/hive/warehouse/Employees.csv
update mode
--------------
[cloudera@quickstart hivedata]$ sqoop export --connect jdbc:mysql://192.168.13.142/hr --username cloudera --password cloudera --table employees --export-dir /user/hive/warehouse/Employees.csv --update-key name
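By default --update-key only updates rows whose key already exists in the table; input rows with no match are skipped. To insert those as new rows instead (an upsert), Sqoop offers --update-mode allowinsert, though not every connector supports it:
[cloudera@quickstart hivedata]$ sqoop export --connect jdbc:mysql://192.168.13.142/hr --username cloudera --password cloudera --table employees --export-dir /user/hive/warehouse/Employees.csv --update-key name --update-mode allowinsert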
Sqoop Commands (Cont..)
----------------------------
Sqoop Jobs
codegen command
-----------------------
Sqoop job
--------
is a file that contains a saved Sqoop command; we can run the job at any time.
create a job                          --create
list jobs                             --list
detailed information about a job      --show
execute a job                         --exec
delete a job                          --delete
create a new job to import data from MySQL table to HDFS.
----------------------
sqoop job --create myjob1 -- import --connect jdbc:mysql://192.168.13.136/retail_db --username cloudera --password cloudera --table customers -m 1
listing available jobs
---------------------------
[cloudera@quickstart ~]$ sqoop job --list
display detailed information about the job
---------------------------------------
[cloudera@quickstart ~]$ sqoop job --show myjob1
delete existing job
-----------------------------
[cloudera@quickstart ~]$ sqoop job --delete myjob1
[cloudera@quickstart ~]$ hdfs dfsadmin -safemode leave # turn HDFS safe mode off (run only if the job fails because HDFS is in safe mode)
Execute sqoop job
-----------------------
[cloudera@quickstart ~]$ sqoop job --exec myjob1
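Arguments can be overridden at execution time by appending them after --; the Sqoop user guide shows this pattern for supplying credentials, e.g. prompting for the password instead of storing it in the job:
[cloudera@quickstart ~]$ sqoop job --exec myjob1 -- --username cloudera -P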
codegen command : generate the Java class files Sqoop uses to interact with table records
--------------------------------------------
[cloudera@quickstart ~]$ sqoop codegen --connect jdbc:mysql://192.168.13.136/retail_db --username cloudera --password cloudera --table customers
Verify files
----------------
[cloudera@quickstart ~]$ ls /tmp/sqoop-cloudera/compile/752b65100f5001f20bd57eb85a460b51/
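The directory typically holds customers.java, customers.class, and customers.jar (the record class Sqoop generated for the table); the --outdir and --bindir options of codegen can redirect the generated sources and binaries to a fixed location (/tmp/sqoop_src below is a hypothetical path):
[cloudera@quickstart ~]$ sqoop codegen --connect jdbc:mysql://192.168.13.136/retail_db --username cloudera --password cloudera --table customers --outdir /tmp/sqoop_src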